1. 12 Sep, 2024 3 commits
  2. 11 Sep, 2024 15 commits
  3. 10 Sep, 2024 9 commits
  4. 09 Sep, 2024 13 commits
    • net/mlx5: Fix bridge mode operations when there are no VFs · b1d305ab
      Benjamin Poirier authored
      Currently, trying to set the bridge mode attribute when numvfs=0 leads to a
      crash:
      
      bridge link set dev eth2 hwmode vepa
      
      [  168.967392] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [...]
      [  168.969989] RIP: 0010:mlx5_add_flow_rules+0x1f/0x300 [mlx5_core]
      [...]
      [  168.976037] Call Trace:
      [  168.976188]  <TASK>
      [  168.978620]  _mlx5_eswitch_set_vepa_locked+0x113/0x230 [mlx5_core]
      [  168.979074]  mlx5_eswitch_set_vepa+0x7f/0xa0 [mlx5_core]
      [  168.979471]  rtnl_bridge_setlink+0xe9/0x1f0
      [  168.979714]  rtnetlink_rcv_msg+0x159/0x400
      [  168.980451]  netlink_rcv_skb+0x54/0x100
      [  168.980675]  netlink_unicast+0x241/0x360
      [  168.980918]  netlink_sendmsg+0x1f6/0x430
      [  168.981162]  ____sys_sendmsg+0x3bb/0x3f0
      [  168.982155]  ___sys_sendmsg+0x88/0xd0
      [  168.985036]  __sys_sendmsg+0x59/0xa0
      [  168.985477]  do_syscall_64+0x79/0x150
      [  168.987273]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  168.987773] RIP: 0033:0x7f8f7950f917
      
      (esw->fdb_table.legacy.vepa_fdb is null)
      
      The bridge mode is only relevant when there are multiple functions per
      port. Therefore, prevent setting and getting this attribute when there are
      no VFs.
      
      Note that after this change, there are no settings to change on the PF
      interface using `bridge link` when there are no VFs, so the interface no
      longer appears in the `bridge link` output.
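
The guard described above can be pictured with a small userspace model. This is only a sketch: `struct esw_model`, `esw_model_set_vepa()` and `esw_model_get_vepa()` are hypothetical names rather than the driver's actual code, and returning -EOPNOTSUPP is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the E-Switch state. */
struct esw_model {
    int num_vfs;       /* number of enabled VFs */
    void *vepa_fdb;    /* legacy FDB table; NULL until VFs are enabled */
    bool vepa_enabled; /* current bridge mode: true = VEPA, false = VEB */
};

/* True only when the table the bridge mode operates on exists. */
static bool esw_model_bridge_mode_usable(const struct esw_model *esw)
{
    return esw->num_vfs > 0 && esw->vepa_fdb != NULL;
}

/* Reject the request instead of dereferencing a NULL table. */
static int esw_model_set_vepa(struct esw_model *esw, bool enable)
{
    assert(esw != NULL);
    if (!esw_model_bridge_mode_usable(esw))
        return -EOPNOTSUPP; /* assumed error code */
    esw->vepa_enabled = enable;
    return 0;
}

/* The getter applies the same check, so no mode is reported without VFs. */
static int esw_model_get_vepa(const struct esw_model *esw, bool *enabled)
{
    assert(esw != NULL && enabled != NULL);
    if (!esw_model_bridge_mode_usable(esw))
        return -EOPNOTSUPP; /* assumed error code */
    *enabled = esw->vepa_enabled;
    return 0;
}
```

In this model, both calls fail cleanly while `num_vfs` is 0, which mirrors the interface no longer appearing in the `bridge link` output.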
      
      Fixes: 4b89251d ("net/mlx5: Support ndo bridge_setlink and getlink")
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Verify support for scheduling element and TSAR type · 861cd9b9
      Carolina Jubran authored
      Before creating a scheduling element in a NIC or E-Switch scheduler,
      ensure that the requested element type is supported. If the element is
      of type Transmit Scheduling Arbiter (TSAR), also verify that the
      specific TSAR type is supported.
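
A minimal sketch of such a check, assuming the capabilities are exposed as bit masks in which bit N means "type N is supported". The enum values, `struct qos_caps_model` and `check_sched_elem()` are hypothetical and do not reproduce the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type numbering, for illustration only. */
enum elem_type_model { ELEM_TSAR = 0, ELEM_VPORT = 1, ELEM_QUEUE_GROUP = 4 };
enum tsar_type_model { TSAR_DWRR = 0, TSAR_ROUND_ROBIN = 1, TSAR_ETS = 2 };

/* Assumed capability layout: one bit per supported type. */
struct qos_caps_model {
    uint32_t elem_type_mask;
    uint32_t tsar_type_mask;
};

static bool type_supported(uint32_t mask, unsigned int type)
{
    return type < 32 && (mask & (UINT32_C(1) << type)) != 0;
}

/* Validate the element type first, then the TSAR type if it applies. */
static int check_sched_elem(const struct qos_caps_model *caps,
                            enum elem_type_model elem,
                            enum tsar_type_model tsar)
{
    assert(caps != NULL);
    if (!type_supported(caps->elem_type_mask, (unsigned int)elem))
        return -EOPNOTSUPP;
    if (elem == ELEM_TSAR &&
        !type_supported(caps->tsar_type_mask, (unsigned int)tsar))
        return -EOPNOTSUPP;
    return 0;
}
```

The TSAR type is only consulted for TSAR elements, so a non-TSAR element passes regardless of the `tsar` argument.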
      
      Fixes: 214baf22 ("net/mlx5e: Support HTB offload")
      Fixes: 85c5f7c9 ("net/mlx5: E-switch, Create QoS on demand")
      Fixes: 0fe132ea ("net/mlx5: E-switch, Allow to add vports to rate groups")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Add missing masks and QoS bit masks for scheduling elements · 452ef7f8
      Carolina Jubran authored
      Add the missing masks for supported element types and Transmit
      Scheduling Arbiter (TSAR) types in scheduling elements.
      
      Also, add the corresponding bit masks for these types in the QoS
      capabilities of a NIC scheduler.
      
      Fixes: 214baf22 ("net/mlx5e: Support HTB offload")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Explicitly set scheduling element and TSAR type · c88146ab
      Carolina Jubran authored
      Ensure the scheduling element type and TSAR type are explicitly
      initialized in the QoS rate group creation.
      
      This prevents potential issues due to default values.
      
      Fixes: 1ae258f8 ("net/mlx5: E-switch, Introduce rate limiting groups API")
      Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Add missing link mode to ptys2ext_ethtool_map · 80bf4742
      Shahar Shitrit authored
      Add MLX5E_400GAUI_8_400GBASE_CR8 to the extended modes
      in ptys2ext_ethtool_table, since it was missing.
      
      Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
      Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Add missing link modes to ptys2ethtool_map · 7617d62c
      Shahar Shitrit authored
      Add MLX5E_1000BASE_T and MLX5E_100BASE_TX to the legacy
      modes in ptys2legacy_ethtool_table, since they were missing.
      
      Fixes: 665bc539 ("net/mlx5e: Use new ethtool get/set link ksettings API")
      Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Update the list of the PCI supported devices · 7472d157
      Maher Sanalla authored
      Add the upcoming ConnectX-9 device ID to the table of supported
      PCI device IDs.
      
      Fixes: f908a35b ("net/mlx5: Update the list of the PCI supported devices")
      Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • igb: Always call igb_xdp_ring_update_tail() under Tx lock · 27717f8b
      Sriram Yagnaraman authored
      Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment
      and lockdep assert to indicate that. This is needed to share the same TX
      ring between XDP, XSK and slow paths. Furthermore, the current XDP
      implementation is racy on tail updates.
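
The pattern of asserting a locking precondition inside the callee can be sketched in userspace as follows. This is a single-threaded illustration, not the igb code: `struct tx_ring_model` and its helpers are hypothetical, a boolean flag stands in for the Tx lock, and `assert()` plays the role of the lockdep check.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 256u /* assumed ring size for the model */

/* Hypothetical model of a Tx ring shared by several transmit paths. */
struct tx_ring_model {
    bool lock_held;           /* stands in for the per-queue Tx lock */
    unsigned int next_to_use; /* next free descriptor index */
    unsigned int hw_tail;     /* last value published to the hardware tail */
};

static void ring_lock(struct tx_ring_model *r)
{
    assert(!r->lock_held);
    r->lock_held = true;
}

static void ring_unlock(struct tx_ring_model *r)
{
    assert(r->lock_held);
    r->lock_held = false;
}

/* Must be called with the ring lock held; the assert documents and
 * enforces that, much like a lockdep assertion would. */
static void ring_update_tail(struct tx_ring_model *r)
{
    assert(r->lock_held);
    r->hw_tail = r->next_to_use;
}

/* Queue n descriptors and publish them, all under the lock. */
static void ring_xmit(struct tx_ring_model *r, unsigned int n)
{
    ring_lock(r);
    r->next_to_use = (r->next_to_use + n) % RING_SIZE;
    ring_update_tail(r);
    ring_unlock(r);
}
```

Placing the check in the tail-update helper means every caller, whatever path it comes from, is held to the same rule.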
      
      Fixes: 9cbc948b ("igb: add XDP support")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      [Kurt: Add lockdep assert and fixes tag]
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix VSI lists confusion when adding VLANs · d2940002
      Michal Schmidt authored
      The description of function ice_find_vsi_list_entry says:
        Search VSI list map with VSI count 1
      
      However, since the blamed commit (see Fixes below), the function no
      longer checks vsi_count. This causes a problem in ice_add_vlan_internal,
      where the decision to share VSI lists between filter rules relies on the
      vsi_count of the found existing VSI list being 1.
      
      The reproducing steps:
      1. Have a PF and two VFs.
         There will be a filter rule for VLAN 0, referring to a VSI list
         containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1).
      2. Add VLAN 1234 to VF#0.
         ice will make the wrong decision to share the VSI list with the new
         rule. The wrong behavior may not be immediately apparent, but it can
         be observed with debug prints.
      3. Add VLAN 1234 to VF#1.
         ice will unshare the VSI list for the VLAN 1234 rule. Due to the
         earlier bad decision, the newly created VSI list will contain
         VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1).
      4. Try pinging a network peer over the VLAN interface on VF#0.
         This fails.
      
      Reproducer script at:
      https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh
      Commented debug trace:
      https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt
      Patch adding the debug prints:
      https://gitlab.com/mschmidt2/linux/-/commit/f8a8814623944a45091a77c6094c40bfe726bfdb
      (Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.)
      
      Michal Swiatkowski added to the explanation that the bug is caused by
      reusing a VSI list created for VLAN 0. The VSIs of all created VFs are
      added to the VLAN 0 filter. When a non-zero VLAN is created on a VF that is
      already in VLAN 0 (the normal case), the VSI list from VLAN 0 is reused.
      This is a problem because all VFs (VSIs, to be specific) that are
      subscribed to VLAN 0 will now receive traffic tagged with the new VLAN.
      That is one bug; the other is the one described above. Removing filters
      from one VF will remove the VLAN filter from the previous VF. This happens
      when a VF is reset. Example:
      - creation of 3 VFs
      - we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)]
      - we are adding VLAN 100 on VF1, we are reusing the previous list
        because 2 is there
      - VLAN traffic works fine, but VLAN 100 tagged traffic can be received
        on all VSIs from the list (for example broadcast or unicast)
      - trust is turned on for VF2, VF2 resets, and all filters from VF2 are
        removed; the VLAN 100 filter is also removed because 3 is on the list
      - VLAN traffic to VF1 no longer works; the VLAN interface has to be
        recreated to re-add the VLAN filter
      
      One thing I'm not certain about is the implications for the LAG feature,
      which is another caller of ice_find_vsi_list_entry. I don't have a
      LAG-capable card at hand to test.
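
The lookup behaviour that the function description promises ("Search VSI list map with VSI count 1") can be modelled in a few lines. This is a sketch under assumptions: `struct vsi_list_model` and `find_single_vsi_list()` are hypothetical, and a 32-bit bitmap stands in for the driver's VSI map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one VSI list entry. */
struct vsi_list_model {
    uint16_t list_id;    /* identifier of the VSI list */
    uint16_t vsi_count;  /* number of VSIs on the list */
    uint32_t vsi_bitmap; /* bit N set = VSI N is on the list */
};

/* Return a list that contains only the given VSI, or NULL. Lists with more
 * than one member (such as the VLAN 0 list) are never returned, so they
 * cannot be picked up for sharing by mistake. */
static const struct vsi_list_model *
find_single_vsi_list(const struct vsi_list_model *lists, size_t n,
                     unsigned int vsi)
{
    assert(vsi < 32);
    for (size_t i = 0; i < n; i++) {
        if (lists[i].vsi_count == 1 &&
            (lists[i].vsi_bitmap & (UINT32_C(1) << vsi)) != 0)
            return &lists[i];
    }
    return NULL;
}
```

With a VLAN 0 list holding VSIs 0, 2 and 3, a lookup for VSI 2 skips that list; without the `vsi_count == 1` condition it would match, which is the confusion the reproducer exposes.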
      
      Fixes: 23ccae5c ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Dave Ertman <David.m.ertman@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: stop calling pci_disable_device() as we use pcim · e6501fc3
      Przemek Kitszel authored
      Our driver uses devres to manage resources, in particular we call
      pcim_enable_device(), which also means we express the intent to get an
      automatic pci_disable_device() call at driver removal. Manual calls to
      pci_disable_device() misuse the API.
      
      A recent commit (see "Fixes" tag) has changed the removal action from
      conditional (silently ignoring a double call to pci_disable_device()) to
      unconditional, but able to catch unwanted redundant calls; see the cited
      "Fixes" commit for details.

      Since then, unloading the driver yields the following warning and splat:
      
      [70633.628490] ice 0000:af:00.7: disabling already-disabled device
      [70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100
      ...
      [70633.628744]  ? pci_disable_device+0xf4/0x100
      [70633.628752]  release_nodes+0x4a/0x70
      [70633.628759]  devres_release_all+0x8b/0xc0
      [70633.628768]  device_unbind_cleanup+0xe/0x70
      [70633.628774]  device_release_driver_internal+0x208/0x250
      [70633.628781]  driver_detach+0x47/0x90
      [70633.628786]  bus_remove_driver+0x80/0x100
      [70633.628791]  pci_unregister_driver+0x2a/0xb0
      [70633.628799]  ice_module_exit+0x11/0x3a [ice]
      
      Note that this is the only Intel ethernet driver that needs such a fix.
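
The interaction between a managed enable and a manual disable can be modelled in userspace. This is a deliberately simplified sketch: `struct pci_dev_model` and the `model_*` functions are hypothetical, and the enable counter and warning counter only imitate the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI device's enable state. */
struct pci_dev_model {
    int enable_cnt; /* how many enables are outstanding */
    bool managed;   /* true if a disable is queued for driver unbind */
    int warnings;   /* counts "disabling already-disabled device" events */
};

/* Disabling a device that is not enabled is flagged as a warning. */
static void model_disable(struct pci_dev_model *dev)
{
    assert(dev != NULL);
    if (dev->enable_cnt <= 0) {
        dev->warnings++;
        return;
    }
    dev->enable_cnt--;
}

/* Managed enable: also registers the automatic disable for unbind time. */
static void model_managed_enable(struct pci_dev_model *dev)
{
    assert(dev != NULL);
    dev->enable_cnt++;
    dev->managed = true;
}

/* Unbind runs the queued cleanup unconditionally. */
static void model_unbind(struct pci_dev_model *dev)
{
    assert(dev != NULL);
    if (dev->managed)
        model_disable(dev);
}
```

A driver that calls `model_disable()` itself and then unbinds ends up with one warning; a driver that leaves the disable to the unbind cleanup ends up with none.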
      
      Fixes: f748a07a ("PCI: Remove legacy pcim_release()")
      Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Philipp Stanner <pstanner@redhat.com>
      Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix accounting for filters shared by multiple VSIs · e843cf7b
      Jacob Keller authored
      When adding a switch filter (such as a MAC or VLAN filter), it is expected
      that the driver will detect the case where the filter already exists, and
      return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr,
      and ice_vsi_add_vlan to avoid incrementing the accounting fields such as
      vsi->num_vlan or vf->num_mac.
      
      This logic works correctly for the case where only a single VSI has added a
      given switch filter.
      
      When a second VSI adds the same switch filter, the driver converts the
      existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST
      filter. This saves switch resources, by ensuring that multiple VSIs can
      re-use the same filter.
      
      The ice_add_update_vsi_list() function is responsible for doing this
      conversion. When first converting a filter from the FWD_TO_VSI into
      FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the
      existing rule's VSI. In such a case it returns -EEXIST.
      
      However, when the switch rule has already been converted to a
      FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just
      requires extending the VSI list entry. The logic for checking if the rule
      already exists in this case returns 0 instead of -EEXIST.
      
      This breaks the accounting logic mentioned above, so the counters for how
      many MAC and VLAN filters exist for a given VF or VSI no longer accurately
      reflect the actual count. This breaks other code which relies on these
      counts.
      
      In typical usage this primarily affects such filters generally shared by
      multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses.
      
      Fix this by correctly reporting -EEXIST in the case of adding the same VSI
      to a switch rule already converted to ICE_FWD_TO_VSI_LIST.
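
The accounting contract can be illustrated with a small model in which the list helper reports -EEXIST for a duplicate and the caller only bumps its counter on a genuine addition. The names `struct vsi_list_rule`, `vsi_list_add()` and `vsi_add_vlan_model()` are hypothetical, and the bitmap is a stand-in for the real VSI list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a rule already converted to a VSI list. */
struct vsi_list_rule {
    uint32_t members; /* bit N set = VSI N is on the list */
};

/* Hypothetical per-VSI accounting. */
struct vsi_acct {
    unsigned int num_vlan;
};

/* Add a VSI to the list; a duplicate is reported, not silently accepted. */
static int vsi_list_add(struct vsi_list_rule *rule, unsigned int vsi)
{
    assert(rule != NULL);
    if (vsi >= 32)
        return -EINVAL;
    if (rule->members & (UINT32_C(1) << vsi))
        return -EEXIST;
    rule->members |= UINT32_C(1) << vsi;
    return 0;
}

/* Caller side: count the filter only when it was really added. */
static int vsi_add_vlan_model(struct vsi_list_rule *rule,
                              struct vsi_acct *acct, unsigned int vsi)
{
    int err = vsi_list_add(rule, vsi);

    if (err == -EEXIST)
        return 0; /* already present and already counted */
    if (err)
        return err;
    acct->num_vlan++;
    return 0;
}
```

If `vsi_list_add()` returned 0 for a duplicate, the second call for the same VSI would increment `num_vlan` again, which is the drift in the counters described above.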
      
      Fixes: 9daf8208 ("ice: Add support for switch filter programming")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix lldp packets dropping after changing the number of channels · 9debb703
      Martyna Szapar-Mudlaw authored
      After the VSI setup refactor in commit 6624e780 ("ice: split ice_vsi_setup
      into smaller functions"), the ice_cfg_sw_lldp call that removes the Rx rule
      directing LLDP packets to the VSI was moved from ice_vsi_release to
      ice_vsi_decfg. ice_vsi_decfg is used in more cases than just
      ice_vsi_release, so the switch rule handling Rx LLDP packets is removed
      unnecessarily. This leads to LLDP packets being dropped after changing the
      number of channels via ethtool.
      This patch moves the ice_cfg_sw_lldp call that removes the Rx LLDP switch
      rule back to ice_vsi_release.
      
      Fixes: 6624e780 ("ice: split ice_vsi_setup into smaller functions")
      Reported-by: Matěj Grégr <mgregr@netx.as>
      Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • net: hsr: remove seqnr_lock · b3c9e65e
      Eric Dumazet authored
      syzbot found a new splat [1].
      
      Instead of adding yet another spin_lock_bh(&hsr->seqnr_lock) /
      spin_unlock_bh(&hsr->seqnr_lock) pair, remove seqnr_lock
      and use atomic_t for hsr->sequence_nr and hsr->sup_sequence_nr.
      
      This also avoids a race in hsr_fill_info().
      
      Also remove interlink_sequence_nr which is unused.
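
The idea of replacing a lock-protected counter with an atomic one can be sketched in userspace with C11 atomics. This is an analogy, not the kernel code: `struct hsr_model` and its helpers are hypothetical, `atomic_uint` stands in for the kernel's atomic_t, and the 16-bit truncation is an assumption about how the sequence number is consumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two counters that used to share one lock. */
struct hsr_model {
    atomic_uint sequence_nr;
    atomic_uint sup_sequence_nr;
};

/* Hand out the next sequence number without taking a lock. The atomic
 * read-modify-write guarantees that concurrent callers get distinct
 * values; truncating to 16 bits gives the wraparound. */
static uint16_t hsr_model_next_seq(atomic_uint *ctr)
{
    return (uint16_t)atomic_fetch_add(ctr, 1u);
}

/* A reader (for example a status dump) can sample the counter with a
 * plain atomic load instead of locking. */
static uint16_t hsr_model_read_seq(atomic_uint *ctr)
{
    return (uint16_t)atomic_load(ctr);
}
```

Because each caller performs a single atomic operation, there is no lock/unlock pair to forget on any code path.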
      
      [1]
       WARNING: CPU: 1 PID: 9723 at net/hsr/hsr_forward.c:602 handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602
      Modules linked in:
      CPU: 1 UID: 0 PID: 9723 Comm: syz.0.1657 Not tainted 6.11.0-rc6-syzkaller-00026-g88fac175 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
       RIP: 0010:handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602
      Code: 49 8d bd b0 01 00 00 be ff ff ff ff e8 e2 58 25 00 31 ff 89 c5 89 c6 e8 47 53 a8 f6 85 ed 0f 85 5a ff ff ff e8 fa 50 a8 f6 90 <0f> 0b 90 e9 4c ff ff ff e8 cc e7 06 f7 e9 8f fe ff ff e8 52 e8 06
      RSP: 0018:ffffc90000598598 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffffc90000598670 RCX: ffffffff8ae2c919
      RDX: ffff888024e94880 RSI: ffffffff8ae2c926 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
      R13: ffff8880627a8cc0 R14: 0000000000000000 R15: ffff888012b03c3a
      FS:  0000000000000000(0000) GS:ffff88802b700000(0063) knlGS:00000000f5696b40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000020010000 CR3: 00000000768b4000 CR4: 0000000000350ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
        hsr_fill_frame_info+0x2c8/0x360 net/hsr/hsr_forward.c:630
        fill_frame_info net/hsr/hsr_forward.c:700 [inline]
        hsr_forward_skb+0x7df/0x25c0 net/hsr/hsr_forward.c:715
        hsr_handle_frame+0x603/0x850 net/hsr/hsr_slave.c:70
        __netif_receive_skb_core.constprop.0+0xa3d/0x4330 net/core/dev.c:5555
        __netif_receive_skb_list_core+0x357/0x950 net/core/dev.c:5737
        __netif_receive_skb_list net/core/dev.c:5804 [inline]
        netif_receive_skb_list_internal+0x753/0xda0 net/core/dev.c:5896
        gro_normal_list include/net/gro.h:515 [inline]
        gro_normal_list include/net/gro.h:511 [inline]
        napi_complete_done+0x23f/0x9a0 net/core/dev.c:6247
        gro_cell_poll+0x162/0x210 net/core/gro_cells.c:66
        __napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa92/0x1010 net/core/dev.c:6963
        handle_softirqs+0x216/0x8f0 kernel/softirq.c:554
        do_softirq kernel/softirq.c:455 [inline]
        do_softirq+0xb2/0xf0 kernel/softirq.c:442
       </IRQ>
       <TASK>
      
      Fixes: 06afd2c3 ("hsr: Synchronize sending frames to have always incremented outgoing seq nr.")
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>