1. 01 Nov, 2021 15 commits
  2. 30 Oct, 2021 5 commits
  3. 29 Oct, 2021 20 commits
    • net/mlx5: Support internal port as decap route device · b16eb3c8
      Ariel Levkovich authored
      When performing route device lookup for decap action, support
      the case of ovs internal port as the lookup result.
      
      In such case, an internal port struct is mapped and attached
      to the flow attributes so that the source port matching of the
      rule will match on the internal port's metadata value.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Term table handling of internal port rules · 5e994272
      Ariel Levkovich authored
      Adjust termination table logic to handle rules which
      involve internal port as filter or forwarding device.
      
      For cases where the rule forwards from an internal port
      to the uplink, always go via the termination table.
      This is because it is not known where the packet originally
      came from before it reached the internal port, and it is possible
      that it came from the uplink itself, in which case
      a term table is required to perform hairpin.
      If the packet arrived from a vport, going via the term
      table has no effect.

      For cases where the rule forwards to an internal port
      from the uplink, the rep pointer will point to the uplink rep;
      avoid going via the termination table, as it is not required.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
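The internal-port cases of the termination-table decision described above can be sketched as a small predicate. This is a minimal illustration, not the mlx5 code: `enum dev_kind`, `use_term_table()` and the `other_cases` parameter (a stand-in for whatever the pre-existing, non-internal-port logic decides) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical device kinds, for illustration only. */
enum dev_kind { DEV_UPLINK, DEV_VF, DEV_INT_PORT };

/*
 * Decide whether a rule matching traffic from `src` and forwarding to
 * `dst` goes via a termination table. `other_cases` stands in for the
 * decision the existing (non-internal-port) logic would make.
 */
static bool use_term_table(enum dev_kind src, enum dev_kind dst,
			   bool other_cases)
{
	/* Internal port -> uplink: the packet's real origin is unknown and
	 * may be the uplink itself, which needs a hairpin, so always use
	 * the term table (it has no effect for packets from a vport). */
	if (src == DEV_INT_PORT && dst == DEV_UPLINK)
		return true;

	/* Uplink -> internal port: no termination table is required. */
	if (src == DEV_UPLINK && dst == DEV_INT_PORT)
		return false;

	return other_cases;
}
```

The two internal-port cases override the existing decision; everything else passes through unchanged.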
    • net/mlx5e: Add indirect tc offload of ovs internal port · 166f431e
      Ariel Levkovich authored
      Register callbacks for tc blocks of ovs internal port devices.
      
      This allows indirect offloading of rules that use such
      devices as the filter device.

      In case a rule is added to a tc block of an internal port,
      the mlx5 driver will implicitly add a match on the internal
      port's unique vport metadata value to the rule's match list.
      Therefore, only packets that previously hit a rule that redirects
      to an internal port, and had their vport metadata overwritten with the
      internal port's unique metadata, can match such an indirect rule.
      
      Offloading of both ingress and egress tc blocks of internal ports
      is supported as opposed to other devices where only ingress block
      offloading is supported.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Offload internal port as encap route device · 100ad4e2
      Ariel Levkovich authored
      When performing an encap action, a route lookup is performed
      to find the routing device the packet should be forwarded
      to after the encapsulation. This is the device that has the
      local tunnel IP address.
      
      This change adds support to offload an encap rule where the
      route device ends up being an ovs internal port.
      In such case, the driver will add a HW rule that will encapsulate
      the packet with the tunnel header and will overwrite the vport
      metadata in reg_c0 to the internal port metadata value.
      Finally, the packet will be forwarded to the root table to be
      processed again with the indication that it came from an internal
      port.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Offload tc rules that redirect to ovs internal port · 27484f71
      Ariel Levkovich authored
      Allow offloading rules that redirect to ovs internal port
      ingress and egress.
      
      To support redirect to ingress device, offloading of REDIRECT_INGRESS
      action is added.
      
      When a tc rule redirects to an ovs internal port, the hw rule will
      overwrite the input vport value in reg_c0 with a new vport metadata
      value that is mapped for this internal port using the internal
      port mapping API introduced in previous patches.
      After that, the hw rule will redirect the packet to the root table
      to continue processing with the new vport metadata value.
      
      The new vport metadata value indicates that this packet is now
      arriving through an internal port and therefore should be processed
      using rules that apply on the same internal port as the filter device.
      Therefore, following rules that apply on this internal port will have
      to match on the same vport metadata value as part of their matching
      keys to make sure the packet belongs to the internal port.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
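The two-pass flow described in this commit (overwrite reg_c0, go back to the root table, then match on the new metadata) can be modeled in plain C. This is a hypothetical userspace sketch: `struct pkt`, `struct rule`, `steer()` and the `MAX_PASSES` loop guard are illustrative assumptions, not mlx5 structures or actual hardware behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet/rule model; not the mlx5 data structures. */
struct pkt {
	uint32_t reg_c0; /* source-port metadata */
	int dport;       /* one header field the rules match on */
	int out;         /* chosen output port, -1 while undecided */
};

enum act { ACT_FWD, ACT_TO_INT_PORT };

struct rule {
	uint32_t match_src; /* required reg_c0 value */
	int match_dport;
	enum act act;
	uint32_t arg;       /* output port, or the internal port's metadata */
};

#define MAX_PASSES 4 /* guard against rewrite loops in this toy model */

/*
 * Look the packet up in the root table. A redirect to an internal port
 * overwrites reg_c0 and sends the packet back to the root table, where
 * only rules matching the new metadata can hit. Returns the number of
 * passes taken, or -1 on a miss (the packet would go to software).
 */
static int steer(struct pkt *p, const struct rule *tbl, size_t n)
{
	for (int pass = 1; pass <= MAX_PASSES; pass++) {
		const struct rule *hit = NULL;

		for (size_t i = 0; i < n && !hit; i++)
			if (tbl[i].match_src == p->reg_c0 &&
			    tbl[i].match_dport == p->dport)
				hit = &tbl[i];

		if (!hit)
			return -1;
		if (hit->act == ACT_FWD) {
			p->out = (int)hit->arg;
			return pass;
		}
		p->reg_c0 = hit->arg; /* ACT_TO_INT_PORT: new source metadata */
	}
	return -1;
}
```

With a first rule that redirects uplink traffic to an internal port (metadata 0x100) and a second rule that matches only on reg_c0 == 0x100, a packet takes two passes; a packet with any other source metadata never reaches the second rule.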
    • net/mlx5e: Accept action skbedit in the tc actions list · dbac71f2
      Ariel Levkovich authored
      Setting the skb packet type field to host is usually
      done when performing forwarding to ingress device.
      
      This is required since the receive handling that is used
      by the redirect to ingress action checks whether the packet
      doesn't belong to this host and drops the packet in such case.
      
      In order to be able to offload action redirect ingress, tc offload
      code needs to accept the skbedit ptype action as well.
      
      There's no special handling in HW for this action, since it will
      be followed by a redirect action. Therefore, this code only
      allows the action to be accepted in the actions list, without
      performing anything specific in HW for it.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
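A sketch of how an action parser might accept `skbedit ptype host` without emitting any HW action for it. The action ids and `parse_actions()` are hypothetical, and treating every other action as non-offloadable is a simplification made here for brevity, not the driver's behavior.

```c
#include <stddef.h>

/* Hypothetical action ids standing in for parsed tc actions. */
enum act_id {
	A_SKBEDIT_PTYPE_HOST,
	A_REDIRECT_INGRESS,
	A_REDIRECT,
	A_OTHER,
};

/*
 * Walk the action list and count the actions that need HW programming.
 * skbedit ptype host is accepted but produces no HW action, since a
 * redirect follows it. Returns -1 if the list cannot be offloaded.
 */
static int parse_actions(const enum act_id *acts, size_t n)
{
	int hw_actions = 0;

	for (size_t i = 0; i < n; i++) {
		switch (acts[i]) {
		case A_SKBEDIT_PTYPE_HOST:
			break; /* accepted, nothing to do in HW */
		case A_REDIRECT_INGRESS:
		case A_REDIRECT:
			hw_actions++;
			break;
		default:
			return -1; /* simplification: everything else unsupported */
		}
	}
	return hw_actions;
}
```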
    • net/mlx5: E-Switch, Add ovs internal port mapping to metadata support · 4f4edcc2
      Ariel Levkovich authored
      Adding infrastructure to map ovs internal port device to vport
      match metadata to support offload of rules with internal port as
      the filter device or as the destination device.
      
      The infrastructure allows adding and removing an internal port device
      to and from an eswitch database, and getting a unique vport metadata
      value to be written to, and matched on, in reg_c0 when offloading rules
      that come from or go to an internal port.

      The new int port metadata can be written to the source port register
      in HW to indicate that the current source port of the packet is the
      internal port and not one of the actual HW vports (uplink or VF).
      Using this method, it is possible to offload TC rules with an OVS
      internal port as their destination port (overwriting the src vport
      register) or as the filter port (matching on the value of the src
      vport register and making sure it matches the internal port's
      value).
      
      There is also a need to handle a miss case, where the packet's
      src port value was changed in HW to an internal port but a following
      rule which matches on this new src port value wasn't found in HW.

      In such a case, the packet will be forwarded to the driver with
      metadata that allows the driver to restore the info of the internal
      port's netdevice. Once this info is restored, the uplink driver
      can forward the packet to the relevant netdevice in SW.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
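The mapping infrastructure in this commit can be pictured as a small table that hands out unique metadata values and supports the reverse lookup needed on the miss path. This is a hypothetical sketch: the table size, the `INT_PORT_META_BASE` offset, identifying ports by a positive ifindex, and all function names are assumptions, not the eswitch implementation.

```c
#include <stdint.h>

#define MAX_INT_PORTS 8
/* Hypothetical base value, chosen so internal-port metadata never
 * collides with the metadata of real HW vports. */
#define INT_PORT_META_BASE 0x8000u

/* Hypothetical eswitch-side database: slot i holds the ifindex of the
 * internal port whose metadata is INT_PORT_META_BASE + i (0 = free). */
struct int_port_db {
	int ifindex[MAX_INT_PORTS];
};

/* Register an internal port (ifindex > 0 assumed) and return its unique
 * metadata value, the same value if already registered, or 0 on failure. */
static uint32_t int_port_add(struct int_port_db *db, int ifindex)
{
	int free_slot = -1;

	if (ifindex <= 0)
		return 0;
	for (int i = 0; i < MAX_INT_PORTS; i++) {
		if (db->ifindex[i] == ifindex)
			return INT_PORT_META_BASE + (uint32_t)i;
		if (db->ifindex[i] == 0 && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return 0; /* table full */
	db->ifindex[free_slot] = ifindex;
	return INT_PORT_META_BASE + (uint32_t)free_slot;
}

static void int_port_del(struct int_port_db *db, int ifindex)
{
	for (int i = 0; i < MAX_INT_PORTS; i++)
		if (db->ifindex[i] == ifindex)
			db->ifindex[i] = 0;
}

/* Miss path: turn the reg_c0 value seen by the driver back into the
 * internal port's ifindex, or 0 if it is not internal-port metadata. */
static int int_port_from_meta(const struct int_port_db *db, uint32_t meta)
{
	if (meta < INT_PORT_META_BASE ||
	    meta >= INT_PORT_META_BASE + MAX_INT_PORTS)
		return 0;
	return db->ifindex[meta - INT_PORT_META_BASE];
}
```

Keeping the metadata range disjoint from real vport metadata is what lets a single reg_c0 match distinguish "came from an internal port" from "came from a HW vport".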
    • net/mlx5e: Use generic name for the forwarding dev pointer · 189ce08e
      Ariel Levkovich authored
      Rename tun_dev to fwd_dev within mlx5e_tc_update_priv struct
      since future implementation may introduce other device types
      which the handler is forwarding to.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Refactor rx handler of represetor device · 28e7606f
      Ariel Levkovich authored
      Move the ownership of forwarding the skb to the network stack into
      the tc update_skb handler, as different cases will require different
      handling of the skb.

      The tc handler will take care of the various cases, properly
      handling the handover of the skb to the network stack and freeing
      the skb, while the main rx handler is kept free of branches and
      flags.
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Add check for unsupported fields in match param · 941f1979
      Muhammad Sammar authored
      When a matcher is being built, we "consume" (clear) mask fields one by one,
      and to verify that we do support all the required fields we check if the
      whole mask was consumed, else the matching request includes unsupported
      fields.
      Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
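The consume-then-verify check can be sketched over a byte buffer. The 16-byte layout, the `supported` field list and `mask_fully_consumed()` are hypothetical; the point is only the technique of clearing each handled field in a copy of the mask and rejecting the request if any bit survives.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte match-param layout, for illustration only. */
#define PARAM_SZ 16

struct field { size_t off, len; };

/* Fields some builder knows how to handle: src IP (bytes 0-3),
 * dst IP (4-7), dst port (8-9). Bytes 10-15 stand for fields
 * that no builder supports. */
static const struct field supported[] = {
	{ 0, 4 }, { 4, 4 }, { 8, 2 },
};

/* Return true if every bit set in `user_mask` was consumed by a builder. */
static bool mask_fully_consumed(const uint8_t user_mask[PARAM_SZ])
{
	uint8_t mask[PARAM_SZ];

	memcpy(mask, user_mask, PARAM_SZ); /* never modify the caller's mask */

	/* Each builder "consumes" (clears) the mask bytes it handles. */
	for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
		memset(mask + supported[i].off, 0, supported[i].len);

	/* Anything still set belongs to an unsupported field. */
	for (size_t i = 0; i < PARAM_SZ; i++)
		if (mask[i])
			return false;
	return true;
}
```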
    • net/mlx5: Allow skipping counter refresh on creation · 504e1572
      Paul Blakey authored
      CT creates a counter for each CT rule, and for each such counter,
      fs_counters tries to queue the mlx5_fc_stats_work() work again via a
      mod_delayed_work(0) call to refresh all counters. This call has a
      large performance impact at high insertion rates and
      accounts for ~8% of the insertion time when using software steering.

      Allow skipping the refresh of all counters during counter creation.
      Change CT to use this refresh skipping for its counters.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
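A toy model of the creation-time trade-off: creating a counter normally queues an immediate refresh of all counters, and a flag lets bulk creators such as CT skip it. `struct fc_stats`, `fc_create()` and the `skip_refresh` parameter are hypothetical, and incrementing a count stands in for re-queuing the stats work with zero delay.

```c
#include <stdbool.h>

/* Hypothetical counter-subsystem state. */
struct fc_stats {
	int num_counters;
	int refresh_requests; /* times an immediate refresh of ALL counters was queued */
};

/*
 * Create a counter. Unless skip_refresh is set, queue an immediate
 * refresh of every counter (a stand-in for re-queuing the stats work
 * with zero delay). Returns a hypothetical counter id.
 */
static int fc_create(struct fc_stats *s, bool skip_refresh)
{
	s->num_counters++;
	if (!skip_refresh)
		s->refresh_requests++;
	return s->num_counters;
}
```

Inserting 1000 CT rules with `skip_refresh` set queues no extra full refresh, whereas without it every single creation would.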
    • net/mlx5e: IPsec: Refactor checksum code in tx data path · 428ffea0
      Raed Salem authored
      Part of the code that relates solely to IPsec is always compiled into
      the driver, regardless of whether IPsec functionality is enabled or
      disabled, which adds an unnecessary branch to the Tx data path when
      IPsec is disabled.

      Move the IPsec-related code to an IPsec-related file so that, when IPsec
      is disabled, and thanks to the unlikely() macro, the compiler should be
      able to optimize out the IPsec checksum code altogether from the Tx data
      path.
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
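The compile-out effect can be sketched with a config-dependent helper guarded by `unlikely()`. `EXAMPLE_CONFIG_IPSEC`, the `TXF_*` flag bits and both function names are hypothetical; the `unlikely()` definition is the usual GCC/Clang form. When the config symbol is undefined, the helper is a constant `false`, so the compiler can drop the IPsec branch from the caller entirely.

```c
#include <stdbool.h>

/* Branch-prediction hint, in the usual GCC/Clang form. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag bits for this sketch. */
#define TXF_IPSEC        0x1 /* packet is IPsec-offloaded */
#define TXF_CSUM_PARTIAL 0x2 /* stack requested checksum offload */

#ifdef EXAMPLE_CONFIG_IPSEC
/* IPsec built in: the helper performs a real check. */
static bool ipsec_tx_needs_csum(int flags)
{
	return flags & TXF_IPSEC;
}
#else
/* IPsec compiled out: a constant false lets the compiler delete the
 * IPsec branch in the caller altogether. */
static inline bool ipsec_tx_needs_csum(int flags)
{
	(void)flags;
	return false;
}
#endif

/* Returns 2 for IPsec checksum handling, 1 for regular offload, 0 for none. */
static int tx_csum_mode(int flags)
{
	if (unlikely(ipsec_tx_needs_csum(flags)))
		return 2;
	return (flags & TXF_CSUM_PARTIAL) ? 1 : 0;
}
```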
    • net/mlx5: CT: Remove warning of ignore_flow_level support for VFs · ae2ee3be
      Paul Blakey authored
      ignore_flow_level isn't supported for VFs, so it causes
      post_act and CT to warn about it.

      Instead of disabling CT for VFs, which would require a driver update
      to enable CT again once firmware supports this, remove this warning
      specifically for VFs. This way, it could be automatically enabled on
      future firmware where VFs support the ignore_flow_level capability.
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
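A sketch of the idea: the feature stays gated on the capability itself, and only the warning is skipped for VFs, so firmware that later reports the capability enables it without a driver change. The function name, its parameters and the warning counter are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical check: can CT offload rely on ignore_flow_level?
 * The result still depends on the capability; only the warning is
 * skipped for VFs, which lack the capability today.
 */
static bool ct_can_use_ignore_flow_level(bool cap_supported, bool is_vf,
					 int *warn_count)
{
	if (cap_supported)
		return true; /* e.g. future firmware that supports it for VFs */

	if (!is_vf) {
		fprintf(stderr, "ignore_flow_level not supported\n");
		(*warn_count)++;
	}
	return false;
}
```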
    • net/mlx5: Add esw assignment back in mlx5e_tc_sample_unoffload() · 1aec8597
      Nathan Chancellor authored
      Clang warns:
      
      drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:635:34: error: variable 'esw' is uninitialized when used here [-Werror,-Wuninitialized]
              mlx5_eswitch_del_offloaded_rule(esw, sample_flow->pre_rule, sample_flow->pre_attr);
                                              ^~~
      drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c:626:26: note: initialize the variable 'esw' to silence this warning
              struct mlx5_eswitch *esw;
                                      ^
                                       = NULL
      1 error generated.
      
      It appears that the assignment should have been shuffled instead of
      removed outright like in mlx5e_tc_sample_offload(). Add it back so there
      is no use of esw uninitialized.
      
      Fixes: a64c5edb ("net/mlx5: Remove unnecessary checks for slow path flag")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1494
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • iavf: Fix kernel BUG in free_msi_irqs · 605ca7c5
      Przemyslaw Patynowski authored
      Fix the driver not freeing the VF's traffic irqs prior to calling
      pci_disable_msix in iavf_remove.
      There were two possible erroneous states in which iavf_close would
      not be called.
      One erroneous state is fixed by allowing the netdev to register when
      the state is already running. It was possible for the VF adapter to
      enter a state loop from running to resetting, where iavf_open would
      subsequently fail. If the user then unloaded the driver or removed the
      VF PCI device, iavf_close would not be called, as the netdev was not
      registered, leaving the traffic irqs still allocated.
      Fixed this by breaking the loop, allowing the netdev to open the device
      when the adapter state is __IAVF_RUNNING and it is not explicitly downed.
      The other possibility is entering iavf_remove from the __IAVF_RESETTING
      state, where iavf_close would not free irqs, but just return 0.
      Fixed this by checking the last adapter state and then removing irqs.
      
      Kernel panic:
      [ 2773.628585] kernel BUG at drivers/pci/msi.c:375!
      ...
      [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0
      ...
      [ 2773.640939] Call Trace:
      [ 2773.641572]  pci_disable_msix+0xf7/0x120
      [ 2773.642224]  iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf]
      [ 2773.642897]  iavf_remove+0x12e/0x500 [iavf]
      [ 2773.643578]  pci_device_remove+0x3b/0xc0
      [ 2773.644266]  device_release_driver_internal+0x103/0x1f0
      [ 2773.644948]  pci_stop_bus_device+0x69/0x90
      [ 2773.645576]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 2773.646215]  pci_iov_remove_virtfn+0xba/0x120
      [ 2773.646862]  sriov_disable+0x2f/0xe0
      [ 2773.647531]  ice_free_vfs+0x2f8/0x350 [ice]
      [ 2773.648207]  ice_sriov_configure+0x94/0x960 [ice]
      [ 2773.648883]  ? _kstrtoull+0x3b/0x90
      [ 2773.649560]  sriov_numvfs_store+0x10a/0x190
      [ 2773.650249]  kernfs_fop_write+0x116/0x190
      [ 2773.650948]  vfs_write+0xa5/0x1a0
      [ 2773.651651]  ksys_write+0x4f/0xb0
      [ 2773.652358]  do_syscall_64+0x5b/0x1a0
      [ 2773.653075]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 22ead37f ("i40evf: Add longer wait after remove module")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Add helper function to go from pci_dev to adapter · 247aa001
      Karen Sornek authored
      Add a helper function to go directly from a pci_dev to the adapter
      structure and make the netdev assignment, instead of having to go to
      the net_device first and then to the adapter. This keeps the work simple.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Karen Sornek <karen.sornek@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
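The helper collapses a two-step lookup (pci_dev to net_device, then net_device to the private adapter data) into one call. The structures and `*_mock` accessors below are minimal stand-ins assumed for illustration; in the kernel, the equivalent steps go through `pci_get_drvdata()` and `netdev_priv()`, and the real structures look nothing like these.

```c
/* Minimal stand-ins for kernel structures, assumed for illustration. */
struct net_device   { void *priv; };    /* priv -> driver's adapter */
struct pci_dev      { void *drvdata; }; /* drvdata -> net_device */
struct iavf_adapter { int id; };

/* Stand-ins for pci_get_drvdata() and netdev_priv(). */
static void *pci_get_drvdata_mock(struct pci_dev *pdev)
{
	return pdev->drvdata;
}

static void *netdev_priv_mock(const struct net_device *netdev)
{
	return netdev->priv;
}

/* One call instead of repeating pdev -> netdev -> adapter at every site. */
static struct iavf_adapter *pdev_to_adapter(struct pci_dev *pdev)
{
	return netdev_priv_mock(pci_get_drvdata_mock(pdev));
}
```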
    • virtchnl: Use the BIT() macro for capability/offload flags · 4a15022f
      Brett Creeley authored
      Currently raw hex values are used to define specific bits for each
      capability/offload in virtchnl.h. Using raw hex values makes it
      unclear which bits are used/available. Fix this by using the BIT()
      macro so it's immediately obvious which bits are used/available.
      
      Also, move the VIRTCHNL_VF_CAP_ADV_LINK_SPEED define to the correct
      place to line up with the other bit values, and add a comment describing
      its purpose.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
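The difference between the two styles, shown on hypothetical `EX_*` flags rather than the real virtchnl.h definitions. `BIT()` here is a simplified form of the kernel macro.

```c
/* Simplified version of the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Before: raw hex values (hypothetical flags) make it hard to see which
 * bits are taken and which are still free. */
#define EX_OFFLOAD_L2_HEX   0x00000001
#define EX_OFFLOAD_REQQ_HEX 0x00000040
#define EX_CAP_LINK_HEX     0x00000080

/* After: the bit position is visible at a glance. */
#define EX_OFFLOAD_L2   BIT(0)
#define EX_OFFLOAD_REQQ BIT(6)
#define EX_CAP_LINK     BIT(7) /* hypothetical capability bit */
```

Both spellings produce the same values; only the readability of the bit layout changes.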
    • virtchnl: Remove unused VIRTCHNL_VF_OFFLOAD_RSVD define · 5bf84b29
      Brett Creeley authored
      Remove unused define that is currently marked as reserved. This will
      open up space for a new feature if/when it's introduced. Also, there is
      no reason to keep unused defines around.
      Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Hide bus-info in ethtool for PRs in switchdev mode · bfaaba99
      Marcin Szycik authored
      Disable showing bus-info information for port representors in switchdev
      mode. This fixes a bug that caused the lshw tool to display wrong netdev
      descriptions: one port representor displayed the PF branding string, and
      in turn one PF displayed a "generic" description. The bug occurs when
      multiple devices show the same bus-info in ethtool, which was the case in
      switchdev mode (the PF and its port representors displayed the same
      bus-info). The bug occurs only if a port representor netdev appears
      before the PF netdev in /proc/net/dev.
      
      In the examples below:
      ens6fX is PF
      ens6fXvY is VF
      ethX is port representor
      One irrelevant column was removed from output
      
      Before:
      $ sudo lshw -c net -businfo
      Bus info          Device      Description
      =========================================
      pci@0000:02:00.0  eth102       Ethernet Controller E810-XXV for SFP
      pci@0000:02:00.1  ens6f1       Ethernet Controller E810-XXV for SFP
      pci@0000:02:01.0  ens6f0v0     Ethernet Adaptive Virtual Function
      pci@0000:02:01.1  ens6f0v1     Ethernet Adaptive Virtual Function
      pci@0000:02:01.2  ens6f0v2     Ethernet Adaptive Virtual Function
      pci@0000:02:00.0  ens6f0       Ethernet interface
      
      Notice that eth102 and ens6f0 have the same bus-info and their descriptions
      are swapped.
      
      After:
      $ sudo lshw -c net -businfo
      Bus info          Device      Description
      =========================================
      pci@0000:02:00.0  ens6f0      Ethernet Controller E810-XXV for SFP
      pci@0000:02:00.1  ens6f1      Ethernet Controller E810-XXV for SFP
      pci@0000:02:01.0  ens6f0v0    Ethernet Adaptive Virtual Function
      pci@0000:02:01.1  ens6f0v1    Ethernet Adaptive Virtual Function
      pci@0000:02:01.2  ens6f0v2    Ethernet Adaptive Virtual Function
      
      Fixes: 7aae80ce ("ice: add port representor ethtool ops and stats")
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Clear synchronized addrs when adding VFs in switchdev mode · c79bb28e
      Marcin Szycik authored
      When spawning VFs in switchdev mode, the internal filter list of VSIs is
      cleared, which includes MAC rules. However, MAC entries stay on the
      netdev's multicast list, which causes an error message ("Failed to delete
      MAC filters") when bringing the link up after spawning VFs:
      __dev_mc_sync() is called and tries to unsync addresses that were already
      removed internally when adding VFs.
      
      This can be reproduced with:
      1) Load ice driver
      2) Change PF to switchdev mode
      3) Bring PF link up
      4) Bring PF link down
      5) Create a VF on PF
      6) Bring PF link up
      
      Added clearing of the netdev's multicast (and also unicast) list when
      spawning VFs in switchdev mode, so that the internal rule list and the
      netdev's MAC list stay consistent.
      
      Fixes: 1a1c40df ("ice: set and release switchdev environment")
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>