1. 14 Apr, 2021 2 commits
    • Parav Pandit's avatar
      net/mlx5: E-Switch, Skip querying SF enabled bits · 7d5ae478
      Parav Pandit authored
      With vhca events, SF state is queried through the VHCA events. Device no
      longer expects SF bitmap in the query eswitch functions command.
      
      Hence, remove it to simplify the code.
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      7d5ae478
    • Parav Pandit's avatar
      net/mlx5: E-Switch, let user to enable disable metadata · 7bf481d7
      Parav Pandit authored
      Currently each packet inserted in eswitch is tagged with a internal
      metadata to indicate source vport. Metadata tagging is not always
      needed. Metadata insertion is needed for multi-port RoCE, failover
      between representors and stacked devices. In many other cases,
      metadata enablement is not needed.
      
      Metadata insertion slows down the packet processing rate of the E-switch
      when it is in switchdev mode.
      
      Below table show performance gain with metadata disabled for VXLAN
      offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.
      
      ----------------------------------------------
      | steering | metadata | pkt size | rx pps    |
      | mode     |          |          | (million) |
      ----------------------------------------------
      | smfs     | disabled | 128Bytes | 42        |
      ----------------------------------------------
      | smfs     | enabled  | 128Bytes | 36        |
      ----------------------------------------------
      | dmfs     | disabled | 128Bytes | 42        |
      ----------------------------------------------
      | dmfs     | enabled  | 128Bytes | 36        |
      ----------------------------------------------
      
      Hence, allow user to disable metadata using driver specific devlink
      parameter. Metadata setting of the eswitch is applicable only for the
      switchdev mode.
      
      Example to show and disable metadata before changing eswitch mode:
      $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
      pci/0000:06:00.0:
        name esw_port_metadata type driver-specific
          values:
            cmode runtime value true
      
      $ devlink dev param set pci/0000:06:00.0 \
      	  name esw_port_metadata value false cmode runtime
      
      $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarVu Pham <vuhuong@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      ---
      changelog:
      v1->v2:
       - added performance numbers in commit log
       - updated commit log and documentation for switchdev mode
       - added explicit note on when user can disable metadata in
         documentation
      7bf481d7
  2. 12 Apr, 2021 23 commits
  3. 11 Apr, 2021 15 commits
    • David S. Miller's avatar
      Merge branch 'ipa-next' · 5b489fea
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: support two more platforms
      
      This series adds IPA support for two more Qualcomm SoCs.
      
      The first patch updates the DT binding to add compatible strings.
      
      The second temporarily disables checksum offload support for IPA
      version 4.5 and above.  Changes are required to the RMNet driver
      to support the "inline" checksum offload used for IPA v4.5+, and
      once those are present this capability will be enabled for IPA.
      
      The third and fourth patches add configuration data for IPA versions
      4.5 (used for the SDX55 SoC) and 4.11 (used for the SD7280 SoC).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b489fea
    • Alex Elder's avatar
      net: ipa: add IPA v4.11 configuration data · 927c5043
      Alex Elder authored
      Add support for the SC7280 SoC, which includes IPA version 4.11.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      927c5043
    • Alex Elder's avatar
      net: ipa: add IPA v4.5 configuration data · fbb763e7
      Alex Elder authored
      Add support for the SDX55 SoC, which includes IPA version 4.5.
      
      Starting with IPA v4.5, a few of the memory regions have a different
      number of "canary" values; update comments in the where the region
      identifers are defined to accurately reflect that.
      
      I'll note three differences in SDX55 versus the other two existing
      platforms (SDM845 and SC7180):
        - SDX55 uses a 32-bit Linux kernel
        - SDX55 has four interconnects rather than three
        - SDX55 uses IPA v4.5, which uses inline checksum offload
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fbb763e7
    • Alex Elder's avatar
      net: ipa: disable checksum offload for IPA v4.5+ · c88c34fc
      Alex Elder authored
      Checksum offload for IPA v4.5+ is implemented differently, using
      "inline" offload (which uses a common header format for both upload
      and download offload).
      
      The IPA hardware must be programmed to enable MAP checksum offload,
      but the RMNet driver is responsible for interpreting checksum
      metadata supplied with messages.
      
      Currently, the RMNet driver does not support inline checksum offload.
      This support is imminent, but until it is available, do not allow
      newer versions of IPA to specify checksum offload for endpoints.
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c88c34fc
    • Alex Elder's avatar
      dt-bindings: net: qcom,ipa: add some compatible strings · c3264fee
      Alex Elder authored
      Add existing supported platform "qcom,sc7180-ipa" to the set of IPA
      compatible strings.  Also add newly-supported "qcom,sdx55-ipa",
      "qcom,sc7280-ipa".
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3264fee
    • Qiheng Lin's avatar
      ehea: add missing MODULE_DEVICE_TABLE · 95291ced
      Qiheng Lin authored
      This patch adds missing MODULE_DEVICE_TABLE definition which generates
      correct modalias for automatic loading of this driver when it is built
      as an external module.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarQiheng Lin <linqiheng@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95291ced
    • David S. Miller's avatar
      Merge branch 'veth-gro' · 23cfa4d4
      David S. Miller authored
      Paolo Abeni  says:
      
      ====================
      veth: allow GRO even without XDP
      
      This series allows the user-space to enable GRO/NAPI on a veth
      device even without attaching an XDP program.
      
      It does not change the default veth behavior (no NAPI, no GRO),
      except that the GRO feature bit on top of this series will be
      effectively off by default on veth devices. Note that currently
      the GRO bit is on by default, but GRO never takes place in
      absence of XDP.
      
      On top of this series, setting the GRO feature bit enables NAPI
      and allows the GRO to take place. The TSO features on the peer
      device are preserved.
      
      The main goal is improving UDP forwarding performances for
      containers in a typical virtual network setup:
      
      (container) veth -> veth peer -> bridge/ovs -> vxlan -> NIC
      
      Enabling the NAPI threaded mode, GRO the NETIF_F_GRO_UDP_FWD
      feature on the veth peer improves the UDP stream performance
      with not void netfilter configuration by 2x factor with no
      measurable overhead for TCP traffic: some heuristic ensures
      that TCP will not go through the additional NAPI/GRO layer.
      
      Some self-tests are added to check the expected behavior in
      the default configuration, with XDP and with plain GRO enabled.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      23cfa4d4
    • Paolo Abeni's avatar
      self-tests: add veth tests · 1c3cadbe
      Paolo Abeni authored
      Add some basic veth tests, that verify the expected flags and
      aggregation with different setups (default, xdp, etc...)
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1c3cadbe
    • Paolo Abeni's avatar
      veth: refine napi usage · 47e550e0
      Paolo Abeni authored
      After the previous patch, when enabling GRO, locally generated
      TCP traffic experiences some measurable overhead, as it traverses
      the GRO engine without any chance of aggregation.
      
      This change refine the NAPI receive path admission test, to avoid
      unnecessary GRO overhead in most scenarios, when GRO is enabled
      on a veth peer.
      
      Only skbs that are eligible for aggregation enter the GRO layer,
      the others will go through the traditional receive path.
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47e550e0
    • Paolo Abeni's avatar
      veth: allow enabling NAPI even without XDP · d3256efd
      Paolo Abeni authored
      Currently the veth device has the GRO feature bit set, even if
      no GRO aggregation is possible with the default configuration,
      as the veth device does not hook into the GRO engine.
      
      Flipping the GRO feature bit from user-space is a no-op, unless
      XDP is enabled. In such scenario GRO could actually take place, but
      TSO is forced to off on the peer device.
      
      This change allow user-space to really control the GRO feature, with
      no need for an XDP program.
      
      The GRO feature bit is now cleared by default - so that there are no
      user-visible behavior changes with the default configuration.
      
      When the GRO bit is set, the per-queue NAPI instances are initialized
      and registered. On xmit, when napi instances are available, we try
      to use them.
      
      Some additional checks are in place to ensure we initialize/delete NAPIs
      only when needed in case of overlapping XDP and GRO configuration
      changes.
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d3256efd
    • Paolo Abeni's avatar
      veth: use skb_orphan_partial instead of skb_orphan · c75fb320
      Paolo Abeni authored
      As described by commit 9c4c3252 ("skbuff: preserve sock
      reference when scrubbing the skb."), orphaning a skb
      in the TX path will cause OoO.
      
      Let's use skb_orphan_partial() instead of skb_orphan(), so
      that we keep the sk around for queue's selection sake and we
      still avoid the problem fixed with commit 4bf9ffa0 ("veth:
      Orphan skb before GRO")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c75fb320
    • David S. Miller's avatar
      Merge branch 'ethtool-eeprom' · 7dc85b59
      David S. Miller authored
      Moshe Shemesh says:
      
      ====================
      ethtool: Extend module EEPROM dump API
      
      Ethtool supports module EEPROM dumps via the `ethtool -m <dev>` command.
      But in current state its functionality is limited - offset and length
      parameters, which are used to specify a linear desired region of EEPROM
      data to dump, is not enough, considering emergence of complex module
      EEPROM layouts such as CMIS 4.0.
      Moreover, CMIS 4.0 extends the amount of pages that may be accessible by
      introducing another parameter for page addressing - banks.
      
      Besides, currently module EEPROM is represented as a chunk of
      concatenated pages, where lower 128 bytes of all pages, except page 00h,
      are omitted. Offset and length are used to address parts of this fake
      linear memory. But in practice drivers, which implement
      get_module_info() and get_module_eeprom() ethtool ops still calculate
      page number and set I2C address on their own.
      
      This series tackles these issues by adding ethtool op, which allows to
      pass page number, bank number and I2C address in addition to offset and
      length parameters to the driver, adds corresponding netlink
      infrastructure and implements the new interface in mlx5 driver.
      
      This allows to extend userspace 'ethtool -m' CLI by adding new
      parameters - page, bank and i2c. New command line format:
       ethtool -m <dev> [hex on|off] [raw on|off] [offset N] [length N] [page N] [bank N] [i2c N]
      
      The consequence of this series is a possibility to dump arbitrary EEPROM
      page at a time, in contrast to dumps of concatenated pages. Therefore,
      offset and length change their semantics and may be used only to specify
      a part of data within half page boundary, which size is currently limited
      to 128 bytes.
      
      As for drivers that support legacy get_module_info() and
      get_module_eeprom() pair, the series addresses it by implementing a
      fallback mechanism. As mentioned earlier, such drivers derive a page
      number from 'global' offset, so this can be done vice versa without
      their involvement thanks to standardization. If kernel netlink handler
      of 'ethtool -m' command detects that new ethtool op is not supported by
      the driver, it calculates offset from given page number and page offset
      and calls old ndos, if they are available.
      ====================
      
      \Signed-off-by: David S. Miller <davem@davemloft.net>
      7dc85b59
    • Andrew Lunn's avatar
      ethtool: wire in generic SFP module access · c97a31f6
      Andrew Lunn authored
      If the device has a sfp bus attached, call its
      sfp_get_module_eeprom_by_page() function, otherwise use the ethtool op
      for the device. This follows how the IOCTL works.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c97a31f6
    • Andrew Lunn's avatar
      phy: sfp: add netlink SFP support to generic SFP code · d740513f
      Andrew Lunn authored
      The new netlink API for reading SFP data requires a new op to be
      implemented. The idea of the new netlink SFP code is that userspace is
      responsible to parsing the EEPROM data and requesting pages, rather
      than have the kernel decide what pages are interesting and returning
      them. This allows greater flexibility for newer formats.
      
      Currently the generic SFP code only supports simple SFPs. Allow i2c
      address 0x50 and 0x51 to be accessed with page and bank must always be
      0. This interface will later be extended when for example QSFP support
      is added.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d740513f
    • Vladyslav Tarasiuk's avatar
      ethtool: Add fallback to get_module_eeprom from netlink command · 96d971e3
      Vladyslav Tarasiuk authored
      In case netlink get_module_eeprom_by_page() callback is not implemented
      by the driver, try to call old get_module_info() and get_module_eeprom()
      pair. Recalculate parameters to get_module_eeprom() offset and len using
      page number and their sizes. Return error if this can't be done.
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@nvidia.com>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96d971e3