1. 20 Aug, 2021 19 commits
    • Merge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f96b48c6
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-08-19
      
      This series introduces support for two new mlx5 features:
      
      1) Sample offload for tunneled traffic
      2) devlink rate objects support
      
      1) From Chris Mi: Sample offload for tunneled traffic
      =====================================================
      
      Background and solution
      -----------------------
      
      Currently the sample offload actions send the encapsulated packet
      to software. This series decapsulates the packet before performing
      the sampling and sets the tunnel properties on the skb metadata
      fields to make the behavior consistent with OVS sFlow.
      
      If the packet is decapsulated first, the original match can no
      longer be used in the default table. So instantiate a post action
      instance to continue processing the action list. If HW can preserve
      reg_c, also use the post action instance.
      
      Post action infrastructure
      --------------------------
      
      Some tc actions are modeled in hardware using multiple tables
      causing a tc action list split. For example, CT action is modeled
      by jumping to a ct table which is controlled by nf flow table.
      sFlow jumps in hardware to a sample table, which continues to a
      "default table" where it should continue processing the action list.
      
      Multi table actions are modeled in hardware using a unique fte_id.
      The fte_id is set before jumping to a table. Split actions continue
      to a post-action table where the matched fte_id value continues the
      execution of the tc action list.
      
      This series also introduces post action infrastructure. Both ct and
      sample use it.
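
The split-and-resume idea can be illustrated with a toy Python model. This is a minimal sketch, not the driver's implementation: the class `PostActionTable`, the helper `offload`, and the action names are hypothetical, and a plain dictionary stands in for the hardware post-action table.

```python
from itertools import count

# Actions assumed (for illustration) to be modeled with multiple tables.
MULTI_TABLE_ACTIONS = {"ct", "sample"}


class PostActionTable:
    """Toy stand-in for the post-action table: fte_id -> remaining actions."""

    def __init__(self):
        self._ids = count(1)   # hypothetical allocator of unique fte_ids
        self._entries = {}

    def add(self, remaining_actions):
        fte_id = next(self._ids)
        self._entries[fte_id] = list(remaining_actions)
        return fte_id

    def get(self, fte_id):
        return self._entries[fte_id]

    def delete(self, fte_id):
        del self._entries[fte_id]


def offload(actions, table):
    """Split the list after the first multi-table action.

    Returns (head, fte_id): head runs in the original table, and fte_id
    selects the rest of the list in the post-action table (None if the
    list needs no split).
    """
    for i, action in enumerate(actions):
        if action in MULTI_TABLE_ACTIONS:
            return actions[: i + 1], table.add(actions[i + 1:])
    return actions, None


table = PostActionTable()
head, fte_id = offload(["sample", "tunnel_key unset", "mirred"], table)
print(head, fte_id, table.get(fte_id))
```

Here the sample action ends the first part of the list, and the id returned by `add` is what a later lookup would match on to resume with the remaining actions.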
      
      Sample for tunnel in TC SW
      --------------------------
      
      tc filter add dev vxlan1 protocol ip parent ffff: prio 3		\
      	flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02	\
      	enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13			\
      	enc_dst_port 4789 enc_key_id 4					\
      	action sample rate 1 group 6					\
      	action tunnel_key unset						\
      	action mirred egress redirect dev enp4s0f0_1
      
      MLX5 sample HW offload
      ----------------------
      
      For the following typical flow table:
      
      +-------------------------------+
      +       original flow table     +
      +-------------------------------+
      +         original match        +
      +-------------------------------+
      + sample action + other actions +
      +-------------------------------+
      
      We translate the tc filter with sample action to the following HW model:
      
              +---------------------+
              + original flow table +
              +---------------------+
              +   original match    +
              +---------------------+
                    | set fte_id (if reg_c preserve cap)
                    | do decap
                    v
      +------------------------------------------------+
      +                Flow Sampler Object             +
      +------------------------------------------------+
      +                    sample ratio                +
      +------------------------------------------------+
      +    sample table id    |    default table id    +
      +------------------------------------------------+
                 |                            |
                 v                            v
      +-----------------------------+  +-------------------+
      +        sample table         +  +   default table   +
      +-----------------------------+  +-------------------+
      + forward to management vport +             |
      +-----------------------------+             |
                                          +-------+------+
                                          |              |reg_c preserve cap
                                          |              |or decap action
                                          v              v
                             +-----------------+   +-------------+
                             + per vport table +   + post action +
                             +-----------------+   +-------------+
                             + original match  +
                             +-----------------+
                             + other actions   +
                             +-----------------+
      
      2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
      =======================================================================
      
      HIGH-LEVEL OVERVIEW
      
      Devlink leaf rate objects are created per vport (VF/SF, and PF on
      BlueField) in switchdev mode on devlink port registration.
      Implement devlink ops callbacks to create/destroy rate groups, set TX
      rate values of the vport/group, and assign a vport to a group.
      The driver accepts TX rate values in units of 1 Mbps.
      
      Refactor existing eswitch QoS infrastructure to be accessible by legacy
      NDO rate API and new devlink rate API. NDO rate API is not
      removed/disabled in switchdev mode to not break existing users. Rate
      values configured with NDO rate API are not visible to the devlink
      infrastructure, therefore the two APIs should not be used simultaneously.
      
      IMPLEMENTATION DETAILS
      
      The driver provides a two-level rate hierarchy to manage bandwidth:
      group level and vport level. Initially each vport is added to an
      internal unlimited group created by default. Each rate element (vport
      or group) receives bandwidth relative to its parent element (for
      groups the parent is the physical link itself) in a round-robin
      manner, where each element gets bandwidth according to its weight.
      Example:
      
      Created four rate groups with tx_share limits:
      
      $ devlink port function rate add \
          pci/0000:06:00.0/group_1 tx_share 30gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_2 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_3 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_4 tx_share 10gbit
      
      Weights created in HW for each group are relative to the biggest tx_share
      value, which is 30gbit:
      
      <group_1> 1.0
      <group_2> 0.67
      <group_3> 0.67
      <group_4> 0.33
      
      Assuming link speed is 50 Gbit/sec and each group can sustain that
      amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
       = ~18.75 Gbit/sec. Normalized bandwidth values for groups:
      
      <group_1> 18.75 * 1.0  = 18.75 Gbit/sec
      <group_2> 18.75 * 0.67 = 12.5 Gbit/sec
      <group_3> 18.75 * 0.67 = 12.5 Gbit/sec
      <group_4> 18.75 * 0.33 = 6.25 Gbit/sec
      
      If, in the example above, group_1 doesn't produce any traffic, then
      maximum bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec.
      Normalized values:
      
      <group_2> 30.0 * 0.67 = 20.0 Gbit/sec
      <group_3> 30.0 * 0.67 = 20.0 Gbit/sec
      <group_4> 30.0 * 0.33 = 10.0 Gbit/sec
      
      The same normalization is applied to each vport in the group.
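
The arithmetic of this example can be reproduced with a short Python sketch. The function name `normalize_groups` and its arguments are illustrative assumptions, not part of the driver. Exact fractions are used instead of the rounded weights 0.67 and 0.33, which is why the results land exactly on the quoted figures.

```python
from fractions import Fraction


def normalize_groups(tx_share, link_speed, active=None):
    """Return {group: bandwidth in Gbit/sec} for the active groups.

    tx_share   -- {group: configured tx_share in Gbit/sec}
    link_speed -- link speed in Gbit/sec
    active     -- groups currently producing traffic (default: all)
    """
    if active is None:
        active = list(tx_share)
    biggest = max(tx_share.values())
    # Weight of each group relative to the biggest tx_share value.
    weights = {g: Fraction(tx_share[g], biggest) for g in active}
    # Bandwidth that a weight of 1.0 corresponds to.
    unit = Fraction(link_speed) / sum(weights.values())
    return {g: float(unit * w) for g, w in weights.items()}


shares = {"group_1": 30, "group_2": 20, "group_3": 20, "group_4": 10}
all_active = normalize_groups(shares, 50)
without_group_1 = normalize_groups(shares, 50, ["group_2", "group_3", "group_4"])
print(all_active)
print(without_group_1)
```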
      
      Normalized values are internal, therefore the driver provides QoS
      tracepoints for the following events:
      
      * vport rate element creation/deletion;
      * vport rate element configuration;
      * group rate element creation/deletion;
      * group rate element configuration.
      
      PATCHES OVERVIEW
      
      1 - Moving and isolation of eswitch QoS logic in separate file;
      
      2 - Implement devlink leaf rate object support for vports;
      
      3 - Implement rate groups creation/deletion;
      
      4 - Implement TX rate management for the groups;
      
      5 - Implement parent set for vports;
      
      6 - Eswitch QoS tracepoints.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-net-next-2021-08-19' of... · e61fbee7
      David S. Miller authored
      Merge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth-next pull request for net-next:
      
       - Add support for Foxconn Mediatek Chip
       - Add support for LG LGSBWAC92/TWCM-K505D
       - hci_h5 flow control fixes and suspend support
       - Switch to use lock_sock for SCO and RFCOMM
       - Various fixes for extended advertising
       - Reword Intel's setup on btusb unifying the supported generations
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge · 815cc21d
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      This cleanup patchset includes the following patches:
      
       - bump version strings, by Simon Wunderlich
      
       - update docs about moving the IRC channel away from freenode,
         by Sven Eckelmann
      
       - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
      
       - Update NULL checks, by Sven Eckelmann (2 patches)
      
       - remove remaining skb-copy calls for broadcast packets,
         by Linus Lüssing
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: E-switch, Add QoS tracepoints · 3202ea65
      Dmytro Linkin authored
      Add tracepoints to log QoS enabling/disabling/configuration for vports
      and rate groups.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Allow to add vports to rate groups · 0fe132ea
      Dmytro Linkin authored
      Implement eswitch API that allows updating rate groups. If group
      pointer is NULL, then move the vport to internal unlimited group zero.
      
      Implement devlink_ops->rate_parent_node_set() callback in terms of
      the new eswitch group update API.
      
      Enable QoS for all group's elements if a group has allocated BW share.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups · f47e04eb
      Dmytro Linkin authored
      Provide eswitch API to allow controlling group rate limits. Use it to
      implement the devlink_ops->rate_node_tx_{share|max}_set() callbacks.
      
      The share rate will create relative bandwidth share on the groups level
      while within the group the user can set shared rate on the member vports
      of that group and this rate will be relative to the group's share rate.
      The group with the highest shared rate will get a BW share of 100 and
      the rest of the groups will get a value that reflects the ratio between
      their share rate and the maximum share rate.
      
      Example:
      Created four rate groups with tx_share limits:
      
      $ devlink port function rate add \
          pci/0000:06:00.0/group_1 tx_share 30gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_2 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_3 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_4 tx_share 10gbit
      
      Assuming link speed is 50 Gbit/sec, the ratio divider will be
      50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:
      
      <group_1> 30 * 0.625 = 18.75 Gbit/sec
      <group_2> 20 * 0.625 = 12.5 Gbit/sec
      <group_3> 20 * 0.625 = 12.5 Gbit/sec
      <group_4> 10 * 0.625 = 6.25 Gbit/sec
      
      A rate group with unlimited tx_share rate will receive the minimum BW
      value (1 Mbit/sec) if any group with a tx_share rate limit is present.
      This prevents all packets from being dropped in case of heavy traffic.
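
The relative BW share described in this commit message (100 for the group with the highest share rate, a proportional value for the rest) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the driver's code: the function name `group_bw_shares`, the constant names, the floor of 1 for unlimited groups, and the handling of the no-limit case are all hypothetical.

```python
MAX_BW_SHARE = 100  # share given to the group with the highest tx_share
MIN_BW_SHARE = 1    # assumed floor for groups without a tx_share limit


def group_bw_shares(tx_share_mbps):
    """Map {group: tx_share in Mbit/sec, 0 = unlimited} to BW shares."""
    highest = max(tx_share_mbps.values())
    if highest == 0:
        # No group has a limit, so there is nothing to normalize against;
        # the sketch simply reports 0 for every group (an assumption).
        return {group: 0 for group in tx_share_mbps}
    shares = {}
    for group, rate in tx_share_mbps.items():
        # Integer ceiling division: ratio to the highest rate, scaled to 100.
        share = (rate * MAX_BW_SHARE + highest - 1) // highest
        shares[group] = max(share, MIN_BW_SHARE)
    return shares


bw_shares = group_bw_shares({
    "group_1": 30000,
    "group_2": 20000,
    "group_3": 20000,
    "group_4": 10000,
    "group_5": 0,      # hypothetical extra group with unlimited tx_share
})
print(bw_shares)
```
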
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Introduce rate limiting groups API · 1ae258f8
      Dmytro Linkin authored
      Extend eswitch API with rate limiting groups:
      
      - Define new struct mlx5_esw_rate_group that is used to hold all
        internal group data.
      
      - Implement functions that allow creation, destruction and cleanup of
        groups.
      
      - Assign all vports to internal unlimited zero group by default.
      
      This commit lays the groundwork for group rate limiting by implementing
      devlink_ops->rate_node_{new|del}() callbacks to support creating and
      deleting groups through devlink rate node objects. APIs that allow
      setting rates and adding/removing members are implemented in the
      following patches.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control · ad34f02f
      Dmytro Linkin authored
      Register devlink rate leaf object for every eswitch vport.
      Implement devlink ops that enable setting shared and max tx rates
      through devlink API.
      Extract common eswitch code from the existing tx rate set function that
      is accessed through NDO so it can be reused for devlink. Values
      configured with the NDO API are not visible to the devlink API,
      therefore the two APIs shouldn't be used simultaneously.
      
      When normalizing the BW share value, dividing the desired minimum rate
      by the common divider results in losing information since the quotient
      is rounded down. This has a significant effect on low-rate
      configurations, where rounding down eliminates a large percentage of
      the total rate. To improve the formula, round up the division result to
      make sure that the BW share is at least the value it was supposed to be
      and won't lose a significant amount of the expected value.
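
The effect of the rounding direction can be seen in a small Python sketch. The helper names and the numbers are hypothetical and chosen only to make the loss visible; the driver's actual divider and units are not reproduced here.

```python
def share_round_down(min_rate, divider):
    """Integer quotient, truncated (the behavior being replaced)."""
    return min_rate // divider


def share_round_up(min_rate, divider):
    """Smallest integer share whose value still covers min_rate."""
    return (min_rate + divider - 1) // divider


divider = 3   # hypothetical common divider
min_rate = 5  # hypothetical low minimum rate, in the same unit

down = share_round_down(min_rate, divider)
up = share_round_up(min_rate, divider)
# Rate actually represented by each share value.
print(down * divider, up * divider)
```

With these numbers the truncated share represents a rate of 3 where 5 was requested, losing 40% of it, while the rounded-up share represents 6 and so never falls below the request.
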
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Move QoS related code to dedicated file · 2d116e3e
      Dmytro Linkin authored
      Move eswitch QoS related code into a dedicated file. Provide an
      eswitch API to access this code, so that it is isolated and restricted
      to be used only by eswitch.c. The exception is the legacy NDO VF set
      rate, which is moved to esw/legacy.c.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Support sample offload action for tunneled traffic · 2741f223
      Chris Mi authored
      Currently the sample offload actions send the encapsulated packet
      to software. This commit decapsulates the packet before performing
      the sampling and sets the tunnel properties on the skb metadata
      fields to make the behavior consistent with OVS sFlow.
      
      If the packet is decapsulated first, the original match can no
      longer be used in the default table. So instantiate a post action
      instance to continue processing the action list. If HW can preserve
      reg_c, also use the post action instance.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Restore tunnel info for sample offload · ee950e5d
      Chris Mi authored
      Currently the sample offload actions send the encapsulated packet
      to software. sFlow expects tunneled packets to be decapsulated while
      having the tunnel properties on the skb metadata fields.
      
      Reuse the functions used by connection tracking to map the outer
      header properties to a unique id. The next patch will use that id
      to restore the tunnel information of decapsulated packets onto the
      skb.
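
The mapping idea can be sketched in Python as a two-way table. The class `TunnelMapping` and its methods are hypothetical names used only for illustration, the tuple fields mirror the enc_* keys of a tc flower match, and the addresses are example values. The driver's actual mapping code is not shown here.

```python
class TunnelMapping:
    """Toy two-way map between outer header properties and a unique id."""

    def __init__(self):
        self._id_by_key = {}
        self._key_by_id = {}

    def get_id(self, src_ip, dst_ip, dst_port, key_id):
        """Return the id for these properties, allocating one if needed."""
        key = (src_ip, dst_ip, dst_port, key_id)
        if key not in self._id_by_key:
            new_id = len(self._id_by_key) + 1  # entries are never removed
            self._id_by_key[key] = new_id
            self._key_by_id[new_id] = key
        return self._id_by_key[key]

    def restore(self, tunnel_id):
        """Recover the outer header properties from the id."""
        return self._key_by_id[tunnel_id]


mapping = TunnelMapping()
first = mapping.get_id("192.168.1.14", "192.168.1.13", 4789, 4)
again = mapping.get_id("192.168.1.14", "192.168.1.13", 4789, 4)
other = mapping.get_id("192.168.1.14", "192.168.1.13", 4789, 5)
print(first, again, other, mapping.restore(first))
```
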
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel · d12e20ac
      Chris Mi authored
      CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
      restoring chain ids. SKB extension is not required for tunnel
      restoration.
      
      Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
      using the tunnel restore methods for sample offload use cases.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Refactor ct to use post action infrastructure · f0da4daa
      Chris Mi authored
      Move post action table management to a common library providing an
      add/del/get API. Refactor the ct action offload to use the common
      API.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Introduce post action infrastructure · 6f0b692a
      Chris Mi authored
      Some tc actions are modeled in hardware using multiple tables
      causing a tc action list split. For example, CT action is modeled
      by jumping to a ct table which is controlled by nf flowtable.
      sFlow jumps in hardware to a sample table, which continues to a
      "default table" where it should continue processing the action list.
      
      Multi table actions are modeled in hardware using a unique fte_id.
      The fte_id is set before jumping to a table. Split actions continue
      to a post-action table where the matched fte_id value continues the
      execution of the tc action list.
      
      Currently the post-action design is implemented only by the ct
      action. Introduce post action infrastructure as a pre-step for
      reusing it with the sFlow offload feature. Init and destroy the
      common post action table. Refactor the ct offload to use the
      common post table infrastructure in the next patch.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: CT, Use xarray to manage fte ids · 27997978
      Chris Mi authored
      IDR is deprecated. Use xarray instead.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Move sample attribute to flow attribute · bcd6740c
      Chris Mi authored
      The sample attribute is currently part of the eswitch attribute. Move
      it to the flow attribute to reflect the change in the previous patch.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Move esw/sample to en/tc/sample · 0027d70c
      Chris Mi authored
      The sample module belongs to en/tc rather than esw. Move it and
      rename accordingly.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Remove mlx5e dependency from E-Switch sample · 5024fa95
      Saeed Mahameed authored
      mlx5/esw/sample.c doesn't really need the mlx5e_priv object. Remove
      this redundant dependency by passing the eswitch object directly to
      the sample object constructor.
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f444fea7
      Jakub Kicinski authored
      drivers/ptp/Kconfig:
        55c8fca1 ("ptp_pch: Restore dependency on PCI")
        e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 19 Aug, 2021 21 commits