1. 18 Jun, 2024 6 commits
  2. 17 Jun, 2024 8 commits
    • Merge branch 'net-smc-IPPROTO_SMC' · 4314175a
      David S. Miller authored
      D. Wythe says:
      
      ====================
      Introduce IPPROTO_SMC
      
      This patch allows creating SMC sockets via AF_INET,
      as in the following code:
      
      /* create v4 smc sock */
      v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
      
      /* create v6 smc sock */
      v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
      
      There are several reasons why we believe it is appropriate here:
      
      1. SMC sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6)
      addresses. There is no AF_SMC address at all.
      
      2. Creating SMC sockets in the AF_INET(6) path allows us to reuse the
      infrastructure of that path, such as the common eBPF hooks. Otherwise,
      SMC would have to implement it again in the AF_SMC path. For example:
        1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall
           initiated by the user, without the use of LD_PRELOAD.
        2. Decide whether an immediate fallback is required, based on the
           peer's port/IP, before connect().
      
      A very significant result is that we can now use eBPF to implement smc_run
      instead of LD_PRELOAD, which is completely ineffective for statically
      linked binaries.
      
      Another potential value is that we are attempting to optimize the
      performance of fallback socks, where merging socks is an important part,
      and it relies on the creation of SMC sockets under the AF_INET path.
      (More information :
      https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/)
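
      As an illustration of the socket() calls above, here is a minimal
      userspace sketch. The helper name smc_or_tcp_socket() is hypothetical and
      not part of the series; the fallback to IPPROTO_TCP on any failure is an
      assumption made for the example, since the errno returned for an
      unsupported protocol differs between kernel versions. The value 256 for
      IPPROTO_SMC is taken from the v7 changelog below and is only defined here
      when the system headers do not provide it.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Older userspace headers may not define IPPROTO_SMC; 256 is the value
 * stated in the changelog of this series. */
#ifndef IPPROTO_SMC
#define IPPROTO_SMC 256
#endif

/*
 * Hypothetical helper: open a stream socket for `family` (AF_INET or
 * AF_INET6), preferring SMC and falling back to plain TCP when the
 * kernel rejects IPPROTO_SMC.
 *
 * Returns the file descriptor and stores the protocol actually used in
 * *proto. Returns -1 (errno set by the last socket() call) on failure.
 */
int smc_or_tcp_socket(int family, int *proto)
{
	int fd = socket(family, SOCK_STREAM, IPPROTO_SMC);

	if (fd >= 0) {
		*proto = IPPROTO_SMC;
		return fd;
	}

	/* Assumption for this sketch: any failure triggers the fallback,
	 * because the exact errno depends on the running kernel. */
	fd = socket(family, SOCK_STREAM, IPPROTO_TCP);
	if (fd >= 0)
		*proto = IPPROTO_TCP;
	return fd;
}
```

      A caller would pass AF_INET or AF_INET6, check the returned descriptor,
      and close() it when done; the stored protocol tells it whether SMC was
      actually obtained.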
      
      v2 -> v1:
      
      - Code formatting, mainly including alignment and annotation repair.
      - move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky.
      - Fix the issue where refactoring affects the initialization order.
      - Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set.
      
      v3 -> v2:
      
      - Add Alibaba's copyright information to the new file
      
      v4 -> v3:
      
      - Fix some spelling errors
      - Align function naming style by renaming smc_sock_init() to smc_sk_init()
      - Reverse the order of the conditional checks on clcsock to make the code more intuitive
      
      v5 -> v4:
      
      - Fix some spelling errors
      - Added comment, "/* CONFIG_IPV6 */", after the final #endif directive.
      - Rename inet_smc.h and inet_smc.c to smc_inet.h and smc_inet.c
      - Encapsulate the initialization and destruction of inet_smc in inet_smc.c,
        rather than implementing it directly in af_smc.c.
      - Remove useless header files in smc_inet.h
      - Make smc_inet_prot_xxx and smc_inet_sock_init() static, since they are
        only used in smc_inet.c
      
      v6 -> v5:
      
      - Wrapping lines to not exceed 80 characters
      - Combine initialization and error handling of smc_inet6 into the same #if
        macro block.
      
      v7 -> v6:
      
      - Modify the value of IPPROTO_SMC to 256 so that it does not affect IPPROTO_MAX
      
      v8 -> v7:
      
      - Remove useless declarations.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Introduce IPPROTO_SMC · d25a92cc
      D. Wythe authored
      This patch allows creating SMC sockets via AF_INET,
      as in the following code:
      
      /* create v4 smc sock */
      v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
      
      /* create v6 smc sock */
      v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
      
      There are several reasons why we believe it is appropriate here:
      
      1. SMC sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6)
      addresses. There is no AF_SMC address at all.
      
      2. Creating SMC sockets in the AF_INET(6) path allows us to reuse the
      infrastructure of that path, such as the common eBPF hooks. Otherwise,
      SMC would have to implement it again in the AF_SMC path.
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: expose smc proto operations · 13543d02
      D. Wythe authored
      Externalize smc proto operations (smc_xxx) to allow
      access from files other than af_smc.c
      
      This is in preparation for the subsequent implementation
      of the AF_INET version of SMC.
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: refactoring initialization of smc sock · d0e35656
      D. Wythe authored
      This patch aims to isolate the shared components of SMC socket
      allocation by introducing smc_sk_init() for sock initialization
      and __smc_create_clcsk() for the initialization of clcsock.
      
      This is in preparation for the subsequent implementation of the
      AF_INET version of SMC.
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make for_each_netdev_dump() a little more bug-proof · f22b4b55
      Jakub Kicinski authored
      I find the behavior of xa_for_each_start() slightly counter-intuitive.
      It doesn't end the iteration by making the index point after the last
      element. IOW calling xa_for_each_start() again after it "finished"
      will run the body of the loop for the last valid element, instead
      of doing nothing.
      
      This works fine for netlink dumps if they terminate correctly
      (i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
      reminded legacy dumps are unlikely to go away.
      
      Fixing this generically at the xa_for_each_start() level seems hard -
      there is no index reserved for "end of iteration".
      ifindexes are 31b wide, tho, and iterator is ulong so for
      for_each_netdev_dump() it's safe to go to the next element.
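
      The behavior described above can be reproduced with a small userspace
      model. This is only a sketch: the sorted array stands in for the xarray,
      and find_from(), walk_sticky() and walk_advancing() are hypothetical
      names, not kernel APIs. walk_sticky() leaves the index on the last
      visited element, as the message describes for xa_for_each_start();
      walk_advancing() steps the index past each visited element, which is the
      idea applied to for_each_netdev_dump().

```c
#include <stddef.h>

/* Hypothetical stand-in for an xarray: sorted indices that are present. */
static const unsigned long present[] = { 1, 2, 7 };
#define N_PRESENT (sizeof(present) / sizeof(present[0]))

/* Find the first present entry at or after *index.
 * Returns 1 and updates *index when found, 0 (index untouched) otherwise. */
int find_from(unsigned long *index)
{
	for (size_t i = 0; i < N_PRESENT; i++) {
		if (present[i] >= *index) {
			*index = present[i];
			return 1;
		}
	}
	return 0;
}

/* The index stays on the last visited element when the walk ends, so a
 * second call starting from that index visits the element again. */
unsigned int walk_sticky(unsigned long *index)
{
	unsigned int visited = 0;
	unsigned long next = *index;

	while (find_from(&next)) {
		*index = next;      /* remember the element just visited */
		visited++;
		next = *index + 1;  /* look after it; *index is not moved */
	}
	return visited;
}

/* The index is advanced past each visited element, so a second call
 * after the walk has finished visits nothing. */
unsigned int walk_advancing(unsigned long *index)
{
	unsigned int visited = 0;

	for (; find_from(index); (*index)++)
		visited++;
	return visited;
}
```

      Incrementing is safe in the real macro for the reason given above:
      ifindexes fit in 31 bits while the iterator is an unsigned long, so the
      increment cannot wrap.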
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5-genl-queue-stats' · 69776921
      David S. Miller authored
      Joe Damato says:
      
      ====================
      mlx5: Add netdev-genl queue stats
      
      Welcome to v5.
      
      Switched from RFC to just a v5, because I think this is pretty close.
      Minor changes from v4 summarized below in the changelog.
      
      Note that my NIC does not seem to support PTP and I couldn't get the
      mlnx-tools mlnx_qos script to work, so I was only able to test the
      following cases:
      
      - device up at boot
      - adjusting queue counts
      - device down (e.g. ip link set dev eth4 down)
      
      Please see the commit message of patch 2/2 for more details on output
      and test cases.
      
      rfcv4 thread:
        https://lore.kernel.org/linux-kernel/20240604004629.299699-1-jdamato@fastly.com/T/
      
      rfcv4 -> v5:
       - Patch 1/2: change variable name 'mlx5e_qid' to 'txq_ix'.
       - Patch 2/2:
          - remove logic in mlx5e_get_queue_stats_rx for PTP. PTP RX are
            always reported in base.
          - report PTP TX in mlx5e_get_base_stats only if:
            - PTP has ever been opened, and
            - either PTP is NULL (closed) or the MLX5E_PTP_STATE_TX bit in its
              state is not set
      
          Otherwise, PTP TX will be reported when the txq_ix is passed into
          mlx5e_get_queue_stats_tx
      
      rfcv3 -> rfcv4:
       - Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq
         indices to sq_stats structures so stats can be accessed directly.
         This mapping is kept up to date alongside txq2sq.
      
       - Patch 2/2:
         - All mutex_lock/unlock on state_lock has been dropped.
         - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special
           case for PTP. If PTP was ever opened, is currently opened, and the
           channel index matches, stats for PTP RX are output.
         - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No
           corner cases are needed here because any txq idx (passed in as i)
           will have an up to date mapping in priv->txq2sq_stats.
         - mlx5e_get_base_stats:
           - in the RX case:
             - iterates from [params.num_channels, stats_nch) collecting
               stats.
             - if ptp was ever opened but is currently closed, add the PTP
               stats.
           - in the TX case:
             - handle 2 cases:
               - the channel is available, so sum only the unavailable TCs
                 [mlx5e_get_dcb_num_tc, max_opened_tc).
               - the channel is unavailable, so sum all TCs [0, max_opened_tc).
             - if ptp was ever opened but is currently closed, add the PTP
               sq stats.
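
      The TX rule for mlx5e_get_base_stats listed above can be sketched in
      plain C. This is a simplified model, not the driver code: struct
      tx_model, struct sq_counters and base_tx_stats() are hypothetical names,
      the fixed array sizes are arbitrary, and the PTP handling is left out.
      It only shows which [channel][TC] counters end up in the base totals.

```c
#include <stdint.h>

#define MAX_CH 4
#define MAX_TC 3

struct sq_counters {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical, simplified view of the state the changelog refers to. */
struct tx_model {
	unsigned int stats_nch;     /* channels that have stats allocated */
	unsigned int num_channels;  /* channels currently available */
	unsigned int num_tc;        /* TCs currently available */
	unsigned int max_opened_tc; /* highest number of TCs ever opened */
	struct sq_counters sq[MAX_CH][MAX_TC];
};

/* Sum the counters that are not reported per queue:
 *  - available channel:   only the unavailable TCs [num_tc, max_opened_tc)
 *  - unavailable channel: all TCs [0, max_opened_tc)
 */
struct sq_counters base_tx_stats(const struct tx_model *m)
{
	struct sq_counters base = { 0, 0 };

	for (unsigned int ch = 0; ch < m->stats_nch; ch++) {
		unsigned int first_tc = ch < m->num_channels ? m->num_tc : 0;

		for (unsigned int tc = first_tc; tc < m->max_opened_tc; tc++) {
			base.packets += m->sq[ch][tc].packets;
			base.bytes += m->sq[ch][tc].bytes;
		}
	}
	return base;
}
```

      With this split, counters for queues that are currently active are left
      to the per-queue callbacks, and everything else lands in the base, so
      the two together account for each counter once.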
      
      v2 -> rfcv3:
       - Added patch 1/2 which creates some helpers for computing the txq_ix
         and ch_ix/tc_ix.
      
       - Patch 2/2 modified in several ways:
         - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at
           the start of the function.
         - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by
           using the helpers added in the previous patch.
         - mlx5e_get_base_stats modified in several ways:
           - Took the state_lock when accessing priv->channels.
           - For the base RX stats, code was simplified to call
             mlx5e_get_queue_stats_rx instead of repeating the same code.
           - For the base TX stats, I attempted to implement what I think
             Tariq suggested in the previous thread:
                - for available channels, only unavailable TC stats are summed
                - for unavailable channels, all stats for TCs up to
                  max_opened_tc are summed.
      
      v1 -> v2:
        - Essentially a full rewrite after comments from Jakub, Tariq, and
          Zhu.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add per queue netdev-genl stats · 7b66ae53
      Joe Damato authored
      ./cli.py --spec netlink/specs/netdev.yaml \
               --dump qstats-get --json '{"scope": "queue"}'
      
      ...snip
      
       {'ifindex': 7,
        'queue-id': 62,
        'queue-type': 'rx',
        'rx-alloc-fail': 0,
        'rx-bytes': 105965251,
        'rx-packets': 179790},
       {'ifindex': 7,
        'queue-id': 0,
        'queue-type': 'tx',
        'tx-bytes': 9402665,
        'tx-packets': 17551},
      
      ...snip
      
      Also tested with the script tools/testing/selftests/drivers/net/stats.py
      in several scenarios to ensure stats tallying was correct:
      
      - on boot (default queue counts)
      - adjusting queue count up or down (ethtool -L eth0 combined ...)
      
      The tools/testing/selftests/drivers/net/stats.py brings the device up,
      so to test with the device down, I did the following:
      
      $ ip link show eth4
      7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
        [..snip..]
      
      $ cat /proc/net/dev | grep eth4
      eth4: 235710489  434811 [..snip rx..] 2878744 21227  [..snip tx..]
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
                 --dump qstats-get --json '{"ifindex": 7}'
      [{'ifindex': 7,
        'rx-alloc-fail': 0,
        'rx-bytes': 235710489,
        'rx-packets': 434811,
        'tx-bytes': 2878744,
        'tx-packets': 21227}]
      
      The values in /proc/net/dev match the output of cli for the same
      device, even while the device is down.
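
      For anyone who wants to automate that comparison, the four totals can
      be pulled out of a /proc/net/dev line with a short parser. This is a
      sketch only: parse_net_dev_line() and struct dev_totals are hypothetical
      names, and the parser assumes the usual layout of that file (interface
      name, a colon, eight receive counters, then the transmit counters, with
      bytes and packets first in each group).

```c
#include <stdio.h>

struct dev_totals {
	unsigned long long rx_bytes;
	unsigned long long rx_packets;
	unsigned long long tx_bytes;
	unsigned long long tx_packets;
};

/* Parse one data line of /proc/net/dev.
 * Assumed layout: "<name>: <8 rx counters> <tx counters...>", where the
 * first two counters of each group are bytes and packets.
 * Returns 0 on success, -1 if the line does not match. */
int parse_net_dev_line(const char *line, struct dev_totals *t)
{
	int n = sscanf(line,
		       " %*[^:]: %llu %llu %*llu %*llu %*llu %*llu %*llu %*llu"
		       " %llu %llu",
		       &t->rx_bytes, &t->rx_packets,
		       &t->tx_bytes, &t->tx_packets);

	return n == 4 ? 0 : -1;
}
```

      The parsed values can then be compared field by field with rx-bytes,
      rx-packets, tx-bytes and tx-packets from the qstats-get output.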
      
      Note that while the device is down, per queue stats output nothing
      (because the device is down there are no queues):
      
      $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
                 --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
      []
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add txq to sq stats mapping · 0a3e5c1b
      Joe Damato authored
      mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
      txqs to sq_stats, as well, for direct access to stats.
      
      Add priv->txq2sq_stats and insert mappings. The mappings will be used
      next to tabulate stats information.
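
      The idea of a direct txq-to-stats mapping can be sketched as follows.
      This is an illustrative model rather than the driver's data structures:
      struct priv_model, build_txq2sq_stats() and txq_stats() are hypothetical,
      and the index layout txq_ix = ch_ix + tc_ix * NUM_CH is an assumption
      made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CH 2
#define NUM_TC 2

struct sq_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical, simplified model of the private driver state. */
struct priv_model {
	struct sq_stats stats[NUM_CH][NUM_TC];          /* per channel and TC */
	struct sq_stats *txq2sq_stats[NUM_CH * NUM_TC]; /* txq index -> stats */
};

/* Assumed layout for illustration: queues of TC 0 first, then TC 1, ... */
unsigned int txq_index(unsigned int ch_ix, unsigned int tc_ix)
{
	return ch_ix + tc_ix * NUM_CH;
}

/* Fill the lookup table so stats can be reached from a txq index alone. */
void build_txq2sq_stats(struct priv_model *p)
{
	for (unsigned int ch = 0; ch < NUM_CH; ch++)
		for (unsigned int tc = 0; tc < NUM_TC; tc++)
			p->txq2sq_stats[txq_index(ch, tc)] = &p->stats[ch][tc];
}

/* Direct access by txq index; NULL when the index is out of range. */
const struct sq_stats *txq_stats(const struct priv_model *p,
				 unsigned int txq_ix)
{
	if (txq_ix >= NUM_CH * NUM_TC)
		return NULL;
	return p->txq2sq_stats[txq_ix];
}
```

      A per-queue stats callback that receives only a txq index can then read
      the counters with a single table lookup instead of recomputing the
      channel and TC.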
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 15 Jun, 2024 23 commits
  4. 14 Jun, 2024 3 commits