Commits · e44ef1d4de577aca369199b16da382d6e5aafaa3 · Kirill Smelkov / linux

02 Jan, 2022 9 commits

Hamish MacDonald authored Jan 01, 2022

Removed spaces and added a tab that was causing an error on checkpatch
Signed-off-by: Hamish MacDonald <elusivenode@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e44ef1d4

ipv6: ioam: Support for Queue depth data field · b63c5478

Justin Iurman authored Dec 30, 2021

v3:
 - Report 'backlog' (bytes) instead of 'qlen' (number of packets)

v2:
 - Fix sparse warning (use rcu_dereference)

This patch adds support for the queue depth in IOAM trace data fields.

The draft [1] says the following:

   The "queue depth" field is a 4-octet unsigned integer field.  This
   field indicates the current length of the egress interface queue of
   the interface from where the packet is forwarded out.  The queue
   depth is expressed as the current amount of memory buffers used by
   the queue (a packet could consume one or more memory buffers,
   depending on its size).

An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
retrieve the current queue length without reinventing the wheel.

Note: it was tested and qlen is increasing when an artificial delay is
added on the egress with tc.

  [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b63c5478

net/smc: remove redundant re-assignment of pointer link · 3a856c14

Colin Ian King authored Dec 30, 2021

The pointer link is being re-assigned the same value that it was
initialized with in the previous declaration statement. The
re-assignment is redundant and can be removed.

Fixes: 387707fd ("net/smc: convert static link ID to dynamic references")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a856c14

net/smc: Introduce TCP ULP support · d7cd421d

Tony Lu authored Dec 28, 2021

This implements TCP ULP for SMC, helps applications to replace TCP with
SMC protocol in place. And we use it to implement transparent
replacement.

This replaces original TCP sockets with SMC, reuse TCP as clcsock when
calling setsockopt with TCP_ULP option, and without any overhead.

To replace TCP sockets with SMC, there are two approaches:

- use setsockopt() syscall with TCP_ULP option, if error, it would
  fallback to TCP.

- use BPF prog with types BPF_CGROUP_INET_SOCK_CREATE or others to
  replace transparently. BPF hooks some points in create socket, bind
  and others, users can inject their BPF logics without modifying their
  applications, and choose which connections should be replaced with SMC
  by calling setsockopt() in BPF prog, based on rules, such as TCP tuples,
  PID, cgroup, etc...

  BPF doesn't support calling setsockopt with TCP_ULP now, I will send the
  patches after this accepted.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7cd421d

Merge branch 'smc-RDMA-net-namespace' · ab6dd952

David S. Miller authored Jan 02, 2022

Tony Lu says:

====================
RDMA device net namespace support for SMC

This patch set introduces net namespace support for linkgroups.

Path 1 is the main approach to implement net ns support.

Path 2 - 4 are the additional modifications to let us know the netns.
Also, I will submit changes of smc-tools to github later.

Currently, smc doesn't support net namespace isolation. The ibdevs
registered to smc are shared for all linkgroups and connections. When
running applications in different net namespaces, such as container
environment, applications should only use the ibdevs that belongs to the
same net namespace.

This adds a new field, net, in smc linkgroup struct. During first
contact, it checks and find the linkgroup has same net namespace, if
not, it is going to create and initialized the net field with first
link's ibdev net namespace. When finding the rdma devices, it also checks
the sk net device's and ibdev's net namespaces. After net namespace
destroyed, the net device and ibdev move to root net namespace,
linkgroups won't be matched, and wait for lgr free.

If rdma net namespace exclusive mode is not enabled, it behaves as
before.

Steps to enable and test net namespaces:

1. enable RDMA device net namespace exclusive support
	rdma system set netns exclusive # default is shared

2. create new net namespace, move and initialize them
	ip netns add test1
	rdma dev set mlx5_1 netns test1
	ip link set dev eth2 netns test1
	ip netns exec test1 ip link set eth2 up
	ip netns exec test1 ip addr add ${HOST_IP}/26 dev eth2

3. setup server and client, connect N <-> M
	ip netns exec test1 smc_run sockperf server --tcp # server
	ip netns exec test1 smc_run sockperf pp --tcp -i ${SERVER_IP} # client

4. netns isolated linkgroups (2 * 2 mesh) with their own linkgroups
  - server
LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
00000100 SERV     SINGLE      0       0
00000200 SERV     SINGLE      0       0
00000300 SERV     SINGLE      0       0
00000400 SERV     SINGLE      0       0

  - client
LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
00000100 CLNT     SINGLE      0       0
00000200 CLNT     SINGLE      0       0
00000300 CLNT     SINGLE      0       0
00000400 CLNT     SINGLE      0       0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ab6dd952

net/smc: Add net namespace for tracepoints · a838f508

Tony Lu authored Dec 28, 2021

This prints net namespace ID, helps us to distinguish different net
namespaces when using tracepoints.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a838f508

net/smc: Print net namespace in log · de2fea7b

Tony Lu authored Dec 28, 2021

This adds net namespace ID to the kernel log, net_cookie is unique in
the whole system. It is useful in container environment.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2fea7b

net/smc: Add netlink net namespace support · 79d39fc5

Tony Lu authored Dec 28, 2021

This adds net namespace ID to diag of linkgroup, helps us to distinguish
different namespaces, and net_cookie is unique in the whole system.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79d39fc5

net/smc: Introduce net namespace support for linkgroup · 0237a3a6

Tony Lu authored Dec 28, 2021

Currently, rdma device supports exclusive net namespace isolation,
however linkgroup doesn't know and support ibdev net namespace.
Applications in the containers don't want to share the nics if we
enabled rdma exclusive mode. Every net namespaces should have their own
linkgroups.

This patch introduce a new field net for linkgroup, which is standing
for the ibdev net namespace in the linkgroup. The net in linkgroup is
initialized with the net namespace of link's ibdev. It compares the net
of linkgroup and sock or ibdev before choose it, if no matched, create
new one in current net namespace. If rdma net namespace exclusive mode
is not enabled, it behaves as before.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0237a3a6

31 Dec, 2021 31 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · e63a0234

David S. Miller authored Dec 31, 2021

Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e63a0234

Merge tag 'mlx5-updates-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ce2b6eb4

David S. Miller authored Dec 31, 2021

Saeed Mahameed says:

====================
mlx5 Software steering, New features and optimizations

This patch series brings various SW steering features, optimizations and
debug-ability focused improvements.

 1) Expose debugfs for dumping the SW steering resources
 2) Removing unused fields
 3) support for matching on new fields
 4) steering optimization for RX/TX-only rules
 5) Make Software steering the default steering mechanism when
    available, applies only to Switchdev mode FDB

From Yevgeny Kliteynik and Muhammad Sammar:

 - Patch 1 fixes an error flow in creating matchers
 - Patch 2 fix lower case macro prefix "mlx5_" to "MLX5_"
 - Patch 3 removes unused struct member in mlx5dr_matcher
 - Patch 4 renames list field in matcher struct to list_node to reflect the
   fact that is field is for list node that is stored on another struct's lists
 - Patch 5 adds checking for valid Flex parser ID value
 - Patch 6 adds the missing reserved fields to dr_match_param and aligns it to
   the format that is defined by HW spec
 - Patch 7 adds support for dumping SW steering (SMFS) resources using debugfs
   in CSV format: domain and its tables, matchers and rules
 - Patch 8 adds support for a new destination type - UPLINK
 - Patch 9 adds WARN_ON_ONCE on refcount checks in SW steering object destructors
 - Patches 10, 11, 12 add misc5 flow table match parameters and add support for
   matching on tunnel headers 0 and 1
 - Patch 13 adds support for matching on geneve_tlv_option_0_exist field
 - Patch 14 implements performance optimization for for empty or RX/TX-only
   matchers by splitting RX and TX matchers handling: matcher connection in the
   matchers chain is split into two separate lists (RX only and TX only), which
   solves a usecase of many RX or TX only rules that create a long chain of
   RX/TX-only paths w/o the actual rules
 - Patch 15 ignores modify TTL if device doesn't support it instead of
   adding and unsupported action
 - Patch 16 sets SMFS as a default steering mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce2b6eb4

Merge branch 'hnsd3-next' · 20a9013e

David S. Miller authored Dec 31, 2021

Guangbin Huang says:

====================
net: hns3: refactor cmdq functions in PF/VF

Currently, hns3 PF and VF module have two sets of cmdq APIs to provide
cmdq message interaction functions. Most of these APIs are the same. The
only differences are the function variables and names with pf and vf
suffixes. These two sets of cmdq APIs are redundent and add extra bug fix
work.

This series refactor the cmdq APIs in hns3 PF and VF by implementing one
set of common cmdq APIs for PF and VF reuse and deleting the old APIs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

20a9013e