Commits · 586d87021f224b0434372f5b4418216e29b0a544 · Kirill Smelkov / linux

27 Aug, 2024 22 commits

selftests/net: Add trace events matching to tcp_ao · 586d8702

Dmitry Safonov authored Aug 23, 2024

Setup trace points, add a new ftrace instance in order to not interfere
with the rest of the system, filtering by net namespace cookies.
Raise a new background thread that parses trace_pipe, matches them with
the list of expected events.

Wiring up trace events to selftests provides another insight if there is
anything unexpected happining in the tcp-ao code (i.e. key rotation when
it's not expected).

Note: in real programs libtraceevent should be used instead of this
manual labor of setting ftrace up and parsing. I'm not using it here
as I don't want to have an .so library dependency that one would have to
bring into VM or DUT (Device Under Test). Please, don't copy it over
into any real world programs, that aren't tests.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-8-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

586d8702

selftests/net: Synchronize client/server before counters checks · 044e0370

Dmitry Safonov authored Aug 23, 2024

On tests that are expecting failure the timeout value is
TEST_RETRANSMIT_SEC == 1 second. Which is big enough for most of devices
under tests. But on a particularly slow machine/VM, 1 second might be
not enough for another thread to be scheduled and attempt to connect().
It is not a problem for tests that expect connect() to succeed as
the timeout value for them (TEST_TIMEOUT_SEC) is intentionally bigger.

One obvious way to solve this would be to increase TEST_RETRANSMIT_SEC.
But as all tests would increase the timeouts, that's going to sum up.

But here is less obvious way that keeps timeouts for expected connect()
failures low: just synchronize the two threads, which will assure that
before counter checks the other thread got a chance to run and timeout
on connect(). The expected increase of the related counter for listen()
socket will yet test the expected failure.

Never happens on my machine, but I suppose the majority of netdev's
connect-deny-* flakes [1] are caused by this.

Prevents the following testing issue:
> # selftests: net/tcp_ao: connect-deny_ipv6
> # 1..21
> # # 462[lib/setup.c:243] rand seed 1720905426
> # TAP version 13
> # ok 1 Non-AO server + AO client
> # not ok 2 Non-AO server + AO client: TCPAOKeyNotFound counter did not increase: 0 <= 0
> # ok 3 AO server + Non-AO client
> # ok 4 AO server + Non-AO client: counter TCPAORequired increased 0 => 1
...

[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao/results/681741/6-connect-deny-ipv6/stdoutSigned-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-7-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

044e0370

selftests/tcp_ao: Fix printing format for uint64_t · 1c69e1f4

Mohammad Nassiri authored Aug 23, 2024

It's not safe to use '%zu' specifier for printing uint64_t on 32-bit
systems. For uint64_t, we should use the 'PRIu64' macro from
the inttypes.h library. This ensures that the uint64_t is printed
correctly from the selftests regardless of the system architecture.
Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
[Added missing spaces in fail/ok messages and uint64_t cast in
 setsockopt-closed, as otherwise it was giving warnings on 64bit.
 And carried it to netdev ml]
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-6-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1c69e1f4

selftests/net: Don't forget to close nsfd after switch_save_ns() · a9e16934

Dmitry Safonov authored Aug 23, 2024

The switch_save_ns() helper suppose to help switching to another
namespace for some action and to return back to original namespace.

The fd should be closed.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-5-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a9e16934

selftests/net: Open /proc/thread-self in open_netns() · 8acb1806

Dmitry Safonov authored Aug 23, 2024

It turns to be that open_netns() is called rarely from the child-thread
and more often from parent-thread. Yet, on initialization of kconfig
checks, either of threads may reach kconfig_lock mutex first.
VRF-related checks do create a temporary ksft-check VRF in
an unshare()'d namespace and than setns() back to the original.
As original was opened from "/proc/self/ns/net", it's valid for
thread-leader (parent), but it's invalid for the child, resulting
in the following failure on tests that check has_vrfs() support:
> # ok 54 TCP-AO required on socket + TCP-MD5 key: prefailed as expected: Key was rejected by service
> # not ok 55 # error 381[unsigned-md5.c:24] Failed to add a VRF: -17
> # not ok 56 # error 383[unsigned-md5.c:33] Failed to add a route to VRF: -22: Key was rejected by service
> not ok 1 selftests: net/tcp_ao: unsigned-md5_ipv6 # exit=1

Use "/proc/thread-self/ns/net" which is valid for any thread.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-4-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8acb1806

selftests/net: Be consistent in kconfig checks · bc2468f9

Dmitry Safonov authored Aug 23, 2024

Most of the functions in tcp-ao lib/ return negative errno or -1 in case
of a failure. That creates inconsistencies in lib/kconfig, which saves
what was the error code. As well as the uninitialized kconfig value is
-1, which also may be the result of a check.

Define KCONFIG_UNKNOWN and save negative return code, rather than
libc-style errno.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-3-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bc2468f9

selftests/net: Provide test_snprintf() helper · 7053e788

Dmitry Safonov authored Aug 23, 2024

Instead of pre-allocating a fixed-sized buffer of TEST_MSG_BUFFER_SIZE
and printing into it, call vsnprintf() with str = NULL, which will
return the needed size of the buffer. This hack is documented in
man 3 vsnprintf.

Essentially, in C++ terms, it re-invents std::stringstream, which is
going to be used to print different tracing paths and formatted strings.
Use it straight away in __test_print() - which is thread-safe version of
printing in selftests.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-2-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7053e788

selftests/net: Clean-up double assignment · 79504a47

Dmitry Safonov authored Aug 23, 2024

Correct copy'n'paste typo: the previous line already initialises get_all
to 1.
Reported-by: Nassiri, Mohammad <mnassiri@ciena.com>
Closes: https://lore.kernel.org/all/DM6PR04MB4202BC58A9FD5BDD24A16E8EC56F2@DM6PR04MB4202.namprd04.prod.outlook.com/Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240823-tcp-ao-selftests-upd-6-12-v4-1-05623636fe8c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

79504a47

l2tp: avoid using drain_workqueue in l2tp_pre_exit_net · 73d33bd0

James Chapman authored Aug 23, 2024

Recent commit fc7ec7f5 ("l2tp: delete sessions using work queue")
incorrectly uses drain_workqueue. The use of drain_workqueue in
l2tp_pre_exit_net is flawed because the workqueue is shared by all
nets and it is therefore possible for new work items to be queued
for other nets while drain_workqueue runs.

Instead of using drain_workqueue, use __flush_workqueue twice. The
first one will run all tunnel delete work items and any work already
queued. When tunnel delete work items are run, they may queue
new session delete work items, which the second __flush_workqueue will
run.

In l2tp_exit_net, warn if any of the net's idr lists are not empty.

Fixes: fc7ec7f5 ("l2tp: delete sessions using work queue")
Signed-off-by: James Chapman <jchapman@katalix.com>
Link: https://patch.msgid.link/20240823142257.692667-1-jchapman@katalix.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

73d33bd0

Merge branch 'add-gmac-support-for-rk3576' · aed7136a

Jakub Kicinski authored Aug 27, 2024

Detlev Casanova says:

====================
Add GMAC support for rk3576

Add the necessary constants and functions to support the GMAC devices on
the rk3576.
====================

Link: https://patch.msgid.link/20240823141318.51201-1-detlev.casanova@collabora.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

aed7136a

ethernet: stmmac: dwmac-rk: Add GMAC support for RK3576 · f9cc9997

David Wu authored Aug 23, 2024

Add constants and callback functions for the dwmac on RK3576 soc.
Signed-off-by: David Wu <david.wu@rock-chips.com>
[rebase, extracted bindings]
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20240823141318.51201-4-detlev.casanova@collabora.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f9cc9997

dt-bindings: net: Add support for rk3576 dwmac · 299e2aef

Detlev Casanova authored Aug 23, 2024

Add a rockchip,rk3576-gmac compatible for supporting the 2 gmac
devices on the rk3576.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20240823141318.51201-3-detlev.casanova@collabora.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

299e2aef

ethernet: stmmac: dwmac-rk: Fix typo for RK3588 code · 78a60497

Detlev Casanova authored Aug 23, 2024

Fix SELET -> SELECT in RK3588_GMAC_CLK_SELET_CRU and
RK3588_GMAC_CLK_SELET_IO
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20240823141318.51201-2-detlev.casanova@collabora.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

78a60497

net: ethernet: ti: am65-cpsw-nuss: Replace of_node_to_fwnode() with more suitable API · 3333df3b

Andy Shevchenko authored Aug 23, 2024

of_node_to_fwnode() is a IRQ domain specific implementation of
of_fwnode_handle(). Replace the former with more suitable API.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822230550.708112-1-andy.shevchenko@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3333df3b

net: fix unreleased lock in cable test · 3d6a0c4f

Diogo Jahchan Koike authored Aug 26, 2024

fix an unreleased lock in out_dev_put path by removing the (now)
unnecessary path.

Reported-by: syzbot+c641161e97237326ea74@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c641161e97237326ea74
Fixes: 3688ff30 ("net: ethtool: cable-test: Target the command to the requested PHY")
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20240826134656.94892-1-djahchankoike@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3d6a0c4f

Merge branch 'tc-adjust-network-header-after-2nd-vlan-push' · f8fdda9e

Paolo Abeni authored Aug 27, 2024

Boris Sukholitko says:

====================
tc: adjust network header after 2nd vlan push

<tldr>
skb network header of the single-tagged vlan packet continues to point the
vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
causes problem at the dissector which expects double-tagged packet network
header to point to the inner vlan.

The fix is to adjust network header in tcf_act_vlan.c but requires
refactoring of skb_vlan_push function.
</tldr>

Consider the following shell script snippet configuring TC rules on the
veth interface:

ip link add veth0 type veth peer veth1
ip link set veth0 up
ip link set veth1 up

tc qdisc add dev veth0 clsact

tc filter add dev veth0 ingress pref 10 chain 0 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
tc filter add dev veth0 ingress pref 20 chain 0 flower \
	num_of_vlans 1 action vlan push id 100 \
	protocol 0x8100 action goto chain 5
tc filter add dev veth0 ingress pref 30 chain 5 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"

Sending double-tagged vlan packet with the IP payload inside:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
the dmesg.

OTOH, sending single-tagged vlan packet:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 20, will push the second vlan tag but will *not* match
rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
pushed by the kernel.

Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
Here is the relevant code from around net/core/flow_dissector.c:1277
copy-pasted here for convenience:

	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
	    skb && skb_vlan_tag_present(skb)) {
		proto = skb->protocol;
	} else {
		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
					    data, hlen, &_vlan);
		if (!vlan) {
			fdret = FLOW_DISSECT_RET_OUT_BAD;
			break;
		}

		proto = vlan->h_vlan_encapsulated_proto;
		nhoff += sizeof(*vlan);
	}

The "else" clause above gets the protocol of the encapsulated packet from
the skb data at the network header location. printk debugging has showed
that in the good double-tagged packet case proto is
htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
case proto is garbage leading to the failure to match tc filter 30.

proto is being set from the skb header pointed by nhoff parameter which is
defined at the beginning of __skb_flow_dissect
(net/core/flow_dissector.c:1055 in the current version):

		nhoff = skb_network_offset(skb);

Therefore the culprit seems to be that the skb network offset is different
between double-tagged packet received from the interface and single-tagged
packet having its vlan tag pushed by TC.

Lets look at the interesting points of the lifetime of the single/double
tagged packets as they traverse our packet flow.

Both of them will start at __netif_receive_skb_core where the first vlan
tag will be stripped:

	if (eth_type_vlan(skb->protocol)) {
		skb = skb_vlan_untag(skb);
		if (unlikely(!skb))
			goto out;
	}

At this stage in double-tagged case skb->data points to the second vlan tag
while in single-tagged case skb->data points to the network (eg. IP)
header.

Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
code at tcf_vlan_act (interesting points are in square brackets):

	if (skb_at_tc_ingress(skb))
[1]		skb_push_rcsum(skb, skb->mac_len);

	....

	case TCA_VLAN_ACT_PUSH:
		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
				    0);
		if (err)
			goto drop;
		break;

	....

out:
	if (skb_at_tc_ingress(skb))
[3]		skb_pull_rcsum(skb, skb->mac_len);

And skb_vlan_push (net/core/skbuff.c:6204) function does:

		err = __vlan_insert_tag(skb, skb->vlan_proto,
					skb_vlan_tag_get(skb));
		if (err)
			return err;

		skb->protocol = skb->vlan_proto;
[2]		skb->mac_len += VLAN_HLEN;

in the case of pushing the second tag. Lets look at what happens with
skb->data of the single-tagged packet at each of the above points:

1. As a result of the skb_push_rcsum, skb->data is moved back to the start
   of the packet.

2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
   incremented, skb->data still points to the start of the packet.

3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
   modified skb->mac_len, thus pointing to the network header again.

Then __skb_flow_dissect will get confused by having double-tagged vlan
packet with the skb->data at the network header.

The solution for the bug is to preserve "skb->data at second vlan header"
semantics in the skb_vlan_push function. We do this by manipulating
skb->network_header rather than skb->mac_len. skb_vlan_push callers are
updated to do skb_reset_mac_len.

More about the patch series:

* patch 1 fixes skb_vlan_push and the callers
* patch 2 adds ingress tc_actions test
* patch 3 adds egress tc_actions test
====================

Link: https://patch.msgid.link/20240822103510.468293-1-boris.sukholitko@broadcom.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

f8fdda9e

selftests: tc_actions: test egress 2nd vlan push · 2da44703

Boris Sukholitko authored Aug 22, 2024

Add new test checking the correctness of inner vlan flushing to the skb
data when outer vlan tag is added through act_vlan on egress.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

2da44703

selftests: tc_actions: test ingress 2nd vlan push · 59c330ec

Boris Sukholitko authored Aug 22, 2024

Add new test checking the correctness of inner vlan flushing to the skb
data when outer vlan tag is added through act_vlan on ingress.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

59c330ec

tc: adjust network header after 2nd vlan push · 93886372

Boris Sukholitko authored Aug 22, 2024

<tldr>
skb network header of the single-tagged vlan packet continues to point the
vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This
causes problem at the dissector which expects double-tagged packet network
header to point to the inner vlan.

The fix is to adjust network header in tcf_act_vlan.c but requires
refactoring of skb_vlan_push function.
</tldr>

Consider the following shell script snippet configuring TC rules on the
veth interface:

ip link add veth0 type veth peer veth1
ip link set veth0 up
ip link set veth1 up

tc qdisc add dev veth0 clsact

tc filter add dev veth0 ingress pref 10 chain 0 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5
tc filter add dev veth0 ingress pref 20 chain 0 flower \
	num_of_vlans 1 action vlan push id 100 \
	protocol 0x8100 action goto chain 5
tc filter add dev veth0 ingress pref 30 chain 5 flower \
	num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success"

Sending double-tagged vlan packet with the IP payload inside:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64   ..........."...d
0010  81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11   ......E..&......
0020  18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12   ................
0030  e1 c7 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to
the dmesg.

OTOH, sending single-tagged vlan packet:

cat <<ENDS | text2pcap - - | tcpreplay -i veth1 -
0000  00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14   ..........."....
0010  08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00   ..E..*..........
0020  00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00   ................
0030  00 00 00 00 00 00 00 00 00 00 00 00               ............
ENDS

will match rule 20, will push the second vlan tag but will *not* match
rule 30. IOW, the match at rule 30 fails if the second vlan was freshly
pushed by the kernel.

Lets look at  __skb_flow_dissect working on the double-tagged vlan packet.
Here is the relevant code from around net/core/flow_dissector.c:1277
copy-pasted here for convenience:

	if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX &&
	    skb && skb_vlan_tag_present(skb)) {
		proto = skb->protocol;
	} else {
		vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
					    data, hlen, &_vlan);
		if (!vlan) {
			fdret = FLOW_DISSECT_RET_OUT_BAD;
			break;
		}

		proto = vlan->h_vlan_encapsulated_proto;
		nhoff += sizeof(*vlan);
	}

The "else" clause above gets the protocol of the encapsulated packet from
the skb data at the network header location. printk debugging has showed
that in the good double-tagged packet case proto is
htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet
case proto is garbage leading to the failure to match tc filter 30.

proto is being set from the skb header pointed by nhoff parameter which is
defined at the beginning of __skb_flow_dissect
(net/core/flow_dissector.c:1055 in the current version):

		nhoff = skb_network_offset(skb);

Therefore the culprit seems to be that the skb network offset is different
between double-tagged packet received from the interface and single-tagged
packet having its vlan tag pushed by TC.

Lets look at the interesting points of the lifetime of the single/double
tagged packets as they traverse our packet flow.

Both of them will start at __netif_receive_skb_core where the first vlan
tag will be stripped:

	if (eth_type_vlan(skb->protocol)) {
		skb = skb_vlan_untag(skb);
		if (unlikely(!skb))
			goto out;
	}

At this stage in double-tagged case skb->data points to the second vlan tag
while in single-tagged case skb->data points to the network (eg. IP)
header.

Looking at TC vlan push action (net/sched/act_vlan.c) we have the following
code at tcf_vlan_act (interesting points are in square brackets):

	if (skb_at_tc_ingress(skb))
[1]		skb_push_rcsum(skb, skb->mac_len);

	....

	case TCA_VLAN_ACT_PUSH:
		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
				    (p->tcfv_push_prio << VLAN_PRIO_SHIFT),
				    0);
		if (err)
			goto drop;
		break;

	....

out:
	if (skb_at_tc_ingress(skb))
[3]		skb_pull_rcsum(skb, skb->mac_len);

And skb_vlan_push (net/core/skbuff.c:6204) function does:

		err = __vlan_insert_tag(skb, skb->vlan_proto,
					skb_vlan_tag_get(skb));
		if (err)
			return err;

		skb->protocol = skb->vlan_proto;
[2]		skb->mac_len += VLAN_HLEN;

in the case of pushing the second tag. Lets look at what happens with
skb->data of the single-tagged packet at each of the above points:

1. As a result of the skb_push_rcsum, skb->data is moved back to the start
   of the packet.

2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is
   incremented, skb->data still points to the start of the packet.

3. As a result of the skb_pull_rcsum, skb->data is moved forward by the
   modified skb->mac_len, thus pointing to the network header again.

Then __skb_flow_dissect will get confused by having double-tagged vlan
packet with the skb->data at the network header.

The solution for the bug is to preserve "skb->data at second vlan header"
semantics in the skb_vlan_push function. We do this by manipulating
skb->network_header rather than skb->mac_len. skb_vlan_push callers are
updated to do skb_reset_mac_len.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

93886372

Merge branch 'add-embedded-sync-feature-for-a-dpll-s-pin' · d0cb324c

Jakub Kicinski authored Aug 26, 2024

Arkadiusz Kubalewski says:

====================
Add Embedded SYNC feature for a dpll's pin

Introduce and allow DPLL subsystem users to get/set capabilities of
Embedded SYNC on a dpll's pin.
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
====================

Link: https://patch.msgid.link/20240822222513.255179-1-arkadiusz.kubalewski@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d0cb324c

ice: add callbacks for Embedded SYNC enablement on dpll pins · 87abc566

Arkadiusz Kubalewski authored Aug 23, 2024

Allow the user to get and set configuration of Embedded SYNC feature
on the ice driver dpll pins.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-3-arkadiusz.kubalewski@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

87abc566

dpll: add Embedded SYNC feature for a pin · cda1fba1

Arkadiusz Kubalewski authored Aug 23, 2024

Implement and document new pin attributes for providing Embedded SYNC
capabilities to the DPLL subsystem users through a netlink pin-get
do/dump messages. Allow the user to set Embedded SYNC frequency with
pin-set do netlink message.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20240822222513.255179-2-arkadiusz.kubalewski@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

cda1fba1

26 Aug, 2024 18 commits

net: dpaa: reduce number of synchronize_net() calls · 2c163922

Xi Huang authored Aug 22, 2024

In the function dpaa_napi_del(), we execute the netif_napi_del()
for each cpu, which is actually a high overhead operation
because each call to netif_napi_del() contains a synchronize_net(),
i.e. an RCU operation. In fact, it is only necessary to call
 __netif_napi_del and use synchronize_net() once outside of the loop.
This change is similar to commit 2543a600 ("gro_cells: reduce
number of synchronize_net() calls") and commit 5198d545 (" net:
remove napi_hash_del() from driver-facing API") 5198d545.
Signed-off-by: Xi Huang <xuiagnh@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240822072042.42750-1-xuiagnh@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2c163922

ipv6: avoid indirect calls for SOL_IP socket options · 89683b45

Eric Dumazet authored Aug 23, 2024

ipv6_setsockopt() can directly call ip_setsockopt()
instead of going through udp_prot.setsockopt()

ipv6_getsockopt() can directly call ip_getsockopt()
instead of going through udp_prot.getsockopt()

These indirections predate git history, not sure why they
were there.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240823140019.3727643-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

89683b45

net/ipv4: fix macro definition sk_for_each_bound_bhash · 9ceebd7a

Hongbo Li authored Aug 23, 2024

The macro sk_for_each_bound_bhash accepts a parameter
__sk, but it was not used, rather the sk2 is directly
used, so we replace the sk2 with __sk in macro.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20240823070453.3327832-1-lihongbo22@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9ceebd7a

tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process · 0d9e5df4

Jason Xing authored Aug 23, 2024

We found that one close-wait socket was reset by the other side
due to a new connection reusing the same port which is beyond our
expectation, so we have to investigate the underlying reason.

The following experiment is conducted in the test environment. We
limit the port range from 40000 to 40010 and delay the time to close()
after receiving a fin from the active close side, which can help us
easily reproduce like what happened in production.

Here are three connections captured by tcpdump:
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1
// a few seconds later, within 60 seconds
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2
127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193
// later, very quickly
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1

As we can see, the first flow is reset because:
1) client starts a new connection, I mean, the second one
2) client tries to find a suitable port which is a timewait socket
   (its state is timewait, substate is fin_wait2)
3) client occupies that timewait port to send a SYN
4) server finds a corresponding close-wait socket in ehash table,
   then replies with a challenge ack
5) client sends an RST to terminate this old close-wait socket.

I don't think the port selection algo can choose a FIN_WAIT2 socket
when we turn on tcp_tw_reuse because on the server side there
remain unread data. In some cases, if one side haven't call close() yet,
we should not consider it as expendable and treat it at will.

Even though, sometimes, the server isn't able to call close() as soon
as possible like what we expect, it can not be terminated easily,
especially due to a second unrelated connection happening.

After this patch, we can see the expected failure if we start a
connection when all the ports are occupied in fin_wait2 state:
"Ncat: Cannot assign requested address."
Reported-by: Jade Dong <jadedong@tencent.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240823001152.31004-1-kerneljasonxing@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0d9e5df4

Merge branch 'net-pse-pd-tps23881-reset-gpio-support' · 73b22ba0

Jakub Kicinski authored Aug 26, 2024

Kyle Swenson says:

====================
net: pse-pd: tps23881: Reset GPIO support

On some boards, the TPS2388x's reset line (active low) is pulled low to
keep the chip in reset until the SoC pulls the device out of reset.
This series updates the device-tree binding for the tps23881 and then
adds support for the reset gpio handling in the tps23881 driver.

v1: https://lore.kernel.org/20240819190151.93253-1-kyle.swenson@est.tech
====================

Link: https://patch.msgid.link/20240822220100.3030184-1-kyle.swenson@est.techSigned-off-by: Jakub Kicinski <kuba@kernel.org>

73b22ba0

net: pse-pd: tps23881: Support reset-gpios · 69f47cad

Kyle Swenson authored Aug 22, 2024

The TPS23880/1 has an active-low reset pin that some boards connect to
the SoC to control when the TPS23880 is pulled out of reset.

Add support for this via a reset-gpios property in the DTS.
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240822220100.3030184-3-kyle.swenson@est.techSigned-off-by: Jakub Kicinski <kuba@kernel.org>

69f47cad

dt-bindings: pse: tps23881: add reset-gpios · ec82fa2c

Kyle Swenson authored Aug 22, 2024

The TPS23881 has an active-low reset pin that can be connected to an
SoC. Document this with the device-tree binding.
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240822220100.3030184-2-kyle.swenson@est.techSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ec82fa2c

net: ag71xx: move clk_eth out of struct · d2ab3bb8

Rosen Penev authored Aug 22, 2024

It's only used in one place. It doesn't need to be in the struct.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240822192758.141201-1-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d2ab3bb8

l2tp: avoid overriding sk->sk_user_data · 1461f5a3

Cong Wang authored Aug 22, 2024

Although commit 4a4cd703 ("l2tp: don't set sk_user_data in tunnel socket")
removed sk->sk_user_data usage, setup_udp_tunnel_sock() still touches
sk->sk_user_data, this conflicts with sockmap which also leverages
sk->sk_user_data to save psock.

Restore this sk->sk_user_data check to avoid such conflicts.

Fixes: 4a4cd703 ("l2tp: don't set sk_user_data in tunnel socket")
Reported-by: syzbot+8dbe3133b840c470da0e@syzkaller.appspotmail.com
Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Tested-by: James Chapman <jchapman@katalix.com>
Reviewed-by: James Chapman <jchapman@katalix.com>
Link: https://patch.msgid.link/20240822182544.378169-1-xiyou.wangcong@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1461f5a3

Merge branch 'net-xilinx-axienet-multicast-fixes-and-improvements' · 7888173e

Jakub Kicinski authored Aug 26, 2024

Sean Anderson says:

====================
net: xilinx: axienet: Multicast fixes and improvements

This series has a few small patches improving the handling of multicast
addresses. In particular, it makes the driver a whole lot less spammy,
and adjusts things so we aren't in promiscuous mode when we have more
than four multicast addresses (a common occurance on modern systems).

As the hardware has a 4-entry CAM, the ideal method would be to "pack"
multiple addresses into one CAM entry. Something like:

entry.address = address[0] | address[1];
entry.mask = ~(address[0] ^ address[1]);

Which would make the entry match both addresses (along with some others
that would need to be filtered in software).

Mapping addresses to entries in an efficient way is a bit tricky. If
anyone knows of an in-tree example of something like this, I'd be glad
to hear about it.
====================

Link: https://patch.msgid.link/20240822154059.1066595-1-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7888173e

net: xilinx: axienet: Support IFF_ALLMULTI · 749e67d5

Sean Anderson authored Aug 22, 2024

Add support for IFF_ALLMULTI by configuring a single filter to match the
multicast address bit. This allows us to keep promiscuous mode disabled,
even when we have more than four multicast addresses. An even better
solution would be to "pack" addresses into the available CAM registers,
but that can wait for a future series.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-6-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>

749e67d5

net: xilinx: axienet: Don't set IFF_PROMISC in ndev->flags · 7a826fb3

Sean Anderson authored Aug 22, 2024

Contrary to the comment, we don't have to inform the net subsystem.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-5-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7a826fb3

net: xilinx: axienet: Don't print if we go into promiscuous mode · cd039e67

Sean Anderson authored Aug 22, 2024

A message about being in promiscuous mode is printed every time each
additional multicast address beyond four is added. Suppress this message
like is done in other drivers.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-4-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>

cd039e67

Merge branch 'some-modifications-to-optimize-code-readability' · 5efc9623

Jakub Kicinski authored Aug 26, 2024

Li Zetao says:

====================
Some modifications to optimize code readability

This patchset is mainly optimized for readability in contexts where size
needs to be determined. By using min() or max(), or even directly
removing redundant judgments (such as the 5th patch), the code is more
consistent with the context.
====================

Link: https://patch.msgid.link/20240822133908.1042240-1-lizetao1@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5efc9623

tipc: use min() to simplify the code · a1830862

Li Zetao authored Aug 22, 2024

When calculating size of own domain based on number of peers, the result
should be less than MAX_MON_DOMAIN, so using min() here is very semantic.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-8-lizetao1@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a1830862

ipv6: mcast: use min() to simplify the code · 26549dab

Li Zetao authored Aug 22, 2024

When coping sockaddr in ip6_mc_msfget(), the time of copies
depends on the minimum value between sl_count and gf_numsrc.
Using min() here is very semantic.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-7-lizetao1@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

26549dab

net: caif: use max() to simplify the code · b4985aa8

Li Zetao authored Aug 22, 2024

When processing the tail append of sk buffer, the final length needs
to be determined based on expectlen and addlen. Using max() here can
increase the readability of the code.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822133908.1042240-4-lizetao1@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b4985aa8

Merge branch 'net-header-and-core-spelling-corrections' · 77f0caec

Jakub Kicinski authored Aug 26, 2024

Simon Horman says:

====================
net: header and core spelling corrections

This patchset addresses a number of spelling errors in comments in
Networking files under include/, and files in net/core/. Spelling
problems are as flagged by codespell.

It aims to provide patches that can be accepted directly into net-next.
And splits patches up based on maintainer boundaries: many things
feed directly into net-next. This is a complex process and I apologise
for any errors.

I also plan to address, via separate patches, spelling errors in other
files in the same directories, for files whose changes typically go
through trees other than net-next (which feed into net-next).
====================

Link: https://patch.msgid.link/20240822-net-spell-v1-0-3a98971ce2d2@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

77f0caec