Commits · 3f4093e2bf4673f218c0bf17d8362337c400e77b · Kirill Smelkov / linux

09 Aug, 2022 6 commits

atm: idt77252: fix use-after-free bugs caused by tst_timer · 3f4093e2

Duoming Zhou authored Aug 05, 2022

There are use-after-free bugs caused by tst_timer. The root cause
is that there are no functions to stop tst_timer in idt77252_exit().
One of the possible race conditions is shown below:

    (thread 1)          |        (thread 2)
                        |  idt77252_init_one
                        |    init_card
                        |      fill_tst
                        |        mod_timer(&card->tst_timer, ...)
idt77252_exit           |  (wait a time)
                        |  tst_timer
                        |
                        |    ...
  kfree(card) // FREE   |
                        |    card->soft_tst[e] // USE

The idt77252_dev is deallocated in idt77252_exit() and used in
timer handler.

This patch adds del_timer_sync() in idt77252_exit() so that
the timer handler is stopped before the idt77252_dev is
deallocated.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220805070008.18007-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3f4093e2

net: dsa: felix: fix min gate len calculation for tc when its first gate is closed · 7e4babff

Vladimir Oltean authored Aug 04, 2022

min_gate_len[tc] is supposed to track the shortest interval of
continuously open gates for a traffic class. For example, in the
following case:

TC 76543210

t0 00000001b 200000 ns
t1 00000010b 200000 ns

min_gate_len[0] and min_gate_len[1] should be 200000, while
min_gate_len[2-7] should be 0.

However what happens is that min_gate_len[0] is 200000, but
min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the
point where the logic detects the gate close event for TC 1).

The problem is that the code considers a "gate close" event whenever it
sees that there is a 0 for that TC (essentially it's level rather than
edge triggered). By doing that, any time a gate is seen as closed
without having been open prior, gate_len, which is 0, will be written
into min_gate_len. Once min_gate_len becomes 0, it's impossible for it
to track anything higher than that (the length of actually open
intervals).

To fix this, we make the writing to min_gate_len[tc] be edge-triggered,
which avoids writes for gates that are closed in consecutive intervals.
However what this does is it makes us need to special-case the
permanently closed gates at the end.

Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7e4babff
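
Note on 7e4babff: the edge-triggered bookkeeping described above can be
modelled in userspace roughly as follows. This is a simplified sketch, not
the driver code: struct gate_entry, NUM_TC and min_gate_lengths() are
hypothetical names, and walking the list twice is an assumption made here so
that a gate still open at the end of the list is eventually closed by a later
entry of the cyclic schedule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TC 8 /* traffic classes 0..7, as in the example above */

/* Hypothetical schedule entry: a bitmask of open gates and its duration. */
struct gate_entry {
	uint8_t gate_mask;    /* bit tc set => gate for that TC is open */
	uint64_t interval_ns; /* how long this entry lasts */
};

/*
 * Per traffic class, track the shortest run of continuously open gates.
 * A gate that never closes keeps UINT64_MAX; a gate that never opens gets 0.
 */
void min_gate_lengths(const struct gate_entry *entries, size_t n,
		      uint64_t min_gate_len[NUM_TC])
{
	uint64_t gate_len[NUM_TC] = { 0 };
	uint8_t ever_opened = 0;

	assert(entries != NULL && n > 0);

	for (int tc = 0; tc < NUM_TC; tc++)
		min_gate_len[tc] = UINT64_MAX;

	/* The schedule is cyclic, so walk it twice (assumption of this sketch). */
	for (size_t i = 0; i < 2 * n; i++) {
		const struct gate_entry *e = &entries[i % n];

		for (int tc = 0; tc < NUM_TC; tc++) {
			if (e->gate_mask & (1u << tc)) {
				gate_len[tc] += e->interval_ns;
				ever_opened |= (uint8_t)(1u << tc);
			} else {
				/* Edge-triggered: record only a close that follows an open run. */
				if (gate_len[tc] && gate_len[tc] < min_gate_len[tc])
					min_gate_len[tc] = gate_len[tc];
				gate_len[tc] = 0;
			}
		}
	}

	/* Special case: permanently closed gates never had an open window. */
	for (int tc = 0; tc < NUM_TC; tc++)
		if (!(ever_opened & (1u << tc)))
			min_gate_len[tc] = 0;
}
```

With the two-entry schedule from the commit message (TC 0 open for 200000 ns,
then TC 1 open for 200000 ns), a level-triggered variant would write 0 for
TC 1 on the first entry; the gate_len[tc] check above is what prevents that.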

net/x25: fix call timeouts in blocking connects · 944e594c

Martin Schiller authored Aug 05, 2022

When a userspace application starts a blocking connect(), a CALL REQUEST
is sent, the t21 timer is started and the connect waits in
x25_wait_for_connection_establishment(). If for some reason the t21
timer then expires before any reaction on the assigned logical channel
(e.g. CALL ACCEPT, CLEAR REQUEST), a CLEAR REQUEST is sent and timer
t23 is started, waiting for a CLEAR CONFIRMATION. If we now receive a
CLEAR CONFIRMATION from the peer, x25_disconnect() is called in
x25_state2_machine() with reason "0", which means "normal" call
clearing. This is ok, but the parameter "reason" is used as sk->sk_err
in x25_disconnect() and sock_error(sk) is evaluated in
x25_wait_for_connection_establishment() to check if the call is still
pending. As "0" is not rated as an error, the connect will be stuck here
forever.

To fix this situation, also check if the sk->sk_state changed from
TCP_SYN_SENT to TCP_CLOSE in the meantime, which is also done by
x25_disconnect().
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20220805061810.10824-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

944e594c
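
Note on 944e594c: the difference between the old and new wait conditions can
be illustrated with a small userspace model. This is only a sketch under
stated assumptions: struct fake_sock, the ST_* states and both helper
functions are hypothetical stand-ins for sk->sk_err, sk->sk_state and the
loop in x25_wait_for_connection_establishment(), and signal handling is
left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the socket states named in the commit. */
enum fake_state { ST_SYN_SENT, ST_ESTABLISHED, ST_CLOSE };

struct fake_sock {
	int err;               /* models sk->sk_err; 0 means "no error" */
	enum fake_state state; /* models sk->sk_state */
};

/* Old behaviour: only an error or an established call ends the wait. */
bool wait_done_old(const struct fake_sock *sk)
{
	return sk->err != 0 || sk->state == ST_ESTABLISHED;
}

/* New behaviour: a call cleared with reason 0 (state CLOSE) also ends it. */
bool wait_done_new(const struct fake_sock *sk)
{
	return wait_done_old(sk) || sk->state == ST_CLOSE;
}
```

A socket with err == 0 and state ST_CLOSE is the case from the commit message:
the old predicate never becomes true, the new one does.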

Merge branch 'tsnep-two-fixes-for-the-driver' · 8eb6fcc9

Jakub Kicinski authored Aug 08, 2022

Gerhard Engleder says:

====================
tsnep: Two fixes for the driver

Two simple bugfixes for tsnep driver.
====================

Link: https://lore.kernel.org/r/20220804183935.73763-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8eb6fcc9

tsnep: Fix tsnep_tx_unmap() error path usage · b3bb8628

Gerhard Engleder authored Aug 04, 2022

If tsnep_tx_map() fails, then tsnep_tx_unmap() shall start at the write
index like tsnep_tx_map(). This is different from normal operation.
Thus, add an additional parameter to tsnep_tx_unmap() so that it can
start at different positions for successful TX and failed TX.

Fixes: 403f69bb ("tsnep: Add TSN endpoint Ethernet MAC driver")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b3bb8628
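
Note on b3bb8628: the idea of passing the start position to the unmap helper
can be sketched with a toy ring. Everything here (struct tx_ring_model,
RING_SIZE, ring_unmap()) is a hypothetical model, not the tsnep data
structures; it only shows why the failure path and the completion path need
different start indices.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

/* Toy TX ring: a slot is "mapped" while it holds a DMA mapping. */
struct tx_ring_model {
	bool mapped[RING_SIZE];
	int read;  /* oldest slot awaiting TX completion */
	int write; /* next slot to fill on transmit */
};

/*
 * Release up to `count` slots beginning at `start`, wrapping at the end of
 * the ring. Returns the number of slots that were actually unmapped.
 */
int ring_unmap(struct tx_ring_model *r, int start, int count)
{
	int released = 0;

	assert(start >= 0 && start < RING_SIZE);

	for (int i = 0; i < count; i++) {
		int idx = (start + i) % RING_SIZE;

		if (r->mapped[idx]) {
			r->mapped[idx] = false;
			released++;
		}
	}
	return released;
}
```

In this model a failed mapping is rolled back with ring_unmap(r, r->write, n),
while a completed frame is released with ring_unmap(r, r->read, n); starting
the rollback at the read index would tear down frames that are still in
flight.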

tsnep: Fix unused warning for 'tsnep_of_match' · 73afd781

Gerhard Engleder authored Aug 04, 2022

Kernel test robot found the following warning:

drivers/net/ethernet/engleder/tsnep_main.c:1254:34: warning:
'tsnep_of_match' defined but not used [-Wunused-const-variable=]

of_match_ptr() compiles into NULL if CONFIG_OF is disabled.
tsnep_of_match always exists, so using of_match_ptr() is useless.
Fix the warning by dropping of_match_ptr().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

73afd781
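
Note on 73afd781: a userspace model of the macro shows why the table ends up
unreferenced. The macro below mirrors the behaviour described in the commit
message (argument kept with CONFIG_OF, NULL without); struct
of_device_id_model, demo_of_match and the compatible string are hypothetical
names used only for illustration.

```c
#include <stddef.h>

/* Model of the macro: without CONFIG_OF the argument is discarded. */
#ifdef CONFIG_OF
#define of_match_ptr(ptr) (ptr)
#else
#define of_match_ptr(ptr) NULL
#endif

struct of_device_id_model {
	const char *compatible;
};

/* Always defined, regardless of CONFIG_OF, like the table in the commit. */
static const struct of_device_id_model demo_of_match[] = {
	{ "vendor,demo-device" }, /* hypothetical compatible string */
	{ NULL },
};

/* Before: the table is referenced only through the macro. */
const struct of_device_id_model *match_table_before(void)
{
	return of_match_ptr(demo_of_match);
}

/* After: the table is referenced directly, so it is always used. */
const struct of_device_id_model *match_table_after(void)
{
	return demo_of_match;
}
```

If match_table_after() were removed and CONFIG_OF left undefined, nothing
would reference demo_of_match any more, which is the situation the compiler
warned about.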

08 Aug, 2022 2 commits

net: bpf: Use the protocol's set_rcvlowat behavior if there is one · f574f7f8

Gao Feng authored Aug 04, 2022

Commit d1361840 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
added a new (struct proto_ops)->set_rcvlowat method so that a protocol
can override the default setsockopt(SO_RCVLOWAT) behavior.

The prior bpf code doesn't check for or invoke the protocol's
set_rcvlowat; correct that now.
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f574f7f8

virtio_net: fix memory leak inside XPD_TX with mergeable · 7a542bee

Xuan Zhuo authored Aug 04, 2022

When we call xdp_convert_buff_to_frame() to get xdpf, if it returns
NULL, we should check if xdp_page was allocated by xdp_linearize_page().
If it is newly allocated, it should be freed here alone. Just like any
other "goto err_xdp".

Fixes: 44fa2dbd ("xdp: transition into using xdp_frame for ndo_xdp_xmit")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a542bee

06 Aug, 2022 16 commits

net: seg6: initialize induction variable to first valid array index · ac0dbed9

Nick Desaulniers authored Aug 02, 2022

Fixes the following warnings observed when building
CONFIG_IPV6_SEG6_LWTUNNEL=y with clang:

net/ipv6/seg6_local.o: warning: objtool: seg6_local_fill_encap() falls
through to next function seg6_local_get_encap_size()
net/ipv6/seg6_local.o: warning: objtool: seg6_local_cmp_encap() falls
through to next function input_action_end()

LLVM can fully unroll loops in seg6_local_get_encap_size() and
seg6_local_cmp_encap(). One issue in those loops is that the induction
variable is initialized to 0. The loop iterates over members of
seg6_action_params, a global array of struct seg6_action_param calling
their put() function pointer members. seg6_action_param uses an array
initializer to initialize SEG6_LOCAL_SRH and later elements, which is
the third enumeration of an anonymous union.

The guard `if (attrs & SEG6_F_ATTR(i))` may prevent this from being
called at runtime, but it would still be UB for
`seg6_action_params[0]->put` to be called; the unrolled loop will make
the initial iterations unreachable, which LLVM will later rotate to
fallthrough to the next function.

Make it more obvious to the compiler that this cannot happen by
initializing the loop induction variable to the minimum valid index that
seg6_action_params is initialized to.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220802161203.622293-1-ndesaulniers@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ac0dbed9
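
Note on ac0dbed9: the pattern of starting the loop at the first initialized
slot can be shown with a small table of function pointers. This is a sketch
with hypothetical names (ATTR_*, attr_params, encap_size() and the sizes 24
and 4); it is not the seg6_local code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute indices; slots 0 and 1 have no handler. */
enum { ATTR_UNSPEC, ATTR_ACTION, ATTR_SRH, ATTR_TABLE, ATTR_MAX = ATTR_TABLE };

#define ATTR_FLAG(i) (1u << (i))

struct attr_param {
	size_t (*size)(void);
};

size_t srh_size(void)   { return 24; } /* illustrative sizes only */
size_t table_size(void) { return 4; }

/* Designated initializers: everything below ATTR_SRH stays NULL. */
static const struct attr_param attr_params[ATTR_MAX + 1] = {
	[ATTR_SRH]   = { .size = srh_size },
	[ATTR_TABLE] = { .size = table_size },
};

size_t encap_size(unsigned int attrs)
{
	size_t total = 0;

	/* Start at the first initialized slot rather than at 0. */
	for (int i = ATTR_SRH; i <= ATTR_MAX; i++) {
		if (attrs & ATTR_FLAG(i)) {
			assert(attr_params[i].size != NULL);
			total += attr_params[i].size();
		}
	}
	return total;
}
```

Starting at ATTR_SRH means the NULL entries are never visited, so neither the
reader nor the compiler has to rely on the flag check to rule out a call
through a NULL pointer.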

net: bcmgenet: Indicate MAC is in charge of PHY PM · bc3410f2

Florian Fainelli authored Aug 04, 2022

Avoid having the PHY library call into the suspend/resume functions
unnecessarily by setting phydev->mac_managed_pm to true. The GENET driver
essentially does exactly what mdio_bus_phy_resume() does by calling
phy_init_hw() plus phy_resume().

Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220804173605.1266574-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bc3410f2

eth: fix the help in Wangxun's Kconfig · 049d5d98

Jakub Kicinski authored Aug 04, 2022

The text was copy&pasted from Intel, adjust it to say Wangxun.
Reported-by: Ingo Saitz <ingo@hannover.ccc.de>
Fixes: 3ce7547e ("net: txgbe: Add build support for txgbe")
Link: https://lore.kernel.org/r/20220804182641.1442000-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

049d5d98

net: avoid overflow when rose /proc displays timer information. · df1c9414

Francois Romieu authored Aug 02, 2022

rose /proc code does not serialize timer accesses.

The initial report by Bernard F6BVP Pidoux exhibits an overflow amounting
to 116 ticks on his HZ=250 system.

Full timer access serialization would imho be overkill as rose /proc
does not enforce consistency between displayed ROSE_STATE_XYZ and
timer values during changes of state.

The patch may also fix similar behavior in ax25 /proc, ax25 ioctl
and netrom /proc as they all exhibit the same timer serialization
policy. This point has not been reported though.

The sole remaining use of ax25_display_timer - ax25 rtt valuation -
may also perform marginally better but I have not analyzed it too
deeply.

Cc: Thomas DL9SAU Osterried <thomas@osterried.de>
Link: https://lore.kernel.org/all/d5e93cc7-a91f-13d3-49a1-b50c11f0f811@free.fr/
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Link: https://lore.kernel.org/r/Yuk9vq7t7VhmnOXu@electric-eye.fr.zoreil.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

df1c9414
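
Note on df1c9414: the kind of overflow described above appears whenever the
remaining time is computed with unsigned arithmetic after the expiry has
already passed. The sketch below illustrates one way to avoid it by clamping
a signed difference; the function names and the `pending` flag are
assumptions for illustration, not the patch itself, and the cast relies on
the usual two's-complement conversion.

```c
/* Naive: once `now` has passed `expires`, the subtraction wraps around. */
unsigned long remaining_naive(unsigned long expires, unsigned long now)
{
	return expires - now;
}

/* Clamped: report 0 for a timer that is not pending or already expired. */
unsigned long remaining_clamped(unsigned long expires, unsigned long now,
				int pending)
{
	long delta = (long)(expires - now);

	if (!pending)
		return 0;

	return delta > 0 ? (unsigned long)delta : 0;
}
```

For an expiry that lies 116 ticks in the past, the naive version returns a
value close to the maximum of unsigned long, while the clamped version
returns 0.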

octeontx2-pf: Fix NIX_AF_TL3_TL2X_LINKX_CFG register configuration · 13c9f4dc

Naveen Mamindlapalli authored Aug 02, 2022

For packets scheduled to RPM and LBK, NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL]
selects the TL3 or TL2 scheduling level as the one used for link/channel
selection and backpressure. For each scheduling queue at the selected
level: Setting NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG[ENA] = 1 allows
the TL3/TL2 queue to schedule packets to a specified RPM or LBK link
and channel.

There is an issue in the code where NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL]
is set to TL3 whereas NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG is
configured for the TL2 queue in some cases. As a result, packets are not
transmitted on that link/channel. This patch fixes the issue by configuring
the NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG register depending on the
NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL] value.

Fixes: caa2da34 ("octeontx2-pf: Initialize and config queues")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20220802142813.25031-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

13c9f4dc

Merge branch 'octeontx2-af-driver-fixes-for-npc' · 63e36289

Jakub Kicinski authored Aug 05, 2022

Subbaraya Sundeep says:

====================
Octeontx2 AF driver fixes for NPC

This patchset includes AF driver fixes wrt packet parser NPC.
Following are the changes:

Patch 1: The parser nibble configuration must be the same for the
TX and RX interfaces, and if it is not, a fixup is applied. This fixup was
previously applied only for the default profile; it is now applied
for all profiles.
Patch 2: The firmware image may not always be present in the kernel image
and the default profile is used mostly, hence suppress the warning.
Patch 3: This patch fixes a corner case where a NIXLF is detached
without freeing its mcam entries, which results in a resource leak.
Patch 4: SMAC is mistakenly overlapped with DMAC while installing
rules based on SMAC. This patch fixes that.
====================

Link: https://lore.kernel.org/r/1659513255-28667-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

63e36289

octeontx2-af: Fix key checking for source mac · c3c29027

Subbaraya Sundeep authored Aug 03, 2022

Given a field with its location/offset in input packet,
the key checking logic verifies whether extracting the
field can be supported or not based on the mkex profile
loaded in hardware. This logic is wrong wrt source mac
and this patch fixes that.

Fixes: 9b179a96 ("octeontx2-af: Generate key field bit mask from KEX profile")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c3c29027

octeontx2-af: Fix mcam entry resource leak · 3f8fe40a

Subbaraya Sundeep authored Aug 03, 2022

The teardown sequence in FLR handler returns if no NIX LF
is attached to PF/VF because it indicates that graceful
shutdown of resources already happened. But there is a
chance of all allocated MCAM entries not being freed by
PF/VF. Hence free mcam entries even in case of detached LF.

Fixes: c554f9c1 ("octeontx2-af: Teardown NPA, NIX LF upon receiving FLR")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3f8fe40a

octeontx2-af: suppress external profile loading warning · cf243762

Harman Kalra authored Aug 03, 2022

The packet parser profile supplied as firmware may not
be present all the time and default profile is used mostly.
Hence suppress firmware loading warning from kernel due to
absence of firmware in kernel image.

Fixes: 3a724415 ("octeontx2-af: add support for custom KPU entries")
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cf243762

octeontx2-af: Apply tx nibble fixup always · dd1d1a8a

Stanislaw Kardach authored Aug 03, 2022

NPC_PARSE_NIBBLE for TX interface has to be equal to the RX one for some
silicon revisions. Mistakenly this fixup was only applied to the default
MKEX profile while it should also be applied to any loaded profile.

Fixes: 1c1935c9 ("octeontx2-af: Add NIX1 interfaces to NPC")
Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dd1d1a8a

MAINTAINERS: Update ibmveth maintainer · 8a5dfc28

Nick Child authored Aug 03, 2022

Add Nick Child as the maintainer of the IBM Power Virtual Ethernet
Device Driver, replacing Cristobal Forno.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Link: https://lore.kernel.org/r/20220803155246.39582-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8a5dfc28

bnxt_en: Remove duplicated include bnxt_devlink.c · 07977a8a

Yang Li authored Aug 04, 2022

bnxt_ethtool.h is included twice in bnxt_devlink.c,
remove one of them.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1817
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220804003722.54088-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07977a8a

Merge branch 'netfilter-followup-fixes-for-net' · f6ac85a1

Jakub Kicinski authored Aug 05, 2022

Florian Westphal says:

====================
netfilter followup fixes for net

Regressions, since 5.19:
Fix crash when packet tracing is enabled via 'meta nftrace set 1' rule.
Also comes with a test case.

Regressions, this cycle:
Fix Kconfig dependency for the flowtable /proc interface, we want this
to be off by default.
====================

Link: https://lore.kernel.org/r/20220804172629.29748-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f6ac85a1

netfilter: flowtable: fix incorrect Kconfig dependencies · b06ada6d

Pablo Neira Ayuso authored Aug 04, 2022

Remove the default of 'y'; this infrastructure is not fundamental for
flowtable operation.

Add a missing dependency on CONFIG_NF_FLOW_TABLE.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: b0381776 ("netfilter: nf_flow_table: count pending offload workqueue tasks")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b06ada6d

selftests: netfilter: add test case for nf trace infrastructure · fe9e420d

Florian Westphal authored Aug 04, 2022

Enable/disable tracing infrastructure while packets are in-flight.
This triggers KASAN splat after
e34b9ed9 ("netfilter: nf_tables: avoid skb access on nf_stolen").

While at it, reduce script run time as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fe9e420d

netfilter: nf_tables: fix crash when nf_trace is enabled · 399a14ec

Florian Westphal authored Aug 04, 2022

Do not access info->pkt when info->trace is not 1.
nft_traceinfo is not initialized, except when tracing is enabled.

The 'nft_trace_enabled' static key cannot be used for this, we must
always check info->trace first.

Pass nft_pktinfo directly to avoid this.

Fixes: e34b9ed9 ("netfilter: nf_tables: avoid skb access on nf_stolen")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

399a14ec

05 Aug, 2022 6 commits

selftests: add few test cases for tap driver · 2e64fe46

Cezar Bulinaru authored Aug 03, 2022

A few test cases related to the fix for 924a9bc3:
"net: check if protocol extracted by virtio_net_hdr_set_proto is correct"

A test is needed for the case when a non-standard packet (GSO without
NEEDS_CSUM) sent to the tap device causes a BUG check in the tap driver.
Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e64fe46

net: tap: NULL pointer derefence in dev_parse_header_protocol when skb->dev is null · 4f61f133

Cezar Bulinaru authored Aug 03, 2022

Fixes a NULL pointer dereference bug triggered from the tap driver.
When tap_get_user calls virtio_net_hdr_to_skb, skb->dev is NULL
(in tap.c skb->dev is set after the call to virtio_net_hdr_to_skb).
virtio_net_hdr_to_skb calls dev_parse_header_protocol, which
needs the skb->dev field to be valid.

The line that triggers the bug is in dev_parse_header_protocol
(dev is at offset 0x10 from skb and is stored in RAX register)
  if (!dev->header_ops || !dev->header_ops->parse_protocol)
  22e1:   mov    0x10(%rbx),%rax
  22e5:   mov    0x230(%rax),%rax

Setting skb->dev before the call in tap.c fixes the issue.

BUG: kernel NULL pointer dereference, address: 0000000000000230
RIP: 0010:virtio_net_hdr_to_skb.constprop.0+0x335/0x410 [tap]
Code: c0 0f 85 b7 fd ff ff eb d4 41 39 c6 77 cf 29 c6 48 89 df 44 01 f6 e8 7a 79 83 c1 48 85 c0 0f 85 d9 fd ff ff eb b7 48 8b 43 10 <48> 8b 80 30 02 00 00 48 85 c0 74 55 48 8b 40 28 48 85 c0 74 4c 48
RSP: 0018:ffffc90005c27c38 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888298f25300 RCX: 0000000000000010
RDX: 0000000000000005 RSI: ffffc90005c27cb6 RDI: ffff888298f25300
RBP: ffffc90005c27c80 R08: 00000000ffffffea R09: 00000000000007e8
R10: ffff88858ec77458 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000014 R14: ffffc90005c27e08 R15: ffffc90005c27cb6
FS:  0000000000000000(0000) GS:ffff88858ec40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000230 CR3: 0000000281408006 CR4: 00000000003706e0
Call Trace:
 tap_get_user+0x3f1/0x540 [tap]
 tap_sendmsg+0x56/0x362 [tap]
 ? get_tx_bufs+0xc2/0x1e0 [vhost_net]
 handle_tx_copy+0x114/0x670 [vhost_net]
 handle_tx+0xb0/0xe0 [vhost_net]
 handle_tx_kick+0x15/0x20 [vhost_net]
 vhost_worker+0x7b/0xc0 [vhost]
 ? vhost_vring_call_reset+0x40/0x40 [vhost]
 kthread+0xfa/0x120
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30

Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f61f133

Merge branch 'mptcp-fixes' · 9f05f9ad

David S. Miller authored Aug 05, 2022

Mat Martineau says:

====================
mptcp: Fixes for mptcp cleanup/close and a selftest

Patch 1 fixes an issue with leaking subflow sockets if there's a failure
in a CGROUP_INET_SOCK_CREATE eBPF program.

Patch 2 fixes a syzkaller-detected race at MPTCP socket close.

Patch 3 is a fix for one mode of the mptcp_connect.sh selftest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9f05f9ad

selftests: mptcp: make sendfile selftest work · df9e03ae

Florian Westphal authored Aug 04, 2022

When the selftest got added, sendfile() on mptcp sockets returned
-EOPNOTSUPP, so running 'mptcp_connect.sh -m sendfile' failed
immediately.

This is no longer the case, but the script fails anyway due to timeout.
Let the receiver know once the sender has sent all data, just like
with '-m mmap' mode.

v2: need to respect cfg_wait too, as pm_userspace.sh relied
on -m sendfile to keep the connection open (Mat Martineau)

Fixes: 048d19d4 ("mptcp: add basic kselftest for mptcp")
Reported-by: Xiumei Mu <xmu@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df9e03ae

mptcp: do not queue data on closed subflows · c886d702

Paolo Abeni authored Aug 04, 2022

Dipanjan reported a syzbot splat at close time:

WARNING: CPU: 1 PID: 10818 at net/ipv4/af_inet.c:153
inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
Modules linked in: uio_ivshmem(OE) uio(E)
CPU: 1 PID: 10818 Comm: kworker/1:16 Tainted: G           OE
5.19.0-rc6-g2eae0556bb9d #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153
Code: 21 02 00 00 41 8b 9c 24 28 02 00 00 e9 07 ff ff ff e8 34 4d 91
f9 89 ee 4c 89 e7 e8 4a 47 60 ff e9 a6 fc ff ff e8 20 4d 91 f9 <0f> 0b
e9 84 fe ff ff e8 14 4d 91 f9 0f 0b e9 d4 fd ff ff e8 08 4d
RSP: 0018:ffffc9001b35fa78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002879d0 RCX: ffff8881326f3b00
RDX: 0000000000000000 RSI: ffff8881326f3b00 RDI: 0000000000000002
RBP: ffff888179662674 R08: ffffffff87e983a0 R09: 0000000000000000
R10: 0000000000000005 R11: 00000000000004ea R12: ffff888179662400
R13: ffff888179662428 R14: 0000000000000001 R15: ffff88817e38e258
FS:  0000000000000000(0000) GS:ffff8881f5f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020007bc0 CR3: 0000000179592000 CR4: 0000000000150ee0
Call Trace:
 <TASK>
 __sk_destruct+0x4f/0x8e0 net/core/sock.c:2067
 sk_destruct+0xbd/0xe0 net/core/sock.c:2112
 __sk_free+0xef/0x3d0 net/core/sock.c:2123
 sk_free+0x78/0xa0 net/core/sock.c:2134
 sock_put include/net/sock.h:1927 [inline]
 __mptcp_close_ssk+0x50f/0x780 net/mptcp/protocol.c:2351
 __mptcp_destroy_sock+0x332/0x760 net/mptcp/protocol.c:2828
 mptcp_worker+0x5d2/0xc90 net/mptcp/protocol.c:2586
 process_one_work+0x9cc/0x1650 kernel/workqueue.c:2289
 worker_thread+0x623/0x1070 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
 </TASK>

The root cause of the problem is that an mptcp-level (re)transmit can
race with mptcp_close() and the packet scheduler checks the subflow
state before acquiring the socket lock: we can try to (re)transmit on
an already closed ssk.

Fix the issue by checking the subflow socket status again under the
subflow socket lock protection. Additionally, add the missing check
for the fallback-to-tcp case.

Fixes: d5f49190 ("mptcp: allow picking different xmit subflows")
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c886d702

mptcp: move subflow cleanup in mptcp_destroy_common() · c0bf3c6a

Paolo Abeni authored Aug 04, 2022

If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
eBPF program, the MPTCP protocol ends-up leaking all the subflows:
the related cleanup happens in __mptcp_destroy_sock() that is not
invoked in such code path.

Address the issue by moving the subflow sockets cleanup into the
mptcp_destroy_common() helper, which is invoked in every msk cleanup
path.

Additionally get rid of the intermediate list_splice_init step, which
is an unneeded relic from the past.

The issue is present since before the reported root cause commit, but
any attempt to backport the fix before that hash will require a complete
rewrite.

Fixes: e16163b6 ("mptcp: refactor shutdown and close")
Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0bf3c6a

04 Aug, 2022 7 commits

nfp: ethtool: fix the display error of `ethtool -m DEVNAME` · 4ae97cae

Yu Xiao authored Aug 02, 2022

Previously, the port flag was not set to `NFP_PORT_CHANGED` when
`ethtool -m DEVNAME` was used, so the port state (e.g. interface)
could not be updated. As a result, `ethtool -m DEVNAME` sometimes
could not read the correct information.

E.g. `ethtool -m DEVNAME` does not work when the driver is loaded before
the optical module is plugged in, as the port interface is still NONE
without a port update.

Now update the port state before sending info to the NIC to ensure that
the port interface is correct (latest state).
Fixes: 61f7c6f4 ("nfp: implement ethtool get module EEPROM")
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220802093355.69065-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4ae97cae

net: phy: Warn about incorrect mdio_bus_phy_resume() state · 744d23c7

Florian Fainelli authored Aug 01, 2022

Calling mdio_bus_phy_resume() with neither the PHY state machine set to
PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
that we can produce a race condition looking like this:

CPU0						CPU1
bcmgenet_resume
 -> phy_resume
   -> phy_init_hw
 -> phy_start
   -> phy_resume
                                                phy_start_aneg()
mdio_bus_phy_resume
 -> phy_resume
    -> phy_write(..., BMCR_RESET)
     -> usleep()                                  -> phy_read()

with the phy_resume() function triggering a PHY behavior that might have
to be worked around with (see bf8bfc43 ("net: phy: broadcom: Fix
brcm_fet_config_init()") for instance) that ultimately leads to an error
reading from the PHY.

Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220801233403.258871-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

744d23c7

Merge branch 'make-dsa-work-with-bonding-s-arp-monitor' · 7de196a6

Jakub Kicinski authored Aug 03, 2022

Vladimir Oltean says:

====================
Make DSA work with bonding's ARP monitor

Since commit 2b86cb82 ("net: dsa: declare lockless TX feature for
slave ports") in v5.7, DSA breaks the ARP monitoring logic from the
bonding driver, a fact which was pointed out by Brian Hutchinson, who uses
a linux-5.10.y stable kernel.

Initially I got lured by other similar hacks introduced for other
NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
update the trans_start of their TX queues by hand.

However Jakub pointed out that this simply isn't a proper solution, and
after coming to think more about it, I agree, and it doesn't work
properly with DSA nor is it maintainable for the future changes I plan
for it (multiple DSA masters in a LAG).

I've tested these changes using a DSA-based setup and a veth-based
setup, using the active-backup mode and ARP monitoring, with and without
arp_validate.

Link to v1:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/

Link to v2:
https://patchwork.kernel.org/project/netdevbpf/patch/20220727152000.3616086-1-vladimir.oltean@nxp.com/
====================

Link: https://lore.kernel.org/r/20220731124108.2810233-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7de196a6

docs: net: bonding: remove mentions of trans_start · cba8d8f5

Vladimir Oltean authored Jul 31, 2022

ARP monitoring no longer depends on dev->last_rx or dev_trans_start(),
so delete this information.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cba8d8f5

Revert "veth: Add updating of trans_start" · 08b403d5

Vladimir Oltean authored Jul 31, 2022

This reverts commit e66e257a. The veth
driver no longer needs these hacks, which are slightly detrimental to
the fast path performance, because the bonding driver is keeping track
of TX times of ARP and NS probes by itself, which it should.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

08b403d5

net/sched: remove hacks added to dev_trans_start() for bonding to work · 4873a1b2

Vladimir Oltean authored Jul 31, 2022

Now that the bonding driver keeps track of the last TX time of ARP and
NS probes, we effectively revert the following commits:

32d3e51a ("net_sched: use macvlan real dev trans_start in dev_trans_start()")
07ce76aa ("net_sched: make dev_trans_start return vlan's real dev trans_start")

Note that the approach of continuing to hack at this function would not
get us very far, hence the desire to take a different approach. DSA is
also a virtual device that uses NETIF_F_LLTX, but there, many uppers
share the same lower (DSA master, i.e. the physical host port of a
switch). By making dev_trans_start() on a DSA interface return the
dev_trans_start() of the master, we effectively assume that all other
DSA interfaces are silent, otherwise this corrupts the validity of the
probe timestamp data from the bonding driver's perspective.

Furthermore, the hacks didn't take into consideration the fact that the
lower interface of @dev may not have been physical either. For example,
VLAN over VLAN, or DSA with 2 masters in a LAG.

And even furthermore, there are NETIF_F_LLTX devices which are not
stacked, like veth. The hack here would not work with those, because it
would not have to provide the bonding driver something to chew at all.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4873a1b2

net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS · 06799a90

Vladimir Oltean authored Jul 31, 2022

The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.

It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

06799a90
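
The idea of tracking only the last TX time of actual ARP/NS probes can be sketched in userspace C as follows. This is a hedged illustration rather than the driver's code: the `jiffies` variable here is a local stand-in for the kernel tick counter, and `struct slave_model`, `slave_model_probe_sent()`, `tick_before_eq()` and `slave_model_tx_recent()` are hypothetical names. The comparison mirrors the wraparound-safe style of the kernel's `time_before_eq()` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's jiffies tick counter (assumption). */
static unsigned long jiffies;

/* Hypothetical, stripped-down slave: only the field this sketch needs. */
struct slave_model {
	unsigned long last_tx; /* tick at which we last sent an ARP/NS probe */
};

/* Record the probe time; meant to be called only from the probe send path. */
static void slave_model_probe_sent(struct slave_model *s)
{
	assert(s != NULL);
	s->last_tx = jiffies;
}

/* Wraparound-safe "a <= b" on tick counters, in the style of time_before_eq(). */
static bool tick_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/* True if the last probe was sent within the last `window` ticks. */
static bool slave_model_tx_recent(const struct slave_model *s,
				  unsigned long window)
{
	assert(s != NULL);
	return tick_before_eq(s->last_tx, jiffies) &&
	       tick_before_eq(jiffies, s->last_tx + window);
}
```

Because the timestamp is written only when a probe is sent, unrelated traffic on the same device (for example PTP) cannot refresh it, and the check does not depend on whether the device updates the stack's TX watchdog timestamps.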

03 Aug, 2022 3 commits

Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · f86d1fbb

Linus Torvalds authored Aug 03, 2022

Pull networking changes from Paolo Abeni:
"Core:

- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one

- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.

- Network-side support for io_uring zero-copy send.

- A few skb drop reason improvements, including generating the source
file with the string mapping via codegen instead of using macro magic.

- Rename reference tracking helpers to a more consistent netdev_*
schema.

- Adapt u64_stats_t type to address load/store tearing issues.

- Refine debug helper usage to reduce the log noise caused by bots.

BPF:

- Improve socket map performance, avoiding skb cloning on read
operation.

- Add support for 64 bits enum, to match types exposed by kernel.

- Introduce support for sleepable uprobes program.

- Introduce support for enum textual representation in libbpf.

- New helpers to implement synproxy with eBPF/XDP.

- Improve loop performance by inlining indirect calls when possible.

- Removed all the deprecated libbpf APIs.

- Implement new eBPF-based LSM flavor.

- Add type match support, which allows accurate queries to the eBPF
used types.

- A few TCP congestion control framework usability improvements.

- Add new infrastructure to manipulate CT entries via eBPF programs.

- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.

Protocols:

- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.

- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

- Add support to forcibly close TIME_WAIT TCP sockets via user-space
tools.

- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and non-zero-copy.

- Support for changing the initial MPTCP subflow priority/backup
status

- Introduce virtually contiguous buffers for sockets over RDMA, to
cope better with memory pressure.

- Extend CAN ethtool support with timestamping capabilities

- Refactor CAN build infrastructure to allow building only the needed
features.

Driver API:

- Remove devlink mutex to allow parallel commands on multiple links.

- Add support for pause stats in distributed switch.

- Implement devlink helpers to query and flash line cards.

- New helper for phy mode to register conversion.

New hardware / drivers:

- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

- Ethernet DSA driver for the Microchip LAN937x switch.

- Ethernet PHY driver for the Aquantia AQR113C EPHY.

- CAN driver for the OBD-II ELM327 interface.

- CAN driver for RZ/N1 SJA1000 CAN controller.

- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

Drivers:

- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP fragmented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload

- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate

- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default

- Microsoft vNIC driver (mana)
- add support for XDP redirect

- Other Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support

- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model conversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics

- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY

- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload

- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration

- Other WiFi:
- Microchip wilc1000: disable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support

Old code removal:

- Neterion vxge ethernet driver: this has been untouched for more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...

f86d1fbb

Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 526942b8

Linus Torvalds authored Aug 03, 2022

Pull ATA updates from Damien Le Moal:

 - Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
   from Sergei.

 - Several cleanup patches for the libata-core, libata-scsi and libata-eh
   code: fix argument and variable types, change some function
   declarations to static and fix a typo in a comment. From Sergey
   and Xiang.

 - Fix a compilation warning in the pata_macio driver, from me.

 - A fix for the expected number of resources in the sata_mv driver,
   from Andrew.

* tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: sata_mv: Fixes expected number of resources now IRQs are gone
  ata: libata-scsi: fix result type of ata_ioc32()
  ata: pata_macio: Fix compilation warning
  ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
  ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
  ata: make ata_port::fastdrain_cnt *unsigned int*
  ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
  ata: libata-core: make ata_exec_internal_sg() *static*
  ata: make transfer mode masks *unsigned int*
  ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
  ata: libata-core: fix sloppy typing in ata_id_n_sectors()
  ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
  ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
  ata: pata_hpt37x: factor out hpt37x_pci_clock()
  ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
  ata: libata: Fix syntax errors in comments

526942b8

Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · a39b5dbd

Linus Torvalds authored Aug 03, 2022

Pull zonefs update from Damien Le Moal:
 "A single change for this cycle to simplify handling of the memory page
  used as super block buffer during mount (from Fabio)"

* tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Call page_address() on page acquired with GFP_KERNEL flag

a39b5dbd