Commits · 0548c5f26a0fefa721b29be2bfff1c44f15b6d73 · Kirill Smelkov / linux

28 Jan, 2023 9 commits

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0548c5f2

Jakub Kicinski authored Jan 27, 2023

Daniel Borkmann says:

====================
bpf 2023-01-27

We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 10 files changed, 170 insertions(+), 59 deletions(-).

The main changes are:

1) Fix preservation of register's parent/live fields when copying
   range-info, from Eduard Zingerman.

2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right
   cache, from Hou Tao.

3) Fix stack overflow from infinite recursion in sock_map_close(),
   from Jakub Sitnicki.

4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path,
   from Jiri Olsa.

5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb,
   from Kui-Feng Lee.

6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix the kernel crash caused by bpf_setsockopt().
  selftests/bpf: Cover listener cloning with progs attached to sockmap
  selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests
  bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
  bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
  bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
  selftests/bpf: Verify copy_register_state() preserves parent/live fields
  bpf: Fix to preserve reg parent/live fields when copying range info
  bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
  bpf: Fix off-by-one error in bpf_mem_cache_idx()
====================

Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0548c5f2

skb: Do mix page pool and page referenced frags in GRO · 7d2c89b3

Alexander Duyck authored Jan 26, 2023

GSO should not merge page pool recycled frames with standard reference
counted frames. Traditionally this didn't occur, at least not often.
However as we start looking at adding support for wireless adapters there
becomes the potential to mix the two due to A-MSDU repartitioning frames in
the receive path. There are possibly other places where this may have
occurred however I suspect they must be few and far between as we have not
seen this issue until now.

Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/167475990764.1934330.11960904198087757911.stgit@localhost.localdomainSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7d2c89b3

net: dsa: mt7530: fix tristate and help description · ff445b83

Arınç ÜNAL authored Jan 26, 2023

Fix description for tristate and help sections which include inaccurate
information.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230126190110.9124-1-arinc.unal@arinc9.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ff445b83

Merge branch 'net-xdp-execute-xdp_do_flush-before-napi_complete_done' · 3ac77ecd

Jakub Kicinski authored Jan 27, 2023

Magnus Karlsson says:

====================
net: xdp: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found in [1].

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in [2].

The drivers have only been compile-tested since I do not own any of
the HW below. So if you are a maintainer, it would be great if you
could take a quick look to make sure I did not mess something up.

Note that these were the drivers I found that violated the ordering by
running a simple script and manually checking the ones that came up as
potential offenders. But the script was not perfect in any way. There
might still be offenders out there, since the script can generate
false negatives.

[1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
[2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
====================

Link: https://lore.kernel.org/r/20230125074901.2737-1-magnus.karlsson@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3ac77ecd

dpaa2-eth: execute xdp_do_flush() before napi_complete_done() · a3191c4d

Magnus Karlsson authored Jan 25, 2023

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d678be1d ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a3191c4d

dpaa_eth: execute xdp_do_flush() before napi_complete_done() · b5340137

Magnus Karlsson authored Jan 25, 2023

Fixes: a1e031ff ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Acked-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b5340137

virtio-net: execute xdp_do_flush() before napi_complete_done() · ad7e615f

Magnus Karlsson authored Jan 25, 2023

Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ad7e615f

lan966x: execute xdp_do_flush() before napi_complete_done() · 12b57179

Magnus Karlsson authored Jan 25, 2023

Fixes: a825b611 ("net: lan966x: Add support for XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

12b57179

qede: execute xdp_do_flush() before napi_complete_done() · 2ccce20d

Magnus Karlsson authored Jan 25, 2023

Fixes: d1b25b79 ("qede: add .ndo_xdp_xmit() and XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2ccce20d

27 Jan, 2023 1 commit

bpf: Fix the kernel crash caused by bpf_setsockopt(). · 5416c9ae

Kui-Feng Lee authored Jan 26, 2023

The kernel crash was caused by a BPF program attached to the
"lsm_cgroup/socket_sock_rcv_skb" hook, which performed a call to
`bpf_setsockopt()` in order to set the TCP_NODELAY flag as an
example. Flags like TCP_NODELAY can prompt the kernel to flush a
socket's outgoing queue, and this hook
"lsm_cgroup/socket_sock_rcv_skb" is frequently triggered by
softirqs. The issue was that in certain circumstances, when
`tcp_write_xmit()` was called to flush the queue, it would also allow
BH (bottom-half) to run. This could lead to our program attempting to
flush the same socket recursively, which caused a `skbuff` to be
unlinked twice.

`security_sock_rcv_skb()` is triggered by `tcp_filter()`. This occurs
before the sock ownership is checked in `tcp_v4_rcv()`. Consequently,
if a bpf program runs on `security_sock_rcv_skb()` while under softirq
conditions, it may not possess the lock needed for `bpf_setsockopt()`,
thus presenting an issue.

The patch fixes this issue by ensuring that a BPF program attached to
the "lsm_cgroup/socket_sock_rcv_skb" hook is not allowed to call
`bpf_setsockopt()`.

The differences from v1 are
 - changing commit log to explain holding the lock of the sock,
 - emphasizing that TCP_NODELAY is not the only flag, and
 - adding the fixes tag.

v1: https://lore.kernel.org/bpf/20230125000244.1109228-1-kuifeng@meta.com/Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Fixes: 9113d7e4 ("bpf: expose bpf_{g,s}etsockopt to lsm cgroup")
Link: https://lore.kernel.org/r/20230127001732.4162630-1-kuifeng@meta.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

5416c9ae

26 Jan, 2023 6 commits

Merge tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 28b4387f

Linus Torvalds authored Jan 26, 2023

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - sched: sch_taprio: do not schedule in taprio_reset()

  Previous releases - regressions:

   - core: fix UaF in netns ops registration error path

   - ipv4: prevent potential spectre v1 gadgets

   - ipv6: fix reachability confirmation with proxy_ndp

   - netfilter: fix for the set rbtree

   - eth: fec: use page_pool_put_full_page when freeing rx buffers

   - eth: iavf: fix temporary deadlock and failure to set MAC address

  Previous releases - always broken:

   - netlink: prevent potential spectre v1 gadgets

   - netfilter: fixes for SCTP connection tracking

   - mctp: struct sock lifetime fixes

   - eth: ravb: fix possible hang if RIS2_QFF1 happen

   - eth: tg3: resolve deadlock in tg3_reset_task() during EEH

  Misc:

   - Mat stepped out as MPTCP co-maintainer"

* tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: mdio-mux-meson-g12a: force internal PHY off on mux switch
  docs: networking: Fix bridge documentation URL
  tsnep: Fix TX queue stop/wake for multiple queues
  net/tg3: resolve deadlock in tg3_reset_task() during EEH
  net: mctp: mark socks as dead on unhash, prevent re-add
  net: mctp: hold key reference when looking up a general key
  net: mctp: move expiry timer delete to unhash
  net: mctp: add an explicit reference from a mctp_sk_key to sock
  net: ravb: Fix possible hang if RIS2_QFF1 happen
  net: ravb: Fix lack of register setting after system resumed for Gen3
  net/x25: Fix to not accept on connected socket
  ice: move devlink port creation/deletion
  sctp: fail if no bound addresses can be used for a given scope
  net/sched: sch_taprio: do not schedule in taprio_reset()
  Revert "Merge branch 'ethtool-mac-merge'"
  netrom: Fix use-after-free of a listening socket.
  netfilter: conntrack: unify established states for SCTP paths
  Revert "netfilter: conntrack: add sctp DATA_SENT state"
  netfilter: conntrack: fix bug in for_each_sctp_chunk
  netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
  ...

28b4387f

treewide: fix up files incorrectly marked executable · 262b42e0

Linus Torvalds authored Jan 26, 2023

I'm not exactly clear on what strange workflow causes people to do it,
but clearly occasionally some files end up being committed as executable
even though they clearly aren't.

This is a reprise of commit 90fda63f ("treewide: fix up files
incorrectly marked executable"), just with a different set of files (but
with the same trivial shell scripting).

So apparently we need to re-do this every five years or so, and Joe
needs to just keep reminding me to do so ;)
Reported-by: Joe Perches <joe@perches.com>
Fixes: 523375c9 ("drm/vmwgfx: Port vmwgfx to arm64")
Fixes: 5c439937 ("ASoC: codecs: add support for ES8326")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

262b42e0

net: mdio-mux-meson-g12a: force internal PHY off on mux switch · 7083df59

Jerome Brunet authored Jan 24, 2023

Force the internal PHY off then on when switching to the internal path.
This fixes problems where the PHY ID is not properly set.

Fixes: 70904251 ("net: phy: add amlogic g12a mdio mux support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7083df59

docs: networking: Fix bridge documentation URL · aee2770d

Ivan Vecera authored Jan 24, 2023

Current documentation URL [1] is no longer valid.

[1] https://www.linuxfoundation.org/collaborate/workgroups/networking/bridgeSigned-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124145127.189221-1-ivecera@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

aee2770d

tsnep: Fix TX queue stop/wake for multiple queues · 3d53aaef

Gerhard Engleder authored Jan 24, 2023

netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok
as long as only a single TX queue is supported. But support for multiple
TX queues was introduced with 76203137 and I missed to adapt stop
and wake of TX queues.

Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific
TX queue.

Fixes: 76203137 ("tsnep: Support multiple TX/RX queue pairs")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3d53aaef

net/tg3: resolve deadlock in tg3_reset_task() during EEH · 6c4ca03b

David Christensen authored Jan 24, 2023

During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...:1

 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks.  If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 #4 [c000000006867a60] rtnl_lock at c000000000c940d8
 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 #6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 #3 [c000000036c1bb70] msleep at c000000000259e6c
 #4 [c000000036c1bba0] napi_disable at c000000000c6b848
 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf43 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6c4ca03b

25 Jan, 2023 23 commits

Merge tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 7c46948a

Linus Torvalds authored Jan 25, 2023

Pull fuse ACL fix from Christian Brauner:
 "The new posix acl API doesn't depend on the xattr handler
  infrastructure anymore and instead only relies on the posix acl inode
  operations. As a result daemons without FUSE_POSIX_ACL are unable to
  use posix acls like they used to.

  Fix this by copying what we did for overlayfs during the posix acl api
  conversion. Make fuse implement a dedicated ->get_inode_acl() method
  as does overlayfs. Fuse can then also uses this to express different
  needs for vfs permission checking during lookup and acl based
  retrieval via the regular system call path.

  This allows fuse to continue to refuse retrieving posix acls for
  daemons that don't set FUSE_POSXI_ACL for permission checking while
  also allowing a fuse server to retrieve it via the usual system calls"

* tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fuse: fixes after adapting to new posix acl api

7c46948a

Merge branch 'mptcp-fixes' · ac8d986c

David S. Miller authored Jan 25, 2023

Jeremy Kerr says:

====================
net: mctp: struct sock lifetime fixes

This series is a set of fixes for the sock lifetime handling in the
AF_MCTP code, fixing a uaf reported by Noam Rathaus
<noamr@ssd-disclosure.com>.

The Fixes: tags indicate the original patches affected, but some
tweaking to backport to those commits may be needed; I have a separate
branch with backports to 5.15 if that helps with stable trees.

Of course, any comments/queries most welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ac8d986c

net: mctp: mark socks as dead on unhash, prevent re-add · b98e1a04

Jeremy Kerr authored Jan 24, 2023

Once a socket has been unhashed, we want to prevent it from being
re-used in a sk_key entry as part of a routing operation.

This change marks the sk as SOCK_DEAD on unhash, which prevents addition
into the net's key list.

We need to do this during the key add path, rather than key lookup, as
we release the net keys_lock between those operations.

Fixes: 4a992bbd ("mctp: Implement message fragmentation & reassembly")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

b98e1a04

net: mctp: hold key reference when looking up a general key · 6e54ea37

Paolo Abeni authored Jan 24, 2023

Currently, we have a race where we look up a sock through a "general"
(ie, not directly associated with the (src,dest,tag) tuple) key, then
drop the key reference while still holding the key's sock.

This change expands the key reference until we've finished using the
sock, and hence the sock reference too.

Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e54ea37

net: mctp: move expiry timer delete to unhash · 5f41ae6f

Jeremy Kerr authored Jan 24, 2023

Currently, we delete the key expiry timer (in sk->close) before
unhashing the sk. This means that another thread may find the sk through
its presence on the key list, and re-queue the timer.

This change moves the timer deletion to the unhash, after we have made
the key no longer observable, so the timer cannot be re-queued.

Fixes: 7b14e15a ("mctp: Implement a timeout for tags")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f41ae6f

net: mctp: add an explicit reference from a mctp_sk_key to sock · de8a6b15

Jeremy Kerr authored Jan 24, 2023

Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
through the sock hash/unhash operations, but this is pretty tenuous, and
there are cases where we may have a temporary reference to an unhashed
sk.

This change makes the reference more explicit, by adding a hold on the
sock when it's associated with a mctp_sk_key, released on final key
unref.

Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

de8a6b15

Merge branch 'ravb-fixes' · a9e9b78d

David S. Miller authored Jan 25, 2023

Yoshihiro Shimoda says:

====================
net: ravb: Fix potential issues

Fix potentiall issues on the ravb driver.

Changes from v2:
https://lore.kernel.org/all/20230123131331.1425648-1-yoshihiro.shimoda.uh@renesas.com/
 - Add Reviewed-by in the patch [2/2].
 - Add a commit description in the patch [2/2].

Changes from v1:
https://lore.kernel.org/all/20230119043920.875280-1-yoshihiro.shimoda.uh@renesas.com/
 - Fix typo in the patch [1/2].
 - Add Reviewed-by in the patch [1/2].
 - Fix "Fixed" tag in the patch [2/2].
 - Fix a comment indentation of the code in the patch [2/2].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a9e9b78d

net: ravb: Fix possible hang if RIS2_QFF1 happen · f3c07758

Yoshihiro Shimoda authored Jan 24, 2023

Since this driver enables the interrupt by RIC2_QFE1, this driver
should clear the interrupt flag if it happens. Otherwise, the interrupt
causes to hang the system.

Note that this also fix a minor coding style (a comment indentation)
around the fixed code.

Fixes: c156633f ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3c07758

net: ravb: Fix lack of register setting after system resumed for Gen3 · c2b6cdee

Yoshihiro Shimoda authored Jan 24, 2023

After system entered Suspend to RAM, registers setting of this
hardware is reset because the SoC will be turned off. On R-Car Gen3
(info->ccc_gac), ravb_ptp_init() is called in ravb_probe() only. So,
after system resumed, it lacks of the initial settings for ptp. So,
add ravb_ptp_{init,stop}() into ravb_{resume,suspend}().

Fixes: f5d7837f ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2b6cdee

net/x25: Fix to not accept on connected socket · f2b0b521

Hyunwoo Kim authored Jan 23, 2023

When listen() and accept() are called on an x25 socket
that connect() succeeds, accept() succeeds immediately.
This is because x25_connect() queues the skb to
sk->sk_receive_queue, and x25_accept() dequeues it.

This creates a child socket with the sk of the parent
x25 socket, which can cause confusion.

Fix x25_listen() to return -EINVAL if the socket has
already been successfully connect()ed to avoid this issue.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2b0b521

Merge branch 'bpf, sockmap: Fix infinite recursion in sock_map_close' · e8c8fd9b

Alexei Starovoitov authored Jan 24, 2023

Jakub Sitnicki says:

====================

This patch set addresses the syzbot report in [1].

Patch #1 has been suggested by Eric [2]. I extended it to cover the rest of
sock_map proto callbacks. Otherwise we would still overflow the stack.

Patch #2 contains the actual fix and bug analysis.
Patches #3 & #4 add coverage to selftests to trigger the bug.

[1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/
[2] https://lore.kernel.org/all/CANn89iK2UN1FmdUcH12fv_xiZkv2G+Nskvmq7fG6aA_6VKRf6g@mail.gmail.com/
---
v1 -> v2:
v1: https://lore.kernel.org/r/20230113-sockmap-fix-v1-0-d3cad092ee10@cloudflare.com
[v1 didn't hit bpf@ ML by mistake]

 * pull in Eric's patch to protect against recursion loop bugs (Eric)
 * add a macro helper to check if pointer is inside a memory range (Eric)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e8c8fd9b

selftests/bpf: Cover listener cloning with progs attached to sockmap · c88ea16a

Jakub Sitnicki authored Jan 21, 2023

Today we test if a child socket is cloned properly from a listening socket
inside a sockmap only when there are no BPF programs attached to the map.

A bug has been reported [1] for the case when sockmap has a verdict program
attached. So cover this case as well to prevent regressions.

[1]: https://lore.kernel.org/r/00000000000073b14905ef2e7401@google.comSigned-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-4-1e0ee7ac2f90@cloudflare.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c88ea16a

selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests · b4ea530d

Jakub Sitnicki authored Jan 21, 2023

Following patch extends the sockmap ops tests to cover the scenario when a
sockmap with attached programs holds listening sockets.

Pass the BPF skeleton to sockmap ops test so that the can access and attach
the BPF programs.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-3-1e0ee7ac2f90@cloudflare.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b4ea530d

bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener · ddce1e09

Jakub Sitnicki authored Jan 21, 2023

A listening socket linked to a sockmap has its sk_prot overridden. It
points to one of the struct proto variants in tcp_bpf_prots. The variant
depends on the socket's family and which sockmap programs are attached.

A child socket cloned from a TCP listener initially inherits their sk_prot.
But before cloning is finished, we restore the child's proto to the
listener's original non-tcp_bpf_prots one. This happens in
tcp_create_openreq_child -> tcp_bpf_clone.

Today, in tcp_bpf_clone we detect if the child's proto should be restored
by checking only for the TCP_BPF_BASE proto variant. This is not
correct. The sk_prot of listening socket linked to a sockmap can point to
to any variant in tcp_bpf_prots.

If the listeners sk_prot happens to be not the TCP_BPF_BASE variant, then
the child socket unintentionally is left if the inherited sk_prot by
tcp_bpf_clone.

This leads to issues like infinite recursion on close [1], because the
child state is otherwise not set up for use with tcp_bpf_prot operations.

Adjust the check in tcp_bpf_clone to detect all of tcp_bpf_prots variants.

Note that it wouldn't be sufficient to check the socket state when
overriding the sk_prot in tcp_bpf_update_proto in order to always use the
TCP_BPF_BASE variant for listening sockets. Since commit
b8b8315e ("bpf, sockmap: Remove unhash handler for BPF sockmap usage")
it is possible for a socket to transition to TCP_LISTEN state while already
linked to a sockmap, e.g. connect() -> insert into map ->
connect(AF_UNSPEC) -> listen().

[1]: https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/

Fixes: e8025155 ("tcp_bpf: Don't let child socket inherit parent protocol ops on copy")
Reported-by: syzbot+04c21ed96d861dccc5cd@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-2-1e0ee7ac2f90@cloudflare.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ddce1e09

bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself · 5b4a79ba

Jakub Sitnicki authored Jan 21, 2023

sock_map proto callbacks should never call themselves by design. Protect
against bugs like [1] and break out of the recursive loop to avoid a stack
overflow in favor of a resource leak.

[1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-1-1e0ee7ac2f90@cloudflare.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5b4a79ba

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 2a48216c

Jakub Kicinski authored Jan 24, 2023

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Perform SCTP vtag verification for ABORT/SHUTDOWN_COMPLETE according
   to RFC 9260, Sect 8.5.1.

2) Fix infinite loop if SCTP chunk size is zero in for_each_sctp_chunk().
   And remove useless check in this macro too.

3) Revert DATA_SENT state in the SCTP tracker, this was applied in the
   previous merge window. Next patch in this series provides a more
   simple approach to multihoming support.

4) Unify HEARTBEAT_ACKED and ESTABLISHED states for SCTP multihoming
   support, use default ESTABLISHED of 210 seconds based on
   heartbeat timeout * maximum number of retransmission + round-trip timeout.
   Otherwise, SCTP conntrack entry that represents secondary paths
   remain stale in the table for up to 5 days.

This is a slightly large batch with fixes for the SCTP connection
tracking helper, all patches from Sriram Yagnaraman.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: conntrack: unify established states for SCTP paths
  Revert "netfilter: conntrack: add sctp DATA_SENT state"
  netfilter: conntrack: fix bug in for_each_sctp_chunk
  netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
====================

Link: https://lore.kernel.org/r/20230124183933.4752-1-pablo@netfilter.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2a48216c

ice: move devlink port creation/deletion · 418e5340

Paul M Stillwell Jr authored Jan 23, 2023

Commit a286ba73 ("ice: reorder PF/representor devlink
port register/unregister flows") moved the code to create
and destroy the devlink PF port. This was fine, but created
a corner case issue in the case of ice_register_netdev()
failing. In that case, the driver would end up calling
ice_devlink_destroy_pf_port() twice.

Additionally, it makes no sense to tie creation of the devlink
PF port to the creation of the netdev so separate out the
code to create/destroy the devlink PF port from the netdev
code. This makes it a cleaner interface.

Fixes: a286ba73 ("ice: reorder PF/representor devlink port register/unregister flows")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

418e5340

sctp: fail if no bound addresses can be used for a given scope · 458e279f

Marcelo Ricardo Leitner authored Jan 23, 2023

Currently, if you bind the socket to something like:
        servaddr.sin6_family = AF_INET6;
        servaddr.sin6_port = htons(0);
        servaddr.sin6_scope_id = 0;
        inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);

And then request a connect to:
        connaddr.sin6_family = AF_INET6;
        connaddr.sin6_port = htons(20000);
        connaddr.sin6_scope_id = if_nametoindex("lo");
        inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);

What the stack does is:
 - bind the socket
 - create a new asoc
 - to handle the connect
   - copy the addresses that can be used for the given scope
   - try to connect

But the copy returns 0 addresses, and the effect is that it ends up
trying to connect as if the socket wasn't bound, which is not the
desired behavior. This unexpected behavior also allows KASLR leaks
through SCTP diag interface.

The fix here then is, if when trying to copy the addresses that can
be used for the scope used in connect() it returns 0 addresses, bail
out. This is what TCP does with a similar reproducer.
Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

458e279f

Merge tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 948ef7bb

Linus Torvalds authored Jan 24, 2023

Pull module fix from Luis Chamberlain:
 "Theis is a fix we have been delaying for v6.2 due to lack of early
  testing on linux-next.

  The commit has been sitting in linux-next since December and testing
  has also been now a bit extensive by a few developers. Since this is a
  fix which definitely will go to v6.3 it should also apply to v6.2 so
  if there are any issues we pick them up earlier rather than later. The
  fix fixes a regression since v5.3, prior to me helping with module
  maintenance, however, the issue is real in that in the worst case now
  can prevent boot.

  We've discussed all possible corner cases [0] and at last do feel this
  is ready for v6.2-rc6"

Link https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ [0]

* tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Don't wait for GOING modules

948ef7bb

net/sched: sch_taprio: do not schedule in taprio_reset() · ea4fdbaa

Eric Dumazet authored Jan 23, 2023

As reported by syzbot and hinted by Vinicius, I should not have added
a qdisc_synchronize() call in taprio_reset()

taprio_reset() can be called with qdisc spinlock held (and BH disabled)
as shown in included syzbot report [1].

Only taprio_destroy() needed this synchronization, as explained
in the blamed commit changelog.

[1]

BUG: scheduling while atomic: syz-executor150/5091/0x00000202
2 locks held by syz-executor150/5091:
Modules linked in:
Preemption disabled at:
[<0000000000000000>] 0x0
Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ...
CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
panic+0x2cc/0x626 kernel/panic.c:318
check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238
__schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836
schedule_debug kernel/sched/core.c:5865 [inline]
__schedule+0x34e4/0x5450 kernel/sched/core.c:6500
schedule+0xde/0x1b0 kernel/sched/core.c:6682
schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167
schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline]
msleep+0xb6/0x100 kernel/time/timer.c:2322
qdisc_synchronize include/net/sch_generic.h:1295 [inline]
taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703
qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022
dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285
netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline]
dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351
dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374
qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080
tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689
rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
do_syscall_x64 arch/x86/entry/common.c:50 [inline]

Fixes: 3a415d59 ("net/sched: sch_taprio: fix possible use-after-free")
Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ea4fdbaa

Merge tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux · 246dc53f

Linus Torvalds authored Jan 24, 2023

Pull rust fix from Miguel Ojeda:

 - Avoid evaluating arguments in 'pr_*' macros in 'unsafe' blocks

* tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux:
  rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks

246dc53f

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b2f31717

Linus Torvalds authored Jan 24, 2023

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Pass the correct address to mte_clear_page_tags() on initialising a
     tagged page

   - Plug a race against a GICv4.1 doorbell interrupt while saving the
     vgic-v3 pending state.

  x86:

   - A command line parsing fix and a clang compilation fix for
     selftests

   - A fix for a longstanding VMX issue, that surprisingly was only
     found now to affect real world guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Make reclaim_period_ms input always be positive
  KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
  selftests: kvm: move declaration at the beginning of main()
  KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
  KVM: arm64: Pass the actual page address to mte_clear_page_tags()

b2f31717

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 02db81a7

Linus Torvalds authored Jan 24, 2023

Pull SCSI fixes from James Bottomley:
 "Six fixes, all in drivers.

  The biggest are the UFS devfreq fixes which address a lock inversion
  and the two iscsi_tcp fixes which try to prevent a use after free from
  userspace still accessing an area which the kernel has released (seen
  by KASAN)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: device_handler: alua: Remove a might_sleep() annotation
  scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
  scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress
  scsi: ufs: core: Fix devfreq deadlocks
  scsi: hpsa: Fix allocation size for scsi_host_alloc()
  scsi: target: core: Fix warning on RT kernels

02db81a7

24 Jan, 2023 1 commit

Merge tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · fb6e71db

Linus Torvalds authored Jan 24, 2023

Pull nfsd fix from Chuck Lever:

 - Nail another UAF in NFSD's filecache

* tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't free files unconditionally in __nfsd_file_cache_purge

fb6e71db