1. 05 Apr, 2023 3 commits
    • Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp' · 95fac540
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      raw/ping: Fix locking in /proc/net/{raw,icmp}.
      
      The first patch fixes a NULL deref for /proc/net/raw and second one fixes
      the same issue for ping sockets.
      
      The first patch also converts hlist_nulls to hlist. The current code uses
      sk_nulls_for_each() for lockless readers instead of sk_nulls_for_each_rcu(),
      which adds a memory barrier, but raw sockets do not use the nulls marker or
      SLAB_TYPESAFE_BY_RCU in the first place.
      
      OTOH, ping sockets already use sk_nulls_for_each_rcu(), so that
      conversion can be posted later for net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ping: Fix potential NULL deref for /proc/net/icmp. · ab5fb73f
      Kuniyuki Iwashima authored
      After commit dbca1596 ("ping: convert to RCU lookups, get rid
      of rwlock"), we use RCU for ping sockets, but we should use spinlock
      for /proc/net/icmp to avoid a potential NULL deref mentioned in
      the previous patch.
      
      Let's go back to using spinlock there.
      
      Note we can convert ping sockets to use hlist instead of hlist_nulls
      because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
      
      Fixes: dbca1596 ("ping: convert to RCU lookups, get rid of rwlock")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • raw: Fix NULL deref in raw_get_next(). · 0a78cf72
      Kuniyuki Iwashima authored
      Dae R. Jeong reported a NULL deref in raw_get_next() [0].
      
      It seems that the repro was running these sequences in parallel so
      that one thread was iterating on a socket that was being freed in
      another netns.
      
        unshare(0x40060200)
        r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
        socket$inet_icmp_raw(0x2, 0x3, 0x1)
        pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
      
      After commit 0daf07e5 ("raw: convert raw sockets to RCU"), we
      use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
      sockets.  However, we should use spinlock for slow paths to avoid
      the NULL deref.
      
      Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
      is not reused during iteration in the grace period.  In fact, the
      lockless readers do not check the nulls marker with get_nulls_value().
      So, SOCK_RAW should use hlist instead of hlist_nulls.
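The nulls marker can be modelled in userspace to show what it buys a lockless reader. This is a hedged sketch, not kernel code: the `hyp_` names are hypothetical, and the encoding (value shifted left by one with the low bit set) is assumed to mirror the kernel's `is_a_nulls()` and `get_nulls_value()` helpers. On reaching the end marker, a reader checks that it is still in the bucket it started from; a mismatch means an object it followed was freed and reused in another chain, which can only happen with SLAB_TYPESAFE_BY_RCU. Raw sockets never hit that case, so a plain hlist is enough for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an hlist_nulls node: chains end in an odd
 * "nulls" value instead of NULL. */
struct hyp_node {
    struct hyp_node *next;
};

/* Encode a bucket number as an end-of-chain marker (low bit set). */
static struct hyp_node *hyp_make_nulls(uintptr_t value)
{
    return (struct hyp_node *)((value << 1) | 1u);
}

static int hyp_is_a_nulls(const struct hyp_node *p)
{
    return ((uintptr_t)p & 1u) != 0;
}

static uintptr_t hyp_get_nulls_value(const struct hyp_node *p)
{
    assert(hyp_is_a_nulls(p));
    return (uintptr_t)p >> 1;
}

/* Walk one chain, count the real nodes visited, and report whether the
 * walk ended in the expected bucket. A result of 0 tells a lockless
 * reader that it drifted into another chain and must restart. */
static int hyp_walk_chain(const struct hyp_node *p, uintptr_t bucket,
                          int *count)
{
    *count = 0;
    while (!hyp_is_a_nulls(p)) {
        (*count)++;
        p = p->next;
    }
    return hyp_get_nulls_value(p) == bucket;
}
```

A walk over a two-node chain terminated by `hyp_make_nulls(7)` visits two nodes and reports success only if the reader expected bucket 7.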
      
      Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
      let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
      fast paths and sk_for_each() and spinlock for /proc/net/raw.
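The slow-path pattern described above (take the lock when the /proc/net/raw walk starts, release it when the walk stops, and have writers take the same lock) is sketched below as a userspace model. The `hyp_` names, the four-bucket table, the single table-wide lock and the C11 `atomic_flag` spinlock are all assumptions for illustration; the kernel uses its own raw hash table, `sk_for_each()` and a kernel spinlock. Holding the lock for the whole walk means a socket cannot be unlinked and freed while the iterator still points at it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define HYP_HTABLE_SIZE 4   /* hypothetical bucket count */

struct hyp_sock {
    int netns_id;           /* stands in for sock_net(sk) */
    struct hyp_sock *next;
};

struct hyp_hashinfo {
    atomic_flag lock;       /* simple spinlock protecting all buckets */
    struct hyp_sock *ht[HYP_HTABLE_SIZE];
};

struct hyp_iter {
    struct hyp_hashinfo *h;
    int bucket;
    int netns_id;
};

static void hyp_lock(struct hyp_hashinfo *h)
{
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void hyp_unlock(struct hyp_hashinfo *h)
{
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Writers link sockets under the same lock the /proc reader holds. */
static void hyp_add(struct hyp_hashinfo *h, struct hyp_sock *sk,
                    unsigned int hash)
{
    hyp_lock(h);
    sk->next = h->ht[hash % HYP_HTABLE_SIZE];
    h->ht[hash % HYP_HTABLE_SIZE] = sk;
    hyp_unlock(h);
}

/* Scan forward from sk, moving on to later buckets, and return the next
 * socket in the iterator's netns, or NULL at the end of the table. */
static struct hyp_sock *hyp_scan(struct hyp_iter *it, struct hyp_sock *sk)
{
    for (;;) {
        for (; sk != NULL; sk = sk->next)
            if (sk->netns_id == it->netns_id)
                return sk;
        if (++it->bucket >= HYP_HTABLE_SIZE)
            return NULL;
        sk = it->h->ht[it->bucket];
    }
}

/* Models one seq_file pass: lock in "start", unlock in "stop". */
static int hyp_count_netns(struct hyp_hashinfo *h, int netns_id)
{
    struct hyp_iter it = { .h = h, .bucket = -1, .netns_id = netns_id };
    int n = 0;

    hyp_lock(h);
    for (struct hyp_sock *sk = hyp_scan(&it, NULL); sk != NULL;
         sk = hyp_scan(&it, sk->next))
        n++;
    hyp_unlock(h);
    return n;
}
```

Because every reader step runs with the lock held, no concurrent writer can free the socket that `sk` points at between two `hyp_scan()` calls.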
      
      [0]:
      general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
       seq_read+0x224/0x320 fs/seq_file.c:162
       pde_read fs/proc/inode.c:316 [inline]
       proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
       vfs_read+0x31e/0xd30 fs/read_write.c:468
       ksys_pread64 fs/read_write.c:665 [inline]
       __do_sys_pread64 fs/read_write.c:675 [inline]
       __se_sys_pread64 fs/read_write.c:672 [inline]
       __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x478d29
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
      RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
      RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
      R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
      R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 0daf07e5 ("raw: convert raw sockets to RCU")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 04 Apr, 2023 2 commits
    • net: stmmac: fix up RX flow hash indirection table when setting channels · 218c5973
      Corinna Vinschen authored
      stmmac_reinit_queues() fails to fix up the RX hash.  Even if the number
      of channels gets restricted, the output of `ethtool -x' indicates that
      all RX queues are used:
      
        $ ethtool -l enp0s29f2
        Channel parameters for enp0s29f2:
        Pre-set maximums:
        RX:		8
        TX:		8
        Other:		n/a
        Combined:	n/a
        Current hardware settings:
        RX:		8
        TX:		8
        Other:		n/a
        Combined:	n/a
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 8 RX ring(s):
            0:      0     1     2     3     4     5     6     7
            8:      0     1     2     3     4     5     6     7
        [...]
        $ ethtool -L enp0s29f2 rx 3
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
            0:      0     1     2     3     4     5     6     7
            8:      0     1     2     3     4     5     6     7
        [...]
      
      Fix this by setting the indirection table according to the number
      of specified queues.  The result is now as expected:
      
        $ ethtool -L enp0s29f2 rx 3
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
            0:      0     1     2     0     1     2     0     1
            8:      2     0     1     2     0     1     2     0
        [...]
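Mapping entry i to ring i modulo the ring count reproduces the layout shown above. The sketch below assumes the kernel helper `ethtool_rxfh_indir_default()` computes `index % n_rx_rings`; the `hyp_` names and the 16-entry table used in the check are illustrative, not the stmmac code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same formula as the kernel's ethtool_rxfh_indir_default(). */
static uint32_t hyp_rxfh_indir_default(uint32_t index, uint32_t n_rx_rings)
{
    return index % n_rx_rings;
}

/* Re-spread the RSS indirection table over the active RX rings; meant to
 * be called whenever the number of channels changes. */
static void hyp_fill_rss_table(uint32_t *table, size_t size,
                               uint32_t n_rx_rings)
{
    assert(n_rx_rings > 0);
    for (size_t i = 0; i < size; i++)
        table[i] = hyp_rxfh_indir_default((uint32_t)i, n_rx_rings);
}
```

With three rings, entries 0 to 15 become 0 1 2 0 1 2 0 1 / 2 0 1 2 0 1 2 0, matching the expected `ethtool -x` output.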
      
      Tested on Intel Elkhart Lake.
      
      Fixes: 0366f7e0 ("net: stmmac: add ethtool support for get/set channels")
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe · c6b486fb
      Siddharth Vadapalli authored
      In the am65_cpsw_nuss_probe() function's cleanup path, the call to
      of_platform_device_destroy() for the common->mdio_dev device is invoked
      unconditionally. It is possible that either the MDIO node is not present
      in the device-tree, or the MDIO node is disabled in the device-tree. In
      both these cases, the MDIO device is not created, resulting in a NULL
      pointer dereference when the of_platform_device_destroy() function is
      invoked on the common->mdio_dev device on the cleanup path.
      
      Fix this by ensuring that the common->mdio_dev device exists, before
      attempting to invoke of_platform_device_destroy().
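The fix amounts to a NULL guard on the probe cleanup path. A minimal userspace model, assuming a hypothetical `hyp_device_destroy()` that, like of_platform_device_destroy() in the scenario above, must never be handed a NULL device:

```c
#include <assert.h>
#include <stddef.h>

struct hyp_device {
    int destroyed;
};

struct hyp_common {
    struct hyp_device *mdio_dev;  /* NULL if the MDIO node is absent or disabled */
};

/* Dereferences dev, so it must never be called with NULL. */
static void hyp_device_destroy(struct hyp_device *dev)
{
    assert(dev != NULL);
    dev->destroyed = 1;
}

/* Probe error path: destroy the MDIO device only if it was created. */
static void hyp_probe_cleanup(struct hyp_common *common)
{
    if (common->mdio_dev)
        hyp_device_destroy(common->mdio_dev);
}
```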
      
      Fixes: a45cfcc6 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio")
      Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 03 Apr, 2023 3 commits
    • ipv6: Fix an uninit variable access bug in __ip6_make_skb() · ea30388b
      Ziyang Xuan authored
      Syzbot reported a bug as follows:
      
      =====================================================
      BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
      BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
      BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
      BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
       arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
       atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
       __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       ip6_finish_skb include/net/ipv6.h:1122 [inline]
       ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987
       rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579
       rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:766 [inline]
       slab_alloc_node mm/slub.c:3452 [inline]
       __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
       __do_kmalloc_node mm/slab_common.c:967 [inline]
       __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
       kmalloc_reserve net/core/skbuff.c:492 [inline]
       __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
       alloc_skb include/linux/skbuff.h:1270 [inline]
       __ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684
       ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854
       rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      This happens because, for SOCK_RAW sockets, the icmp6hdr is not in the
      skb's linear region, so accessing icmp6_hdr(skb)->icmp6_type directly
      triggers the uninit variable access.
      
      Use a local variable icmp6_type to carry the correct value in different
      scenarios.
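The idea is to decide once where the ICMPv6 type comes from and carry it in a local variable. The model below is a simplification: `hdr_in_linear` stands in for whatever check the real code uses to detect the SOCK_RAW case, `hyp_flow.icmp6_type` stands in for the type recorded in the flow information, and all `hyp_` names are hypothetical. It relies only on icmp6_type being the first byte of the ICMPv6 header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hyp_flow {
    uint8_t icmp6_type;     /* type recorded in the flow information */
};

struct hyp_pkt {
    const uint8_t *linear;  /* start of the linear header area */
    size_t linear_len;
    bool hdr_in_linear;     /* false models the SOCK_RAW case */
};

/* Pick the ICMPv6 type without reading bytes that were never written. */
static uint8_t hyp_icmp6_type(const struct hyp_pkt *pkt,
                              const struct hyp_flow *fl)
{
    if (!pkt->hdr_in_linear)
        return fl->icmp6_type;
    assert(pkt->linear_len >= 1);
    return pkt->linear[0];  /* icmp6_type is the first header byte */
}

/* Bump the per-type output counter, in the spirit of ICMPMsgStats. */
static void hyp_count_out_msg(unsigned long mib[256],
                              const struct hyp_pkt *pkt,
                              const struct hyp_flow *fl)
{
    uint8_t icmp6_type = hyp_icmp6_type(pkt, fl);

    mib[icmp6_type]++;
}
```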
      
      Fixes: 14878f75 ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]")
      Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT · 839349d1
      Sricharan Ramabadhran authored
      On the remote side, when a QRTR socket is removed, af_qrtr calls
      qrtr_port_remove(), which broadcasts the DEL_CLIENT packet to all
      neighbours, including the local NS. Upon receiving the DEL_CLIENT packet,
      the NS removes the lookups associated with the node:port and broadcasts
      the DEL_SERVER packet.
      
      But on the host side, the NS has already deleted the server belonging to
      that port when the DEL_CLIENT packet arrived. So when the remote's NS then
      broadcasts DEL_SERVER for that port, the host logs the following error
      message:
      
      "failed while handling packet from 2:-2"
      
      So fix this error by not broadcasting the DEL_SERVER packet when the
      DEL_CLIENT packet gets processed.
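One way to express this is to give the server-removal routine a flag saying whether to announce the removal. The sketch below is a hypothetical userspace model (the `hyp_` names, the fixed-size table and the broadcast counter are assumptions): the DEL_CLIENT path removes the server silently, while a removal that should be announced still broadcasts DEL_SERVER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hyp_server {
    unsigned int node;
    unsigned int port;
    bool present;
};

static int hyp_del_server_sent;  /* DEL_SERVER broadcasts, for illustration */

static void hyp_broadcast_del_server(const struct hyp_server *srv)
{
    (void)srv;
    hyp_del_server_sent++;
}

/* Remove the server registered on node:port; announce it only if bcast. */
static bool hyp_server_del(struct hyp_server *tbl, size_t n,
                           unsigned int node, unsigned int port, bool bcast)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].present && tbl[i].node == node && tbl[i].port == port) {
            tbl[i].present = false;
            if (bcast)
                hyp_broadcast_del_server(&tbl[i]);
            return true;
        }
    }
    return false;
}

/* DEL_CLIENT: every neighbour already saw DEL_CLIENT, so stay quiet. */
static bool hyp_ctrl_cmd_del_client(struct hyp_server *tbl, size_t n,
                                    unsigned int node, unsigned int port)
{
    return hyp_server_del(tbl, n, node, port, false);
}
```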
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
      Signed-off-by: Ram Kumar Dharuman <quic_ramd@quicinc.com>
      Signed-off-by: Sricharan Ramabadhran <quic_srichara@quicinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: add quirk enabling 2500Base-x for HG MXPD-483II · ad651d68
      Daniel Golle authored
      The HG MXPD-483II 1310nm SFP module is meant to operate with 2500Base-X;
      however, its EEPROM incorrectly specifies:
          Transceiver type                          : Ethernet: 1000BASE-LX
          ...
          BR, Nominal                               : 2600MBd
      
      Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
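Conceptually, a quirk is a table entry matched on the module's identity strings whose hook adds a mode the EEPROM fails to advertise. The model below is hypothetical: the vendor string, the mode bits and the `hyp_` helpers are placeholders, and only the part number MXPD-483II and the idea of a 2500Base-X fixup such as sfp_quirk_2500basex come from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    HYP_MODE_1000BASEX = 1u << 0,
    HYP_MODE_2500BASEX = 1u << 1,
};

struct hyp_quirk {
    const char *vendor;  /* placeholder vendor string */
    const char *part;
    void (*fixup)(unsigned int *modes);
};

static void hyp_quirk_2500basex(unsigned int *modes)
{
    *modes |= HYP_MODE_2500BASEX;
}

static const struct hyp_quirk hyp_quirks[] = {
    { "HYP-VENDOR", "MXPD-483II", hyp_quirk_2500basex },
};

/* Start from what the EEPROM claims, then apply any matching quirk. */
static unsigned int hyp_sfp_modes(const char *vendor, const char *part,
                                  unsigned int eeprom_modes)
{
    unsigned int modes = eeprom_modes;

    assert(vendor != NULL && part != NULL);
    for (size_t i = 0; i < sizeof(hyp_quirks) / sizeof(hyp_quirks[0]); i++)
        if (strcmp(hyp_quirks[i].vendor, vendor) == 0 &&
            strcmp(hyp_quirks[i].part, part) == 0)
            hyp_quirks[i].fixup(&modes);
    return modes;
}
```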
      
      https://forum.banana-pi.org/t/bpi-r3-sfp-module-compatibility/14573/60
      Reported-by: chowtom <chowtom@gmail.com>
      Tested-by: chowtom <chowtom@gmail.com>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 02 Apr, 2023 4 commits
    • sctp: check send stream number after wait_for_sndbuf · 2584024b
      Xin Long authored
      This patch fixes a corner case where the asoc out stream count may change
      after wait_for_sndbuf.
      
      When the main thread in the client starts a connection with its out stream
      count set to N while the in stream count in the server is set to N - 2,
      another thread in the client keeps sending msgs with stream number N - 1
      and waits for sndbuf before INIT_ACK is processed.
      
      However, after INIT_ACK is processed, the out stream count in the client
      is shrunk to N - 2, the same as the in stream count in the server. The
      crash occurs when the thread waiting for sndbuf wakes up and sends the
      msg on the non-existent stream (N - 1). The call trace is as follows:
      
        KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
        Call Trace:
         <TASK>
         sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline]
         sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline]
         sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline]
         sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170
         sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163
         sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868
         sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026
         inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825
         sock_sendmsg_nosec net/socket.c:722 [inline]
         sock_sendmsg+0xde/0x190 net/socket.c:745
      
      The fix is to add an unlikely check for the send stream number after the
      thread wakes up from the wait_for_sndbuf.
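The essential point is that a stream-number check made before sleeping can go stale, so it must be repeated after waking. The model below is hypothetical: `peer_instreams` and `hyp_wait_for_sndbuf()` simulate INIT_ACK processing that shrinks the out stream count while the sender sleeps, and the `hyp_` names are not the SCTP code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct hyp_asoc {
    unsigned int outcnt;          /* current out stream count */
    unsigned int peer_instreams;  /* what the peer's INIT_ACK will allow */
};

/* Simulates sleeping for sndbuf while INIT_ACK is processed meanwhile. */
static void hyp_wait_for_sndbuf(struct hyp_asoc *asoc)
{
    if (asoc->outcnt > asoc->peer_instreams)
        asoc->outcnt = asoc->peer_instreams;
}

static int hyp_sendmsg_to_asoc(struct hyp_asoc *asoc, unsigned int stream,
                               bool sndbuf_full)
{
    assert(asoc != 0);
    if (stream >= asoc->outcnt)
        return -EINVAL;
    if (sndbuf_full) {
        hyp_wait_for_sndbuf(asoc);
        /* The association may have changed while we slept. */
        if (stream >= asoc->outcnt)
            return -EINVAL;
    }
    return 0;  /* safe to queue on this stream */
}
```

With N = 5 and a peer allowing 3 streams, a send on stream 4 that has to wait is rejected after waking, while stream 2 still succeeds.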
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: fix remaining throughput regression · e669ce46
      Felix Fietkau authored
      Based on further tests, it seems that the QDMA shaper is not able to
      perform shaping close to the MAC link rate without throughput loss.
      This cannot be compensated by increasing the shaping rate, so it seems
      to be an internal limit.
      
      Fix the remaining throughput regression by detecting that condition and
      limiting shaping to ports with lower link speed.
      
      This patch intentionally ignores link speed gain from TRGMII, because
      even on such links, shaping to 1000 Mbit/s incurs some throughput
      degradation.
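The resulting policy reduces to a comparison: shape only ports whose link speed is below the MAC rate, and pass a TRGMII port in as an ordinary 1000 Mbit/s port instead of crediting it with TRGMII's extra speed. The sketch below is a hypothetical illustration of that decision alone (the constant and function name are assumptions); it says nothing about how mtk_eth_soc programs the QDMA shaper.

```c
#include <assert.h>
#include <stdbool.h>

#define HYP_MAC_RATE_MBPS 1000  /* assumed MAC rate of the port */

/* Enable the per-port shaper only when the link is slower than the MAC,
 * so the shaper never has to run at (or close to) the MAC rate. */
static bool hyp_shaping_enabled(int link_speed_mbps)
{
    assert(link_speed_mbps > 0);
    return link_speed_mbps < HYP_MAC_RATE_MBPS;
}
```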
      
      Fixes: f63959c7 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
      Tested-By: Frank Wunderlich <frank-w@public-files.de>
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit · 089b91a0
      Gustav Ekelund authored
      The force watchdog event bit is not cleared during SW reset in the
      mv88e6393x switch. This differs from the mv88e6390, which clears the
      force WD event bit as advertised. As a result, a force WD event is
      handled over and over again, because the SW reset following the event
      never clears the force WD event bit.
      
      Explicitly clear the watchdog event register to 0 in irq_action when
      handling an event to prevent the switch from sending continuous interrupts.
      Marvell aren't aware of any other stuck bits apart from the force WD
      bit.
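A userspace model of the handler flow, assuming a hypothetical `wd_event` register value and a `hyp_sw_reset()` that, like the mv88e6393x, leaves the force bit set. Without the explicit write of 0, `hyp_irq_pending()` would stay true after every reset and the handler would run forever; the bit position and all `hyp_` names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_WD_FORCE_EVENT 0x0001u  /* hypothetical bit position */

struct hyp_switch {
    uint16_t wd_event;  /* models the watchdog event register */
    int resets;
};

/* Like the mv88e6393x, this reset does not clear the force WD bit. */
static void hyp_sw_reset(struct hyp_switch *sw)
{
    sw->resets++;
}

static bool hyp_irq_pending(const struct hyp_switch *sw)
{
    return sw->wd_event != 0;
}

static void hyp_watchdog_irq_action(struct hyp_switch *sw)
{
    if (!hyp_irq_pending(sw))
        return;
    hyp_sw_reset(sw);
    sw->wd_event = 0;  /* explicit clear, as the fix describes */
}
```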
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Gustav Ekelund <gustaek@axis.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: don't let netpoll invoke NAPI if in xmit context · 275b471e
      Jakub Kicinski authored
      Commit 0db3dc73 ("[NETPOLL]: tx lock deadlock fix") narrowed
      down the region under netif_tx_trylock() inside netpoll_send_skb().
      (At that point in time netif_tx_trylock() would lock all queues of
      the device.) Taking the tx lock was problematic because the driver's
      cleanup method may take the same lock. So the change made us hold
      the xmit lock only around xmit, and expected the driver to take
      care of locking within ->ndo_poll_controller().
      
      Unfortunately this only works if netpoll isn't itself called with
      the xmit lock already held. Netpoll code is careful and uses
      trylock(). The drivers, however, may be using plain lock().
      Printing while holding the xmit lock is going to result in rare
      deadlocks.
      
      Luckily we record the xmit lock owners, so we can scan all the queues,
      the same way we scan NAPI owners. If any of the xmit locks is held
      by the local CPU we better not attempt any polling.
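The check described above can be modelled as a scan over every TX queue's recorded lock owner. In this hypothetical sketch, `xmit_lock_owner` holds a CPU id, or -1 when the lock is free; the kernel keeps a comparable owner field per TX queue, but the `hyp_` types and functions here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hyp_txq {
    int xmit_lock_owner;  /* CPU holding the xmit lock, or -1 */
};

/* True if the current CPU already holds any of the device's xmit locks. */
static bool hyp_local_xmit_active(const struct hyp_txq *txqs, size_t n,
                                  int this_cpu)
{
    for (size_t i = 0; i < n; i++)
        if (txqs[i].xmit_lock_owner == this_cpu)
            return true;
    return false;
}

/* Only poll NAPI when that cannot re-take a lock this CPU already holds. */
static bool hyp_netpoll_may_poll(const struct hyp_txq *txqs, size_t n,
                                 int this_cpu)
{
    assert(this_cpu >= 0);
    return !hyp_local_xmit_active(txqs, n, this_cpu);
}
```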
      
      It would be nice if we could narrow down the check to only the NAPIs
      and the queue we're trying to use. I don't see a way to do that now.

      Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
      Fixes: 0db3dc73 ("[NETPOLL]: tx lock deadlock fix")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 01 Apr, 2023 2 commits
    • icmp: guard against too small mtu · 7d63b671
      Eric Dumazet authored
      syzbot was able to trigger a panic [1] in icmp_glue_bits(), or more
      precisely in skb_copy_and_csum_bits().

      There is no repro yet, but I think the issue is that syzbot
      manages to lower the device mtu to a small value, fooling __icmp_send().
      
      __icmp_send() must make sure there is enough room for the
      packet to include at least the headers.
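The guard can be sketched as a room calculation that gives up when, after subtracting the headers, there is not even space left to quote the offending packet's IP header. This is a hedged model: it assumes a 20-byte IPv4 header, an 8-byte ICMP header and a 576-byte cap on the ICMP error, and the `hyp_` names are hypothetical.

```c
#include <assert.h>

#define HYP_IPHDR_LEN   20   /* IPv4 header without options */
#define HYP_ICMPHDR_LEN 8    /* ICMP header */
#define HYP_ICMP_MAX    576  /* cap on the size of an ICMP error */

/* Bytes available for quoting the original packet, or -1 if the mtu is
 * too small to build a meaningful ICMP error. */
static int hyp_icmp_room(int mtu, int optlen)
{
    int room = mtu;

    assert(optlen >= 0);
    if (room > HYP_ICMP_MAX)
        room = HYP_ICMP_MAX;
    room -= HYP_IPHDR_LEN + optlen;
    room -= HYP_ICMPHDR_LEN;
    /* Need room for at least the quoted IP header. */
    if (room <= HYP_IPHDR_LEN)
        return -1;
    return room;
}
```

For a 1500-byte mtu the cap applies and 548 bytes remain; an mtu of 48 leaves exactly 20 bytes, which is rejected.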
      
      We might in the future refactor skb_copy_and_csum_bits() and its
      callers to no longer crash when something bad happens.
      
      [1]
      kernel BUG at net/core/skbuff.c:3343 !
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 15766 Comm: syz-executor.0 Not tainted 6.3.0-rc4-syzkaller-00039-gffe78bbd #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:skb_copy_and_csum_bits+0x798/0x860 net/core/skbuff.c:3343
      Code: f0 c1 c8 08 41 89 c6 e9 73 ff ff ff e8 61 48 d4 f9 e9 41 fd ff ff 48 8b 7c 24 48 e8 52 48 d4 f9 e9 c3 fc ff ff e8 c8 27 84 f9 <0f> 0b 48 89 44 24 28 e8 3c 48 d4 f9 48 8b 44 24 28 e9 9d fb ff ff
      RSP: 0018:ffffc90000007620 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000000001e8 RCX: 0000000000000100
      RDX: ffff8880276f6280 RSI: ffffffff87fdd138 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
      R10: 00000000000001e8 R11: 0000000000000001 R12: 000000000000003c
      R13: 0000000000000000 R14: ffff888028244868 R15: 0000000000000b0e
      FS: 00007fbc81f1c700(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2df43000 CR3: 00000000744db000 CR4: 0000000000150ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <IRQ>
      icmp_glue_bits+0x7b/0x210 net/ipv4/icmp.c:353
      __ip_append_data+0x1d1b/0x39f0 net/ipv4/ip_output.c:1161
      ip_append_data net/ipv4/ip_output.c:1343 [inline]
      ip_append_data+0x115/0x1a0 net/ipv4/ip_output.c:1322
      icmp_push_reply+0xa8/0x440 net/ipv4/icmp.c:370
      __icmp_send+0xb80/0x1430 net/ipv4/icmp.c:765
      ipv4_send_dest_unreach net/ipv4/route.c:1239 [inline]
      ipv4_link_failure+0x5a9/0x9e0 net/ipv4/route.c:1246
      dst_link_failure include/net/dst.h:423 [inline]
      arp_error_report+0xcb/0x1c0 net/ipv4/arp.c:296
      neigh_invalidate+0x20d/0x560 net/core/neighbour.c:1079
      neigh_timer_handler+0xc77/0xff0 net/core/neighbour.c:1166
      call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
      expire_timers+0x29b/0x4b0 kernel/time/timer.c:1751
      __run_timers kernel/time/timer.c:2022 [inline]
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+d373d60fddbdc915e666@syzkaller.appspotmail.com
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230330174502.1915328-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: netcp: MAX_SKB_FRAGS is now 'int'" · adef41b0
      Jakub Kicinski authored
      This reverts commit c5b959ee.
      
      The reverted change is only needed after commit 3948b059 ("net:
      introduce a config option to tweak MAX_SKB_FRAGS"), which does not
      exist in this tree yet. It is only present in -next trees at the time
      of writing.

      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/all/20230331214444.GA1426512@dev-arch.thelio-3990X/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 31 Mar, 2023 11 commits
  7. 30 Mar, 2023 15 commits