1. 11 Apr, 2024 11 commits
  2. 10 Apr, 2024 7 commits
    • Heiner Kallweit's avatar
      r8169: fix LED-related deadlock on module removal · 19fa4f2a
      Heiner Kallweit authored
      Binding devm_led_classdev_register() to the netdev is problematic
      because on module removal we get a RTNL-related deadlock. Fix this
      by avoiding the device-managed LED functions.
      
      Note: We can safely call led_classdev_unregister() for a LED even
      if registering it failed, because led_classdev_unregister() detects
      this and is a no-op in this case.
      
      Fixes: 18764b88 ("r8169: add support for LED's on RTL8168/RTL8101")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarLukas Wunner <lukas@wunner.de>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19fa4f2a
    • Brett Creeley's avatar
      pds_core: Fix pdsc_check_pci_health function to use work thread · 81665adf
      Brett Creeley authored
      When the driver notices fw_status == 0xff it tries to perform a PCI
      reset on itself via pci_reset_function() in the context of the driver's
      health thread. However, pdsc_reset_prepare calls
      pdsc_stop_health_thread(), which attempts to stop/flush the health
      thread. This results in a deadlock because the stop/flush will never
      complete since the driver called pci_reset_function() from the health
      thread context. Fix by changing the pdsc_check_pci_health_function()
      to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
      work queue.
      
      Unloading the driver in the fw_down/dead state uncovered another issue,
      which can be seen in the following trace:
      
      WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
      [...]
      RIP: 0010:__queue_work+0x358/0x440
      [...]
      Call Trace:
       <TASK>
       ? __warn+0x85/0x140
       ? __queue_work+0x358/0x440
       ? report_bug+0xfc/0x1e0
       ? handle_bug+0x3f/0x70
       ? exc_invalid_op+0x17/0x70
       ? asm_exc_invalid_op+0x1a/0x20
       ? __queue_work+0x358/0x440
       queue_work_on+0x28/0x30
       pdsc_devcmd_locked+0x96/0xe0 [pds_core]
       pdsc_devcmd_reset+0x71/0xb0 [pds_core]
       pdsc_teardown+0x51/0xe0 [pds_core]
       pdsc_remove+0x106/0x200 [pds_core]
       pci_device_remove+0x37/0xc0
       device_release_driver_internal+0xae/0x140
       driver_detach+0x48/0x90
       bus_remove_driver+0x6d/0xf0
       pci_unregister_driver+0x2e/0xa0
       pdsc_cleanup_module+0x10/0x780 [pds_core]
       __x64_sys_delete_module+0x142/0x2b0
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fbd9d03a14b
      [...]
      
      Fix this by preventing the devcmd reset if the FW is not running.
      
      Fixes: d9407ff1 ("pds_core: Prevent health thread from running during reset/remove")
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarBrett Creeley <brett.creeley@amd.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81665adf
    • Jiri Benc's avatar
      ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr · 7633c4da
      Jiri Benc authored
      Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
      still means hlist_for_each_entry_rcu can return an item that got removed
      from the list. The memory itself of such item is not freed thanks to RCU
      but nothing guarantees the actual content of the memory is sane.
      
      In particular, the reference count can be zero. This can happen if
      ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
      from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
      references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
      timing, this can happen:
      
      1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
      
      2. Then, the whole ipv6_del_addr is executed for the given entry. The
         reference count drops to zero and kfree_rcu is scheduled.
      
      3. ipv6_get_ifaddr continues and tries to increments the reference count
         (in6_ifa_hold).
      
      4. The rcu is unlocked and the entry is freed.
      
      5. The freed entry is returned.
      
      Prevent increasing of the reference count in such case. The name
      in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
      
      [   41.506330] refcount_t: addition on 0; use-after-free.
      [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
      [   41.507413] Modules linked in: veth bridge stp llc
      [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be8 #14
      [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
      [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
      [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
      [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
      [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
      [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
      [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
      [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
      [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
      [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
      [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   41.516799] Call Trace:
      [   41.517037]  <TASK>
      [   41.517249]  ? __warn+0x7b/0x120
      [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
      [   41.517923]  ? report_bug+0x164/0x190
      [   41.518240]  ? handle_bug+0x3d/0x70
      [   41.518541]  ? exc_invalid_op+0x17/0x70
      [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
      [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
      [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
      [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
      [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
      [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
      [   41.523102]  ? netlink_unicast+0x30f/0x390
      [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [   41.523832]  netlink_rcv_skb+0x53/0x100
      [   41.524157]  netlink_unicast+0x23b/0x390
      [   41.524484]  netlink_sendmsg+0x1f2/0x440
      [   41.524826]  __sys_sendto+0x1d8/0x1f0
      [   41.525145]  __x64_sys_sendto+0x1f/0x30
      [   41.525467]  do_syscall_64+0xa5/0x1b0
      [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
      [   41.526213] RIP: 0033:0x7fbc4cfcea9a
      [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
      [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
      [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
      [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
      [   41.531573]  </TASK>
      
      Fixes: 5c578aed ("IPv6: convert addrconf hash list to RCU")
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7633c4da
    • Jakub Kicinski's avatar
      Merge branch 'net-start-to-replace-copy_from_sockptr' · 7b6575c6
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      net: start to replace copy_from_sockptr()
      
      We got several syzbot reports about unsafe copy_from_sockptr()
      calls. After fixing some of them, it appears that we could
      use a new helper to factorize all the checks in one place.
      
      This series targets net tree, we can later start converting
      many call sites in net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20240408082845.3957374-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7b6575c6
    • Eric Dumazet's avatar
      nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies · 7a87441c
      Eric Dumazet authored
      syzbot reported unsafe calls to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
      Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078
      
      CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
        do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfd/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7f7fac07fd89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89
      RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240408082845.3957374-4-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7a87441c
    • Eric Dumazet's avatar
      mISDN: fix MISDN_TIME_STAMP handling · 138b7878
      Eric Dumazet authored
      syzbot reports one unsafe call to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
      Read of size 4 at addr ffff0000c6d54083 by task syz-executor406/6167
      
      CPU: 1 PID: 6167 Comm: syz-executor406 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call trace:
        dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
        show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x178/0x518 mm/kasan/report.c:488
        kasan_report+0xd8/0x138 mm/kasan/report.c:601
        __asan_report_load_n_noabort+0x1c/0x28 mm/kasan/report_generic.c:391
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
        do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2311
        __sys_setsockopt+0x128/0x1a8 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2340
        __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      
      Fixes: 1b2b03f8 ("Add mISDN core files")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Link: https://lore.kernel.org/r/20240408082845.3957374-3-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      138b7878
    • Eric Dumazet's avatar
      net: add copy_safe_from_sockptr() helper · 6309863b
      Eric Dumazet authored
      copy_from_sockptr() helper is unsafe, unless callers
      did the prior check against user provided optlen.
      
      Too many callers get this wrong, lets add a helper to
      fix them and avoid future copy/paste bugs.
      
      Instead of :
      
         if (optlen < sizeof(opt)) {
             err = -EINVAL;
             break;
         }
         if (copy_from_sockptr(&opt, optval, sizeof(opt)) {
             err = -EFAULT;
             break;
         }
      
      Use :
      
         err = copy_safe_from_sockptr(&opt, sizeof(opt),
                                      optval, optlen);
         if (err)
             break;
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408082845.3957374-2-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6309863b
  3. 09 Apr, 2024 6 commits
    • Arnd Bergmann's avatar
      ipv4/route: avoid unused-but-set-variable warning · cf1b7201
      Arnd Bergmann authored
      The log_martians variable is only used in an #ifdef, causing a 'make W=1'
      warning with gcc:
      
      net/ipv4/route.c: In function 'ip_rt_send_redirect':
      net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
      
      Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
      see where the variable is used.
      
      Fixes: 30038fc6 ("net: ip_rt_send_redirect() optimization")
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cf1b7201
    • Arnd Bergmann's avatar
      ipv6: fib: hide unused 'pn' variable · 74043489
      Arnd Bergmann authored
      When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
      a 'make W=1' warning:
      
      net/ipv6/ip6_fib.c: In function 'fib6_add':
      net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
      
      Add another #ifdef around the variable declaration, matching the other
      uses in this file.
      
      Fixes: 66729e18 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
      Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      74043489
    • Geetha sowjanya's avatar
      octeontx2-af: Fix NIX SQ mode and BP config · faf23006
      Geetha sowjanya authored
      NIX SQ mode and link backpressure configuration is required for
      all platforms. But in current driver this code is wrongly placed
      under specific platform check. This patch fixes the issue by
      moving the code out of platform check.
      
      Fixes: 5d9b976d ("octeontx2-af: Support fixed transmit scheduler topology")
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      faf23006
    • Kuniyuki Iwashima's avatar
      af_unix: Clear stale u->oob_skb. · b46f4eaa
      Kuniyuki Iwashima authored
      syzkaller started to report deadlock of unix_gc_lock after commit
      4090fa37 ("af_unix: Replace garbage collection algorithm."), but
      it just uncovers the bug that has been there since commit 314001f0
      ("af_unix: Add OOB support").
      
      The repro basically does the following.
      
        from socket import *
        from array import array
      
        c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
        c2.recv(1)  # blocked as no normal data in recv queue
      
        c2.close()  # done async and unblock recv()
        c1.close()  # done async and trigger GC
      
      A socket sends its file descriptor to itself as OOB data and tries to
      receive normal data, but finally recv() fails due to async close().
      
      The problem here is wrong handling of OOB skb in manage_oob().  When
      recvmsg() is called without MSG_OOB, manage_oob() is called to check
      if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
      out of the receive queue but does not clear unix_sock(sk)->oob_skb.
      This is wrong in terms of uAPI.
      
      Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
      The 'o' is handled as OOB data.  When recv() is called twice without
      MSG_OOB, the OOB data should be lost.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
        >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
        5
        >>> c1.send(b'world')
        5
        >>> c2.recv(5)  # OOB data is not received
        b'hell'
        >>> c2.recv(5)  # OOB date is skipped
        b'world'
        >>> c2.recv(5, MSG_OOB)  # This should return an error
        b'o'
      
      In the same situation, TCP actually returns -EINVAL for the last
      recv().
      
      Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
      EPOLLPRI even though the data has passed through by previous recv().
      
      To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
      it from recv queue.
      
      The reason why the old GC did not trigger the deadlock is because the
      old GC relied on the receive queue to detect the loop.
      
      When it is triggered, the socket with OOB data is marked as GC candidate
      because file refcount == inflight count (1).  However, after traversing
      all inflight sockets, the socket still has a positive inflight count (1),
      thus the socket is excluded from candidates.  Then, the old GC lose the
      chance to garbage-collect the socket.
      
      With the old GC, the repro continues to create true garbage that will
      never be freed nor detected by kmemleak as it's linked to the global
      inflight list.  That's why we couldn't even notice the issue.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b46f4eaa
    • Marek Vasut's avatar
      net: ks8851: Handle softirqs at the end of IRQ thread to fix hang · be0384bf
      Marek Vasut authored
      The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
      any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
      implementation is guarded by local_bh_disable() and local_bh_enable().
      The local_bh_enable() may call do_softirq() to run softirqs in case
      any are pending. One of the softirqs is net_rx_action, which ultimately
      reaches the driver .start_xmit callback. If that happens, the system
      hangs. The entire call chain is below:
      
      ks8851_start_xmit_par from netdev_start_xmit
      netdev_start_xmit from dev_hard_start_xmit
      dev_hard_start_xmit from sch_direct_xmit
      sch_direct_xmit from __dev_queue_xmit
      __dev_queue_xmit from __neigh_update
      __neigh_update from neigh_update
      neigh_update from arp_process.constprop.0
      arp_process.constprop.0 from __netif_receive_skb_one_core
      __netif_receive_skb_one_core from process_backlog
      process_backlog from __napi_poll.constprop.0
      __napi_poll.constprop.0 from net_rx_action
      net_rx_action from __do_softirq
      __do_softirq from call_with_stack
      call_with_stack from do_softirq
      do_softirq from __local_bh_enable_ip
      __local_bh_enable_ip from netif_rx
      netif_rx from ks8851_irq
      ks8851_irq from irq_thread_fn
      irq_thread_fn from irq_thread
      irq_thread from kthread
      kthread from ret_from_fork
      
      The hang happens because ks8851_irq() first locks a spinlock in
      ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
      and with that spinlock locked, calls netif_rx(). Once the execution
      reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
      which attempts to claim the already locked spinlock again, and the
      hang happens.
      
      Move the do_softirq() call outside of the spinlock protected section
      of ks8851_irq() by disabling BHs around the entire spinlock protected
      section of ks8851_irq() handler. Place local_bh_enable() outside of
      the spinlock protected section, so that it can trigger do_softirq()
      without the ks8851_par.c ks8851_lock_par() spinlock being held, and
      safely call ks8851_start_xmit_par() without attempting to lock the
      already locked spinlock.
      
      Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
      now, replace netif_rx() with __netif_rx() which is not duplicating the
      local_bh_disable()/local_bh_enable() calls.
      
      Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      be0384bf
    • Marek Vasut's avatar
      net: ks8851: Inline ks8851_rx_skb() · f96f7004
      Marek Vasut authored
      Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb),
      inline the netif_rx(skb) call directly into ks8851_common.c and drop
      the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one
      indirect call from the driver, no functional change otherwise.
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f96f7004
  4. 08 Apr, 2024 12 commits
  5. 07 Apr, 2024 2 commits
    • Hariprasad Kelam's avatar
      octeontx2-pf: Fix transmit scheduler resource leak · bccb798e
      Hariprasad Kelam authored
      Inorder to support shaping and scheduling, Upon class creation
      Netdev driver allocates trasmit schedulers.
      
      The previous patch which added support for Round robin scheduling has
      a bug due to which driver is not freeing transmit schedulers post
      class deletion.
      
      This patch fixes the same.
      
      Fixes: 47a9656f ("octeontx2-pf: htb offload support for Round Robin scheduling")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bccb798e
    • Breno Leitao's avatar
      virtio_net: Do not send RSS key if it is not supported · 059a49aa
      Breno Leitao authored
      There is a bug when setting the RSS options in virtio_net that can break
      the whole machine, getting the kernel into an infinite loop.
      
      Running the following command in any QEMU virtual machine with virtionet
      will reproduce this problem:
      
          # ethtool -X eth0  hfunc toeplitz
      
      This is how the problem happens:
      
      1) ethtool_set_rxfh() calls virtnet_set_rxfh()
      
      2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
      
      3) virtnet_commit_rss_command() populates 4 entries for the rss
      scatter-gather
      
      4) Since the command above does not have a key, then the last
      scatter-gatter entry will be zeroed, since rss_key_size == 0.
      sg_buf_size = vi->rss_key_size;
      
      5) This buffer is passed to qemu, but qemu is not happy with a buffer
      with zero length, and do the following in virtqueue_map_desc() (QEMU
      function):
      
        if (!sz) {
            virtio_error(vdev, "virtio: zero sized buffers are not allowed");
      
      6) virtio_error() (also QEMU function) set the device as broken
      
          vdev->broken = true;
      
      7) Qemu bails out, and do not repond this crazy kernel.
      
      8) The kernel is waiting for the response to come back (function
      virtnet_send_command())
      
      9) The kernel is waiting doing the following :
      
            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
      	     !virtqueue_is_broken(vi->cvq))
      	      cpu_relax();
      
      10) None of the following functions above is true, thus, the kernel
      loops here forever. Keeping in mind that virtqueue_is_broken() does
      not look at the qemu `vdev->broken`, so, it never realizes that the
      vitio is broken at QEMU side.
      
      Fix it by not sending RSS commands if the feature is not available in
      the device.
      
      Fixes: c7114b12 ("drivers/net/virtio_net: Added basic RSS support.")
      Cc: stable@vger.kernel.org
      Cc: qemu-devel@nongnu.org
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Reviewed-by: default avatarHeng Qi <hengqi@linux.alibaba.com>
      Reviewed-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      059a49aa
  6. 06 Apr, 2024 2 commits
    • Eric Dumazet's avatar
      xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING · 237f3cf1
      Eric Dumazet authored
      syzbot reported an illegal copy in xsk_setsockopt() [1]
      
      Make sure to validate setsockopt() @optlen parameter.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
      
      CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fb40587de69
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
      RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
      RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
       </TASK>
      
      Allocated by task 7549:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:3966 [inline]
        __kmalloc+0x233/0x4a0 mm/slub.c:3979
        kmalloc include/linux/slab.h:632 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      The buggy address belongs to the object at ffff888028c6cde0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 1 bytes to the right of
       allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
      
      The buggy address belongs to the physical page:
      page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
      anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
      raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
        set_page_owner include/linux/page_owner.h:31 [inline]
        post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
        prep_new_page mm/page_alloc.c:1540 [inline]
        get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
        __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        alloc_slab_page+0x5f/0x160 mm/slub.c:2175
        allocate_slab mm/slub.c:2338 [inline]
        new_slab+0x84/0x2f0 mm/slub.c:2391
        ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
        __slab_alloc mm/slub.c:3610 [inline]
        __slab_alloc_node mm/slub.c:3663 [inline]
        slab_alloc_node mm/slub.c:3835 [inline]
        __do_kmalloc_node mm/slub.c:3965 [inline]
        __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
        kmalloc_node include/linux/slab.h:648 [inline]
        __vmalloc_area_node mm/vmalloc.c:3197 [inline]
        __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
        __vmalloc_node mm/vmalloc.c:3457 [inline]
        vzalloc+0x79/0x90 mm/vmalloc.c:3530
        bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
        bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
        __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
        __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
        __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      page last free pid 6650 tgid 6647 stack trace:
        reset_page_owner include/linux/page_owner.h:24 [inline]
        free_pages_prepare mm/page_alloc.c:1140 [inline]
        free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
        free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
        release_pages+0x2117/0x2400 mm/swap.c:1042
        tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
        tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
        tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
        tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
        exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
        __mmput+0x115/0x3c0 kernel/fork.c:1345
        exit_mm+0x220/0x310 kernel/exit.c:569
        do_exit+0x99e/0x27e0 kernel/exit.c:865
        do_group_exit+0x207/0x2c0 kernel/exit.c:1027
        get_signal+0x176e/0x1850 kernel/signal.c:2907
        arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
        exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
        exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
        __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
        syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
        do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Memory state around the buggy address:
       ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
      >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                             ^
       ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: "Björn Töpel" <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      237f3cf1
    • Petr Tesarik's avatar
      u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file · 38a15d0a
      Petr Tesarik authored
      Fix bogus lockdep warnings if multiple u64_stats_sync variables are
      initialized in the same file.
      
      With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
      
      	static struct lock_class_key __key;
      
      Since u64_stats_init() is a function (albeit an inline one), all calls
      within the same file end up using the same instance, effectively treating
      them all as a single lock-class.
      
      Fixes: 9464ca65 ("net: make u64_stats_init() a function")
      Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/Signed-off-by: default avatarPetr Tesarik <petr@tesarici.cz>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.czSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      38a15d0a