1. 10 Apr, 2024 1 commit
  2. 09 Apr, 2024 6 commits
    • ipv4/route: avoid unused-but-set-variable warning · cf1b7201
      Arnd Bergmann authored
      The log_martians variable is only used in an #ifdef, causing a 'make W=1'
      warning with gcc:
      
      net/ipv4/route.c: In function 'ip_rt_send_redirect':
      net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
      
      Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
      see where the variable is used.
      
      Fixes: 30038fc6 ("net: ip_rt_send_redirect() optimization")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: fib: hide unused 'pn' variable · 74043489
      Arnd Bergmann authored
      When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
      a 'make W=1' warning:
      
      net/ipv6/ip6_fib.c: In function 'fib6_add':
      net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
      
      Add another #ifdef around the variable declaration, matching the other
      uses in this file.
      
      Fixes: 66729e18 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
      Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: Fix NIX SQ mode and BP config · faf23006
      Geetha sowjanya authored
      NIX SQ mode and link backpressure configuration is required for
      all platforms, but the current driver wrongly places this code
      under a platform-specific check. Fix the issue by moving the code
      out of the platform check.
      
      Fixes: 5d9b976d ("octeontx2-af: Support fixed transmit scheduler topology")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Clear stale u->oob_skb. · b46f4eaa
      Kuniyuki Iwashima authored
      syzkaller started to report deadlock of unix_gc_lock after commit
      4090fa37 ("af_unix: Replace garbage collection algorithm."), but
      it just uncovers the bug that has been there since commit 314001f0
      ("af_unix: Add OOB support").
      
      The repro basically does the following.
      
        from socket import *
        from array import array
      
        c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
        c2.recv(1)  # blocked as no normal data in recv queue
      
        c2.close()  # done async and unblock recv()
        c1.close()  # done async and trigger GC
      
      A socket sends its file descriptor to itself as OOB data and tries to
      receive normal data, but finally recv() fails due to async close().
      
      The problem here is wrong handling of OOB skb in manage_oob().  When
      recvmsg() is called without MSG_OOB, manage_oob() is called to check
      if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
      out of the receive queue but does not clear unix_sock(sk)->oob_skb.
      This is wrong in terms of uAPI.
      
      Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
      The 'o' is handled as OOB data.  When recv() is called twice without
      MSG_OOB, the OOB data should be lost.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
        >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
        5
        >>> c1.send(b'world')
        5
        >>> c2.recv(5)  # OOB data is not received
        b'hell'
        >>> c2.recv(5)  # OOB data is skipped
        b'world'
        >>> c2.recv(5, MSG_OOB)  # This should return an error
        b'o'
      
      In the same situation, TCP actually returns -EINVAL for the last
      recv().
      
      Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always sets
      EPOLLPRI even though a previous recv() has already passed over the data.
      
      To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
      it from recv queue.
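
The stale pointer can be illustrated with a toy userspace model. This is only a sketch, not kernel code: ToyUnixSock, recv_normal(), poll_has_oob() and the clear_on_dequeue flag are hypothetical names made up for illustration, and the receive queue is reduced to a deque of dicts.

```python
from collections import deque


class ToyUnixSock:
    """Toy stand-in for a unix stream socket's receive state (not kernel code)."""

    def __init__(self):
        self.queue = deque()  # models the receive queue of skbs
        self.oob_skb = None   # models unix_sk(sk)->oob_skb

    def queue_data(self, data, oob=False):
        skb = {"data": data}
        self.queue.append(skb)
        if oob:
            self.oob_skb = skb

    def recv_normal(self, clear_on_dequeue):
        """Model recvmsg() called without MSG_OOB."""
        skb = self.queue.popleft()
        if skb is self.oob_skb:
            # manage_oob() case: the OOB skb is popped out of the queue.
            if clear_on_dequeue:
                self.oob_skb = None     # the fix: drop the stale pointer
            skb = self.queue.popleft()  # move on to the next normal skb
        return skb["data"]

    def poll_has_oob(self):
        """Model the EPOLLPRI decision in unix_poll()."""
        return self.oob_skb is not None


def run(clear_on_dequeue):
    sk = ToyUnixSock()
    sk.queue_data(b"hell")
    sk.queue_data(b"o", oob=True)
    sk.queue_data(b"world")
    first = sk.recv_normal(clear_on_dequeue)
    second = sk.recv_normal(clear_on_dequeue)
    return first, second, sk.poll_has_oob()
```

In this model, run(False) still reports pending OOB data after both reads, while run(True) does not, which mirrors the EPOLLPRI behaviour described above.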
      
      The old GC did not trigger the deadlock because it relied on the
      receive queue to detect the loop.
      
      When it is triggered, the socket with OOB data is marked as GC candidate
      because file refcount == inflight count (1).  However, after traversing
      all inflight sockets, the socket still has a positive inflight count (1),
      thus the socket is excluded from candidates.  Then, the old GC loses the
      chance to garbage-collect the socket.
      
      With the old GC, the repro continues to create true garbage that will
      never be freed nor detected by kmemleak as it's linked to the global
      inflight list.  That's why we couldn't even notice the issue.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ks8851: Handle softirqs at the end of IRQ thread to fix hang · be0384bf
      Marek Vasut authored
      The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
      any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
      implementation is guarded by local_bh_disable() and local_bh_enable().
      The local_bh_enable() may call do_softirq() to run softirqs in case
      any are pending. One of the softirqs is net_rx_action, which ultimately
      reaches the driver .start_xmit callback. If that happens, the system
      hangs. The entire call chain is below:
      
      ks8851_start_xmit_par from netdev_start_xmit
      netdev_start_xmit from dev_hard_start_xmit
      dev_hard_start_xmit from sch_direct_xmit
      sch_direct_xmit from __dev_queue_xmit
      __dev_queue_xmit from __neigh_update
      __neigh_update from neigh_update
      neigh_update from arp_process.constprop.0
      arp_process.constprop.0 from __netif_receive_skb_one_core
      __netif_receive_skb_one_core from process_backlog
      process_backlog from __napi_poll.constprop.0
      __napi_poll.constprop.0 from net_rx_action
      net_rx_action from __do_softirq
      __do_softirq from call_with_stack
      call_with_stack from do_softirq
      do_softirq from __local_bh_enable_ip
      __local_bh_enable_ip from netif_rx
      netif_rx from ks8851_irq
      ks8851_irq from irq_thread_fn
      irq_thread_fn from irq_thread
      irq_thread from kthread
      kthread from ret_from_fork
      
      The hang happens because ks8851_irq() first takes a spinlock through
      ks8851_lock_par() in ks8851_par.c, which calls
      spin_lock_irqsave(&ksp->lock, ...), and then calls netif_rx() with that
      spinlock held. Once execution reaches ks8851_start_xmit_par(), it
      calls ks8851_lock_par() again, which tries to take the already held
      spinlock, and the system hangs.
      
      Move the do_softirq() call outside of the spinlock protected section
      of ks8851_irq() by disabling BHs around the entire spinlock protected
      section of ks8851_irq() handler. Place local_bh_enable() outside of
      the spinlock protected section, so that it can trigger do_softirq()
      without the ks8851_par.c ks8851_lock_par() spinlock being held, and
      safely call ks8851_start_xmit_par() without attempting to lock the
      already locked spinlock.
      
      Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
      now, replace netif_rx() with __netif_rx() which is not duplicating the
      local_bh_disable()/local_bh_enable() calls.
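
The locking order can be sketched in userspace. This is a toy model under stated assumptions, not driver code: a threading.Lock stands in for the non-recursive ksp->lock, a plain list stands in for pending softirqs, and all function names are hypothetical. The transmit path uses a non-blocking acquire only so that the sketch terminates where the kernel would hang.

```python
import threading

lock = threading.Lock()   # stands in for the non-recursive ksp->lock
pending_softirqs = []     # work raised while the "IRQ thread" runs
log = []


def start_xmit():
    # Models ks8851_start_xmit_par(): it needs the same lock.
    if not lock.acquire(blocking=False):
        log.append("xmit: lock already held (would hang in the kernel)")
        return
    try:
        log.append("xmit: sent")
    finally:
        lock.release()


def run_pending():
    # Models do_softirq(): run everything that was raised so far.
    while pending_softirqs:
        pending_softirqs.pop(0)()


def rx_packet(run_softirq_now):
    # Models netif_rx() (runs pending work immediately) versus
    # __netif_rx() (leaves the work for the caller's local_bh_enable()).
    pending_softirqs.append(start_xmit)
    if run_softirq_now:
        run_pending()


def irq_thread(fixed):
    with lock:
        rx_packet(run_softirq_now=not fixed)
    # Fixed ordering: pending work runs only after the lock is dropped.
    run_pending()
```

Calling irq_thread(fixed=False) records the would-hang case in log, while irq_thread(fixed=True) lets the transmit path take the lock normally.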
      
      Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
      Signed-off-by: Marek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ks8851: Inline ks8851_rx_skb() · f96f7004
      Marek Vasut authored
      Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb).
      Inline the netif_rx(skb) call directly into ks8851_common.c and drop
      the .rx_skb callback and the ks8851_rx_skb() wrapper. This removes one
      indirect call from the driver; no functional change otherwise.

      Signed-off-by: Marek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 08 Apr, 2024 12 commits
  4. 07 Apr, 2024 2 commits
    • octeontx2-pf: Fix transmit scheduler resource leak · bccb798e
      Hariprasad Kelam authored
      In order to support shaping and scheduling, the netdev driver
      allocates transmit schedulers upon class creation.
      
      The previous patch, which added support for round-robin scheduling,
      has a bug due to which the driver does not free the transmit
      schedulers after class deletion.
      
      This patch fixes that.
      
      Fixes: 47a9656f ("octeontx2-pf: htb offload support for Round Robin scheduling")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: Do not send RSS key if it is not supported · 059a49aa
      Breno Leitao authored
      There is a bug when setting the RSS options in virtio_net that can break
      the whole machine, getting the kernel into an infinite loop.
      
      Running the following command in any QEMU virtual machine with
      virtio-net will reproduce this problem:
      
          # ethtool -X eth0  hfunc toeplitz
      
      This is how the problem happens:
      
      1) ethtool_set_rxfh() calls virtnet_set_rxfh()
      
      2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
      
      3) virtnet_commit_rss_command() populates 4 entries for the rss
      scatter-gather
      
      4) Since the command above does not have a key, the last
      scatter-gather entry will have zero length, since rss_key_size == 0:
      
          sg_buf_size = vi->rss_key_size;
      
      5) This buffer is passed to QEMU, but QEMU is not happy with a buffer
      of zero length, and does the following in virtqueue_map_desc() (a QEMU
      function):
      
        if (!sz) {
            virtio_error(vdev, "virtio: zero sized buffers are not allowed");
      
      6) virtio_error() (also a QEMU function) sets the device as broken:
      
          vdev->broken = true;
      
      7) QEMU bails out and does not respond to this crazy kernel.
      
      8) The kernel is waiting for the response to come back (function
      virtnet_send_command())
      
      9) The kernel waits by spinning in the following loop:
      
            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
                   !virtqueue_is_broken(vi->cvq))
                    cpu_relax();
      
      10) Neither of the conditions above ever becomes true, thus the kernel
      loops here forever. Keep in mind that virtqueue_is_broken() does
      not look at QEMU's `vdev->broken`, so it never realizes that the
      virtio device is broken on the QEMU side.
      
      Fix it by not sending RSS commands if the feature is not available in
      the device.
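
The sequence above can be condensed into a toy model. This is a sketch only, assuming nothing about the real virtio_net or QEMU code beyond the steps listed: ToyDevice, send_rss_command(), the check_feature flag and the first three scatter-gather sizes are hypothetical, and max_spins exists only so the example terminates where the kernel would spin forever.

```python
class ToyDevice:
    """Toy model of the device side; names are illustrative only."""

    def __init__(self):
        self.broken = False  # models QEMU's vdev->broken

    def submit(self, sg_sizes):
        if any(size == 0 for size in sg_sizes):
            self.broken = True  # zero-sized buffers are rejected
            return False        # no response will ever arrive
        return True


def send_rss_command(dev, has_rss, rss_key_size, check_feature, max_spins=1000):
    """Return 'ok', 'skipped', or 'stuck' (the kernel would loop forever)."""
    if check_feature and not has_rss:
        return "skipped"  # the fix: do not send the command at all

    # Four scatter-gather entries; the last one carries the key.
    sg_sizes = [4, 8, 2, rss_key_size]
    responded = dev.submit(sg_sizes)

    # The driver's own view of the queue; it cannot see dev.broken.
    driver_sees_broken = False

    spins = 0
    while not responded and not driver_sees_broken:
        spins += 1
        if spins >= max_spins:  # bound added only for this sketch
            return "stuck"
    return "ok"
```

With check_feature=False and rss_key_size=0 the model reports "stuck"; with check_feature=True the command is never sent.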
      
      Fixes: c7114b12 ("drivers/net/virtio_net: Added basic RSS support.")
      Cc: stable@vger.kernel.org
      Cc: qemu-devel@nongnu.org
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Apr, 2024 3 commits
    • xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING · 237f3cf1
      Eric Dumazet authored
      syzbot reported an illegal copy in xsk_setsockopt() [1]
      
      Make sure to validate setsockopt() @optlen parameter.
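
As a userspace analogy only (not the kernel patch itself), the idea is to compare the caller-supplied length against the size of the value about to be copied before copying anything. The handler name set_ring_entries() and the 4-byte integer layout are assumptions made for this sketch.

```python
import struct

EINVAL = 22  # errno value returned for an invalid argument


def set_ring_entries(optval: bytes) -> int:
    """Toy setsockopt() handler that expects a 4-byte integer option."""
    optlen = len(optval)
    needed = struct.calcsize("=i")  # 4 bytes with standard sizing

    # Validate the length first; the buggy path copied 4 bytes regardless.
    if optlen < needed:
        return -EINVAL

    (entries,) = struct.unpack_from("=i", optval)
    return entries
```

A 2-byte buffer, like the one in the report below, is rejected with -EINVAL instead of being read past its end.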
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
      
      CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fb40587de69
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
      RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
      RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
       </TASK>
      
      Allocated by task 7549:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:3966 [inline]
        __kmalloc+0x233/0x4a0 mm/slub.c:3979
        kmalloc include/linux/slab.h:632 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      The buggy address belongs to the object at ffff888028c6cde0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 1 bytes to the right of
       allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
      
      The buggy address belongs to the physical page:
      page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
      anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
      raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
        set_page_owner include/linux/page_owner.h:31 [inline]
        post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
        prep_new_page mm/page_alloc.c:1540 [inline]
        get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
        __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        alloc_slab_page+0x5f/0x160 mm/slub.c:2175
        allocate_slab mm/slub.c:2338 [inline]
        new_slab+0x84/0x2f0 mm/slub.c:2391
        ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
        __slab_alloc mm/slub.c:3610 [inline]
        __slab_alloc_node mm/slub.c:3663 [inline]
        slab_alloc_node mm/slub.c:3835 [inline]
        __do_kmalloc_node mm/slub.c:3965 [inline]
        __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
        kmalloc_node include/linux/slab.h:648 [inline]
        __vmalloc_area_node mm/vmalloc.c:3197 [inline]
        __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
        __vmalloc_node mm/vmalloc.c:3457 [inline]
        vzalloc+0x79/0x90 mm/vmalloc.c:3530
        bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
        bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
        __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
        __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
        __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      page last free pid 6650 tgid 6647 stack trace:
        reset_page_owner include/linux/page_owner.h:24 [inline]
        free_pages_prepare mm/page_alloc.c:1140 [inline]
        free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
        free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
        release_pages+0x2117/0x2400 mm/swap.c:1042
        tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
        tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
        tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
        tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
        exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
        __mmput+0x115/0x3c0 kernel/fork.c:1345
        exit_mm+0x220/0x310 kernel/exit.c:569
        do_exit+0x99e/0x27e0 kernel/exit.c:865
        do_group_exit+0x207/0x2c0 kernel/exit.c:1027
        get_signal+0x176e/0x1850 kernel/signal.c:2907
        arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
        exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
        exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
        __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
        syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
        do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Memory state around the buggy address:
       ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
      >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                             ^
       ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: "Björn Töpel" <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file · 38a15d0a
      Petr Tesarik authored
      Fix bogus lockdep warnings if multiple u64_stats_sync variables are
      initialized in the same file.
      
      With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
      
      	static struct lock_class_key __key;
      
      Since u64_stats_init() is a function (albeit an inline one), all calls
      within the same file end up using the same instance, effectively treating
      them all as a single lock-class.
      
      Fixes: 9464ca65 ("net: make u64_stats_init() a function")
      Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/
      Signed-off-by: Petr Tesarik <petr@tesarici.cz>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: fix unwanted error log on timeout policy probing · 4539f91f
      Ilya Maximets authored
      On startup, ovs-vswitchd probes different datapath features including
      support for timeout policies.  While probing, it tries to execute
      certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
      attributes set.  These attributes tell the openvswitch module to not
      log any errors when they occur as it is expected that some of the
      probes will fail.
      
      For some reason, setting the timeout policy ignores the PROBE attribute
      and logs a failure anyway.  This is causing the following kernel log
      on each re-start of ovs-vswitchd:
      
        kernel: Failed to associated timeout policy `ovs_test_tp'
      
      Fix that by using the same logging macro that all other messages are
      using.  The message will still be printed at info level when needed
      and will be rate limited, but with a net rate limiter instead of
      generic printk one.
      
      The nf_ct_set_timeout() itself will still print some info messages,
      but at least this change makes logging in openvswitch module more
      consistent.
      
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 04 Apr, 2024 16 commits
    • Merge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c88b9b4c
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, bluetooth and bpf.
      
        Fairly usual collection of driver and core fixes. The large selftest
        accompanying one of the fixes is also becoming a common occurrence.
      
        Current release - regressions:
      
         - ipv6: fix infinite recursion in fib6_dump_done()
      
         - net/rds: fix possible null-deref in newly added error path
      
        Current release - new code bugs:
      
         - net: do not consume a full cacheline for system_page_pool
      
         - bpf: fix bpf_arena-related file descriptor leaks in the verifier
      
         - drv: ice: fix freeing uninitialized pointers, fixing misuse of the
           newfangled __free() auto-cleanup
      
        Previous releases - regressions:
      
         - x86/bpf: fixes the BPF JIT with retbleed=stuff
      
         - xen-netfront: add missing skb_mark_for_recycle, fix page pool
           accounting leaks, revealed by recently added explicit warning
      
         - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
           non-wildcard addresses
      
         - Bluetooth:
            - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
              better workarounds to un-break some buggy Qualcomm devices
            - set conn encrypted before conn establishes, fix re-connecting to
              some headsets which use slightly unusual sequence of msgs
      
         - mptcp:
            - prevent BPF accessing lowat from a subflow socket
            - don't account accept() of non-MPC client as fallback to TCP
      
         - drv: mana: fix Rx DMA datasize and skb_over_panic
      
         - drv: i40e: fix VF MAC filter removal
      
        Previous releases - always broken:
      
         - gro: various fixes related to UDP tunnels - netns crossing
           problems, incorrect checksum conversions, and incorrect packet
           transformations which may lead to panics
      
         - bpf: support deferring bpf_link dealloc to after RCU grace period
      
         - nf_tables:
            - release batch on table validation from abort path
            - release mutex after nft_gc_seq_end from abort path
            - flush pending destroy work before exit_net release
      
         - drv: r8169: skip DASH fw status checks when DASH is disabled"
      
      * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
        netfilter: validate user input for expected length
        net/sched: act_skbmod: prevent kernel-infoleak
        net: usb: ax88179_178a: avoid the interface always configured as random address
        net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
        net: ravb: Always update error counters
        net: ravb: Always process TX descriptor ring
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
        Revert "tg3: Remove residual error handling in tg3_suspend"
        tg3: Remove residual error handling in tg3_suspend
        net: mana: Fix Rx DMA datasize and skb_over_panic
        net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
        net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
        net: stmmac: fix rx queue priority assignment
        net: txgbe: fix i2c dev name cannot match clkdev
        net: fec: Set mac_managed_pm during probe
        ...
    • Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs · ec25bd8d
      Linus Torvalds authored
      Pull bcachefs repair code from Kent Overstreet:
       "A couple more small fixes, and new repair code.
      
        We can now automatically recover from arbitrary corrupted interior
        btree nodes by scanning, and we can reconstruct metadata as needed to
        bring a filesystem back into a working, consistent, read-write state
        and preserve access to whatever wasn't corrupted.
      
        Meaning - you can blow away all metadata except for extents and
        dirents leaf nodes, and repair will reconstruct everything else and
        give you your data, and under the correct paths. If inodes are missing
        i_size will be slightly off and permissions/ownership/timestamps will
        be gone, and we do still need the snapshots btree if snapshots were in
        use - in the future we'll be able to guess the snapshot tree structure
        in some situations.
      
        IOW - aside from shaking out remaining bugs (fuzz testing is still
        coming), repair code should be complete and if repair ever doesn't
        work that's the highest priority bug that I want to know about
        immediately.
      
        This patchset was kindly tested by a user from India who accidentally
        wiped one drive out of a three drive filesystem with no replication on
        the family computer - it took a couple weeks but we got everything
        important back"
      
      * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: reconstruct_inode()
        bcachefs: Subvolume reconstruction
        bcachefs: Check for extents that point to same space
        bcachefs: Reconstruct missing snapshot nodes
        bcachefs: Flag btrees with missing data
        bcachefs: Topology repair now uses nodes found by scanning to fill holes
        bcachefs: Repair pass for scanning for btree nodes
        bcachefs: Don't skip fake btree roots in fsck
        bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
        bcachefs: Etyzinger cleanups
        bcachefs: bch2_shoot_down_journal_keys()
        bcachefs: Clear recovery_passes_required as they complete without errors
        bcachefs: ratelimit informational fsck errors
        bcachefs: Check for bad needs_discard before doing discard
        bcachefs: Improve bch2_btree_update_to_text()
        mean_and_variance: Drop always failing tests
        bcachefs: fix nocow lock deadlock
        bcachefs: BCH_WATERMARK_interior_updates
        bcachefs: Fix btree node reserve
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1cfa2f10
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-04-04
      
      We've added 7 non-merge commits during the last 5 day(s) which contain
      a total of 9 files changed, 75 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
         incorrect destination IP calculation and incorrect IP for relocations,
         from Uros Bizjak and Joan Bruguera Micó.
      
      2) Fix BPF arena file descriptor leaks in the verifier,
         from Anton Protopopov.
      
      3) Defer bpf_link deallocation to after RCU grace period as currently
         running multi-{kprobes,uprobes} programs might still access cookie
         information from the link, from Andrii Nakryiko.
      
      4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
         by syzkaller, from Jakub Sitnicki.
      
      5) Fix resolve_btfids build with musl libc due to missing linux/types.h
         include, from Natanael Copa.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, sockmap: Prevent lock inversion deadlock in map delete elem
        x86/bpf: Fix IP for relocating call depth accounting
        x86/bpf: Fix IP after emitting call depth accounting
        bpf: fix possible file descriptor leaks in verifier
        tools/resolve_btfids: fix build with musl libc
        bpf: support deferring bpf_link dealloc to after RCU grace period
        bpf: put uprobe link's path and task in release callback
      ====================
      
      Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1cfa2f10
    • Eric Dumazet's avatar
      netfilter: validate user input for expected length · 0c83842d
      Eric Dumazet authored
      I got multiple syzbot reports showing old bugs exposed
      by BPF after commit 20f2505f ("bpf: Try to avoid kzalloc
      in cgroup/{s,g}etsockopt")
      
      setsockopt() @optlen argument should be taken into account
      before copying data.
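
      The rule above can be sketched in plain userspace C. This is only an
      illustration under stated assumptions: copy_opt_checked() and
      struct demo_replace are hypothetical names, memcpy() stands in for the
      kernel's copy_from_sockptr(), and the -EINVAL return follows the usual
      convention rather than quoting the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size option payload (not the real struct ipt_replace). */
struct demo_replace {
	char name[32];
	unsigned int num_entries;
	unsigned int size;
};

/*
 * Copy a fixed-size option structure out of a caller-supplied buffer.
 * optlen is the length the caller claims to have provided. If it is
 * shorter than the structure, refuse instead of reading past the buffer.
 */
int copy_opt_checked(void *dst, size_t dst_size,
		     const void *optval, size_t optlen)
{
	assert(dst != NULL && optval != NULL);

	if (optlen < dst_size)
		return -EINVAL;

	memcpy(dst, optval, dst_size);
	return 0;
}
```

      With a one-byte buffer, as in the report below, such a helper would
      return an error instead of reading sizeof(struct demo_replace) bytes.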
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
       BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
      
      CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
        __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
        do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
        nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      RIP: 0033:0x7fd22067dde9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
      RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
       </TASK>
      
      Allocated by task 7238:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:4069 [inline]
        __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
        kmalloc_noprof include/linux/slab.h:664 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      
      The buggy address belongs to the object at ffff88802cd73da0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 0 bytes inside of
       allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
      flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
      page_type: 0xffffefff(slab)
      raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
      raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
        prep_new_page mm/page_alloc.c:1498 [inline]
        get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
        __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
        __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
        alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
        alloc_slab_page+0x5f/0x120 mm/slub.c:2249
        allocate_slab+0x5a/0x2e0 mm/slub.c:2412
        new_slab mm/slub.c:2465 [inline]
        ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
        __slab_alloc+0x58/0xa0 mm/slub.c:3705
        __slab_alloc_node mm/slub.c:3758 [inline]
        slab_alloc_node mm/slub.c:3936 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
        kstrdup+0x3a/0x80 mm/util.c:62
        device_rename+0xb5/0x1b0 drivers/base/core.c:4558
        dev_change_name+0x275/0x860 net/core/dev.c:1232
        do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
        __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
        rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
      page last free pid 5146 tgid 5146 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1110 [inline]
        free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
        discard_slab mm/slub.c:2511 [inline]
        __put_partials+0xeb/0x130 mm/slub.c:2980
        put_cpu_partial+0x17c/0x250 mm/slub.c:3055
        __slab_free+0x2ea/0x3d0 mm/slub.c:4254
        qlink_free mm/kasan/quarantine.c:163 [inline]
        qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
        kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
        __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
        kasan_slab_alloc include/linux/kasan.h:201 [inline]
        slab_post_alloc_hook mm/slub.c:3888 [inline]
        slab_alloc_node mm/slub.c:3948 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
        kmalloc_node_noprof include/linux/slab.h:681 [inline]
        kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
        bucket_table_alloc lib/rhashtable.c:186 [inline]
        rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
        rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
        process_one_work kernel/workqueue.c:3218 [inline]
        process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
        worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
        kthread+0x2f0/0x390 kernel/kthread.c:388
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      Memory state around the buggy address:
       ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
       ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                     ^
       ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
       ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0c83842d
    • Jakub Kicinski's avatar
      Merge tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d432f7bd
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 unlike early commit path stage which triggers a call to abort,
               an explicit release of the batch is required on abort, otherwise
               mutex is released and commit_list remains in place.
      
      Patch #2 release mutex after nft_gc_seq_end() in commit path, otherwise
               async GC worker could collect expired objects.
      
      Patch #3 flush pending destroy work in module removal path, otherwise UaF
               is possible.
      
      Patch #4 and #6 restrict the table dormant flag with basechain updates
      	 to fix state inconsistency in the hook registration.
      
      Patch #5 adds missing RCU read side lock to flowtable type to avoid races
      	 with module removal.
      
      * tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
      ====================
      
      Link: https://lore.kernel.org/r/20240404104334.1627-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d432f7bd
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a66323e4
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-04-03 (ice, idpf)
      
      This series contains updates to ice and idpf drivers.
      
      Dan Carpenter initializes some pointer declarations to NULL as needed for
      resource cleanup on ice driver.
      
      Petr Oros corrects assignment of VLAN operators to fix Rx VLAN filtering
      in legacy mode for ice.
      
      Joshua calls eth_type_trans() on unknown packets to prevent possible
      kernel panic on idpf.
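
      The pointer-initialization point can be illustrated with a small
      userspace sketch. It is not the ice driver's code: demo_setup() and its
      fail_early parameter are hypothetical, and malloc()/free() stand in for
      kernel allocators. It shows why pointers that are released on a shared
      cleanup path should start as NULL.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate two buffers and release them on a single cleanup path.
 * Both pointers start as NULL, so free() is safe even when the function
 * bails out before the second allocation (free(NULL) is a no-op).
 * Without the initializers, the early exit would free an indeterminate
 * pointer.
 */
int demo_setup(size_t first_len, size_t second_len, int fail_early)
{
	char *first = NULL;
	char *second = NULL;
	int ret = -1;

	assert(first_len > 0 && second_len > 0);

	first = malloc(first_len);
	if (!first || fail_early)
		goto out;

	second = malloc(second_len);
	if (!second)
		goto out;

	ret = 0;
out:
	free(second);
	free(first);
	return ret;
}
```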
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        idpf: fix kernel panic on unknown packet types
        ice: fix enabling RX VLAN filtering
        ice: Fix freeing uninitialized pointers
      ====================
      
      Link: https://lore.kernel.org/r/20240403201929.1945116-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a66323e4
    • Eric Dumazet's avatar
      net/sched: act_skbmod: prevent kernel-infoleak · d313eb8b
      Eric Dumazet authored
      syzbot found that tcf_skbmod_dump() was copying four bytes
      from kernel stack to user space [1].
      
      The issue here is that 'struct tc_skbmod' has a four bytes hole.
      
      We need to clear the structure before filling fields.
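
      A minimal userspace sketch of the same idea follows. struct demo_opt is
      a hypothetical layout chosen to have padding on common 64-bit ABIs and
      does not match the real struct tc_skbmod. The helper names are likewise
      made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout with a hole: where the 64-bit member is 8-byte
 * aligned, 4 padding bytes follow 'index'.
 */
struct demo_opt {
	uint32_t index;
	uint64_t flags;
};

/* Zero the whole structure first, so padding bytes carry no stale data. */
void demo_opt_fill(struct demo_opt *opt, uint32_t index, uint64_t flags)
{
	assert(opt != NULL);
	memset(opt, 0, sizeof(*opt));
	opt->index = index;
	opt->flags = flags;
}

/* Return 1 if every byte between 'index' and 'flags' is zero. */
int demo_opt_hole_is_clear(const struct demo_opt *opt)
{
	const unsigned char *p = (const unsigned char *)opt;
	size_t i;

	for (i = offsetof(struct demo_opt, index) + sizeof(opt->index);
	     i < offsetof(struct demo_opt, flags); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

      Assigning the members one by one without the memset() would leave the
      padding bytes holding whatever was previously on the stack, which is
      what gets copied out when the whole structure is dumped.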
      
      [1]
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
       BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
       BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
       BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
        instrument_copy_to_user include/linux/instrumented.h:114 [inline]
        copy_to_user_iter lib/iov_iter.c:24 [inline]
        iterate_ubuf include/linux/iov_iter.h:29 [inline]
        iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
        iterate_and_advance include/linux/iov_iter.h:271 [inline]
        _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
        copy_to_iter include/linux/uio.h:196 [inline]
        simple_copy_to_iter net/core/datagram.c:532 [inline]
        __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
        skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
        skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
        netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
        sock_recvmsg_nosec net/socket.c:1046 [inline]
        sock_recvmsg+0x2c4/0x340 net/socket.c:1068
        __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
        __do_sys_recvfrom net/socket.c:2260 [inline]
        __se_sys_recvfrom net/socket.c:2256 [inline]
        __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was stored to memory at:
        pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
        netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
        netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
        nlmsg_unicast include/net/netlink.h:1144 [inline]
        nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
        rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
        rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
        tcf_add_notify net/sched/act_api.c:2048 [inline]
        tcf_action_add net/sched/act_api.c:2071 [inline]
        tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
        rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
        netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
        rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:745
        ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
        __sys_sendmsg net/socket.c:2667 [inline]
        __do_sys_sendmsg net/socket.c:2676 [inline]
        __se_sys_sendmsg net/socket.c:2674 [inline]
        __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was stored to memory at:
        __nla_put lib/nlattr.c:1041 [inline]
        nla_put+0x1c6/0x230 lib/nlattr.c:1099
        tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
        tcf_action_dump_old net/sched/act_api.c:1191 [inline]
        tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
        tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
        tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
        tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
        tcf_add_notify net/sched/act_api.c:2042 [inline]
        tcf_action_add net/sched/act_api.c:2071 [inline]
        tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
        rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
        netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
        rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:745
        ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
        __sys_sendmsg net/socket.c:2667 [inline]
        __do_sys_sendmsg net/socket.c:2676 [inline]
        __se_sys_sendmsg net/socket.c:2674 [inline]
        __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Local variable opt created at:
        tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
        tcf_action_dump_old net/sched/act_api.c:1191 [inline]
        tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
      
      Bytes 188-191 of 248 are uninitialized
      Memory access of size 248 starts at ffff888117697680
      Data copied to user address 00007ffe56d855f0
      
      Fixes: 86da71b5 ("net_sched: Introduce skbmod action")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d313eb8b
    • Jose Ignacio Tornos Martinez's avatar
      net: usb: ax88179_178a: avoid the interface always configured as random address · 2e91bb99
      Jose Ignacio Tornos Martinez authored
      After commit d2689b6a ("net: usb: ax88179_178a: avoid two
      consecutive device resets"), reset is no longer executed from the bind
      operation, so the MAC address is not read from the device registers or
      the devicetree at that point. The check that decides whether the
      interface's assigned MAC address is random happens after the bind
      operation in usbnet_probe, so the interface stays marked as having a
      random address, even though the address is correctly read and set during
      the open operation (now the only reset).
      
      To keep a single reset for the device and to avoid the interface always
      being marked as having a random address, set the relevant driver field
      correctly after reset if the MAC address is read successfully from the
      device registers or the devicetree. Take into account whether a locally
      administered (random) address was previously stored.
      
      cc: stable@vger.kernel.org # 6.6+
      Fixes: d2689b6a ("net: usb: ax88179_178a: avoid two consecutive device resets")
      Reported-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
      Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2e91bb99
    • Christophe JAILLET's avatar
      net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45() · c120209b
      Christophe JAILLET authored
      The definition and declaration of sja1110_pcs_mdio_write_c45() don't have
      parameters in the same order.
      
      Knowing that sja1110_pcs_mdio_write_c45() is used as a function pointer
      in 'sja1105_info' structure with .pcs_mdio_write_c45, and that we have:
      
         int (*pcs_mdio_write_c45)(struct mii_bus *bus, int phy, int mmd,
      				  int reg, u16 val);
      
      it is likely that the definition is the one to change.
      
      Found with cppcheck, funcArgOrderDifferent.
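
      Why the compiler stays silent about this kind of mismatch can be shown
      with a reduced example. Everything here is hypothetical: the bus
      argument is dropped, the demo_* names are invented, and the choice of
      which two parameters are swapped is only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Records the last write so the effect of each definition can be observed. */
struct demo_write {
	int phy;
	int mmd;
	int reg;
	uint16_t val;
};

struct demo_write demo_last;

/* Callback type using the (phy, mmd, reg, val) order. */
typedef int (*demo_write_c45_fn)(int phy, int mmd, int reg, uint16_t val);

/* Definition whose parameter order matches the callback type. */
int demo_write_c45(int phy, int mmd, int reg, uint16_t val)
{
	assert(phy >= 0);
	demo_last = (struct demo_write){ .phy = phy, .mmd = mmd,
					 .reg = reg, .val = val };
	return 0;
}

/*
 * Definition with two int parameters swapped. The types are identical,
 * so it can be assigned to the same function pointer without any
 * diagnostic, yet callers' arguments land in the wrong fields.
 */
int demo_write_c45_swapped(int phy, int reg, int mmd, uint16_t val)
{
	assert(phy >= 0);
	demo_last = (struct demo_write){ .phy = phy, .mmd = mmd,
					 .reg = reg, .val = val };
	return 0;
}
```

      Because only the parameter names differ, a static checker that compares
      declaration and definition names is needed to catch the problem.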
      
      Fixes: ae271547 ("net: dsa: sja1105: C45 only transactions for PCS")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Michael Walle <mwalle@kernel.org>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/ff2a5af67361988b3581831f7bd1eddebfb4c48f.1712082763.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c120209b
    • Paul Barker's avatar
      net: ravb: Always update error counters · 101b7641
      Paul Barker authored
      The error statistics should be updated each time the poll function is
      called, even if the full RX work budget has been consumed. This prevents
      the counts from becoming stuck when RX bandwidth usage is high.
      
      This also ensures that error counters are not updated after we've
      re-enabled interrupts as that could result in a race condition.
      
      Also drop an unnecessary space.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
      Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Link: https://lore.kernel.org/r/20240402145305.82148-2-paul.barker.ct@bp.renesas.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      101b7641
    • Paul Barker's avatar
      net: ravb: Always process TX descriptor ring · 596a4254
      Paul Barker authored
      The TX queue should be serviced each time the poll function is called,
      even if the full RX work budget has been consumed. This prevents
      starvation of the TX queue when RX bandwidth usage is high.
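
      The control flow described above can be modelled in a few lines of
      userspace C. This is a sketch, not the ravb poll function: struct
      demo_queue and demo_poll() are hypothetical, and simple counters stand
      in for the descriptor rings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue state; the real driver works on descriptor rings. */
struct demo_queue {
	int rx_pending;   /* packets waiting to be received */
	int tx_done;      /* transmitted descriptors waiting to be reclaimed */
	int tx_reclaimed; /* descriptors reclaimed so far */
};

/*
 * Poll handler sketch: receive up to 'budget' packets, then reclaim TX
 * descriptors unconditionally. Returning early when the RX budget is
 * exhausted would skip the TX step and could starve it under RX load.
 */
int demo_poll(struct demo_queue *q, int budget)
{
	int work_done = 0;

	assert(q != NULL && budget > 0);

	while (work_done < budget && q->rx_pending > 0) {
		q->rx_pending--;
		work_done++;
	}

	/* Always service TX, even if work_done == budget. */
	q->tx_reclaimed += q->tx_done;
	q->tx_done = 0;

	return work_done;
}
```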
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
      Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      596a4254
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: discard table flag update with pending basechain deletion · 1bc83a01
      Pablo Neira Ayuso authored
      Hook unregistration is deferred to the commit phase, as are hook
      updates triggered by the table dormant flag. When both commands are
      combined, the basechain is deleted while its hook is left registered
      in the core.
      
      Fixes: 179d9ba5 ("netfilter: nf_tables: fix table flag updates")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1bc83a01
    • Ziyang Xuan's avatar
      netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() · 24225011
      Ziyang Xuan authored
      nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
      run concurrently with __nft_flowtable_type_get() within
      nf_tables_newflowtable(). There is no protection when iterating over
      the nf_tables_flowtables list in __nft_flowtable_type_get(), so there
      is a potential data race on nf_tables_flowtables list entries.
      
      Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
      in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
      nft_flowtable_type_get() to protect the entire type query process.
      
      Fixes: 3b49e2e9 ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      24225011
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: reject new basechain after table flag update · 994209dd
      Pablo Neira Ayuso authored
      When dormant flag is toggled, hooks are disabled in the commit phase by
      iterating over current chains in table (existing and new).
      
      The following configuration allows for an inconsistent state:
      
        add table x
        add chain x y { type filter hook input priority 0; }
        add table x { flags dormant; }
        add chain x w { type filter hook input priority 1; }
      
      which triggers the following warning when trying to unregister chain w
      which is already unregistered.
      
      [  127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
      [...]
      [  127.322519] Call Trace:
      [  127.322521]  <TASK>
      [  127.322524]  ? __warn+0x9f/0x1a0
      [  127.322531]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322537]  ? report_bug+0x1b1/0x1e0
      [  127.322545]  ? handle_bug+0x3c/0x70
      [  127.322552]  ? exc_invalid_op+0x17/0x40
      [  127.322556]  ? asm_exc_invalid_op+0x1a/0x20
      [  127.322563]  ? kasan_save_free_info+0x3b/0x60
      [  127.322570]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322577]  ? __nf_unregister_net_hook+0x21a/0x260
      [  127.322583]  ? __nf_unregister_net_hook+0x6a/0x260
      [  127.322590]  ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
      [  127.322655]  nft_table_disable+0x75/0xf0 [nf_tables]
      [  127.322717]  nf_tables_commit+0x2571/0x2620 [nf_tables]
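
      The rule in the subject line (reject a new basechain once a table flag
      update is pending in the same batch) can be modelled as below. This is
      a toy state model under stated assumptions, not nf_tables code: the
      struct, the function names, and the -EINVAL return value are all
      hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table state tracked while a batch is being built. */
struct demo_table {
	bool flag_update_pending; /* dormant flag toggled earlier in the batch */
	int nr_basechains;        /* basechains accepted so far */
};

/* Queue a dormant-flag change for the commit phase. */
void demo_set_dormant(struct demo_table *t)
{
	assert(t != NULL);
	t->flag_update_pending = true;
}

/*
 * Add a basechain. Once a flag update is pending, the commit phase will
 * walk every chain in the table to disable its hook, including chains
 * whose hooks were never registered, so the new chain is refused.
 */
int demo_add_basechain(struct demo_table *t)
{
	assert(t != NULL);
	if (t->flag_update_pending)
		return -EINVAL;
	t->nr_basechains++;
	return 0;
}
```

      In terms of the configuration above, chain y would be accepted and
      chain w, which follows the dormant flag update, would be rejected.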
      
      Fixes: 179d9ba5 ("netfilter: nf_tables: fix table flag updates")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      994209dd
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: flush pending destroy work before exit_net release · 24cea967
      Pablo Neira Ayuso authored
      Similar to 2c9f0293 ("netfilter: nf_tables: flush pending destroy
      work before netlink notifier") to address a race between exit_net and
      the destroy workqueue.
      
      The trace below shows an element to be released via destroy workqueue
      while exit_net path (triggered via module removal) has already released
      the set that is used in such transaction.
      
      [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
      [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
      [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [ 1360.547984] Call Trace:
      [ 1360.547991]  <TASK>
      [ 1360.547998]  dump_stack_lvl+0x53/0x70
      [ 1360.548014]  print_report+0xc4/0x610
      [ 1360.548026]  ? __virt_addr_valid+0xba/0x160
      [ 1360.548040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [ 1360.548054]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548176]  kasan_report+0xae/0xe0
      [ 1360.548189]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548312]  nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
      [ 1360.548447]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
      [ 1360.548577]  ? _raw_spin_unlock_irq+0x18/0x30
      [ 1360.548591]  process_one_work+0x2f1/0x670
      [ 1360.548610]  worker_thread+0x4d3/0x760
      [ 1360.548627]  ? __pfx_worker_thread+0x10/0x10
      [ 1360.548640]  kthread+0x16b/0x1b0
      [ 1360.548653]  ? __pfx_kthread+0x10/0x10
      [ 1360.548665]  ret_from_fork+0x2f/0x50
      [ 1360.548679]  ? __pfx_kthread+0x10/0x10
      [ 1360.548690]  ret_from_fork_asm+0x1a/0x30
      [ 1360.548707]  </TASK>
      
      [ 1360.548719] Allocated by task 192061:
      [ 1360.548726]  kasan_save_stack+0x20/0x40
      [ 1360.548739]  kasan_save_track+0x14/0x30
      [ 1360.548750]  __kasan_kmalloc+0x8f/0xa0
      [ 1360.548760]  __kmalloc_node+0x1f1/0x450
      [ 1360.548771]  nf_tables_newset+0x10c7/0x1b50 [nf_tables]
      [ 1360.548883]  nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
      [ 1360.548909]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.548927]  netlink_unicast+0x367/0x4f0
      [ 1360.548935]  netlink_sendmsg+0x34b/0x610
      [ 1360.548944]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.548953]  ___sys_sendmsg+0xc9/0x120
      [ 1360.548961]  __sys_sendmsg+0xbe/0x140
      [ 1360.548971]  do_syscall_64+0x55/0x120
      [ 1360.548982]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      [ 1360.548994] Freed by task 192222:
      [ 1360.548999]  kasan_save_stack+0x20/0x40
      [ 1360.549009]  kasan_save_track+0x14/0x30
      [ 1360.549019]  kasan_save_free_info+0x3b/0x60
      [ 1360.549028]  poison_slab_object+0x100/0x180
      [ 1360.549036]  __kasan_slab_free+0x14/0x30
      [ 1360.549042]  kfree+0xb6/0x260
      [ 1360.549049]  __nft_release_table+0x473/0x6a0 [nf_tables]
      [ 1360.549131]  nf_tables_exit_net+0x170/0x240 [nf_tables]
      [ 1360.549221]  ops_exit_list+0x50/0xa0
      [ 1360.549229]  free_exit_list+0x101/0x140
      [ 1360.549236]  unregister_pernet_operations+0x107/0x160
      [ 1360.549245]  unregister_pernet_subsys+0x1c/0x30
      [ 1360.549254]  nf_tables_module_exit+0x43/0x80 [nf_tables]
      [ 1360.549345]  __do_sys_delete_module+0x253/0x370
      [ 1360.549352]  do_syscall_64+0x55/0x120
      [ 1360.549360]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      (gdb) list *__nft_release_table+0x473
      0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
      11349           list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
      11350                   list_del(&flowtable->list);
      11351                   nft_use_dec(&table->use);
      11352                   nf_tables_flowtable_destroy(flowtable);
      11353           }
      11354           list_for_each_entry_safe(set, ns, &table->sets, list) {
      11355                   list_del(&set->list);
      11356                   nft_use_dec(&table->use);
      11357                   if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
      11358                           nft_map_deactivate(&ctx, set);
      (gdb)
      
      [ 1360.549372] Last potentially related work creation:
      [ 1360.549376]  kasan_save_stack+0x20/0x40
      [ 1360.549384]  __kasan_record_aux_stack+0x9b/0xb0
      [ 1360.549392]  __queue_work+0x3fb/0x780
      [ 1360.549399]  queue_work_on+0x4f/0x60
      [ 1360.549407]  nft_rhash_remove+0x33b/0x340 [nf_tables]
      [ 1360.549516]  nf_tables_commit+0x1c6a/0x2620 [nf_tables]
      [ 1360.549625]  nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
      [ 1360.549647]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
      [ 1360.549671]  netlink_unicast+0x367/0x4f0
      [ 1360.549680]  netlink_sendmsg+0x34b/0x610
      [ 1360.549690]  ____sys_sendmsg+0x4d4/0x510
      [ 1360.549697]  ___sys_sendmsg+0xc9/0x120
      [ 1360.549706]  __sys_sendmsg+0xbe/0x140
      [ 1360.549715]  do_syscall_64+0x55/0x120
      [ 1360.549725]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
      
      Fixes: 0935d558 ("netfilter: nf_tables: asynchronous release")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      24cea967
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path · 0d459e2f
      Pablo Neira Ayuso authored
      The commit mutex should not be released during the critical section
      between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
      worker could collect expired objects and get the released commit lock
      within the same GC sequence.
      
      nf_tables_module_autoload() temporarily releases the mutex to load
      module dependencies, then it goes back to replay the transaction again.
      Move it to the end of the abort phase, after nft_gc_seq_end() is called.
      
      Cc: stable@vger.kernel.org
      Fixes: 72034434 ("netfilter: nf_tables: GC transaction race with abort path")
      Reported-by: Kuan-Ting Chen <hexrabbit@devco.re>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0d459e2f