1. 13 Oct, 2022 4 commits
    • ipv6: Fix data races around sk->sk_prot. · 364f997b
      Kuniyuki Iwashima authored
      Commit 086d4905 ("ipv6: annotate some data-races around sk->sk_prot")
      fixed some data-races around sk->sk_prot but it was not enough.
      
      Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
      without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
      load tearing.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
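As a rough userspace illustration of the pattern described in this commit (not the kernel implementation), the sketch below snapshots `sk->sk_prot` once through a `READ_ONCE()` stand-in and then uses only the snapshot. The macros are simplified volatile-cast versions that assume the GCC/Clang `__typeof__` extension; `struct proto`, `struct sock`, `sock_ioctl()` and the two `*_prot` tables are hypothetical miniatures, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumes GCC/Clang __typeof__). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical miniature of a protocol ops table. */
struct proto {
    const char *name;
    int (*ioctl)(int cmd);
};

/* Hypothetical miniature socket: only the field under discussion. */
struct sock {
    const struct proto *sk_prot;
};

static int v6_ioctl(int cmd) { return cmd + 6; }
static int v4_ioctl(int cmd) { return cmd + 4; }

static const struct proto v6_prot = { "v6", v6_ioctl };
static const struct proto v4_prot = { "v4", v4_ioctl };

/*
 * Lockless reader: load sk_prot exactly once, then use only the local copy,
 * so a concurrent writer cannot make the NULL check and the call disagree.
 */
static int sock_ioctl(struct sock *sk, int cmd)
{
    const struct proto *prot = READ_ONCE(sk->sk_prot);

    assert(prot != NULL);
    if (!prot->ioctl)
        return -1;
    return prot->ioctl(cmd);
}
```

A writer would pair this with `WRITE_ONCE(sk->sk_prot, &v4_prot)`; the point of the sketch is only that the reader dereferences one consistent snapshot.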
    • tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · d38afeec
      Kuniyuki Iwashima authored
      Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
      able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
      IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
      Add lockless sendmsg() support") added a lockless memory allocation path,
      which could cause a memory leak:
      
      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case
      
                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.
      
                                                      - goto out_no_dst;
      
                                                  - lock_sock(sk)
      
      For now, rxpmtu is the only such case, but to avoid missing future
      changes and bugs similar to the one fixed in commit e2732600 ("net:
      ping6: Fix memleak in ipv6_renew_options()."), let's install a new
      function as the IPv6 sk->sk_destruct() and call inet6_cleanup_sock()
      there.  Since the conversion does not change sk->sk_destruct(), we can
      guarantee that IPv6 resources are eventually cleaned up.
      
      We can now remove all inet6_destroy_sock() calls from IPv6 protocol
      specific ->destroy() functions, but such changes are invasive to
      backport.  So they can be posted as a follow-up later for net-next.
      
      Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
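The idea of cleaning up in a destructor that survives the protocol switch can be sketched in userspace as follows. This is a toy model under stated assumptions: `struct sock`, `struct proto`, `sock_init_v6()`, `convert_to_v4()` and `sock_destroy()` are hypothetical names, and `free()` stands in for whatever the kernel uses to release `rxpmtu`.

```c
#include <assert.h>
#include <stdlib.h>

struct sock;

/* Hypothetical miniature ops table: only the per-protocol destroy hook. */
struct proto {
    void (*destroy)(struct sock *sk);
};

struct sock {
    const struct proto *prot;          /* swapped by the v6 -> v4 conversion */
    void (*destruct)(struct sock *sk); /* NOT swapped by the conversion */
    void *rxpmtu;                      /* IPv6-only resource */
};

static void v4_destroy(struct sock *sk) { (void)sk; } /* knows nothing about rxpmtu */
static void v6_destroy(struct sock *sk) { (void)sk; }

static const struct proto v4_prot = { v4_destroy };
static const struct proto v6_prot = { v6_destroy };

/* Final cleanup of IPv6 state, reached regardless of the current prot. */
static void v6_destruct(struct sock *sk)
{
    free(sk->rxpmtu);
    sk->rxpmtu = NULL;
}

static void sock_init_v6(struct sock *sk)
{
    sk->prot = &v6_prot;
    sk->destruct = v6_destruct;
    sk->rxpmtu = NULL;
}

/* Conversion changes the ops table only; the destructor stays in place. */
static void convert_to_v4(struct sock *sk)
{
    sk->prot = &v4_prot;
}

static void sock_destroy(struct sock *sk)
{
    assert(sk->prot && sk->destruct);
    sk->prot->destroy(sk);
    sk->destruct(sk);
}
```

Even if `rxpmtu` is populated after `convert_to_v4()` (modelling the lockless path in the diagram), `sock_destroy()` still releases it because `destruct` was never replaced.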
    • udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 21985f43
      Kuniyuki Iwashima authored
      Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
      to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
      socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
      changed to udp_prot and ->destroy() never cleans it up, resulting in
      a memory leak.
      
      This is due to the discrepancy between inet6_destroy_sock() and
      IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
      to remove the difference.
      
      However, this is not enough for now because rxpmtu can be changed
      without lock_sock() after commit 03485f2a ("udpv6: Add lockless
      sendmsg() support").  We will fix this case in the following patch.
      
      Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
      remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
      in the future.
      
      Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp/udp: Fix memory leak in ipv6_renew_options(). · 3c52c6bb
      Kuniyuki Iwashima authored
      syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
      
      The scenario is that while one thread is converting an IPv6 socket into
      IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
      allocates memory to inet6_sk(sk)->XXX after conversion.
      
      Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
      which inet6_destroy_sock() should have cleaned up.
      
      setsockopt(IPV6_ADDRFORM)                 setsockopt(IPV6_DSTOPTS)
      +-----------------------+                 +----------------------+
      - do_ipv6_setsockopt(sk, ...)
        - sockopt_lock_sock(sk)                 - do_ipv6_setsockopt(sk, ...)
          - lock_sock(sk)                         ^._ called via tcpv6_prot
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)          before WRITE_ONCE()
        - xchg(&np->opt, NULL)
        - txopt_put(opt)
        - sockopt_release_sock(sk)
          - release_sock(sk)                      - sockopt_lock_sock(sk)
                                                    - lock_sock(sk)
                                                  - ipv6_set_opt_hdr(sk, ...)
                                                    - ipv6_update_options(sk, opt)
                                                      - xchg(&inet6_sk(sk)->opt, opt)
                                                        ^._ opt is never freed.
      
                                                  - sockopt_release_sock(sk)
                                                    - release_sock(sk)
      
      Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
      memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
      acquiring the lock.
      
      This issue has existed between IPV6_ADDRFORM and IPV6_PKTOPTIONS since
      the initial commit.
      
      [0]:
      BUG: memory leak
      unreferenced object 0xffff888009ab9f80 (size 96):
        comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s)
        hex dump (first 32 bytes):
          01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00  ....H...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline]
          [<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566
          [<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318
          [<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline]
          [<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668
          [<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021
          [<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789
          [<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252
          [<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline]
          [<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline]
          [<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260
          [<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
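The "re-check the family after acquiring the lock" approach described in this commit can be modelled in userspace roughly as below. Everything here is a hypothetical stand-in: the toy lock is a plain flag rather than `lock_sock()`, `FAM_V4`/`FAM_V6` replace the real address-family constants, and returning `-ENOPROTOOPT` for a converted socket is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AF_INET / AF_INET6. */
enum { FAM_V4 = 4, FAM_V6 = 6 };

struct sock {
    int locked;  /* toy lock: 1 while held */
    int family;
    void *opt;   /* models inet6_sk(sk)->opt */
};

static void toy_lock(struct sock *sk)   { assert(!sk->locked); sk->locked = 1; }
static void toy_unlock(struct sock *sk) { assert(sk->locked);  sk->locked = 0; }

/* Models IPV6_ADDRFORM: switch family and drop IPv6-only state under the lock. */
static void convert_to_v4(struct sock *sk)
{
    toy_lock(sk);
    sk->family = FAM_V4;
    free(sk->opt);
    sk->opt = NULL;
    toy_unlock(sk);
}

/* Models an IPv6 option setter that re-checks the family once it holds the lock. */
static int set_v6_opt(struct sock *sk, size_t len)
{
    int ret = -ENOPROTOOPT; /* assumed error code for a converted socket */
    void *opt;

    toy_lock(sk);
    if (sk->family != FAM_V6)
        goto unlock;        /* converted in the meantime: allocate nothing */

    opt = malloc(len);
    if (!opt) {
        ret = -ENOMEM;
        goto unlock;
    }
    free(sk->opt);          /* replace any previous option buffer */
    sk->opt = opt;
    ret = 0;
unlock:
    toy_unlock(sk);
    return ret;
}
```

Because the check happens under the same lock that the conversion holds, a setter that lost the race sees the new family and leaves nothing behind to leak.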
  2. 12 Oct, 2022 12 commits
    • mctp: prevent double key removal and unref · 3a732b46
      Jeremy Kerr authored
      Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
      close may race, as we attempt to remove a key from lists twice, and
      perform an unref for each removal operation. This may result in a
      use-after-free (UAF) when we attempt the second unref.
      
      This change fixes the race by making __mctp_key_remove tolerant to being
      called on a key that has already been removed from the socket/net lists,
      and only performs the unref when we do the actual remove. We also need
      to hold the list lock on the ioctl cleanup path.
      
      This fix is based on a bug report and comprehensive analysis from
      butt3rflyh4ck <butterflyhuangxx@gmail.com>, found via syzkaller.
      
      Cc: stable@vger.kernel.org
      Fixes: 63ed1aab ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control")
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
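A minimal userspace sketch of a removal routine that tolerates being called twice, with the unref tied to the actual removal, might look like this. `struct key`, `key_unref()` and `key_remove()` are hypothetical names for illustration and do not mirror the MCTP data structures; the list lock mentioned in the commit is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature key: a refcount plus a flag for list membership. */
struct key {
    int refs;
    bool on_list;
};

static void key_unref(struct key *k)
{
    assert(k->refs > 0);
    k->refs--;
}

/*
 * Remove the key from its list at most once. The reference held by the list
 * is dropped only when this call performed the removal; a second caller sees
 * on_list == false and does nothing. Returns true if this call removed it.
 */
static bool key_remove(struct key *k)
{
    if (!k->on_list)
        return false;
    k->on_list = false;
    key_unref(k);
    return true;
}
```

In real code both callers would hold the list lock around `key_remove()`, so the `on_list` test and update cannot interleave.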
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · ed5d1f61
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      netfilter fixes for net
      
      This series from Phil Sutter for the *net* tree fixes a problem with a change
      from the 6.1 development phase: the change to nft_fib should have used
      the more recent flowic_l3mdev field.  Pointed out by Guillaume Nault.
      This also makes the older iptables module follow the same pattern.
      
      Also add selftest case and avoid test failure in nft_fib.sh when the
      host environment has set rp_filter=1.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1 · 6a91e727
      Phil Sutter authored
      If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
      setting and thus defeats the fix from bbe4c089 ("selftests:
      netfilter: disable rp_filter on router"). Unset it as well to cover that
      case.
      
      Fixes: bbe4c089 ("selftests: netfilter: disable rp_filter on router")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: rpfilter/fib: Populate flowic_l3mdev field · acc641ab
      Phil Sutter authored
      Use the introduced field for correct operation with VRF devices instead
      of conditionally overwriting flowic_oif. This is a partial revert of
      commit b575b24b ("netfilter: Fix rpfilter dropping vrf packets by
      mistake"), implementing a simpler solution.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • selftests: netfilter: Test reverse path filtering · 6e31ce83
      Phil Sutter authored
      Test reverse path (filter) matches in iptables, ip6tables and nftables,
      both with a regular interface and with a VRF.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • net/mlx5: Make ASO poll CQ usable in atomic context · 739cfa34
      Leon Romanovsky authored
      Poll CQ functions shouldn't sleep as they are called in atomic context.
      The following splat appears once mlx5_aso_poll_cq() is used in such a
      flow.
      
       BUG: scheduling while atomic: swapper/17/0/0x00000100
       Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core fuse [last unloaded: mlx5_core]
       CPU: 17 PID: 0 Comm: swapper/17 Tainted: G        W          6.0.0-rc2+ #13
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x34/0x44
        __schedule_bug.cold+0x47/0x53
        __schedule+0x4b6/0x670
        ? hrtimer_start_range_ns+0x28d/0x360
        schedule+0x50/0x90
        schedule_hrtimeout_range_clock+0x98/0x120
        ? __hrtimer_init+0xb0/0xb0
        usleep_range_state+0x60/0x90
        mlx5_aso_poll_cq+0xad/0x190 [mlx5_core]
        mlx5e_ipsec_aso_update_curlft+0x81/0xb0 [mlx5_core]
        xfrm_timer_handler+0x6b/0x360
        ? xfrm_find_acq_byseq+0x50/0x50
        __hrtimer_run_queues+0x139/0x290
        hrtimer_run_softirq+0x7d/0xe0
        __do_softirq+0xc7/0x272
        irq_exit_rcu+0x87/0xb0
        sysvec_apic_timer_interrupt+0x72/0x90
        </IRQ>
        <TASK>
        asm_sysvec_apic_timer_interrupt+0x16/0x20
       RIP: 0010:default_idle+0x18/0x20
       Code: ae 7d ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 8b 05 b5 30 0d 01 85 c0 7e 07 0f 00 2d 0a e3 53 00 fb f4 <c3> 0f 1f 80 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 80 ad 01 00
       RSP: 0018:ffff888100883ee0 EFLAGS: 00000242
       RAX: 0000000000000001 RBX: ffff888100849580 RCX: 4000000000000000
       RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000008863c
       RBP: 0000000000000011 R08: 00000064e6977fa9 R09: 0000000000000001
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        default_idle_call+0x37/0xb0
        do_idle+0x1cd/0x1e0
        cpu_startup_entry+0x19/0x20
        start_secondary+0xfe/0x120
        secondary_startup_64_no_verify+0xcd/0xdb
        </TASK>
       softirq: huh, entered softirq 8 HRTIMER 00000000a97c08cb with preempt_count 00000100, exited with 00000000?
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: cdg: allow tcp_cdg_release() to be called multiple times · 72e560cb
      Eric Dumazet authored
      Apparently, mptcp is able to call tcp_disconnect() on an already
      disconnected flow. This is generally fine, unless current congestion
      control is CDG, because it might trigger a double-free [1].
      
      Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
      more resilient.
      
      [1]
      BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
      BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
      
      CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: events mptcp_worker
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
      ____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
      __mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
      mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
      mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      </TASK>
      
      Allocated by task 3671:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_array include/linux/slab.h:640 [inline]
      kcalloc include/linux/slab.h:671 [inline]
      tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
      tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
      tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
      tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
      do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
      tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
      mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
      __sys_setsockopt+0x2d6/0x690 net/socket.c:2252
      __do_sys_setsockopt net/socket.c:2263 [inline]
      __se_sys_setsockopt net/socket.c:2260 [inline]
      __x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 16:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
      tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
      tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
      inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
      tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
      tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
      tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
      tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
      ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
      ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
      dst_input include/net/dst.h:455 [inline]
      ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
      ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
      ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
      nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
      NF_HOOK include/linux/netfilter.h:300 [inline]
      ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
      br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
      br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
      br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
      NF_HOOK include/linux/netfilter.h:302 [inline]
      br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
      br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
      br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
      __napi_poll+0xb3/0x6d0 net/core/dev.c:6494
      napi_poll net/core/dev.c:6561 [inline]
      net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
      __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
      
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
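One common way to make a release routine safe to call more than once is to clear the pointer after freeing it. The userspace sketch below shows that idiom; it is an illustration under assumptions, with `struct cdg_state`, `cdg_init()` and `cdg_release()` as hypothetical names and `calloc()`/`free()` standing in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of per-connection congestion-control state. */
struct cdg_state {
    int *gradients;
};

/* Allocate the gradient window; returns 0 on success, -1 on failure. */
static int cdg_init(struct cdg_state *ca, size_t n)
{
    ca->gradients = calloc(n, sizeof(*ca->gradients));
    return ca->gradients ? 0 : -1;
}

/*
 * Idempotent release: free(NULL) is a no-op, so resetting the pointer makes
 * a second call harmless instead of a double free.
 */
static void cdg_release(struct cdg_state *ca)
{
    assert(ca != NULL);
    free(ca->gradients);
    ca->gradients = NULL;
}
```

The design choice is to harden the callee rather than audit every caller, which matches the stated goal of making the disconnect path more resilient.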
    • Merge branch 'inet-ping-fixes' · 4a4462a0
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: ping: give ping some care
      
      First patch fixes an ipv6 ping bug that has been there forever,
      for large sizes.
      
      Second patch fixes a recent and elusive bug, that can potentially
      crash the host. This is what I mentioned privately to Paolo and
      Jakub at LPC in Dublin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: ping: fix recent breakage · 0d24148b
      Eric Dumazet authored
      Blamed commit broke the assumption used by ping sendmsg() that
      allocated skb would have MAX_HEADER bytes in skb->head.
      
      This patch changes the way ping works, by making sure
      the skb head contains space for the icmp header,
      and adjusting ping_getfrag() which was desperate
      about going past the icmp header :/
      
      This is adopting what UDP does, mostly.
      
      syzbot is able to crash a host using both kfence and the following repro in a loop.
      
      fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6)
      connect(fd, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0),
      		inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28)
      sendmsg(fd, {msg_name=NULL, msg_namelen=0, msg_iov=[
      		{iov_base="\200\0\0\0\23\0\0\0\0\0\0\0\0\0"..., iov_len=65496}],
      		msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0)
      
      When kfence triggers, skb->head only has 64 bytes, immediately followed
      by struct skb_shared_info (no extra headroom based on ksize(ptr)).
      
      Then icmpv6_push_pending_frames() overwrites the first bytes
      of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
      and/or setting shinfo->gso_size to a nonzero value.
      
      If nr_frags is mangled, a crash happens in skb_release_data().
      
      If gso_size is mangled, we have the following report:
      
      lo: caps=(0x00000516401d7c69, 0x00000516401d7c69)
      WARNING: CPU: 0 PID: 7548 at net/core/dev.c:3239 skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
      Modules linked in:
      CPU: 0 PID: 7548 Comm: syz-executor268 Not tainted 6.0.0-syzkaller-02754-g557f0501 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      RIP: 0010:skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
      Code: 70 03 00 00 e8 58 c3 24 fa 4c 8d a5 e8 00 00 00 e8 4c c3 24 fa 4c 89 e9 4c 89 e2 4c 89 f6 48 c7 c7 00 53 f5 8a e8 13 ac e7 01 <0f> 0b 5b 5d 41 5c 41 5d 41 5e e9 28 c3 24 fa e8 23 c3 24 fa 48 89
      RSP: 0018:ffffc9000366f3e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff88807a9d9d00 RCX: 0000000000000000
      RDX: ffff8880780c0000 RSI: ffffffff8160f6f8 RDI: fffff520006cde6f
      RBP: ffff888079952000 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000400 R11: 0000000000000000 R12: ffff8880799520e8
      R13: ffff88807a9da070 R14: ffff888079952000 R15: 0000000000000000
      FS: 0000555556be6300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020010000 CR3: 000000006eb7b000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      gso_features_check net/core/dev.c:3521 [inline]
      netif_skb_features+0x83e/0xb90 net/core/dev.c:3554
      validate_xmit_skb+0x2b/0xf10 net/core/dev.c:3659
      __dev_queue_xmit+0x998/0x3ad0 net/core/dev.c:4248
      dev_queue_xmit include/linux/netdevice.h:3008 [inline]
      neigh_hh_output include/net/neighbour.h:530 [inline]
      neigh_output include/net/neighbour.h:544 [inline]
      ip6_finish_output2+0xf97/0x1520 net/ipv6/ip6_output.c:134
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:445 [inline]
      ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:161
      ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1966
      ip6_push_pending_frames+0xdd/0x100 net/ipv6/ip6_output.c:1986
      icmpv6_push_pending_frames+0x2af/0x490 net/ipv6/icmp.c:303
      ping_v6_sendmsg+0xc44/0x1190 net/ipv6/ping.c:190
      inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x712/0x8c0 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f21aab42b89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff1729d038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f21aab42b89
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
      R10: 000000000000000d R11: 0000000000000246 R12: 00007fff1729d050
      R13: 00000000000f4240 R14: 0000000000021dd1 R15: 00007fff1729d044
      </TASK>
      
      Fixes: 47cf8899 ("net: unify alloclen calculation for paged requests")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: ping: fix wrong checksum for large frames · 87445f36
      Eric Dumazet authored
      For a given ping datagram, ping_getfrag() is called once
      per skb fragment.
      
      A large datagram requiring more than one page fragment
      is currently getting the checksum of the last fragment,
      instead of the cumulative one.
      
      After this patch, "ping -s 35000 ::1" is working correctly.
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
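The difference between keeping only the last fragment's checksum and accumulating across fragments can be shown with a small ones' complement sum in userspace. This is a sketch under assumptions: `sum16_add()` and `sum16_finish()` are hypothetical helpers (not the kernel checksum API), and every fragment except the last is assumed to have even length so that 16-bit word boundaries line up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add the big-endian 16-bit words of buf to a running 32-bit sum.
 * Passing the previous return value as `sum` chains fragments together.
 */
static uint32_t sum16_add(const uint8_t *buf, size_t len, uint32_t sum)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                      /* odd trailing byte, zero-padded */
        sum += (uint32_t)buf[i] << 8;
    return sum;
}

/* Fold the carries back in and complement to get the 16-bit checksum. */
static uint16_t sum16_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Feeding the result of the first fragment into the call for the second yields the same checksum as one pass over the whole datagram, whereas starting from zero on the last fragment checksums only that fragment.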
    • net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports · 7e777b1b
      Matthias Schiffer authored
      am65_cpsw_nuss_register_ndevs() skips calling devlink_port_type_eth_set()
      for ports without assigned netdev, triggering the following warning when
      DEVLINK_PORT_TYPE_WARN_TIMEOUT elapses after 3600s:
      
          Type was not set for devlink port.
          WARNING: CPU: 0 PID: 129 at net/core/devlink.c:8095 devlink_port_type_warn+0x18/0x30
      
      Fixes: 0680e20a ("net: ethernet: ti: am65-cpsw: Fix devlink port register sequence")
      Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 72da9dc2
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.1
      
      First set of fixes for v6.1. Quite a lot of fixes in stack but also
      for mt76.
      
      cfg80211/mac80211
       - fix locking error in mac80211's hw addr change
       - fix TX queue stop for internal TXQs
       - handling of very small (e.g. STP TCN) packets
       - two memcpy() hardening fixes
       - fix probe request 6 GHz capability warning
       - fix various connection prints
       - fix decapsulation offload for AP VLAN
      
      mt76
       - fix rate reporting, LLC packets and receive checksum offload on specific chipsets
      
      iwlwifi
       - fix crash due to list corruption
      
      ath11k
       - fix a compiler warning with GCC 11 and KASAN
      
      * tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
        wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)
        wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
        wifi: mt76: fix receiving LLC packets on mt7615/mt7915
        wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
        wifi: wext: use flex array destination for memcpy()
        wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
        wifi: mac80211: netdev compatible TX stop for iTXQ drivers
        wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces
        wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
        wifi: mac80211: remove/avoid misleading prints
        wifi: mac80211: fix probe req HE capabilities access
        wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
        wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer
      ====================
      
      Link: https://lore.kernel.org/r/20221011163123.A093CC433D6@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 11 Oct, 2022 14 commits
  4. 10 Oct, 2022 2 commits
  5. 09 Oct, 2022 5 commits
  6. 07 Oct, 2022 3 commits
    • wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array · 10d5ea5a
      Kees Cook authored
      To work around a misbehavior of the compiler's ability to see into
      composite flexible array structs (as detailed in the coming memcpy()
      hardening series[1]), split the memcpy() of the header and the payload
      so no false positive run-time overflow warning will be generated.
      
      [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org/

      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      10d5ea5a
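The header/payload split described in this commit can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the actual nl80211 change: `struct token_msg` and `dup_token()` are hypothetical names, and the real layout of struct nl80211_wowlan_tcp_data_token is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a struct that ends in a flexible array. */
struct token_msg {
	uint32_t offset;
	uint32_t len;
	uint8_t  token_stream[];	/* payload follows the fixed header */
};

/*
 * Duplicate src, which carries payload_len bytes of payload.
 * The copy is done in two steps so that each memcpy() has a destination
 * whose bounds are obvious: first the fixed-size header, then the
 * flexible array itself.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct token_msg *dup_token(const struct token_msg *src,
				   size_t payload_len)
{
	struct token_msg *dst;

	assert(src != NULL);

	dst = malloc(sizeof(*dst) + payload_len);
	if (!dst)
		return NULL;

	/* Step 1: header only; sizeof(*dst) excludes the flexible array. */
	memcpy(dst, src, sizeof(*dst));
	/* Step 2: payload, written into the flexible array member. */
	memcpy(dst->token_stream, src->token_stream, payload_len);

	return dst;
}
```

A single memcpy() of header plus payload into the header's address would copy the same bytes, but a bounds checker that only sees the header's size could flag it; splitting the copy avoids that without changing the result.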
    • wifi: wext: use flex array destination for memcpy() · e3e6e1d1
      Hawkins Jiawei authored
      Syzkaller reports a buffer overflow false positive as follows:
      ------------[ cut here ]------------
      memcpy: detected field-spanning write (size 8) of single field
      	"&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4)
      WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623
      	wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623
      Modules linked in:
      CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted
      	6.0.0-rc6-next-20220921-syzkaller #0
      [...]
      Call Trace:
       <TASK>
       ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022
       wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955
       wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline]
       wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline]
       wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049
       sock_ioctl+0x285/0x640 net/socket.c:1220
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      
      wireless_send_event() sends wireless events on the appropriate
      channels. Different wireless events may have different payload
      structures and sizes, so the kernel uses the **len** and **cmd** fields
      of struct __compat_iw_event as the common LCP part of a wireless event,
      and uses **pointer** as a label marking where the remaining,
      event-specific part begins.
      
      The problem is that **pointer** is of type compat_caddr_t, which may be
      smaller than the corresponding structure at the same position. So when
      wireless_send_event() parses the wireless event payload, it may trigger
      the memcpy() run-time destination buffer bounds check when the
      structure's data is copied to the position marked by **pointer**.
      
      This patch solves it by introducing a flexible-array field,
      **ptr_bytes**, to mark the position of the remaining part of the
      wireless event, next to the LCP part. In addition, this patch adds a
      **ptr_len** variable in wireless_send_event() to improve its
      maintainability.
      
      Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      e3e6e1d1
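The idea of using a flexible array as the memcpy() destination can be sketched in userspace C as below. It is a simplified illustration, not the code from this patch: `struct sketch_event`, `SKETCH_LCP_LEN` and `fill_event()` are hypothetical names, and the real struct __compat_iw_event keeps its **pointer** field, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified event: a fixed leading part, then variable bytes. */
struct sketch_event {
	uint16_t len;
	uint16_t cmd;
	uint8_t  ptr_bytes[];	/* marks where the event-specific part starts */
};

/* Size of the common leading part shared by every event in this sketch. */
#define SKETCH_LCP_LEN offsetof(struct sketch_event, ptr_bytes)

/*
 * Build an event in buf (which must be suitably aligned, e.g. from malloc()).
 * Returns the total event length, or 0 if the event does not fit in buf
 * or its length cannot be stored in the 16-bit len field.
 */
static size_t fill_event(void *buf, size_t buf_len, uint16_t cmd,
			 const void *payload, size_t payload_len)
{
	struct sketch_event *ev = buf;
	size_t total = SKETCH_LCP_LEN + payload_len;

	assert(buf != NULL && payload != NULL);

	if (total > buf_len || total > UINT16_MAX)
		return 0;

	ev->len = (uint16_t)total;
	ev->cmd = cmd;
	/*
	 * The destination is the flexible array, so the copy is not
	 * attributed to a small fixed-size field that the payload may
	 * legitimately exceed.
	 */
	memcpy(ev->ptr_bytes, payload, payload_len);

	return total;
}
```

With a 4-byte field as the destination, an 8-byte copy looks like a field-spanning write even when the buffer behind it is large enough; a flexible array has no fixed size for the check to trip over.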
    • wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets · d9e24970
      Felix Fietkau authored
      STP topology change notification packets only have a payload of 7 bytes,
      so they get dropped due to the skb->len < hdrlen + 8 check.
      Fix this by removing the extra 8 from the skb->len check and checking the
      return code on the skb_copy_bits calls.
      
      Fixes: 2d1c304c ("cfg80211: add function for 802.3 conversion with separate output buffer")
      Reported-by: Chad Monroe <chad.monroe@smartrg.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      d9e24970
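The effect of relaxing the length check can be sketched in userspace C as below. This is an illustration under assumptions, not the cfg80211 code: `copy_bits()` is a hypothetical helper loosely modelled on the idea of a bounds-checked copy such as skb_copy_bits(), and `accept_old()` and `read_payload()` are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical bounds-checked copy: returns 0 on success, -1 if the
 * requested range [offset, offset + len) falls outside the packet.
 */
static int copy_bits(const uint8_t *pkt, size_t pkt_len, size_t offset,
		     void *to, size_t len)
{
	if (offset > pkt_len || len > pkt_len - offset)
		return -1;
	memcpy(to, pkt + offset, len);
	return 0;
}

/* Old-style check: demands 8 bytes after the header, whatever is read. */
static bool accept_old(size_t pkt_len, size_t hdrlen)
{
	return pkt_len >= hdrlen + 8;
}

/*
 * New-style handling: only require the header up front, then rely on the
 * return code of each copy to reject reads that run past the packet.
 */
static bool read_payload(const uint8_t *pkt, size_t pkt_len, size_t hdrlen,
			 uint8_t *out, size_t want)
{
	assert(pkt != NULL && out != NULL);

	if (pkt_len < hdrlen)
		return false;
	return copy_bits(pkt, pkt_len, hdrlen, out, want) == 0;
}
```

For a packet with a 24-byte header and a 7-byte payload, `accept_old()` rejects the packet outright, while `read_payload()` accepts a 7-byte read and still refuses an 8-byte one.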