1. 19 Feb, 2022 5 commits
  2. 18 Feb, 2022 6 commits
    • Xiaoke Wang's avatar
      net: ll_temac: check the return value of devm_kmalloc() · b352c346
      Xiaoke Wang authored
      devm_kmalloc() returns a pointer to allocated memory on success, NULL
      on failure. While lp->indirect_lock is allocated by devm_kmalloc()
      without proper check. It is better to check the value of it to
      prevent potential wrong memory access.
      
      Fixes: f14f5c11 ("net: ll_temac: Support indirect_mutex share within TEMAC IP")
      Signed-off-by: default avatarXiaoke Wang <xkernel.wang@foxmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b352c346
    • Eric Dumazet's avatar
      net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Eric Dumazet authored
      UDP sendmsg() can be lockless, this is causing all kinds
      of data races.
      
      This patch converts sk->sk_tskey to remove one of these races.
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1cdec57
    • Oliver Neukum's avatar
      sr9700: sanity check for packet length · e9da0b56
      Oliver Neukum authored
      A malicious device can leak heap data to user space
      providing bogus frame lengths. Introduce a sanity check.
      Signed-off-by: default avatarOliver Neukum <oneukum@suse.com>
      Reviewed-by: default avatarGrant Grundler <grundler@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9da0b56
    • Paul Blakey's avatar
      net/sched: act_ct: Fix flow table lookup after ct clear or switching zones · 2f131de3
      Paul Blakey authored
      Flow table lookup is skipped if packet either went through ct clear
      action (which set the IP_CT_UNTRACKED flag on the packet), or while
      switching zones and there is already a connection associated with
      the packet. This will result in no SW offload of the connection,
      and the and connection not being removed from flow table with
      TCP teardown (fin/rst packet).
      
      To fix the above, remove these unneccary checks in flow
      table lookup.
      
      Fixes: 46475bb2 ("net/sched: act_ct: Software offload of established flows")
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2f131de3
    • suresh kumar's avatar
      net-sysfs: add check for netdevice being present to speed_show · 4224cfd7
      suresh kumar authored
      When bringing down the netdevice or system shutdown, a panic can be
      triggered while accessing the sysfs path because the device is already
      removed.
      
          [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
          [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
          ...
          [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
          [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
      
          crash> bt
          ...
          PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
          ...
           #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
              [exception RIP: dma_pool_alloc+0x1ab]
              RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
              RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
              RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
              RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
              R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
              R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
          #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
          #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
          #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
          #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
          #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
          #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
          #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
          #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
          #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
          #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
          #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
          #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
          #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
          #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
          #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
          #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
      
          crash> net_device.state ffff89443b0c0000
            state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
      
      To prevent this scenario, we also make sure that the netdevice is present.
      Signed-off-by: default avatarsuresh kumar <suresh2514@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4224cfd7
    • Duoming Zhou's avatar
      drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() · efe4186e
      Duoming Zhou authored
      When a 6pack device is detaching, the sixpack_close() will act to cleanup
      necessary resources. Although del_timer_sync() in sixpack_close()
      won't return if there is an active timer, one could use mod_timer() in
      sp_xmit_on_air() to wake up timer again by calling userspace syscall such
      as ax25_sendmsg(), ax25_connect() and ax25_ioctl().
      
      This unexpected waked handler, sp_xmit_on_air(), realizes nothing about
      the undergoing cleanup and may still call pty_write() to use driver layer
      resources that have already been released.
      
      One of the possible race conditions is shown below:
      
            (USE)                      |      (FREE)
      ax25_sendmsg()                   |
       ax25_queue_xmit()               |
        ...                            |
        sp_xmit()                      |
         sp_encaps()                   | sixpack_close()
          sp_xmit_on_air()             |  del_timer_sync(&sp->tx_t)
           mod_timer(&sp->tx_t,...)    |  ...
                                       |  unregister_netdev()
                                       |  ...
           (wait a while)              | tty_release()
                                       |  tty_release_struct()
                                       |   release_tty()
          sp_xmit_on_air()             |    tty_kref_put(tty_struct) //FREE
           pty_write(tty_struct) //USE |    ...
      
      The corresponding fail log is shown below:
      ===============================================================
      BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470
      Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0
      ...
      Call Trace:
        ...
        queue_work_on+0x3f/0x50
        pty_write+0xcd/0xe0pty_write+0xcd/0xe0
        sp_xmit_on_air+0xb2/0x1f0
        call_timer_fn+0x28/0x150
        __run_timers.part.0+0x3c2/0x470
        run_timer_softirq+0x3b/0x80
        __do_softirq+0xf1/0x380
        ...
      
      This patch reorders the del_timer_sync() after the unregister_netdev()
      to avoid UAF bugs. Because the unregister_netdev() is well synchronized,
      it flushs out any pending queues, waits the refcount of net_device
      decreases to zero and removes net_device from kernel. There is not any
      running routines after executing unregister_netdev(). Therefore, we could
      not arouse timer from userspace again.
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: default avatarLin Ma <linma@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efe4186e
  3. 17 Feb, 2022 23 commits
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7a2fb912
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2022-02-17
      
      We've added 8 non-merge commits during the last 7 day(s) which contain
      a total of 8 files changed, 119 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) Add schedule points in map batch ops, from Eric.
      
      2) Fix bpf_msg_push_data with len 0, from Felix.
      
      3) Fix crash due to incorrect copy_map_value, from Kumar.
      
      4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
      
      5) Fix a bpf_timer initialization issue with clang, from Yonghong.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Add schedule points in batch ops
        bpf: Fix crash due to out of bounds access into reg2btf_ids.
        selftests: bpf: Check bpf_msg_push_data return value
        bpf: Fix a bpf_timer initialization issue
        bpf: Emit bpf_timer in vmlinux BTF
        selftests/bpf: Add test for bpf_timer overwriting crash
        bpf: Fix crash due to incorrect copy_map_value
        bpf: Do not try bpf_msg_push_data with len 0
      ====================
      
      Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7a2fb912
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8b97cae3
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - dsa: lantiq_gswip: fix use after free in gswip_remove()
      
         - smc: avoid overwriting the copies of clcsock callback functions
      
        Current release - new code bugs:
      
         - iwlwifi:
            - fix use-after-free when no FW is present
            - mei: fix the pskb_may_pull check in ipv4
            - mei: retry mapping the shared area
            - mvm: don't feed the hardware RFKILL into iwlmei
      
        Previous releases - regressions:
      
         - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
      
         - tipc: fix wrong publisher node address in link publications
      
         - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
           assertion
      
         - bgmac: make idm and nicpm resource optional again
      
         - atl1c: fix tx timeout after link flap
      
        Previous releases - always broken:
      
         - vsock: remove vsock from connected table when connect is
           interrupted by a signal
      
         - ping: change destination interface checks to match raw sockets
      
         - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
           semantics (and null-deref) after SO_RESERVE_MEM was added
      
         - ipv6: make exclusive flowlabel checks per-netns
      
         - bonding: force carrier update when releasing slave
      
         - sched: limit TC_ACT_REPEAT loops
      
         - bridge: multicast: notify switchdev driver whenever MC processing
           gets disabled because of max entries reached
      
         - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
      
         - iwlwifi: fix locking when "HW not ready"
      
         - phy: mediatek: remove PHY mode check on MT7531
      
         - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
      
         - dsa: lan9303:
            - fix polarity of reset during probe
            - fix accelerated VLAN handling"
      
      * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        bonding: force carrier update when releasing slave
        nfp: flower: netdev offload check for ip6gretap
        ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
        ipv4: fix data races in fib_alias_hw_flags_set
        net: dsa: lan9303: add VLAN IDs to master device
        net: dsa: lan9303: handle hwaccel VLAN tags
        vsock: remove vsock from connected table when connect is interrupted by a signal
        Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
        ping: fix the dif and sdif check in ping_lookup
        net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
        net: sched: limit TC_ACT_REPEAT loops
        tipc: fix wrong notification node addresses
        net: dsa: lantiq_gswip: fix use after free in gswip_remove()
        ipv6: per-netns exclusive flowlabel checks
        net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
        CDC-NCM: avoid overflow in sanity checking
        mctp: fix use after free
        net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
        bonding: fix data-races around agg_select_timer
        dpaa2-eth: Initialize mutex used in one step timestamping path
        ...
      8b97cae3
    • Zhang Changzhong's avatar
      bonding: force carrier update when releasing slave · a6ab75ce
      Zhang Changzhong authored
      In __bond_release_one(), bond_set_carrier() is only called when bond
      device has no slave. Therefore, if we remove the up slave from a master
      with two slaves and keep the down slave, the master will remain up.
      
      Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
      statement.
      
      Reproducer:
      $ insmod bonding.ko mode=0 miimon=100 max_bonds=2
      $ ifconfig bond0 up
      $ ifenslave bond0 eth0 eth1
      $ ifconfig eth0 down
      $ ifenslave -d bond0 eth1
      $ cat /proc/net/bonding/bond0
      
      Fixes: ff59c456 ("[PATCH] bonding: support carrier state for master")
      Signed-off-by: default avatarZhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a6ab75ce
    • Eric Dumazet's avatar
      bpf: Add schedule points in batch ops · 75134f16
      Eric Dumazet authored
      syzbot reported various soft lockups caused by bpf batch operations.
      
       INFO: task kworker/1:1:27 blocked for more than 140 seconds.
       INFO: task hung in rcu_barrier
      
      Nothing prevents batch ops to process huge amount of data,
      we need to add schedule points in them.
      
      Note that maybe_wait_bpf_programs(map) calls from
      generic_map_delete_batch() can be factorized by moving
      the call after the loop.
      
      This will be done later in -next tree once we get this fix merged,
      unless there is strong opinion doing this optimization sooner.
      
      Fixes: aa2e93b8 ("bpf: Add generic support for update and delete batch ops")
      Fixes: cb4d03ab ("bpf: Add generic support for lookup batch op")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarStanislav Fomichev <sdf@google.com>
      Acked-by: default avatarBrian Vazquez <brianvv@google.com>
      Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
      75134f16
    • Luis Chamberlain's avatar
      fs/file_table: fix adding missing kmemleak_not_leak() · a3580ac9
      Luis Chamberlain authored
      Commit b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl
      to its own file") fixed a regression, however it failed to add a
      kmemleak_not_leak().
      
      Fixes: b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
      Reported-by: default avatarTong Zhang <ztong0001@gmail.com>
      Cc: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3580ac9
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of... · 2dd3a8a1
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix corrupt inject files when only last branch option is enabled with
         ARM CoreSight ETM
      
       - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12
      
       - Defer freeing string after possible strlen() on it in the BPF loader,
         found by gcc 12
      
       - Avoid early exit in 'perf trace' due SIGCHLD from non-workload
         processes
      
       - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
         initialization
      
       - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf
      
       - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
         the CPU iterator
      
       - Sync linux/perf_event.h UAPI with the kernel sources
      
       - Update Jiri Olsa's email address in MAINTAINERS
      
      * tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf bpf: Defer freeing string after possible strlen() on it
        perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
        libsubcmd: Fix use-after-free for realloc(..., 0)
        libperf: Fix perf_cpu_map__for_each_cpu macro
        perf cs-etm: Fix corrupt inject files when only last branch option is enabled
        perf cs-etm: No-op refactor of synth opt usage
        libperf: Fix 32-bit build for tests uint64_t printf
        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
        perf trace: Avoid early exit due SIGCHLD from non-workload processes
        MAINTAINERS: Update Jiri's email address
      2dd3a8a1
    • Linus Torvalds's avatar
      Merge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · edbd6c62
      Linus Torvalds authored
      Pull module fix from Luis Chamberlain:
       "Fixes module decompression when CONFIG_SYSFS=n
      
        The only fix trickled down for v5.17-rc cycle so far is the fix for
        module decompression when CONFIG_SYSFS=n. This was reported through
        0-day"
      
      * tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        module: fix building with sysfs disabled
      edbd6c62
    • Danie du Toit's avatar
      nfp: flower: netdev offload check for ip6gretap · 7dbcda58
      Danie du Toit authored
      IPv6 GRE tunnels are not being offloaded, this is caused by a missing
      netdev offload check. The functionality of IPv6 GRE tunnel offloading
      was previously added but this check was not included. Adding the
      ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
      
      Fixes: f7536ffb ("nfp: flower: Allow ipv6gretap interface for offloading")
      Signed-off-by: default avatarDanie du Toit <danie.dutoit@corigine.com>
      Signed-off-by: default avatarLouis Peens <louis.peens@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7dbcda58
    • Eric Dumazet's avatar
      ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Eric Dumazet authored
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to gi6->offload, fi->trap and fi->offload_failed
      need some basic protection like READ_ONCE()/WRITE_ONCE().
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d95d6320
    • Eric Dumazet's avatar
      ipv4: fix data races in fib_alias_hw_flags_set · 9fcf986c
      Eric Dumazet authored
      fib_alias_hw_flags_set() can be used by concurrent threads,
      and is only RCU protected.
      
      We need to annotate accesses to following fields of struct fib_alias:
      
          offload, trap, offload_failed
      
      Because of READ_ONCE()WRITE_ONCE() limitations, make these
      field u8.
      
      BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
      
      read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
       fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
       fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e8-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 90b93f1b ("ipv4: Add "offload" and "trap" indications to routes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9fcf986c
    • Mans Rullgard's avatar
      net: dsa: lan9303: add VLAN IDs to master device · 430065e2
      Mans Rullgard authored
      If the master device does VLAN filtering, the IDs used by the switch
      must be added for any frames to be received.  Do this in the
      port_enable() function, and remove them in port_disable().
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      430065e2
    • Mans Rullgard's avatar
      net: dsa: lan9303: handle hwaccel VLAN tags · 017b355b
      Mans Rullgard authored
      Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
      use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
      where the VLAN tag has already been consumed by the master.
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      017b355b
    • Linus Torvalds's avatar
      mm: don't try to NUMA-migrate COW pages that have other uses · 80d47f5d
      Linus Torvalds authored
      Oded Gabbay reports that enabling NUMA balancing causes corruption with
      his Gaudi accelerator test load:
      
       "All the details are in the bug, but the bottom line is that somehow,
        this patch causes corruption when the numa balancing feature is
        enabled AND we don't use process affinity AND we use GUP to pin pages
        so our accelerator can DMA to/from system memory.
      
        Either disabling numa balancing, using process affinity to bind to
        specific numa-node or reverting this patch causes the bug to
        disappear"
      
      and Oded bisected the issue to commit 09854ba9 ("mm: do_wp_page()
      simplification").
      
      Now, the NUMA balancing shouldn't actually be changing the writability
      of a page, and as such shouldn't matter for COW.  But it appears it
      does.  Suspicious.
      
      However, regardless of that, the condition for enabling NUMA faults in
      change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
      decide if a COW page should be NUMA-protected or not, and that makes
      absolutely no sense.
      
      The number of mappings a page has is irrelevant: not only does GUP get a
      reference to a page as in Oded's case, but the other mappings migth be
      paged out and the only reference to them would be in the page count.
      
      Since we should never try to NUMA-balance a page that we can't move
      anyway due to other references, just fix the code to use 'page_count()'.
      Oded confirms that that fixes his issue.
      
      Now, this does imply that something in NUMA balancing ends up changing
      page protections (other than the obvious one of making the page
      inaccessible to get the NUMA faulting information).  Otherwise the COW
      simplification wouldn't matter - since doing the GUP on the page would
      make sure it's writable.
      
      The cause of that permission change would be good to figure out too,
      since it clearly results in spurious COW events - but fixing the
      nonsensical test that just happened to work before is obviously the
      CorrectThing(tm) to do regardless.
      
      Fixes: 09854ba9 ("mm: do_wp_page() simplification")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
      Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/Reported-and-tested-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80d47f5d
    • Seth Forshee's avatar
      vsock: remove vsock from connected table when connect is interrupted by a signal · b9208492
      Seth Forshee authored
      vsock_connect() expects that the socket could already be in the
      TCP_ESTABLISHED state when the connecting task wakes up with a signal
      pending. If this happens the socket will be in the connected table, and
      it is not removed when the socket state is reset. In this situation it's
      common for the process to retry connect(), and if the connection is
      successful the socket will be added to the connected table a second
      time, corrupting the list.
      
      Prevent this by calling vsock_remove_connected() if a signal is received
      while waiting for a connection. This is harmless if the socket is not in
      the connected table, and if it is in the table then removing it will
      prevent list corruption from a double add.
      
      Note for backporting: this patch requires d5afa82c ("vsock: correct
      removal of socket from the list"), which is in all current stable trees
      except 4.9.y.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b9208492
    • Jonas Gorski's avatar
      Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" · 6aba04ee
      Jonas Gorski authored
      This reverts commit 3710e809.
      
      Since idm_base and nicpm_base are still optional resources not present
      on all platforms, this breaks the driver for everything except Northstar
      2 (which has both).
      
      The same change was already reverted once with 755f5738 ("net:
      broadcom: fix a mistake about ioremap resource").
      
      So let's do it again.
      
      Fixes: 3710e809 ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
      Signed-off-by: default avatarJonas Gorski <jonas.gorski@gmail.com>
      [florian: Added comments to explain the resources are optional]
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6aba04ee
    • Xin Long's avatar
      ping: fix the dif and sdif check in ping_lookup · 35a79e64
      Xin Long authored
      When 'ping' changes to use PING socket instead of RAW socket by:
      
         # sysctl -w net.ipv4.ping_group_range="0 100"
      
      There is another regression caused when matching sk_bound_dev_if
      and dif, RAW socket is using inet_iif() while PING socket lookup
      is using skb->dev->ifindex, the cmd below fails due to this:
      
        # ip link add dummy0 type dummy
        # ip link set dummy0 up
        # ip addr add 192.168.111.1/24 dev dummy0
        # ping -I dummy0 192.168.111.1 -c1
      
      The issue was also reported on:
      
        https://github.com/iputils/iputils/issues/104
      
      But fixed in iputils in a wrong way by not binding to device when
      destination IP is on device, and it will cause some of kselftests
      to fail, as Jianlin noticed.
      
      This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
      sdif for PING socket, and keep consistent with RAW socket.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35a79e64
    • Daniele Palmas's avatar
      net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 · 21e8a963
      Daniele Palmas authored
      Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
      0x1071 composition in order to avoid bind error.
      Signed-off-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21e8a963
    • Arnaldo Carvalho de Melo's avatar
      perf bpf: Defer freeing string after possible strlen() on it · 31ded153
      Arnaldo Carvalho de Melo authored
      This was detected by the gcc in Fedora Rawhide's gcc:
      
        50    11.01 fedora:rawhide                : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
              inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
          util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
           1225 |                 *key_scan_pos += strlen(map_opt);
                |                                  ^~~~~~~~~~~~~~~
          util/bpf-loader.c:1223:9: note: call to 'free' here
           1223 |         free(map_name);
                |         ^~~~~~~~~~~~~~
          cc1: all warnings being treated as errors
      
      So do the calculations on the pointer before freeing it.
      
      Fixes: 04f9bf2b ("perf bpf-loader: Add missing '*' for key_scan_pos")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      31ded153
    • Eric Dumazet's avatar
      net: sched: limit TC_ACT_REPEAT loops · 5740d068
      Eric Dumazet authored
      We have been living dangerously, at the mercy of malicious users,
      abusing TC_ACT_REPEAT, as shown by this syzpot report [1].
      
      Add an arbitrary limit (32) to the number of times an action can
      return TC_ACT_REPEAT.
      
      v2: switch the limit to 32 instead of 10.
          Use net_warn_ratelimited() instead of pr_err_once().
      
      [1] (C repro available on demand)
      
      rcu: INFO: rcu_preempt self-detected stall on CPU
      rcu:    1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
              (t=10502 jiffies g=5305 q=190)
      rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
      rcu:    Possible timer handling issue on cpu=0 timer-softirq=3527
      rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
      rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      rcu: RCU grace-period kthread stack dump:
      task:rcu_preempt     state:I stack:29344 pid:   14 ppid:     2 flags:0x00004000
      Call Trace:
       <TASK>
       context_switch kernel/sched/core.c:4986 [inline]
       __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
       schedule+0xd2/0x260 kernel/sched/core.c:6368
       schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
       rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
       rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      rcu: Stack dump where RCU GP kthread last ran:
      Sending NMI from CPU 1 to CPUs 0:
      NMI backtrace for cpu 0
      CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
      RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
      RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
      RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
      Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
      RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
      RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
      RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
      RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
      R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
      FS:  00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
       queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
       queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
       do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
       spin_lock_bh include/linux/spinlock.h:354 [inline]
       sch_tree_lock include/net/sch_generic.h:610 [inline]
       sch_tree_lock include/net/sch_generic.h:605 [inline]
       prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
       prio_init+0x5c/0x80 net/sched/sch_prio.c:244
       qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f7ee98aae99
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
      RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
      R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
      R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
       </TASK>
      INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
      NMI backtrace for cpu 1
      CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: mld mld_ifc_work
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
       nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
       trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
       rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
       print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
       check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
       rcu_pending kernel/rcu/tree.c:3919 [inline]
       rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
       update_process_times+0x16d/0x200 kernel/time/timer.c:1785
       tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
       tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
       __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
       __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
       hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
       sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
      RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
      Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
      RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
      RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
      R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
       tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
       tcf_action_exec net/sched/act_api.c:1049 [inline]
       tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
       tcf_exts_exec include/net/pkt_cls.h:326 [inline]
       route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
       __tcf_classify net/sched/cls_api.c:1549 [inline]
       tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
       prio_classify net/sched/sch_prio.c:42 [inline]
       prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
       dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
       __dev_xmit_skb net/core/dev.c:3756 [inline]
       __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
       neigh_hh_output include/net/neighbour.h:533 [inline]
       neigh_output include/net/neighbour.h:547 [inline]
       ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:306 [inline]
       __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
       ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip_output+0x196/0x310 net/ipv4/ip_output.c:430
       dst_output include/net/dst.h:451 [inline]
       ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
       iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
       geneve_xmit_skb drivers/net/geneve.c:966 [inline]
       geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
       __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
       netdev_start_xmit include/linux/netdevice.h:4697 [inline]
       xmit_one net/core/dev.c:3473 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
       __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
       neigh_hh_output include/net/neighbour.h:533 [inline]
       neigh_output include/net/neighbour.h:547 [inline]
       ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
       ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:451 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
       mld_send_cr net/ipv6/mcast.c:2127 [inline]
       mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      ----------------
      Code disassembly (best guess):
         0:   48 89 eb                mov    %rbp,%rbx
         3:   c6 45 01 01             movb   $0x1,0x1(%rbp)
         7:   41 bc 00 80 00 00       mov    $0x8000,%r12d
         d:   48 c1 e9 03             shr    $0x3,%rcx
        11:   83 e3 07                and    $0x7,%ebx
        14:   41 be 01 00 00 00       mov    $0x1,%r14d
        1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
        21:   fc ff df
        24:   4c 8d 2c 01             lea    (%rcx,%rax,1),%r13
        28:   eb 0c                   jmp    0x36
      * 2a:   f3 90                   pause <-- trapping instruction
        2c:   41 83 ec 01             sub    $0x1,%r12d
        30:   0f 84 72 04 00 00       je     0x4a8
        36:   41 0f b6 45 00          movzbl 0x0(%r13),%eax
        3b:   38 d8                   cmp    %bl,%al
        3d:   7f 08                   jg     0x47
        3f:   84                      .byte 0x84
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5740d068
    • Jon Maloy's avatar
      tipc: fix wrong notification node addresses · c08e5843
      Jon Maloy authored
      The previous bug fix had an unfortunate side effect that broke
      distribution of binding table entries between nodes. The updated
      tipc_sock_addr struct is also used further down in the same
      function, and there the old value is still the correct one.
      
      Fixes: 032062f3 ("tipc: fix wrong publisher node address in link publications")
      Signed-off-by: default avatarJon Maloy <jmaloy@redhat.com>
      Link: https://lore.kernel.org/r/20220216020009.3404578-1-jmaloy@redhat.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c08e5843
    • Alexey Khoroshilov's avatar
      net: dsa: lantiq_gswip: fix use after free in gswip_remove() · 8c6ae461
      Alexey Khoroshilov authored
      of_node_put(priv->ds->slave_mii_bus->dev.of_node) should be
      done before mdiobus_free(priv->ds->slave_mii_bus).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Fixes: 0d120dfb ("net: dsa: lantiq_gswip: don't use devres for mdiobus")
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/1644921768-26477-1-git-send-email-khoroshilov@ispras.ruSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8c6ae461
    • Willem de Bruijn's avatar
      ipv6: per-netns exclusive flowlabel checks · 0b0dff5b
      Willem de Bruijn authored
      Ipv6 flowlabels historically require a reservation before use.
      Optionally in exclusive mode (e.g., user-private).
      
      Commit 59c820b2 ("ipv6: elide flowlabel check if no exclusive
      leases exist") introduced a fastpath that avoids this check when no
      exclusive leases exist in the system, and thus any flowlabel use
      will be granted.
      
      That allows skipping the control operation to reserve a flowlabel
      entirely. Though with a warning if the fast path fails:
      
        This is an optimization. Robust applications still have to revert to
        requesting leases if the fast path fails due to an exclusive lease.
      
      Still, this is subtle. Better isolate network namespaces from each
      other. Flowlabels are per-netns. Also record per-netns whether
      exclusive leases are in use. Then behavior does not change based on
      activity in other netns.
      
      Changes
        v2
          - wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
      
      Fixes: 59c820b2 ("ipv6: elide flowlabel check if no exclusive leases exist")
      Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/Reported-by: default avatarCongyu Liu <liu3101@purdue.edu>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Tested-by: default avatarCongyu Liu <liu3101@purdue.edu>
      Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0b0dff5b
    • Oleksandr Mazur's avatar
      net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled · c832962a
      Oleksandr Mazur authored
      Whenever bridge driver hits the max capacity of MDBs, it disables
      the MC processing (by setting corresponding bridge option), but never
      notifies switchdev about such change (the notifiers are called only upon
      explicit setting of this option, through the registered netlink interface).
      
      This could lead to situation when Software MDB processing gets disabled,
      but this event never gets offloaded to the underlying Hardware.
      
      Fix this by adding a notify message in such case.
      
      Fixes: 147c1e9b ("switchdev: bridge: Offload multicast disabled")
      Signed-off-by: default avatarOleksandr Mazur <oleksandr.mazur@plvision.eu>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c832962a
  4. 16 Feb, 2022 6 commits
    • Dmitry Torokhov's avatar
      module: fix building with sysfs disabled · a8e8f851
      Dmitry Torokhov authored
      Sysfs support might be disabled so we need to guard the code that
      instantiates "compression" attribute with an #ifdef.
      
      Fixes: b1ae6dc4 ("module: add in-kernel support for decompressing")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      a8e8f851
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fix crash due to out of bounds access into reg2btf_ids. · 45ce4b4f
      Kumar Kartikeya Dwivedi authored
      When commit e6ac2450 ("bpf: Support bpf program calling kernel function") added
      kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier
      reg type to the appropriate btf_vmlinux BTF ID, however
      commit c25b2ae1 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
      moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after
      the base register types, and defined other variants using type flag
      composition. However, now, the direct usage of reg->type to index into
      reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to
      out of bounds access and kernel crash on dereference of bad pointer.
      
      Fixes: c25b2ae1 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220216201943.624869-1-memxor@gmail.com
      45ce4b4f
    • Linus Torvalds's avatar
      Merge tag 'mmc-v5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · f71077a4
      Linus Torvalds authored
      Pull MMC fix from Ulf Hansson:
       "Fix recovery logic for multi block I/O reads (MMC_READ_MULTIPLE_BLOCK)"
      
      * tag 'mmc-v5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: block: fix read single on recovery logic
      f71077a4
    • Linus Torvalds's avatar
      tty: n_tty: do not look ahead for EOL character past the end of the buffer · 35930307
      Linus Torvalds authored
      Daniel Gibson reports that the n_tty code gets line termination wrong in
      very specific cases:
      
       "If you feed a line with exactly 64 chars + terminating newline, and
        directly afterwards (without reading) another line into a pseudo
        terminal, the the first read() on the other side will return the 64
        char line *without* terminating newline, and the next read() will
        return the missing terminating newline AND the complete next line (if
        it fits in the buffer)"
      
      and bisected the behavior to commit 3b830a9c ("tty: convert
      tty_ldisc_ops 'read()' function to take a kernel pointer").
      
      Now, digging deeper, it turns out that the behavior isn't exactly new:
      what changed in commit 3b830a9c was that the tty line discipline
      .read() function is now passed an intermediate kernel buffer rather than
      the final user space buffer.
      
      And that intermediate kernel buffer is 64 bytes in size - thus that
      special case with exactly 64 bytes plus terminating newline.
      
      The same problem did exist before, but historically the boundary was not
      the 64-byte chunk, but the user-supplied buffer size, which is obviously
      generally bigger (and potentially bigger than N_TTY_BUF_SIZE, which
      would hide the issue entirely).
      
      The reason is that the n_tty canon_copy_from_read_buf() code would look
      ahead for the EOL character one byte further than it would actually
      copy.  It would then decide that it had found the terminator, and unmark
      it as an EOL character - which in turn explains why the next read
      wouldn't then be terminated by it.
      
      Now, the reason it did all this in the first place is related to some
      historical and pretty obscure EOF behavior, see commit ac8f3bf8
      ("n_tty: Fix poll() after buffer-limited eof push read") and commit
      40d5e090 ("n_tty: Fix EOF push handling").
      
      And the reason for the EOL confusion is that we treat EOF as a special
      EOL condition, with the EOL character being NUL (aka "__DISABLED_CHAR"
      in the kernel sources).
      
      So that EOF look-ahead also affects the normal EOL handling.
      
      This patch just removes the look-ahead that causes problems, because EOL
      is much more critical than the historical "EOF in the middle of a line
      that coincides with the end of the buffer" handling ever was.
      
      Now, it is possible that we should indeed re-introduce the "look at next
      character to see if it's a EOF" behavior, but if so, that should be done
      not at the kernel buffer chunk boundary in canon_copy_from_read_buf(),
      but at a higher level, when we run out of the user buffer.
      
      In particular, the place to do that would be at the top of
      'n_tty_read()', where we check if it's a continuation of a previously
      started read, and there is no more buffer space left, we could decide to
      just eat the __DISABLED_CHAR at that point.
      
      But that would be a separate patch, because I suspect nobody actually
      cares, and I'd like to get a report about it before bothering.
      
      Fixes: 3b830a9c ("tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer")
      Fixes: ac8f3bf8 ("n_tty: Fix  poll() after buffer-limited eof push read")
      Fixes: 40d5e090 ("n_tty: Fix EOF push handling")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215611Reported-and-tested-by: default avatarDaniel Gibson <metalcaedes@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      35930307
    • German Gomez's avatar
      perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization · 047e6032
      German Gomez authored
      The struct perf_event_attr is initialised differently in Arm64 when
      recording in call-graph fp mode, so update the relevant tests, and add
      two extra arm64-only tests.
      
      Before:
      
        $ perf test 17 -v
        17: Setup struct perf_event_attr
        [...]
        running './tests/attr/test-record-graph-default'
        expected sample_type=295, got 4391
        expected sample_regs_user=0, got 1073741824
        FAILED './tests/attr/test-record-graph-default' - match failure
        test child finished with -1
        ---- end ----
      
      After:
      
      [...]
        running './tests/attr/test-record-graph-default-aarch64'
        test limitation 'aarch64'
        running './tests/attr/test-record-graph-fp-aarch64'
        test limitation 'aarch64'
        running './tests/attr/test-record-graph-default'
        test limitation '!aarch64'
        excluded architecture list ['aarch64']
        skipped [aarch64] './tests/attr/test-record-graph-default'
        running './tests/attr/test-record-graph-fp'
        test limitation '!aarch64'
        excluded architecture list ['aarch64']
        skipped [aarch64] './tests/attr/test-record-graph-fp'
      [...]
      
      Fixes: 7248e308 ("perf tools: Record ARM64 LR register automatically")
      Signed-off-by: default avatarGerman Gomez <german.gomez@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Truong <alexandre.truong@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lore.kernel.org/lkml/20220125104435.2737-1-german.gomez@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      047e6032
    • Kees Cook's avatar
      libsubcmd: Fix use-after-free for realloc(..., 0) · 52a9dab6
      Kees Cook authored
      GCC 12 correctly reports a potential use-after-free condition in the
      xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)"
      when size == 0:
      
      In file included from help.c:12:
      In function 'xrealloc',
          inlined from 'add_cmdname' at help.c:24:2: subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free]
         56 |                 ret = realloc(ptr, size);
            |                       ^~~~~~~~~~~~~~~~~~
      subcmd-util.h:52:21: note: call to 'realloc' here
         52 |         void *ret = realloc(ptr, size);
            |                     ^~~~~~~~~~~~~~~~~~
      subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free]
         58 |                         ret = realloc(ptr, 1);
            |                               ^~~~~~~~~~~~~~~
      subcmd-util.h:52:21: note: call to 'realloc' here
         52 |         void *ret = realloc(ptr, size);
            |                     ^~~~~~~~~~~~~~~~~~
      
      Fixes: 2f4ce5ec ("perf tools: Finalize subcmd independence")
      Reported-by: default avatarValdis Klētnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: default avatarKees Kook <keescook@chromium.org>
      Tested-by: default avatarValdis Klētnieks <valdis.kletnieks@vt.edu>
      Tested-by: default avatarJustin M. Forbes <jforbes@fedoraproject.org>
      Acked-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: linux-hardening@vger.kernel.org
      Cc: Valdis Klētnieks <valdis.kletnieks@vt.edu>
      Link: http://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      52a9dab6