1. 12 May, 2023 4 commits
    • devlink: change per-devlink netdev notifier to static one · e93c9378
      Jiri Pirko authored
      Commit 565b4824 ("devlink: change port event netdev notifier
      from per-net to global") changed the original per-net notifier to a
      per-devlink instance one. That fixed the issue of netdev uninit events
      not being received if the netdev had moved to a different namespace.
      That worked fine in the -net tree.
      
      However, later on when commit ee75f1fc ("net/mlx5e: Create
      separate devlink instance for ethernet auxiliary device") and
      commit 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in
      case of PCI device suspend") were merged, a deadlock was introduced
      when removing a namespace with devlink instance with another nested
      instance.
      
      Here is an example of the bad flow resulting in a deadlock with mlx5:
      net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem)) ->
      devlink_pernet_pre_exit() -> devlink_reload() ->
      mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
      mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
      mlx5e_destroy_devlink() -> devlink_free() ->
      unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem))
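
The flow above boils down to one call chain taking pernet_ops_rwsem for read and then, further down, asking for the same semaphore for write. A minimal sketch of why that request can never be granted, assuming a toy non-blocking reader/writer lock (the `ToyRwSem` class, its method names and `cleanup_net_flow()` are hypothetical and only model rwsem semantics; they are not kernel APIs):

```python
class ToyRwSem:
    """Toy reader/writer semaphore: many readers or one writer, never both."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_down_read(self):
        # A reader may enter as long as no writer holds the semaphore.
        if self.writer:
            return False
        self.readers += 1
        return True

    def up_read(self):
        self.readers -= 1

    def try_down_write(self):
        # A writer needs exclusive access: no readers and no other writer.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def up_write(self):
        self.writer = False


def cleanup_net_flow(sem):
    """Model of the bad flow: read side held, write side requested below it."""
    assert sem.try_down_read()          # cleanup_net() side
    got_write = sem.try_down_write()    # unregister_netdevice_notifier() side
    sem.up_read()
    return got_write


pernet_ops_rwsem = ToyRwSem()
# The write attempt fails while the same flow still holds the read side.
print(cleanup_net_flow(pernet_ops_rwsem))
```

In the toy model the write attempt simply reports failure; a blocking down_write() in the same position would wait for a reader that is its own caller.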
      
      Steps to reproduce:
      $ modprobe mlx5_core
      $ ip netns add ns1
      $ devlink dev reload pci/0000:08:00.0 netns ns1
      $ ip netns del ns1
      
      Resolve this by converting the notifier from per-devlink instance to
      a static one registered during init phase and leaving it registered
      forever. Use this notifier for all devlink port instances created
      later on.
      
      Note that a tree needs this fix only if all of the commits cited in
      the Fixes tags are present.
      Reported-by: Moshe Shemesh <moshe@nvidia.com>
      Fixes: 565b4824 ("devlink: change port event netdev notifier from per-net to global")
      Fixes: ee75f1fc ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
      Fixes: 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'selftests-seg6-make-srv6_end_dt4_l3vpn_test-more-robust' · 7ce93d6f
      Jakub Kicinski authored
      Andrea Mayer says:
      
      ====================
      selftests: seg6: make srv6_end_dt4_l3vpn_test more robust
      
      This patchset aims to improve and make more robust the selftests performed to
      check whether the SRv6 End.DT4 behavior works as expected under different system
      configurations.
      Some Linux distributions enable Duplicate Address Detection and Reverse
      Path Filtering mechanisms by default, which can interfere with SRv6 End.DT4
      behavior and cause selftests to fail.
      
      The following patches improve selftests for End.DT4 by taking these two
      mechanisms into account. Specifically:
       - patch 1/2: selftests: seg6: disable DAD on IPv6 router cfg for
                    srv6_end_dt4_l3vpn_test
       - patch 2/2: selftets: seg6: disable rp_filter by default in
                    srv6_end_dt4_l3vpn_test
      ====================
      
      Link: https://lore.kernel.org/r/20230510111638.12408-1-andrea.mayer@uniroma2.it
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_test · f97b8401
      Andrea Mayer authored
      On some distributions, the rp_filter is automatically set (=1) by
      default on a netdev basis (also on VRFs).
      In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
      the table associated with the VRF bound to that tunnel. During lookup
      operations, the rp_filter can lead to packet loss when activated on the
      VRF.
      Therefore, we chose to make this selftest more robust by explicitly
      disabling the rp_filter during tests (as it is automatically set by some
      Linux distributions).
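
As a general illustration of how a strict reverse path check can drop traffic, here is a minimal sketch. It assumes a simplified model in which a packet is accepted only when the best route back to its source points out of the interface it arrived on. The function names, the route table and the device names `dev_a`/`dev_b` are hypothetical; this is not the kernel's lookup code and it does not reproduce the exact End.DT4 path.

```python
import ipaddress


def best_route(routes, addr):
    """Return the output device of the longest-prefix match, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, dev) for net, dev in routes if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_accepts(routes, src, in_dev, mode):
    """mode 0: no source validation; mode 1: strict reverse path check."""
    if mode == 0:
        return True
    return best_route(routes, src) == in_dev


# Hypothetical table: the sender's prefix is reachable through dev_a only.
table = [
    (ipaddress.ip_network("10.0.0.0/24"), "dev_a"),
]

# The packet shows up on dev_b, so the strict check rejects it,
# while the same packet passes once the check is disabled.
print(rp_filter_accepts(table, "10.0.0.1", "dev_b", mode=1))
print(rp_filter_accepts(table, "10.0.0.1", "dev_b", mode=0))
```

Disabling the check in the test setup removes the dependency on whatever default a distribution ships.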
      
      Fixes: 2195444e ("selftests: add selftest for the SRv6 End.DT4 behavior")
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: seg6: disable DAD on IPv6 router cfg for srv6_end_dt4_l3vpn_test · 21a933c7
      Andrea Mayer authored
      The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
      several routers (rt-1, rt-2) and hosts.
      When the IPv6 addresses of rt-{1,2} routers are configured, Duplicate
      Address Detection (DAD) kicks in when enabled in the Linux distros running
      the selftests. DAD is used to check whether an IPv6 address is already
      assigned in a network. The mechanism consists of sending an ICMPv6 Neighbor
      Solicitation for the tentative address and waiting to see whether a
      Neighbor Advertisement comes back.
      As the DAD process could take too long to complete, it may cause some of
      the tests carried out by the srv6_end_dt4_l3vpn_test script to fail.
      
      To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
      since we configure the virtual network manually and do not need any
      duplicate address detection mechanism at all.
      
      Fixes: 2195444e ("selftests: add selftest for the SRv6 End.DT4 behavior")
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 11 May, 2023 8 commits
    • Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6e27831b
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter.
      
        Current release - regressions:
      
         - mtk_eth_soc: fix NULL pointer dereference
      
        Previous releases - regressions:
      
         - core:
            - skb_partial_csum_set() fix against transport header magic value
            - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
            - annotate sk->sk_err write from do_recvmmsg()
            - add vlan_get_protocol_and_depth() helper
      
         - netlink: annotate accesses to nlk->cb_running
      
         - netfilter: always release netdev hooks from notifier
      
        Previous releases - always broken:
      
         - core: deal with most data-races in sk_wait_event()
      
         - netfilter: fix possible bug_on with enable_hooks=1
      
         - eth: bonding: fix send_peer_notif overflow
      
         - eth: xpcs: fix incorrect number of interfaces
      
         - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
      
         - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"
      
      * tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
        af_unix: Fix data races around sk->sk_shutdown.
        af_unix: Fix a data race of sk->sk_receive_queue->qlen.
        net: datagram: fix data-races in datagram_poll()
        net: mscc: ocelot: fix stat counter register values
        ipvlan:Fix out-of-bounds caused by unclear skb->cb
        docs: networking: fix x25-iface.rst heading & index order
        gve: Remove the code of clearing PBA bit
        tcp: add annotations around sk->sk_shutdown accesses
        net: add vlan_get_protocol_and_depth() helper
        net: pcs: xpcs: fix incorrect number of interfaces
        net: deal with most data-races in sk_wait_event()
        net: annotate sk->sk_err write from do_recvmmsg()
        netlink: annotate accesses to nlk->cb_running
        kselftest: bonding: add num_grat_arp test
        selftests: forwarding: lib: add netns support for tc rule handle stats get
        Documentation: bonding: fix the doc of peer_notif_delay
        bonding: fix send_peer_notif overflow
        net: ethernet: mtk_eth_soc: fix NULL pointer dereference
        selftests: nft_flowtable.sh: check ingress/egress chain too
        selftests: nft_flowtable.sh: monitor result file sizes
        ...
    • Merge tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 691e1eee
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
      
       - fix some unused-variable warning in mtk-mdp3
      
       - ignore unused suspend operations in nxp
      
       - some driver fixes in rcar-vin
      
      * tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: platform: mtk-mdp3: work around unused-variable warning
        media: nxp: ignore unused suspend operations
        media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
        media: rcar-vin: Fix NV12 size alignment
        media: rcar-vin: Gen3 can not scale NV12
    • Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · cceac926
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix UAF when releasing netnamespace, from Florian Westphal.
      
      2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
         from Florian Westphal.
      
      3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
      
      4) Extend nft_flowtable.sh selftest to cover integration with
         ingress/egress hooks, from Florian Westphal.
      
      * tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: nft_flowtable.sh: check ingress/egress chain too
        selftests: nft_flowtable.sh: monitor result file sizes
        selftests: nft_flowtable.sh: wait for specific nc pids
        selftests: nft_flowtable.sh: no need for ps -x option
        selftests: nft_flowtable.sh: use /proc for pid checking
        netfilter: conntrack: fix possible bug_on with enable_hooks=1
        netfilter: nf_tables: always release netdev hooks from notifier
      ====================
      
      Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'af_unix-fix-two-data-races-reported-by-kcsan' · 33dcee99
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix two data races reported by KCSAN.
      
      KCSAN reported data races around these two fields for AF_UNIX sockets.
      
        * sk->sk_receive_queue->qlen
        * sk->sk_shutdown
      
      Let's annotate them properly.
      ====================
      
      Link: https://lore.kernel.org/r/20230510003456.42357-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_unix: Fix data races around sk->sk_shutdown. · e1d09c2c
      Kuniyuki Iwashima authored
      KCSAN found a data race around sk->sk_shutdown where unix_release_sock()
      and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll()
      and unix_dgram_poll() read it locklessly.
      
      We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
      
      BUG: KCSAN: data-race in unix_poll / unix_release_sock
      
      write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0:
       unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
       unix_release+0x59/0x80 net/unix/af_unix.c:1042
       __sock_release+0x7d/0x170 net/socket.c:653
       sock_close+0x19/0x30 net/socket.c:1397
       __fput+0x179/0x5e0 fs/file_table.c:321
       ____fput+0x15/0x20 fs/file_table.c:349
       task_work_run+0x116/0x1a0 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
       __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
       syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
       do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1:
       unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170
       sock_poll+0xcf/0x2b0 net/socket.c:1385
       vfs_poll include/linux/poll.h:88 [inline]
       ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855
       ep_send_events fs/eventpoll.c:1694 [inline]
       ep_poll fs/eventpoll.c:1823 [inline]
       do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258
       __do_sys_epoll_wait fs/eventpoll.c:2270 [inline]
       __se_sys_epoll_wait fs/eventpoll.c:2265 [inline]
       __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      value changed: 0x00 -> 0x03
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 3c73419c ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_unix: Fix a data race of sk->sk_receive_queue->qlen. · 679ed006
      Kuniyuki Iwashima authored
      KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
      updates qlen under the queue lock and sendmsg() checks qlen under
      unix_state_lock(), not the queue lock, so the reader side needs
      READ_ONCE().
      
      BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
      
      write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
       __skb_unlink include/linux/skbuff.h:2347 [inline]
       __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
       __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
       __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
       unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
       sock_recvmsg_nosec net/socket.c:1019 [inline]
       ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
       ___sys_recvmsg+0xc8/0x150 net/socket.c:2764
       do_recvmmsg+0x182/0x560 net/socket.c:2858
       __sys_recvmmsg net/socket.c:2937 [inline]
       __do_sys_recvmmsg net/socket.c:2960 [inline]
       __se_sys_recvmmsg net/socket.c:2953 [inline]
       __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
       skb_queue_len include/linux/skbuff.h:2127 [inline]
       unix_recvq_full net/unix/af_unix.c:229 [inline]
       unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
       unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:747
       ____sys_sendmsg+0x20e/0x620 net/socket.c:2503
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2557
       __sys_sendmmsg+0x11d/0x370 net/socket.c:2643
       __do_sys_sendmmsg net/socket.c:2672 [inline]
       __se_sys_sendmmsg net/socket.c:2669 [inline]
       __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      value changed: 0x0000000b -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: datagram: fix data-races in datagram_poll() · 5bca1d08
      Eric Dumazet authored
      datagram_poll() runs locklessly, so we should add READ_ONCE()
      annotations while reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: re-sort all entries and fields · 80e62bc8
      Linus Torvalds authored
      It's been a few years since we've sorted this thing, and the end result
      is that we've added MAINTAINERS entries in the wrong order, and a number
      of entries have their fields in non-canonical order too.
      
      So roll this boulder up the hill one more time by re-running
      
         ./scripts/parse-maintainers.pl --order
      
      on it.
      
      This file ends up being fairly painful for merge conflicts even
      normally, since unlike almost all other kernel files it's one of those
      "everybody touches the same thing", and re-ordering all entries is only
      going to make that worse.  But the alternative is to never do it at all,
      and just let it all rot..
      
      The rc2 week is likely the quietest and least painful time to do this.
      Requested-by: Randy Dunlap <rdunlap@infradead.org>
      Requested-by: Joe Perches <joe@perches.com>	# "Please use --order"
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 10 May, 2023 28 commits
    • Merge tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · d295b66a
      Linus Torvalds authored
      Pull inotify fix from Jan Kara:
       "A fix for possibly reporting invalid watch descriptor with inotify
        event"
      
      * tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        inotify: Avoid reporting event with invalid wd
    • Merge tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 2a78769d
      Linus Torvalds authored
      Pull gfs2 fix from Andreas Gruenbacher:
      
       - Fix a NULL pointer dereference when mounting corrupted filesystems
      
      * tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Don't deref jdesc in evict
    • gfs2: Don't deref jdesc in evict · 504a10d9
      Bob Peterson authored
      On corrupt gfs2 file systems the evict code can try to reference the
      journal descriptor structure, jdesc, after it has been freed and set to
      NULL. The sequence of events is:
      
      init_journal()
      ...
      fail_jindex:
         gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
            if (gfs2_holder_initialized(&ji_gh))
               gfs2_glock_dq_uninit(&ji_gh);
      fail:
         iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
            evict()
               gfs2_evict_inode()
                  evict_linked_inode()
                     ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
      <------references the now freed/zeroed sd_jdesc pointer.
      
      The call to gfs2_trans_begin is done because the truncate_inode_pages
      call can cause gfs2 events that require a transaction, such as removing
      journaled data (jdata) blocks from the journal.
      
      This patch fixes the problem by adding a check for sdp->sd_jdesc to
      function gfs2_evict_inode. In theory, this should only happen to corrupt
      gfs2 file systems, when gfs2 detects the problem, reports it, then tries
      to evict all the system inodes it has read in up to that point.
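
The shape of such a guard can be sketched as follows. This is a toy model, not the gfs2 code: the `Journal` and `SuperBlock` classes and the `evict_inode()` function are hypothetical stand-ins that only borrow the field names `sd_jdesc` and `jd_blocks` from the description above, and what the real function does when the pointer is NULL is an assumption here.

```python
class Journal:
    """Stand-in for the journal descriptor."""

    def __init__(self, jd_blocks):
        self.jd_blocks = jd_blocks


class SuperBlock:
    """Stand-in for the superblock; sd_jdesc is None once journals are freed."""

    def __init__(self, sd_jdesc=None):
        self.sd_jdesc = sd_jdesc


def evict_inode(sdp):
    """Return a description of the path taken when evicting an inode."""
    if sdp.sd_jdesc is None:
        # Journal already freed (for example after a failed mount):
        # do not dereference it, skip the transaction (assumed behavior).
        return "skip-transaction"
    return "begin-transaction(%d blocks)" % sdp.sd_jdesc.jd_blocks


print(evict_inode(SuperBlock()))
print(evict_inode(SuperBlock(Journal(8))))
```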
      Reported-by: Yang Lan <lanyang0908@gmail.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • Merge tag 'platform-drivers-x86-v6.4-2' of... · ad2fd53a
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Nothing special to report just various small fixes:
      
         - thinkpad_acpi: Fix profile (performance/bal/low-power) regression
           on T490
      
         - misc other small fixes / hw-id additions"
      
      * tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/mellanox: fix potential race in mlxbf-tmfifo driver
        platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
        platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
        platform/x86: thinkpad_acpi: Add profile force ability
        platform/x86: thinkpad_acpi: Fix platform profiles on T490
        platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
        platform/x86/intel-uncore-freq: Return error on write frequency
        platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
    • net: mscc: ocelot: fix stat counter register values · cdc2e28e
      Colin Foster authored
      Commit d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg
      address, not offset") organized the stats counters for Ocelot chips, namely
      the VSC7512 and VSC7514. A few of the counter offsets were incorrect, and
      were caught by this warning:
      
      WARNING: CPU: 0 PID: 24 at drivers/net/ethernet/mscc/ocelot_stats.c:909
      ocelot_stats_init+0x1fc/0x2d8
      reg 0x5000078 had address 0x220 but reg 0x5000079 has address 0x214,
      bulking broken!
      
      Fix these register offsets.
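
A minimal sketch of the kind of ordering check that produces such a warning, assuming (from the warning text only) that counter addresses must strictly increase as the register index increases. The function name `check_bulk_layout()` and the list-of-pairs layout are hypothetical and do not mirror the driver's data structures.

```python
def check_bulk_layout(layout):
    """Return warnings for (reg, address) pairs that break address ordering.

    The layout is walked in register order; each address is expected to be
    strictly greater than the one before it.
    """
    warnings = []
    for (prev_reg, prev_addr), (reg, addr) in zip(layout, layout[1:]):
        if addr <= prev_addr:
            warnings.append(
                "reg 0x%x had address 0x%x but reg 0x%x has address 0x%x, "
                "bulking broken!" % (prev_reg, prev_addr, reg, addr)
            )
    return warnings


# The pair quoted in the warning above.
bad = [(0x5000078, 0x220), (0x5000079, 0x214)]
for line in check_bulk_layout(bad):
    print(line)
```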
      
      Fixes: d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan:Fix out-of-bounds caused by unclear skb->cb · 90cbed52
      t.feng authored
      When an skb is enqueued to the qdisc, fq_skb_cb(skb)->time_to_send,
      which lives in skb->cb, is modified, and IPCB(skb_in)->opt is later used
      in __ip_options_echo(). The memcpy there can then go out of bounds and
      overflow the stack.
      We should clear skb->cb before ip_local_out or ip6_local_out.
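
The underlying hazard is that skb->cb is a shared 48-byte scratch area that each layer interprets with its own layout. A minimal sketch of that effect, using deliberately simplified and hypothetical layouts (a 64-bit timestamp at offset 0 for the queueing layer, a one-byte option length at offset 0 for the IP layer); the real fq and IPCB structures place their fields differently, and the helper names below are made up:

```python
import struct

CB_SIZE = 48  # size of the skb->cb scratch area in bytes


def qdisc_stamp(cb, time_to_send):
    """Queueing layer (hypothetical layout): 64-bit timestamp at offset 0."""
    struct.pack_into("<Q", cb, 0, time_to_send)


def ip_layer_optlen(cb):
    """IP layer (hypothetical layout): one-byte option length at offset 0."""
    return struct.unpack_from("<B", cb, 0)[0]


def clear_cb(cb):
    """Zero the scratch area before handing the buffer to another layer."""
    cb[:] = bytes(len(cb))


cb = bytearray(CB_SIZE)
qdisc_stamp(cb, 0x1122334455667788)
# Without clearing, a stale timestamp byte is read back as an option length.
print(hex(ip_layer_optlen(cb)))
clear_cb(cb)
print(ip_layer_optlen(cb))
```

A bogus non-zero length of this kind is what lets a later copy run past the end of its destination; zeroing the area first makes the IP layer see "no options".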
      
      v2:
      1. clean the stack info
      2. use IPCB/IP6CB instead of skb->cb
      
      Crash on stable-5.10 (reproduced on a KASAN kernel).
      Stack info:
      [ 2203.651571] BUG: KASAN: stack-out-of-bounds in
      __ip_options_echo+0x589/0x800
      [ 2203.653327] Write of size 4 at addr ffff88811a388f27 by task
      swapper/3/0
      [ 2203.655460] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted
      5.10.0-60.18.0.50.h856.kasan.eulerosv2r11.x86_64 #1
      [ 2203.655466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
      [ 2203.655475] Call Trace:
      [ 2203.655481]  <IRQ>
      [ 2203.655501]  dump_stack+0x9c/0xd3
      [ 2203.655514]  print_address_description.constprop.0+0x19/0x170
      [ 2203.655530]  __kasan_report.cold+0x6c/0x84
      [ 2203.655586]  kasan_report+0x3a/0x50
      [ 2203.655594]  check_memory_region+0xfd/0x1f0
      [ 2203.655601]  memcpy+0x39/0x60
      [ 2203.655608]  __ip_options_echo+0x589/0x800
      [ 2203.655654]  __icmp_send+0x59a/0x960
      [ 2203.655755]  nf_send_unreach+0x129/0x3d0 [nf_reject_ipv4]
      [ 2203.655763]  reject_tg+0x77/0x1bf [ipt_REJECT]
      [ 2203.655772]  ipt_do_table+0x691/0xa40 [ip_tables]
      [ 2203.655821]  nf_hook_slow+0x69/0x100
      [ 2203.655828]  __ip_local_out+0x21e/0x2b0
      [ 2203.655857]  ip_local_out+0x28/0x90
      [ 2203.655868]  ipvlan_process_v4_outbound+0x21e/0x260 [ipvlan]
      [ 2203.655931]  ipvlan_xmit_mode_l3+0x3bd/0x400 [ipvlan]
      [ 2203.655967]  ipvlan_queue_xmit+0xb3/0x190 [ipvlan]
      [ 2203.655977]  ipvlan_start_xmit+0x2e/0xb0 [ipvlan]
      [ 2203.655984]  xmit_one.constprop.0+0xe1/0x280
      [ 2203.655992]  dev_hard_start_xmit+0x62/0x100
      [ 2203.656000]  sch_direct_xmit+0x215/0x640
      [ 2203.656028]  __qdisc_run+0x153/0x1f0
      [ 2203.656069]  __dev_queue_xmit+0x77f/0x1030
      [ 2203.656173]  ip_finish_output2+0x59b/0xc20
      [ 2203.656244]  __ip_finish_output.part.0+0x318/0x3d0
      [ 2203.656312]  ip_finish_output+0x168/0x190
      [ 2203.656320]  ip_output+0x12d/0x220
      [ 2203.656357]  __ip_queue_xmit+0x392/0x880
      [ 2203.656380]  __tcp_transmit_skb+0x1088/0x11c0
      [ 2203.656436]  __tcp_retransmit_skb+0x475/0xa30
      [ 2203.656505]  tcp_retransmit_skb+0x2d/0x190
      [ 2203.656512]  tcp_retransmit_timer+0x3af/0x9a0
      [ 2203.656519]  tcp_write_timer_handler+0x3ba/0x510
      [ 2203.656529]  tcp_write_timer+0x55/0x180
      [ 2203.656542]  call_timer_fn+0x3f/0x1d0
      [ 2203.656555]  expire_timers+0x160/0x200
      [ 2203.656562]  run_timer_softirq+0x1f4/0x480
      [ 2203.656606]  __do_softirq+0xfd/0x402
      [ 2203.656613]  asm_call_irq_on_stack+0x12/0x20
      [ 2203.656617]  </IRQ>
      [ 2203.656623]  do_softirq_own_stack+0x37/0x50
      [ 2203.656631]  irq_exit_rcu+0x134/0x1a0
      [ 2203.656639]  sysvec_apic_timer_interrupt+0x36/0x80
      [ 2203.656646]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      [ 2203.656654] RIP: 0010:default_idle+0x13/0x20
      [ 2203.656663] Code: 89 f0 5d 41 5c 41 5d 41 5e c3 cc cc cc cc cc cc cc
      cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 0f 00 2d 9f 32 57 00 fb
      f4 <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 be 08
      [ 2203.656668] RSP: 0018:ffff88810036fe78 EFLAGS: 00000256
      [ 2203.656676] RAX: ffffffffaf2a87f0 RBX: ffff888100360000 RCX:
      ffffffffaf290191
      [ 2203.656681] RDX: 0000000000098b5e RSI: 0000000000000004 RDI:
      ffff88811a3c4f60
      [ 2203.656686] RBP: 0000000000000000 R08: 0000000000000001 R09:
      ffff88811a3c4f63
      [ 2203.656690] R10: ffffed10234789ec R11: 0000000000000001 R12:
      0000000000000003
      [ 2203.656695] R13: ffff888100360000 R14: 0000000000000000 R15:
      0000000000000000
      [ 2203.656729]  default_idle_call+0x5a/0x150
      [ 2203.656735]  cpuidle_idle_call+0x1c6/0x220
      [ 2203.656780]  do_idle+0xab/0x100
      [ 2203.656786]  cpu_startup_entry+0x19/0x20
      [ 2203.656793]  secondary_startup_64_no_verify+0xc2/0xcb
      
      [ 2203.657409] The buggy address belongs to the page:
      [ 2203.658648] page:0000000027a9842f refcount:1 mapcount:0
      mapping:0000000000000000 index:0x0 pfn:0x11a388
      [ 2203.658665] flags:
      0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
      [ 2203.658675] raw: 0017ffffc0001000 ffffea000468e208 ffffea000468e208
      0000000000000000
      [ 2203.658682] raw: 0000000000000000 0000000000000000 00000001ffffffff
      0000000000000000
      [ 2203.658686] page dumped because: kasan: bad access detected
      
      To reproduce(ipvlan with IPVLAN_MODE_L3):
      Env setting:
      =======================================================
      modprobe ipvlan ipvlan_default_mode=1
      sysctl net.ipv4.conf.eth0.forwarding=1
      iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j MASQUERADE
      ip link add gw link eth0 type ipvlan
      ip -4 addr add 20.0.0.254/24 dev gw
      ip netns add net1
      ip link add ipv1 link eth0 type ipvlan
      ip link set ipv1 netns net1
      ip netns exec net1 ip link set ipv1 up
      ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
      ip netns exec net1 route add default gw 20.0.0.254
      ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
      ifconfig gw up
      iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with icmp-port-unreachable
      =======================================================
      Then execute the following shell loop (curl any address that eth0 can reach):
      
      for((i=1;i<=100000;i++))
      do
              ip netns exec net1 curl x.x.x.x:8888
      done
      =======================================================
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: "t.feng" <fengtao40@huawei.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: fix x25-iface.rst heading & index order · 77c964da
      Randy Dunlap authored
      Fix the chapter heading for "X.25 Device Driver Interface" so that it
      does not contain a trailing '-' character, which makes Sphinx
      omit this heading from the contents.
      
      Reverse the order of the x25.rst and x25-iface.rst files in the index
      so that the project introduction (x25.rst) comes first.
      
      Fixes: 883780af ("docs: networking: convert x25-iface.txt to ReST")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-doc@vger.kernel.org
      Cc: Martin Schiller <ms@dev.tdt.de>
      Cc: linux-x25@vger.kernel.org
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Ziwei Xiao's avatar
      gve: Remove the code of clearing PBA bit · f4c2e67c
      Ziwei Xiao authored
      Clearing the PBA bit from the driver is race prone and it may lead to
      dropped interrupt events. This could potentially lead to the traffic
      being completely halted.
      
      Fixes: 5e8c5adf ("gve: DQO: Add core netdev features")
      Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
      Signed-off-by: Bailey Forrest <bcf@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4c2e67c
    • Eric Dumazet's avatar
      tcp: add annotations around sk->sk_shutdown accesses · e14cadfd
      Eric Dumazet authored
      Now that sk->sk_shutdown is no longer a bitfield, we can add
      standard READ_ONCE()/WRITE_ONCE() annotations to silence
      KCSAN reports like the following:
      
      BUG: KCSAN: data-race in tcp_disconnect / tcp_poll
      
      write to 0xffff88814588582c of 1 bytes by task 3404 on cpu 1:
      tcp_disconnect+0x4d6/0xdb0 net/ipv4/tcp.c:3121
      __inet_stream_connect+0x5dd/0x6e0 net/ipv4/af_inet.c:715
      inet_stream_connect+0x48/0x70 net/ipv4/af_inet.c:727
      __sys_connect_file net/socket.c:2001 [inline]
      __sys_connect+0x19b/0x1b0 net/socket.c:2018
      __do_sys_connect net/socket.c:2028 [inline]
      __se_sys_connect net/socket.c:2025 [inline]
      __x64_sys_connect+0x41/0x50 net/socket.c:2025
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88814588582c of 1 bytes by task 3374 on cpu 0:
      tcp_poll+0x2e6/0x7d0 net/ipv4/tcp.c:562
      sock_poll+0x253/0x270 net/socket.c:1383
      vfs_poll include/linux/poll.h:88 [inline]
      io_poll_check_events io_uring/poll.c:281 [inline]
      io_poll_task_func+0x15a/0x820 io_uring/poll.c:333
      handle_tw_list io_uring/io_uring.c:1184 [inline]
      tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
      task_work_run+0x123/0x160 kernel/task_work.c:179
      get_signal+0xe64/0xff0 kernel/signal.c:2635
      arch_do_signal_or_restart+0x89/0x2a0 arch/x86/kernel/signal.c:306
      exit_to_user_mode_loop+0x6f/0xe0 kernel/entry/common.c:168
      exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204
      __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
      syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
      do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x03 -> 0x00
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e14cadfd
    • Eric Dumazet's avatar
      net: add vlan_get_protocol_and_depth() helper · 4063384e
      Eric Dumazet authored
      Before the blamed commit, pskb_may_pull() was used instead
      of skb_header_pointer() in __vlan_get_protocol() and friends.
      
      A few callers depended on skb->head being populated with the MAC header;
      syzbot caught one of them (skb_mac_gso_segment()).
      
      Add vlan_get_protocol_and_depth() to make the intent clearer
      and use it where sensible.
      
      This is a more generic fix than commit e9d3f809
      ("net/af_packet: make sure to pull mac header") which was
      dealing with a similar issue.
      
      kernel BUG at include/linux/skbuff.h:2655 !
      invalid opcode: 0000 [#1] SMP KASAN
      CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
      RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline]
      RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136
      Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
      RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286
      RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9
      R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012
      R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7
      FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      [<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419
      [<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline]
      [<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725
      [<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313
      [<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029
      [<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline]
      [<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142
      [<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline]
      [<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline]
      [<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139
      [<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline]
      [<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline]
      [<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147
      [<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
      [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 469acedd ("vlan: consolidate VLAN parsing code and limit max parsing depth")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4063384e
    • Russell King (Oracle)'s avatar
      net: pcs: xpcs: fix incorrect number of interfaces · 43fb622d
      Russell King (Oracle) authored
      In synopsys_xpcs_compat[], the DW_XPCS_2500BASEX entry was setting
      the number of interfaces using the xpcs_2500basex_features array
      rather than xpcs_2500basex_interfaces. This causes us to overflow
      the array of interfaces. Fix this.
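
The mismatch described above can be illustrated with a minimal user-space sketch. The array contents, struct layout, and every name ending in `_sketch` are hypothetical and are not taken from the xpcs driver; only the pattern (a count taken from one array while the pointer refers to another) follows the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical contents: only the differing lengths matter here. */
static const int features_sketch[] = { 1, 2, 3, 4, 5 };
static const int interfaces_sketch[] = { 10, 11 };

struct compat_sketch {
	const int *interface;   /* points at the interface array */
	size_t num_interfaces;  /* number of entries a reader will walk */
};

/* Buggy entry: the count comes from the wrong array. */
static const struct compat_sketch buggy_sketch = {
	interfaces_sketch, ARRAY_SIZE(features_sketch)
};

/* Fixed entry: the count comes from the array that is actually referenced. */
static const struct compat_sketch fixed_sketch = {
	interfaces_sketch, ARRAY_SIZE(interfaces_sketch)
};

/* Returns 1 when every index below num_interfaces stays inside the array. */
static int count_is_in_bounds(const struct compat_sketch *c)
{
	return c->num_interfaces <= ARRAY_SIZE(interfaces_sketch);
}

/* Small self-check showing the difference between the two entries. */
static void check_compat_sketch(void)
{
	assert(!count_is_in_bounds(&buggy_sketch));
	assert(count_is_in_bounds(&fixed_sketch));
}
```

Calling `check_compat_sketch()` from a `main()` exercises both entries; a loop bounded by the buggy count would read past the end of the two-element array.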
      
      Fixes: f27abde3 ("net: pcs: add 2500BASEX support for Intel mGbE controller")
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43fb622d
    • Eric Dumazet's avatar
      net: deal with most data-races in sk_wait_event() · d0ac89f6
      Eric Dumazet authored
      __condition is evaluated twice in the sk_wait_event() macro.
      
      The first invocation is lockless, and reads can race with writes,
      as spotted by syzbot.
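
A rough user-space sketch of the double evaluation, assuming the macro drops the socket lock, evaluates the condition, possibly sleeps, retakes the lock, and evaluates the condition again. The `fake_sock` type, the `locked` flag standing in for the socket lock, and all other names here are hypothetical; this is not the kernel macro.

```c
#include <assert.h>

struct fake_sock {
	int locked;          /* stand-in for the socket lock state */
	int state;           /* field the condition reads */
	int lockless_evals;  /* times the condition ran without the lock */
	int locked_evals;    /* times the condition ran with the lock held */
};

/* The condition: records whether it ran locked or lockless. */
static int state_is_set(struct fake_sock *sk)
{
	if (sk->locked)
		sk->locked_evals++;
	else
		sk->lockless_evals++;
	return sk->state != 0;
}

/* The condition expression is expanded, and therefore evaluated, twice. */
#define WAIT_EVENT_SKETCH(sk, rc, cond)                                   \
	do {                                                              \
		(sk)->locked = 0; /* stand-in for releasing the lock */   \
		(rc) = (cond);    /* first evaluation: lockless */        \
		/* a real implementation may sleep here when rc is 0 */   \
		(sk)->locked = 1; /* stand-in for retaking the lock */    \
		(rc) = (cond);    /* second evaluation: under the lock */ \
	} while (0)

/* Runs the sketch once and returns the final condition value. */
static int run_wait_sketch(struct fake_sock *sk)
{
	int rc;

	WAIT_EVENT_SKETCH(sk, rc, state_is_set(sk));
	return rc;
}
```

After one call to `run_wait_sketch()`, both counters are 1, which is the point of the commit text: any field read by the condition is also read once without the lock, so a concurrent writer can race with that read.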
      
      BUG: KCSAN: data-race in sk_stream_wait_connect / tcp_disconnect
      
      write to 0xffff88812d83d6a0 of 4 bytes by task 9065 on cpu 1:
      tcp_disconnect+0x2cd/0xdb0
      inet_shutdown+0x19e/0x1f0 net/ipv4/af_inet.c:911
      __sys_shutdown_sock net/socket.c:2343 [inline]
      __sys_shutdown net/socket.c:2355 [inline]
      __do_sys_shutdown net/socket.c:2363 [inline]
      __se_sys_shutdown+0xf8/0x140 net/socket.c:2361
      __x64_sys_shutdown+0x31/0x40 net/socket.c:2361
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88812d83d6a0 of 4 bytes by task 9040 on cpu 0:
      sk_stream_wait_connect+0x1de/0x3a0 net/core/stream.c:75
      tcp_sendmsg_locked+0x2e4/0x2120 net/ipv4/tcp.c:1266
      tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1484
      inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:651
      sock_sendmsg_nosec net/socket.c:724 [inline]
      sock_sendmsg net/socket.c:747 [inline]
      __sys_sendto+0x246/0x300 net/socket.c:2142
      __do_sys_sendto net/socket.c:2154 [inline]
      __se_sys_sendto net/socket.c:2150 [inline]
      __x64_sys_sendto+0x78/0x90 net/socket.c:2150
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x00000000 -> 0x00000068
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0ac89f6
    • Eric Dumazet's avatar
      net: annotate sk->sk_err write from do_recvmmsg() · e05a5f51
      Eric Dumazet authored
      do_recvmmsg() can write to sk->sk_err from multiple threads.
      
      As said before, many other points reading or writing sk_err
      need annotations.
      
      Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e05a5f51
    • Eric Dumazet's avatar
      netlink: annotate accesses to nlk->cb_running · a939d149
      Eric Dumazet authored
      Both netlink_recvmsg() and netlink_native_seq_show() read
      nlk->cb_running locklessly. Use READ_ONCE() there.
      
      Add the corresponding WRITE_ONCE() to netlink_dump() and
      __netlink_dump_start().
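
The annotation pattern can be sketched in user space as follows. The two macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(), built on a volatile access and the GCC/Clang `__typeof__` extension; the struct and function names ending in `_sketch` are hypothetical and do not come from af_netlink.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for the kernel macros: the volatile access keeps the
 * compiler from merging, dropping or repeating the load or store.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct nl_sock_sketch {
	bool cb_running;  /* written by the dump side, read locklessly elsewhere */
};

/* Writer side: mark a dump as started. */
static void dump_start_sketch(struct nl_sock_sketch *nlk)
{
	WRITE_ONCE(nlk->cb_running, true);
}

/* Writer side: mark a dump as finished. */
static void dump_done_sketch(struct nl_sock_sketch *nlk)
{
	WRITE_ONCE(nlk->cb_running, false);
}

/* Lockless reader side: sample the flag exactly once. */
static bool dump_is_running_sketch(struct nl_sock_sketch *nlk)
{
	return READ_ONCE(nlk->cb_running);
}

/* Small self-check of the writer/reader pairing. */
static void check_cb_running_sketch(void)
{
	struct nl_sock_sketch nlk = { false };

	dump_start_sketch(&nlk);
	assert(dump_is_running_sketch(&nlk));
	dump_done_sketch(&nlk);
	assert(!dump_is_running_sketch(&nlk));
}
```

The annotations do not add locking; they document that the access is intentionally racy and constrain what the compiler may do with it.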
      
      syzbot reported:
      BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg
      
      write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
      __netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
      netlink_dump_start include/linux/netlink.h:308 [inline]
      rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
      netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
      rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
      sock_sendmsg_nosec net/socket.c:724 [inline]
      sock_sendmsg net/socket.c:747 [inline]
      sock_write_iter+0x1aa/0x230 net/socket.c:1138
      call_write_iter include/linux/fs.h:1851 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x463/0x760 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
      netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
      sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
      ____sys_recvmsg+0x2db/0x310 net/socket.c:2718
      ___sys_recvmsg net/socket.c:2762 [inline]
      do_recvmmsg+0x2e5/0x710 net/socket.c:2856
      __sys_recvmmsg net/socket.c:2935 [inline]
      __do_sys_recvmmsg net/socket.c:2958 [inline]
      __se_sys_recvmmsg net/socket.c:2951 [inline]
      __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x00 -> 0x01
      
      Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a939d149
    • David S. Miller's avatar
      Merge branch 'bonding-overflow' · a5b3363d
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      bonding: fix send_peer_notif overflow
      
      The bonding send_peer_notif counter was defined as u8, but its value is
      num_peer_notif multiplied by peer_notif_delay, which is u8 * u32.
      This can overflow send_peer_notif.
      
      Before the fix:
      TEST: num_grat_arp (active-backup miimon num_grat_arp 10)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 20)           [ OK ]
      4 garp packets sent on active slave eth1
      TEST: num_grat_arp (active-backup miimon num_grat_arp 30)           [FAIL]
      24 garp packets sent on active slave eth1
      TEST: num_grat_arp (active-backup miimon num_grat_arp 50)           [FAIL]
      
      After the fix:
      TEST: num_grat_arp (active-backup miimon num_grat_arp 10)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 20)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 30)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 50)           [ OK ]
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5b3363d
    • Hangbin Liu's avatar
      kselftest: bonding: add num_grat_arp test · 6cbe791c
      Hangbin Liu authored
      TEST: num_grat_arp (active-backup miimon num_grat_arp 10)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 20)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 30)           [ OK ]
      TEST: num_grat_arp (active-backup miimon num_grat_arp 50)           [ OK ]
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6cbe791c
    • Hangbin Liu's avatar
      selftests: forwarding: lib: add netns support for tc rule handle stats get · b6d1599f
      Hangbin Liu authored
      When running the test in a netns, it's not easy to get the tc stats via
      tc_rule_handle_stats_get(). With the new netns parameter, we can get
      stats from a specific netns like
      
        num=$(tc_rule_handle_stats_get "dev eth0 ingress" 101 ".packets" "-n ns")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6d1599f
    • Hangbin Liu's avatar
      Documentation: bonding: fix the doc of peer_notif_delay · 84df83e0
      Hangbin Liu authored
      Bonding only supports setting peer_notif_delay with miimon set.
      
      Fixes: 0307d589 ("bonding: add documentation for peer_notif_delay")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84df83e0
    • Hangbin Liu's avatar
      bonding: fix send_peer_notif overflow · 9949e2ef
      Hangbin Liu authored
      Bonding send_peer_notif was defined as u8. Since commit 07a4ddec
      ("bonding: add an option to specify a delay between peer notifications"),
      bond->send_peer_notif is num_peer_notif multiplied by
      peer_notif_delay, which is u8 * u32. This can easily overflow
      send_peer_notif, e.g.
      
        ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000
      
      To fix the overflow, let's set the send_peer_notif to u32 and limit
      peer_notif_delay to 300s.
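
The truncation can be worked through with a small user-space sketch. It assumes the delay is counted in miimon intervals, so the example command above (delay 1000 with miimon 100) would give a factor of 10; that factor and both function names are illustrative assumptions, not taken from the bonding driver.

```c
#include <assert.h>
#include <stdint.h>

/* Counter as originally declared (u8): the product keeps only its low 8 bits. */
static uint8_t send_peer_notif_u8(uint8_t num_peer_notif, uint32_t delay_ticks)
{
	return (uint8_t)(num_peer_notif * delay_ticks);
}

/* Counter widened to u32: the same product is stored without truncation. */
static uint32_t send_peer_notif_u32(uint8_t num_peer_notif, uint32_t delay_ticks)
{
	return num_peer_notif * delay_ticks;
}

/* Worked values for num_grat_arp 30 and 50 with an assumed factor of 10. */
static void check_overflow_sketch(void)
{
	assert(send_peer_notif_u8(30, 10) == 44);   /* 300 mod 256 */
	assert(send_peer_notif_u8(50, 10) == 244);  /* 500 mod 256 */
	assert(send_peer_notif_u32(30, 10) == 300);
	assert(send_peer_notif_u32(50, 10) == 500);
}
```

Under that assumption a u8 counter holds 44 instead of 300 and 244 instead of 500, while smaller products such as 10 * 10 or 20 * 10 still fit in 8 bits.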
      Reported-by: Liang Li <liali@redhat.com>
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053
      Fixes: 07a4ddec ("bonding: add an option to specify a delay between peer notifications")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9949e2ef
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: fix NULL pointer dereference · 7c83e28f
      Daniel Golle authored
      Check for a NULL pointer to avoid a kernel crash when WO firmware is
      missing and only a single WEDv2 device has been initialized, e.g. on
      MT7981, which can connect just one wireless frontend.
      
      Fixes: 86ce0d09 ("net: ethernet: mtk_eth_soc: use WO firmware for MT7981")
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c83e28f
    • Florian Westphal's avatar
      selftests: nft_flowtable.sh: check ingress/egress chain too · 3acf8f6c
      Florian Westphal authored
      Make sure flowtable interacts correctly with ingress and egress
      chains, i.e. those get handled before and after flow table respectively.
      
      Adds three more tests:
      1. repeat flowtable test, but with 'ip dscp set cs3' done in
         inet forward chain.
      
      Expect that some packets have been mangled (before flowtable offload
      became effective) while some pass without mangling (after offload
      succeeds).
      
      2. repeat flowtable test, but with 'ip dscp set cs3' done in
         veth0:ingress.
      
      Expect that all packets pass with cs3 dscp field.
      
      3. same as 2, but use veth1:egress.  Expect the same outcome.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      3acf8f6c
    • Boris Sukholitko's avatar
      selftests: nft_flowtable.sh: monitor result file sizes · 90ab5122
      Boris Sukholitko authored
      When running nft_flowtable.sh in a VM on a busy server, we found that
      the time of the netcat file transfers varies wildly.
      
      Therefore replace the hardcoded 3 second sleep with a loop checking for
      a change in the file sizes. Once no change is detected, we test the results.
      
      A nice side effect is that we shave one second of sleep in the fast case
      (hard-coded 3 second sleep vs two 1 second sleeps).
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      90ab5122
    • Boris Sukholitko's avatar
      selftests: nft_flowtable.sh: wait for specific nc pids · 1114803c
      Boris Sukholitko authored
      Doing wait with no parameters may interfere with some of the tests
      having their own background processes.
      
      Although no such test is currently present, the cleanup makes
      nft_flowtable.sh easier to rely on for local development (e.g. when
      running a background tcpdump command during the tests).
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1114803c
    • Boris Sukholitko's avatar
      selftests: nft_flowtable.sh: no need for ps -x option · 0749d670
      Boris Sukholitko authored
      Some ps commands (e.g. busybox derived) have no -x option. For the
      purposes of hash calculation of the list of processes this option is
      inessential.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0749d670
    • Boris Sukholitko's avatar
      selftests: nft_flowtable.sh: use /proc for pid checking · 0a11073e
      Boris Sukholitko authored
      Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
      pid existence check.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0a11073e
    • Florian Westphal's avatar
      netfilter: conntrack: fix possible bug_on with enable_hooks=1 · e72eeab5
      Florian Westphal authored
      I received a bug report (no reproducer so far) where we trip over
      
              rcu_read_lock();
              ct_hook = rcu_dereference(nf_ct_hook);
              BUG_ON(ct_hook == NULL);  // here
      
      In nf_conntrack_destroy().
      
      First turn this BUG_ON into a WARN.  I think it was triggered
      via enable_hooks=1 flag.
      
      When this flag is turned on, the conntrack hooks are registered
      before nf_ct_hook pointer gets assigned.
      This opens a short window where packets enter the conntrack machinery,
      can have skb->_nfct set up and a subsequent kfree_skb might occur
      before nf_ct_hook is set.
      
      Call nf_conntrack_init_end() to set nf_ct_hook before we register the
      pernet ops.
      
      Fixes: ba3fbe66 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      e72eeab5
    • Florian Westphal's avatar
      netfilter: nf_tables: always release netdev hooks from notifier · dc1c9fd4
      Florian Westphal authored
      This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
      
      The problem is that when a veth device is released, the veth release
      callback will also queue the peer netns device for removal.
      
      It's possible that the peer netns is also slated for removal.  In this
      case, the device memory is already released before the pre_exit hook of
      the peer netns runs:
      
      BUG: KASAN: slab-use-after-free in nf_hook_entry_head+0x1b8/0x1d0
      Read of size 8 at addr ffff88812c0124f0 by task kworker/u8:1/45
      Workqueue: netns cleanup_net
      Call Trace:
       nf_hook_entry_head+0x1b8/0x1d0
       __nf_unregister_net_hook+0x76/0x510
       nft_netdev_unregister_hooks+0xa0/0x220
       __nft_release_hook+0x184/0x490
       nf_tables_pre_exit_net+0x12f/0x1b0
       ..
      
      Order is:
      1. First netns is released, veth_dellink() queues peer netns device
         for removal
      2. peer netns is queued for removal
      3. peer netns device is released, unreg event is triggered
      4. unreg event is ignored because netns is going down
      5. pre_exit hook calls nft_netdev_unregister_hooks but device memory
         might be free'd already.
      
      Fixes: 68a3765c ("netfilter: nf_tables: skip netdev events generated on netns removal")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      dc1c9fd4
    • Florian Fainelli's avatar
      net: phy: bcm7xx: Correct read from expansion register · 582dbb2c
      Florian Fainelli authored
      Since the driver works in the "legacy" addressing mode, we need to write
      to the expansion register (0x17) with bits 11:8 set to 0xf to properly
      select the expansion register passed as argument.
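
The selector value described above can be sketched as a small helper. The constant and function names are hypothetical, and placing the register number in the low 8 bits is an assumption made for illustration; only the register address 0x17 and the 0xf value in bits 11:8 come from the commit text.

```c
#include <assert.h>
#include <stdint.h>

#define EXP_SEL_REG_SKETCH   0x17    /* expansion register address, per the text */
#define EXP_SEL_BITS_SKETCH  0x0f00  /* bits 11:8 set to 0xf */

/* Value to write to register 0x17 in order to select expansion register 'reg'. */
static uint16_t exp_sel_value_sketch(uint8_t reg)
{
	return (uint16_t)(EXP_SEL_BITS_SKETCH | reg);
}

/* Small self-check: bits 11:8 are always 0xf, the low byte carries 'reg'. */
static void check_exp_sel_sketch(void)
{
	assert(exp_sel_value_sketch(0x34) == 0x0f34);
	assert(((exp_sel_value_sketch(0x00) >> 8) & 0xf) == 0xf);
}
```

A driver would write this value to register 0x17 and then access the expansion data; that access sequence is not shown because the commit text does not describe it.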
      
      Fixes: f68d08c4 ("net: phy: bcm7xxx: Add EPHY entry for 72165")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230508231749.1681169-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      582dbb2c