1. 11 Oct, 2024 4 commits
    • Wei Fang's avatar
      net: enetc: block concurrent XDP transmissions during ring reconfiguration · c728a95c
      Wei Fang authored
      When testing the XDP_REDIRECT function on the LS1028A platform, we
      found a very reproducible issue that the Tx frames can no longer be
      sent out even if XDP_REDIRECT is turned off. Specifically, if there
      is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
      the console may display some warnings like "timeout for tx ring #6
      clear", and all redirected frames will be dropped, the detailed log
      is as follows.
      
      root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
      Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
      [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
      [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
      [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
      eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
      eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg
      
      By analyzing the XDP_REDIRECT implementation of enetc driver, the
      driver will reconfigure Tx and Rx BD rings when a bpf program is
      installed or uninstalled, but there is no mechanisms to block the
      redirected frames when enetc driver reconfigures rings. Similarly,
      XDP_TX verdicts on received frames can also lead to frames being
      enqueued in the Tx rings. Because XDP ignores the state set by the
      netif_tx_wake_queue() API, so introduce the ENETC_TX_DOWN flag to
      suppress transmission of XDP frames.
      
      Fixes: c33bfaf9 ("net: enetc: set up XDP program under enetc_reconfigure()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c728a95c
    • Wei Fang's avatar
      net: enetc: remove xdp_drops statistic from enetc_xdp_drop() · 412950d5
      Wei Fang authored
      The xdp_drops statistic indicates the number of XDP frames dropped in
      the Rx direction. However, enetc_xdp_drop() is also used in XDP_TX and
      XDP_REDIRECT actions. If frame loss occurs in these two actions, the
      frames loss count should not be included in xdp_drops, because there
      are already xdp_tx_drops and xdp_redirect_failures to count the frame
      loss of these two actions, so it's better to remove xdp_drops statistic
      from enetc_xdp_drop() and increase xdp_drops in XDP_DROP action.
      
      Fixes: 7ed2bc80 ("net: enetc: add support for XDP_TX")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Reviewed-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-2-wei.fang@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      412950d5
    • Daniel Machon's avatar
      net: sparx5: fix source port register when mirroring · 8a6be4bd
      Daniel Machon authored
      When port mirroring is added to a port, the bit position of the source
      port, needs to be written to the register ANA_AC_PROBE_PORT_CFG.  This
      register is replicated for n_ports > 32, and therefore we need to derive
      the correct register from the port number.
      
      Before this patch, we wrongly calculate the register from portno /
      BITS_PER_BYTE, where the divisor ought to be 32, causing any port >=8 to
      be written to the wrong register. We fix this, by using do_div(), where
      the dividend is the register, the remainder is the bit position and the
      divisor is now 32.
      
      Fixes: 4e50d72b ("net: sparx5: add port mirroring implementation")
      Signed-off-by: default avatarDaniel Machon <daniel.machon@microchip.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-mirroring-fix-v1-1-9ec962301989@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8a6be4bd
    • Xin Long's avatar
      ipv4: give an IPv4 dev to blackhole_netdev · 22600596
      Xin Long authored
      After commit 8d7017fd ("blackhole_netdev: use blackhole_netdev to
      invalidate dst entries"), blackhole_netdev was introduced to invalidate
      dst cache entries on the TX path whenever the cache times out or is
      flushed.
      
      When two UDP sockets (sk1 and sk2) send messages to the same destination
      simultaneously, they are using the same dst cache. If the dst cache is
      invalidated on one path (sk2) while the other (sk1) is still transmitting,
      sk1 may try to use the invalid dst entry.
      
               CPU1                   CPU2
      
            udp_sendmsg(sk1)       udp_sendmsg(sk2)
            udp_send_skb()
            ip_output()
                                                   <--- dst timeout or flushed
                                   dst_dev_put()
            ip_finish_output2()
            ip_neigh_for_gw()
      
      This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
      blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
      arp_constructor(). This error is then propagated back to userspace,
      breaking the UDP application.
      
      The patch fixes this issue by assigning an in_dev to blackhole_dev for
      IPv4, similar to what was done for IPv6 in commit e5f80fcf ("ipv6:
      give an IPv6 dev to blackhole_netdev"). This ensures that even when the
      dst entry is invalidated with blackhole_dev, it will not fail to create
      the neigh entry.
      
      As devinet_init() is called ealier than blackhole_netdev_init() in system
      booting, it can not assign the in_dev to blackhole_dev in devinet_init().
      As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
      inet_blackhole_dev_init() is called after blackhole_netdev_init().
      
      Fixes: 8d7017fd ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      22600596
  2. 10 Oct, 2024 32 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1d227fcc
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth and netfilter.
      
        Current release - regressions:
      
         - dsa: sja1105: fix reception from VLAN-unaware bridges
      
         - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
           enabled"
      
         - eth: fec: don't save PTP state if PTP is unsupported
      
        Current release - new code bugs:
      
         - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref
      
         - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop
      
         - phy: aquantia: AQR115c fix up PMA capabilities
      
        Previous releases - regressions:
      
         - tcp: 3 fixes for retrans_stamp and undo logic
      
        Previous releases - always broken:
      
         - net: do not delay dst_entries_add() in dst_release()
      
         - netfilter: restrict xtables extensions to families that are safe,
           syzbot found a way to combine ebtables with extensions that are
           never used by userspace tools
      
         - sctp: ensure sk_state is set to CLOSED if hashing fails in
           sctp_listen_start
      
         - mptcp: handle consistently DSS corruption, and prevent corruption
           due to large pmtu xmit"
      
      * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        MAINTAINERS: Add headers and mailing list to UDP section
        MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
        slip: make slhc_remember() more robust against malicious packets
        net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
        ppp: fix ppp_async_encode() illegal access
        docs: netdev: document guidance on cleanup patches
        phonet: Handle error of rtnl_register_module().
        mpls: Handle error of rtnl_register_module().
        mctp: Handle error of rtnl_register_module().
        bridge: Handle error of rtnl_register_module().
        vxlan: Handle error of rtnl_register_module().
        rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
        net: do not delay dst_entries_add() in dst_release()
        mptcp: pm: do not remove closing subflows
        mptcp: fallback when MPTCP opts are dropped after 1st data
        tcp: fix mptcp DSS corruption due to large pmtu xmit
        mptcp: handle consistently DSS corruption
        net: netconsole: fix wrong warning
        net: dsa: refuse cross-chip mirroring operations
        net: fec: don't save PTP state if PTP is unsupported
        ...
      1d227fcc
    • Linus Torvalds's avatar
      Merge tag 'trace-ringbuffer-v6.12-rc2' of... · 0edab8d1
      Linus Torvalds authored
      Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing fix from Steven Rostedt:
       "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
        callbacks
      
        When a ring buffer is mapped to memory assigned at boot, it also
        splits it up evenly between the possible CPUs. But the allocation code
        still attached a CPU notifier callback to this ring buffer. When a CPU
        is added, the callback will happen and another per-cpu buffer is
        created for the ring buffer.
      
        But for boot mapped buffers, there is no room to add another one (as
        they were all created already). The result of calling the CPU hotplug
        notifier on a boot mapped ring buffer is unpredictable and could lead
        to a system crash.
      
        If the ring buffer is boot mapped simply do not attach the CPU
        notifier to it"
      
      * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Do not have boot mapped buffers hook to CPU hotplug
      0edab8d1
    • Linus Torvalds's avatar
      Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · eb952c47
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - update fstrim loop and add more cancellation points, fix reported
         delayed or blocked suspend if there's a huge chunk queued
      
       - fix error handling in recent qgroup xarray conversion
      
       - in zoned mode, fix warning printing device path without RCU
         protection
      
       - again fix invalid extent xarray state (6252690f), lost due to
         refactoring
      
      * tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
        btrfs: zoned: fix missing RCU locking in error message when loading zone info
        btrfs: fix missing error handling when adding delayed ref with qgroups enabled
        btrfs: add cancellation points to trim loops
        btrfs: split remaining space to discard in chunks
      eb952c47
    • Linus Torvalds's avatar
      Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 5870963f
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Fix NFSD bring-up / shutdown
      
       - Fix a UAF when releasing a stateid
      
      * tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: fix possible badness in FREE_STATEID
        nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
        NFSD: Mark filecache "down" if init fails
      5870963f
    • Linus Torvalds's avatar
      Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 825ec756
      Linus Torvalds authored
      Pull xfs fixes from Carlos Maiolino:
      
       - A few small typo fixes
      
       - fstests xfs/538 DEBUG-only fix
      
       - Performance fix on blockgc on COW'ed files, by skipping trims on
         cowblock inodes currently opened for write
      
       - Prevent cowblocks to be freed under dirty pagecache during unshare
      
       - Update MAINTAINERS file to quote the new maintainer
      
      * tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix a typo
        xfs: don't free cowblocks from under dirty pagecache on unshare
        xfs: skip background cowblock trims on inodes open for write
        xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
        xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
        xfs: don't ifdef around the exact minlen allocations
        xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
        xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
        xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
        xfs: return bool from xfs_attr3_leaf_add
        xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
        xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
        xfs: scrub: convert comma to semicolon
        xfs: Remove empty declartion in header file
        MAINTAINERS: add Carlos Maiolino as XFS release manager
      825ec756
    • Jakub Kicinski's avatar
      Merge branch 'maintainers-networking-file-coverage-updates' · 7b43ba65
      Jakub Kicinski authored
      Simon Horman says:
      
      ====================
      MAINTAINERS: Networking file coverage updates
      
      The aim of this proposal is to make the handling of some files,
      related to Networking and Wireless, more consistently. It does so by:
      
      1. Adding some more headers to the UDP section, making it consistent
         with the TCP section.
      
      2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
         making their handling consistent with other files related to
         Wireless.
      
      The aim of this is to make things more consistent.  And for MAINTAINERS
      to better reflect the situation on the ground.  I am more than happy to
      be told that the current state of affairs is fine. Or for other ideas to
      be discussed.
      
      v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
      ====================
      
      Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7b43ba65
    • Simon Horman's avatar
      MAINTAINERS: Add headers and mailing list to UDP section · 5404b5a2
      Simon Horman authored
      Add netdev mailing list and some more udp.h headers to the UDP section.
      This is now more consistent with the TCP section.
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5404b5a2
    • Simon Horman's avatar
      MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL] · 9937aae3
      Simon Horman authored
      We already exclude wireless drivers from the netdev@ traffic, to
      delegate it to linux-wireless@, and avoid overwhelming netdev@.
      
      Many of the following wireless-related sections MAINTAINERS
      are already not included in the NETWORKING [GENERAL] section.
      For consistency, exclude those that are.
      
      * 802.11 (including CFG80211/NL80211)
      * MAC80211
      * RFKILL
      Acked-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9937aae3
    • Eric Dumazet's avatar
      slip: make slhc_remember() more robust against malicious packets · 7d3fce8c
      Eric Dumazet authored
      syzbot found that slhc_remember() was missing checks against
      malicious packets [1].
      
      slhc_remember() only checked the size of the packet was at least 20,
      which is not good enough.
      
      We need to make sure the packet includes the IPv4 and TCP header
      that are supposed to be carried.
      
      Add iph and th pointers to make the code more readable.
      
      [1]
      
      BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
        slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
        ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
        ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
        ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
        ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
        pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
        sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
        __release_sock+0x1da/0x330 net/core/sock.c:3072
        release_sock+0x6b/0x250 net/core/sock.c:3626
        pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
        sock_sendmsg_nosec net/socket.c:729 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:744
        ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
        __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
        __do_sys_sendmmsg net/socket.c:2771 [inline]
        __se_sys_sendmmsg net/socket.c:2768 [inline]
        __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
        x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:4091 [inline]
        slab_alloc_node mm/slub.c:4134 [inline]
        kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
        __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
        alloc_skb include/linux/skbuff.h:1322 [inline]
        sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
        pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
        sock_sendmsg_nosec net/socket.c:729 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:744
        ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
        __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
        __do_sys_sendmmsg net/socket.c:2771 [inline]
        __se_sys_sendmmsg net/socket.c:2768 [inline]
        __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
        x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
      
      Fixes: b5451d78 ("slip: Move the SLIP drivers")
      Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#uSigned-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7d3fce8c
    • D. Wythe's avatar
      net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC · 6fd27ea1
      D. Wythe authored
      Eric report a panic on IPPROTO_SMC, and give the facts
      that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.
      
      Bug: Unable to handle kernel NULL pointer dereference at virtual address
      0000000000000000
      Mem abort info:
      ESR = 0x0000000086000005
      EC = 0x21: IABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x05: level 1 translation fault
      user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
      [0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
      pud=0000000000000000
      Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
      6.11.0-rc7-syzkaller-g5f5673607153 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 08/06/2024
      pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : 0x0
      lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
      sp : ffff80009b887a90
      x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
      x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
      x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
      x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
      x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
      x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
      x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
      x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
      x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
      x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
      Call trace:
      0x0
      netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
      smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
      smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
      security_socket_post_create+0x94/0xd4 security/security.c:4425
      __sock_create+0x4c8/0x884 net/socket.c:1587
      sock_create net/socket.c:1622 [inline]
      __sys_socket_create net/socket.c:1659 [inline]
      __sys_socket+0x134/0x340 net/socket.c:1706
      __do_sys_socket net/socket.c:1720 [inline]
      __se_sys_socket net/socket.c:1718 [inline]
      __arm64_sys_socket+0x7c/0x94 net/socket.c:1718
      __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
      invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
      el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
      do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
      el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
      el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
      el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      Code: ???????? ???????? ???????? ???????? (????????)
      ---[ end trace 0000000000000000 ]---
      
      This patch add a toy implementation that performs a simple return to
      prevent such panic. This is because MSS can be set in sock_create_kern
      or smc_setsockopt, similar to how it's done in AF_SMC. However, for
      AF_SMC, there is currently no way to synchronize MSS within
      __sys_connect_file. This toy implementation lays the groundwork for us
      to support such feature for IPPROTO_SMC in the future.
      
      Fixes: d25a92cc ("net/smc: Introduce IPPROTO_SMC")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarD. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarWenjia Zhang <wenjia@linux.ibm.com>
      Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6fd27ea1
    • Eric Dumazet's avatar
      ppp: fix ppp_async_encode() illegal access · 40dddd4b
      Eric Dumazet authored
      syzbot reported an issue in ppp_async_encode() [1]
      
      In this case, pppoe_sendmsg() is called with a zero size.
      Then ppp_async_encode() is called with an empty skb.
      
      BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
       BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
        ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
        ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
        ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
        ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
        ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
        pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
        sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
        __release_sock+0x1da/0x330 net/core/sock.c:3072
        release_sock+0x6b/0x250 net/core/sock.c:3626
        pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
        sock_sendmsg_nosec net/socket.c:729 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:744
        ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
        __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
        __do_sys_sendmmsg net/socket.c:2771 [inline]
        __se_sys_sendmmsg net/socket.c:2768 [inline]
        __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
        x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:4092 [inline]
        slab_alloc_node mm/slub.c:4135 [inline]
        kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
        __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
        alloc_skb include/linux/skbuff.h:1322 [inline]
        sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
        pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
        sock_sendmsg_nosec net/socket.c:729 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:744
        ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
        ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
        __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
        __do_sys_sendmmsg net/socket.c:2771 [inline]
        __se_sys_sendmmsg net/socket.c:2768 [inline]
        __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
        x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      40dddd4b
    • Simon Horman's avatar
      docs: netdev: document guidance on cleanup patches · aeb218d9
      Simon Horman authored
      The purpose of this section is to document what is the current practice
      regarding clean-up patches which address checkpatch warnings and similar
      problems. I feel there is a value in having this documented so others
      can easily refer to it.
      
      Clearly this topic is subjective. And to some extent the current
      practice discourages a wider range of patches than is described here.
      But I feel it is best to start somewhere, with the most well established
      part of the current practice.
      Signed-off-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      aeb218d9
    • Paolo Abeni's avatar
      Merge branch 'rtnetlink-handle-error-of-rtnl_register_module' · ffc8fa91
      Paolo Abeni authored
      Kuniyuki Iwashima says:
      
      ====================
      rtnetlink: Handle error of rtnl_register_module().
      
      While converting phonet to per-netns RTNL, I found a weird comment
      
        /* Further rtnl_register_module() cannot fail */
      
      that was true but no longer true after commit addf9b90 ("net:
      rtnetlink: use rcu to free rtnl message handlers").
      
      Many callers of rtnl_register_module() just ignore the returned
      value but should handle them properly.
      
      This series introduces two helpers, rtnl_register_many() and
      rtnl_unregister_many(), to do that easily and fix such callers.
      
      All rtnl_register() and rtnl_register_module() will be converted
      to _many() variant and some rtnl_lock() will be saved in _many()
      later in net-next.
      
      Changes:
        v4:
          * Add more context in changelog of each patch
      
        v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/
          * Move module *owner to struct rtnl_msg_handler
          * Make struct rtnl_msg_handler args/vars const
          * Update mctp goto labels
      
        v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/
          * Remove __exit from mctp_neigh_exit().
      
        v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/
      ====================
      
      Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      ffc8fa91
    • Kuniyuki Iwashima's avatar
      phonet: Handle error of rtnl_register_module(). · b5e837c8
      Kuniyuki Iwashima authored
      Before commit addf9b90 ("net: rtnetlink: use rcu to free rtnl
      message handlers"), once the first rtnl_register_module() allocated
      rtnl_msg_handlers[PF_PHONET], the following calls never failed.
      
      However, after the commit, rtnl_register_module() could fail silently
      to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
      handling for each call.
      
      Handling the error allows users to view a module as an all-or-nothing
      thing in terms of the rtnetlink functionality.  This prevents syzkaller
      from reporting spurious errors from its tests, where OOM often occurs
      and module is automatically loaded.
      
      Let's use rtnl_register_many() to handle the errors easily.
      
      Fixes: addf9b90 ("net: rtnetlink: use rcu to free rtnl message handlers")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: default avatarRémi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b5e837c8
    • Kuniyuki Iwashima's avatar
      mpls: Handle error of rtnl_register_module(). · 5be2062e
      Kuniyuki Iwashima authored
      Since introduced, mpls_init() has been ignoring the returned
      value of rtnl_register_module(), which could fail silently.
      
      Handling the error allows users to view a module as an all-or-nothing
      thing in terms of the rtnetlink functionality.  This prevents syzkaller
      from reporting spurious errors from its tests, where OOM often occurs
      and module is automatically loaded.
      
      Let's handle the errors by rtnl_register_many().
      
      Fixes: 03c05665 ("mpls: Netlink commands to add, remove, and dump routes")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      5be2062e
    • Kuniyuki Iwashima's avatar
      mctp: Handle error of rtnl_register_module(). · d5170561
      Kuniyuki Iwashima authored
      Since introduced, mctp has been ignoring the returned value of
      rtnl_register_module(), which could fail silently.
      
      Handling the error allows users to view a module as an all-or-nothing
      thing in terms of the rtnetlink functionality.  This prevents syzkaller
      from reporting spurious errors from its tests, where OOM often occurs
      and module is automatically loaded.
      
      Let's handle the errors by rtnl_register_many().
      
      Fixes: 583be982 ("mctp: Add device handling and netlink interface")
      Fixes: 831119f8 ("mctp: Add neighbour netlink interface")
      Fixes: 06d2f4c5 ("mctp: Add netlink route management")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarJeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      d5170561
    • Kuniyuki Iwashima's avatar
      bridge: Handle error of rtnl_register_module(). · cba5e43b
      Kuniyuki Iwashima authored
      Since introduced, br_vlan_rtnl_init() has been ignoring the returned
      value of rtnl_register_module(), which could fail silently.
      
      Handling the error allows users to view a module as an all-or-nothing
      thing in terms of the rtnetlink functionality.  This prevents syzkaller
      from reporting spurious errors from its tests, where OOM often occurs
      and module is automatically loaded.
      
      Let's handle the errors by rtnl_register_many().
      
      Fixes: 8dcea187 ("net: bridge: vlan: add rtm definitions and dump support")
      Fixes: f26b2965 ("net: bridge: vlan: add new rtm message support")
      Fixes: adb3ce9b ("net: bridge: vlan: add del rtm message support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cba5e43b
    • Kuniyuki Iwashima's avatar
      vxlan: Handle error of rtnl_register_module(). · 78b7b991
      Kuniyuki Iwashima authored
      Since introduced, vxlan_vnifilter_init() has been ignoring the
      returned value of rtnl_register_module(), which could fail silently.
      
      Handling the error allows users to view a module as an all-or-nothing
      thing in terms of the rtnetlink functionality.  This prevents syzkaller
      from reporting spurious errors from its tests, where OOM often occurs
      and module is automatically loaded.
      
      Let's handle the errors by rtnl_register_many().
      
      Fixes: f9c4bb0b ("vxlan: vni filtering support on collect metadata device")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      78b7b991
    • Kuniyuki Iwashima's avatar
      rtnetlink: Add bulk registration helpers for rtnetlink message handlers. · 07cc7b0b
      Kuniyuki Iwashima authored
      Before commit addf9b90 ("net: rtnetlink: use rcu to free rtnl message
      handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
      rtnl_register_module() for the same protocol never failed.
      
      However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to
      be allocated in each rtnl_register_module(), so each call could fail.
      
      Many callers of rtnl_register_module() do not handle the returned error,
      and we need to add many error handlings.
      
      To handle that easily, let's add wrapper functions for bulk registration
      of rtnetlink message handlers.
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      07cc7b0b
    • Paolo Abeni's avatar
      Merge tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 9a3cd877
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Restrict xtables extensions to families that are safe, syzbot found
         a way to combine ebtables with extensions that are never used by
         userspace tools. From Florian Westphal.
      
      2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup
         mismatch, also from Florian.
      
      netfilter pull request 24-10-09
      
      * tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: netfilter: conntrack_vrf.sh: add fib test case
        netfilter: fib: check correct rtable in vrf setups
        netfilter: xtables: avoid NFPROTO_UNSPEC where needed
      ====================
      
      Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      9a3cd877
    • Eric Dumazet's avatar
      net: do not delay dst_entries_add() in dst_release() · ac888d58
      Eric Dumazet authored
      dst_entries_add() uses per-cpu data that might be freed at netns
      dismantle from ip6_route_net_exit() calling dst_entries_destroy()
      
      Before ip6_route_net_exit() can be called, we release all
      the dsts associated with this netns, via calls to dst_release(),
      which waits an rcu grace period before calling dst_destroy()
      
      dst_entries_add() use in dst_destroy() is racy, because
      dst_entries_destroy() could have been called already.
      
      Decrementing the number of dsts must happen sooner.
      
      Notes:
      
      1) in CONFIG_XFRM case, dst_destroy() can call
         dst_release_immediate(child), this might also cause UAF
         if the child does not have DST_NOCOUNT set.
         IPSEC maintainers might take a look and see how to address this.
      
      2) There is also discussion about removing this count of dst,
         which might happen in future kernels.
      
      Fixes: f8864972 ("ipv4: fix dst race in sk_dst_get()")
      Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: default avatarLinux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      ac888d58
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a354733c
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-10-08 (ice, i40e, igb, e1000e)
      
      This series contains updates to ice, i40e, igb, and e1000e drivers.
      
      For ice:
      
      Marcin allows driver to load, into safe mode, when DDP package is
      missing or corrupted and adjusts the netif_is_ice() check to
      account for when the device is in safe mode. He also fixes an
      out-of-bounds issue when MSI-X are increased for VFs.
      
      Wojciech clears FDB entries on reset to match the hardware state.
      
      For i40e:
      
      Aleksandr adds locking around MACVLAN filters to prevent memory leaks
      due to concurrency issues.
      
      For igb:
      
      Mohamed Khalfella adds a check to not attempt to bring up an already
      running interface on non-fatal PCIe errors.
      
      For e1000e:
      
      Vitaly changes board type for I219 to more closely match the hardware
      and stop PHY issues.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        e1000e: change I219 (19) devices to ADP
        igb: Do not bring the device up after non-fatal error
        i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
        ice: Fix increasing MSI-X on VF
        ice: Flush FDB entries before reset
        ice: Fix netif_is_ice() in Safe Mode
        ice: Fix entering Safe Mode
      ====================
      
      Link: https://patch.msgid.link/20241008230050.928245-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a354733c
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-misc-fixes-involving-fallback-to-tcp' · 5151a35c
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: misc. fixes involving fallback to TCP
      
      - Patch 1: better handle DSS corruptions from a bugged peer: reducing
        warnings, doing a fallback or a reset depending on the subflow state.
        For >= v5.7.
      
      - Patch 2: fix DSS corruption due to large pmtu xmit, where MPTCP was
        not taken into account. For >= v5.6.
      
      - Patch 3: fallback when MPTCP opts are dropped after the first data
        packet, instead of resetting the connection. For >= v5.6.
      
      - Patch 4: restrict the removal of a subflow to other closing states, a
        better fix, for a recent one. For >= v5.10.
      ====================
      
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-0-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5151a35c
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: do not remove closing subflows · db0a37b7
      Matthieu Baerts (NGI0) authored
      In a previous fix, the in-kernel path-manager has been modified not to
      retrigger the removal of a subflow if it was already closed, e.g. when
      the initial subflow is removed, but kept in the subflows list.
      
      To be complete, this fix should also skip the subflows that are in any
      closing state: mptcp_close_ssk() will initiate the closure, but the
      switch to the TCP_CLOSE state depends on the other peer.
      
      Fixes: 58e1b66b ("mptcp: pm: do not remove already closed subflows")
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-4-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      db0a37b7
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: fallback when MPTCP opts are dropped after 1st data · 119d51e2
      Matthieu Baerts (NGI0) authored
      As reported by Christoph [1], before this patch, an MPTCP connection was
      wrongly reset when a host received a first data packet with MPTCP
      options after the 3wHS, but got the next ones without.
      
      According to the MPTCP v1 specs [2], a fallback should happen in this
      case, because the host didn't receive a DATA_ACK from the other peer,
      nor receive data for more than the initial window which implies a
      DATA_ACK being received by the other peer.
      
      The patch here re-uses the same logic as the one used in other places:
      by looking at allow_infinite_fallback, which is disabled at the creation
      of an additional subflow. It's not looking at the first DATA_ACK (or
      implying one received from the other side) as suggested by the RFC, but
      it is in continuation with what was already done, which is safer, and it
      fixes the reported issue. The next step, looking at this first DATA_ACK,
      is tracked in [4].
      
      This patch has been validated using the following Packetdrill script:
      
         0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
        +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        +0 bind(3, ..., ...) = 0
        +0 listen(3, 1) = 0
      
        // 3WHS is OK
        +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
        +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
        +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
        +0 accept(3, ..., ...) = 4
      
        // Data from the client with valid MPTCP options (no DATA_ACK: normal)
        +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
        // From here, the MPTCP options will be dropped by a middlebox
        +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
      
        +0.1 read(4, ..., 500) = 500
        +0   write(4, ..., 100) = 100
      
        // The server replies with data, still thinking MPTCP is being used
        +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
        // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
        +0.1 <  . 501:501(0)   ack 101 win 2048
      
        +0.0 < P. 501:601(100) ack 101 win 2048
        // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
        +0.0 >  . 101:101(0)   ack 601
      
      Note that this script requires Packetdrill with MPTCP support, see [3].
      
      Fixes: dea2b1ea ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
      Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
      Link: https://github.com/multipath-tcp/packetdrill [3]
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
      Reviewed-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      119d51e2
    • Paolo Abeni's avatar
      tcp: fix mptcp DSS corruption due to large pmtu xmit · 4dabcdf5
      Paolo Abeni authored
      Syzkaller was able to trigger a DSS corruption:
      
        TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
        Modules linked in:
        CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191a #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
        RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
        Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
        RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
        RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
        RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
        RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
        R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
        R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
        FS:  000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <IRQ>
         move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
         mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
         subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
         tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
         tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
         tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
         tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
         ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
         ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
         NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
         NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
         __netif_receive_skb_one_core net/core/dev.c:5662 [inline]
         __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
         process_backlog+0x662/0x15b0 net/core/dev.c:6107
         __napi_poll+0xcb/0x490 net/core/dev.c:6771
         napi_poll net/core/dev.c:6840 [inline]
         net_rx_action+0x89b/0x1240 net/core/dev.c:6962
         handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
         do_softirq+0x11b/0x1e0 kernel/softirq.c:455
         </IRQ>
         <TASK>
         __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
         local_bh_enable include/linux/bottom_half.h:33 [inline]
         rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
         __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
         dev_queue_xmit include/linux/netdevice.h:3094 [inline]
         neigh_hh_output include/net/neighbour.h:526 [inline]
         neigh_output include/net/neighbour.h:540 [inline]
         ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
         ip_local_out net/ipv4/ip_output.c:130 [inline]
         __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
         __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
         tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
         tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
         tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
         __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
         tcp_push_pending_frames include/net/tcp.h:2107 [inline]
         tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
         tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
         tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
         sk_backlog_rcv include/net/sock.h:1113 [inline]
         __release_sock+0x214/0x350 net/core/sock.c:3072
         release_sock+0x61/0x1f0 net/core/sock.c:3626
         mptcp_push_release net/mptcp/protocol.c:1486 [inline]
         __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
         mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg+0x1a6/0x270 net/socket.c:745
         ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
         ___sys_sendmsg net/socket.c:2657 [inline]
         __sys_sendmsg+0x2aa/0x390 net/socket.c:2686
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
         entry_SYSCALL_64_after_hwframe+0x77/0x7f
        RIP: 0033:0x7fb06e9317f9
        Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
        RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
        RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
        RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
        RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
        R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
        R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
         </TASK>
      
      Additionally syzkaller provided a nice reproducer. The repro enables
      pmtu on the loopback device, leading to tcp_mtu_probe() generating
      very large probe packets.
      
      tcp_can_coalesce_send_queue_head() currently does not check for
      mptcp-level invariants, and allowed the creation of cross-DSS probes,
      leading to the mentioned corruption.
      
      Address the issue teaching tcp_can_coalesce_send_queue_head() about
      mptcp using the tcp_skb_can_collapse(), also reducing the code
      duplication.
      
      Fixes: 85712484 ("tcp: coalesce/collapse must respect MPTCP extensions")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4dabcdf5
    • Paolo Abeni's avatar
      mptcp: handle consistently DSS corruption · e32d262c
      Paolo Abeni authored
      Bugged peer implementation can send corrupted DSS options, consistently
      hitting a few warning in the data path. Use DEBUG_NET assertions, to
      avoid the splat on some builds and handle consistently the error, dumping
      related MIBs and performing fallback and/or reset according to the
      subflow type.
      
      Fixes: 6771bfd9 ("mptcp: update mptcp ack sequence from work queue")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e32d262c
    • Breno Leitao's avatar
      net: netconsole: fix wrong warning · d94785bb
      Breno Leitao authored
      A warning is triggered when there is insufficient space in the buffer
      for userdata. However, this is not an issue since userdata will be sent
      in the next iteration.
      
      Current warning message:
      
          ------------[ cut here ]------------
           WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
            ? write_ext_msg+0x3b6/0x3d0
            console_flush_all+0x1e9/0x330
      
      The code incorrectly issues a warning when this_chunk is zero, which is
      a valid scenario. The warning should only be triggered when this_chunk
      is negative.
      
      Fixes: 1ec9daf9 ("net: netconsole: append userdata to fragmented netconsole messages")
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241008094325.896208-1-leitao@debian.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d94785bb
    • Vladimir Oltean's avatar
      net: dsa: refuse cross-chip mirroring operations · 8c924369
      Vladimir Oltean authored
      In case of a tc mirred action from one switch to another, the behavior
      is not correct. We simply tell the source switch driver to program a
      mirroring entry towards mirror->to_local_port = to_dp->index, but it is
      not even guaranteed that the to_dp belongs to the same switch as dp.
      
      For proper cross-chip support, we would need to go through the
      cross-chip notifier layer in switch.c, program the entry on cascade
      ports, and introduce new, explicit API for cross-chip mirroring, given
      that intermediary switches should have introspection into the DSA tags
      passed through the cascade port (and not just program a port mirror on
      the entire cascade port). None of that exists today.
      
      Reject what is not implemented so that user space is not misled into
      thinking it works.
      
      Fixes: f50f2127 ("net: dsa: Add plumbing for port mirroring")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241008094320.3340980-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8c924369
    • Wei Fang's avatar
      net: fec: don't save PTP state if PTP is unsupported · 6be06307
      Wei Fang authored
      Some platforms (such as i.MX25 and i.MX27) do not support PTP, so on
      these platforms fec_ptp_init() is not called and the related members
      in fep are not initialized. However, fec_ptp_save_state() is called
      unconditionally, which causes the kernel to panic. Therefore, add a
      condition so that fec_ptp_save_state() is not called if PTP is not
      supported.
      
      Fixes: a1477dc8 ("net: fec: Restart PPS after link state change")
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Closes: https://lore.kernel.org/lkml/353e41fe-6bb4-4ee9-9980-2da2a9c1c508@roeck-us.net/Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Reviewed-by: default avatarCsókás, Bence <csokas.bence@prolan.hu>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Link: https://patch.msgid.link/20241008061153.1977930-1-wei.fang@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6be06307
    • Rosen Penev's avatar
      net: ibm: emac: mal: add dcr_unmap to _remove · 080ddc22
      Rosen Penev authored
      It's done in probe so it should be undone here.
      
      Fixes: 1d3bb996 ("Device tree aware EMAC driver")
      Signed-off-by: default avatarRosen Penev <rosenp@gmail.com>
      Reviewed-by: default avatarBreno Leitao <leitao@debian.org>
      Link: https://patch.msgid.link/20241008233050.9422-1-rosenp@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      080ddc22
    • Jacky Chou's avatar
      net: ftgmac100: fixed not check status from fixed phy · 70a0da8c
      Jacky Chou authored
      Add error handling from calling fixed_phy_register.
      It may return some error, therefore, need to check the status.
      
      And fixed_phy_register needs to bind a device node for mdio.
      Add the mac device node for fixed_phy_register function.
      This is a reference to this function, of_phy_register_fixed_link().
      
      Fixes: e24a6c87 ("net: ftgmac100: Get link speed and duplex for NC-SI")
      Signed-off-by: default avatarJacky Chou <jacky_chou@aspeedtech.com>
      Link: https://patch.msgid.link/20241007032435.787892-1-jacky_chou@aspeedtech.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      70a0da8c
  3. 09 Oct, 2024 4 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of... · d3d15566
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "12 hotfixes, 5 of which are c:stable. All singletons, about half of
        which are MM"
      
      * tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm: zswap: delete comments for "value" member of 'struct zswap_entry'.
        CREDITS: sort alphabetically by name
        secretmem: disable memfd_secret() if arch cannot set direct map
        .mailmap: update Fangrui's email
        mm/huge_memory: check pmd_special() only after pmd_present()
        resource, kunit: fix user-after-free in resource_test_region_intersects()
        fs/proc/kcore.c: allow translation of physical memory addresses
        selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
        device-dax: correct pgoff align in dax_set_mapping()
        kthread: unpark only parked kthread
        Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
        bcachefs: do not use PF_MEMALLOC_NORECLAIM
      d3d15566
    • Florian Westphal's avatar
      selftests: netfilter: conntrack_vrf.sh: add fib test case · c6a0862b
      Florian Westphal authored
      meta iifname veth0 ip daddr ... fib daddr oif
      
      ... is expected to return "dummy0" interface which is part of same vrf
      as veth0.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c6a0862b
    • Florian Westphal's avatar
      netfilter: fib: check correct rtable in vrf setups · 05ef7055
      Florian Westphal authored
      We need to init l3mdev unconditionally, else main routing table is searched
      and incorrect result is returned unless strict (iif keyword) matching is
      requested.
      
      Next patch adds a selftest for this.
      
      Fixes: 2a8a7c0e ("netfilter: nft_fib: Fix for rpath check with VRF devices")
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      05ef7055
    • Florian Westphal's avatar
      netfilter: xtables: avoid NFPROTO_UNSPEC where needed · 0bfcb7b7
      Florian Westphal authored
      syzbot managed to call xt_cluster match via ebtables:
      
       WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780
       [..]
       ebt_do_table+0x174b/0x2a40
      
      Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet
      processing.  As this is only useful to restrict locally terminating
      TCP/UDP traffic, register this for ipv4 and ipv6 family only.
      
      Pablo points out that this is a general issue, direct users of the
      set/getsockopt interface can call into targets/matches that were only
      intended for use with ip(6)tables.
      
      Check all UNSPEC matches and targets for similar issues:
      
      - matches and targets are fine except if they assume skb_network_header()
        is valid -- this is only true when called from inet layer: ip(6) stack
        pulls the ip/ipv6 header into linear data area.
      - targets that return XT_CONTINUE or other xtables verdicts must be
        restricted too, they are incompatbile with the ebtables traverser, e.g.
        EBT_CONTINUE is a completely different value than XT_CONTINUE.
      
      Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as
      they are provided for use by ip(6)tables.
      
      The MARK target is also used by arptables, so register for NFPROTO_ARP too.
      
      While at it, bail out if connbytes fails to enable the corresponding
      conntrack family.
      
      This change passes the selftests in iptables.git.
      
      Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/
      Fixes: 0269ea49 ("netfilter: xtables: add cluster match")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Co-developed-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0bfcb7b7