1. 08 Aug, 2024 10 commits
    • l2tp: fix lockdep splat · 86a41ea9
      James Chapman authored
      When l2tp tunnels use a socket provided by userspace, we can hit
      lockdep splats like the below when data is transmitted through another
      (unrelated) userspace socket which then gets routed over l2tp.
      
      This issue was previously discussed here:
      https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/
      
      The solution is to have lockdep treat socket locks of l2tp tunnel
      sockets separately from those of standard INET sockets. To do so, use
      a different lockdep subclass where lock nesting is possible.
      
        ============================================
        WARNING: possible recursive locking detected
        6.10.0+ #34 Not tainted
        --------------------------------------------
        iperf3/771 is trying to acquire lock:
        ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0
      
        but task is already holding lock:
        ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(slock-AF_INET/1);
          lock(slock-AF_INET/1);
      
         *** DEADLOCK ***
      
         May be due to missing lock nesting notation
      
        10 locks held by iperf3/771:
         #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
         #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
         #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
         #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
         #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
         #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
         #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
         #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
         #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
         #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450
      
        stack backtrace:
        CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack_lvl+0x69/0xa0
         dump_stack+0xc/0x20
         __lock_acquire+0x135d/0x2600
         ? srso_alias_return_thunk+0x5/0xfbef5
         lock_acquire+0xc4/0x2a0
         ? l2tp_xmit_skb+0x243/0x9d0
         ? __skb_checksum+0xa3/0x540
         _raw_spin_lock_nested+0x35/0x50
         ? l2tp_xmit_skb+0x243/0x9d0
         l2tp_xmit_skb+0x243/0x9d0
         l2tp_eth_dev_xmit+0x3c/0xc0
         dev_hard_start_xmit+0x11e/0x420
         sch_direct_xmit+0xc3/0x640
         __dev_queue_xmit+0x61c/0x1450
         ? ip_finish_output2+0xf4c/0x1130
         ip_finish_output2+0x6b6/0x1130
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? __ip_finish_output+0x217/0x380
         ? srso_alias_return_thunk+0x5/0xfbef5
         __ip_finish_output+0x217/0x380
         ip_output+0x99/0x120
         __ip_queue_xmit+0xae4/0xbc0
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? tcp_options_write.constprop.0+0xcb/0x3e0
         ip_queue_xmit+0x34/0x40
         __tcp_transmit_skb+0x1625/0x1890
         __tcp_send_ack+0x1b8/0x340
         tcp_send_ack+0x23/0x30
         __tcp_ack_snd_check+0xa8/0x530
         ? srso_alias_return_thunk+0x5/0xfbef5
         tcp_rcv_established+0x412/0xd70
         tcp_v4_do_rcv+0x299/0x420
         tcp_v4_rcv+0x1991/0x1e10
         ip_protocol_deliver_rcu+0x50/0x220
         ip_local_deliver_finish+0x158/0x260
         ip_local_deliver+0xc8/0xe0
         ip_rcv+0xe5/0x1d0
         ? __pfx_ip_rcv+0x10/0x10
         __netif_receive_skb_one_core+0xce/0xe0
         ? process_backlog+0x28b/0x9f0
         __netif_receive_skb+0x34/0xd0
         ? process_backlog+0x28b/0x9f0
         process_backlog+0x2cb/0x9f0
         __napi_poll.constprop.0+0x61/0x280
         net_rx_action+0x332/0x670
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? find_held_lock+0x2b/0x80
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? srso_alias_return_thunk+0x5/0xfbef5
         handle_softirqs+0xda/0x480
         ? __dev_queue_xmit+0xa2c/0x1450
         do_softirq+0xa1/0xd0
         </IRQ>
         <TASK>
         __local_bh_enable_ip+0xc8/0xe0
         ? __dev_queue_xmit+0xa2c/0x1450
         __dev_queue_xmit+0xa48/0x1450
         ? ip_finish_output2+0xf4c/0x1130
         ip_finish_output2+0x6b6/0x1130
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? __ip_finish_output+0x217/0x380
         ? srso_alias_return_thunk+0x5/0xfbef5
         __ip_finish_output+0x217/0x380
         ip_output+0x99/0x120
         __ip_queue_xmit+0xae4/0xbc0
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? tcp_options_write.constprop.0+0xcb/0x3e0
         ip_queue_xmit+0x34/0x40
         __tcp_transmit_skb+0x1625/0x1890
         tcp_write_xmit+0x766/0x2fb0
         ? __entry_text_end+0x102ba9/0x102bad
         ? srso_alias_return_thunk+0x5/0xfbef5
         ? __might_fault+0x74/0xc0
         ? srso_alias_return_thunk+0x5/0xfbef5
         __tcp_push_pending_frames+0x56/0x190
         tcp_push+0x117/0x310
         tcp_sendmsg_locked+0x14c1/0x1740
         tcp_sendmsg+0x28/0x40
         inet_sendmsg+0x5d/0x90
         sock_write_iter+0x242/0x2b0
         vfs_write+0x68d/0x800
         ? __pfx_sock_write_iter+0x10/0x10
         ksys_write+0xc8/0xf0
         __x64_sys_write+0x3d/0x50
         x64_sys_call+0xfaf/0x1f50
         do_syscall_64+0x6d/0x140
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
        RIP: 0033:0x7f4d143af992
        Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
        RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
        RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
        RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
        R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
         </TASK>
      
      Fixes: 0b2c5972 ("l2tp: close all race conditions in l2tp_tunnel_register()")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
      CC: gnault@redhat.com
      CC: cong.wang@bytedance.com
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: dwmac4: fix PCS duplex mode decode · 85ba108a
      Russell King (Oracle) authored
      dwmac4 was decoding the duplex mode from the GMAC_PHYIF_CONTROL_STATUS
      register incorrectly, using GMAC_PHYIF_CTRLSTATUS_LNKMOD_MASK (value 1)
      rather than GMAC_PHYIF_CTRLSTATUS_LNKMOD (bit 16). Fix this.
      
      Fixes: 70523e63 ("drivers: net: stmmac: reworking the PCS code.")
      Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://patch.msgid.link/E1sbJvd-001rGD-E3@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
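The difference between testing a field mask value and testing the flag's bit position, as described in the dwmac4 commit above, can be shown with a minimal user-space sketch. The `EX_*` names and helper functions are hypothetical stand-ins, not the driver's macros; the only facts assumed from the commit message are that the mask has value 1 and that the link-mode flag is bit 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's macros (values assumed
 * from the commit message, names invented for this sketch). */
#define EX_LNKMOD_MASK 0x1u        /* field mask, value 1 */
#define EX_LNKMOD      (1u << 16)  /* link-mode (duplex) flag, bit 16 */

/* Buggy decode: tests bit 0 of the raw register instead of bit 16. */
static bool ex_full_duplex_buggy(uint32_t status)
{
	return (status & EX_LNKMOD_MASK) != 0;
}

/* Fixed decode: tests the actual link-mode bit. */
static bool ex_full_duplex_fixed(uint32_t status)
{
	return (status & EX_LNKMOD) != 0;
}

/* Self-check: with only bit 16 set, the fixed decode reports full
 * duplex while the buggy decode does not. */
static void ex_duplex_self_check(void)
{
	uint32_t status = 1u << 16;

	assert(ex_full_duplex_fixed(status));
	assert(!ex_full_duplex_buggy(status));
}
```

The two helpers disagree whenever bit 0 and bit 16 of the register differ, which is why the wrong constant can go unnoticed on some links.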
    • Merge tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · b928e7d1
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - hci_sync: avoid dup filtering when passive scanning with adv monitor
       - hci_qca: don't call pwrseq_power_off() twice for QCA6390
       - hci_qca: fix QCA6390 support on non-DT platforms
       - hci_qca: fix a NULL-pointer dereference at shutdown
       - l2cap: always unlock channel in l2cap_conless_channel()
      
      * tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
        Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
        Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
        Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
        Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
      ====================
      
      Link: https://patch.msgid.link/20240807210103.142483-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'idpf-fix-3-bugs-revealed-by-the-chapter-i' · bc59b558
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      idpf: fix 3 bugs revealed by the Chapter I
      
      Alexander Lobakin says:
      
      The libeth conversion revealed 2 serious issues which lead to sporadic
      crashes or WARNs under certain configurations. An additional one was
      found while debugging these two with kmemleak.
      This one is targeted at stable, the rest can be backported manually
      later if needed. They can be reproduced only after the conversion is
      applied anyway.
      ====================
      
      Link: https://patch.msgid.link/20240806220923.3359860-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • idpf: fix UAFs when destroying the queues · 290f1c03
      Alexander Lobakin authored
      The second tagged commit started to throw WARNs (very rarely, but it
      is possible) from
      net/core/page_pool.c:page_pool_disable_direct_recycling().
      It turned out that idpf frees interrupt vectors with embedded NAPIs
      *before* freeing the queues, so the page_pools' NAPI pointers lead to
      freed memory before these pools are destroyed by libeth.
      It's not clear whether there are other accesses to the freed vectors
      when destroying the queues, but in any case, queue/interrupt vectors
      are usually freed only when the queues are destroyed and the NAPIs are
      guaranteed not to be referenced anywhere.

      Invert the allocation and freeing logic so that queue/interrupt vectors
      are allocated first and freed last. Vectors don't require queues to be
      present, so this is safe. Additionally, this change allows removing
      the useless queue->q_vector pointer cleanup, as vectors are still
      valid when freeing the queues (and both are freed within one function,
      so it's not clear why the pointers were nullified at all).
      
      Fixes: 1c325aac ("idpf: configure resources for TX queues")
      Fixes: 90912f9f ("idpf: convert header split mode to libeth + napi_build_skb()")
      Reported-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://patch.msgid.link/20240806220923.3359860-4-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
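The ordering rule from the idpf commit above (allocate vectors first, free them last) can be sketched in plain user-space C. This is only an illustration under stated assumptions: the `ex_*` structures and functions are hypothetical and much simpler than the driver's, and `calloc()`/`free()` stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct ex_vector { int napi_id; };
struct ex_queue  { struct ex_vector *vec; };  /* queue points at its vector */
struct ex_vport  {
	struct ex_vector *vecs;
	struct ex_queue *queues;
	int n;
};

/* Allocate vectors first, then the queues that reference them. */
static int ex_vport_alloc(struct ex_vport *vp, int n)
{
	assert(n > 0);
	vp->n = n;
	vp->vecs = calloc((size_t)n, sizeof(*vp->vecs));
	if (!vp->vecs)
		return -1;
	vp->queues = calloc((size_t)n, sizeof(*vp->queues));
	if (!vp->queues) {
		free(vp->vecs);
		vp->vecs = NULL;
		return -1;
	}
	for (int i = 0; i < n; i++) {
		vp->vecs[i].napi_id = i;
		vp->queues[i].vec = &vp->vecs[i];
	}
	return 0;
}

/* Free in reverse order: queues first, vectors last. */
static void ex_vport_free(struct ex_vport *vp)
{
	/* While queues are torn down, every queue->vec is still valid. */
	free(vp->queues);
	vp->queues = NULL;
	/* Nothing references the vectors any more, so they can go. */
	free(vp->vecs);
	vp->vecs = NULL;
}
```

Because the vectors outlive the queues, no queue ever holds a pointer to freed memory during teardown, and there is no need to clear the back-pointers one by one.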
    • idpf: fix memleak in vport interrupt configuration · 3cc88e84
      Michal Kubiak authored
      The initialization of vport interrupt consists of two functions:
       1) idpf_vport_intr_init() where a generic configuration is done
       2) idpf_vport_intr_req_irq() where the irq for each q_vector is
         requested.
      
      The first function used to create a base name for each interrupt using
      a kasprintf() call. Unfortunately, although that call allocated memory
      for a text buffer, that memory was never released.

      Fix this by no longer creating the interrupt base name in 1).
      Instead, always create the full interrupt name in function 2), because
      there is no need to create a base name separately, considering that
      function 2) is never called outside of the idpf_vport_intr_init()
      context.
      
      Fixes: d4d55871 ("idpf: initialize interrupts and enable vport")
      Cc: stable@vger.kernel.org # 6.7
      Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
      Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://patch.msgid.link/20240806220923.3359860-3-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • idpf: fix memory leaks and crashes while performing a soft reset · f01032a2
      Alexander Lobakin authored
      The second tagged commit introduced a UAF, as it removed restoring
      q_vector->vport pointers after reinitializing the structures.
      This is because all queue allocation functions are run here with the
      new temporary vport structure, and those functions rewrite the
      backpointers to the vport. Then, this new struct is freed and the
      pointers start leading to nowhere.

      But generally speaking, the current logic is very fragile. It claims
      to be more reliable when the system is low on memory, but in fact, it
      consumes twice as much memory, as at the moment of running this
      function, there are two vports allocated with their queues and vectors.
      Moreover, it claims to prevent the driver from running into a "bad
      state", but in fact, any error during the rebuild leaves the old vport
      in a partially allocated state.
      Finally, if the interface is down when the function is called, it always
      allocates a new queue set, but when the user decides to enable the
      interface later on, vport_open() allocates them once again, IOW there's
      a clear memory leak here.

      Just don't allocate a new queue set when performing a reset; that solves
      the crashes and memory leaks. Re-add the old queue number and reopen the
      interface on rollback; that solves limbo states when the device is left
      disabled and/or without HW queues enabled.
      
      Fixes: 02cbfba1 ("idpf: add ethtool callbacks")
      Fixes: e4891e46 ("idpf: split &idpf_queue into 4 strictly-typed queue structures")
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://patch.msgid.link/20240806220923.3359860-2-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() · da03f5d1
      Michael Chan authored
      A recent commit has modified the code in __bnxt_reserve_rings() to
      reset the RSS indirection table to default only when the number
      of RX rings is changing.  While this works for newer firmware that
      requires RX ring reservations, it causes a regression on older
      firmware not requiring RX ring reservations (BNXT_NEW_RM() returns
      false).

      With older firmware, RX ring reservations are not required and so
      hw_resc->resv_rx_rings is not always set to the proper value.  The
      comparison:

          if (old_rx_rings != bp->hw_resc.resv_rx_rings)

      in __bnxt_reserve_rings() may be false even when the RX rings are
      changing.  This will cause __bnxt_reserve_rings() to skip resetting
      the RSS indirection table to default to match the current
      number of RX rings.  This may later cause bnxt_fill_hw_rss_tbl() to
      use an out-of-range index.

      We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this
      scenario.  We just need to move it up in bnxt_need_reserve_rings()
      to be called unconditionally when using older firmware.  Without the
      fix, if the TX rings are changing, we'll skip the
      bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also
      skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained
      in the previous paragraph.  Without resetting the RSS indirection
      table to default, this causes the following regression:
      
      BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
      Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
      Call Trace:
      __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
       bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
       __bnxt_setup_vnic_p5+0x12e/0x270
       __bnxt_open_nic+0x2262/0x2f30
       bnxt_open_nic+0x5d/0xf0
       ethnl_set_channels+0x5d4/0xb30
       ethnl_default_set_doit+0x2f1/0x620
      Reported-by: Breno Leitao <leitao@debian.org>
      Closes: https://lore.kernel.org/netdev/ZrC6jpghA3PWVWSB@gmail.com/
      Fixes: 98ba1d93 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Tested-by: Breno Leitao <leitao@debian.org>
      Link: https://patch.msgid.link/20240806053742.140304-1-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
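The failure mode in the bnxt_en commit above (an indirection table that still points at rings which no longer exist) can be illustrated with a small user-space sketch. All `ex_*` names and the table size are hypothetical; the round-robin default fill is an assumption made for this example, and none of this is the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_RSS_TBL_SIZE 8  /* hypothetical table size for the example */

/* Fill the table round-robin over the current number of RX rings. */
static void ex_set_default_rss_tbl(uint16_t *tbl, size_t size,
				   uint16_t rx_rings)
{
	assert(rx_rings > 0);
	for (size_t i = 0; i < size; i++)
		tbl[i] = (uint16_t)(i % rx_rings);
}

/* True if any entry refers to a ring index that no longer exists. */
static bool ex_rss_tbl_stale(const uint16_t *tbl, size_t size,
			     uint16_t rx_rings)
{
	for (size_t i = 0; i < size; i++)
		if (tbl[i] >= rx_rings)
			return true;
	return false;
}

/* Unconditional check: it inspects the table itself rather than
 * trusting a cached ring count that may not have been updated. */
static void ex_check_rss_tbl(uint16_t *tbl, size_t size, uint16_t rx_rings)
{
	if (ex_rss_tbl_stale(tbl, size, rx_rings))
		ex_set_default_rss_tbl(tbl, size, rx_rings);
}
```

Shrinking from 8 rings to 4 without calling `ex_check_rss_tbl()` would leave entries 4 to 7 in the table, and any later lookup through them would index past the end of a 4-element ring array.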
    • net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register() · e3862093
      Joe Hattori authored
      bcm_sf2_mdio_register() calls of_phy_find_device() and then
      phy_device_remove() in a loop to remove existing PHY devices.
      of_phy_find_device() eventually calls bus_find_device(), which calls
      get_device() on the returned struct device * to increment the refcount.
      The current implementation does not decrement the refcount, which causes
      a memory leak.
      
      This commit adds the missing phy_device_free() call to decrement the
      refcount via put_device() to balance the refcount.
      
      Fixes: 771089c2 ("net: dsa: bcm_sf2: Ensure that MDIO diversion is used")
      Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
      Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://patch.msgid.link/20240806011327.3817861-1-joe@pf.is.s.u-tokyo.ac.jp
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
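The get/put balance that the bcm_sf2 commit above restores can be modelled with a toy reference counter. The `ex_*` types and functions are hypothetical and exist only in this sketch; the one behaviour taken from the commit message is that a successful lookup returns the object with its reference count incremented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object with a plain integer reference count. */
struct ex_device { int refcount; };

/* Lookup that returns the object with an extra reference held,
 * mirroring the lookup behaviour described in the commit message. */
static struct ex_device *ex_find_device(struct ex_device *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Drop one reference. */
static void ex_put_device(struct ex_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Every successful lookup is paired with exactly one put. */
static void ex_remove_all(struct ex_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct ex_device *dev = ex_find_device(&devs[i]);

		if (!dev)
			continue;
		/* ... the device would be unregistered here ... */
		ex_put_device(dev);  /* release the lookup reference */
	}
}
```

If the `ex_put_device()` call were missing, each pass through the loop would leave the count one higher than before, and the object could never be released.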
    • net/smc: add the max value of fallback reason count · d27a835f
      Zhengchao Shao authored
      The number of fallback reasons defined in the smc_clc.h file has reached
      36. For historical reasons, some are no longer referenced, and 33 are
      actually in use. So, set the max value of the fallback reason count to 36.
      
      Fixes: 6ac1e656 ("net/smc: support smc v2.x features validate")
      Fixes: 7f0620b9 ("net/smc: support max connections per lgr negotiation")
      Fixes: 69b888e3 ("net/smc: support max links per lgr negotiation in clc handshake")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
      Link: https://patch.msgid.link/20240805043856.565677-1-shaozhengchao@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 07 Aug, 2024 7 commits
  3. 06 Aug, 2024 1 commit
  4. 05 Aug, 2024 5 commits
  5. 04 Aug, 2024 1 commit
    • net/tcp: Disable TCP-AO static key after RCU grace period · 14ab4792
      Dmitry Safonov authored
      The lifetime of the TCP-AO static_key is the same as that of the last
      tcp_ao_info. On socket destruction, tcp_ao_info ceases to exist after
      an RCU grace period, while the tcp-ao static branch is currently
      destroyed in a deferred manner. The static key definition is
      : DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);

      which means that if the RCU grace period is delayed by more than a
      second and tcp_ao_needed is in the process of being disabled, other
      CPUs may still see a tcp_ao_info which is not dead yet, but soon will be.
      And that breaks the assumption of static_key_fast_inc_not_disabled().
      
      See the comment near the definition:
      > * The caller must make sure that the static key can't get disabled while
      > * in this function. It doesn't patch jump labels, only adds a user to
      > * an already enabled static key.
      
      Originally it was introduced in commit eb8c5072 ("jump_label:
      Prevent key->enabled int overflow"), which is needed for the atomic
      contexts, one of which would be the creation of a full socket from a
      request socket. In that atomic context, it's known by the presence
      of the key (md5/ao) that the static branch is already enabled.
      So, the ref counter for that static branch is just incremented
      instead of holding the proper mutex.
      static_key_fast_inc_not_disabled() is just a helper for such a use
      case. But it must not be used if the static branch could get disabled
      in parallel as it's not protected by jump_label_mutex and as a result,
      races with jump_label_update() implementation details.
      
      Happened on netdev test-bot[1], so not a theoretical issue:
      
      [] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
      [] ------------[ cut here ]------------
      [] kernel BUG at arch/x86/kernel/jump_label.c:73!
      [] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
      [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
      [] Workqueue: events jump_label_update_timeout
      [] RIP: 0010:__jump_label_patch+0x2f6/0x350
      ...
      [] Call Trace:
      []  <TASK>
      []  arch_jump_label_transform_queue+0x6c/0x110
      []  __jump_label_update+0xef/0x350
      []  __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
      []  jump_label_update_timeout+0x2c/0x40
      []  process_one_work+0xe3b/0x1670
      []  worker_thread+0x587/0xce0
      []  kthread+0x28a/0x350
      []  ret_from_fork+0x31/0x70
      []  ret_from_fork_asm+0x1a/0x30
      []  </TASK>
      [] Modules linked in: veth
      [] ---[ end trace 0000000000000000 ]---
      [] RIP: 0010:__jump_label_patch+0x2f6/0x350
      
      [1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr
      
      Cc: stable@kernel.org
      Fixes: 67fa83f7 ("net/tcp: Add static_key for TCP-AO")
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 02 Aug, 2024 12 commits
    • gve: Fix use of netif_carrier_ok() · fba917b1
      Praveen Kaligineedi authored
      The GVE driver wrongly relies on netif_carrier_ok() to check the
      interface administrative state when resources are being
      allocated/deallocated for queue(s). netif_carrier_ok() needs
      to be replaced with netif_running() for all such cases.
      
      Administrative state is the result of "ip link set dev <dev>
      up/down". It reflects whether the administrator wants to use
      the device for traffic and the corresponding resources have
      been allocated.
      
      Fixes: 5f08cd3d ("gve: Alloc before freeing when adjusting queues")
      Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
      Reviewed-by: Shailend Chand <shailend@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240801205619.987396-1-pkaligineedi@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
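The distinction the gve commit above relies on, administrative state versus carrier state, can be modelled with two independent flags. This is a toy user-space model under stated assumptions: the `ex_*` names are hypothetical, and the two booleans only approximate what netif_running() and netif_carrier_ok() report in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a network device's two states. */
struct ex_netdev {
	bool admin_up;    /* set by "ip link set dev <dev> up" */
	bool carrier_on;  /* link currently detected */
};

/* Rough analogue of netif_running(): administrative state. */
static bool ex_running(const struct ex_netdev *d)
{
	return d->admin_up;
}

/* Rough analogue of netif_carrier_ok(): link state. */
static bool ex_carrier_ok(const struct ex_netdev *d)
{
	return d->carrier_on;
}

/* Queue resources exist whenever the device is administratively up,
 * so that is the state to test before allocating or freeing them. */
static bool ex_queues_allocated(const struct ex_netdev *d)
{
	return ex_running(d);
}

/* Self-check: an "up" device that has lost its link still owns its
 * queues, which a carrier-based test would get wrong. */
static void ex_link_down_self_check(void)
{
	struct ex_netdev d = { .admin_up = true, .carrier_on = false };

	assert(ex_queues_allocated(&d));
	assert(!ex_carrier_ok(&d));
}
```

The self-check captures the problematic case: the interface is up, the link is down, and code keyed on the carrier flag would wrongly conclude that no queue resources are allocated.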
    • net: pse-pd: tps23881: Fix the device ID check · 89108cb5
      Kyle Swenson authored
      The DEVID register contains two pieces of information: the device ID in
      the upper nibble, and the silicon revision number in the lower nibble.
      The driver should work fine with any silicon revision, so let's mask
      that out in the device ID check.
      
      Fixes: 20e6d190 ("net: pse-pd: Add TI TPS23881 PSE controller driver")
      Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
      Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://patch.msgid.link/20240731154152.4020668-1-kyle.swenson@est.tech
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
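The nibble split described in the tps23881 commit above (device ID in the upper nibble, silicon revision in the lower nibble) can be sketched as follows. The `EX_*` names are hypothetical, and the expected ID value is a placeholder chosen for illustration, not a value taken from the driver or the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_DEVID_MASK  0xF0u  /* upper nibble: device ID */
#define EX_SILREV_MASK 0x0Fu  /* lower nibble: silicon revision */
#define EX_EXPECTED_ID 0x2u   /* placeholder ID, assumed for the example */

/* Extract the device ID from the raw register value. */
static uint8_t ex_device_id(uint8_t devid_reg)
{
	return (uint8_t)((devid_reg & EX_DEVID_MASK) >> 4);
}

/* Extract the silicon revision from the raw register value. */
static uint8_t ex_silicon_rev(uint8_t devid_reg)
{
	return (uint8_t)(devid_reg & EX_SILREV_MASK);
}

/* Accept any silicon revision as long as the device ID matches. */
static bool ex_device_supported(uint8_t devid_reg)
{
	return ex_device_id(devid_reg) == EX_EXPECTED_ID;
}

/* Self-check: two revisions of the same device are both accepted. */
static void ex_devid_self_check(void)
{
	assert(ex_device_supported(0x22));
	assert(ex_device_supported(0x2F));
}
```

Comparing the whole register byte against a single constant would accept only one silicon revision; masking first makes the check depend on the device ID alone.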
    • sctp: Fix null-ptr-deref in reuseport_add_sock(). · 9ab0faa7
      Kuniyuki Iwashima authored
      syzbot reported a null-ptr-deref while accessing sk2->sk_reuseport_cb in
      reuseport_add_sock(). [0]
      
      The repro first creates a listener with SO_REUSEPORT.  Then, it creates
      another listener on the same port and concurrently closes the first
      listener.
      
      The second listen() calls reuseport_add_sock() with the first listener as
      sk2, where sk2->sk_reuseport_cb is not expected to be cleared concurrently,
      but the close() does clear it by reuseport_detach_sock().
      
      The problem is SCTP does not properly synchronise reuseport_alloc(),
      reuseport_add_sock(), and reuseport_detach_sock().
      
      The caller of reuseport_alloc() and reuseport_{add,detach}_sock() must
      provide synchronisation for sockets that are classified into the same
      reuseport group.
      
      Otherwise, such sockets form multiple identical reuseport groups, and
      all groups except one would be silently dead.
      
        1. Two sockets call listen() concurrently
        2. No socket in the same group found in sctp_ep_hashtable[]
        3. Two sockets call reuseport_alloc() and form two reuseport groups
        4. Only one group hit first in __sctp_rcv_lookup_endpoint() receives
            incoming packets
      
      Also, the reported null-ptr-deref could occur.
      
      TCP/UDP guarantee that this cannot happen by holding the hash bucket lock.
      
      Let's apply the locking strategy to __sctp_hash_endpoint() and
      __sctp_unhash_endpoint().
      
      [0]:
      Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      CPU: 1 UID: 0 PID: 10230 Comm: syz-executor119 Not tainted 6.10.0-syzkaller-12585-g301927d2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
      RIP: 0010:reuseport_add_sock+0x27e/0x5e0 net/core/sock_reuseport.c:350
      Code: 00 0f b7 5d 00 bf 01 00 00 00 89 de e8 1b a4 ff f7 83 fb 01 0f 85 a3 01 00 00 e8 6d a0 ff f7 49 8d 7e 12 48 89 f8 48 c1 e8 03 <42> 0f b6 04 28 84 c0 0f 85 4b 02 00 00 41 0f b7 5e 12 49 8d 7e 14
      RSP: 0018:ffffc9000b947c98 EFLAGS: 00010202
      RAX: 0000000000000002 RBX: ffff8880252ddf98 RCX: ffff888079478000
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000012
      RBP: 0000000000000001 R08: ffffffff8993e18d R09: 1ffffffff1fef385
      R10: dffffc0000000000 R11: fffffbfff1fef386 R12: ffff8880252ddac0
      R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f24e45b96c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffcced5f7b8 CR3: 00000000241be000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __sctp_hash_endpoint net/sctp/input.c:762 [inline]
       sctp_hash_endpoint+0x52a/0x600 net/sctp/input.c:790
       sctp_listen_start net/sctp/socket.c:8570 [inline]
       sctp_inet_listen+0x767/0xa20 net/sctp/socket.c:8625
       __sys_listen_socket net/socket.c:1883 [inline]
       __sys_listen+0x1b7/0x230 net/socket.c:1894
       __do_sys_listen net/socket.c:1902 [inline]
       __se_sys_listen net/socket.c:1900 [inline]
       __x64_sys_listen+0x5a/0x70 net/socket.c:1900
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f24e46039b9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f24e45b9228 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
      RAX: ffffffffffffffda RBX: 00007f24e468e428 RCX: 00007f24e46039b9
      RDX: 00007f24e46039b9 RSI: 0000000000000003 RDI: 0000000000000004
      RBP: 00007f24e468e420 R08: 00007f24e45b96c0 R09: 00007f24e45b96c0
      R10: 00007f24e45b96c0 R11: 0000000000000246 R12: 00007f24e468e42c
      R13: 00007f24e465a5dc R14: 0020000000000001 R15: 00007ffcced5f7d8
       </TASK>
      Modules linked in:
      
      Fixes: 6ba84574 ("sctp: process sk_reuseport in sctp_get_port_local")
      Reported-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=e6979a5d2f10ecb700e4
      Tested-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: Xin Long <lucien.xin@gmail.com>
      Link: https://patch.msgid.link/20240731234624.94055-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: update status of sky2 and skge drivers · eeef5f18
      Stephen Hemminger authored
      The old SysKonnect NICs are not used or actively maintained anymore.
      My sky2 NICs are all in a box in the back corner of the attic.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Link: https://patch.msgid.link/20240801162930.212299-1-stephen@networkplumber.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      eeef5f18
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-fix-endpoints-with-signal-and-subflow-flags' · 16dc75e5
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fix endpoints with 'signal' and 'subflow' flags
      
      When looking at improving the user experience around the MPTCP endpoints
      setup, I noticed that setting an endpoint with both the 'signal' and the
      'subflow' flags -- as it has been done in the past by users according to
      bug reports we got -- resulted in only announcing the endpoint, but
      not using it to create subflows: the 'subflow' flag was then ignored.
      
      My initial thought was to modify IPRoute2 to warn the user when the two
      flags were set, but it doesn't sound normal to ignore one of them. I
      then looked at modifying the kernel not to allow having the two flags
      set, but when discussing that with Mat, we thought it was maybe
      not ideal to do that: there might be use-cases, and we might break some
      configs. Then I saw it was working before v5.17. So instead, I fixed the
      support on the kernel side (patch 5) using Paolo's suggestion. This also
      includes a fix on the options side (patch 1: for v5.11+), an explicit
      deny of some options combinations (patch 2: for v5.18+), and some
      refactoring (patches 3 and 4) to ease the inclusion of patch 5.
      
      While at it, I added a new selftest (patch 7) to validate this case --
      including a modification of the chk_add_nr helper to invert the sides
      where the counters are checked (patch 6) -- and allowed ADD_ADDR echo
      just after the MP_JOIN 3WHS.
      
      The selftests modifications have the same Fixes tag as the previous
      commit, but no 'Cc: Stable': if the backport can work, that's good --
      but it still needs to be verified by running the selftests -- if not, no
      need to worry, many CIs will use the selftests from the last stable
      version to validate previous stable releases.
      ====================
      
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-0-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      16dc75e5
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: test both signal & subflow · 4d2868b5
      Matthieu Baerts (NGI0) authored
      It should be quite uncommon to set both the subflow and the signal
      flags: the initiator of the connection is typically the one creating new
      subflows, not the other peer, so there is no need to both announce
      additional local addresses and use them to create subflows.
      
      But some people might be confused about the flags, and set both "just to
      be sure at least the right one is set". To verify the previous fix, and
      avoid future regressions, this specific case is now validated: the
      client announces a new address, and initiates a new subflow from the
      same address.
      
      While working on this, another bug was noticed, where the client
      reset the new subflow because an ADD_ADDR echo was received as the 3rd
      ACK: this new test also explicitly checks that no RST has been sent by
      the client or the server.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk")
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-7-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4d2868b5
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: ability to invert ADD_ADDR check · bec1f3b1
      Matthieu Baerts (NGI0) authored
      In the following commit, the client will initiate the ADD_ADDR, instead
      of the server. We need a way to verify the ADD_ADDR has been correctly
      sent.
      
      Note: the default expected counters for when the port number is given
      are never changed by the caller, so there is no need to accept them as
      parameters.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk")
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-6-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bec1f3b1
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set · 85df533a
      Matthieu Baerts (NGI0) authored
      Up to the 'Fixes' commit, having an endpoint with both the 'signal' and
      'subflow' flags, resulted in the creation of a subflow and an address
      announcement using the address linked to this endpoint. After this
      commit, only the address announcement was done, ignoring the 'subflow'
      flag.
      
      That's because the same bitmap is used for the two flags. It is OK to
      keep this single bitmap: the already selected local endpoint simply has
      to be re-used, but not via select_local_address(), so as not to look at
      the just modified bitmap.
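
The shared-bitmap problem described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the kernel code: the `Endpoint` class, the flag values, `pick()` and `run_pm()` are hypothetical names invented for the illustration, a set stands in for the availability bitmap, and the 'subflows' and 'add_addr' limits are ignored.

```python
from dataclasses import dataclass

# Hypothetical flag values, used only inside this model.
SIGNAL = 0x1
SUBFLOW = 0x2


@dataclass
class Endpoint:
    addr_id: int
    flags: int


def pick(endpoints, avail, flag):
    """Return the first endpoint carrying `flag` whose ID is still available."""
    for ep in endpoints:
        if ep.flags & flag and ep.addr_id in avail:
            return ep
    return None


def run_pm(endpoints, reuse_selected):
    """Model one path-manager pass.

    Returns (announced IDs, subflow source IDs). `reuse_selected` switches
    between the behaviour described as broken (False) and fixed (True).
    """
    avail = {ep.addr_id for ep in endpoints}  # stands in for the shared bitmap
    announced, subflows = [], []

    signal_ep = pick(endpoints, avail, SIGNAL)
    if signal_ep is not None:
        avail.discard(signal_ep.addr_id)  # the ID is consumed by the announcement
        announced.append(signal_ep.addr_id)
        if reuse_selected and signal_ep.flags & SUBFLOW:
            # Reuse the endpoint directly instead of going through the lookup,
            # which would consult the set modified just above.
            subflows.append(signal_ep.addr_id)

    # Regular subflow selection: only IDs still marked available are eligible.
    while (ep := pick(endpoints, avail, SUBFLOW)) is not None:
        avail.discard(ep.addr_id)
        subflows.append(ep.addr_id)

    return announced, subflows
```

With a single endpoint carrying both flags, `run_pm(..., reuse_selected=False)` announces the address but creates no subflow from it, which matches the regression described in this commit message; with `reuse_selected=True` the same endpoint is used for both.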
      
      Note that it is unusual to set the two flags together: creating a new
      subflow using a new local address will implicitly advertise it to the
      other peer. So in theory, no need to advertise it explicitly as well.
      Maybe there are use-cases -- the subflow might not reach the other peer
      that way, we can ask the other peer to try initiating the new subflow
      without delay -- or very likely the user is confused, and put both flags
      "just to be sure at least the right one is set". Still, if it is
      allowed, the kernel should do what has been asked: using this endpoint
      to announce the address and to create a new subflow from it.
      
      An alternative is to forbid the use of the two flags together, but
      that's probably too late, there are maybe use-cases, and it was working
      before. This patch will avoid people complaining subflows are not
      created using the endpoint they added with the 'subflow' and 'signal'
      flag.
      
      Note that with the current patch, the subflow might not be created in
      some corner cases, e.g. if the 'subflows' limit was reached when sending
      the ADD_ADDR, but changed later on. It is probably not worth splitting
      id_avail_bitmap per target ('signal', 'subflow'), which will add another
      large field to the msk "just" to track (again) endpoints. Anyway,
      currently when the limits are changed, the kernel doesn't check if new
      subflows can be created or removed, because we would need to keep track
      of the received ADD_ADDR, and more. It sounds OK to assume that the
      limits should be properly configured before establishing new
      connections.
      
      Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk")
      Cc: stable@vger.kernel.org
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-5-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      85df533a
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: don't try to create sf if alloc failed · cd7c957f
      Matthieu Baerts (NGI0) authored
      It sounds better to avoid wasting cycles and / or putting extreme memory
      pressure on the system by trying to create new subflows if it was not
      possible to add a new item in the announce list.
      
      While at it, a warning is now printed if the entry was already in the
      list as it should not happen with the in-kernel path-manager. With this
      PM, mptcp_pm_alloc_anno_list() should only fail in case of memory
      pressure.
      
      Fixes: b6c08380 ("mptcp: remove addr and subflow in PM netlink")
      Cc: stable@vger.kernel.org
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-4-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cd7c957f
    • Matthieu Baerts (NGI0)'s avatar
      c95eb32c
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: deny endp with signal + subflow + port · 8af1f118
      Matthieu Baerts (NGI0) authored
      As mentioned in the 'Fixes' commit, the port flag is only supported by
      the 'signal' flag, and not by the 'subflow' one. Then if both the
      'signal' and 'subflow' flags are set, the problem is the same: the
      feature cannot work with the 'subflow' flag.
      
      Technically, if both the 'signal' and 'subflow' flags are set, it will
      be possible to create the listening socket, but not to establish a
      subflow using this source port. So it is better to explicitly deny it,
      to avoid confusion, because the expected behaviour is not possible.
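
The rule described above can be written as a small predicate. This is a minimal sketch, assuming hypothetical flag values and a hypothetical helper name `endpoint_port_allowed()`; it models the described policy only and is not the kernel's validation code.

```python
# Hypothetical flag values, used only inside this model.
SIGNAL = 0x1
SUBFLOW = 0x2


def endpoint_port_allowed(flags: int, port: int) -> bool:
    """Return True if an endpoint with these flags may carry this port.

    A port of 0 means "no port given", which is always acceptable here.
    Otherwise the endpoint must have 'signal' set and 'subflow' unset.
    """
    if port == 0:
        return True
    return (flags & (SIGNAL | SUBFLOW)) == SIGNAL
```

Under this model, `signal` with a port is accepted, while `subflow` with a port and `signal` + `subflow` with a port are both rejected.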
      
      Fixes: 09f12c3a ("mptcp: allow to use port and non-signal in set_flags")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-2-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8af1f118
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: fully established after ADD_ADDR echo on MPJ · d67c5649
      Matthieu Baerts (NGI0) authored
      Before this patch, receiving an ADD_ADDR echo on the just connected
      MP_JOIN subflow -- initiator side, after the MP_JOIN 3WHS -- was
      resulting in an MP_RESET. That's because only ACKs with a DSS or
      ADD_ADDRs without the echo bit were allowed.
      
      Not allowing the ADD_ADDR echo after an MP_CAPABLE 3WHS makes sense, as
      we are not supposed to send an ADD_ADDR before because that requires
      being in fully established mode first. For the MP_JOIN 3WHS, that's
      different: the ADD_ADDR can be sent on a previous subflow, and the
      ADD_ADDR echo can be received on the recently created one. The other
      peer will already be fully established, so it is allowed to send that.
      
      We can then relax the conditions here to accept the ADD_ADDR echo for
      MPJ subflows.
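
The relaxed condition can be summarised as a boolean check. The sketch below is a simplified model of the rule as described in this message, not the kernel implementation; the function name and its parameters (including the `relaxed` switch used to compare old and new behaviour) are hypothetical.

```python
def allows_fully_established(has_dss_ack: bool, has_add_addr: bool,
                             echo: bool, is_mp_join: bool,
                             relaxed: bool = True) -> bool:
    """Decide whether a received ACK lets the subflow become fully established.

    relaxed=False models the behaviour described as "before this patch":
    only a DSS ACK or an ADD_ADDR without the echo bit is accepted.
    relaxed=True additionally accepts an ADD_ADDR echo on MP_JOIN subflows.
    """
    if has_dss_ack:
        return True
    if not has_add_addr:
        return False
    if not echo:
        return True
    # ADD_ADDR with the echo bit set: only acceptable on an MP_JOIN subflow,
    # and only with the relaxed rule.
    return relaxed and is_mp_join
```

In this model, an ADD_ADDR echo on an MP_JOIN subflow is rejected with `relaxed=False` (the case that led to the reset) and accepted with `relaxed=True`, while an echo on an MP_CAPABLE subflow is rejected either way.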
      
      Fixes: 67b12f79 ("mptcp: full fully established support after ADD_ADDR")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-signal-v1-1-c8a9b036493b@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d67c5649
  7. 01 Aug, 2024 4 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 183d46ff
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless, bluetooth, BPF and netfilter.
      
        Current release - regressions:
      
         - core: drop bad gso csum_start and offset in virtio_net_hdr
      
         - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove
      
         - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic()
      
         - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and
           aqr115c
      
        Current release - new code bugs:
      
         - smc: prevent UAF in inet_create()
      
         - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend
      
         - eth: bnxt: reject unsupported hash functions
      
        Previous releases - regressions:
      
         - sched: act_ct: take care of padding in struct zones_ht_key
      
         - netfilter: fix null-ptr-deref in iptable_nat_table_init().
      
         - tcp: adjust clamping window for applications specifying SO_RCVBUF
      
        Previous releases - always broken:
      
         - ethtool: rss: small fixes to spec and GET
      
         - mptcp:
            - fix signal endpoint re-add
            - pm: fix backup support in signal endpoints
      
         - wifi: ath12k: fix soft lockup on suspend
      
         - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings()
      
         - eth: ice: fix AF_XDP ZC timeout and concurrency issues
      
         - eth: mlx5:
            - fix missing lock on sync reset reload
            - fix error handling in irq_pool_request_irq"
      
      * tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
        mptcp: fix duplicate data handling
        mptcp: fix bad RCVPRUNED mib accounting
        ipv6: fix ndisc_is_useropt() handling for PIO
        igc: Fix double reset adapter triggered from a single taprio cmd
        net: MAINTAINERS: Demote Qualcomm IPA to "maintained"
        net: wan: fsl_qmc_hdlc: Discard received CRC
        net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex
        net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys
        net/mlx5e: Fix CT entry update leaks of modify header context
        net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability
        net/mlx5: Fix missing lock on sync reset reload
        net/mlx5: Lag, don't use the hardcoded value of the first port
        net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule
        net/mlx5: Fix error handling in irq_pool_request_irq
        net/mlx5: Always drain health in shutdown callback
        net: Add skbuff.h to MAINTAINERS
        r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY
        netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
        netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
        net: drop bad gso csum_start and offset in virtio_net_hdr
        ...
      183d46ff
    • Paolo Abeni's avatar
      Merge branch 'mptcp-fix-duplicate-data-handling' · 25010bfd
      Paolo Abeni authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fix duplicate data handling
      
      In some cases, the subflow-level's copied_seq counter was incorrectly
      increased, leading to an unexpected subflow reset.
      
      Patch 1/2 fixes the RCVPRUNED MIB counter that was attached to the wrong
      event since its introduction in v5.14, backported to v5.11.
      
      Patch 2/2 fixes the copied_seq counter issue, present since v5.10.
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      ====================
      
      Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-dup-data-v1-0-bde833fa628a@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      25010bfd
    • Paolo Abeni's avatar
      mptcp: fix duplicate data handling · 68cc9247
      Paolo Abeni authored
      When a subflow receives and discards duplicate data, the mptcp
      stack assumes that the consumed offset inside the current skb is
      zero.
      
      With multiple subflows receiving data simultaneously, such an assertion
      does not hold true. As a result, the subflow-level copied_seq will
      be incorrectly increased and later on the same subflow will observe
      a bad mapping, leading to a subflow reset.
      
      Address the issue by taking into account the skb consumed offset in
      mptcp_subflow_discard_data().
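
The effect of ignoring the consumed offset can be shown with a little arithmetic. The following is a sketch under stated assumptions, not the kernel function: `discard_increment()` and its parameters are hypothetical names, sequence numbers are plain integers (no 32-bit wrap-around), and `use_offset=False` models the assumption that the consumed offset is zero.

```python
def discard_increment(skb_seq: int, skb_len: int, copied_seq: int,
                      limit: int, fin: bool = False,
                      use_offset: bool = True) -> int:
    """Return by how much copied_seq advances when discarding duplicate data.

    skb_seq    -- sequence number of the first byte in the skb
    skb_len    -- payload length of the skb
    copied_seq -- subflow-level sequence number already consumed
    limit      -- maximum number of bytes to discard
    """
    # Bytes of this skb that were already consumed by an earlier read.
    offset = copied_seq - skb_seq if use_offset else 0
    if offset > skb_len:
        raise ValueError("consumed offset is beyond the end of the skb")
    avail_len = skb_len - offset
    # Discard the whole remainder (plus FIN, if any) or only `limit` bytes.
    return avail_len + int(fin) if limit >= avail_len else limit
```

For example, with an skb covering sequence numbers 1000 to 1099 of which 40 bytes were already consumed (`copied_seq` = 1040) and a limit of 200, the offset-aware computation advances by 60 bytes, ending exactly at 1100. Assuming a zero offset advances by 100 bytes, to 1140, past the end of the skb, which is the kind of overshoot that later shows up as a bad mapping.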
      
      Fixes: 04e4cd4f ("mptcp: cleanup mptcp_subflow_discard_data()")
      Cc: stable@vger.kernel.org
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      68cc9247
    • Paolo Abeni's avatar
      mptcp: fix bad RCVPRUNED mib accounting · 0a567c2a
      Paolo Abeni authored
      Since its introduction, the mentioned MIB accounted for the wrong
      event: wake-up being skipped as not-needed on some edge condition
      instead of incoming skb being dropped after landing in the (subflow)
      receive queue.
      
      Move the increment to the correct location.
      
      Fixes: ce599c51 ("mptcp: properly account bulk freed memory")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0a567c2a