1. 01 Aug, 2024 19 commits
  2. 31 Jul, 2024 5 commits
    • netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init(). · c22921df
      Kuniyuki Iwashima authored
      ip6table_nat_table_init() accesses net->gen->ptr[ip6table_nat_net_ops.id],
      but the function is exposed to user space before the entry is allocated
      via register_pernet_subsys().
      
      Let's call register_pernet_subsys() before xt_register_template().
      
      Fixes: fdacd57c ("netfilter: x_tables: never register tables by default")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init(). · 5830aa86
      Kuniyuki Iwashima authored
      We had a report that iptables-restore sometimes triggered null-ptr-deref
      at boot time. [0]
      
      The problem is that iptable_nat_table_init() is exposed to user space
      before the kernel fully initialises netns.
      
      In the small race window, a user could call iptable_nat_table_init()
      that accesses net_generic(net, iptable_nat_net_id), which is available
      only after registering iptable_nat_net_ops.
      
      Let's call register_pernet_subsys() before xt_register_template().
      
      [0]:
      bpfilter: Loaded bpfilter_umh pid 11702
      Started bpfilter
      BUG: kernel NULL pointer dereference, address: 0000000000000013
       PF: supervisor write access in kernel mode
       PF: error_code(0x0002) - not-present page
      PGD 0 P4D 0
      PREEMPT SMP NOPTI
      CPU: 2 PID: 11879 Comm: iptables-restor Not tainted 6.1.92-99.174.amzn2023.x86_64 #1
      Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
      RIP: 0010:iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat
      Code: 10 4c 89 f6 48 89 ef e8 0b 19 bb ff 41 89 c4 85 c0 75 38 41 83 c7 01 49 83 c6 28 41 83 ff 04 75 dc 48 8b 44 24 08 48 8b 0c 24 <48> 89 08 4c 89 ef e8 a2 3b a2 cf 48 83 c4 10 44 89 e0 5b 5d 41 5c
      RSP: 0018:ffffbef902843cd0 EFLAGS: 00010246
      RAX: 0000000000000013 RBX: ffff9f4b052caa20 RCX: ffff9f4b20988d80
      RDX: 0000000000000000 RSI: 0000000000000064 RDI: ffffffffc04201c0
      RBP: ffff9f4b29394000 R08: ffff9f4b07f77258 R09: ffff9f4b07f77240
      R10: 0000000000000000 R11: ffff9f4b09635388 R12: 0000000000000000
      R13: ffff9f4b1a3c6c00 R14: ffff9f4b20988e20 R15: 0000000000000004
      FS:  00007f6284340000(0000) GS:ffff9f51fe280000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000013 CR3: 00000001d10a6005 CR4: 00000000007706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
       ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
       ? xt_find_table_lock (net/netfilter/x_tables.c:1259)
       ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420)
       ? page_fault_oops (arch/x86/mm/fault.c:727)
       ? exc_page_fault (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1470 arch/x86/mm/fault.c:1518)
       ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
       ? iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat
       xt_find_table_lock (net/netfilter/x_tables.c:1259)
       xt_request_find_table_lock (net/netfilter/x_tables.c:1287)
       get_info (net/ipv4/netfilter/ip_tables.c:965)
       ? security_capable (security/security.c:809 (discriminator 13))
       ? ns_capable (kernel/capability.c:376 kernel/capability.c:397)
       ? do_ipt_get_ctl (net/ipv4/netfilter/ip_tables.c:1656)
       ? bpfilter_send_req (net/bpfilter/bpfilter_kern.c:52) bpfilter
       nf_getsockopt (net/netfilter/nf_sockopt.c:116)
       ip_getsockopt (net/ipv4/ip_sockglue.c:1827)
       __sys_getsockopt (net/socket.c:2327)
       __x64_sys_getsockopt (net/socket.c:2342 net/socket.c:2339 net/socket.c:2339)
       do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
      RIP: 0033:0x7f62844685ee
      Code: 48 8b 0d 45 28 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 09
      RSP: 002b:00007ffd1f83d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
      RAX: ffffffffffffffda RBX: 00007ffd1f83d680 RCX: 00007f62844685ee
      RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
      RBP: 0000000000000004 R08: 00007ffd1f83d670 R09: 0000558798ffa2a0
      R10: 00007ffd1f83d680 R11: 0000000000000246 R12: 00007ffd1f83e3b2
      R13: 00007f628455baa0 R14: 00007ffd1f83d7b0 R15: 00007f628457a008
       </TASK>
      Modules linked in: iptable_nat(+) bpfilter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache veth xt_state xt_connmark xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 vfat fat ghash_clmulni_intel aesni_intel ena crypto_simd ptp cryptd i8042 pps_core serio button sunrpc sch_fq_codel configfs loop dm_mod fuse dax dmi_sysfs crc32_pclmul crc32c_intel efivarfs
      CR2: 0000000000000013
      
      Fixes: fdacd57c ("netfilter: x_tables: never register tables by default")
      Reported-by: Takahiro Kawahara <takawaha@amazon.co.jp>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
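The ordering problem described in the two netfilter fixes above can be modelled in user space. This is only a sketch under stated assumptions: `model_register_pernet`, `model_register_template`, and `model_table_init` are hypothetical stand-ins for `register_pernet_subsys()`, `xt_register_template()`, and the table init callback, not the kernel functions themselves.

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct net_model {
    void *pernet_data;          /* stands in for net_generic(net, id) */
};

static int template_visible;    /* set once the table template is registered */
static int pernet_storage;      /* dummy per-netns object */

/* Models register_pernet_subsys(): allocates the per-netns entry. */
static void model_register_pernet(struct net_model *net)
{
    net->pernet_data = &pernet_storage;
}

/* Models xt_register_template(): user space can reach table_init afterwards. */
static void model_register_template(void)
{
    template_visible = 1;
}

/*
 * Models a user-space request arriving at an arbitrary moment.
 * Returns -1 if the table is not reachable yet, 0 if table_init would
 * succeed, and 1 if it would dereference a missing per-netns entry.
 */
static int model_table_init(const struct net_model *net)
{
    if (!template_visible)
        return -1;
    return net->pernet_data == NULL ? 1 : 0;
}

/* A user call landing between the two init steps, old order. */
static int race_window_old_order(void)
{
    struct net_model net = { NULL };
    int result;

    template_visible = 0;
    model_register_template();          /* exposed first ... */
    result = model_table_init(&net);    /* ... user call races in here */
    model_register_pernet(&net);
    return result;
}

/* The same race window with the order used by the fix. */
static int race_window_new_order(void)
{
    struct net_model net = { NULL };
    int result;

    template_visible = 0;
    model_register_pernet(&net);        /* storage first ... */
    result = model_table_init(&net);    /* ... table not reachable yet */
    model_register_template();
    return result;
}
```

In this model the old order lets the racing call see a missing per-netns entry, while the new order keeps the table unreachable until the entry exists.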
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 0bf50cea
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      ice: fix AF_XDP ZC timeout and concurrency issues
      
      Maciej Fijalkowski says:
      
      Changes included in this patchset address an issue that a customer has
      been facing when AF_XDP ZC Tx sockets were used in combination with flow
      control and regular Tx traffic.
      
      After executing:
      ethtool --set-priv-flags $dev link-down-on-close on
      ethtool -A $dev rx on tx on
      
      launching multiple ZC Tx sockets on $dev while pinging a remote interface
      (so that regular Tx traffic is present), and then taking $dev down and up,
      a Tx timeout occurred, and most of the time the ice driver was unable to
      recover from that state.
      
      Combined, these patches solve the issue described above on the customer
      side. The main focus is to forbid producing Tx descriptors when either
      the carrier is not yet initialized or the process of bringing the
      interface down has already started.
      
      v1: https://lore.kernel.org/netdev/20240708221416.625850-1-anthony.l.nguyen@intel.com/
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: xsk: fix txq interrupt mapping
        ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
        ice: improve updating ice_{t,r}x_ring::xsk_pool
        ice: toggle netif_carrier when setting up XSK pool
        ice: modify error handling when setting XSK pool in ndo_bpf
        ice: replace synchronize_rcu with synchronize_net
        ice: don't busy wait for Rx queue disable in ice_qp_dis()
        ice: respect netif readiness in AF_XDP ZC related ndo's
      ====================
      
      Link: https://patch.msgid.link/20240729200716.681496-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: drop bad gso csum_start and offset in virtio_net_hdr · 89add400
      Willem de Bruijn authored
      Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
      for GSO packets.
      
      The function already checks that a checksum requested with
      VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
      this might not hold for segs after segmentation.
      
      Syzkaller demonstrated that this warning in skb_checksum_help can be reached
      
      	offset = skb_checksum_start_offset(skb);
      	ret = -EINVAL;
      	if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
      
      By injecting a TSO packet:
      
      WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
       ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
       ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
       __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
       iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
       ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
       __gre_xmit net/ipv4/ip_gre.c:469 [inline]
       ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
       __netdev_start_xmit include/linux/netdevice.h:4850 [inline]
       netdev_start_xmit include/linux/netdevice.h:4864 [inline]
       xmit_one net/core/dev.c:3595 [inline]
       dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
       __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
       packet_snd net/packet/af_packet.c:3073 [inline]
      
      The geometry of the bad input packet at tcp_gso_segment:
      
      [   52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
      [   52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
      [   52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
      [   52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0)
      
      Mitigate with stricter input validation.
      
      csum_offset: for GSO packets, deduce the correct value from gso_type.
      This is already done for USO. Extend it to TSO. Let UFO be:
      udp[46]_ufo_fragment ignores these fields and always computes the
      checksum in software.
      
      csum_start: finding the real offset requires parsing to the transport
      header. Do not add a parser, use existing segmentation parsing. Thanks
      to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
      Again test both TSO and USO. Do not test UFO for the above reason, and
      do not test UDP tunnel offload.
      
      GSO packets are almost always CHECKSUM_PARTIAL. USO packets may be
      CHECKSUM_NONE since commit 10154dbd ("udp: Allow GSO transmit
      from devices with no checksum offload"), but even then these fields
      are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So there
      is no need to test for ip_summed == CHECKSUM_PARTIAL first.
      
      This revises an existing fix mentioned in the Fixes tag, which broke
      small packets with GSO offload, as detected by kselftests.
      
      Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
      Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
      Fixes: e269d79c ("net: missing check virtio")
      Cc: stable@vger.kernel.org
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
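The two checks described in the commit above (csum_offset deduced from the packet type, csum_start matching the transport header) can be sketched as a plain validation function. This is an illustrative user-space model, not the kernel code: `gso_kind_model`, `csum_fields_model`, and both functions are hypothetical names, and it assumes csum_start and the transport header position are both measured from the buffer head, as the dump in the commit message suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of the checksum field inside each transport header. */
#define TCP_CSUM_FIELD_OFFSET 16u   /* checksum field in the TCP header */
#define UDP_CSUM_FIELD_OFFSET 6u    /* checksum field in the UDP header */

/* Hypothetical packet classes; the kernel derives this from gso_type. */
enum gso_kind_model { GSO_MODEL_TCP, GSO_MODEL_UDP_L4 };

struct csum_fields_model {
    uint16_t csum_start;     /* where checksumming starts, from buffer head */
    uint16_t csum_offset;    /* where the result is stored, from csum_start */
    uint16_t transport_off;  /* transport header position, from buffer head */
};

/* The only csum_offset that makes sense for a given GSO packet type. */
static uint16_t expected_csum_offset(enum gso_kind_model kind)
{
    return kind == GSO_MODEL_TCP ? TCP_CSUM_FIELD_OFFSET
                                 : UDP_CSUM_FIELD_OFFSET;
}

/* Reject packets whose checksum fields cannot match the transport header. */
static bool gso_csum_fields_plausible(enum gso_kind_model kind,
                                      const struct csum_fields_model *f)
{
    if (f->csum_offset != expected_csum_offset(kind))
        return false;
    return f->csum_start == f->transport_off;
}
```

With the values from the dump above (start=199, offset=1536, trans=244) and a TCP packet type, this model rejects the packet on both counts.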
    • net: phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c · a7f3abcf
      Bartosz Golaszewski authored
      Commit 708405f3 ("net: phy: aquantia: wait for the GLOBAL_CFG to
      start returning real values") introduced a workaround for an issue
      observed on aqr115c. However, there were never any reports of it
      happening on other models, and the workaround has been reported to cause
      an issue on aqr113c (and it may cause the same on any other model not
      supporting 10M mode).
      
      Let's limit the impact of the workaround to aqr113, aqr113c and aqr115c
      and poll the 100M GLOBAL_CFG register instead, as these models are known
      to support it correctly.
      
      Reported-by: Jon Hunter <jonathanh@nvidia.com>
      Closes: https://lore.kernel.org/lkml/7c0140be-4325-4005-9068-7e0fc5ff344d@nvidia.com/
      Fixes: 708405f3 ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values")
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Reviewed-by: Antoine Tenart <atenart@kernel.org>
      Link: https://patch.msgid.link/20240729150315.65798-1-brgl@bgdev.pl
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 30 Jul, 2024 13 commits
    • net: phy: micrel: Fix the KSZ9131 MDI-X status issue · 84383b5e
      Raju Lakkaraju authored
      The MDI-X status does not accurately reflect the current state after the
      link partner has manually altered its MDI-X configuration while operating
      in forced mode.
      
      Access the Auto MDI-X completion and pair selection information from the
      KSZ9131's Auto/MDI/MDI-X status register.
      
      Fixes: b64e6a87 ("net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131")
      Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20240725071125.13960-1-Raju.Lakkaraju@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test · 7764b962
      Jiri Olsa authored
      Fix the ASSERT_OK condition check in the uprobe_syscall test;
      otherwise the test returns early when pipe() succeeds.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/bpf/20240726180847.684584-1-jolsa@kernel.org
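The kind of inverted check described in the commit above can be illustrated with a small model. This sketch assumes that an ASSERT_OK-style helper evaluates to true when its argument is 0, as the commit message implies; `check_ok`, `run_test_inverted`, and `run_test_fixed` are hypothetical names and not part of the selftest.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an ASSERT_OK-style helper: true when res == 0. */
static bool check_ok(int res)
{
    return res == 0;
}

/* Inverted check: bails out on success. Returns 1 if the test body ran. */
static int run_test_inverted(int setup_result)
{
    if (check_ok(setup_result))
        return 0;       /* wrongly returns when setup succeeded */
    return 1;           /* test body would run here */
}

/* Corrected check: skip the test body only when setup failed. */
static int run_test_fixed(int setup_result)
{
    if (!check_ok(setup_result))
        return 0;
    return 1;           /* test body would run here */
}
```

Passing 0 as `setup_result` models a successful pipe() call: the inverted variant never reaches the test body, while the fixed variant does.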
    • net: mvpp2: Don't re-use loop iterator · 0aa3ca95
      Dan Carpenter authored
      This function has a nested loop.  The problem is that both the inner
      and outer loops use the same variable as an iterator.  I found this
      via static analysis so I'm not sure of the impact.  It could be that it
      loops forever or, more likely, the loop exits early.
      
      Fixes: 3a616b92 ("net: mvpp2: Add TX flow control support for jumbo frames")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/eaa8f403-7779-4d81-973d-a9ecddc0bf6f@stanley.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
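The effect of sharing one iterator between nested loops, as described in the commit above, can be reproduced in a few lines. This is a generic illustration with hypothetical function names and is not taken from the mvpp2 driver.

```c
/*
 * Buggy shape: the inner loop reuses the outer loop's iterator.
 * Only call this with inner_n + 1 >= outer_n; with a smaller inner_n the
 * outer condition stays true after every inner pass and it never terminates.
 */
static int visits_shared_iterator(int outer_n, int inner_n)
{
    int visits = 0;
    int i;

    for (i = 0; i < outer_n; i++) {
        for (i = 0; i < inner_n; i++)   /* clobbers the outer counter */
            visits++;
    }
    return visits;
}

/* Fixed shape: each loop has its own iterator. */
static int visits_separate_iterators(int outer_n, int inner_n)
{
    int visits = 0;
    int i, j;

    for (i = 0; i < outer_n; i++) {
        for (j = 0; j < inner_n; j++)
            visits++;
    }
    return visits;
}
```

For a 4 x 4 iteration space the shared-iterator variant exits after a single inner pass, while the separate-iterator variant visits every combination.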
    • net/iucv: fix use after free in iucv_sock_close() · f558120c
      Alexandra Winter authored
      iucv_sever_path() is called from process context and from bh context.
      iucv->path is used as an indicator of whether somebody else is taking care
      of severing the path (or it is already removed / never existed).
      This needs to be done with atomic compare and swap, otherwise there is a
      small window where iucv_sock_close() will try to work with a path that has
      already been severed and freed by iucv_callback_connrej() called by
      iucv_tasklet_fn().
      
      Example:
      [452744.123844] Call Trace:
      [452744.123845] ([<0000001e87f03880>] 0x1e87f03880)
      [452744.123966]  [<00000000d593001e>] iucv_path_sever+0x96/0x138
      [452744.124330]  [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv]
      [452744.124336]  [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv]
      [452744.124341]  [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv]
      [452744.124345]  [<00000000d574794e>] __sock_release+0x5e/0xe8
      [452744.124815]  [<00000000d5747a0c>] sock_close+0x34/0x48
      [452744.124820]  [<00000000d5421642>] __fput+0xba/0x268
      [452744.124826]  [<00000000d51b382c>] task_work_run+0xbc/0xf0
      [452744.124832]  [<00000000d5145710>] do_notify_resume+0x88/0x90
      [452744.124841]  [<00000000d5978096>] system_call+0xe2/0x2c8
      [452744.125319] Last Breaking-Event-Address:
      [452744.125321]  [<00000000d5930018>] iucv_path_sever+0x90/0x138
      [452744.125324]
      [452744.125325] Kernel panic - not syncing: Fatal exception in interrupt
      
      Note that bh_lock_sock() is not serializing the tasklet context against
      process context, because the check for sock_owned_by_user() and
      corresponding handling is missing.
      
      Ideas for a future clean-up patch:
      A) Correct usage of bh_lock_sock() in tasklet context, as described in
      Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/
      Re-enqueue, if needed. This may require adding return values to the
      tasklet functions and thus changes to all users of iucv.
      
      B) Change iucv tasklet into worker and use only lock_sock() in af_iucv.
      
      Fixes: 7d316b94 ("af_iucv: remove IUCV-pathes completely")
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
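The "claim the path with an atomic compare and swap" idea from the commit above can be sketched with C11 atomics. This is a user-space model under stated assumptions: `path_model`, `sock_model`, and `sever_path_once` are hypothetical names, and the kernel uses its own atomic primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the path object and the socket that owns it. */
struct path_model {
    int sever_calls;                    /* how often the path was severed */
};

struct sock_model {
    _Atomic(struct path_model *) path;  /* NULL once somebody claimed it */
};

/*
 * Returns 1 if this caller claimed and severed the path, 0 if the path was
 * already gone or another context claimed it first.
 */
static int sever_path_once(struct sock_model *sk)
{
    struct path_model *p = atomic_load(&sk->path);
    struct path_model *none = NULL;

    if (p == NULL)
        return 0;
    /* Only one caller can swap p -> NULL; a loser sees a changed value. */
    if (!atomic_compare_exchange_strong(&sk->path, &p, none))
        return 0;
    p->sever_calls++;                   /* this caller now owns p exclusively */
    return 1;
}
```

A plain "check the pointer, then use it" sequence would leave a window between the check and the use; the compare-and-swap closes that window because checking and claiming happen in one step.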
    • net/smc: prevent UAF in inet_create() · 2fe5273f
      D. Wythe authored
      The following syzbot repro crashes the kernel:
      
      socketpair(0x2, 0x1, 0x100, &(0x7f0000000140)) (fail_nth: 13)
      
      Fix this by not calling sk_common_release() from smc_create_clcsk().
      
      Stack trace:
      socket: no more sockets
      ------------[ cut here ]------------
      refcount_t: underflow; use-after-free.
       WARNING: CPU: 1 PID: 5092 at lib/refcount.c:28
      refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
      Modules linked in:
      CPU: 1 PID: 5092 Comm: syz-executor424 Not tainted
      6.10.0-syzkaller-04483-g0be9ae54 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 06/27/2024
       RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28
      Code: 80 f3 1f 8c e8 e7 69 a8 fc 90 0f 0b 90 90 eb 99 e8 cb 4f e6 fc c6
      05 8a 8d e8 0a 01 90 48 c7 c7 e0 f3 1f 8c e8 c7 69 a8 fc 90 <0f> 0b 90
      90 e9 76 ff ff ff e8 a8 4f e6 fc c6 05 64 8d e8 0a 01 90
      RSP: 0018:ffffc900034cfcf0 EFLAGS: 00010246
      RAX: 3b9fcde1c862f700 RBX: ffff888022918b80 RCX: ffff88807b39bc00
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000003 R08: ffffffff815878a2 R09: fffffbfff1c39d94
      R10: dffffc0000000000 R11: fffffbfff1c39d94 R12: 00000000ffffffe9
      R13: 1ffff11004523165 R14: ffff888022918b28 R15: ffff888022918b00
      FS:  00005555870e7380(0000) GS:ffff8880b9500000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000140 CR3: 000000007582e000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       inet_create+0xbaf/0xe70
        __sock_create+0x490/0x920 net/socket.c:1571
        sock_create net/socket.c:1622 [inline]
        __sys_socketpair+0x2ca/0x720 net/socket.c:1769
        __do_sys_socketpair net/socket.c:1822 [inline]
        __se_sys_socketpair net/socket.c:1819 [inline]
        __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7fbcb9259669
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 1a 00 00 90 48 89 f8 48 89
      f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fffe931c6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
      RAX: ffffffffffffffda RBX: 00007fffe931c6f0 RCX: 00007fbcb9259669
      RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000002
      RBP: 0000000000000002 R08: 00007fffe931c476 R09: 00000000000000a0
      R10: 0000000020000140 R11: 0000000000000246 R12: 00007fffe931c6ec
      R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
       </TASK>
      
      Link: https://lore.kernel.org/r/20240723175809.537291-1-edumazet@google.com/
      Fixes: d25a92cc ("net/smc: Introduce IPPROTO_SMC")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Link: https://patch.msgid.link/1722224415-30999-1-git-send-email-alibuda@linux.alibaba.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-fix-inconsistent-backup-usage' · 0cd55ef9
      Paolo Abeni authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fix inconsistent backup usage
      
      In all the MPTCP backup related tests, the backup flag was set on one
      side, and the expected behaviour is to have both sides respecting this
      decision. That's also the "natural" way, and what the users seem to
      expect.
      
      On the scheduler side, only the 'backup' field was checked, which is
      supposed to be set only if the other peer flagged a subflow as backup.
      But in various places, this flag was also set when the local host
      flagged the subflow as backup, certainly to have the expected behaviour
      mentioned above.
      
      Patch 1 modifies the packet scheduler to check if the backup flag has
      been set on both directions, not to change its behaviour after having
      applied the following patches. That's what the default packet scheduler
      should have done since the beginning in v5.7.
      
      Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by
      accident since its introduction in v5.7. Instead, the received and sent
      backup flags are properly distinguished in requests.
      
      Patch 3 stops setting the received backup flag as well when sending an
      MP_PRIO, something that was done since the MP_PRIO support in v5.12.
      
      Patch 4 adds related and missing MIB counters to be able to easily check
      if MP_JOIN are sent with a backup flag. Certainly because these counters
      were not there, the behaviour that is fixed by patches here was not
      properly verified.
      
      Patch 5 validates the previous patch by extending the MPTCP Join
      selftest.
      
      Patch 6 fixes the backup support in signal endpoints: if a signal
      endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as
      expected. It was only set for ongoing connections, but not future ones
      as expected, since the introduction of the backup flag in endpoints in
      v5.10.
      
      Patch 7 validates the previous patch by extending the MPTCP Join
      selftest as well.
      
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      ---
      Matthieu Baerts (NGI0) (7):
            mptcp: sched: check both directions for backup
            mptcp: distinguish rcv vs sent backup flag in requests
            mptcp: pm: only set request_bkup flag when sending MP_PRIO
            mptcp: mib: count MPJ with backup flag
            selftests: mptcp: join: validate backup in MPJ
            mptcp: pm: fix backup support in signal endpoints
            selftests: mptcp: join: check backup support in signal endp
      
       include/trace/events/mptcp.h                    |  2 +-
       net/mptcp/mib.c                                 |  2 +
       net/mptcp/mib.h                                 |  2 +
       net/mptcp/options.c                             |  2 +-
       net/mptcp/pm.c                                  | 12 +++++
       net/mptcp/pm_netlink.c                          | 19 ++++++-
       net/mptcp/pm_userspace.c                        | 18 +++++++
       net/mptcp/protocol.c                            | 10 ++--
       net/mptcp/protocol.h                            |  4 ++
       net/mptcp/subflow.c                             | 10 ++++
       tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++-----
       11 files changed, 132 insertions(+), 21 deletions(-)
      ====================
      
      Link: https://patch.msgid.link/20240727-upstream-net-20240727-mptcp-backup-signal-v1-0-f50b31604cf1@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: mptcp: join: check backup support in signal endp · f833470c
      Matthieu Baerts (NGI0) authored
      Before the previous commit, 'signal' endpoints with the 'backup' flag
      were ignored when sending the MP_JOIN.
      
      The MPTCP Join selftest has then been modified to validate this case:
      the "single address, backup" test now validates the MP_JOIN with a
      backup flag, as that is what its name suggests it should do. The
      previous version has been kept, but renamed to "single address, switch
      to backup" to avoid confusion.
      
      The "single address with port, backup" test is also now validating the
      MPJ with a backup flag, which makes more sense than checking the switch
      to backup with an MP_PRIO.
      
      The "mpc backup both sides" test is now validating that the backup flag
      is also set in MP_JOIN from and to the addresses used in the initial
      subflow, using the special ID 0.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 4596a2c1 ("mptcp: allow creating non-backup subflows")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: pm: fix backup support in signal endpoints · 6834097f
      Matthieu Baerts (NGI0) authored
      Backup support for signal endpoints existed, but only when the
      endpoint's flag was changed during a connection. If an endpoint with
      both the signal and backup flags was already present, the MP_JOIN reply
      did not contain the backup flag as expected.
      
      This inconsistent behaviour is confusing. On the other hand, the
      infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was
      already there; the flag was just never set before. Now when requesting
      the local ID from the path-manager, the backup status is also requested.
      
      Note that when the userspace PM is used, the backup flag can be set if
      the local address was already used before with a backup flag, e.g. if
      the address was announced with the 'backup' flag, or a subflow was
      created with the 'backup' flag.
      
      Fixes: 4596a2c1 ("mptcp: allow creating non-backup subflows")
      Cc: stable@vger.kernel.org
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: mptcp: join: validate backup in MPJ · 935ff5bb
      Matthieu Baerts (NGI0) authored
      A peer can notify the other one that a subflow has to be treated as
      "backup" in two different ways: either by sending a dedicated MP_PRIO
      notification, or by setting the backup flag in the MP_JOIN handshake.
      
      The selftests were previously monitoring the former, but not the latter.
      This is what is now done here by looking at these new MIB counters when
      validating the 'backup' cases:
      
        MPTcpExtMPJoinSynBackupRx
        MPTcpExtMPJoinSynAckBackupRx
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it will help to validate a new fix for an issue introduced by this
      commit ID.
      
      Fixes: 4596a2c1 ("mptcp: allow creating non-backup subflows")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: mib: count MPJ with backup flag · 4dde0d72
      Matthieu Baerts (NGI0) authored
      Without such counters, it is difficult to easily debug issues with MPJ
      not having the backup flags on production servers.
      
      This is not strictly a fix, but it makes it easier to validate the
      following patches without having to take packet traces, to query ongoing
      connections with Netlink with admin permissions, or to guess by looking
      at the behaviour of the packet scheduler. Also, the modification is
      self-contained, isolated, and well controlled, and the increments are
      done right after other ones that have been there from the beginning. It
      therefore looks safe, and helpful, to backport this.
      
      Fixes: 4596a2c1 ("mptcp: allow creating non-backup subflows")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: pm: only set request_bkup flag when sending MP_PRIO · 4258b948
      Matthieu Baerts (NGI0) authored
      The 'backup' flag from mptcp_subflow_context structure is supposed to be
      set only when the other peer flagged a subflow as backup, not the
      opposite.
      
      Fixes: 06706542 ("mptcp: add the outgoing MP_PRIO support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: distinguish rcv vs sent backup flag in requests · efd340bf
      Matthieu Baerts (NGI0) authored
      When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow
      as 'backup' by setting the flag with the same name. Before this patch,
      the backup was set if the other peer set it in its MP_JOIN + SYN
      request.
      
      This is not correct: the backup flag should be set in the MPJ+SYN+ACK only
      if the host asks for it, rather than mirroring what was done by the other
      peer. A dedicated bit is therefore required for each direction, similar
      to what is done in the subflow context.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
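The difference between mirroring the peer's flag and using a dedicated bit per direction, as described in the commit above, can be sketched as follows. The struct and function names are hypothetical, and the field names are an assumption borrowed from the 'backup' / 'request_bkup' pair that the series describes for the subflow context.

```c
#include <stdbool.h>

/* Hypothetical model of a join request; field names are assumptions. */
struct join_request_model {
    bool backup;        /* received: the peer set the flag in its MP_JOIN + SYN */
    bool request_bkup;  /* sent: the local host wants this subflow as backup */
};

/* Behaviour before the patch, as described: mirror what the peer sent. */
static bool synack_backup_mirrored(const struct join_request_model *req)
{
    return req->backup;
}

/* Behaviour after the patch, as described: only the local decision counts. */
static bool synack_backup_local(const struct join_request_model *req)
{
    return req->request_bkup;
}
```

With one bit per direction, a peer that asks for backup no longer forces the flag into the reply, and a local backup request is sent even when the peer did not set the flag.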
    • mptcp: sched: check both directions for backup · b6a66e52
      Matthieu Baerts (NGI0) authored
      The 'mptcp_subflow_context' structure has two items related to the
      backup flags:
      
       - 'backup': the subflow has been marked as backup by the other peer
      
       - 'request_bkup': the backup flag has been set by the host
      
      Before this patch, the scheduler was only looking at the 'backup' flag.
      That can make sense in some cases, but it looks like that's not what we
      wanted for the general use, because either the path-manager was setting
      both of them when sending an MP_PRIO, or the receiver was duplicating
      the 'backup' flag in the subflow request.
      
      Note that the use of these two flags in the path-manager is going to be
      fixed in the next commits, but this change here is needed so as not to
      modify the behaviour.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
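The scheduler change described in the commit above amounts to treating a subflow as backup when either direction flagged it. A minimal sketch, assuming the two fields named in the commit message; `subflow_model` and `sched_treats_as_backup` are hypothetical names rather than the kernel's.

```c
#include <stdbool.h>

/* Hypothetical model of the two backup-related items in the subflow context. */
struct subflow_model {
    bool backup;        /* marked as backup by the other peer */
    bool request_bkup;  /* backup flag set by the local host */
};

/* Before the patch, only 'backup' was consulted; now both directions count. */
static bool sched_treats_as_backup(const struct subflow_model *sf)
{
    return sf->backup || sf->request_bkup;
}
```

Checking both fields keeps the scheduler's behaviour unchanged once the path-manager stops setting 'backup' for locally requested backups.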
  4. 29 Jul, 2024 3 commits