1. 15 Jun, 2023 12 commits
    • net: tipc: resize nlattr array to correct size · 44194cb1
      Lin Ma authored
      According to nla_parse_nested_deprecated(), tb[] is supposed to be the
      destination array with maxtype+1 elements. In the current
      tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used,
      which is unnecessary. This patch resizes them to the proper size.
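
      The sizing rule can be illustrated with a minimal userspace sketch. It
      is not the kernel's netlink parser: demo_parse(), struct demo_attr and
      the DEMO_ATTR_* names are hypothetical, and only the "maxtype+1
      elements" convention is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute space: valid type values are 0..DEMO_ATTR_MAX. */
enum { DEMO_ATTR_UNSPEC, DEMO_ATTR_NAME, DEMO_ATTR_PROP, __DEMO_ATTR_MAX };
#define DEMO_ATTR_MAX (__DEMO_ATTR_MAX - 1)

struct demo_attr {
    int type;
    const void *payload;
};

/*
 * Store a pointer to each attribute in tb[type]. Because the table is
 * indexed by type and types run from 0 to maxtype inclusive, tb must have
 * exactly maxtype + 1 elements; anything larger is simply never touched.
 * Out-of-range types are ignored in this sketch.
 */
int demo_parse(const struct demo_attr **tb, int maxtype,
               const struct demo_attr *attrs, size_t n)
{
    int stored = 0;

    assert(maxtype >= 0);
    for (int i = 0; i <= maxtype; i++)
        tb[i] = NULL;

    for (size_t i = 0; i < n; i++) {
        int type = attrs[i].type;

        if (type < 0 || type > maxtype)
            continue;
        tb[type] = &attrs[i];
        stored++;
    }
    return stored;
}
```

      A caller would declare the table as tb[DEMO_ATTR_MAX + 1] and pass
      DEMO_ATTR_MAX as maxtype, so the array size and the bound given to the
      parser always come from the same constant.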
      
      Fixes: 1e55417d ("tipc: add media set to new netlink api")
      Fixes: 46f15c67 ("tipc: add media get/dump to new netlink api")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sfc: fix XDP queues mode with legacy IRQ · e84a1e1e
      Íñigo Huguet authored
      In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated
      in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not
      changed to a proper default value. This was leading to the driver
      thinking that it has dedicated XDP queues, when it didn't.
      
      Fix it by setting xdp_txq_queues_mode to the correct value if the driver
      falls back to MSI or legacy IRQ mode. The correct value is
      EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues.
      
      The issue is easily visible if the kernel is started with pci=nomsi:
      a call trace is then shown. It is not shown only with sfc's modparam
      interrupt_mode=2. Call trace example:
       WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc]
       [...skip...]
       Call Trace:
        <TASK>
        efx_set_channels+0x5c/0xc0 [sfc]
        efx_probe_nic+0x9b/0x15a [sfc]
        efx_probe_all+0x10/0x1a2 [sfc]
        efx_pci_probe_main+0x12/0x156 [sfc]
        efx_pci_probe_post_io+0x18/0x103 [sfc]
        efx_pci_probe.cold+0x154/0x257 [sfc]
        local_pci_probe+0x42/0x80
      
      Fixes: 6215b608 ("sfc: last resort fallback for lack of xdp tx queues")
      Reported-by: Yanghang Liu <yanghliu@redhat.com>
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macsec: fix double free of percpu stats · 0c0cf3db
      Fedor Pchelkin authored
      Inside macsec_add_dev() we free percpu macsec->secy.tx_sc.stats and
      macsec->stats on some of the memory allocation failure paths. However, the
      net_device is already registered by that point: in macsec_newlink(), just
      before calling macsec_add_dev(). This means that during the unregister
      process its priv_destructor - macsec_free_netdev() - will be called and
      will free the stats again.
      
      Remove freeing percpu stats inside macsec_add_dev() because
      macsec_free_netdev() will correctly free the already allocated ones. The
      pointers to unallocated stats stay NULL, and free_percpu() treats that
      correctly.
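
      The ownership rule behind this fix can be sketched in userspace C. This
      is only an analogy under stated assumptions: struct demo_dev,
      demo_add_dev() and demo_destructor() are hypothetical names, and
      calloc()/free() stand in for the percpu allocator. What carries over is
      that free(NULL), like free_percpu(NULL) as described above, is a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device's private data. */
struct demo_dev {
    long *tx_stats;
    long *dev_stats;
};

/*
 * The single owner of cleanup, analogous in role to a priv_destructor.
 * Pointers that were never allocated are still NULL, and free(NULL) does
 * nothing. The pointers are cleared here only so the sketch can be checked.
 */
void demo_destructor(struct demo_dev *d)
{
    assert(d != NULL);
    free(d->tx_stats);
    d->tx_stats = NULL;
    free(d->dev_stats);
    d->dev_stats = NULL;
}

/*
 * Setup path. On failure it reports the error and frees nothing itself,
 * because the destructor will run later and release whatever exists.
 * fail_second simulates an allocation failure for the second buffer.
 */
int demo_add_dev(struct demo_dev *d, int fail_second)
{
    assert(d != NULL);
    d->tx_stats = calloc(1, sizeof(*d->tx_stats));
    if (!d->tx_stats)
        return -1;

    d->dev_stats = fail_second ? NULL : calloc(1, sizeof(*d->dev_stats));
    if (!d->dev_stats)
        return -1;

    return 0;
}
```

      If demo_add_dev() also freed tx_stats on its error path, the later
      destructor call would free the same pointer a second time, which is the
      shape of the bug described above.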
      
      Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
      
      Fixes: 0a28bfd4 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lapbether: only support ethernet devices · 9eed321c
      Eric Dumazet authored
      It probably makes no sense to support arbitrary network devices
      for lapbether.
      
      syzbot reported:
      
      skbuff: skb_under_panic: text:ffff80008934c100 len:44 put:40 head:ffff0000d18dd200 data:ffff0000d18dd1ea tail:0x16 end:0x140 dev:bond1
      kernel BUG at net/core/skbuff.c:200 !
      Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 5643 Comm: dhcpcd Not tainted 6.4.0-rc5-syzkaller-g4641cff8e810 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
      pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : skb_panic net/core/skbuff.c:196 [inline]
      pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
      lr : skb_panic net/core/skbuff.c:196 [inline]
      lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
      sp : ffff8000973b7260
      x29: ffff8000973b7270 x28: ffff8000973b7360 x27: dfff800000000000
      x26: ffff0000d85d8150 x25: 0000000000000016 x24: ffff0000d18dd1ea
      x23: ffff0000d18dd200 x22: 000000000000002c x21: 0000000000000140
      x20: 0000000000000028 x19: ffff80008934c100 x18: ffff8000973b68a0
      x17: 0000000000000000 x16: ffff80008a43bfbc x15: 0000000000000202
      x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001
      x11: 0000000000000201 x10: 0000000000000000 x9 : f22f7eb937cced00
      x8 : f22f7eb937cced00 x7 : 0000000000000001 x6 : 0000000000000001
      x5 : ffff8000973b6b78 x4 : ffff80008df9ee80 x3 : ffff8000805974f4
      x2 : 0000000000000001 x1 : 0000000100000201 x0 : 0000000000000086
      Call trace:
      skb_panic net/core/skbuff.c:196 [inline]
      skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
      skb_push+0xf0/0x108 net/core/skbuff.c:2409
      ip6gre_header+0xbc/0x738 net/ipv6/ip6_gre.c:1383
      dev_hard_header include/linux/netdevice.h:3137 [inline]
      lapbeth_data_transmit+0x1c4/0x298 drivers/net/wan/lapbether.c:257
      lapb_data_transmit+0x8c/0xb0 net/lapb/lapb_iface.c:447
      lapb_transmit_buffer+0x178/0x204 net/lapb/lapb_out.c:149
      lapb_send_control+0x220/0x320 net/lapb/lapb_subr.c:251
      lapb_establish_data_link+0x94/0xec
      lapb_device_event+0x348/0x4e0
      notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
      raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
      __dev_notify_flags+0x2bc/0x544
      dev_change_flags+0xd0/0x15c net/core/dev.c:8643
      devinet_ioctl+0x858/0x17e4 net/ipv4/devinet.c:1150
      inet_ioctl+0x2ac/0x4d8 net/ipv4/af_inet.c:979
      sock_do_ioctl+0x134/0x2dc net/socket.c:1201
      sock_ioctl+0x4ec/0x858 net/socket.c:1318
      vfs_ioctl fs/ioctl.c:51 [inline]
      __do_sys_ioctl fs/ioctl.c:870 [inline]
      __se_sys_ioctl fs/ioctl.c:856 [inline]
      __arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:856
      __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
      invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
      el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
      do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
      el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
      el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
      el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      Code: aa1803e6 aa1903e7 a90023f5 947730f5 (d4210000)
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add reviewers for SMC Sockets · 7d03646d
      Jan Karcher authored
      Add three people from Alibaba as reviewers for SMC.
      They are currently working on improving SMC on architectures other than
      s390 and also help with reviewing patches.
      
      Thank you D. Wythe, Tony Lu and Wen Gu for your contributions and
      collaboration and welcome on board as reviewers!
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
      Acked-by: Tony Lu <tonylu@linux.alibaba.com>
      Acked-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit() · 78d0f949
      Julian Ruess authored
      This patch prevents the system from crashing when unloading the ISM module.
      
      How to reproduce: Attach an ISM device and execute 'rmmod ism'.
      
      Error-Log:
      - Trying to free already-free IRQ 0
      - WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
      
      After calling ism_dev_exit() for each ISM device in the exit routine,
      pci_unregister_driver() will execute ism_remove() for each ISM device.
      Because ism_remove() also calls ism_dev_exit(),
      free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
      device. This results in a crash with the error
      'Trying to free already-free IRQ'.
      
      In the exit routine, it is enough to call pci_unregister_driver()
      because it ensures that ism_dev_exit() is called once per
      ISM device.
      
      Cc: <stable@vger.kernel.org> # 6.3+
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames · 6ac7a27a
      Vladimir Oltean authored
      The DEV_MAC_MAXLEN_CFG register contains a 16-bit value - up to 65535.
      Plus 2 * VLAN_HLEN (4), that is up to 65543.
      
      The picos_per_byte variable is the largest when "speed" is lowest -
      SPEED_10 = 10. In that case it is (1000000L * 8) / 10 = 800000.
      
      Their product - 52434400000 - exceeds 32 bits, which is a problem,
      because apparently, a multiplication between two 32-bit factors is
      evaluated as 32-bit before being assigned to a 64-bit variable.
      In fact it's a problem for any MTU value larger than 5368.
      
      Cast one of the factors of the multiplication to u64 to force the
      multiplication to take place on 64 bits.
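
      The arithmetic can be reproduced with a small standalone sketch. The
      function names are hypothetical and this is not the felix driver code;
      unsigned 32-bit types are assumed so that the wraparound is well
      defined in C. Only the numbers come from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Picoseconds needed to transmit one byte at speed_mbps (10 for SPEED_10). */
uint32_t picos_per_byte(uint32_t speed_mbps)
{
    assert(speed_mbps != 0);
    return (1000000u * 8u) / speed_mbps;
}

/*
 * Problematic form: both factors are 32 bits wide, so the product wraps
 * modulo 2^32 and is widened to 64 bits only after the damage is done.
 */
uint64_t frame_time_ps_32bit(uint32_t maxlen, uint32_t speed_mbps)
{
    uint32_t product = maxlen * picos_per_byte(speed_mbps);

    return product;
}

/* Fixed form: casting one factor forces a 64-bit multiplication. */
uint64_t frame_time_ps_64bit(uint32_t maxlen, uint32_t speed_mbps)
{
    return (uint64_t)maxlen * picos_per_byte(speed_mbps);
}
```

      With maxlen = 65543 and speed 10, the 64-bit form yields 52434400000,
      while the 32-bit form yields that value modulo 2^32. The two forms
      agree up to a length of 5368 (5368 * 800000 = 4294400000, just below
      2^32 = 4294967296) and diverge from 5369 onwards.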
      
      Issue found by Coverity.
      
      Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230613170907.2413559-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: cls_api: Fix lockup on flushing explicitly created chain · c9a82bec
      Vlad Buslov authored
      Mingshuai Ren reports:
      
      When a new chain is added by using tc, one soft lockup alarm will be
      generated after deleting the prio 0 filter of the chain. To reproduce
      the problem, perform the following steps:
      (1) tc qdisc add dev eth0 root handle 1: htb default 1
      (2) tc chain add dev eth0
      (3) tc filter del dev eth0 chain 0 parent 1: prio 0
      (4) tc filter add dev eth0 chain 0 parent 1:
      
      Fix the issue by accounting for additional reference to chains that are
      explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
      RTM_NEWTFILTER message.
      
      Fixes: 726d0612 ("net: sched: prevent insertion of new classifiers during chain flush")
      Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
      Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Fix ice module unload · 24b454bc
      Jakub Buchocki authored
      Clearing the interrupt scheme before the PFR reset,
      during the removal routine, could cause hardware
      errors and possibly lead to a system reboot, as the PF
      reset can cause an interrupt to be generated.
      
      Place the call for PFR reset inside ice_deinit_dev(),
      wait until reset and all pending transactions are done,
      then call ice_clear_interrupt_scheme().
      
      This introduces a PFR reset to multiple error paths.
      
      Additionally, remove the call for the reset from
      ice_load() - it will be a part of ice_unload() now.
      
      Error example:
      [   75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
      [   77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
      [   77.571418] {1}[Hardware Error]: event severity: recoverable
      [   77.571459] {1}[Hardware Error]:  Error 0, type: recoverable
      [   77.571500] {1}[Hardware Error]:   section_type: PCIe error
      [   77.571540] {1}[Hardware Error]:   port_type: 4, root port
      [   77.571580] {1}[Hardware Error]:   version: 3.0
      [   77.571615] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   77.571661] {1}[Hardware Error]:   device_id: 0000:c9:02.0
      [   77.571703] {1}[Hardware Error]:   slot: 25
      [   77.571736] {1}[Hardware Error]:   secondary_bus: 0xca
      [   77.571773] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
      [   77.571821] {1}[Hardware Error]:   class_code: 060400
      [   77.571858] {1}[Hardware Error]:   bridge: secondary_status: 0x2800, control: 0x0013
      [   77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
      [   77.572870] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
      [   77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
      [   77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
      [   77.691738] {2}[Hardware Error]: event severity: recoverable
      [   77.691971] {2}[Hardware Error]:  Error 0, type: recoverable
      [   77.692192] {2}[Hardware Error]:   section_type: PCIe error
      [   77.692403] {2}[Hardware Error]:   port_type: 4, root port
      [   77.692616] {2}[Hardware Error]:   version: 3.0
      [   77.692825] {2}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   77.693032] {2}[Hardware Error]:   device_id: 0000:c9:02.0
      [   77.693238] {2}[Hardware Error]:   slot: 25
      [   77.693440] {2}[Hardware Error]:   secondary_bus: 0xca
      [   77.693641] {2}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
      [   77.693853] {2}[Hardware Error]:   class_code: 060400
      [   77.694054] {2}[Hardware Error]:   bridge: secondary_status: 0x0800, control: 0x0013
      [   77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
      [   77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
      [   77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
      [   77.719390] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
      [   77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
      
      Fixes: 5b246e53 ("ice: split probe into smaller functions")
      Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230612171421.21570-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · d6858e19
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)
      
      This series contains updates to igc and igb drivers.
      
      Husaini clears Tx rings when interface is brought down for igc.
      
      Vinicius disables PTM and PCI busmaster when removing igc driver.
      
      Alex adds error check and path for NVM read error on igb.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        igb: fix nvm.ops.read() error handling
        igc: Fix possible system crash when loading module
        igc: Clean the TX buffer and TX descriptor ring
      ====================
      
      Link: https://lore.kernel.org/r/20230612205208.115292-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: remove fput() that causes use-after-free · 361b6889
      Lin Ma authored
      A reference underflow is found in TLS handshake subsystem that causes a
      direct use-after-free. Part of the crash log is like below:
      
      [    2.022114] ------------[ cut here ]------------
      [    2.022193] refcount_t: underflow; use-after-free.
      [    2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
      [    2.022432] Modules linked in:
      [    2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110
      [    2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286
      [    2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff
      [    2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
      [    2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff
      [    2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8
      [    2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8
      [    2.023930] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      [    2.024062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0
      [    2.024275] Call Trace:
      [    2.024322]  <TASK>
      [    2.024367]  ? __warn+0x7f/0x130
      [    2.024430]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024513]  ? report_bug+0x199/0x1b0
      [    2.024585]  ? handle_bug+0x3c/0x70
      [    2.024676]  ? exc_invalid_op+0x18/0x70
      [    2.024750]  ? asm_exc_invalid_op+0x1a/0x20
      [    2.024830]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024916]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024998]  __tcp_close+0x2f4/0x3d0
      [    2.025065]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
      [    2.025168]  tcp_close+0x1f/0x70
      [    2.025231]  inet_release+0x33/0x60
      [    2.025297]  sock_release+0x1f/0x80
      [    2.025361]  handshake_req_cancel_test2+0x100/0x2d0
      [    2.025457]  kunit_try_run_case+0x4c/0xa0
      [    2.025532]  kunit_generic_run_threadfn_adapter+0x15/0x20
      [    2.025644]  kthread+0xe1/0x110
      [    2.025708]  ? __pfx_kthread+0x10/0x10
      [    2.025780]  ret_from_fork+0x2c/0x50
      
      One can enable CONFIG_NET_HANDSHAKE_KUNIT_TEST config to reproduce above
      crash.
      
      The root cause of this bug is that the commit 1ce77c99
      ("net/handshake: Unpin sock->file if a handshake is cancelled") adds one
      additional fput() function. That patch claims that the fput() is used to
      enable sock->file to be freed even when user space never calls DONE.
      
      However, it seems that the intended DONE routine will never give an
      additional fput() of the sock->file. The existing two of them are just
      used to balance the reference added in sockfd_lookup().
      
      This patch reverts the mentioned commit to avoid the use-after-free. The
      patched kernel could successfully pass the KUNIT test and boot to shell.
      
      [    0.733613]     # Subtest: Handshake API tests
      [    0.734029]     1..11
      [    0.734255]         KTAP version 1
      [    0.734542]         # Subtest: req_alloc API fuzzing
      [    0.736104]         ok 1 handshake_req_alloc NULL proto
      [    0.736114]         ok 2 handshake_req_alloc CLASS_NONE
      [    0.736559]         ok 3 handshake_req_alloc CLASS_MAX
      [    0.737020]         ok 4 handshake_req_alloc no callbacks
      [    0.737488]         ok 5 handshake_req_alloc no done callback
      [    0.737988]         ok 6 handshake_req_alloc excessive privsize
      [    0.738529]         ok 7 handshake_req_alloc all good
      [    0.739036]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
      [    0.739444]     ok 1 req_alloc API fuzzing
      [    0.740065]     ok 2 req_submit NULL req arg
      [    0.740436]     ok 3 req_submit NULL sock arg
      [    0.740834]     ok 4 req_submit NULL sock->file
      [    0.741236]     ok 5 req_lookup works
      [    0.741621]     ok 6 req_submit max pending
      [    0.741974]     ok 7 req_submit multiple
      [    0.742382]     ok 8 req_cancel before accept
      [    0.742764]     ok 9 req_cancel after accept
      [    0.743151]     ok 10 req_cancel after done
      [    0.743510]     ok 11 req_destroy works
      [    0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
      [    0.744205] # Totals: pass:17 fail:0 skip:0 total:17
      Acked-by: Chuck Lever <chuck.lever@oracle.com>
      Fixes: 1ce77c99 ("net/handshake: Unpin sock->file if a handshake is cancelled")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn
      Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 37cec6ed
      Jakub Kicinski authored
      Johannes Berg says:
      
      ====================
      A couple of straggler fixes, mostly in the stack:
       - fix fragmentation for multi-link related elements
       - fix callback copy/paste error
       - fix multi-link locking
       - remove double-locking of wiphy mutex
       - transmit only on active links, not all
       - activate links in the correct order
       - don't remove links that weren't added
       - disable soft-IRQs for LQ lock in iwlwifi
      
      * tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
        wifi: mac80211: fragment per STA profile correctly
        wifi: mac80211: Use active_links instead of valid_links in Tx
        wifi: cfg80211: remove links only on AP
        wifi: mac80211: take lock before setting vif links
        wifi: cfg80211: fix link del callback to call correct handler
        wifi: mac80211: fix link activation settings order
        wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
      ====================
      
      Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 14 Jun, 2023 12 commits
    • selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step · bef68e20
      Danielle Ratson authored
      Setting the IPv6 address generation mode of a net device during its
      creation never worked, but after commit b0ad3c17 ("rtnetlink: call
      validate_linkmsg in rtnl_create_link") it explicitly fails [1]. The
      failure is caused by the fact that validate_linkmsg() is called before
      the net device is registered, when it still does not have an 'inet6_dev'.
      
      Likewise, raising the net device before setting the address generation
      mode is meaningless, because by the time the mode is set, the address
      has already been generated.
      
      Therefore, fix the test to first create the net device, then set its
      IPv6 address generation mode and finally bring it up.
      
      [1]
       # ip link add name mydev addrgenmode eui64 type dummy
       RTNETLINK answers: Address family not supported by protocol
      
      Fixes: ba95e793 ("selftests: forwarding: hw_stats_l3: Add a new test")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/f3b05d85b2bc0c3d6168fe8f7207c6c8365703db.1686580046.git.petrm@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-sched-fix-race-conditions-in-mini_qdisc_pair_swap' · 3b0d2819
      Paolo Abeni authored
      Peilin Ye says:
      
      ====================
      net/sched: Fix race conditions in mini_qdisc_pair_swap()
      
      These 2 patches fix race conditions for ingress and clsact Qdiscs as
      reported [1] by syzbot, split out from another [2] series (last 2 patches
      of it).  Per-patch changelog omitted.
      
      Patch 1 hasn't been touched since the last version; I just included
      everybody's tag.
      
      Patch 2 is based on patch 6 v1 of [2], with comments and commit log slightly
      changed.  We also need rtnl_dereference() to load ->qdisc_sleeping since
      commit d636fc5d ("net: sched: add rcu annotations around
      qdisc->qdisc_sleeping"), so I changed that; please take yet another look,
      thanks!
      
      Patch 2 has been tested with the new reproducer Pedro posted [3].
      
      [1] https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
      [2] https://lore.kernel.org/r/cover.1684887977.git.peilin.ye@bytedance.com/
      [3] https://lore.kernel.org/r/7879f218-c712-e9cc-57ba-665990f5f4c9@mojatatu.com/
      ====================
      
      Link: https://lore.kernel.org/r/cover.1686355297.git.peilin.ye@bytedance.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting · 84ad0af0
      Peilin Ye authored
      mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
      in ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
      access this per-net_device pointer in mini_qdisc_pair_swap().  The same
      applies to clsact Qdiscs and miniq_egress.
      
      Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
      requests (thanks Hillf Danton for the hint), when replacing ingress or
      clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
      miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
      causing race conditions [1] including a use-after-free bug in
      mini_qdisc_pair_swap() reported by syzbot:
      
       BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
       Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
      ...
       Call Trace:
        <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
        print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
        print_report mm/kasan/report.c:430 [inline]
        kasan_report+0x11c/0x130 mm/kasan/report.c:536
        mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
        tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
        tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
        tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
        tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
        tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
      ...
      
      @old and @new should not affect each other.  In other words, @old should
      never modify miniq_{in,e}gress after @new, and @new should not update
      @old's RCU state.
      
      Fixing without changing sch_api.c turned out to be difficult (please
      refer to Closes: for discussions).  Instead, make sure @new's first call
      always happens after @old's last call (in {ingress,clsact}_destroy()) has
      finished:
      
      In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
      and call qdisc_destroy() for @old before grafting @new.
      
      Introduce qdisc_refcount_dec_if_one() as the counterpart of
      qdisc_refcount_inc_nz() used for filter requests.  Introduce a
      non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
      just like qdisc_put() etc.
      
      Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
      clsact Qdiscs".
      
      [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
      TC_H_ROOT (no longer possible after commit c7cfbd11 ("net/sched:
      sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
      transmission queues:
      
        Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
        then adds a flower filter X to A.
      
        Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
        b2) to replace A, then adds a flower filter Y to B.
      
       Thread 1               A's refcnt   Thread 2
        RTM_NEWQDISC (A, RTNL-locked)
         qdisc_create(A)               1
         qdisc_graft(A)                9
      
        RTM_NEWTFILTER (X, RTNL-unlocked)
         __tcf_qdisc_find(A)          10
         tcf_chain0_head_change(A)
         mini_qdisc_pair_swap(A) (1st)
                  |
                  |                         RTM_NEWQDISC (B, RTNL-locked)
               RCU sync                2     qdisc_graft(B)
                  |                    1     notify_and_destroy(A)
                  |
         tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
         qdisc_destroy(A)                    tcf_chain0_head_change(B)
         tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
         mini_qdisc_pair_swap(A) (3rd)                |
                 ...                                 ...
      
      Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
      its mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
      ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
      packets on eth0 will not find filter Y in sch_handle_ingress().
      
      This is just one of the possible consequences of concurrently accessing
      miniq_{in,e}gress pointers.
      
      Fixes: 7a096d57 ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
      Fixes: 87f37392 ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
      Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      84ad0af0
    • Peilin Ye's avatar
      net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs · 2d5f6a8d
      Peilin Ye authored
      Grafting ingress and clsact Qdiscs does not need a for-loop in
      qdisc_graft().  Refactor it.  No functional changes intended.
      Tested-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2d5f6a8d
    • Paul Blakey's avatar
      net/sched: act_ct: Fix promotion of offloaded unreplied tuple · 41f2c7c3
      Paul Blakey authored
      Currently UNREPLIED and UNASSURED connections are added to the nf flow
      table. This causes the following connection packets to be processed
      by the flow table, which then skips conntrack_in(), and thus such
      connections will remain UNREPLIED and UNASSURED even if reply traffic
      is then seen. Even still, the unoffloaded reply packets are the ones
      triggering hardware update from new to established state, and if
      there aren't any to trigger an update and/or previous update was
      missed, hardware can get out of sync with sw and still mark
      packets as new.
      
      Fix the above by:
      1) Not skipping conntrack_in() for UNASSURED packets, but still
         refresh for hardware, as before the cited patch.
      2) Try and force a refresh by reply-direction packets that update
         the hardware rules from new to established state.
      3) Remove any bidirectional flows that failed to update in
         hardware for re-insertion as bidirectional once any new packet
         arrives.
      
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      41f2c7c3
    • Hugh Dickins's avatar
      wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression · f1a0898b
      Hugh Dickins authored
      Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
      =====================================================
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      6.4.0-rc5 #1 Not tainted
      -----------------------------------------------------
      kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
      ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7
      
      and this task is already holding:
      ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
      which would create a new lock dependency:
       (&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}
      
      but this new dependency connects a SOFTIRQ-irq-safe lock:
       (&sta->rate_ctrl_lock){+.-.}-{2:2}
      etc. etc. etc.
      
      Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
      enough to pacify lockdep, but changing them all on pers.lock has worked.
      
      Fixes: a8938bc8 ("wifi: iwlwifi: mvm: Add locking to the rate read flow")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Link: https://lore.kernel.org/r/79ffcc22-9775-cb6d-3ffd-1a517c40beef@google.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      f1a0898b
    • Jakub Kicinski's avatar
      Merge branch 'fix-small-bugs-and-annoyances-in-tc-testing' · 07b1cc84
      Jakub Kicinski authored
      Vlad Buslov says:
      
      ====================
      Fix small bugs and annoyances in tc-testing
      ====================
      
      Link: https://lore.kernel.org/r/20230612075712.2861848-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      07b1cc84
    • Vlad Buslov's avatar
      selftests/tc-testing: Remove configs that no longer exist · 11b8b2e7
      Vlad Buslov authored
      Some qdiscs and classifiers have recently been retired from kernel.
      However, tc-testing config is still cluttered with them which causes noise
      when using merge_config.sh script to update existing config for tc-testing
      compatibility. Remove the config settings for affected qdiscs and
      classifiers.
      
      Fixes: fb38306c ("net/sched: Retire ATM qdisc")
      Fixes: 051d4420 ("net/sched: Retire CBQ qdisc")
      Fixes: bbe77c14 ("net/sched: Retire dsmark qdisc")
      Fixes: 265b4da8 ("net/sched: Retire rsvp classifier")
      Fixes: 8c710f75 ("net/sched: Retire tcindex classifier")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      11b8b2e7
    • Vlad Buslov's avatar
      selftests/tc-testing: Fix SFB db test · b39d8c41
      Vlad Buslov authored
      Setting a very small value of db like 10ms introduces rounding errors when
      converting to/from jiffies on some kernel configs. For example, on 250hz
      the actual value will be set to 12ms which causes the test to fail:
      
       # $ sudo ./tdc.py  -d eth2 -e 3410
       #  -- ns/SubPlugin.__init__
       # Test 3410: Create SFB with db setting
       #
       # All test results:
       #
       # 1..1
       # not ok 1 3410 - Create SFB with db setting
       #         Could not match regex pattern. Verify command output:
       # qdisc sfb 1: root refcnt 2 rehash 600s db 12ms limit 1000p max 25p target 20p increment 0.000503548 decrement 4.57771e-05 penalty_rate 10pps penalty_burst 20p
      
      Set the value to 100ms instead which currently seems to work on 100hz,
      250hz, 300hz and 1000hz kernel configs.
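
      The rounding can be reproduced with a small model of the ms/jiffies
      round trip. This is a sketch under assumptions: the two helpers below
      only approximate the kernel's msecs_to_jiffies() (rounding up) and
      jiffies_to_msecs() (rounding down), and reported_db() is a hypothetical
      name.

```python
def msecs_to_jiffies(ms, hz):
    # Assumed model: round up so a timeout never expires early.
    return (ms * hz + 999) // 1000


def jiffies_to_msecs(jiffies, hz):
    # Assumed model: integer conversion back to milliseconds, rounding down.
    return (jiffies * 1000) // hz


def reported_db(ms, hz):
    # Value a dump would show after storing `ms` as jiffies and reading it back.
    return jiffies_to_msecs(msecs_to_jiffies(ms, hz), hz)


for hz in (100, 250, 300, 1000):
    print(hz, reported_db(10, hz), reported_db(100, hz))
```

      Under this model, 10ms at 250hz comes back as 12ms, while 100ms
      round-trips unchanged at all four tick rates.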
      
      Fixes: 6ad92dc5 ("selftests/tc-testing: add selftests for sfb qdisc")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b39d8c41
    • Vlad Buslov's avatar
      selftests/tc-testing: Fix Error: failed to find target LOG · b849c566
      Vlad Buslov authored
      Add missing netfilter config dependency.
      
      Fixes following example error when running tests via tdc.sh for all XT
      tests:
      
       # $ sudo ./tdc.py -d eth2 -e 2029
       # Test 2029: Add xt action with log-prefix
       # exit: 255
       # exit: 0
       #  failed to find target LOG
       #
       # bad action parsing
       # parse_action: bad value (7:xt)!
       # Illegal "action"
       #
       # -----> teardown stage *** Could not execute: "$TC actions flush action xt"
       #
       # -----> teardown stage *** Error message: "Error: Cannot flush unknown TC action.
       # We have an error flushing
       # "
       # returncode 1; expected [0]
       #
       # -----> teardown stage *** Aborting test run.
       #
       # <_io.BufferedReader name=3> *** stdout ***
       #
       # <_io.BufferedReader name=5> *** stderr ***
       # "-----> teardown stage" did not complete successfully
       # Exception <class '__main__.PluginMgrTestFail'> ('teardown', ' failed to find target LOG\n\nbad action parsing\nparse_action: bad value (7:xt)!\nIllegal "action"\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 2029 Add xt action with log-prefix stage teardown)
       # ---------------
       # traceback
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
       #     res = run_one_test(pm, args, index, tidx)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
       #     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
       #     raise PluginMgrTestFail(
       # ---------------
       # accumulated output for this test:
       #  failed to find target LOG
       #
       # bad action parsing
       # parse_action: bad value (7:xt)!
       # Illegal "action"
       #
       # ---------------
       #
       # All test results:
       #
       # 1..1
       # ok 1 2029 - Add xt action with log-prefix # skipped - "-----> teardown stage" did not complete successfully
      
      Fixes: 910d504b ("selftests/tc-testings: add selftests for xt action")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b849c566
    • Vlad Buslov's avatar
      selftests/tc-testing: Fix Error: Specified qdisc kind is unknown. · aef6e908
      Vlad Buslov authored
      All TEQL tests assume that sch_teql module is loaded. Load module in tdc.sh
      before running qdisc tests.
      
      Fixes following example error when running tests via tdc.sh for all TEQL
      tests:
      
       # $ sudo ./tdc.py -d eth2 -e 84a0
       #  -- ns/SubPlugin.__init__
       # Test 84a0: Create TEQL with default setting
       # exit: 2
       # exit: 0
       # Error: Specified qdisc kind is unknown.
       #
       # -----> teardown stage *** Could not execute: "$TC qdisc del dev $DUMMY handle 1: root"
       #
       # -----> teardown stage *** Error message: "Error: Invalid handle.
       # "
       # returncode 2; expected [0]
       #
       # -----> teardown stage *** Aborting test run.
       #
       # <_io.BufferedReader name=3> *** stdout ***
       #
       # <_io.BufferedReader name=5> *** stderr ***
       # "-----> teardown stage" did not complete successfully
       # Exception <class '__main__.PluginMgrTestFail'> ('teardown', 'Error: Specified qdisc kind is unknown.\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 84a0 Create TEQL with default setting stage teardown)
       # ---------------
       # traceback
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
       #     res = run_one_test(pm, args, index, tidx)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
       #     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
       #     raise PluginMgrTestFail(
       # ---------------
       # accumulated output for this test:
       # Error: Specified qdisc kind is unknown.
       #
       # ---------------
       #
       # All test results:
       #
       # 1..1
       # ok 1 84a0 - Create TEQL with default setting # skipped - "-----> teardown stage" did not complete successfully
      
      Fixes: cc62fbe1 ("selftests/tc-testing: add selftests for teql qdisc")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aef6e908
    • Dan Carpenter's avatar
      net: ethernet: ti: am65-cpsw: Call of_node_put() on error path · 374283a1
      Dan Carpenter authored
      This code returns directly but it should instead call of_node_put()
      to drop some reference counts.
      
      Fixes: dab2b265 ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/e3012f0c-1621-40e6-bf7d-03c276f6e07f@kili.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      374283a1
  3. 12 Jun, 2023 16 commits
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-fixes' · fbf6f482
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      selftests: mptcp: skip tests not supported by old kernels (part 3)
      
      After a few years of increasing test coverage in the MPTCP selftests, we
      realised [1] the last version of the selftests is supposed to run on old
      kernels without issues.
      
      Supporting older versions is not that easy for this MPTCP case: these
      selftests are often validating the internals by checking packets that
      are exchanged, when some MIB counters are incremented after some
      actions, how connections are getting opened and closed in some cases,
      etc. In other words, it is not limited to the socket interface between
      the userspace and the kernelspace.
      
      In addition to that, the current MPTCP selftests run a lot of different
      sub-tests but the TAP13 protocol used in the selftests doesn't support
      sub-tests: one failure in sub-tests implies that the whole selftest is
      seen as failed at the end because sub-tests are not tracked. It is then
      important to skip sub-tests not supported by old kernels.
      
      To minimise the modifications and reduce the complexity to support old
      versions, the idea is to look at external signs and skip the whole
      selftest or just some sub-tests before starting them. This cannot be
      applied in all cases.
      
      Similar to the second part, this third one focuses on marking different
      sub-tests as skipped if some MPTCP features are not supported. This
      time, only the "mptcp_join.sh" selftest, the remaining one, is modified.
      Several techniques are used here to achieve this task:
      
      - Before starting some tests:
      
        - Check if a file (sysctl knob) is present: that's what patch 12/17 is
          doing for the userspace PM feature.
      
        - Check if a required kernel symbol is present in /proc/kallsyms:
          patches 9, 10, 14 and 15/17 are using this technique.
      
        - Check if it is possible to setup a particular network environment
          requiring Netfilter or TC: if the preparation step fails, the linked
          sub-test is marked as skipped. Patch 5/17 is doing that.
      
        - Check if a MIB counter is available: patches 7 and 13/17 do that.
      
        - Check if the kernel version is newer than a specific one: patch 1/17
          adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
          and it is only used as last resort but as mentioned above, it is
          important to skip tests if they are not supported not to have the
          whole selftest always being marked as failed on old kernels. Patches
          11 and 17/17 are checking the kernel version. An alternative would
          be to ignore the results for some sub-tests but that's not ideal
          either. Note that SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be
          set to 1 not to skip these tests if the running kernel doesn't have
          a supported version.
      
      - After having launched the tests:
      
        - Adapt the expectations depending on the presence of a kernel symbol
          (patch 6/17) or a kernel version (patch 8/17).
      
        - Check if a MIB counter is available and skip the verification if
          not. Patch 4/17 is using this technique.
      
      Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
      value is checked: if it is set to 1, the test is marked as "failed"
      instead of "skipped". MPTCP public CI expects to have all features
      supported and it sets this env var to 1 to catch regressions in these
      new checks.
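
      The skip-or-fail decision can be sketched as follows. This is an
      illustration only: missing_feature_result() is a hypothetical name and
      the selftests themselves are shell scripts, so this is not the actual
      helper.

```python
import os


def missing_feature_result(environ=None):
    # Decide how to report a sub-test whose required feature is missing.
    # `environ` defaults to the process environment; a dict can be passed
    # instead, which is an assumption made here to keep the sketch testable.
    env = os.environ if environ is None else environ
    if env.get("SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES") == "1":
        return "fail"  # all features are expected: a missing one is a regression
    return "skip"      # old kernel: the sub-test is only marked as skipped


print(missing_feature_result({"SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES": "1"}))
print(missing_feature_result({}))
```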
      
      Patch 2/17 uses 'iptables-legacy' if available because it might be
      needed when using an older kernel not supporting iptables-nft.
      
      Patch 3/17 adds some helpers used in the other patches mentioned to
      easily mark sub-tests as skipped.
      
      Patch 16/17 makes the MPTCP Join "listener" tests uniform: the code was
      imported from userspace_pm.sh but without following the code style and
      the ways of using tools and printing messages of the MPTCP Join selftest.
      
      Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1]
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      ====================
      
      Link: https://lore.kernel.org/r/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fbf6f482
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip mixed tests if not supported · 6673851b
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of a mix of subflows in v4 and v6 by the
      in-kernel PM introduced by commit b9d69db8 ("mptcp: let the
      in-kernel PM use mixed IPv4 and IPv6 addresses").
      
      It looks like there is no external sign we can use to predict the
      expected behaviour. Instead of accepting different behaviours and thus
      not really checking for the expected behaviour, we are looking here for
      a specific kernel version. That's not ideal but it looks better than
      removing the test because it cannot support older kernel versions.
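
      A kernel version check of this kind could look like the sketch below.
      It is an assumption-laden illustration: parse_kversion() and
      kversion_ge() are hypothetical names, the real helpers live in a shell
      library, and the versions used in the example are arbitrary.

```python
import re


def parse_kversion(release):
    # Extract (major, minor) from a release string such as "6.4.0-rc5".
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unexpected kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kversion_ge(release, major, minor):
    # True if the release is at least major.minor; tuples compare in order.
    return parse_kversion(release) >= (major, minor)


# The running kernel's release string is available from platform.release().
print(kversion_ge("6.4.0-rc5", 6, 3))
print(kversion_ge("5.10.1", 5, 15))
```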
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: ad349374 ("selftests: mptcp: add test-cases for mixed v4/v6 subflows")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6673851b
    • Matthieu Baerts's avatar
      selftests: mptcp: join: uniform listener tests · 96b84195
      Matthieu Baerts authored
      The alignment was different from the other tests because tabs were used
      instead of spaces.
      
      While at it, also use 'echo' instead of 'printf' to print the result to
      keep the same style as done in the other sub-tests. And, even if it
      should be better with them, also remove 'stdbuf' and sed's '--unbuffered'
      option because they are not used in the other subtests and they are not
      available when using a minimal environment with busybox.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 178d0232 ("selftests: mptcp: listener test for in-kernel PM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      96b84195
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip PM listener tests if not supported · 0471bb47
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of PM listener events introduced by commit
      f8c9dfbd ("mptcp: add pm listener events").
      
      It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
      in advance if the kernel supports this feature.
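
      A symbol lookup like this can be sketched as below. The function name
      kallsyms_has() and the exact-name matching are assumptions made for
      illustration, and the sample text uses made-up addresses.

```python
def kallsyms_has(symbol, kallsyms_text):
    # Each /proc/kallsyms line is "<address> <type> <name> [module]".
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == symbol:
            return True
    return False


sample = """\
0000000000000000 t mptcp_event_pm_listener
0000000000000000 T mptcp_subflow_send_ack
"""

print(kallsyms_has("mptcp_event_pm_listener", sample))
```

      On a live system the text would come from reading /proc/kallsyms
      instead of the sample string.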
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 178d0232 ("selftests: mptcp: listener test for in-kernel PM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0471bb47
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip MPC backups tests if not supported · 632978f0
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of sending an MP_PRIO signal for the initial
      subflow, introduced by commit c157bbe7 ("mptcp: allow the in kernel
      PM to set MPC subflow priority").
      
      It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
      it was needed to introduce the mentioned feature. So we can know in
      advance if the feature is supported instead of trying and accepting any
      results.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 914f6a59 ("selftests: mptcp: add MPC backup tests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      632978f0
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip fail tests if not supported · ff8897b5
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of the MP_FAIL / infinite mapping introduced
      by commit 1e39e5a3 ("mptcp: infinite mapping sending") and the
      following ones.
      
      It is possible to look for one of the infinite mapping counters to know
      in advance if this feature is available.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b6e074e1 ("selftests: mptcp: add infinite map testcase")
      Cc: stable@vger.kernel.org
      Fixes: 2ba18161 ("selftests: mptcp: add MP_FAIL reset testcase")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ff8897b5
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip userspace PM tests if not supported · f2b492b0
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of the userspace PM introduced by commit
      4638de5a ("mptcp: handle local addrs announced by userspace PMs")
      and the following ones.
      
      It is possible to look for the MPTCP pm_type's sysctl knob to know in
      advance if the userspace PM is available.
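
      Checking for the knob amounts to a file existence test. In the sketch
      below, has_userspace_pm_knob() is a hypothetical name, and the
      net/mptcp/pm_type path under /proc/sys is assumed from the
      net.mptcp.pm_type sysctl name.

```python
import os


def has_userspace_pm_knob(proc_sys="/proc/sys"):
    # The sysctl net.mptcp.pm_type is assumed to appear as this file;
    # `proc_sys` is a parameter only so the check can run against a fake tree.
    return os.path.exists(os.path.join(proc_sys, "net", "mptcp", "pm_type"))


print(has_userspace_pm_knob())
```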
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 5ac1d2d6 ("selftests: mptcp: Add tests for userspace PM type")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f2b492b0
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip fullmesh flag tests if not supported · 9db34c42
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of the fullmesh flag for the in-kernel PM
      introduced by commit 2843ff6f ("mptcp: remote addresses fullmesh")
      and commit 1a0d6136 ("mptcp: local addresses fullmesh").
      
      It looks like there is no easy external sign we can use to predict the
      expected behaviour. We could add the flag and then check if it has been
      added but for that, and for each fullmesh test, we would need to setup a
      new environment, do the checks, clean it and then only start the test
      from yet another clean environment. To keep it simple and avoid
      introducing new issues, we look for a specific kernel version. That's
      not ideal but an acceptable solution for this case.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 6a0653b9 ("selftests: mptcp: add fullmesh setting tests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9db34c42
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip backup if set flag on ID not supported · 07216a3c
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      Commit bccefb76 ("selftests: mptcp: simplify pm_nl_change_endpoint")
      has simplified the way the backup flag is set on an endpoint. Instead of
      doing:
      
        ./pm_nl_ctl set 10.0.2.1 flags backup
      
      Now we do:
      
        ./pm_nl_ctl set id 1 flags backup
      
      The new way is easier to maintain but it is also incompatible with older
      kernels not supporting the implicit endpoints, which put in place the
      infrastructure to set flags per ID, hence the second Fixes tag.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: bccefb76 ("selftests: mptcp: simplify pm_nl_change_endpoint")
      Cc: stable@vger.kernel.org
      Fixes: 4cf86ae8 ("mptcp: strict local address ID selection")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      07216a3c
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip implicit tests if not supported · 36c4127a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of the implicit endpoints introduced by
      commit d045b9eb ("mptcp: introduce implicit endpoints").
      
      It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
      it was needed to introduce the mentioned feature. So we can know in
      advance if the feature is supported instead of trying and accepting any
      results.
      
      Note that here and in the following commits, we re-do the same check for
      each sub-test of the same function for a few reasons. The main one is
      not to break the ID assigned to each test in order to be able to easily
      compare results between different kernel versions. Also, we can still
      run a specific test even if it is skipped. Another reason is that it
      makes it clear during the review that a specific subtest will be skipped
      or not under certain conditions. At the end, it looks OK to call the
      exact same helper multiple times: it is not a critical path and it is
      the same code that is executed, not really more cases to maintain.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 69c6ce7b ("selftests: mptcp: add implicit endpoint test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      36c4127a
    • Matthieu Baerts's avatar
      selftests: mptcp: join: support RM_ADDR for used endpoints or not · 425ba803
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      At some point, a new feature caused internal behaviour changes we are
      verifying in the selftests, see the Fixes tag below. It was not a UAPI
      change but because in these selftests, we check some internal
      behaviours, it is normal we have to adapt them from time to time after
      having added some features.
      
      It looks like there is no external sign we can use to predict the
      expected behaviour. Instead of accepting different behaviours and thus
      not really checking for the expected behaviour, we are looking here for
      a specific kernel version. That's not ideal but it looks better than
      removing the test because it cannot support older kernel versions.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 6fa0174a ("mptcp: more careful RM_ADDR generation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      425ba803
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip Fastclose tests if not supported · ae947bb2
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the support of MP_FASTCLOSE introduced in commit
      f284c0c7 ("mptcp: implement fastclose xmit path").
      
      If the MIB counter is not available, the test cannot be verified and the
      behaviour will not be the expected one. So we can skip the test if the
      counter is missing.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 01542c9b ("selftests: mptcp: add fastclose testcase")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ae947bb2
    • Matthieu Baerts's avatar
      selftests: mptcp: join: support local endpoint being tracked or not · d4c81bbb
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      At some point, a new feature caused internal behaviour changes we are
      verifying in the selftests, see the Fixes tag below. It was not a uAPI
      change but because in these selftests, we check some internal
      behaviours, it is normal we have to adapt them from time to time after
      having added some features.
      
      It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms
      because it was needed to introduce the mentioned feature. So we can know
      in advance what behaviour we are expecting here instead of
      supporting the two behaviours.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d4c81bbb
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip test if iptables/tc cmds fail · 4a0b866a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      Some tests are using IPTables and/or TC commands to force some
      behaviours. If one of these commands fails -- likely because some
      features are not available due to missing kernel config -- we should
      intercept the error and skip the tests requiring these features.
      
      Note that if we expect to have these features available and if
      SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
      will be marked as failed instead of skipped.
      
      This patch also replaces 'exit 1' with 'return 1' so that the selftest
      is not stopped in the middle, without its conclusion, if there is an
      issue with NF or TC.
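
      A minimal sketch of the idea, not the actual patch: the function names
      below are hypothetical, and only the
      SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES variable comes from this
      commit message.

```shell
#!/bin/bash
# Hypothetical wrapper: run a setup command (e.g. an iptables or tc rule)
# and report why the subtest cannot continue if the command fails.
run_setup_cmd_or_skip() {
	if "$@" > /dev/null 2>&1; then
		return 0
	fi

	if [ "${SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES:-}" = "1" ]; then
		# All features are expected: a failing command is an error.
		echo "[fail] '$*' failed but all features are expected"
	else
		# Likely a missing kernel config: skip instead of failing.
		echo "[skip] '$*' failed, feature probably missing"
	fi
	return 1
}

# Hypothetical subtest: the setup command is passed as arguments.
example_subtest() {
	# 'return 1' instead of 'exit 1': the caller keeps running and can
	# still print its final summary.
	run_setup_cmd_or_skip "$@" || return 1
	echo "[ ok ] subtest ran"
}
```

      With 'exit 1' in place of 'return 1', the whole script would stop at
      the first failing command and the remaining subtests and the summary
      would never be reached.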
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 8d014eaa ("selftests: mptcp: add ADD_ADDR timeout test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4a0b866a
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip check if MIB counter not supported · 47867f0a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      One of them is the MPTCP MIB counters introduced in commit fc518953
      ("mptcp: add and use MIB counter infrastructure"), with more counters
      added later. The MPTCP Join selftest heavily relies on these counters.
      
      If a counter is not supported by the kernel, it is not displayed when
      using 'nstat -z'. We can then detect that and skip the verification. A
      new helper (get_counter()) has been added to do the required checks and
      return an error if the counter is not available.
      
      Note that if we expect to have these features available and if
      SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
      will be marked as failed instead of skipped.
      
      This new helper also makes sure we get the exact counter we want, to
      avoid issues we had in the past, e.g. with MPTcpExtRmAddr and
      MPTcpExtRmAddrDrop sharing the same prefix. While at it, we unify the
      way we fetch a MIB counter.
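
      The exact-match lookup can be illustrated with the sketch below. It is
      not the get_counter() helper from the patch: the function name
      counter_from_nstat_output is hypothetical, and it assumes 'nstat'
      output where the first column is the counter name and the second its
      value.

```shell
#!/bin/bash
# Hypothetical sketch: read 'nstat -z'-style output on stdin and print the
# value of one counter. Returns 1 if the counter is not listed at all,
# which is how an unsupported counter would show up.
#
# Possible use on a live system: nstat -asz | counter_from_nstat_output NAME
counter_from_nstat_output() {
	local counter="$1"
	local count

	# Compare the whole first field: "MPTcpExtRmAddr" must not match
	# "MPTcpExtRmAddrDrop", which shares the same prefix.
	count="$(awk -v name="${counter}" '$1 == name { print $2; exit }')"
	if [ -z "${count}" ]; then
		return 1
	fi

	echo "${count}"
}
```

      A caller can then skip the verification when the function returns an
      error, instead of comparing against an empty value.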
      
      Note for the backports: we rarely change these modified blocks, so if
      there are conflicts, it is very likely because a counter is not used
      in the older kernels and we don't need that chunk.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b08fbf24 ("selftests: add test-cases for MPTCP MP_JOIN")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      47867f0a
    • Matthieu Baerts's avatar
      selftests: mptcp: join: helpers to skip tests · cdb50525
      Matthieu Baerts authored
      Selftests are supposed to run on any kernels, including the old ones not
      supporting all MPTCP features.
      
      Here are some helpers that will be used to mark subtests as skipped if a
      feature is not supported. Marking as a fix for the commit introducing
      this selftest to help with the backports.
      
      While at it, also check if the kallsyms feature is available, as it
      will also be used in the following commits to check if MPTCP features
      are available before starting a test.
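
      As a rough sketch of what such skip helpers might look like (the
      names mark_as_skipped, skip_unless and skipped_count are hypothetical
      and not taken from the patch):

```shell
#!/bin/bash
# Hypothetical bookkeeping for this sketch: number of skipped subtests.
skipped_count=0

# Record one subtest as skipped and say why.
mark_as_skipped() {
	echo "[skip] $*"
	skipped_count=$((skipped_count + 1))
}

# Run a check command; if it fails, mark the subtest as skipped.
# Usage: skip_unless "<description>" <command> [args...]
skip_unless() {
	local what="$1"
	shift

	if "$@"; then
		return 0
	fi

	mark_as_skipped "${what} not supported"
	return 1
}

# Example: only continue if kallsyms can be read on this system.
if skip_unless "kallsyms" test -r /proc/kallsyms; then
	echo "kallsyms is available, feature checks can be done"
fi
```

      Keeping the count in one place lets the script report at the end how
      many subtests were skipped rather than silently passing.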
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b08fbf24 ("selftests: add test-cases for MPTCP MP_JOIN")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cdb50525