1. 20 Feb, 2024 3 commits
    • arp: Prevent overflow in arp_req_get(). · a7d60277
      Kuniyuki Iwashima authored
      syzkaller reported an overflowing write in arp_req_get(). [0]
      
      When ioctl(SIOCGARP) is issued, arp_req_get() looks up a neighbour
      entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.
      
      The arp_ha here is struct sockaddr, not struct sockaddr_storage, so
      the sa_data buffer is just 14 bytes.
      
      In the splat below, 2 bytes overflow into the next int field,
      arp_flags.  We initialise that field just after the memcpy(), so
      this particular overflow is harmless.
      
      However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN),
      the write reaches arp_netmask, which arp_ioctl() may have set to
      htonl(0xFFFFFFFFUL) before calling arp_req_get(), and clobbers it.
      
      To avoid the overflow, limit the length passed to memcpy().
      
      Note that commit b5f0de6d ("net: dev: Convert sa_data to flexible
      array in struct sockaddr") only silenced the syzkaller warning; it
      did not fix the overflow.
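
The bounded copy described above can be sketched in user-space C. This is a minimal illustration and not the actual patch: `mock_sockaddr`, `mock_arpreq`, `min_size()` and `copy_hw_addr()` are hypothetical names, and the mock structs only mirror the field order discussed in the commit message (14-byte sa_data followed by arp_flags and arp_netmask).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_ADDR_LEN 32 /* assumption: mirrors the kernel's MAX_ADDR_LEN */

/* Mock of struct sockaddr: 2-byte family followed by a 14-byte payload. */
struct mock_sockaddr {
	unsigned short sa_family;
	char sa_data[14];
};

/* Mock of struct arpreq, trimmed to the fields the message talks about. */
struct mock_arpreq {
	struct mock_sockaddr arp_ha;
	int arp_flags;
	struct mock_sockaddr arp_netmask;
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy a hardware address into arp_ha.sa_data without ever writing past
 * the end of that field. Returns the number of bytes actually copied.
 */
size_t copy_hw_addr(struct mock_arpreq *r, const unsigned char *ha,
		    size_t addr_len)
{
	size_t n;

	assert(r != NULL && ha != NULL);

	/* Clamp to the destination field, whatever the device reports. */
	n = min_size(addr_len, sizeof(r->arp_ha.sa_data));
	memcpy(r->arp_ha.sa_data, ha, n);
	return n;
}
```

With the clamp in place, a 32-byte device address is truncated to 14 bytes and the neighbouring arp_flags and arp_netmask fields keep whatever values they had before the copy.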
      
      [0]:
      memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14)
      WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Modules linked in:
      CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
      RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6
      RSP: 0018:ffffc900050b7998 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001
      RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000
      R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010
      FS:  00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261
       inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981
       sock_do_ioctl+0xdf/0x260 net/socket.c:1204
       sock_ioctl+0x3ef/0x650 net/socket.c:1321
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x64/0xce
      RIP: 0033:0x7f172b262b8d
      Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d
      RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003
      RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Bjoern Doebel <doebel@amazon.de>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • devlink: fix possible use-after-free and memory leaks in devlink_init() · def689fc
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Unregister what was already registered if a later registration step
      fails.
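
The ordering rule and the unwind on failure can be sketched in plain C. This is a minimal user-space illustration, not the devlink patch: `register_pernet()`, `unregister_pernet()`, `register_family()` and the `fail_family` knob are hypothetical stand-ins for the kernel's pernet and generic netlink registration calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state flags standing in for real registrations. */
static bool pernet_registered;
static bool family_registered;
static bool fail_family; /* knob to simulate a failing family registration */

static int register_pernet(void)
{
	pernet_registered = true;
	return 0;
}

static void unregister_pernet(void)
{
	pernet_registered = false;
}

static int register_family(void)
{
	if (fail_family)
		return -1;
	family_registered = true;
	return 0;
}

int subsystem_init(void)
{
	int err;

	/* Init must start from a clean state. */
	assert(!pernet_registered && !family_registered);

	/* Per-netns state first: family users may rely on it immediately. */
	err = register_pernet();
	if (err)
		return err;

	err = register_family();
	if (err)
		goto out_unreg_pernet;

	return 0;

out_unreg_pernet:
	/* Undo the earlier step so nothing stays registered after a failure. */
	unregister_pernet();
	return err;
}
```

The goto-based unwind keeps the error path in reverse order of the setup path, so adding a further registration step only needs one more label.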
      
      Fixes: 687125b5 ("devlink: split out core code")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: sr: fix possible use-after-free and null-ptr-deref · 5559cea2
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Fixes: 915d7e5e ("ipv6: sr: add code base for control plane support of SR-IPv6")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 19 Feb, 2024 3 commits
    • enic: Avoid false positive under FORTIFY_SOURCE · 40b9385d
      Kees Cook authored
      FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
      code base has been converted to flexible arrays. In order to enforce
      the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
      destinations need to be handled. Unfortunately, struct vic_provinfo
      resists full conversion, as it contains a flexible array of flexible
      arrays, which is only possible with the 0-sized fake flexible array.
      
      Use unsafe_memcpy() to avoid future false positives under
      CONFIG_FORTIFY_SOURCE.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: use pci_is_enabled not open code · 121e4dcb
      Shannon Nelson authored
      Since there is a utility available for this, use
      the API rather than open-coding the check.
      
      Fixes: 13943d6c ("ionic: prevent pci disable of already disabled device")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: bonding: set active slave to primary eth1 specifically · cd65c48d
      Hangbin Liu authored
      In bond priority testing, we set the primary interface to eth1 and add
      eth0, eth1 and eth2 to the bond in sequence. This works on normal
      kernels. On a debug kernel, however, the bridge ports that eth0, eth1
      and eth2 are connected to come up slowly (entering the blocking, then
      forwarding state), which takes the primary interface down for a while
      after enslaving and changes the active slave.
      Here is a test log from Jakub's debug test[1].
      
       [  400.399070][   T50] br0: port 1(s0) entered disabled state
       [  400.400168][   T50] br0: port 4(s2) entered disabled state
       [  400.941504][ T2791] bond0: (slave eth0): making interface the new active one
       [  400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
       [  400.943633][ T2766] br0: port 1(s0) entered blocking state
       [  400.944119][ T2766] br0: port 1(s0) entered forwarding state
       [  401.128792][ T2792] bond0: (slave eth1): making interface the new active one
       [  401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
       [  401.131643][   T69] br0: port 2(s1) entered blocking state
       [  401.132067][   T69] br0: port 2(s1) entered forwarding state
       [  401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
       [  401.348414][   T50] br0: port 4(s2) entered blocking state
       [  401.348857][   T50] br0: port 4(s2) entered forwarding state
       [  401.519669][  T250] bond0: (slave eth0): link status definitely down, disabling slave
       [  401.526522][  T250] bond0: (slave eth1): link status definitely down, disabling slave
       [  401.526986][  T250] bond0: (slave eth2): making interface the new active one
       [  401.629470][  T250] bond0: (slave eth0): link status definitely up
       [  401.630089][  T250] bond0: (slave eth1): link status definitely up
       [...]
       # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
       # Current active slave is eth2 but not eth1
      
      Fix it by explicitly setting the active slave to the primary slave
      before testing.
      
      [1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout
      
      Fixes: 481b56e0 ("selftests: bonding: re-format bond option tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Feb, 2024 20 commits
  4. 17 Feb, 2024 1 commit
  5. 16 Feb, 2024 8 commits
    • net/sched: act_mirred: don't override retval if we already lost the skb · 166c2c8a
      Jakub Kicinski authored
      If we're redirecting the skb and haven't called tcf_mirred_forward()
      yet, we need to tell the core to drop the skb by setting the retcode
      to SHOT. If we have called tcf_mirred_forward(), however, the skb
      is out of our hands and returning SHOT will lead to a use-after-free.
      
      Move the retval override to the error path which actually needs it.
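
The ownership rule can be illustrated with a small user-space model. Everything here is a hypothetical stand-in rather than the act_mirred code: `ACT_SHOT` and `ACT_CONSUMED` model the verdicts, `forward()` models tcf_mirred_forward() taking ownership of the packet even when it fails, and `release_count` counts how often ownership of a packet was given up (more than once would correspond to the use-after-free).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts: SHOT asks the caller to drop the packet. */
enum verdict { ACT_CONSUMED, ACT_SHOT };

struct pkt {
	int release_count; /* must end up at exactly 1 */
};

static void pkt_release(struct pkt *p)
{
	p->release_count++;
}

/* Models the forward step: it owns the packet afterwards, success or not. */
static int forward(struct pkt *p, bool fail)
{
	pkt_release(p);
	return fail ? -1 : 0;
}

enum verdict redirect(struct pkt *p, bool dev_down, bool forward_fails)
{
	assert(p != NULL);

	/* Error before forwarding: we still own the packet, so ask for a drop. */
	if (dev_down)
		return ACT_SHOT;

	/* After this call the packet is out of our hands; never return SHOT. */
	(void)forward(p, forward_fails);
	return ACT_CONSUMED;
}

/* Models the core: it releases the packet only when told to drop it. */
void core_handle(struct pkt *p, bool dev_down, bool forward_fails)
{
	if (redirect(p, dev_down, forward_fails) == ACT_SHOT)
		pkt_release(p);
}
```

In all three paths (early error, successful forward, failed forward) the packet is released exactly once, which is the property the retval change restores.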
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Fixes: e5cf1baf ("act_mirred: use TC_ACT_REINSERT when possible")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_mirred: use the backlog for mirred ingress · 52f671db
      Jakub Kicinski authored
      The test Davide added in commit ca22da2f ("act_mirred: use the backlog
      for nested calls to mirred ingress") hangs our testing VMs every 10 or so
      runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
      lockdep.
      
      The problem as previously described by Davide (see Link) is that
      if we reverse flow of traffic with the redirect (egress -> ingress)
      we may reach the same socket which generated the packet. And we may
      still be holding its socket lock. The common solution to such deadlocks
      is to put the packet in the Rx backlog, rather than run the Rx path
      inline. Do that for all egress -> ingress reversals, not just once
      we started to nest mirred calls.
      
      In the past there was a concern that the backlog indirection will
      lead to loss of error reporting / less accurate stats. But the current
      workaround does not seem to address the issue.
      
      Fixes: 53592b36 ("net/sched: act_mirred: Implement ingress actions")
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: adi: requires PHYLIB support · a9f80df4
      Randy Dunlap authored
      This driver uses functions that are supplied by the Kconfig symbol
      PHYLIB, so select it to ensure that they are built as needed.
      
      When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
      (linker) errors that are resolved by this Kconfig change:
      
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
         drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
         drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
         ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
         drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
         include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
         drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
         drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
         drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
         drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
         drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
         ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
         ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/
      Cc: Lennart Franzen <lennart@lfdomain.com>
      Cc: Alexandru Tachici <alexandru.tachici@analog.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Reviewed-by: Nuno Sa <nuno.sa@analog.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_established(). · 66b60b0c
      Kuniyuki Iwashima authored
      syzkaller reported a warning [0] in inet_csk_destroy_sock() with no
      repro.
      
        WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash);
      
      However, syzkaller's log hinted that connect() failed just before
      the warning due to FAULT_INJECTION.  [1]
      
      When connect() is called for an unbound socket, we search for an
      available ephemeral port.  If a bhash bucket exists for the port, we
      call __inet_check_established() or __inet6_check_established() to check
      if the bucket is reusable.
      
      If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.
      
      Later, we look up the corresponding bhash2 bucket and try to allocate
      it if it does not exist.
      
      Although it rarely occurs in real use, if the allocation fails, we must
      revert the changes made by check_established().  Otherwise, an
      unconnected socket could illegally occupy an ehash entry.
      
      Note that we do not put tw back into ehash because sk might have
      already responded to a packet for tw and it would be better to free
      tw earlier under such memory pressure.
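
The revert-on-failure logic can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel code: `mock_sock`, `check_established()`, `bind2_bucket_create()` and `hash_connect()` are hypothetical stand-ins, the timewait handling is left out, and a plain -1 takes the place of the real error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model with only the state the message mentions. */
struct mock_sock {
	unsigned short num; /* models inet_sk(sk)->inet_num */
	bool in_ehash;      /* models membership in the ehash table */
	void *bind_hash;    /* models inet_csk(sk)->icsk_bind_hash */
};

/* Models check_established(): hashes the socket and records the port. */
static void check_established(struct mock_sock *sk, unsigned short port)
{
	sk->in_ehash = true;
	sk->num = port;
}

/* Models the bhash2 bucket allocation, which may fail. */
static void *bind2_bucket_create(bool fail)
{
	static int bucket;

	return fail ? NULL : &bucket;
}

int hash_connect(struct mock_sock *sk, unsigned short port, bool alloc_fails)
{
	void *tb2;

	assert(sk != NULL && !sk->in_ehash);

	check_established(sk, port);

	tb2 = bind2_bucket_create(alloc_fails);
	if (!tb2) {
		/* Undo check_established() so no stale entry or port remains. */
		sk->in_ehash = false;
		sk->num = 0;
		return -1;
	}

	sk->bind_hash = tb2;
	return 0;
}
```

After a failed allocation the model socket has neither a port number nor an ehash entry, so the condition checked by the WARN_ON() quoted above (port set but no bind hash) cannot hold at destroy time.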
      
      [0]:
      WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
      Modules linked in:
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
      Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05
      RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40
      RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8
      RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000
      R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0
      R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000
      FS:  00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
       dccp_close (net/dccp/proto.c:1078)
       inet_release (net/ipv4/af_inet.c:434)
       __sock_release (net/socket.c:660)
       sock_close (net/socket.c:1423)
       __fput (fs/file_table.c:377)
       __fput_sync (fs/file_table.c:462)
       __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539)
       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      RIP: 0033:0x7f03e53852bb
      Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44
      RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb
      RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c
      R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000
      R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170
       </TASK>
      
      [1]:
      FAULT_INJECTION: forcing a failure.
      name failslab, interval 1, probability 0, space 0, times 0
      CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f #9
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
       should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
       should_failslab (mm/slub.c:3748)
       kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867)
       inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135)
       __inet_hash_connect (net/ipv4/inet_hashtables.c:1100)
       dccp_v4_connect (net/dccp/ipv4.c:116)
       __inet_stream_connect (net/ipv4/af_inet.c:676)
       inet_stream_connect (net/ipv4/af_inet.c:747)
       __sys_connect_file (net/socket.c:2048 (discriminator 2))
       __sys_connect (net/socket.c:2065)
       __x64_sys_connect (net/socket.c:2072)
       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      RIP: 0033:0x7f03e5284e5d
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
      RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d
      RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mdb-events' · 82a678e2
      David S. Miller authored
      Tobias Waldekranz says:
      
      ====================
      net: bridge: switchdev: Ensure MDB events are delivered exactly once
      
      When a device is attached to a bridge, drivers will request a replay
      of objects that were created before the device joined the bridge, that
      are still of interest to the joining port. Typical examples include
      FDB entries and MDB memberships on other ports ("foreign interfaces")
      or on the bridge itself.
      
      Conversely when a device is detached, the bridge will synthesize
      deletion events for all those objects that are still live, but no
      longer applicable to the device in question.
      
      This series eliminates two races related to the synching and
      unsynching phases of a bridge's MDB with a joining or leaving device,
      that would cause notifications of such objects to be either delivered
      twice (1/2), or not at all (2/2).
      
      A similar race to the one solved by 1/2 still remains for the
      FDB. This is much harder to solve, due to the lockless operation of
      the FDB's rhashtable, and is therefore knowingly left out of this
      series.
      
      v1 -> v2:
      - Squash the previously separate addition of
        switchdev_port_obj_act_is_deferred into first consumer.
      - Use ether_addr_equal to compare MAC addresses.
      - Document switchdev_port_obj_act_is_deferred (renamed from
        switchdev_port_obj_is_deferred in v1, to indicate that we also match
        on the action).
      - Delay allocations of MDB objects until we know they're needed.
      - Use non-RCU version of the hash list iterator, now that the MDB is
        not scanned while holding the RCU read lock.
      - Add Fixes tag to commit message
      
      v2 -> v3:
      - Fix unlocking in error paths
      - Access RCU protected port list via mlock_dereference, since MDB is
        guaranteed to remain constant for the duration of the scan.
      
      v3 -> v4:
      - Limit the search for existing deferred events in 1/2 to only apply to
        additions, since the problem does not exist in the deletion case.
      - Add 2/2, to plug a related race when unoffloading an indirectly
        associated device.
      
      v4 -> v5:
      - Fix grammatical errors in kerneldoc of
        switchdev_port_obj_act_is_deferred
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Ensure deferred event delivery on unoffload · f7a70d65
      Tobias Waldekranz authored
      When unoffloading a device, it is important to ensure that all
      relevant deferred events are delivered to it before it disassociates
      itself from the bridge.
      
      Before this change, this was true for the normal case when a device
      maps 1:1 to a net_bridge_port, i.e.
      
         br0
         /
      swp0
      
      When swp0 leaves br0, the call to switchdev_deferred_process() in
      del_nbp() makes sure to process any outstanding events while the
      device is still associated with the bridge.
      
      In the case when the association is indirect though, i.e. when the
      device is attached to the bridge via an intermediate device, like a
      LAG...
      
          br0
          /
        lag0
        /
      swp0
      
      ...then detaching swp0 from lag0 does not cause any net_bridge_port to
      be deleted, so there was no guarantee that all events had been
      processed before the device disassociated itself from the bridge.
      
      Fix this by always synchronously processing all deferred events before
      signaling completion of unoffloading back to the driver.
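
The "drain before detach" rule can be shown with a toy model. The names `mock_port`, `deferred_queue`, `deferred_process()` and `port_unoffload()` are hypothetical and the queue is reduced to a counter; the sketch only shows that events are delivered while the device is still associated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_port {
	bool attached;  /* still associated with the bridge? */
	int delivered;  /* events that actually reached the device */
};

struct deferred_queue {
	int pending;    /* events queued but not yet delivered */
};

/* Deliver every queued event; events for a detached port are lost. */
static void deferred_process(struct deferred_queue *q, struct mock_port *p)
{
	while (q->pending > 0) {
		if (p->attached)
			p->delivered++;
		q->pending--;
	}
}

void port_unoffload(struct deferred_queue *q, struct mock_port *p)
{
	assert(q != NULL && p != NULL && p->attached);

	/* Flush synchronously first, while the association still exists. */
	deferred_process(q, p);

	/* Only now report completion by dropping the association. */
	p->attached = false;
}
```

Swapping the two steps in `port_unoffload()` would leave `delivered` at zero in this model, which corresponds to the lost events in the LAG case described above.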
      
      Fixes: 4e51bf44 ("net: bridge: move the switchdev object replay helpers to "push" mode")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Skip MDB replays of deferred events on offload · dc489f86
      Tobias Waldekranz authored
      Before this change, generation of the list of MDB events to replay
      would race against the creation of new group memberships, either from
      the IGMP/MLD snooping logic or from user configuration.
      
      While new memberships are immediately visible to walkers of
      br->mdb_list, the notification of their existence to switchdev event
      subscribers is deferred until a later point in time. So if a replay
      list was generated during a time that overlapped with such a window,
      it would also contain a replay of the not-yet-delivered event.
      
      The driver would thus receive two copies of what the bridge internally
      considered to be one single event. On destruction of the bridge, only
      a single membership deletion event was therefore sent. As a
      consequence of this, drivers which reference count memberships (at
      least DSA), would be left with orphan groups in their hardware
      database when the bridge was destroyed.
      
      This is only an issue when replaying additions. While deletion events
      may still be pending on the deferred queue, they will already have
      been removed from br->mdb_list, so no duplicates can be generated in
      that scenario.
      
      To a user this meant that old group memberships, from a bridge to
      which a port was previously attached, could be reanimated (in
      hardware) when the port joined a new bridge, without the new bridge's
      knowledge.
      
      For example, on an mv88e6xxx system, create a snooping bridge and
      immediately add a port to it:
      
          root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
          > ip link set dev x3 up master br0
      
      And then destroy the bridge:
      
          root@infix-06-0b-00:~$ ip link del dev br0
          root@infix-06-0b-00:~$ mvls atu
          ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
          DEV:0 Marvell 88E6393X
          33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
          root@infix-06-0b-00:~$
      
      The two IPv6 groups remain in the hardware database because the
      port (x3) is notified of the host's membership twice: once via the
      original event and once via a replay. Since only a single delete
      notification is sent, the count remains at 1 when the bridge is
      destroyed.
      
      Then add the same port (or another port belonging to the same hardware
      domain) to a new bridge, this time with snooping disabled:
      
          root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
          > ip link set dev x3 up master br1
      
      All multicast, including the two IPv6 groups from br0, should now be
      flooded, according to the policy of br1. But instead the old
      memberships are still active in the hardware database, causing the
      switch to only forward traffic to those groups towards the CPU (port
      0).
      
      Eliminate the race in two steps:
      
      1. Grab the write-side lock of the MDB while generating the replay
         list.
      
      This prevents new memberships from showing up while we are generating
      the replay list. But it leaves the scenario in which a deferred event
      was already generated, but not delivered, before we grabbed the
      lock. Therefore:
      
      2. Make sure that no deferred version of a replay event is already
         enqueued to the switchdev deferred queue, before adding it to the
         replay list, when replaying additions.
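
Step 2 amounts to a membership test against the deferred queue while the replay list is built. A minimal sketch, assuming array-based lists and a hypothetical `mdb_key` (MAC address plus VLAN ID); the real code also matches on the port and the action, and runs under the MDB lock from step 1, both of which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identity of a group membership. */
struct mdb_key {
	unsigned char addr[6];
	unsigned short vid;
};

static bool key_equal(const struct mdb_key *a, const struct mdb_key *b)
{
	return a->vid == b->vid &&
	       memcmp(a->addr, b->addr, sizeof(a->addr)) == 0;
}

/* Is an addition for this key already waiting on the deferred queue? */
static bool is_deferred(const struct mdb_key *pending, size_t n_pending,
			const struct mdb_key *key)
{
	size_t i;

	for (i = 0; i < n_pending; i++)
		if (key_equal(&pending[i], key))
			return true;
	return false;
}

/*
 * Build the replay list from the MDB, skipping entries whose addition is
 * still pending so each membership is delivered exactly once.
 * Returns the number of entries written to out.
 */
size_t build_replay(const struct mdb_key *mdb, size_t n_mdb,
		    const struct mdb_key *pending, size_t n_pending,
		    struct mdb_key *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(out != NULL || out_cap == 0);

	for (i = 0; i < n_mdb && n < out_cap; i++) {
		if (is_deferred(pending, n_pending, &mdb[i]))
			continue; /* the deferred event will cover this one */
		out[n++] = mdb[i];
	}
	return n;
}
```

An entry that is both in the MDB and on the deferred queue is left to the deferred event alone, so a reference-counting driver sees one addition and one later deletion.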
      
      Fixes: 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/iucv: fix the allocation size of iucv_path_table array · b4ea9b6a
      Alexander Gordeev authored
      iucv_path_table is a dynamically allocated array of pointers to
      struct iucv_path items. Yet, its size is calculated as if it was
      an array of struct iucv_path items.
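
The sizing mistake is easy to reproduce in user-space C. The sketch below is an illustration only: `struct path` is a hypothetical stand-in for struct iucv_path, and calloc() takes the place of the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iucv_path; only its size matters here. */
struct path {
	char payload[64];
};

/*
 * Allocate a zeroed table of max_pathid pointers to struct path.
 * sizeof(*table) is the size of one pointer, which is the element type
 * of the table. Writing sizeof(struct path) here would over-allocate.
 */
struct path **alloc_path_table(size_t max_pathid)
{
	struct path **table;

	assert(max_pathid > 0);

	table = calloc(max_pathid, sizeof(*table));
	return table; /* NULL on allocation failure */
}
```

Using `sizeof(*table)` ties the element size to the variable's type, so the expression stays correct if the table's element type changes later.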
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 15 Feb, 2024 5 commits
    • Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4f5e5092
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from can, wireless and netfilter.
      
        Current release - regressions:
      
         - af_unix: fix task hung while purging oob_skb in GC
      
         - pds_core: do not try to run health-thread in VF path
      
        Current release - new code bugs:
      
         - sched: act_mirred: don't zero blockid when net device is being
           deleted
      
        Previous releases - regressions:
      
         - netfilter:
            - nat: restore default DNAT behavior
            - nf_tables: fix bidirectional offload, broken when unidirectional
              offload support was added
      
         - openvswitch: limit the number of recursions from action sets
      
         - eth: i40e: do not allow untrusted VF to remove administratively set
           MAC address
      
        Previous releases - always broken:
      
         - tls: fix races and bugs in use of async crypto
      
         - mptcp: prevent data races on some of the main socket fields, fix
           races in fastopen handling
      
         - dpll: fix possible deadlock during netlink dump operation
      
         - dsa: lan966x: fix crash when adding interface under a lag when some
           of the ports are disabled
      
         - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
      
        Misc:
      
         - a handful of fixes and reliability improvements for selftests
      
         - fix sysfs documentation missing net/ in paths
      
         - finish the work of squashing the missing MODULE_DESCRIPTION()
           warnings in networking"
      
      * tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
        net: fill in MODULE_DESCRIPTION()s for missing arcnet
        net: fill in MODULE_DESCRIPTION()s for mdio_devres
        net: fill in MODULE_DESCRIPTION()s for ppp
        net: fill in MODULE_DESCRIPTION()s for fddik/skfp
        net: fill in MODULE_DESCRIPTION()s for plip
        net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
        net: fill in MODULE_DESCRIPTION()s for xen-netback
        net: ravb: Count packets instead of descriptors in GbEth RX path
        pppoe: Fix memory leak in pppoe_sendmsg()
        net: sctp: fix skb leak in sctp_inq_free()
        net: bcmasp: Handle RX buffer allocation failure
        net-timestamp: make sk_tskey more predictable in error path
        selftests: tls: increase the wait in poll_partial_rec_async
        ice: Add check for lport extraction to LAG init
        netfilter: nf_tables: fix bidirectional offload regression
        netfilter: nat: restore default DNAT behavior
        netfilter: nft_set_pipapo: fix missing : in kdoc
        igc: Remove temporary workaround
        igb: Fix string truncation warnings in igb_set_fw_version
        can: netlink: Fix TDCO calculation using the old data bittiming
        ...
    • Merge tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · cc9c4f0b
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Fixes and simple cleanups:
      
         - use a proper flexible array instead of a one-element array in order
           to avoid array-bounds sanitizer errors
      
         - add NULL pointer checks after allocating memory
      
         - use memdup_array_user() instead of open-coding it
      
         - fix a rare race condition in Xen event channel allocation code
      
         - make struct bus_type instances const
      
         - make kerneldoc inline comments match reality"
      
      * tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: close evtchn after mapping cleanup
        xen/gntalloc: Replace UAPI 1-element array
        xen: balloon: make balloon_subsys const
        xen: pcpu: make xen_pcpu_subsys const
        xen/privcmd: Use memdup_array_user() in alloc_ioreq()
        x86/xen: Add some null pointer checking to smp.c
        xen/xenbus: document will_handle argument for xenbus_watch_path()
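
The first item in the list above (a flexible array in place of a one-element array) can be sketched in plain C. The struct and function names here are hypothetical and do not reproduce the Xen UAPI definitions; this only shows why the one-element form confuses bounds checking and how the flexible form is allocated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Old idiom: a one-element array standing in for a variable-length tail.
 * Indexing refs[1] and beyond looks out of bounds to array-bounds sanitizers. */
struct refs_old {
	uint32_t count;
	uint32_t refs[1];
};

/* C99 flexible array member: the tail has no declared length,
 * so the compiler does not assume exactly one element. */
struct refs_flex {
	uint32_t count;
	uint32_t refs[];
};

/* Allocate a header plus 'count' trailing elements, zero-initialised. */
struct refs_flex *refs_flex_alloc(uint32_t count)
{
	struct refs_flex *r;

	r = calloc(1, sizeof(*r) + (size_t)count * sizeof(r->refs[0]));
	if (r)
		r->count = count;
	return r;
}
```

Note that `sizeof(struct refs_old)` includes one element while `sizeof(struct refs_flex)` does not, so any size arithmetic based on the old struct has to be revisited when converting.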
    • update workarounds for gcc "asm goto" issue · 68fb3ca0
      Linus Torvalds authored
      In commit 4356e9f8 ("work around gcc bugs with 'asm goto' with
      outputs") I did the gcc workaround unconditionally, because the cause of
      the bad code generation wasn't entirely clear.
      
      In the meantime, Jakub Jelinek debugged the issue, and has come up with
      a fix in gcc [2], which also got backported to the still maintained
      branches of gcc-11, gcc-12 and gcc-13.
      
      Note that while the fix technically wasn't in the original gcc-14
      branch, Jakub says:
      
       "while it is true that no GCC 14 snapshots until today (or whenever the
        fix will be committed) have the fix, for GCC trunk it is up to the
        distros to use the latest snapshot if they use it at all and would
        allow better testing of the kernel code without the workaround, so
        that if there are other issues they won't be discovered years later.
        Most userland code doesn't actually use asm goto with outputs..."
      
      so we will consider gcc-14 to be fixed - if somebody is using gcc
      snapshots of the gcc-14 before the fix, they should upgrade.
      
      Note that while the bug goes back to gcc-11, in practice other gcc
      changes seem to have effectively hidden it since gcc-12.1 as per a
      bisect by Jakub.  So even a gcc-14 snapshot without the fix likely
      doesn't show actual problems.
      
      Also, make the default 'asm_goto_output()' macro mark the asm as
      volatile by hand, because of an unrelated gcc issue [1] where it doesn't
      match the documented behavior ("asm goto is always volatile").
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1]
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2]
      Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
      Requested-by: Jakub Jelinek <jakub@redhat.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Andrew Pinski <quic_apinski@quicinc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 339e2fca
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Improve devlink dependency parsing for DT graphs
      
       - Fix devlink handling of io-channels dependencies
      
       - Fix PCI addressing in marvell,prestera example
      
       - A few schema fixes for property constraints
      
       - Improve performance of DT unprobed devices kselftest
      
       - Fix regression in DT_SCHEMA_FILES handling
      
       - Fix compile error in unittest for !OF_DYNAMIC
      
      * tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
        of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
        of: property: Improve finding the supplier of a remote-endpoint property
        of: property: Improve finding the consumer of a remote-endpoint property
        net: marvell,prestera: Fix example PCI bus addressing
        of: unittest: Fix compile in the non-dynamic case
        of: property: fix typo in io-channels
        dt-bindings: tpm: Drop type from "resets"
        dt-bindings: display: nxp,tda998x: Fix 'audio-ports' constraints
        dt-bindings: xilinx: replace Piyush Mehta maintainership
        kselftest: dt: Stop relying on dirname to improve performance
        dt-bindings: don't anchor DT_SCHEMA_FILES to bindings directory
    • Merge tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · a00cf198
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A smallish collection of fixes for SPI, all driver specific, plus one
        device ID addition for a new Intel part.
      
        The ppc4xx isn't routinely covered by most of the automated testing so
        there were some errors that were missed in some of the recent API
        conversions, otherwise there's nothing super remarkable here"
      
      * tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi-mxs: Fix chipselect glitch
        spi: intel-pci: Add support for Lunar Lake-M SPI serial flash
        spi: omap2-mcspi: Revert FIFO support without DMA
        spi: ppc4xx: Drop write-only variable
        spi: ppc4xx: Fix fallout from rename in struct spi_bitbang
        spi: ppc4xx: Fix fallout from include cleanup
        spi: spi-ppc4xx: include missing platform_device.h
        spi: imx: fix the burst length at DMA mode and CPU mode