1. 06 Apr, 2023 4 commits
  2. 05 Apr, 2023 10 commits
    • can: isotp: fix race between isotp_sendsmg() and isotp_release() · 05173743
      Oliver Hartkopp authored
      As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
      function in isotp.c might get into a race condition when restoring the
      former tx.state from the old_state.
      
      Remove the old_state concept and implement proper locking for the
      ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
      simplification idea from Hillf Danton.
      
      Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
      mechanism in isotp_release(), which resolves a potential race between
      isotp_sendmsg() and isotp_release().
      
      [1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet
      
      v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
      v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_release()
      v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_sendmsg() in the wait_tx_done case
      v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_sendmsg() in ALL cases
      
      Cc: Dae R. Jeong <threeearcat@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Fixes: 4f027cba ("can: isotp: split tx timer into transmission and timeout")
      Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      [mkl: rephrase commit message]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
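The IDLE-only state transition described in this commit can be sketched in user space with C11 atomics. This is not the isotp.c code: the enum, struct and function names are hypothetical, and `atomic_compare_exchange_strong()` stands in for the kernel's cmpxchg(). It only illustrates why one compare-and-swap from IDLE, shared by the send and release paths, removes the need to restore a saved `old_state`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states modelled on ISOTP_IDLE / ISOTP_SENDING /
 * ISOTP_SHUTDOWN; user-space C11, not kernel code. */
enum tx_state { TX_IDLE, TX_SENDING, TX_SHUTDOWN };

struct tx_ctx {
	atomic_int state;
};

void tx_init(struct tx_ctx *tx)
{
	atomic_init(&tx->state, TX_IDLE);
}

/* sendmsg path: exactly one caller can move IDLE -> SENDING.
 * A false return means "busy" (wait and retry, or -EAGAIN). */
bool tx_try_begin(struct tx_ctx *tx)
{
	int expected = TX_IDLE;
	return atomic_compare_exchange_strong(&tx->state, &expected, TX_SENDING);
}

/* Completion path: SENDING -> IDLE. Nothing is restored from a saved
 * old state, so a SHUTDOWN set by another path is never overwritten. */
void tx_finish(struct tx_ctx *tx)
{
	int expected = TX_SENDING;
	atomic_compare_exchange_strong(&tx->state, &expected, TX_IDLE);
}

/* release path: same IDLE-only transition, so it cannot succeed while a
 * transfer is running; a caller would wait and retry until it does. */
bool tx_try_shutdown(struct tx_ctx *tx)
{
	int expected = TX_IDLE;
	return atomic_compare_exchange_strong(&tx->state, &expected, TX_SHUTDOWN);
}
```

In the kernel, a failed transition is paired with wait_event_interruptible() and a retry, which is why the v2 to v4 revisions above deal with signal interrupts on those waits.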
    • can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events · 79e19fa7
      Michal Sojka authored
      When using select()/poll()/epoll() with a non-blocking ISOTP socket to
      wait for when non-blocking write is possible, a false EPOLLOUT event
      is sometimes returned. This can happen at least after sending a
      message which must be split to multiple CAN frames.
      
      The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
      not equal to ISOTP_IDLE and this behavior is not reflected in
      datagram_poll(), which is used in isotp_ops.
      
      This is fixed by introducing an ISOTP-specific poll function, which
      suppresses the EPOLLOUT events in that case.
      
      v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz
      v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.cz
          https://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.net
      
      Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
      Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
      Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
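A minimal sketch of the poll filtering idea, assuming the generic datagram poll mask has already been computed: the writable bit is withheld whenever the transmit state is not idle, matching the condition under which isotp_sendmsg() returns -EAGAIN. The enum and function names are hypothetical; the kernel version works on `__poll_t`/`EPOLLOUT` and may clear further write bits.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical transmit states; only "idle" accepts a new message. */
enum tx_state { TX_IDLE, TX_SENDING };

/* base_mask is what a generic datagram poll would report. While a
 * segmented transfer is still in flight a write would fail with EAGAIN,
 * so POLLOUT is cleared to avoid a false "writable" wakeup. */
unsigned int filter_poll_mask(unsigned int base_mask, enum tx_state state)
{
	if (state != TX_IDLE)
		base_mask &= ~(unsigned int)POLLOUT;
	return base_mask;
}
```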
    • can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos · 0145462f
      Oliver Hartkopp authored
      isotp.c was still using sock_recv_timestamp() which does not provide
      control messages to detect dropped PDUs in the receive path.
      
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
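On the user-space side, the drop counter exposed through sock_recv_cmsgs() arrives as an SO_RXQ_OVFL control message once the option is enabled with `setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one))`. The helper below is a sketch of how a receiver might extract it from a `struct msghdr` filled in by recvmsg(). The function name is hypothetical, and the fallback value 40 is the asm-generic constant, used only if the system headers lack the definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40 /* asm-generic value; assumption for old headers */
#endif

/* Walk the ancillary data of a received message. Returns 1 and stores
 * the kernel's cumulative drop counter if an SO_RXQ_OVFL cmsg is
 * present, 0 otherwise. */
int get_rxq_drops(struct msghdr *msg, uint32_t *drops)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_RXQ_OVFL) {
			/* Payload is a 32-bit counter; copy to avoid misaligned reads. */
			memcpy(drops, CMSG_DATA(cmsg), sizeof(*drops));
			return 1;
		}
	}
	return 0;
}
```

Comparing the counter between successive recvmsg() calls tells the application how many PDUs were dropped in between.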
    • can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access · b45193cb
      Oleksij Rempel authored
      In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
      could occur during the memcpy() operation if the size of skb->cb is
      larger than the size of struct j1939_sk_buff_cb. This is because the
      memcpy() operation uses the size of skb->cb, leading to a read beyond
      the struct j1939_sk_buff_cb.
      
      Update the memcpy() operation to use the size of struct
      j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
      memcpy() operation only reads the memory within the bounds of struct
      j1939_sk_buff_cb, preventing out-of-bounds memory access.
      
      Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
      is greater than or equal to the size of struct j1939_sk_buff_cb. This
      ensures that the skb->cb buffer is large enough to hold the
      j1939_sk_buff_cb structure.
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
      Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
      Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
      Cc: stable@vger.kernel.org
      [mkl: rephrase commit message]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
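The bounds rule from this fix can be illustrated with stand-in types: copy the size of the source structure rather than the size of the destination buffer, and check at compile time that the buffer is large enough. `struct tp_cb` and `struct fake_skb` are hypothetical substitutes for struct j1939_sk_buff_cb and the skb (whose `cb` area is 48 bytes), and `_Static_assert` plays the role of BUILD_BUG_ON().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct j1939_sk_buff_cb: smaller than cb[]. */
struct tp_cb {
	unsigned int offset;
	unsigned char prio;
};

/* Hypothetical stand-in for an skb with its 48-byte control buffer. */
struct fake_skb {
	char cb[48];
};

/* Compile-time guard, analogous to BUILD_BUG_ON(): the control buffer
 * must be able to hold the private structure. */
_Static_assert(sizeof(((struct fake_skb *)0)->cb) >= sizeof(struct tp_cb),
	       "skb cb area too small for tp_cb");

void copy_cb(struct fake_skb *dst, const struct tp_cb *src)
{
	/* Copy sizeof(*src) bytes. Using sizeof(dst->cb) here would read
	 * past the end of *src, which is the bug the commit fixes. */
	memcpy(dst->cb, src, sizeof(*src));
}
```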
    • gve: Secure enough bytes in the first TX desc for all TCP pkts · 3ce93455
      Shailend Chand authored
      Non-GSO TCP packets whose SKBs' linear portion did not include the
      entire TCP header were not populating the first Tx descriptor with
      as many bytes as the vNIC expected. This change ensures that all
      TCP packets populate the first descriptor with the correct number of
      bytes.
      
      Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC")
      Signed-off-by: Shailend Chand <shailend@google.com>
      Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: annotate lockless accesses to nlk->max_recvmsg_len · a1865f2e
      Eric Dumazet authored
      syzbot reported a data-race in netlink_recvmsg() [1]
      
      Indeed, netlink_recvmsg() can be run concurrently,
      and netlink_dump() also needs protection.
      
      [1]
      BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
      
      read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
      netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
      __do_sys_recvfrom net/socket.c:2212 [inline]
      __se_sys_recvfrom net/socket.c:2208 [inline]
      __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
      netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      ____sys_recvmsg+0x156/0x310 net/socket.c:2720
      ___sys_recvmsg net/socket.c:2762 [inline]
      do_recvmmsg+0x2e5/0x710 net/socket.c:2856
      __sys_recvmmsg net/socket.c:2935 [inline]
      __do_sys_recvmmsg net/socket.c:2958 [inline]
      __se_sys_recvmmsg net/socket.c:2951 [inline]
      __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0x0000000000001000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
      
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
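The annotation pattern can be approximated in portable C11, where relaxed atomic loads and stores give roughly what READ_ONCE()/WRITE_ONCE() are used for here: no torn accesses and no formal data race, while a concurrent update may still be lost, which is acceptable for a size hint. The struct, the function and the 32768-byte cap are assumptions for illustration, not the af_netlink.c code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical upper bound for the receive-size hint. */
#define MAX_RECV_CAP 32768u

struct nl_sock_model {
	atomic_size_t max_recvmsg_len;
};

/* Grow the hint to at least len, capped. Relaxed load and store mirror
 * READ_ONCE()/WRITE_ONCE(): concurrent callers may race and one update
 * may win over another, but each access is a single, marked access. */
size_t update_recv_hint(struct nl_sock_model *nlk, size_t len)
{
	size_t cur = atomic_load_explicit(&nlk->max_recvmsg_len, memory_order_relaxed);
	size_t next = len > cur ? len : cur;

	if (next > MAX_RECV_CAP)
		next = MAX_RECV_CAP;
	atomic_store_explicit(&nlk->max_recvmsg_len, next, memory_order_relaxed);
	return next;
}
```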
    • ethtool: reset #lanes when lanes is omitted · e847c767
      Andy Roulin authored
      If the number of lanes was forced and then subsequently the user
      omits this parameter, the ksettings->lanes is reset. The driver
      should then reset the number of lanes to the device's default
      for the specified speed.
      
      However, although ksettings->lanes is set to 0, the mod variable
      is not set to true to indicate that the driver and userspace should be
      notified of the changes.
      
      The consequence is that the same ethtool operation will produce
      different results based on the initial state.
      
      If the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: on
      
      then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      While if the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      This patch fixes this behavior. Omitting lanes will always result in
      the driver choosing the default lane width for the chosen speed. In this
      scenario, regardless of the initial state, the end state will be, e.g.,
      
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      Fixes: 012ce4dd ("ethtool: Extend link modes settings uAPI with lanes")
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
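The corrected logic reduces to "whenever the lane count ends up different from before, report a modification". A minimal sketch, assuming a NULL pointer represents an omitted lanes attribute; the struct and function names are hypothetical and do not match the ethtool netlink code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link settings; lanes == 0 means "driver default". */
struct link_settings {
	unsigned int lanes;
};

/* Apply an optional lanes attribute. Returns true when the settings
 * changed, i.e. the driver and userspace must be notified. */
bool apply_lanes(struct link_settings *ks, const unsigned int *lanes_attr)
{
	bool mod = false;

	if (lanes_attr != NULL) {
		if (ks->lanes != *lanes_attr) {
			ks->lanes = *lanes_attr;
			mod = true;
		}
	} else if (ks->lanes != 0) {
		/* Omitted after being forced: reset to the default AND flag
		 * the change, which is the step the original code missed. */
		ks->lanes = 0;
		mod = true;
	}
	return mod;
}
```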
    • Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp' · 95fac540
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      raw/ping: Fix locking in /proc/net/{raw,icmp}.
      
      The first patch fixes a NULL deref for /proc/net/raw and second one fixes
      the same issue for ping sockets.
      
      The first patch also converts hlist_nulls to hlist. This is because
      the current code uses sk_nulls_for_each() for lockless readers instead
      of sk_nulls_for_each_rcu(), which adds a memory barrier, but raw sockets
      use neither the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.
      
      OTOH, the ping sockets already use sk_nulls_for_each_rcu(), and such a
      conversion can be posted later for net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ping: Fix potentail NULL deref for /proc/net/icmp. · ab5fb73f
      Kuniyuki Iwashima authored
      After commit dbca1596 ("ping: convert to RCU lookups, get rid
      of rwlock"), we use RCU for ping sockets, but we should use spinlock
      for /proc/net/icmp to avoid a potential NULL deref mentioned in
      the previous patch.
      
      Let's go back to using spinlock there.
      
      Note we can convert ping sockets to use hlist instead of hlist_nulls
      because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
      
      Fixes: dbca1596 ("ping: convert to RCU lookups, get rid of rwlock")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • raw: Fix NULL deref in raw_get_next(). · 0a78cf72
      Kuniyuki Iwashima authored
      Dae R. Jeong reported a NULL deref in raw_get_next() [0].
      
      It seems that the repro was running these sequences in parallel so
      that one thread was iterating on a socket that was being freed in
      another netns.
      
        unshare(0x40060200)
        r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
        socket$inet_icmp_raw(0x2, 0x3, 0x1)
        pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
      
      After commit 0daf07e5 ("raw: convert raw sockets to RCU"), we
      use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
      sockets.  However, we should use spinlock for slow paths to avoid
      the NULL deref.
      
      Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
      is not reused during iteration in the grace period.  In fact, the
      lockless readers do not check the nulls marker with get_nulls_value().
      So, SOCK_RAW should use hlist instead of hlist_nulls.
      
      Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
      let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
      fast paths and sk_for_each() and spinlock for /proc/net/raw.
      
      [0]:
      general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
       seq_read+0x224/0x320 fs/seq_file.c:162
       pde_read fs/proc/inode.c:316 [inline]
       proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
       vfs_read+0x31e/0xd30 fs/read_write.c:468
       ksys_pread64 fs/read_write.c:665 [inline]
       __do_sys_pread64 fs/read_write.c:675 [inline]
       __se_sys_pread64 fs/read_write.c:672 [inline]
       __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x478d29
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
      RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
      RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
      R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
      R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 0daf07e5 ("raw: convert raw sockets to RCU")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
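The slow-path half of this fix (walk the socket list for /proc/net/raw only while holding the same lock that writers take to unlink sockets) can be modelled in user space with a mutex. This sketch deliberately leaves out the RCU fast path; the types and function names are hypothetical, and `pthread_mutex_t` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical socket entry tagged with its network namespace. */
struct fake_sock {
	int netns;
	struct fake_sock *next;
};

struct sock_table {
	pthread_mutex_t lock; /* stands in for the hash-table spinlock */
	struct fake_sock *head;
};

/* Slow path (/proc-style dump): hold the lock for the whole walk so a
 * concurrent unlink cannot free the node we are standing on. */
size_t count_in_netns(struct sock_table *t, int netns)
{
	size_t n = 0;

	pthread_mutex_lock(&t->lock);
	for (struct fake_sock *s = t->head; s != NULL; s = s->next)
		if (s->netns == netns)
			n++;
	pthread_mutex_unlock(&t->lock);
	return n;
}

/* Writers take the same lock before unlinking an entry. */
void unlink_sock(struct sock_table *t, struct fake_sock *victim)
{
	pthread_mutex_lock(&t->lock);
	for (struct fake_sock **pp = &t->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
}
```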
  3. 04 Apr, 2023 4 commits
  4. 03 Apr, 2023 5 commits
    • wifi: mt76: ignore key disable commands · e6db67fa
      Felix Fietkau authored
      This helps avoid cleartext leakage of already queued or powersave buffered
      packets, when a reassoc triggers the key deletion.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230330091259.61378-1-nbd@nbd.name
    • wifi: ath11k: reduce the MHI timeout to 20s · cf5fa3ca
      Kalle Valo authored
      Currently ath11k breaks after hibernation, the reason being that ath11k expects
      that the wireless device will have power during suspend and the firmware will
      continue running. But of course during hibernation the power to the device is
      cut off and the firmware is not running when resuming, so ath11k will fail.
      
      (The reason why ath11k needs the firmware running is the interaction between
      mac80211 and MHI stack, it's a long story and more info in the bugzilla report.)
      
      In SUSE kernels the watchdog timeout is reduced from the default 120 to 60 seconds:
      
      CONFIG_DPM_WATCHDOG_TIMEOUT=60
      
      But as the ath11k MHI timeout is 90 seconds, the kernel will crash before
      ath11k can recover in the resume callback. To avoid the crash, reduce the MHI
      timeout to just 20 seconds.
      
      Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=214649
      Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230329162038.8637-1-kvalo@kernel.org
    • ipv6: Fix an uninit variable access bug in __ip6_make_skb() · ea30388b
      Ziyang Xuan authored
      Syzbot reported a bug as follows:
      
      =====================================================
      BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
      BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
      BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
      BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
       arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
       atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
       __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       ip6_finish_skb include/net/ipv6.h:1122 [inline]
       ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987
       rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579
       rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:766 [inline]
       slab_alloc_node mm/slub.c:3452 [inline]
       __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
       __do_kmalloc_node mm/slab_common.c:967 [inline]
       __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
       kmalloc_reserve net/core/skbuff.c:492 [inline]
       __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
       alloc_skb include/linux/skbuff.h:1270 [inline]
       __ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684
       ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854
       rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      This happens because the icmp6hdr is not in the skb linear region for
      a SOCK_RAW socket, so accessing icmp6_hdr(skb)->icmp6_type directly
      triggers the uninit variable access bug.
      
      Use a local variable icmp6_type to carry the correct value in different
      scenarios.
      
      Fixes: 14878f75 ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]")
      Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
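A sketch of taking the ICMPv6 type from a source that is guaranteed to be initialised. It assumes, as a modelling choice, that for SOCK_RAW sockets that do not supply their own headers (no hdrincl) the type is taken from the flow information recorded at send time, and otherwise from the linear header. The struct and field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what the skb-building code can see. */
struct fake_pkt {
	bool raw_no_hdrincl;       /* SOCK_RAW socket without hdrincl */
	uint8_t flow_icmp_type;    /* ICMPv6 type recorded in the flow */
	const uint8_t *linear_hdr; /* ICMPv6 header in the linear area, if any */
};

/* Carry the type in a local value instead of always dereferencing the
 * header: for raw sockets the header bytes may not be in the linear
 * region yet, so reading them there would hit uninitialised memory. */
uint8_t pick_icmp6_type(const struct fake_pkt *p)
{
	if (p->raw_no_hdrincl)
		return p->flow_icmp_type;
	return p->linear_hdr[0]; /* icmp6_type is the first header byte */
}
```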
    • net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT · 839349d1
      Sricharan Ramabadhran authored
      On the remote side, when a QRTR socket is removed, af_qrtr calls
      qrtr_port_remove(), which broadcasts the DEL_CLIENT packet to all
      neighbours, including the local NS. Upon receiving the DEL_CLIENT packet,
      the NS removes the lookups associated with the node:port and broadcasts
      the DEL_SERVER packet.
      
      But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
      will already have deleted the server belonging to that port. So when the
      remote's NS broadcasts the DEL_SERVER for that port again, the host logs
      the following error message:
      
      "failed while handling packet from 2:-2"
      
      So fix this error by not broadcasting the DEL_SERVER packet when the
      DEL_CLIENT packet gets processed.
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
      Signed-off-by: Ram Kumar Dharuman <quic_ramd@quicinc.com>
      Signed-off-by: Sricharan Ramabadhran <quic_srichara@quicinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
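The fix can be modelled as a broadcast flag on the server-removal helper: removal triggered by DEL_CLIENT stays local (the DEL_CLIENT itself already reached every neighbour), while an explicit server removal is still announced. The `ns_model` struct and the helpers below are hypothetical and only count broadcasts instead of sending packets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name-server state: one server slot and a counter of
 * DEL_SERVER broadcasts that would have been sent. */
struct ns_model {
	bool server_present;
	int del_server_broadcasts;
};

void server_del(struct ns_model *ns, bool bcast)
{
	if (!ns->server_present)
		return;
	ns->server_present = false;
	if (bcast)
		ns->del_server_broadcasts++;
}

/* DEL_CLIENT was already broadcast to all neighbours, which clean up on
 * their own, so do not follow it with a DEL_SERVER broadcast. */
void handle_del_client(struct ns_model *ns)
{
	server_del(ns, false);
}

/* An explicit server removal is still announced. */
void handle_del_server(struct ns_model *ns)
{
	server_del(ns, true);
}
```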
    • net: sfp: add quirk enabling 2500Base-x for HG MXPD-483II · ad651d68
      Daniel Golle authored
      The HG MXPD-483II 1310nm SFP module is meant to operate with 2500Base-X,
      however, in their EEPROM they incorrectly specify:
          Transceiver type                          : Ethernet: 1000BASE-LX
          ...
          BR, Nominal                               : 2600MBd
      
      Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
      
      https://forum.banana-pi.org/t/bpi-r3-sfp-module-compatibility/14573/60
      
      Reported-by: chowtom <chowtom@gmail.com>
      Tested-by: chowtom <chowtom@gmail.com>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 02 Apr, 2023 4 commits
    • sctp: check send stream number after wait_for_sndbuf · 2584024b
      Xin Long authored
      This patch fixes a corner case where the asoc out stream count may change
      after wait_for_sndbuf.
      
      When the main thread in the client starts a connection with its out
      stream count set to N while the server's in stream count is set to
      N - 2, another client thread may keep sending messages on stream
      number N - 1 and wait for sndbuf before INIT_ACK is processed.
      
      However, after INIT_ACK is processed, the client's out stream count is
      shrunk to N - 2, the same as the server's in stream count. The crash
      occurs when the thread waiting for sndbuf wakes up and sends the msg on a
      non-existent stream (N - 1). The call trace is as follows:
      
        KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
        Call Trace:
         <TASK>
         sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline]
         sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline]
         sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline]
         sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170
         sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163
         sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868
         sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026
         inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825
         sock_sendmsg_nosec net/socket.c:722 [inline]
         sock_sendmsg+0xde/0x190 net/socket.c:745
      
      The fix is to add an unlikely check for the send stream number after the
      thread wakes up from the wait_for_sndbuf.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
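The pattern is "validate, sleep, validate again". In this sketch the callback stands in for wait_for_sndbuf and, in one case, shrinks the out stream count the way processing INIT_ACK does in the scenario above. The names and the -EINVAL return value are assumptions, not taken from net/sctp/socket.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical association state: number of usable outbound streams. */
struct assoc_model {
	unsigned int outcnt;
};

/* Stand-in for sleeping in wait_for_sndbuf; state may change meanwhile. */
typedef void (*wait_fn)(struct assoc_model *asoc);

void wait_no_change(struct assoc_model *asoc)
{
	(void)asoc;
}

/* Models INIT_ACK arriving while we sleep: N streams shrink to N - 2. */
void wait_init_ack_shrinks(struct assoc_model *asoc)
{
	asoc->outcnt -= 2;
}

int send_on_stream(struct assoc_model *asoc, unsigned int sid, wait_fn wait_cb)
{
	if (sid >= asoc->outcnt)
		return -EINVAL;
	wait_cb(asoc);            /* the association may be renegotiated here */
	if (sid >= asoc->outcnt)  /* re-check after waking up */
		return -EINVAL;
	return 0;
}
```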
    • net: ethernet: mtk_eth_soc: fix remaining throughput regression · e669ce46
      Felix Fietkau authored
      Based on further tests, it seems that the QDMA shaper is not able to
      perform shaping close to the MAC link rate without throughput loss.
      This cannot be compensated by increasing the shaping rate, so it seems
      to be an internal limit.
      
      Fix the remaining throughput regression by detecting that condition and
      limiting shaping to ports with lower link speed.
      
      This patch intentionally ignores link speed gain from TRGMII, because
      even on such links, shaping to 1000 Mbit/s incurs some throughput
      degradation.
      
      Fixes: f63959c7 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
      Tested-By: Frank Wunderlich <frank-w@public-files.de>
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit · 089b91a0
      Gustav Ekelund authored
      The force watchdog event bit is not cleared during SW reset in the
      mv88e6393x switch. This differs from the mv88e6390, which clears the
      force WD event bit as advertised. As a result, a force WD event is
      handled over and over again, because the SW reset following the event
      never clears the force WD event bit.
      
      Explicitly clear the watchdog event register to 0 in irq_action when
      handling an event to prevent the switch from sending continuous interrupts.
      Marvell aren't aware of any other stuck bits apart from the force WD
      bit.
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Gustav Ekelund <gustaek@axis.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
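A toy register model of the interrupt loop: because the reset does not clear the event bit on this switch variant, only the explicit write of 0 in the handler stops the interrupt from re-firing. The register layout, bit value and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "force watchdog event" bit in a watchdog event register. */
#define WD_FORCE_EVENT 0x1u

struct switch_model {
	uint16_t wd_event_reg;
	int handled;
};

/* Models the mv88e6393x behaviour described above: SW reset leaves the
 * watchdog event register untouched. */
void sw_reset(struct switch_model *sw)
{
	(void)sw;
}

/* Returns nonzero if the interrupt source would still be asserted. */
int irq_action(struct switch_model *sw)
{
	if (sw->wd_event_reg != 0) {
		sw->handled++;
		sw_reset(sw);
		sw->wd_event_reg = 0; /* explicit clear added by the fix */
	}
	return sw->wd_event_reg != 0;
}
```

Without the explicit clear, every call to `irq_action()` would see the bit again and handle the same event indefinitely.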
    • net: don't let netpoll invoke NAPI if in xmit context · 275b471e
      Jakub Kicinski authored
      Commit 0db3dc73 ("[NETPOLL]: tx lock deadlock fix") narrowed
      down the region under netif_tx_trylock() inside netpoll_send_skb().
      (At that point in time netif_tx_trylock() would lock all queues of
      the device.) Taking the tx lock was problematic because the driver's
      cleanup method may take the same lock. So the change made us hold
      the xmit lock only around xmit, and expected the driver to take
      care of locking within ->ndo_poll_controller().
      
      Unfortunately this only works if netpoll isn't itself called with
      the xmit lock already held. Netpoll code is careful and uses
      trylock(). The drivers, however, may be using plain lock().
      As a result, printing (for example via netconsole, which transmits
      through netpoll) while holding the xmit lock can cause rare deadlocks.
      
      Luckily we record the xmit lock owners, so we can scan all the queues,
      the same way we scan NAPI owners. If any of the xmit locks is held
      by the local CPU we had better not attempt any polling.
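      
      The owner scan can be sketched in standalone C. This is a simplified
      model, not the netpoll code: struct sim_tx_queue and the sim_* helpers
      are hypothetical, and only the recorded lock-owner CPU is modelled,
      with -1 meaning "not held" in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a tx queue: only the recorded owner of the
 * xmit lock matters here (-1 means the lock is not held). */
struct sim_tx_queue {
    int xmit_lock_owner; /* CPU id of the holder, or -1 */
};

/* Return true if this_cpu holds the xmit lock of any of the n queues. */
static bool sim_local_xmit_lock_held(const struct sim_tx_queue *q, size_t n,
                                     int this_cpu)
{
    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (q[i].xmit_lock_owner == this_cpu)
            return true;
    return false;
}

/* Polling is only attempted when this CPU holds none of the xmit locks. */
static bool sim_may_poll(const struct sim_tx_queue *q, size_t n, int this_cpu)
{
    return !sim_local_xmit_lock_held(q, n, this_cpu);
}
```

      Checking every queue is conservative: it also refuses to poll when the
      lock held belongs to a queue that netpoll would not touch, which is the
      limitation the next paragraph acknowledges.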
      
      It would be nice if we could narrow down the check to only the NAPIs
      and the queue we're trying to use. I don't see a way to do that now.
      
      Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
      Fixes: 0db3dc73 ("[NETPOLL]: tx lock deadlock fix")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 01 Apr, 2023 2 commits
    • icmp: guard against too small mtu · 7d63b671
      Eric Dumazet authored
      syzbot was able to trigger a panic [1] in icmp_glue_bits(), or
      more exactly in skb_copy_and_csum_bits().
      
      There is no repro yet, but I think the issue is that syzbot
      manages to lower the device mtu to a small value, fooling __icmp_send().
      
      __icmp_send() must make sure there is enough room for the
      packet to include at least the headers.
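      
      The guard can be sketched as a room calculation. This is an
      illustrative sketch, not the code from __icmp_send(): the helper
      sim_icmp_quote_room() is hypothetical, and the 576-byte cap on the
      size of an IPv4 ICMP error datagram and the rule that the quoted data
      must hold at least one IPv4 header are assumptions made for this
      example.

```c
#include <assert.h>

#define IPV4_HDR_LEN 20  /* minimal IPv4 header, no options */
#define ICMP_HDR_LEN 8   /* ICMP header */
#define ICMP_ERR_MAX 576 /* assumed cap on an ICMP error datagram */

/* Return how many bytes of the offending packet fit into the ICMP error,
 * or -1 if the mtu is too small for a useful message (the quoted data
 * must hold at least one IPv4 header). */
static int sim_icmp_quote_room(int mtu, int ip_opt_len)
{
    assert(mtu >= 0 && ip_opt_len >= 0);

    int room = mtu < ICMP_ERR_MAX ? mtu : ICMP_ERR_MAX;

    room -= IPV4_HDR_LEN + ip_opt_len; /* outer IP header of the reply */
    room -= ICMP_HDR_LEN;              /* ICMP header */
    if (room <= IPV4_HDR_LEN)
        return -1;                     /* too small: do not send */
    return room;
}
```

      On a 1500-byte mtu the cap applies and 548 bytes can be quoted; once
      the mtu is lowered so far that not even one IPv4 header fits, the
      sketch refuses to build the message instead of copying past the end.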
      
      We might in the future refactor skb_copy_and_csum_bits() and its
      callers to no longer crash when something bad happens.
      
      [1]
      kernel BUG at net/core/skbuff.c:3343 !
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 15766 Comm: syz-executor.0 Not tainted 6.3.0-rc4-syzkaller-00039-gffe78bbd #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:skb_copy_and_csum_bits+0x798/0x860 net/core/skbuff.c:3343
      Code: f0 c1 c8 08 41 89 c6 e9 73 ff ff ff e8 61 48 d4 f9 e9 41 fd ff ff 48 8b 7c 24 48 e8 52 48 d4 f9 e9 c3 fc ff ff e8 c8 27 84 f9 <0f> 0b 48 89 44 24 28 e8 3c 48 d4 f9 48 8b 44 24 28 e9 9d fb ff ff
      RSP: 0018:ffffc90000007620 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000000001e8 RCX: 0000000000000100
      RDX: ffff8880276f6280 RSI: ffffffff87fdd138 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
      R10: 00000000000001e8 R11: 0000000000000001 R12: 000000000000003c
      R13: 0000000000000000 R14: ffff888028244868 R15: 0000000000000b0e
      FS: 00007fbc81f1c700(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2df43000 CR3: 00000000744db000 CR4: 0000000000150ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <IRQ>
      icmp_glue_bits+0x7b/0x210 net/ipv4/icmp.c:353
      __ip_append_data+0x1d1b/0x39f0 net/ipv4/ip_output.c:1161
      ip_append_data net/ipv4/ip_output.c:1343 [inline]
      ip_append_data+0x115/0x1a0 net/ipv4/ip_output.c:1322
      icmp_push_reply+0xa8/0x440 net/ipv4/icmp.c:370
      __icmp_send+0xb80/0x1430 net/ipv4/icmp.c:765
      ipv4_send_dest_unreach net/ipv4/route.c:1239 [inline]
      ipv4_link_failure+0x5a9/0x9e0 net/ipv4/route.c:1246
      dst_link_failure include/net/dst.h:423 [inline]
      arp_error_report+0xcb/0x1c0 net/ipv4/arp.c:296
      neigh_invalidate+0x20d/0x560 net/core/neighbour.c:1079
      neigh_timer_handler+0xc77/0xff0 net/core/neighbour.c:1166
      call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
      expire_timers+0x29b/0x4b0 kernel/time/timer.c:1751
      __run_timers kernel/time/timer.c:2022 [inline]
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+d373d60fddbdc915e666@syzkaller.appspotmail.com
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230330174502.1915328-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: netcp: MAX_SKB_FRAGS is now 'int'" · adef41b0
      Jakub Kicinski authored
      This reverts commit c5b959ee.
      
      The reverted change is only needed after commit 3948b059 ("net:
      introduce a config option to tweak MAX_SKB_FRAGS"), which does not
      exist in this tree yet; at the time of writing it is only present
      in -next trees.
      
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/all/20230331214444.GA1426512@dev-arch.thelio-3990X/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 31 Mar, 2023 11 commits