1. 24 Jun, 2023 6 commits
    • gtp: Fix use-after-free in __gtp_encap_destroy(). · ce3aee71
      Kuniyuki Iwashima authored
      syzkaller reported use-after-free in __gtp_encap_destroy(). [0]
      
      It shows the same process freed sk and touched it illegally.
      
      Commit e198987e ("gtp: fix suspicious RCU usage") added lock_sock()
      and release_sock() in __gtp_encap_destroy() to protect sk->sk_user_data,
      but release_sock() is called after sock_put() releases the last refcnt.
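
The ordering problem can be sketched in userspace C. This is only a minimal model, not the kernel code: `struct fake_sock`, `fake_lock()`, `fake_unlock()` and `fake_put()` are hypothetical stand-ins for `struct sock`, `lock_sock()`, `release_sock()` and `sock_put()`, and "freeing" just sets a flag so the illegal access can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock. */
struct fake_sock {
	int refcnt;
	bool locked;
	bool freed;              /* set when the last reference is dropped */
	bool touched_after_free; /* set if the lock is used after "free" */
};

static void fake_lock(struct fake_sock *sk)
{
	if (sk->freed)
		sk->touched_after_free = true;
	sk->locked = true;
}

static void fake_unlock(struct fake_sock *sk)
{
	/* Unlocking writes to the object, so it must still be alive. */
	if (sk->freed)
		sk->touched_after_free = true;
	sk->locked = false;
}

static void fake_put(struct fake_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->freed = true; /* real code would release the memory here */
}

/* Order described above: the last reference is dropped before unlocking. */
bool destroy_buggy(struct fake_sock *sk)
{
	fake_lock(sk);
	fake_put(sk);
	fake_unlock(sk);
	return sk->touched_after_free;
}

/* Safe order: unlock while a reference is still held, then drop it. */
bool destroy_fixed(struct fake_sock *sk)
{
	fake_lock(sk);
	fake_unlock(sk);
	fake_put(sk);
	return sk->touched_after_free;
}
```

In this model `destroy_buggy()` reports an access after the object was released, while `destroy_fixed()` does not.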
      
      [0]:
      BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
      BUG: KASAN: slab-use-after-free in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:541 [inline]
      BUG: KASAN: slab-use-after-free in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline]
      BUG: KASAN: slab-use-after-free in do_raw_spin_lock include/linux/spinlock.h:186 [inline]
      BUG: KASAN: slab-use-after-free in __raw_spin_lock_bh include/linux/spinlock_api_smp.h:127 [inline]
      BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x75/0xe0 kernel/locking/spinlock.c:178
      Write of size 4 at addr ffff88800dbef398 by task syz-executor.2/2401
      
      CPU: 1 PID: 2401 Comm: syz-executor.2 Not tainted 6.4.0-rc5-01219-gfa0e21fa #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:351 [inline]
       print_report+0xcc/0x620 mm/kasan/report.c:462
       kasan_report+0xb2/0xe0 mm/kasan/report.c:572
       check_region_inline mm/kasan/generic.c:181 [inline]
       kasan_check_range+0x39/0x1c0 mm/kasan/generic.c:187
       instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
       atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:541 [inline]
       queued_spin_lock include/asm-generic/qspinlock.h:111 [inline]
       do_raw_spin_lock include/linux/spinlock.h:186 [inline]
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:127 [inline]
       _raw_spin_lock_bh+0x75/0xe0 kernel/locking/spinlock.c:178
       spin_lock_bh include/linux/spinlock.h:355 [inline]
       release_sock+0x1f/0x1a0 net/core/sock.c:3526
       gtp_encap_disable_sock drivers/net/gtp.c:651 [inline]
       gtp_encap_disable+0xb9/0x220 drivers/net/gtp.c:664
       gtp_dev_uninit+0x19/0x50 drivers/net/gtp.c:728
       unregister_netdevice_many_notify+0x97e/0x1520 net/core/dev.c:10841
       rtnl_delete_link net/core/rtnetlink.c:3216 [inline]
       rtnl_dellink+0x3c0/0xb30 net/core/rtnetlink.c:3268
       rtnetlink_rcv_msg+0x450/0xb10 net/core/rtnetlink.c:6423
       netlink_rcv_skb+0x15d/0x450 net/netlink/af_netlink.c:2548
       netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
       netlink_unicast+0x700/0x930 net/netlink/af_netlink.c:1365
       netlink_sendmsg+0x91c/0xe30 net/netlink/af_netlink.c:1913
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x1b7/0x200 net/socket.c:747
       ____sys_sendmsg+0x75a/0x990 net/socket.c:2493
       ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2547
       __sys_sendmsg+0xfe/0x1d0 net/socket.c:2576
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7f1168b1fe5d
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
      RSP: 002b:00007f1167edccc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f1168b1fe5d
      RDX: 0000000000000000 RSI: 00000000200002c0 RDI: 0000000000000003
      RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f1168b80530 R15: 0000000000000000
       </TASK>
      
      Allocated by task 1483:
       kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       __kasan_slab_alloc+0x59/0x70 mm/kasan/common.c:328
       kasan_slab_alloc include/linux/kasan.h:186 [inline]
       slab_post_alloc_hook mm/slab.h:711 [inline]
       slab_alloc_node mm/slub.c:3451 [inline]
       slab_alloc mm/slub.c:3459 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3466 [inline]
       kmem_cache_alloc+0x16d/0x340 mm/slub.c:3475
       sk_prot_alloc+0x5f/0x280 net/core/sock.c:2073
       sk_alloc+0x34/0x6c0 net/core/sock.c:2132
       inet6_create net/ipv6/af_inet6.c:192 [inline]
       inet6_create+0x2c7/0xf20 net/ipv6/af_inet6.c:119
       __sock_create+0x2a1/0x530 net/socket.c:1535
       sock_create net/socket.c:1586 [inline]
       __sys_socket_create net/socket.c:1623 [inline]
       __sys_socket_create net/socket.c:1608 [inline]
       __sys_socket+0x137/0x250 net/socket.c:1651
       __do_sys_socket net/socket.c:1664 [inline]
       __se_sys_socket net/socket.c:1662 [inline]
       __x64_sys_socket+0x72/0xb0 net/socket.c:1662
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Freed by task 2401:
       kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
       ____kasan_slab_free mm/kasan/common.c:236 [inline]
       ____kasan_slab_free mm/kasan/common.c:200 [inline]
       __kasan_slab_free+0x10c/0x1b0 mm/kasan/common.c:244
       kasan_slab_free include/linux/kasan.h:162 [inline]
       slab_free_hook mm/slub.c:1781 [inline]
       slab_free_freelist_hook mm/slub.c:1807 [inline]
       slab_free mm/slub.c:3786 [inline]
       kmem_cache_free+0xb4/0x490 mm/slub.c:3808
       sk_prot_free net/core/sock.c:2113 [inline]
       __sk_destruct+0x500/0x720 net/core/sock.c:2207
       sk_destruct+0xc1/0xe0 net/core/sock.c:2222
       __sk_free+0xed/0x3d0 net/core/sock.c:2233
       sk_free+0x7c/0xa0 net/core/sock.c:2244
       sock_put include/net/sock.h:1981 [inline]
       __gtp_encap_destroy+0x165/0x1b0 drivers/net/gtp.c:634
       gtp_encap_disable_sock drivers/net/gtp.c:651 [inline]
       gtp_encap_disable+0xb9/0x220 drivers/net/gtp.c:664
       gtp_dev_uninit+0x19/0x50 drivers/net/gtp.c:728
       unregister_netdevice_many_notify+0x97e/0x1520 net/core/dev.c:10841
       rtnl_delete_link net/core/rtnetlink.c:3216 [inline]
       rtnl_dellink+0x3c0/0xb30 net/core/rtnetlink.c:3268
       rtnetlink_rcv_msg+0x450/0xb10 net/core/rtnetlink.c:6423
       netlink_rcv_skb+0x15d/0x450 net/netlink/af_netlink.c:2548
       netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
       netlink_unicast+0x700/0x930 net/netlink/af_netlink.c:1365
       netlink_sendmsg+0x91c/0xe30 net/netlink/af_netlink.c:1913
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x1b7/0x200 net/socket.c:747
       ____sys_sendmsg+0x75a/0x990 net/socket.c:2493
       ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2547
       __sys_sendmsg+0xfe/0x1d0 net/socket.c:2576
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The buggy address belongs to the object at ffff88800dbef300
       which belongs to the cache UDPv6 of size 1344
      The buggy address is located 152 bytes inside of
       freed 1344-byte region [ffff88800dbef300, ffff88800dbef840)
      
      The buggy address belongs to the physical page:
      page:00000000d31bfed5 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800dbeed40 pfn:0xdbe8
      head:00000000d31bfed5 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      memcg:ffff888008ee0801
      flags: 0x100000000010200(slab|head|node=0|zone=1)
      page_type: 0xffffffff()
      raw: 0100000000010200 ffff88800c7a3000 dead000000000122 0000000000000000
      raw: ffff88800dbeed40 0000000080160015 00000001ffffffff ffff888008ee0801
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88800dbef280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88800dbef300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88800dbef380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                  ^
       ffff88800dbef400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88800dbef480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: e198987e ("gtp: fix suspicious RCU usage")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20230622213231.24651-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: rtnetlink: remove netdevsim device after ipsec offload test · 5f789f10
      Sabrina Dubroca authored
      On systems where netdevsim is built-in or loaded before the test
      starts, kci_test_ipsec_offload doesn't remove the netdevsim device it
      created during the test.
      
      Fixes: e05b2d14 ("netdevsim: move netdev creation/destruction to dev probe")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/e1cb94f4f82f4eca4a444feec4488a1323396357.1687466906.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sch_netem: fix issues in netem_change() vs get_dist_table() · 11b73313
      Eric Dumazet authored
      In blamed commit, I missed that get_dist_table() was allocating
      memory using GFP_KERNEL, and acquiring qdisc lock to perform
      the swap of newly allocated table with current one.
      
      In this patch, get_dist_table() allocates memory and
      copies user data before we acquire the qdisc lock.
      
      Then we perform swap operations while being protected by the lock.
      
      Note that after this patch netem_change() can no longer make partial changes.
      If an error is returned, qdisc conf is left unchanged.
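
The pattern described above (allocate and copy first, swap under the lock, leave the configuration untouched on error) can be sketched as follows. This is a userspace illustration under stated assumptions, not the sch_netem code: `struct fake_qdisc`, `struct fake_dist_table`, `fake_dist_table_copy()` and `fake_change()` are hypothetical names, and the lock is modelled by a plain flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical distribution table with a flexible array of samples. */
struct fake_dist_table {
	size_t n;
	short data[];
};

/* Hypothetical qdisc state; 'locked' models the qdisc lock. */
struct fake_qdisc {
	int locked;
	struct fake_dist_table *delay_dist;
	int latency;
};

/* May fail (and, in a kernel, sleep), so it runs before the lock is taken. */
static struct fake_dist_table *fake_dist_table_copy(const short *src, size_t n)
{
	struct fake_dist_table *t;

	if (n == 0)
		return NULL;
	t = malloc(sizeof(*t) + n * sizeof(t->data[0]));
	if (!t)
		return NULL;
	t->n = n;
	memcpy(t->data, src, n * sizeof(t->data[0]));
	return t;
}

int fake_change(struct fake_qdisc *q, const short *src, size_t n, int latency)
{
	struct fake_dist_table *new_tbl, *old;

	new_tbl = fake_dist_table_copy(src, n);
	if (!new_tbl)
		return -1; /* nothing in *q has been modified yet */

	assert(!q->locked);
	q->locked = 1;            /* lock: only cheap pointer/field swaps */
	old = q->delay_dist;
	q->delay_dist = new_tbl;
	q->latency = latency;
	q->locked = 0;            /* unlock */

	free(old);                /* release the old table outside the lock */
	return 0;
}
```

Because every step that can fail runs before the lock is taken, a failed call leaves both `latency` and `delay_dist` exactly as they were.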
      
      Fixes: 2174a08d ("sch_netem: acquire qdisc lock in netem_change()")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · eb441289
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      igc: TX timestamping fixes
      
      This is the fixes part of the series intended to add support for using
      the 4 timestamp registers present in i225/i226.
      
      Moving the timestamp handling to be inline with the interrupt handling
      has the advantage of improving the TX timestamping retrieval latency,
      here are some numbers using ntpperf:
      
      Before:
      
      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
                     |          responses            |     TX timestamp offset (ns)
      rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
      1000       100   0.00%   0.00%   0.00% 100.00%      -56      +9     +52     19
      1500       150   0.00%   0.00%   0.00% 100.00%      -40     +30     +75     22
      2250       225   0.00%   0.00%   0.00% 100.00%      -11     +29     +72     15
      3375       337   0.00%   0.00%   0.00% 100.00%      -18     +40     +88     22
      5062       506   0.00%   0.00%   0.00% 100.00%      -19     +23     +77     15
      7593       759   0.00%   0.00%   0.00% 100.00%       +7     +47   +5168     43
      11389     1138   0.00%   0.00%   0.00% 100.00%      -11     +41   +5240     39
      17083     1708   0.00%   0.00%   0.00% 100.00%      +19     +60   +5288     50
      25624     2562   0.00%   0.00%   0.00% 100.00%       +1     +56   +5368     58
      38436     3843   0.00%   0.00%   0.00% 100.00%      -84     +12   +8847     66
      57654     5765   0.00%   0.00% 100.00%   0.00%
      86481     8648   0.00%   0.00% 100.00%   0.00%
      129721   12972   0.00%   0.00% 100.00%   0.00%
      194581   16384   0.00%   0.00% 100.00%   0.00%
      291871   16384  27.35%   0.00%  72.65%   0.00%
      437806   16384  50.05%   0.00%  49.95%   0.00%
      
      After:
      
      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
                     |          responses            |     TX timestamp offset (ns)
      rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
      1000       100   0.00%   0.00%   0.00% 100.00%      -44      +0     +61     19
      1500       150   0.00%   0.00%   0.00% 100.00%       -6     +39     +81     16
      2250       225   0.00%   0.00%   0.00% 100.00%      -22     +25     +69     15
      3375       337   0.00%   0.00%   0.00% 100.00%      -28     +15     +56     14
      5062       506   0.00%   0.00%   0.00% 100.00%       +7     +78    +143     27
      7593       759   0.00%   0.00%   0.00% 100.00%      -54     +24    +144     47
      11389     1138   0.00%   0.00%   0.00% 100.00%      -90     -33     +28     21
      17083     1708   0.00%   0.00%   0.00% 100.00%      -50      -2     +35     14
      25624     2562   0.00%   0.00%   0.00% 100.00%      -62      +7     +66     23
      38436     3843   0.00%   0.00%   0.00% 100.00%      -33     +30   +5395     36
      57654     5765   0.00%   0.00% 100.00%   0.00%
      86481     8648   0.00%   0.00% 100.00%   0.00%
      129721   12972   0.00%   0.00% 100.00%   0.00%
      194581   16384  19.50%   0.00%  80.50%   0.00%
      291871   16384  35.81%   0.00%  64.19%   0.00%
      437806   16384  55.40%   0.00%  44.60%   0.00%
      
      During this series, and to show that, as is always the case, things are
      never as easy as they should be, a hardware issue was found, and it took
      some time to find the workaround(s). The bug and workaround are better
      explained in patch 4/4.
      
      Note: the workaround has a simpler alternative, but it would involve
      adding support for the other timestamp registers, and only using the
      TXSTMP{H/L}_0 as a way to clear the interrupt. But I feel bad about
      throwing this kind of resources away. Didn't test this extensively but
      it should work.
      
      Also, as Marc Kleine-Budde suggested, after some consensus is reached
      on this series, most parts of it will be proposed for igb.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        igc: Work around HW bug causing missing timestamps
        igc: Retrieve TX timestamp during interrupt handling
        igc: Check if hardware TX timestamping is enabled earlier
        igc: Fix race condition in PTP tx code
      ====================
      
      Link: https://lore.kernel.org/r/20230622165244.2202786-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: do not assume skb mac_header is set · 6a940abd
      Eric Dumazet authored
      Drivers must not assume in their ndo_start_xmit() that
      skbs have their mac_header set. skb->data is all that is needed.
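
A minimal sketch of the idea, assuming a simplified packet structure: `struct fake_skb`, `fake_srcmac_hash()` and the `FAKE_ETH_*` constants are hypothetical and only illustrate reading the Ethernet source address relative to the data pointer instead of a possibly unset `mac_header` offset. The hash itself is arbitrary and is not the one used by bonding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_ETH_ALEN 6   /* bytes per MAC address */
#define FAKE_ETH_HLEN 14  /* destination + source + ethertype */

/* Hypothetical, heavily simplified packet descriptor. */
struct fake_skb {
	const uint8_t *data; /* start of the frame as handed to the driver */
	size_t len;          /* bytes available at data */
	uint16_t mac_header; /* may be unset; deliberately never read below */
};

/* Hash the source MAC using only data and len. */
uint32_t fake_srcmac_hash(const struct fake_skb *skb)
{
	const uint8_t *src;
	uint32_t h = 0;
	size_t i;

	assert(skb->data != NULL);
	if (skb->len < FAKE_ETH_HLEN)
		return 0; /* too short to contain an Ethernet header */

	src = skb->data + FAKE_ETH_ALEN; /* source follows destination */
	for (i = 0; i < FAKE_ETH_ALEN; i++)
		h = h * 31 + src[i];
	return h;
}
```

The result does not depend on `mac_header` at all, so an unset offset cannot trigger a warning or a wrong read in this model.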
      
      bonding seems to be one of the last offenders, as caught by syzbot:
      
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 skb_mac_offset include/linux/skbuff.h:2913 [inline]
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline]
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline]
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline]
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 __bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline]
      WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470
      Modules linked in:
      CPU: 1 PID: 12155 Comm: syz-executor.3 Not tainted 6.1.30-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
      RIP: 0010:skb_mac_header include/linux/skbuff.h:2907 [inline]
      RIP: 0010:skb_mac_offset include/linux/skbuff.h:2913 [inline]
      RIP: 0010:bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline]
      RIP: 0010:bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline]
      RIP: 0010:bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline]
      RIP: 0010:__bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline]
      RIP: 0010:bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470
      Code: 8b 7c 24 30 e8 76 dd 1a 01 48 85 c0 74 0d 48 89 c3 e8 29 67 2e fe e9 15 ef ff ff e8 1f 67 2e fe e9 10 ef ff ff e8 15 67 2e fe <0f> 0b e9 45 f8 ff ff e8 09 67 2e fe e9 dc fa ff ff e8 ff 66 2e fe
      RSP: 0018:ffffc90002fff6e0 EFLAGS: 00010283
      RAX: ffffffff835874db RBX: 000000000000ffff RCX: 0000000000040000
      RDX: ffffc90004dcf000 RSI: 00000000000000b5 RDI: 00000000000000b6
      RBP: ffffc90002fff8b8 R08: ffffffff83586d16 R09: ffffffff83586584
      R10: 0000000000000007 R11: ffff8881599fc780 R12: ffff88811b6a7b7e
      R13: 1ffff110236d4f6f R14: ffff88811b6a7ac0 R15: 1ffff110236d4f76
      FS: 00007f2e9eb47700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2e421000 CR3: 000000010e6d4000 CR4: 00000000003526e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      [<ffffffff8471a49f>] netdev_start_xmit include/linux/netdevice.h:4925 [inline]
      [<ffffffff8471a49f>] __dev_direct_xmit+0x4ef/0x850 net/core/dev.c:4380
      [<ffffffff851d845b>] dev_direct_xmit include/linux/netdevice.h:3043 [inline]
      [<ffffffff851d845b>] packet_direct_xmit+0x18b/0x300 net/packet/af_packet.c:284
      [<ffffffff851c7472>] packet_snd net/packet/af_packet.c:3112 [inline]
      [<ffffffff851c7472>] packet_sendmsg+0x4a22/0x64d0 net/packet/af_packet.c:3143
      [<ffffffff8467a4b2>] sock_sendmsg_nosec net/socket.c:716 [inline]
      [<ffffffff8467a4b2>] sock_sendmsg net/socket.c:736 [inline]
      [<ffffffff8467a4b2>] __sys_sendto+0x472/0x5f0 net/socket.c:2139
      [<ffffffff8467a715>] __do_sys_sendto net/socket.c:2151 [inline]
      [<ffffffff8467a715>] __se_sys_sendto net/socket.c:2147 [inline]
      [<ffffffff8467a715>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147
      [<ffffffff8553071f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<ffffffff8553071f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
      [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 7b8fc010 ("bonding: add a vlan+srcmac tx hashing option")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jarod Wilson <jarod@redhat.com>
      Cc: Moshe Tal <moshet@nvidia.com>
      Cc: Jussi Maki <joamaki@gmail.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230622152304.2137482-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bcmgenet: Ensure MDIO unregistration has clocks enabled · 1b5ea7ff
      Florian Fainelli authored
      With support for Ethernet PHY LEDs having been added, while
      unregistering an MDIO bus and its child devices like PHYs there may be
      "late" accesses to the MDIO bus. One typical use case is setting the PHY
      LEDs' brightness to OFF, for instance.
      
      We need to ensure that the MDIO bus controller remains entirely
      functional since it runs off the main GENET adapter clock.
      
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/20230617155500.4005881-1-andrew@lunn.ch/
      Fixes: 9a4e7969 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230622103107.1760280-1-florian.fainelli@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 23 Jun, 2023 15 commits
    • Merge tag 'linux-can-fixes-for-6.4-20230622' of... · 6f68fc39
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.4-20230622' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2023-06-22
      
      Oliver Hartkopp's patch fixes the return value in the error path of
      isotp_sendmsg() in the CAN ISOTP protocol.
      
      * tag 'linux-can-fixes-for-6.4-20230622' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: isotp: isotp_sendmsg(): fix return error fix on TX path
      ====================
      
      Link: https://lore.kernel.org/r/20230622090122.574506-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: dp83td510: fix kernel stall during netboot in DP83TD510E PHY driver · fc064939
      Oleksij Rempel authored
      Fix an issue where the kernel would stall during netboot, showing the
      "sched: RT throttling activated" message. This stall was triggered by
      the behavior of the mii_interrupt bit (Bit 7 - DP83TD510E_STS_MII_INT)
      in the DP83TD510E's PHY_STS Register (Address = 0x10). The DP83TD510E
      datasheet (2020) states that the bit clears on write; however, in
      practice, the bit clears on read.
      
      This discrepancy had significant implications on the driver's interrupt
      handling. The PHY_STS Register was used by handle_interrupt() to check
      for pending interrupts and by read_status() to get the current link
      status. The call to read_status() was unintentionally clearing the
      mii_interrupt status bit without deasserting the IRQ pin, causing
      handle_interrupt() to miss other pending interrupts. This issue was most
      apparent during netboot.
      
      The fix refrains from using the PHY_STS Register for interrupt handling.
      Instead, we now solely rely on the INTERRUPT_REG_1 Register (Address =
      0x12) and INTERRUPT_REG_2 Register (Address = 0x13) for this purpose.
      These registers directly influence the IRQ pin state and are latched
      high until read.
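
The effect of a clear-on-read status bit can be illustrated with a small register model. This is a sketch under stated assumptions, not the driver: `struct fake_phy` and all `fake_*` helpers are hypothetical, and only the bit position (bit 7 of PHY_STS) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_STS_MII_INT ((uint16_t)(1u << 7)) /* bit 7 of PHY_STS */

/* Hypothetical register file of the PHY. */
struct fake_phy {
	uint16_t phy_sts;  /* MII_INT bit clears on read (observed behaviour) */
	uint16_t int_reg1; /* interrupt bits, latched high until read */
};

/* Any read of PHY_STS, including a link-status read, clears MII_INT. */
uint16_t fake_read_phy_sts(struct fake_phy *p)
{
	uint16_t v = p->phy_sts;

	p->phy_sts = (uint16_t)(p->phy_sts & ~FAKE_STS_MII_INT);
	return v;
}

/* Reading the latched interrupt register returns and clears its bits. */
uint16_t fake_read_int_reg1(struct fake_phy *p)
{
	uint16_t v = p->int_reg1;

	p->int_reg1 = 0;
	return v;
}

/* Old approach: decide from PHY_STS whether an interrupt is pending. */
bool fake_irq_pending_via_sts(struct fake_phy *p)
{
	return (fake_read_phy_sts(p) & FAKE_STS_MII_INT) != 0;
}

/* New approach: rely only on the latched interrupt register. */
bool fake_irq_pending_via_int_reg(struct fake_phy *p)
{
	return fake_read_int_reg1(p) != 0;
}
```

If a link-status read of PHY_STS happens before the interrupt handler runs, the PHY_STS-based check no longer sees the pending bit, whereas the latched register still reports it.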
      
      Note: The INTERRUPT_REG_2 Register (Address = 0x13) exists and can also
      be used for interrupt handling, specifically for "Aneg page received
      interrupt" and "Polarity change interrupt". However, these features are
      currently not supported by this driver.
      
      Fixes: 165cd04f ("net: phy: dp83td510: Add support for the DP83TD510 Ethernet PHY")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230621043848.3806124-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: do not hard code device address lenth in fdb dumps · aa540695
      Eric Dumazet authored
      syzbot reports that some netdev devices do not have a six-byte
      address [1]
      
      Replace ETH_ALEN by dev->addr_len.
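
Why the fixed length leaks data can be shown with a small model. This is a sketch only: `struct fake_dev`, `fake_fill_addr_fixed()` and `fake_fill_addr_devlen()` are hypothetical, and the byte 0xAA merely stands in for memory that was never initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_ETH_ALEN 6
#define FAKE_MAX_ADDR_LEN 32

/* Hypothetical device: only the first addr_len bytes of addr are valid. */
struct fake_dev {
	unsigned char addr[FAKE_MAX_ADDR_LEN];
	unsigned char addr_len;
};

/* Hard-coded length: copies 6 bytes even if the address is shorter. */
size_t fake_fill_addr_fixed(const struct fake_dev *dev, unsigned char *out)
{
	memcpy(out, dev->addr, FAKE_ETH_ALEN);
	return FAKE_ETH_ALEN;
}

/* Device length: copies exactly the bytes that belong to the address. */
size_t fake_fill_addr_devlen(const struct fake_dev *dev, unsigned char *out,
			     size_t out_size)
{
	if (dev->addr_len > out_size)
		return 0; /* refuse rather than truncate */
	memcpy(out, dev->addr, dev->addr_len);
	return dev->addr_len;
}
```

With a four-byte address, the fixed-length variant copies two bytes past the valid address into the output, which corresponds to the two uninitialized bytes in the report below.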
      
      [1] (Case of a device where dev->addr_len = 4)
      
      BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
      BUG: KMSAN: kernel-infoleak in copyout+0xb8/0x100 lib/iov_iter.c:169
      instrument_copy_to_user include/linux/instrumented.h:114 [inline]
      copyout+0xb8/0x100 lib/iov_iter.c:169
      _copy_to_iter+0x6d8/0x1d00 lib/iov_iter.c:536
      copy_to_iter include/linux/uio.h:206 [inline]
      simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:513
      __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419
      skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:527
      skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
      netlink_recvmsg+0x4ae/0x15a0 net/netlink/af_netlink.c:1970
      sock_recvmsg_nosec net/socket.c:1019 [inline]
      sock_recvmsg net/socket.c:1040 [inline]
      ____sys_recvmsg+0x283/0x7f0 net/socket.c:2722
      ___sys_recvmsg+0x223/0x840 net/socket.c:2764
      do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858
      __sys_recvmmsg net/socket.c:2937 [inline]
      __do_sys_recvmmsg net/socket.c:2960 [inline]
      __se_sys_recvmmsg net/socket.c:2953 [inline]
      __x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was stored to memory at:
      __nla_put lib/nlattr.c:1009 [inline]
      nla_put+0x1c6/0x230 lib/nlattr.c:1067
      nlmsg_populate_fdb_fill+0x2b8/0x600 net/core/rtnetlink.c:4071
      nlmsg_populate_fdb net/core/rtnetlink.c:4418 [inline]
      ndo_dflt_fdb_dump+0x616/0x840 net/core/rtnetlink.c:4456
      rtnl_fdb_dump+0x14ff/0x1fc0 net/core/rtnetlink.c:4629
      netlink_dump+0x9d1/0x1310 net/netlink/af_netlink.c:2268
      netlink_recvmsg+0xc5c/0x15a0 net/netlink/af_netlink.c:1995
      sock_recvmsg_nosec+0x7a/0x120 net/socket.c:1019
      ____sys_recvmsg+0x664/0x7f0 net/socket.c:2720
      ___sys_recvmsg+0x223/0x840 net/socket.c:2764
      do_recvmmsg+0x4f9/0xfd0 net/socket.c:2858
      __sys_recvmmsg net/socket.c:2937 [inline]
      __do_sys_recvmmsg net/socket.c:2960 [inline]
      __se_sys_recvmmsg net/socket.c:2953 [inline]
      __x64_sys_recvmmsg+0x397/0x490 net/socket.c:2953
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was created at:
      slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:716
      slab_alloc_node mm/slub.c:3451 [inline]
      __kmem_cache_alloc_node+0x4ff/0x8b0 mm/slub.c:3490
      kmalloc_trace+0x51/0x200 mm/slab_common.c:1057
      kmalloc include/linux/slab.h:559 [inline]
      __hw_addr_create net/core/dev_addr_lists.c:60 [inline]
      __hw_addr_add_ex+0x2e5/0x9e0 net/core/dev_addr_lists.c:118
      __dev_mc_add net/core/dev_addr_lists.c:867 [inline]
      dev_mc_add+0x9a/0x130 net/core/dev_addr_lists.c:885
      igmp6_group_added+0x267/0xbc0 net/ipv6/mcast.c:680
      ipv6_mc_up+0x296/0x3b0 net/ipv6/mcast.c:2754
      ipv6_mc_remap+0x1e/0x30 net/ipv6/mcast.c:2708
      addrconf_type_change net/ipv6/addrconf.c:3731 [inline]
      addrconf_notify+0x4d3/0x1d90 net/ipv6/addrconf.c:3699
      notifier_call_chain kernel/notifier.c:93 [inline]
      raw_notifier_call_chain+0xe4/0x430 kernel/notifier.c:461
      call_netdevice_notifiers_info net/core/dev.c:1935 [inline]
      call_netdevice_notifiers_extack net/core/dev.c:1973 [inline]
      call_netdevice_notifiers+0x1ee/0x2d0 net/core/dev.c:1987
      bond_enslave+0xccd/0x53f0 drivers/net/bonding/bond_main.c:1906
      do_set_master net/core/rtnetlink.c:2626 [inline]
      rtnl_newlink_create net/core/rtnetlink.c:3460 [inline]
      __rtnl_newlink net/core/rtnetlink.c:3660 [inline]
      rtnl_newlink+0x378c/0x40e0 net/core/rtnetlink.c:3673
      rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6395
      netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2546
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6413
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0xf28/0x1230 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x122f/0x13d0 net/netlink/af_netlink.c:1913
      sock_sendmsg_nosec net/socket.c:724 [inline]
      sock_sendmsg net/socket.c:747 [inline]
      ____sys_sendmsg+0x999/0xd50 net/socket.c:2503
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2557
      __sys_sendmsg net/socket.c:2586 [inline]
      __do_sys_sendmsg net/socket.c:2595 [inline]
      __se_sys_sendmsg net/socket.c:2593 [inline]
      __x64_sys_sendmsg+0x304/0x490 net/socket.c:2593
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Bytes 2856-2857 of 3500 are uninitialized
      Memory access of size 3500 starts at ffff888018d99104
      Data copied to user address 0000000020000480
      
      Fixes: d83b0603 ("net: add fdb generic dump routine")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20230621174720.1845040-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: fix potential deadlock in netlink_set_err() · 8d61f926
      Eric Dumazet authored
      syzbot reported a possible deadlock in netlink_set_err() [1]
      
      A similar issue was fixed in commit 1d482e66 ("netlink: disable IRQs
      for netlink_lock_table()") in netlink_lock_table()
      
      This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump(),
      which were not covered by the cited commit.
      
      [1]
      
      WARNING: possible irq lock inversion dependency detected
      6.4.0-rc6-syzkaller-00240-g4e9f0ec3 #0 Not tainted
      
      syz-executor.2/23011 just changed the state of lock:
      ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
      but this lock was taken by another, SOFTIRQ-safe lock in the past:
       (&local->queue_stop_reason_lock){..-.}-{2:2}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(nl_table_lock);
                                     local_irq_disable();
                                     lock(&local->queue_stop_reason_lock);
                                     lock(nl_table_lock);
        <Interrupt>
          lock(&local->queue_stop_reason_lock);
      
       *** DEADLOCK ***
      
      Fixes: 1d482e66 ("netlink: disable IRQs for netlink_lock_table()")
      Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c
      Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d61f926
    • Bartosz Golaszewski's avatar
      net: stmmac: fix double serdes powerdown · c4fc88ad
      Bartosz Golaszewski authored
      Commit 49725ffc ("net: stmmac: power up/down serdes in
      stmmac_open/release") correctly added a call to the serdes_powerdown()
      callback to stmmac_release() but did not remove the one from
      stmmac_remove() which leads to a doubled call to serdes_powerdown().
      
      This can lead to all kinds of problems: in the case of the qcom ethqos
      driver, it caused an unbalanced regulator disable splat.
      
      Fixes: 49725ffc ("net: stmmac: power up/down serdes in stmmac_open/release")
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Acked-by: Junxiao Chang <junxiao.chang@intel.com>
      Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
      Tested-by: Andrew Halaney <ahalaney@redhat.com>
      Link: https://lore.kernel.org/r/20230621135537.376649-1-brgl@bgdev.pl
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c4fc88ad
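The double-powerdown problem described in the stmmac commit above can be modelled in user space. The following is a minimal sketch, assuming a hypothetical reference-counted resource (`fake_regulator`) and hypothetical `fake_open`/`fake_release`/`fake_remove_*` helpers; none of these names come from the driver, and the real regulator framework is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted resource such as a regulator. */
struct fake_regulator {
    int enable_count;          /* number of outstanding enables */
    int unbalanced_disables;   /* disables seen while the count was already 0 */
};

static void fake_enable(struct fake_regulator *r)
{
    assert(r != NULL);
    r->enable_count++;
}

static void fake_disable(struct fake_regulator *r)
{
    assert(r != NULL);
    if (r->enable_count == 0) {
        /* A real framework would print a warning splat at this point. */
        r->unbalanced_disables++;
        return;
    }
    r->enable_count--;
}

/* Hypothetical device that powers its serdes up on open and down on release. */
struct fake_dev {
    struct fake_regulator serdes;
};

static void fake_open(struct fake_dev *d)    { fake_enable(&d->serdes); }
static void fake_release(struct fake_dev *d) { fake_disable(&d->serdes); }

/* Before the fix (modelled): remove powers the serdes down a second time. */
static void fake_remove_buggy(struct fake_dev *d) { fake_disable(&d->serdes); }

/* After the fix (modelled): release already did the powerdown, so remove does not. */
static void fake_remove_fixed(struct fake_dev *d) { (void)d; }
```

Running open, release, then the buggy remove records one unbalanced disable; with the fixed remove the count stays at zero.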
    • Sathesh Edara's avatar
      MAINTAINERS: update email addresses of octeon_ep driver maintainers · b9ec61be
      Sathesh Edara authored
      Update email addresses of Marvell octeon_ep driver maintainers.
      Also remove a former maintainer.
      
      The responsibilities of a maintainer are:
      - Pushing bug fixes and new features upstream.
      - Reviewing external changes submitted for the octeon_ep driver.
      - Replying to maintainers' questions in a timely manner.
      Signed-off-by: Sathesh Edara <sedara@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b9ec61be
    • Krzysztof Kozlowski's avatar
      Bluetooth: MAINTAINERS: add Devicetree bindings to Bluetooth drivers · 533bbc7c
      Krzysztof Kozlowski authored
      The Devicetree bindings should be picked up by subsystem maintainers,
      but respective pattern for Bluetooth drivers was missing.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      533bbc7c
    • Linus Torvalds's avatar
      Merge tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8a28a0b6
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from ipsec, bpf, mptcp and netfilter.
      
        Current release - regressions:
      
         - netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
      
         - eth: mlx5e:
            - fix scheduling of IPsec ASO query while in atomic
            - free IRQ rmap and notifier on kernel shutdown
      
        Current release - new code bugs:
      
         - phy: manual remove LEDs to ensure correct ordering
      
        Previous releases - regressions:
      
         - mptcp: fix possible divide by zero in recvmsg()
      
         - dsa: revert "net: phy: dp83867: perform soft reset and retain
           established link"
      
        Previous releases - always broken:
      
         - sched: netem: acquire qdisc lock in netem_change()
      
         - bpf:
            - fix verifier id tracking of scalars on spill
            - fix NULL dereference on exceptions
            - accept function names that contain dots
      
         - netfilter: disallow element updates of bound anonymous sets
      
         - mptcp: ensure listener is unhashed before updating the sk status
      
         - xfrm:
            - add missed call to delete offloaded policies
            - fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
      
         - selftests: fixes for FIPS mode
      
         - dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
      
         - eth: sfc: use budget for TX completions
      
        Misc:
      
         - wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0"
      
      * tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
        revert "net: align SO_RCVMARK required privileges with SO_MARK"
        net: wwan: iosm: Convert single instance struct member to flexible array
        sch_netem: acquire qdisc lock in netem_change()
        selftests: forwarding: Fix race condition in mirror installation
        wifi: mac80211: report all unusable beacon frames
        mptcp: ensure listener is unhashed before updating the sk status
        mptcp: drop legacy code around RX EOF
        mptcp: consolidate fallback and non fallback state machine
        mptcp: fix possible list corruption on passive MPJ
        mptcp: fix possible divide by zero in recvmsg()
        mptcp: handle correctly disconnect() failures
        bpf: Force kprobe multi expected_attach_type for kprobe_multi link
        bpf/btf: Accept function names that contain dots
        Revert "net: phy: dp83867: perform soft reset and retain established link"
        net: mdio: fix the wrong parameters
        netfilter: nf_tables: Fix for deleting base chains with payload
        netfilter: nfnetlink_osf: fix module autoload
        netfilter: nf_tables: drop module reference after updating chain
        netfilter: nf_tables: disallow timeout for anonymous sets
        netfilter: nf_tables: disallow updates of anonymous sets
        ...
      8a28a0b6
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 412d070b
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Correctly save/restore PMUSERENR_EL0 when host userspace is using
           PMU counters directly
      
         - Fix GICv2 emulation on GICv3 after the locking rework
      
         - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
           document why
      
        Generic:
      
         - Avoid setting page table entries pointing to a deleted memslot if a
           host page table entry is changed concurrently with the deletion"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: Avoid illegal stage2 mapping on invalid memory slot
        KVM: arm64: Use raw_smp_processor_id() in kvm_pmu_probe_armpmu()
        KVM: arm64: Restore GICv2-on-GICv3 functionality
        KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded
        KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
      412d070b
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e7758c0d
      Linus Torvalds authored
      Pull powerpc fix from Michael Ellerman:
      
       - Disable IRQs when switching mm in exit_lazy_flush_tlb() called from
         exit_mmap()
      
      Thanks to Nicholas Piggin and Sachin Sant.
      
      * tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabled
      e7758c0d
    • Linus Torvalds's avatar
      Merge tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 4a426aa1
      Linus Torvalds authored
      Pull pci fix from Bjorn Helgaas:
      
       - Transfer Intel LGM GW PCIe maintenance from Rahul Tanwar to Chuanhua
         Lei (Zhu YiXin)
      
      * tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        MAINTAINERS: Add Chuanhua Lei as Intel LGM GW PCIe maintainer
      4a426aa1
    • Linus Torvalds's avatar
      Merge tag 'mmc-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 93765002
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
      
       - Fix support for deferred probing for several host drivers
      
       - litex_mmc: Use async probe as it's common for all mmc hosts
      
       - meson-gx: Fix bug when scheduling while atomic
      
       - mmci_stm32: Fix max busy timeout calculation
      
       - sdhci-msm: Disable broken 64-bit DMA on MSM8916
      
      * tag 'mmc-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: usdhi60rol0: fix deferred probing
        mmc: sunxi: fix deferred probing
        mmc: sh_mmcif: fix deferred probing
        mmc: sdhci-spear: fix deferred probing
        mmc: sdhci-acpi: fix deferred probing
        mmc: owl: fix deferred probing
        mmc: omap_hsmmc: fix deferred probing
        mmc: omap: fix deferred probing
        mmc: mvsdio: fix deferred probing
        mmc: mtk-sd: fix deferred probing
        mmc: meson-gx: fix deferred probing
        mmc: bcm2835: fix deferred probing
        mmc: litex_mmc: set PROBE_PREFER_ASYNCHRONOUS
        mmc: meson-gx: remove redundant mmc_request_done() call from irq context
        mmc: mmci: stm32: fix max busy timeout calculation
        mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916
      93765002
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v6.4-5' of... · 65d48989
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fix from Hans de Goede:
       "One small fix for an AMD PMF driver issue which is causing issues for
        users of just released AMD laptop models"
      
      * tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86/amd/pmf: Register notify handler only if SPS is enabled
      65d48989
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux · c213de63
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A fix for a race condition with poll removal and linked timeouts, and
        then a few followup fixes/tweaks for the msg_control patch from last
        week.
      
        Not super important, particularly the sparse fixup, as it was broken
        before that recent commit. But let's get it sorted for real for this
        release, rather than just have it broken a bit differently"
      
      * tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
        io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
        io_uring/net: disable partial retries for recvmsg with cmsg
        io_uring/net: clear msg_controllen on partial sendmsg retry
        io_uring/poll: serialize poll linked timer start with poll removal
      c213de63
    • Linus Torvalds's avatar
      Merge tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 5950a006
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "It's late but here are two bug fixes. Both fix problems which can be
        severe but are very confined in scope. The risk to most use cases
        should be minimal.
      
         - Fix for an old bug which triggers if a cgroup subsystem is
           remounted to a different hierarchy while someone is reading its
           cgroup.procs/tasks file. The risk is pretty low given how seldom
           cgroup subsystems are moved across hierarchies.
      
         - We moved cpus_read_lock() outside of cgroup internal locks a while
           ago but forgot to update the legacy_freezer leading to lockdep
           triggers. Fixed"
      
      * tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Do not corrupt task iteration when rebinding subsystem
        cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
      5950a006
  3. 22 Jun, 2023 19 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.4-4' of... · 2623b3dc
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.4, take #4
      
      - Correctly save/restore PMUSERENR_EL0 when host userspace is using
        PMU counters directly
      
      - Fix GICv2 emulation on GICv3 after the locking rework
      
      - Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
        document why...
      2623b3dc
    • Gavin Shan's avatar
      KVM: Avoid illegal stage2 mapping on invalid memory slot · 2230f9e1
      Gavin Shan authored
      We run into a guest hang in edk2 firmware when KSM is kept running on
      the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
      device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
      buffered write. The status is returned by reading the memory region of
      the pflash device and the read request should have been forwarded to QEMU
      and emulated by it. Unfortunately, the read request is covered by an
      illegal stage2 mapping when the guest hang issue occurs. The read request
      is completed with QEMU bypassed and wrong status is fetched. The edk2
      firmware runs into an infinite loop with the wrong status.
      
      The illegal stage2 mapping is populated due to same page sharing by KSM
      at (C) even though the associated memory slot has been marked as invalid
      at (B) when the memory slot is requested to be deleted. It's notable that
      the active and inactive memory slots can't be swapped when we're in the
      middle of kvm_mmu_notifier_change_pte() because
      kvm->mn_active_invalidate_count is elevated, and kvm_swap_active_memslots()
      will busy loop until it drops back to zero. Besides, the swapping from the
      active to the inactive memory slots is also avoided by holding &kvm->srcu
      in __kvm_handle_hva_range(), corresponding to synchronize_srcu_expedited()
      in kvm_swap_active_memslots().
      
        CPU-A                    CPU-B
        -----                    -----
                                 ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
                                 kvm_vm_ioctl_set_memory_region
                                 kvm_set_memory_region
                                 __kvm_set_memory_region
                                 kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
                                   kvm_invalidate_memslot
                                     kvm_copy_memslot
                                     kvm_replace_memslot
                                     kvm_swap_active_memslots        (A)
                                     kvm_arch_flush_shadow_memslot   (B)
        same page sharing by KSM
        kvm_mmu_notifier_invalidate_range_start
              :
        kvm_mmu_notifier_change_pte
          kvm_handle_hva_range
          __kvm_handle_hva_range
          kvm_set_spte_gfn            (C)
              :
        kvm_mmu_notifier_invalidate_range_end
      
      Fix the issue by skipping the invalid memory slot at (C) to avoid the
      illegal stage2 mapping so that the read request for the pflash's status
      is forwarded to QEMU and emulated by it. In this way, the correct pflash's
      status can be returned from QEMU to break the infinite loop in the edk2
      firmware.
      
      We tried a git-bisect and the first problematic commit is cd4c7183 ("
      KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
      clean_dcache_guest_page() is called after the memory slots are iterated
      in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
      before the iteration on the memory slots before this commit. This change
      literally enlarges the racy window between kvm_mmu_notifier_change_pte()
      and memory slot removal so that we're able to reproduce the issue in a
      practical test case. However, the issue exists since commit d5d8184d
      ("KVM: ARM: Memory virtualization setup").
      
      Cc: stable@vger.kernel.org # v3.9+
      Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup")
      Reported-by: Shuai Hu <hshuai@redhat.com>
      Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
      Message-Id: <20230615054259.14911-1-gshan@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2230f9e1
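The idea behind the KVM fix above, skipping a memory slot that has been marked invalid before installing a mapping at (C), can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `demo_memslot`, `DEMO_MEMSLOT_INVALID`, and `demo_change_pte` are hypothetical names invented for illustration and do not reproduce the kernel's data structures or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and slot type for illustration; not the kernel's definitions. */
#define DEMO_MEMSLOT_INVALID (1u << 16)

struct demo_memslot {
    unsigned long base_gfn;  /* first guest frame number covered by the slot */
    unsigned long npages;    /* number of pages in the slot */
    unsigned int flags;
};

static bool demo_slot_contains(const struct demo_memslot *slot, unsigned long gfn)
{
    return gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages;
}

/*
 * Models the change_pte path: returns true if a mapping would be installed
 * for gfn, false if no slot covers it or the covering slot is being deleted.
 */
static bool demo_change_pte(const struct demo_memslot *slots, size_t nslots,
                            unsigned long gfn)
{
    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++) {
        if (!demo_slot_contains(&slots[i], gfn))
            continue;
        /* The slot is on its way out: do not map it, so that the access
         * keeps faulting and can be forwarded to user space instead. */
        if (slots[i].flags & DEMO_MEMSLOT_INVALID)
            return false;
        return true;
    }
    return false;
}
```

In this model, a guest frame inside a valid slot gets a mapping, while one inside a slot flagged invalid does not.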
    • Vinicius Costa Gomes's avatar
      igc: Work around HW bug causing missing timestamps · c789ad7c
      Vinicius Costa Gomes authored
      There's a hardware issue that can cause missing timestamps. The bug
      is that the interrupt is only cleared if the IGC_TXSTMPH_0 register is
      read.
      
      The bug can cause a race condition if a timestamp is captured at the
      wrong time, and we will miss that timestamp. To narrow the window in
      which the problem can happen, when no timestamp is ready we read the
      "previous" value of the timestamp registers and compare it with the
      "current" one. If it didn't change, we can be reasonably sure that no
      timestamp was captured. If they are different, we use the new value as
      the captured timestamp.
      
      The HW bug is not easy to reproduce: it only showed up after tens of
      minutes of hammering the NIC with timestamping requests from multiple
      applications (e.g. multiple ntpperf instances + ptp4l).
      
      This workaround has more impact when multiple timestamp registers are
      used, and the IGC_TXSTMPH_0 register always needs to be read so that
      the interrupt is cleared.
      
      Fixes: 2c344ae2 ("igc: Add support for TX timestamping")
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      c789ad7c
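The previous-versus-current comparison described in the igc workaround above can be sketched as a pure function. This is a simplified model, assuming a hypothetical `demo_tstamp_regs` snapshot type and a hypothetical `demo_tstamp_captured` helper; the order and timing of the real register reads in the driver are not shown in the commit message and are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the low and high halves of a TX timestamp register pair. */
struct demo_tstamp_regs {
    uint32_t lo;
    uint32_t hi;
};

static uint64_t demo_regs_to_u64(struct demo_tstamp_regs r)
{
    return ((uint64_t)r.hi << 32) | r.lo;
}

/*
 * ready: whether the hardware flagged a timestamp as ready.
 * prev:  register values sampled earlier ("previous").
 * cur:   register values sampled now ("current").
 *
 * Returns true and stores the timestamp in *out when one is considered
 * captured; returns false when the registers did not change.
 */
static bool demo_tstamp_captured(bool ready, struct demo_tstamp_regs prev,
                                 struct demo_tstamp_regs cur, uint64_t *out)
{
    assert(out != NULL);

    if (ready || demo_regs_to_u64(prev) != demo_regs_to_u64(cur)) {
        *out = demo_regs_to_u64(cur);
        return true;
    }
    /* Unchanged registers: reasonably sure that nothing was captured. */
    return false;
}
```

The comparison only narrows the window; as the commit message says, an unchanged value makes it "reasonably sure", not certain, that no timestamp was captured.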
    • Vinicius Costa Gomes's avatar
      igc: Retrieve TX timestamp during interrupt handling · afa14158
      Vinicius Costa Gomes authored
      When the interrupt is handled, the TXTT_0 bit in the TSYNCTXCTL
      register should already be set and the timestamp value already loaded
      in the appropriate register.
      
      This simplifies the handling and reduces the latency for retrieving
      the TX timestamp, which increases the number of TX timestamps that can
      be handled in a given time period.
      
      As the "work" function doesn't run in a workqueue anymore, rename it
      to something more sensible, an event handler.
      
      Using ntpperf[1] we can see the following performance improvements:
      
      Before:
      
      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
                     |          responses            |     TX timestamp offset (ns)
      rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
      1000       100   0.00%   0.00%   0.00% 100.00%      -56      +9     +52     19
      1500       150   0.00%   0.00%   0.00% 100.00%      -40     +30     +75     22
      2250       225   0.00%   0.00%   0.00% 100.00%      -11     +29     +72     15
      3375       337   0.00%   0.00%   0.00% 100.00%      -18     +40     +88     22
      5062       506   0.00%   0.00%   0.00% 100.00%      -19     +23     +77     15
      7593       759   0.00%   0.00%   0.00% 100.00%       +7     +47   +5168     43
      11389     1138   0.00%   0.00%   0.00% 100.00%      -11     +41   +5240     39
      17083     1708   0.00%   0.00%   0.00% 100.00%      +19     +60   +5288     50
      25624     2562   0.00%   0.00%   0.00% 100.00%       +1     +56   +5368     58
      38436     3843   0.00%   0.00%   0.00% 100.00%      -84     +12   +8847     66
      57654     5765   0.00%   0.00% 100.00%   0.00%
      86481     8648   0.00%   0.00% 100.00%   0.00%
      129721   12972   0.00%   0.00% 100.00%   0.00%
      194581   16384   0.00%   0.00% 100.00%   0.00%
      291871   16384  27.35%   0.00%  72.65%   0.00%
      437806   16384  50.05%   0.00%  49.95%   0.00%
      
      After:
      
      $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37
                     |          responses            |     TX timestamp offset (ns)
      rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
      1000       100   0.00%   0.00%   0.00% 100.00%      -44      +0     +61     19
      1500       150   0.00%   0.00%   0.00% 100.00%       -6     +39     +81     16
      2250       225   0.00%   0.00%   0.00% 100.00%      -22     +25     +69     15
      3375       337   0.00%   0.00%   0.00% 100.00%      -28     +15     +56     14
      5062       506   0.00%   0.00%   0.00% 100.00%       +7     +78    +143     27
      7593       759   0.00%   0.00%   0.00% 100.00%      -54     +24    +144     47
      11389     1138   0.00%   0.00%   0.00% 100.00%      -90     -33     +28     21
      17083     1708   0.00%   0.00%   0.00% 100.00%      -50      -2     +35     14
      25624     2562   0.00%   0.00%   0.00% 100.00%      -62      +7     +66     23
      38436     3843   0.00%   0.00%   0.00% 100.00%      -33     +30   +5395     36
      57654     5765   0.00%   0.00% 100.00%   0.00%
      86481     8648   0.00%   0.00% 100.00%   0.00%
      129721   12972   0.00%   0.00% 100.00%   0.00%
      194581   16384  19.50%   0.00%  80.50%   0.00%
      291871   16384  35.81%   0.00%  64.19%   0.00%
      437806   16384  55.40%   0.00%  44.60%   0.00%
      
      [1] https://github.com/mlichvar/ntpperf
      
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      afa14158
    • Vinicius Costa Gomes's avatar
      igc: Check if hardware TX timestamping is enabled earlier · ce58c7cc
      Vinicius Costa Gomes authored
      Before requesting a packet transmission to be hardware timestamped,
      check if the user has TX timestamping enabled. Fixes an issue that if
      a packet was internally forwarded to the NIC, and it had the
      SKBTX_HW_TSTAMP flag set, the driver would mark that timestamp as
      skipped.
      
      In reality, that timestamp was "not for us", as TX timestamp could
      never be enabled in the NIC.
      
      Checking if the TX timestamping is enabled earlier has a secondary
      effect that when TX timestamping is disabled, there's no need to check
      for timestamp timeouts.
      
      We should only take care to free any pending timestamp when TX
      timestamping is disabled, as that skb would never be released
      otherwise.
      
      Fixes: 2c344ae2 ("igc: Add support for TX timestamping")
      Suggested-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ce58c7cc
    • Vinicius Costa Gomes's avatar
      igc: Fix race condition in PTP tx code · 9c50e2b1
      Vinicius Costa Gomes authored
      Currently, the igc driver supports timestamping only one tx packet at a
      time. During the transmission flow, the skb that requires hardware
      timestamping is saved in adapter->ptp_tx_skb. Once hardware has the
      timestamp, an interrupt is delivered, and adapter->ptp_tx_work is
      scheduled. In igc_ptp_tx_work(), we read the timestamp register, update
      adapter->ptp_tx_skb, and notify the network stack.
      
      While the thread executing the transmission flow (the user process
      running in kernel mode) and the thread executing ptp_tx_work don't
      access adapter->ptp_tx_skb concurrently, there are two other places
      where adapter->ptp_tx_skb is accessed: igc_ptp_tx_hang() and
      igc_ptp_suspend().
      
      igc_ptp_tx_hang() is executed by the adapter->watchdog_task worker
      thread which runs periodically so it is possible we have two threads
      accessing ptp_tx_skb at the same time. Consider the following scenario:
      right after __IGC_PTP_TX_IN_PROGRESS is set in igc_xmit_frame_ring(),
      igc_ptp_tx_hang() is executed. Since adapter->ptp_tx_start hasn't been
      written yet, this is considered a timeout and adapter->ptp_tx_skb is
      cleaned up.
      
      This patch fixes the issue described above by adding the ptp_tx_lock to
      protect access to ptp_tx_skb and ptp_tx_start fields from igc_adapter.
      Since igc_xmit_frame_ring() is called in atomic context by the networking
      stack, ptp_tx_lock is defined as a spinlock, and the irq safe variants
      of lock/unlock are used.
      
      With the introduction of the ptp_tx_lock, the __IGC_PTP_TX_IN_PROGRESS
      flag is no longer of much use, so this patch gets rid of it.
      
      Fixes: 2c344ae2 ("igc: Add support for TX timestamping")
      Signed-off-by: Andre Guedes <andre.guedes@intel.com>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      9c50e2b1
    • Paolo Abeni's avatar
      Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 2ba7e7eb
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      This is v3, including a crash fix for patch 01/14.
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
      
      2) Fix chain binding transaction logic, add a bound flag to rule
         transactions. Remove incorrect logic in nft_data_hold() and
         nft_data_release().
      
      3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
         the set/chain as a follow up to 1240eb93 ("netfilter: nf_tables:
         incorrect error path handling with NFT_MSG_NEWRULE")
      
      4) Drop map element references from preparation phase instead of
         set destroy path, otherwise bogus EBUSY with transactions such as:
      
              flush chain ip x y
              delete chain ip x w
      
         where chain ip x y contains jump/goto from set elements.
      
      5) Pipapo set type does not regard generation mask from the walk
         iteration.
      
      6) Fix reference count underflow in set element reference to
         stateful object.
      
      7) Several patches to tighten the nf_tables API:
         - disallow set element updates of bound anonymous set
         - disallow unbound anonymous set/chain at the end of transaction.
         - disallow updates of anonymous set.
         - disallow timeout configuration for anonymous sets.
      
      8) Fix module reference leak in chain updates.
      
      9) Fix nfnetlink_osf module autoload.
      
      10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
          in iptables-nft.
      
      This Netfilter batch is larger than usual at this stage, I am aware we
      are fairly late in the -rc cycle, if you prefer to route them through
      net-next, please let me know.
      
      netfilter pull request 23-06-21
      
      * tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: Fix for deleting base chains with payload
        netfilter: nfnetlink_osf: fix module autoload
        netfilter: nf_tables: drop module reference after updating chain
        netfilter: nf_tables: disallow timeout for anonymous sets
        netfilter: nf_tables: disallow updates of anonymous sets
        netfilter: nf_tables: reject unbound chain set before commit phase
        netfilter: nf_tables: reject unbound anonymous set before commit phase
        netfilter: nf_tables: disallow element updates of bound anonymous sets
        netfilter: nf_tables: fix underflow in object reference counter
        netfilter: nft_set_pipapo: .walk does not deal with generations
        netfilter: nf_tables: drop map element references from preparation phase
        netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
        netfilter: nf_tables: fix chain binding transaction logic
        ipvs: align inner_mac_header for encapsulation
      ====================
      
      Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2ba7e7eb
    • Maciej Żenczykowski's avatar
      revert "net: align SO_RCVMARK required privileges with SO_MARK" · a9628e88
      Maciej Żenczykowski authored
      This reverts commit 1f86123b ("net: align SO_RCVMARK required
      privileges with SO_MARK") because the reasoning in the commit message
      is not really correct:
        SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
        it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
        and retrieves the socket mark, rather than 'setsockopt(SO_MARK)' which
        sets the socket mark and does require privs.
      
        Additionally incoming skb->mark may already be visible if
        sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
      
        Furthermore, it is easier to block the getsockopt via bpf
        (either cgroup setsockopt hook, or via syscall filters)
        than to unblock it if it requires CAP_NET_RAW/ADMIN.
      
      On Android the socket mark is (among other things) used to store
      the network identifier a socket is bound to.  Setting it is privileged,
      but retrieving it is not.  We'd like unprivileged userspace to be able
      to read the network id of incoming packets (where mark is set via
      iptables [to be moved to bpf])...
      
      An alternative would be to add another sysctl to control whether
      setting SO_RCVMARK is privileged or not.
      (or even a MASK of which bits in the mark can be exposed)
      But this seems like over-engineering...
      
      Note: This is a non-trivial revert, due to later merged commit e42c7bee
      ("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
      which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
      
      Fixes: 1f86123b ("net: align SO_RCVMARK required privileges with SO_MARK")
      Cc: Larysa Zaremba <larysa.zaremba@intel.com>
      Cc: Simon Horman <simon.horman@corigine.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Eyal Birger <eyal.birger@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Patrick Rohr <prohr@google.com>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: wwan: iosm: Convert single instance struct member to flexible array · dec24b3b
      Kees Cook authored
      struct mux_adth actually ends with multiple struct mux_adth_dg members.
      This is seen both in the comments about the member:
      
      /**
       * struct mux_adth - Structure of the Aggregated Datagram Table Header.
       ...
       * @dg:		datagramm table with variable length
       */
      
      and in the preparation for populating it:
      
                              adth_dg_size = offsetof(struct mux_adth, dg) +
                                              ul_adb->dg_count[i] * sizeof(*dg);
      			...
                              adth_dg_size -= offsetof(struct mux_adth, dg);
                              memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
      
      This was reported as a run-time false positive warning:
      
      memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
      
      Adjust the struct mux_adth definition and associated sizeof() math; no binary
      output differences are observed in the resulting object file.
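
The shape of such a conversion can be sketched in plain C. The structures below are hypothetical stand-ins (`demo_adth`, `demo_dg`), not the driver's real definitions; they only show how a trailing flexible array member keeps the `offsetof()` size math while giving memcpy() a variable-length destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one datagram table entry (8 bytes). */
struct demo_dg {
	uint32_t index;
	uint16_t length;
	uint16_t reserved;
};

/* Hypothetical stand-in for the table header. */
struct demo_adth {
	uint32_t signature;
	uint16_t table_length;
	uint16_t dg_count;
	struct demo_dg dg[];	/* flexible array member instead of a single entry */
};

/* Bytes needed for the header followed by n table entries. */
size_t demo_adth_size(size_t n)
{
	return offsetof(struct demo_adth, dg) + n * sizeof(struct demo_dg);
}

/*
 * Allocate a table and copy n entries into the flexible array.
 * The memcpy() destination is the array itself, so a bounds-checking
 * memcpy() no longer sees a write that spans past one fixed-size field.
 */
struct demo_adth *demo_adth_build(const struct demo_dg *src, size_t n)
{
	struct demo_adth *adth = calloc(1, demo_adth_size(n));

	if (!adth)
		return NULL;
	adth->dg_count = (uint16_t)n;
	adth->table_length = (uint16_t)demo_adth_size(n);
	memcpy(adth->dg, src, n * sizeof(*src));
	return adth;
}
```

With two 8-byte entries the copy is 16 bytes, the same size the warning above complained about when the destination was declared as a single 8-byte member.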

      Reported-by: Florian Klink <flokli@flokli.de>
      Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/
      Fixes: 1f52d7b6 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
      Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Intel Corporation <linuxwwan@intel.com>
      Cc: Loic Poulain <loic.poulain@linaro.org>
      Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • sch_netem: acquire qdisc lock in netem_change() · 2174a08d
      Eric Dumazet authored
      syzbot managed to trigger a divide error [1] in netem.
      
      It could happen if q->rate changes while netem_enqueue()
      is running, since q->rate is read twice.
      
      It turns out netem_change() always lacked proper synchronization.
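
The hazard and the locking remedy can be modelled in userspace. This is a sketch under stated assumptions, not the kernel code: `demo_qdisc` is a hypothetical miniature, and a C11 `atomic_flag` spinlock stands in for the qdisc lock. The point is that the configuration path and the data path agree on one lock, so the rate cannot turn into 0 between the zero check and the division.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical miniature of a qdisc with a configurable rate in bytes/s. */
struct demo_qdisc {
	atomic_flag lock;	/* stands in for the qdisc lock */
	uint64_t rate;		/* 0 means "no rate limit" */
};

static void demo_lock(struct demo_qdisc *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void demo_unlock(struct demo_qdisc *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Configuration path: update the rate only while holding the lock. */
void demo_change(struct demo_qdisc *q, uint64_t new_rate)
{
	demo_lock(q);
	q->rate = new_rate;
	demo_unlock(q);
}

/*
 * Data path: the zero check and the division both run under the lock,
 * so a concurrent demo_change() cannot slip a 0 in between them.
 */
uint64_t demo_packet_time_ns(struct demo_qdisc *q, uint64_t len)
{
	uint64_t ns = 0;

	demo_lock(q);
	if (q->rate)
		ns = len * DEMO_NSEC_PER_SEC / q->rate;
	demo_unlock(q);
	return ns;
}
```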
      
      [1]
      divide error: 0000 [#1] SMP KASAN
      CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
      RIP: 0010:div64_u64 include/linux/math64.h:69 [inline]
      RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline]
      RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576
      Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03
      RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246
      RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000
      RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d
      RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297
      R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358
      R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000
      FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0
      Call Trace:
      <TASK>
      [<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline]
      [<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290
      [<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
      [<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline]
      [<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline]
      [<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235
      [<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0
      [<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323
      [<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
      [<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437
      [<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline]
      [<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline]
      [<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542
      [<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • can: isotp: isotp_sendmsg(): fix return error fix on TX path · e38910c0
      Oliver Hartkopp authored
      Commit d674a8f1 ("can: isotp: isotp_sendmsg(): fix return
      error on FC timeout on TX path") introduced the previously missing
      return value for the case of a protocol error.
      
      But the way the error value is read and passed to user space does
      not follow the common scheme of clearing the error after reading it,
      which the sock_error() function provides. As a result, an error is
      reported on the following write() attempt although everything should
      be working.
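
The difference between peeking at a pending error and consuming it can be shown with a small userspace model. The `demo_sock` type and both helpers are hypothetical; only the read-and-clear idea mirrors what sock_error() does with the socket's pending error.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical socket with a pending-error slot (positive errno, 0 = none). */
struct demo_sock {
	atomic_int err;
};

/* Peek at the error without clearing it: the next caller sees it again. */
int demo_peek_error(struct demo_sock *sk)
{
	return -atomic_load(&sk->err);
}

/* Read and clear in one step: the error is reported exactly once. */
int demo_take_error(struct demo_sock *sk)
{
	return -atomic_exchange(&sk->err, 0);
}
```

With the peek variant, a stale error from a previous transfer would still be reported on the next write(), which is the symptom described above. The take variant returns the error once and 0 afterwards.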
      
      Fixes: d674a8f1 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path")
      Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • platform/x86/amd/pmf: Register notify handler only if SPS is enabled · 146b6f68
      Shyam Sundar S K authored
      The power source notify handler is registered even when none of the
      PMF features is enabled, leading to a crash.
      
      ...
      [   22.592162] Call Trace:
      [   22.592164]  <TASK>
      [   22.592164]  ? rcu_note_context_switch+0x5e0/0x660
      [   22.592166]  ? __warn+0x81/0x130
      [   22.592171]  ? rcu_note_context_switch+0x5e0/0x660
      [   22.592172]  ? report_bug+0x171/0x1a0
      [   22.592175]  ? prb_read_valid+0x1b/0x30
      [   22.592177]  ? handle_bug+0x3c/0x80
      [   22.592178]  ? exc_invalid_op+0x17/0x70
      [   22.592179]  ? asm_exc_invalid_op+0x1a/0x20
      [   22.592182]  ? rcu_note_context_switch+0x5e0/0x660
      [   22.592183]  ? acpi_ut_delete_object_desc+0x86/0xb0
      [   22.592186]  ? acpi_ut_update_ref_count.part.0+0x22d/0x930
      [   22.592187]  __schedule+0xc0/0x1410
      [   22.592189]  ? ktime_get+0x3c/0xa0
      [   22.592191]  ? lapic_next_event+0x1d/0x30
      [   22.592193]  ? hrtimer_start_range_ns+0x25b/0x350
      [   22.592196]  schedule+0x5e/0xd0
      [   22.592197]  schedule_hrtimeout_range_clock+0xbe/0x140
      [   22.592199]  ? __pfx_hrtimer_wakeup+0x10/0x10
      [   22.592200]  usleep_range_state+0x64/0x90
      [   22.592203]  amd_pmf_send_cmd+0x106/0x2a0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
      [   22.592207]  amd_pmf_update_slider+0x56/0x1b0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
      [   22.592210]  amd_pmf_set_sps_power_limits+0x72/0x80 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
      [   22.592213]  amd_pmf_pwr_src_notify_call+0x49/0x90 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
      [   22.592216]  notifier_call_chain+0x5a/0xd0
      [   22.592218]  atomic_notifier_call_chain+0x32/0x50
      ...
      
      Fix this by registering the power source change notify handler only
      when SPS (Static Slider) is advertised as supported.
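
The pattern of gating a notifier on an advertised feature can be sketched as follows. Everything here is hypothetical (`demo_dev`, `DEMO_FEAT_SPS`, the callback type) and is not the driver's API. It only illustrates that a handler which is never registered can never run against an uninitialized feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FEAT_SPS (1u << 0)	/* hypothetical "static slider" feature bit */

typedef void (*demo_notify_fn)(void *ctx);

/* Hypothetical device state. */
struct demo_dev {
	uint32_t supported;		/* features advertised by the firmware */
	demo_notify_fn pwr_src_notify;	/* NULL when no handler is registered */
	int slider_updates;		/* counts handler invocations */
};

/* Handler that would reprogram power limits on a power source change. */
static void demo_pwr_src_changed(void *ctx)
{
	struct demo_dev *dev = ctx;

	dev->slider_updates++;
}

/* Register the handler only if the feature it depends on is advertised. */
bool demo_init(struct demo_dev *dev)
{
	if (!(dev->supported & DEMO_FEAT_SPS))
		return false;
	dev->pwr_src_notify = demo_pwr_src_changed;
	return true;
}

/* Deliver a power source event to the handler, if one is registered. */
void demo_power_source_event(struct demo_dev *dev)
{
	if (dev->pwr_src_notify)
		dev->pwr_src_notify(dev);
}
```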

      Reported-by: Allen Zhong <allen@atr.me>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571
      Fixes: 4c71ae41 ("platform/x86/amd/pmf: Add support SPS PMF feature")
      Tested-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Link: https://lore.kernel.org/r/20230622060309.310001-1-Shyam-sundar.S-k@amd.com
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    • selftests: forwarding: Fix race condition in mirror installation · c7c059fb
      Danielle Ratson authored
      When mirroring to a gretap in hardware the device expects to be
      programmed with the egress port and all the encapsulating headers. This
      requires the driver to resolve the path the packet will take in the
      software data path and program the device accordingly.
      
      If the path cannot be resolved (in this case because of an unresolved
      neighbor), then mirror installation fails until the path is resolved.
      This results in a race that causes the test to sometimes fail.
      
      Fix this by setting the neighbor's state to permanent in a couple of
      tests, so that it is always valid.
      
      Fixes: 35c31d5c ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
      Fixes: 239e754a ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • wifi: mac80211: report all unusable beacon frames · 7f4e0970
      Benjamin Berg authored
      Properly check for RX_DROP_UNUSABLE now that the new drop reason
      infrastructure is used. Without this change, the comparison will always
      be false as a more specific reason is given in the lower bits of result.
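
Why the plain comparison fails can be seen in a small model of such a result code. The bit layout and constants below are assumptions chosen for illustration (`DEMO_*` names, subsystem class in the upper 16 bits, specific reason in the lower 16), not the kernel's actual values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: drop class in the upper 16 bits, reason in the lower 16. */
#define DEMO_SUBSYS_SHIFT	16
#define DEMO_SUBSYS_MASK	0xffff0000u
#define DEMO_RX_DROP_UNUSABLE	(2u << DEMO_SUBSYS_SHIFT)
#define DEMO_RX_DROP_MONITOR	(3u << DEMO_SUBSYS_SHIFT)

/* Exact comparison: false whenever a non-zero reason sits in the low bits. */
bool demo_is_unusable_exact(uint32_t res)
{
	return res == DEMO_RX_DROP_UNUSABLE;
}

/* Masked comparison: looks at the drop class only and ignores the reason. */
bool demo_is_unusable(uint32_t res)
{
	return (res & DEMO_SUBSYS_MASK) == DEMO_RX_DROP_UNUSABLE;
}
```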
      
      Fixes: baa951a1 ("mac80211: use the new drop reasons infrastructure")
      Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20230621120543.412920-2-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mptcp-fixes-for-6-4' · 533aa0ba
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fixes for 6.4
      
      Patch 1 correctly handles disconnect() failures that can happen in some
      specific cases: now the socket state is set as unconnected as expected.
      That fixes an issue introduced in v6.2.
      
      Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
      to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
      flag. It should address an issue present in MPTCP from almost the
      beginning, from v5.9.
      
      Patch 3 fixes a possible list corruption on passive MPJ even if the race
      seems very unlikely, better be safe than sorry. The possible issue is
      present from v5.17.
      
      Patch 4 consolidates fallback and non fallback state machines to avoid
      leaking some MPTCP sockets. The fix is likely needed for versions from
      v5.11.
      
      Patch 5 drops code that is no longer used after the introduction of
      patch 4/6. This is not really a fix but this patch can probably land in
      the -net tree as well not to leave unused code.
      
      Patch 6 ensures listeners are unhashed before updating their sk status
      to avoid possible deadlocks when diag info are going to be retrieved
      with a lock. Even if it should not be visible with the way we are
      currently getting diag info, the issue is present from v5.17.
      ====================
      
      Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: ensure listener is unhashed before updating the sk status · 57fc0f1c
      Paolo Abeni authored
      The MPTCP protocol accesses the listener subflow in a lockless
      manner in a couple of places (poll, diag). That works only if
      the msk itself leaves the listener status only after the
      subflow itself has been closed/disconnected. Otherwise we risk
      deadlock in diag, as reported by Christoph.
      
      Address the issue by ensuring that the first subflow (the listener
      one) is always disconnected before updating the msk socket status.

      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
      Fixes: b29fcfb5 ("mptcp: full disconnect implementation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: drop legacy code around RX EOF · b7535cfe
      Paolo Abeni authored
      Thanks to the previous patch -- "mptcp: consolidate fallback and non
      fallback state machine" -- we can finally drop the "temporary hack"
      used to detect rx eof.

      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: consolidate fallback and non fallback state machine · 81c1d029
      Paolo Abeni authored
      An orphaned msk releases the used resources via the worker,
      when the latter first sees the msk in CLOSED status.
      
      If the msk status transitions to TCP_CLOSE in the release callback
      invoked by the worker's final release_sock(), such instance of the
      workqueue will not take any action.
      
      Additionally the MPTCP code prevents scheduling the worker once the
      socket reaches the CLOSE status: such msk resources will be leaked.
      
      The only code path that can trigger the above scenario is the
      __mptcp_check_send_data_fin() in fallback mode.
      
      Address the issue by removing the special handling of fallback sockets
      in __mptcp_check_send_data_fin(), consolidating the state machine
      for fallback and non-fallback sockets.
      
      Since fallback sockets do not send and do not receive data_fin,
      the mptcp code can update the msk internal status to match the next
      step in the SM every time data fin (ack) should be generated or
      received.
      
      As a consequence we can remove a bunch of checks for fallback from
      the fastpath.
      
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix possible list corruption on passive MPJ · 56a666c4
      Paolo Abeni authored
      At passive MPJ time, if the msk socket lock is held by the user,
      the new subflow is appended to the msk->join_list under the msk
      data lock.
      
      In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
      that list are moved from the join_list into the conn_list under the
      msk socket lock.
      
      Append and removal could race, possibly corrupting such list.
      Address the issue by splicing the join list into a temporary one
      while still under the msk data lock.
      
      Found by code inspection, the race itself should be almost impossible
      to trigger in practice.
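
The splice-then-process idea can be sketched with a singly linked list in userspace. The names (`demo_msk`, `demo_subflow`) are hypothetical and a C11 `atomic_flag` spinlock stands in for the msk data lock. The walk over the temporary list is assumed to run under the socket lock held by the caller, which is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_subflow {
	int id;
	struct demo_subflow *next;
};

/* Hypothetical miniature of the msk with its two subflow lists. */
struct demo_msk {
	atomic_flag data_lock;		/* stands in for the msk data lock */
	struct demo_subflow *join_list;	/* appended to under data_lock */
	struct demo_subflow *conn_list;	/* owned by the socket lock holder */
};

static void demo_lock(struct demo_msk *msk)
{
	while (atomic_flag_test_and_set_explicit(&msk->data_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void demo_unlock(struct demo_msk *msk)
{
	atomic_flag_clear_explicit(&msk->data_lock, memory_order_release);
}

/* Passive join: queue the new subflow under the data lock. */
void demo_join_add(struct demo_msk *msk, struct demo_subflow *sf)
{
	demo_lock(msk);
	sf->next = msk->join_list;
	msk->join_list = sf;
	demo_unlock(msk);
}

/*
 * Flush: detach the whole join list into a private pointer while holding
 * the data lock, then move the entries without touching the shared list.
 * Returns the number of subflows moved to conn_list.
 */
int demo_flush_join_list(struct demo_msk *msk)
{
	struct demo_subflow *tmp, *next;
	int moved = 0;

	demo_lock(msk);
	tmp = msk->join_list;
	msk->join_list = NULL;
	demo_unlock(msk);

	for (; tmp; tmp = next) {
		next = tmp->next;
		tmp->next = msk->conn_list;
		msk->conn_list = tmp;
		moved++;
	}
	return moved;
}
```

Because the shared list is only touched inside the locked region, a concurrent demo_join_add() either lands before the detach and is flushed now, or lands after it and is picked up by the next flush.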
      
      Fixes: 3e501490 ("mptcp: cleanup MPJ subflow list handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>