1. 01 May, 2023 6 commits
  2. 28 Apr, 2023 7 commits
    • net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpu · 66863178
      Angelo Dureghello authored
      Add the rsvd2cpu capability for the mv88e6321 model to allow proper BPDU
      processing.
      Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com>
      Fixes: 51c901a7 ("net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: fix skb hash for some RST packets · dc6456e9
      Antoine Tenart authored
      The skb hash comes from sk->sk_txhash when using TCP, except for some
      IPv6 RST packets. This is because in tcp_v6_send_reset when not in
      TIME_WAIT the hash is taken from sk->sk_hash, while it should come from
      sk->sk_txhash as those two hashes are not computed the same way.
      
      Packetdrill script to test the above:
      
         0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
        +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > (flowlabel 0x1) S 0:0(0) <...>
      
        // Wrong ack seq, trigger a rst.
        +0 < S. 0:0(0) ack 0 win 4000
      
        // Check the flowlabel matches prior one from SYN.
        +0 > (flowlabel 0x1) R 0:0(0) <...>
      
      Fixes: 9258b8b1 ("ipv6: tcp: send consistent autoflowlabel in RST packets")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: srv6: make srv6_end_dt46_l3vpn_test more robust · 46ef24c6
      Andrea Mayer authored
      On some distributions, the rp_filter is automatically set (=1) by
      default on a netdev basis (also on VRFs).
      In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
      the table associated with the VRF bound to that tunnel. During lookup
      operations, the rp_filter can lead to packet loss when activated on the
      VRF.
      Therefore, we chose to make this selftest more robust by explicitly
      disabling the rp_filter during tests (as it is automatically set by some
      Linux distributions).
      
      Fixes: 03a0b567 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: update dev->needed_headroom in ipip6_tunnel_bind_dev() · c88f8d5c
      Cong Wang authored
      When a tunnel device is bound with the underlying device, its
      dev->needed_headroom needs to be updated properly. IPv4 tunnels
      already do the same in ip_tunnel_bind_dev(). Otherwise we may
      not have enough header room for skb, especially after commit
      b17f709a ("gue: TX support for using remote checksum offload option").
      
      Fixes: 32b8a8e5 ("sit: add IPv4 over IPv4 support")
      Reported-by: Palash Oswal <oswalpalash@gmail.com>
      Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_api: remove block_cb from driver_list before freeing · da94a778
      Vlad Buslov authored
      Error handler of tcf_block_bind() frees the whole bo->cb_list on error.
      However, by that time the flow_block_cb instances are already in the driver
      list because driver ndo_setup_tc() callback is called before that up the
      call chain in tcf_block_offload_cmd(). This leaves dangling pointers to
      freed objects in the list and causes use-after-free[0]. Fix it by also
      removing flow_block_cb instances from driver_list before deallocating them.
      
      [0]:
      [  279.868433] ==================================================================
      [  279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0
      [  279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963
      
      [  279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4
      [  279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  279.876295] Call Trace:
      [  279.876882]  <TASK>
      [  279.877413]  dump_stack_lvl+0x33/0x50
      [  279.878198]  print_report+0xc2/0x610
      [  279.878987]  ? flow_block_cb_setup_simple+0x631/0x7c0
      [  279.879994]  kasan_report+0xae/0xe0
      [  279.880750]  ? flow_block_cb_setup_simple+0x631/0x7c0
      [  279.881744]  ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core]
      [  279.883047]  flow_block_cb_setup_simple+0x631/0x7c0
      [  279.884027]  tcf_block_offload_cmd.isra.0+0x189/0x2d0
      [  279.885037]  ? tcf_block_setup+0x6b0/0x6b0
      [  279.885901]  ? mutex_lock+0x7d/0xd0
      [  279.886669]  ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0
      [  279.887844]  ? ingress_init+0x1c0/0x1c0 [sch_ingress]
      [  279.888846]  tcf_block_get_ext+0x61c/0x1200
      [  279.889711]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.890682]  ? clsact_init+0x2b0/0x2b0 [sch_ingress]
      [  279.891701]  qdisc_create+0x401/0xea0
      [  279.892485]  ? qdisc_tree_reduce_backlog+0x470/0x470
      [  279.893473]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.894344]  ? tc_get_qdisc+0xac0/0xac0
      [  279.895213]  ? mutex_lock+0x7d/0xd0
      [  279.896005]  ? __mutex_lock_slowpath+0x10/0x10
      [  279.896910]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.897770]  ? rtnl_calcit.isra.0+0x2b0/0x2b0
      [  279.898672]  ? __sys_sendmsg+0xb5/0x140
      [  279.899494]  ? do_syscall_64+0x3d/0x90
      [  279.900302]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  279.901337]  ? kasan_save_stack+0x2e/0x40
      [  279.902177]  ? kasan_save_stack+0x1e/0x40
      [  279.903058]  ? kasan_set_track+0x21/0x30
      [  279.903913]  ? kasan_save_free_info+0x2a/0x40
      [  279.904836]  ? ____kasan_slab_free+0x11a/0x1b0
      [  279.905741]  ? kmem_cache_free+0x179/0x400
      [  279.906599]  netlink_rcv_skb+0x12c/0x360
      [  279.907450]  ? rtnl_calcit.isra.0+0x2b0/0x2b0
      [  279.908360]  ? netlink_ack+0x1550/0x1550
      [  279.909192]  ? rhashtable_walk_peek+0x170/0x170
      [  279.910135]  ? kmem_cache_alloc_node+0x1af/0x390
      [  279.911086]  ? _copy_from_iter+0x3d6/0xc70
      [  279.912031]  netlink_unicast+0x553/0x790
      [  279.912864]  ? netlink_attachskb+0x6a0/0x6a0
      [  279.913763]  ? netlink_recvmsg+0x416/0xb50
      [  279.914627]  netlink_sendmsg+0x7a1/0xcb0
      [  279.915473]  ? netlink_unicast+0x790/0x790
      [  279.916334]  ? iovec_from_user.part.0+0x4d/0x220
      [  279.917293]  ? netlink_unicast+0x790/0x790
      [  279.918159]  sock_sendmsg+0xc5/0x190
      [  279.918938]  ____sys_sendmsg+0x535/0x6b0
      [  279.919813]  ? import_iovec+0x7/0x10
      [  279.920601]  ? kernel_sendmsg+0x30/0x30
      [  279.921423]  ? __copy_msghdr+0x3c0/0x3c0
      [  279.922254]  ? import_iovec+0x7/0x10
      [  279.923041]  ___sys_sendmsg+0xeb/0x170
      [  279.923854]  ? copy_msghdr_from_user+0x110/0x110
      [  279.924797]  ? ___sys_recvmsg+0xd9/0x130
      [  279.925630]  ? __perf_event_task_sched_in+0x183/0x470
      [  279.926656]  ? ___sys_sendmsg+0x170/0x170
      [  279.927529]  ? ctx_sched_in+0x530/0x530
      [  279.928369]  ? update_curr+0x283/0x4f0
      [  279.929185]  ? perf_event_update_userpage+0x570/0x570
      [  279.930201]  ? __fget_light+0x57/0x520
      [  279.931023]  ? __switch_to+0x53d/0xe70
      [  279.931846]  ? sockfd_lookup_light+0x1a/0x140
      [  279.932761]  __sys_sendmsg+0xb5/0x140
      [  279.933560]  ? __sys_sendmsg_sock+0x20/0x20
      [  279.934436]  ? fpregs_assert_state_consistent+0x1d/0xa0
      [  279.935490]  do_syscall_64+0x3d/0x90
      [  279.936300]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  279.937311] RIP: 0033:0x7f21c814f887
      [  279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887
      [  279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003
      [  279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      [  279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001
      [  279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
      [  279.949690]  </TASK>
      
      [  279.950706] Allocated by task 2960:
      [  279.951471]  kasan_save_stack+0x1e/0x40
      [  279.952338]  kasan_set_track+0x21/0x30
      [  279.953165]  __kasan_kmalloc+0x77/0x90
      [  279.954006]  flow_block_cb_setup_simple+0x3dd/0x7c0
      [  279.955001]  tcf_block_offload_cmd.isra.0+0x189/0x2d0
      [  279.956020]  tcf_block_get_ext+0x61c/0x1200
      [  279.956881]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.957873]  qdisc_create+0x401/0xea0
      [  279.958656]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.959506]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.960392]  netlink_rcv_skb+0x12c/0x360
      [  279.961216]  netlink_unicast+0x553/0x790
      [  279.962044]  netlink_sendmsg+0x7a1/0xcb0
      [  279.962906]  sock_sendmsg+0xc5/0x190
      [  279.963702]  ____sys_sendmsg+0x535/0x6b0
      [  279.964534]  ___sys_sendmsg+0xeb/0x170
      [  279.965343]  __sys_sendmsg+0xb5/0x140
      [  279.966132]  do_syscall_64+0x3d/0x90
      [  279.966908]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  279.968407] Freed by task 2960:
      [  279.969114]  kasan_save_stack+0x1e/0x40
      [  279.969929]  kasan_set_track+0x21/0x30
      [  279.970729]  kasan_save_free_info+0x2a/0x40
      [  279.971603]  ____kasan_slab_free+0x11a/0x1b0
      [  279.972483]  __kmem_cache_free+0x14d/0x280
      [  279.973337]  tcf_block_setup+0x29d/0x6b0
      [  279.974173]  tcf_block_offload_cmd.isra.0+0x226/0x2d0
      [  279.975186]  tcf_block_get_ext+0x61c/0x1200
      [  279.976080]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.977065]  qdisc_create+0x401/0xea0
      [  279.977857]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.978695]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.979562]  netlink_rcv_skb+0x12c/0x360
      [  279.980388]  netlink_unicast+0x553/0x790
      [  279.981214]  netlink_sendmsg+0x7a1/0xcb0
      [  279.982043]  sock_sendmsg+0xc5/0x190
      [  279.982827]  ____sys_sendmsg+0x535/0x6b0
      [  279.983703]  ___sys_sendmsg+0xeb/0x170
      [  279.984510]  __sys_sendmsg+0xb5/0x140
      [  279.985298]  do_syscall_64+0x3d/0x90
      [  279.986076]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  279.987532] The buggy address belongs to the object at ffff888147e2bf00
                      which belongs to the cache kmalloc-192 of size 192
      [  279.989747] The buggy address is located 32 bytes inside of
                      freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0)
      
      [  279.992367] The buggy address belongs to the physical page:
      [  279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a
      [  279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [  279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2)
      [  279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001
      [  279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
      [  280.000894] page dumped because: kasan: bad access detected
      
      [  280.002386] Memory state around the buggy address:
      [  280.003338]  ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.004781]  ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [  280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.007700]                                ^
      [  280.008592]  ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [  280.010035]  ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.011564] ==================================================================
      
      Fixes: 59094b1e ("net: sched: use flow block API")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
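The fix described in this commit amounts to unlinking each callback object from every list it is on before freeing it. Below is a minimal userspace sketch of that pattern, assuming a hand-rolled circular list. All names (`sketch_node`, `sketch_block_cb`, `sketch_cb_alloc`, `sketch_cb_list_free_all`) are hypothetical and are not the kernel's flow_block API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list node (not the kernel's list_head). */
struct sketch_node {
	struct sketch_node *prev, *next;
};

static void sketch_list_init(struct sketch_node *head)
{
	head->prev = head->next = head;
}

static void sketch_list_add_tail(struct sketch_node *n, struct sketch_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void sketch_list_del(struct sketch_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static bool sketch_list_empty(const struct sketch_node *head)
{
	return head->next == head;
}

/* An object that is linked into two lists at the same time. */
struct sketch_block_cb {
	struct sketch_node list;        /* per-request callback list; must stay first */
	struct sketch_node driver_list; /* list kept by the driver */
};

static struct sketch_block_cb *sketch_cb_alloc(struct sketch_node *cb_list,
					       struct sketch_node *driver_list)
{
	struct sketch_block_cb *cb = malloc(sizeof(*cb));

	if (!cb)
		return NULL;
	sketch_list_add_tail(&cb->list, cb_list);
	sketch_list_add_tail(&cb->driver_list, driver_list);
	return cb;
}

/* Error path: free every object on cb_list. */
static void sketch_cb_list_free_all(struct sketch_node *cb_list)
{
	assert(cb_list != NULL);
	while (!sketch_list_empty(cb_list)) {
		/* Valid cast because `list` is the first member. */
		struct sketch_block_cb *cb = (struct sketch_block_cb *)cb_list->next;

		sketch_list_del(&cb->list);
		/* Without this unlink the driver list would keep a dangling pointer. */
		sketch_list_del(&cb->driver_list);
		free(cb);
	}
}
```

After `sketch_cb_list_free_all()` returns, both lists are empty, so a later walk of the driver list cannot touch freed memory.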
    • tcp: fix skb_copy_ubufs() vs BIG TCP · 7e692df3
      Eric Dumazet authored
      David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy
      using hugepages, and skb length bigger than ~68 KB.
      
      skb_copy_ubufs() assumed it could copy all payload using up to
      MAX_SKB_FRAGS order-0 pages.
      
      This assumption broke when BIG TCP was able to put up to 512 KB per skb.
      
      We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45
      and limit gso_max_size to 180000.
      
      A solution is to use higher order pages if needed.
      
      v2: add missing __GFP_COMP, or we leak memory.
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Reported-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/netdev/c70000f6-baa4-4a05-46d0-4b3e0dc1ccc8@gmail.com/T/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Coco Li <lixiaoyan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
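The "higher order pages if needed" idea can be illustrated by computing the smallest page order at which a fixed number of fragments still covers the payload. This is a userspace sketch only: `sketch_min_page_order` is a hypothetical helper, and the 4 KiB page size and 17-fragment limit used in the note after it are assumptions for illustration (17 x 4 KiB = 68 KiB, which matches the ~68 KB threshold in the message).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that max_frags pages of (page_size << order) bytes
 * can hold len bytes. Hypothetical helper, not the kernel function.
 */
static unsigned int sketch_min_page_order(size_t len, size_t page_size,
					  size_t max_frags)
{
	unsigned int order = 0;

	assert(page_size > 0 && max_frags > 0);
	while ((page_size << order) * max_frags < len)
		order++;
	return order;
}
```

Under those assumed values, a 64 KiB payload fits in order-0 pages, while a 512 KiB payload needs order-3 (32 KiB) pages.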
    • net/ncsi: clear Tx enable mode when handling a Config required AEN · 6f75cd16
      Cosmo Chou authored
      ncsi_channel_is_tx() determines whether a given channel should be
      used for Tx or not. However, when reconfiguring the channel by
      handling a Configuration Required AEN, there is a misjudgment that
      the channel Tx has already been enabled, which results in the Enable
      Channel Network Tx command not being sent.
      
      Clear the channel Tx enable flag before reconfiguring the channel to
      avoid the misjudgment.
      
      Fixes: 8d951a75 ("net/ncsi: Configure multi-package, multi-channel modes with failover")
      Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 27 Apr, 2023 14 commits
    • Merge branch 'macsec-fixes-for-cn10kb' · 075cafff
      Paolo Abeni authored
      Geetha sowjanya says:
      
      ====================
      Macsec fixes for CN10KB
      
      This patch set has fixes for the issues encountered while
      testing macsec on CN10KB silicon. Below is the description
      of patches:
      
      Patch 1: For each LMAC two MCSX_MCS_TOP_SLAVE_CHANNEL_CFG registers exist
      	 in CN10KB. Bypass has to be disabled in two registers.
      
      Patch 2: Add workaround for errata w.r.t accessing TCAM DATA and MASK registers.
      
      Patch 3: Fixes the parser configuration to allow PTP traffic.
      
      Patch 4: Addresses the IP vector and block level interrupt mask changes.
      
      Patch 5: Fix NULL pointer crashes when rebooting
      
      Patch 6: Since MCS is global block shared by all LMACS the TCAM match
      	 must include macsec DMAC also to distinguish each macsec interface
      
      Patch 7: Before freeing MCS hardware resource to AF clear the stats also.
      
      Patch 8: Stats which share single counter in hardware are tracked in software.
      	 This tracking was based on wrong secy mode params.
      	 Use correct secy mode params
      
      Patch 9: When updating secy mode params, PN number was also reset to
      	 initial values. Hence do not write to PN value register when
      	 updating secy.
      ====================
      
      Link: https://lore.kernel.org/r/20230426062528.20575-1-gakula@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-pf: mcs: Do not reset PN while updating secy · 3c99bace
      Subbaraya Sundeep authored
      After creating SecYs, SCs and SAs, a SecY can be modified
      to change attributes like validation mode, protect frames
      mode, etc. During this SecY update, the packet number is mistakenly
      reset to the initial user-given value. Hence do not reset
      PN when updating SecY parameters.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-pf: mcs: Fix shared counters logic · 9bdfe610
      Subbaraya Sundeep authored
      Macsec stats like InPktsLate and InPktsDelayed share the
      same counter in hardware. If SecY replay_protect is true,
      the counter represents InPktsLate; otherwise InPktsDelayed.
      This mode change was mistakenly tracked based on protect_frames
      instead of replay_protect. Similarly, InPktsUnchecked
      and InPktsOk share the same counter, and the mode change was tracked
      based on validate_check instead of validate_disabled.
      This patch fixes those problems.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
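The shared-counter rule from this commit message can be sketched as a small accounting helper: one hardware counter delta is credited to InPktsLate when replay protection is on and to InPktsDelayed otherwise. The struct and function names are hypothetical and do not reflect the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software copy of the two stats backed by one hardware counter. */
struct sketch_rx_stats {
	uint64_t in_pkts_late;
	uint64_t in_pkts_delayed;
};

/* Credit a hardware counter delta to the stat selected by replay_protect. */
static void sketch_account_shared(struct sketch_rx_stats *stats,
				  bool replay_protect, uint64_t delta)
{
	assert(stats != NULL);
	if (replay_protect)
		stats->in_pkts_late += delta;
	else
		stats->in_pkts_delayed += delta;
}
```

Keying the choice on any other SecY attribute, such as protect_frames, would credit the delta to the wrong stat, which is the class of bug the message describes.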
    • octeontx2-pf: mcs: Clear stats before freeing resource · 815debbb
      Subbaraya Sundeep authored
      When freeing MCS hardware resources like SecY, SC and
      SA, the corresponding stats need to be cleared. Otherwise,
      previous stats are shown in newly created macsec interfaces.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-pf: mcs: Match macsec ethertype along with DMAC · 57d00d43
      Subbaraya Sundeep authored
      On CN10KB silicon a single hardware macsec block is
      present and offloads macsec operations for all the
      ethernet LMACs. A TCAM match on the macsec ethertype 0x88e5
      alone at the RX side is not sufficient to distinguish all the
      macsec interfaces created on top of netdevs. Hence append
      the DMAC of the macsec interface too. Otherwise, only the first
      created macsec interface receives all the macsec traffic.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-pf: mcs: Fix NULL pointer dereferences · 699af748
      Subbaraya Sundeep authored
      When the system was rebooted after creating a macsec interface,
      the NULL pointer dereference crashes below occurred. This
      patch fixes those crashes by using the correct order of teardown.
      
      [ 3324.406942] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [ 3324.415726] Mem abort info:
      [ 3324.418510]   ESR = 0x96000006
      [ 3324.421557]   EC = 0x25: DABT (current EL), IL = 32 bits
      [ 3324.426865]   SET = 0, FnV = 0
      [ 3324.429913]   EA = 0, S1PTW = 0
      [ 3324.433047] Data abort info:
      [ 3324.435921]   ISV = 0, ISS = 0x00000006
      [ 3324.439748]   CM = 0, WnR = 0
      ....
      [ 3324.575915] Call trace:
      [ 3324.578353]  cn10k_mdo_del_secy+0x24/0x180
      [ 3324.582440]  macsec_common_dellink+0xec/0x120
      [ 3324.586788]  macsec_notify+0x17c/0x1c0
      [ 3324.590529]  raw_notifier_call_chain+0x50/0x70
      [ 3324.594965]  call_netdevice_notifiers_info+0x34/0x7c
      [ 3324.599921]  rollback_registered_many+0x354/0x5bc
      [ 3324.604616]  unregister_netdevice_queue+0x88/0x10c
      [ 3324.609399]  unregister_netdev+0x20/0x30
      [ 3324.613313]  otx2_remove+0x8c/0x310
      [ 3324.616794]  pci_device_shutdown+0x30/0x70
      [ 3324.620882]  device_shutdown+0x11c/0x204
      
      [  966.664930] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  966.673712] Mem abort info:
      [  966.676497]   ESR = 0x96000006
      [  966.679543]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  966.684848]   SET = 0, FnV = 0
      [  966.687895]   EA = 0, S1PTW = 0
      [  966.691028] Data abort info:
      [  966.693900]   ISV = 0, ISS = 0x00000006
      [  966.697729]   CM = 0, WnR = 0
      [  966.833467] Call trace:
      [  966.835904]  cn10k_mdo_stop+0x20/0xa0
      [  966.839557]  macsec_dev_stop+0xe8/0x11c
      [  966.843384]  __dev_close_many+0xbc/0x140
      [  966.847298]  dev_close_many+0x84/0x120
      [  966.851039]  rollback_registered_many+0x114/0x5bc
      [  966.855735]  unregister_netdevice_many.part.0+0x14/0xa0
      [  966.860952]  unregister_netdevice_many+0x18/0x24
      [  966.865560]  macsec_notify+0x1ac/0x1c0
      [  966.869303]  raw_notifier_call_chain+0x50/0x70
      [  966.873738]  call_netdevice_notifiers_info+0x34/0x7c
      [  966.878694]  rollback_registered_many+0x354/0x5bc
      [  966.883390]  unregister_netdevice_queue+0x88/0x10c
      [  966.888173]  unregister_netdev+0x20/0x30
      [  966.892090]  otx2_remove+0x8c/0x310
      [  966.895571]  pci_device_shutdown+0x30/0x70
      [  966.899660]  device_shutdown+0x11c/0x204
      [  966.903574]  __do_sys_reboot+0x208/0x290
      [  966.907487]  __arm64_sys_reboot+0x20/0x30
      [  966.911489]  el0_svc_handler+0x80/0x1c0
      [  966.915316]  el0_svc+0x8/0x180
      [  966.918362] Code: f9400000 f9400a64 91220014 f94b3403 (f9400060)
      [  966.924448] ---[ end trace 341778e799c3d8d7 ]---
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: mcs: Fix MCS block interrupt · b8aebeaa
      Geetha sowjanya authored
      On CN10KB, the MCS IP vector number and the BBE and PAB interrupt
      masks were changed to support more block level interrupts.
      To address these changes, this patch fixes the BBE and PAB
      interrupt handlers.
      
      Fixes: 6c635f78 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts")
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: mcs: Config parser to skip 8B header · 65cdc2b6
      Geetha sowjanya authored
      When PTP timestamping is enabled in RPM, RPM appends an 8B
      timestamp header to all RX traffic. MCS needs to skip this
      8-byte header while parsing the packet header, so that
      the correct TCAM key is created for lookup.
      This patch fixes the MCS parser configuration to skip this
      8B header for PTP packets.
      
      Fixes: ca7f49ff ("octeontx2-af: cn10k: Introduce driver for macsec block.")
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: mcs: Write TCAM_DATA and TCAM_MASK registers at once · b5161219
      Subbaraya Sundeep authored
      As per a hardware erratum on CN10KB, all four TCAM_DATA
      and TCAM_MASK registers have to be written at once; otherwise
      writes to individual registers will fail. Hence write to all
      TCAM_DATA registers and then to all TCAM_MASK registers.
      
      Fixes: cfc14181 ("octeontx2-af: cn10k: mcs: Manage the MCS block hardware resources")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeonxt2-af: mcs: Fix per port bypass config · c222b292
      Geetha sowjanya authored
      For each LMAC port, MCS has two MCS_TOP_SLAVE_CHANNEL_CONFIGX
      registers. On CN10KB, both registers need to be configured for
      port level MCS bypass to work. This patch also sets the bitmap
      of flowid/secy entries reserved for default bypass so that these
      entries can be shown in debugfs.
      
      Fixes: bd69476e ("octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ixgbe: Fix panic during XDP_TX with > 64 CPUs · c23ae509
      John Hickey authored
      Commit 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus")
      adds support to allow XDP programs to run on systems with more than
      64 CPUs by locking the XDP TX rings and indexing them using cpu % 64
      (IXGBE_MAX_XDP_QS).
      
      Upon trying this patch out on a system with more than 64 cores,
      the kernel panicked with an array-index-out-of-bounds at the return in
      ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx
      was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS.  An example
      splat:
      
       ==========================================================================
       UBSAN: array-index-out-of-bounds in
       /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
       index 65 is out of range for type 'ixgbe_ring *[64]'
       ==========================================================================
       BUG: kernel NULL pointer dereference, address: 0000000000000058
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP NOPTI
       CPU: 65 PID: 408 Comm: ksoftirqd/65
       Tainted: G          IOE     5.15.0-48-generic #54~20.04.1-Ubuntu
       Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
       RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
       Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
       00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7
       47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
       RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
       RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
       RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
       RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
       R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
       R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
       FS:  0000000000000000(0000) GS:ffff92b9dfc00000(0000)
       knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        <TASK>
        ixgbe_poll+0x103e/0x1280 [ixgbe]
        ? sched_clock_cpu+0x12/0xe0
        __napi_poll+0x30/0x160
        net_rx_action+0x11c/0x270
        __do_softirq+0xda/0x2ee
        run_ksoftirqd+0x2f/0x50
        smpboot_thread_fn+0xb7/0x150
        ? sort_range+0x30/0x30
        kthread+0x127/0x150
        ? set_kthread_struct+0x50/0x50
        ret_from_fork+0x1f/0x30
        </TASK>
      
      I think this is how it happens:
      
      Upon loading the first XDP program on a system with more than 64 CPUs,
      ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup.  However,
      immediately after this, the rings are reconfigured by ixgbe_setup_tc.
      ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
      ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
      ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
      it is non-zero.  Commenting out the decrement in ixgbe_free_q_vector
      stopped my system from panicking.
      
      I suspect to make the original patch work, I would need to load an XDP
      program and then replace it in order to get ixgbe_xdp_locking_key back
      above 0 since ixgbe_setup_tc is only called when transitioning between
      XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
      incremented every time ixgbe_xdp_setup is called.
      
      Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
      becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
      with more than 64 CPUs.
      
      Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied
      to the number of CPUs present, there is no reason to disable it upon
      unloading an XDP program.  To avoid confusion, I have moved enabling
      ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path.
      
      Fixes: 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus")
      Signed-off-by: John Hickey <jjh@daedalian.us>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c23ae509
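      The sequence described in the ixgbe commit above can be modeled with a
      plain counter. This is only a sketch: the model_* names and the integer
      counter are hypothetical stand-ins, not the driver's functions or the
      kernel static-key API. It shows why one increment per program load
      cannot survive one decrement per freed q_vector.

```c
#include <assert.h>

/* Hypothetical stand-in for the enable count behind ixgbe_xdp_locking_key. */
int model_locking_key;

/* Models ixgbe_xdp_setup(): one increment per XDP program load. */
void model_xdp_setup(void)
{
    model_locking_key++;
}

/* Models ixgbe_free_q_vector(): one decrement per call while non-zero. */
void model_free_q_vector(void)
{
    if (model_locking_key > 0)
        model_locking_key--;
}

/* Models the ring reconfiguration: every q_vector is freed in a loop. */
void model_setup_tc(int num_q_vectors)
{
    assert(num_q_vectors >= 0);
    for (int i = 0; i < num_q_vectors; i++)
        model_free_q_vector();
}

/* Load the first XDP program, then reconfigure; returns the final count. */
int model_first_load(int num_q_vectors)
{
    model_locking_key = 0;
    model_xdp_setup();
    model_setup_tc(num_q_vectors);
    return model_locking_key;
}
```

      In this model, any reconfiguration that frees at least one vector
      leaves the count at zero after the first load, which matches the
      behavior the commit message describes.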
    • Pedro Tammela's avatar
      net/sched: act_pedit: free pedit keys on bail from offset check · 1b483d9f
      Pedro Tammela authored
      Ido Schimmel reports a memleak on a syzkaller instance:
         BUG: memory leak
         unreferenced object 0xffff88803d45e400 (size 1024):
           comm "syz-executor292", pid 563, jiffies 4295025223 (age 51.781s)
           hex dump (first 32 bytes):
             28 bd 70 00 fb db df 25 02 00 14 1f ff 02 00 02  (.p....%........
             00 32 00 00 1f 00 00 00 ac 14 14 3e 08 00 07 00  .2.........>....
           backtrace:
             [<ffffffff81bd0f2c>] kmemleak_alloc_recursive include/linux/kmemleak.h:42 [inline]
             [<ffffffff81bd0f2c>] slab_post_alloc_hook mm/slab.h:772 [inline]
             [<ffffffff81bd0f2c>] slab_alloc_node mm/slub.c:3452 [inline]
             [<ffffffff81bd0f2c>] __kmem_cache_alloc_node+0x25c/0x320 mm/slub.c:3491
             [<ffffffff81a865d9>] __do_kmalloc_node mm/slab_common.c:966 [inline]
             [<ffffffff81a865d9>] __kmalloc+0x59/0x1a0 mm/slab_common.c:980
             [<ffffffff83aa85c3>] kmalloc include/linux/slab.h:584 [inline]
             [<ffffffff83aa85c3>] tcf_pedit_init+0x793/0x1ae0 net/sched/act_pedit.c:245
             [<ffffffff83a90623>] tcf_action_init_1+0x453/0x6e0 net/sched/act_api.c:1394
             [<ffffffff83a90e58>] tcf_action_init+0x5a8/0x950 net/sched/act_api.c:1459
             [<ffffffff83a96258>] tcf_action_add+0x118/0x4e0 net/sched/act_api.c:1985
             [<ffffffff83a96997>] tc_ctl_action+0x377/0x490 net/sched/act_api.c:2044
             [<ffffffff83920a8d>] rtnetlink_rcv_msg+0x46d/0xd70 net/core/rtnetlink.c:6395
             [<ffffffff83b24305>] netlink_rcv_skb+0x185/0x490 net/netlink/af_netlink.c:2575
             [<ffffffff83901806>] rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6413
             [<ffffffff83b21cae>] netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
             [<ffffffff83b21cae>] netlink_unicast+0x5be/0x8a0 net/netlink/af_netlink.c:1365
             [<ffffffff83b2293f>] netlink_sendmsg+0x9af/0xed0 net/netlink/af_netlink.c:1942
             [<ffffffff8380c39f>] sock_sendmsg_nosec net/socket.c:724 [inline]
             [<ffffffff8380c39f>] sock_sendmsg net/socket.c:747 [inline]
             [<ffffffff8380c39f>] ____sys_sendmsg+0x3ef/0xaa0 net/socket.c:2503
             [<ffffffff838156d2>] ___sys_sendmsg+0x122/0x1c0 net/socket.c:2557
             [<ffffffff8381594f>] __sys_sendmsg+0x11f/0x200 net/socket.c:2586
             [<ffffffff83815ab0>] __do_sys_sendmsg net/socket.c:2595 [inline]
             [<ffffffff83815ab0>] __se_sys_sendmsg net/socket.c:2593 [inline]
             [<ffffffff83815ab0>] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2593
      
      The recently added static offset check missed a free to the key buffer when
      bailing out on error.
      
      Fixes: e1201bc7 ("net/sched: act_pedit: check static offsets a priori")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20230425144725.669262-1-pctammela@mojatatu.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1b483d9f
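      The leak pattern in the act_pedit commit above is a validation failure
      that returns without releasing a buffer allocated earlier. The sketch
      below is a userspace illustration of the corrected shape, not the
      kernel code: struct model_key, model_pedit_init(), the live_buffers
      counter, and the 4-byte alignment rule are all assumptions made for the
      example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical key layout; the real pedit key carries more fields. */
struct model_key {
    unsigned int off;
};

/* Counts buffers allocated by this model and not yet released by it. */
int live_buffers;

/* Copies the keys into a new buffer after validating every offset.
 * Returns 0 and stores the buffer in *out, or a negative errno. */
int model_pedit_init(const struct model_key *in, size_t nkeys,
                     struct model_key **out)
{
    struct model_key *keys;
    int err = 0;

    assert(in != NULL && out != NULL);
    if (nkeys == 0)
        return -EINVAL;

    keys = malloc(nkeys * sizeof(*keys));
    if (!keys)
        return -ENOMEM;
    live_buffers++;

    for (size_t i = 0; i < nkeys; i++) {
        /* Assumed static check: offsets must be 4-byte aligned. */
        if (in[i].off % 4) {
            err = -EINVAL;
            goto out_free; /* bailing out must release the key buffer */
        }
        keys[i] = in[i];
    }

    *out = keys; /* ownership passes to the caller on success */
    return 0;

out_free:
    free(keys);
    live_buffers--;
    return err;
}
```

      Routing every failure after the allocation through a single cleanup
      label keeps later checks from silently skipping the free.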
    • Ivan Vecera's avatar
      net/sched: flower: Fix wrong handle assignment during filter change · 32eff6ba
      Ivan Vecera authored
      Commit 08a0063d ("net/sched: flower: Move filter handle initialization
      earlier") moved filter handle initialization, but the handle is now
      assigned to fnew->handle regardless of the value of fold. This is wrong
      because if fold != NULL (so fold->handle == handle), no new handle is
      allocated and the passed handle is assigned to fnew->handle. If any
      subsequent step in fl_change() then fails, the handle is removed from
      the IDR, which is incorrect: the old filter instance is still valid but
      its handle is no longer present in the IDR.
      Fix this by moving the assignment so that it is done only when the
      passed fold == NULL.
      
      Prior to the patch:
      [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
      Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
      exit: 123
      exit: 0
      RTNETLINK answers: Invalid argument
      We have an error talking to the kernel
      Command failed tmp/replace_6:1885
      
      All test results:
      
      1..1
      not ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
              Command exited with 123, expected 0
      RTNETLINK answers: Invalid argument
      We have an error talking to the kernel
      Command failed tmp/replace_6:1885
      
      After the patch:
      [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
      Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
      
      All test results:
      
      1..1
      ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
      
      Fixes: 08a0063d ("net/sched: flower: Move filter handle initialization earlier")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230425140604.169881-1-ivecera@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      32eff6ba
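      The flower commit above turns on which call owns a handle in the IDR.
      The sketch below is a simplified model under stated assumptions: a
      boolean array stands in for the IDR, and model_change(), struct
      model_filter, and the step_fails flag are hypothetical names, not the
      fl_change() implementation. It shows the corrected rule, where the
      error path undoes only a handle allocated by the same call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HANDLE 8

/* Stand-in for the handle IDR: true means the handle is present. */
bool model_idr[MODEL_MAX_HANDLE];

struct model_filter {
    unsigned int handle;
};

/* fold is the existing filter on replace and NULL on create.
 * step_fails simulates a failure in a later step of the change.
 * Returns 0 on success and -1 on failure. */
int model_change(struct model_filter *fold, struct model_filter *fnew,
                 unsigned int handle, bool step_fails)
{
    assert(fnew != NULL);
    fnew->handle = 0;

    if (!fold) {
        if (handle == 0 || handle >= MODEL_MAX_HANDLE || model_idr[handle])
            return -1;
        model_idr[handle] = true;
        fnew->handle = handle; /* assigned only for a new allocation */
    }

    if (step_fails) {
        /* Undo only what this call allocated. */
        if (fnew->handle)
            model_idr[fnew->handle] = false;
        return -1;
    }

    if (fold)
        fnew->handle = fold->handle; /* replacement keeps the live handle */
    return 0;
}
```

      If the assignment were made unconditionally before the failing step,
      a failed replace would clear the entry that still belongs to the old
      filter, which is the state the commit message describes.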
    • David Howells's avatar
      rxrpc: Fix potential data race in rxrpc_wait_to_be_connected() · 2b5fdc0f
      David Howells authored
      Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to
      see if it should exit the loop without first checking the call state.  This
      is probably safe as if call->error is set, the call is dead anyway, but we
      should probably wait for the call state to have been set to completion
      first, lest it cause surprise on the way out.
      
      Fix this by only accessing call->error if the call is complete.  We don't
      actually need to access the error inside the loop as we'll do that after.
      
      This caused the following report:
      
          BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion
      
          write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1:
           rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22
           rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479
           rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714
           rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline]
           rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350
           rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464
           rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461
           kthread+0x1ac/0x1e0 kernel/kthread.c:376
           ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
          read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0:
           rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296
           rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726
           rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565
           sock_sendmsg_nosec net/socket.c:724 [inline]
           sock_sendmsg net/socket.c:747 [inline]
           ____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
           ___sys_sendmsg net/socket.c:2555 [inline]
           __sys_sendmmsg+0x263/0x500 net/socket.c:2641
           __do_sys_sendmmsg net/socket.c:2670 [inline]
           __se_sys_sendmmsg net/socket.c:2667 [inline]
           __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667
           do_syscall_x64 arch/x86/entry/common.c:50 [inline]
           do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
           entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
          value changed: 0x00000000 -> 0xffffffea
      
      Fixes: 9d35d880 ("rxrpc: Move client call connection to the I/O thread")
      Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: Dmitry Vyukov <dvyukov@google.com>
      cc: "David S. Miller" <davem@davemloft.net>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Jakub Kicinski <kuba@kernel.org>
      cc: Paolo Abeni <pabeni@redhat.com>
      cc: linux-afs@lists.infradead.org
      cc: linux-fsdevel@vger.kernel.org
      cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.uk
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2b5fdc0f
  4. 26 Apr, 2023 13 commits
    • Linus Torvalds's avatar
      Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 6e98b09d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
           default value allows for better BIG TCP performance
      
         - Reduce compound page head access for zero-copy data transfers
      
         - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
           possible
      
         - Threaded NAPI improvements, adding defer skb free support and
           unneeded softirq avoidance
      
         - Address dst_entry reference count scalability issues, via false
           sharing avoidance and optimize refcount tracking
      
         - Add lockless accesses annotation to sk_err[_soft]
      
         - Optimize again the skb struct layout
      
         - Extends the skb drop reasons to make it usable by multiple
           subsystems
      
         - Better const qualifier awareness for socket casts
      
        BPF:
      
         - Add skb and XDP typed dynptrs which allow BPF programs for more
           ergonomic and less brittle iteration through data and
           variable-sized accesses
      
         - Add a new BPF netfilter program type and minimal support to hook
           BPF programs to netfilter hooks such as prerouting or forward
      
         - Add more precise memory usage reporting for all BPF map types
      
         - Adds support for using {FOU,GUE} encap with an ipip device
           operating in collect_md mode and add a set of BPF kfuncs for
           controlling encap params
      
         - Allow BPF programs to detect at load time whether a particular
           kfunc exists or not, and also add support for this in light
           skeleton
      
         - Bigger batch of BPF verifier improvements to prepare for upcoming
           BPF open-coded iterators allowing for less restrictive looping
           capabilities
      
         - Rework RCU enforcement in the verifier, add kptr_rcu and enforce
           BPF programs to NULL-check before passing such pointers into kfunc
      
         - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
           in local storage maps
      
         - Enable RCU semantics for task BPF kptrs and allow referenced kptr
           tasks to be stored in BPF maps
      
         - Add support for refcounted local kptrs to the verifier for allowing
           shared ownership, useful for adding a node to both the BPF list and
           rbtree
      
         - Add BPF verifier support for ST instructions in
           convert_ctx_access() which will help new -mcpu=v4 clang flag to
           start emitting them
      
         - Add ARM32 USDT support to libbpf
      
         - Improve bpftool's visual program dump which produces the control
           flow graph in a DOT format by adding C source inline annotations
      
        Protocols:
      
         - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
           indicates the provenance of the IP address
      
         - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
      
         - Add the handshake upcall mechanism, allowing the user-space to
           implement generic TLS handshake on kernel's behalf
      
         - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
           resilience to node failures
      
         - SCTP: add support for Fair Capacity and Weighted Fair Queueing
           schedulers
      
         - MPTCP: delay first subflow allocation up to its first usage. This
           will allow for later better LSM interaction
      
         - xfrm: Remove inner/outer modes from input/output path. These are
           not needed anymore
      
         - WiFi:
            - reduced neighbor report (RNR) handling for AP mode
            - HW timestamping support
            - support for randomized auth/deauth TA for PASN privacy
            - per-link debugfs for multi-link
            - TC offload support for mac80211 drivers
            - mac80211 mesh fast-xmit and fast-rx support
            - enable Wi-Fi 7 (EHT) mesh support
      
        Netfilter:
      
         - Add nf_tables 'brouting' support, to force a packet to be routed
           instead of being bridged
      
         - Update bridge netfilter and ovs conntrack helpers to handle IPv6
           Jumbo packets properly, i.e. fetch the packet length from
           hop-by-hop extension header. This is needed for BIG TCP support
      
         - The iptables 32bit compat interface isn't compiled in by default
           anymore
      
         - Move ip(6)tables builtin icmp matches to the udptcp one. This has
           the advantage that icmp/icmpv6 match doesn't load the
           iptables/ip6tables modules anymore when iptables-nft is used
      
         - Extended netlink error report for netdevice in flowtables and
           netdev/chains. Allow incrementally adding/deleting devices in a
           netdev basechain. Allow creating a netdev chain without a device
      
        Driver API:
      
         - Remove redundant Device Control Error Reporting Enable, as PCI core
           has already error reporting enabled at enumeration time
      
         - Move Multicast DB netlink handlers to core, allowing devices other
           than bridge to use them
      
         - Allow the page_pool to directly recycle the pages from safely
           localized NAPI
      
         - Implement lockless TX queue stop/wake combo macros, allowing for
           further code de-duplication and sanitization
      
         - Add YNL support for user headers and struct attrs
      
         - Add partial YNL specification for devlink
      
         - Add partial YNL specification for ethtool
      
         - Add tc-mqprio and tc-taprio support for preemptible traffic classes
      
         - Add tx push buf len param to ethtool, specifies the maximum number
           of bytes of a transmitted packet a driver can push directly to the
           underlying device
      
         - Add basic LED support for switch/phy
      
         - Add NAPI documentation, stop relying on external links
      
         - Convert dsa_master_ioctl() to netdev notifier. This is a
           preparatory work to make the hardware timestamping layer selectable
           by user space
      
         - Add transceiver support and improve the error messages for CAN-FD
           controllers
      
        New hardware / drivers:
      
         - Ethernet:
            - AMD/Pensando core device support
            - MediaTek MT7981 SoC
            - MediaTek MT7988 SoC
            - Broadcom BCM53134 embedded switch
            - Texas Instruments CPSW9G ethernet switch
            - Qualcomm EMAC3 DWMAC ethernet
            - StarFive JH7110 SoC
            - NXP CBTX ethernet PHY
      
         - WiFi:
            - Apple M1 Pro/Max devices
            - RealTek rtl8710bu/rtl8188gu
            - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
      
         - Bluetooth:
            - Realtek RTL8821CS, RTL8851B, RTL8852BS
            - Mediatek MT7663, MT7922
            - NXP w8997
            - Actions Semi ATS2851
            - QTI WCN6855
            - Marvell 88W8997
      
         - Can:
            - STMicroelectronics bxcan stm32f429
      
        Drivers:
      
         - Ethernet NICs:
            - Intel (1G, igc):
               - add tracking and reporting of QBV config errors
               - add support for configuring max SDU for each Tx queue
            - Intel (100G, ice):
               - refactor mailbox overflow detection to support Scalable IOV
               - GNSS interface optimization
            - Intel (i40e):
               - support XDP multi-buffer
            - nVidia/Mellanox:
               - add the support for linux bridge multicast offload
               - enable TC offload for egress and ingress MACVLAN over bond
               - add support for VxLAN GBP encap/decap flows offload
               - extend packet offload to fully support libreswan
               - support tunnel mode in mlx5 IPsec packet offload
               - extend XDP multi-buffer support
               - support MACsec VLAN offload
               - add support for dynamic msix vectors allocation
               - drop RX page_cache and fully use page_pool
               - implement thermal zone to report NIC temperature
            - Netronome/Corigine:
               - add support for multi-zone conntrack offload
            - Solarflare/Xilinx:
               - support offloading TC VLAN push/pop actions to the MAE
               - support TC decap rules
               - support unicast PTP
      
         - Other NICs:
            - Broadcom (bnxt): enforce software based freq adjustments only on
              shared PHC NIC
            - RealTek (r8169): refactor to address ASPM issues during NAPI poll
            - Micrel (lan8841): add support for PTP_PF_PEROUT
            - Cadence (macb): enable PTP unicast
            - Engleder (tsnep): add XDP socket zero-copy support
            - virtio-net: implement exact header length guest feature
            - veth: add page_pool support for page recycling
            - vxlan: add MDB data path support
            - gve: add XDP support for GQI-QPL format
            - geneve: accept every ethertype
            - macvlan: allow some packets to bypass broadcast queue
            - mana: add support for jumbo frame
      
         - Ethernet high-speed switches:
            - Microchip (sparx5): Add support for TC flower templates
      
         - Ethernet embedded switches:
            - Broadcom (b53):
               - configure 6318 and 63268 RGMII ports
            - Marvell (mv88e6xxx):
               - faster C45 bus scan
            - Microchip:
               - lan966x:
                  - add support for IS1 VCAP
                  - better TX/RX from/to CPU performances
               - ksz9477: add ETS Qdisc support
               - ksz8: enhance static MAC table operations and error handling
               - sama7g5: add PTP capability
            - NXP (ocelot):
               - add support for external ports
               - add support for preemptible traffic classes
            - Texas Instruments:
               - add CPSWxG SGMII support for J7200 and J721E
      
         - Intel WiFi (iwlwifi):
            - preparation for Wi-Fi 7 EHT and multi-link support
            - EHT (Wi-Fi 7) sniffer support
            - hardware timestamping support for some devices/firmwares
            - TX beacon protection on newer hardware
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - MU-MIMO parameters support
            - ack signal support for management packets
      
         - RealTek WiFi (rtw88):
            - SDIO bus support
            - better support for some SDIO devices (e.g. MAC address from
              efuse)
      
         - RealTek WiFi (rtw89):
            - HW scan support for 8852b
            - better support for 6 GHz scanning
            - support for various newer firmware APIs
            - framework firmware backwards compatibility
      
         - MediaTek WiFi (mt76):
            - P2P support
            - mesh A-MSDU support
            - EHT (Wi-Fi 7) support
            - coredump support"
      
      * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
        net: phy: hide the PHYLIB_LEDS knob
        net: phy: marvell-88x2222: remove unnecessary (void*) conversions
        tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
        net: amd: Fix link leak when verifying config failed
        net: phy: marvell: Fix inconsistent indenting in led_blink_set
        lan966x: Don't use xdp_frame when action is XDP_TX
        tsnep: Add XDP socket zero-copy TX support
        tsnep: Add XDP socket zero-copy RX support
        tsnep: Move skb receive action to separate function
        tsnep: Add functions for queue enable/disable
        tsnep: Rework TX/RX queue initialization
        tsnep: Replace modulo operation with mask
        net: phy: dp83867: Add led_brightness_set support
        net: phy: Fix reading LED reg property
        drivers: nfc: nfcsim: remove return value check of `dev_dir`
        net: phy: dp83867: Remove unnecessary (void*) conversions
        net: ethtool: coalesce: try to make user settings stick twice
        net: mana: Check if netdev/napi_alloc_frag returns single page
        net: mana: Rename mana_refill_rxoob and remove some empty lines
        net: veth: add page_pool stats
        ...
      6e98b09d
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · b68ee1c6
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
        mpi3mr, hisi_sas, arcmsr).
      
        The major core change is the constification of the host templates
        (which touches everything) along with other minor fixups and clean
        ups"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
        scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
        scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
        scsi: cxlflash: s/semahpore/semaphore/
        scsi: lpfc: Silence an incorrect device output
        scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
        scsi: scsi_debug: Fix missing error code in scsi_debug_init()
        scsi: hisi_sas: Work around build failure in suspend function
        scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
        scsi: mpt3sas: Fix an issue when driver is being removed
        scsi: mpt3sas: Remove HBA BIOS version in the kernel log
        scsi: target: core: Fix invalid memory access
        scsi: scsi_debug: Drop sdebug_queue
        scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
        scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
        scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
        scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
        scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
        scsi: scsi_debug: Use scsi_block_requests() to block queues
        scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
        scsi: scsi_debug: Change shost list lock to a mutex
        ...
      b68ee1c6
    • Linus Torvalds's avatar
      Merge tag 'ata-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 36006b1d
      Linus Torvalds authored
      Pull ata updates from Damien Le Moal:
      
       - Many cleanups of the pata_parport driver and of its protocol modules
         (Ondrej)
      
       - Remove unused code (ata_id_xxx() functions) (Sergey)
      
       - Add UniPhier SATA controller DT bindings (Kunihiko)
      
       - Fix dependencies for the Freescale QorIQ AHCI SATA controller driver
         (Geert)
      
       - DT property handling improvements (Rob)
      
      * tag 'ata-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (57 commits)
        ata: pata_parport-bpck6: Declare mode_map as static
        ata: pata_parport-bpck6: Remove dependency on 64BIT
        ata: pata_parport-bpck6: reduce indents in bpck6_open
        ata: pata_parport-bpck6: delete ppc6lnx.c
        ata: pata_parport-bpck6: move defines and mode_map to bpck6.c
        ata: pata_parport-bpck6: move ppc6_wr_data_byte to bpck6.c and rename
        ata: pata_parport-bpck6: move ppc6_rd_data_byte to bpck6.c and rename
        ata: pata_parport-bpck6: move ppc6_send_cmd to bpck6.c and rename
        ata: pata_parport-bpck6: move ppc6_deselect to bpck6.c and rename
        ata: pata_parport-bpck6: merge ppc6_select into bpck6_open
        ata: pata_parport-bpck6: move ppc6_open to bpck6.c and rename
        ata: pata_parport-bpck6: move ppc6_wr_extout to bpck6.c and rename
        ata: pata_parport-bpck6: move ppc6_wait_for_fifo to bpck6.c and rename
        ata: pata_parport-bpck6: merge ppc6_wr_data_blk into bpck6_write_block
        ata: pata_parport-bpck6: merge ppc6_rd_data_blk into bpck6_read_block
        ata: pata_parport-bpck6: merge ppc6_wr_port16_blk into bpck6_write_block
        ata: pata_parport-bpck6: merge ppc6_rd_port16_blk into bpck6_read_block
        ata: pata_parport-bpck6: merge ppc6_wr_port into bpck6_write_regr
        ata: pata_parport-bpck6: merge ppc6_rd_port into bpck6_read_regr
        ata: pata_parport-bpck6: remove ppc6_close
        ...
      36006b1d
    • Linus Torvalds's avatar
      Merge tag 'for-6.4/dm-changes' of... · 48dc8100
      Linus Torvalds authored
      Merge tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mike Snitzer:
      
       - Split dm-bufio's rw_semaphore and rbtree. Offers improvements to
         dm-bufio's locking to allow increased concurrent IO -- particularly
         for read access for buffers already in dm-bufio's cache.
      
       - Also split dm-bio-prison-v1's spinlock and rbtree with comparable aim
         at improving concurrent IO (for the DM thinp target).
      
       - Both the dm-bufio and dm-bio-prison-v1 scaling of the number of locks
         and rbtrees used are managed by dm_num_hash_locks(). And the hash
         function used by both is dm_hash_locks_index().
      
       - Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE to
         be split at the target specified boundary (in terms of
         max_discard_sectors, max_write_zeroes_sectors and
         max_secure_erase_sectors respectively).
      
       - DM verity error handling fix for check_at_most_once on FEC.
      
       - Update DM verity target to emit audit events on verification failure
         and more.
      
       - DM core ->io_hints improvements needed in support of new discard
         support that is added to the DM "zero" and "error" targets.
      
       - Fix missing kmem_cache_destroy() call in initialization error path of
         both the DM integrity and DM clone targets.
      
       - A couple fixes for DM flakey, also add "error_reads" feature.
      
       - Fix DM core's resume to not lock FS when the DM map is NULL;
         otherwise initial table load can race with FS mount that takes
         superblock's ->s_umount rw_semaphore.
      
       - Various small improvements to both DM core and DM targets.
      
      * tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits)
        dm: don't lock fs when the map is NULL in process of resume
        dm flakey: add an "error_reads" option
        dm flakey: remove trailing space in the table line
        dm flakey: fix a crash with invalid table line
        dm ioctl: fix nested locking in table_clear() to remove deadlock concern
        dm: unexport dm_get_queue_limits()
        dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE
        dm: add helper macro for simple DM target module init and exit
        dm raid: remove unused d variable
        dm: remove unnecessary (void*) conversions
        dm mirror: add DMERR message if alloc_workqueue fails
        dm: push error reporting down to dm_register_target()
        dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path
        dm clone: call kmem_cache_destroy() in dm_clone_init() error path
        dm error: add discard support
        dm zero: add discard support
        dm table: allow targets without devices to set ->io_hints
        dm verity: emit audit events on verification failure and more
        dm verity: fix error handling for check_at_most_once on FEC
        dm: improve hash_locks sizing and hash function
        ...
      48dc8100
    • Linus Torvalds's avatar
      Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux · 9dd6956b
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
      
       - drbd patches, bringing us closer to unifying the out-of-tree version
         and the in tree one (Andreas, Christoph)
      
       - support for auto-quiesce for the s390 dasd driver (Stefan)
      
       - MD pull request via Song:
            - md/bitmap: Optimal last page size (Jon Derrick)
            - Various raid10 fixes (Yu Kuai, Li Nan)
            - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)
      
       - NVMe pull request via Christoph:
            - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
            - Validate nvmet module parameters (Chaitanya Kulkarni)
            - Fence TCP socket on receive error (Chris Leech)
            - Fix async event trace event (Keith Busch)
            - Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
            - Fix and cleanup nvmet Identify handling (Damien Le Moal,
              Christoph Hellwig)
            - Fix double blk_mq_complete_request race in the timeout handler
              (Lei Yin)
            - Fix irq locking in nvme-fcloop (Ming Lei)
            - Remove queue mapping helper for rdma devices (Sagi Grimberg)
      
       - use structured request attribute checks for nbd (Jakub)
      
       - fix blk-crypto race conditions between keyslot management (Eric)
      
       - add sed-opal support for reading read locking range attributes
         (Ondrej)
      
       - make fault injection configurable for null_blk (Akinobu)
      
       - clean up the request insertion API (Christoph)
      
       - clean up the queue running API (Christoph)
      
       - blkg config helper cleanups (Tejun)
      
       - lazy init support for blk-iolatency (Tejun)
      
       - various fixes and tweaks to ublk (Ming)
      
       - remove hybrid polling. It hasn't really been useful since we got
         async polled IO support, and these days we don't support sync polled
         IO at all (Keith)
      
       - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
         Chaitanya, me)
      
      * tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
        nbd: fix incomplete validation of ioctl arg
        ublk: don't return 0 in case of any failure
        sed-opal: geometry feature reporting command
        null_blk: Always check queue mode setting from configfs
        block: ublk: switch to ioctl command encoding
        blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
        block, bfq: Fix division by zero error on zero wsum
        fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
        block: store bdev->bd_disk->fops->submit_bio state in bdev
        block: re-arrange the struct block_device fields for better layout
        md/raid5: remove unused working_disks variable
        md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
        md/raid10: fix memleak of md thread
        md/raid10: fix memleak for 'conf->bio_split'
        md/raid10: fix leak of 'r10bio->remaining' for recovery
        md/raid10: don't BUG_ON() in raise_barrier()
        md: fix soft lockup in status_resync
        md: add error_handlers for raid0 and linear
        md: Use optimal I/O size for last bitmap page
        md: Fix types in sb writer
        ...
      9dd6956b
    • Merge tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux · 5b9a7bb7
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
      
       - Cleanup of the io-wq per-node mapping, notably getting rid of it so
         we just have a single io_wq entry per ring (Breno)
      
       - Followup to the above, move accounting to io_wq as well and
         completely drop struct io_wqe (Gabriel)
      
       - Enable KASAN for the internal io_uring caches (Breno)
      
       - Add support for multishot timeouts. Some applications use timeouts to
         wake someone waiting on completion entries, and this makes it a bit
         easier to just have a recurring timer rather than needing to rearm it
         every time (David)
      
       - Support archs that have shared cache coloring between userspace and
         the kernel, and hence have strict address requirements for mmap'ing
         the ring into userspace. This should only be parisc/hppa. (Helge, me)
      
       - XFS has supported O_DIRECT writes without needing to lock the inode
         exclusively for a long time, and ext4 now supports it as well. This
         is true for the common cases of not extending the file size. Flag the
         fs as having that feature, and utilize that to avoid serializing
         those writes in io_uring (me)
      
       - Enable completion batching for uring commands (me)
      
       - Revert patch adding io_uring restriction to what can be GUP mapped or
         not. This does not belong in io_uring, as io_uring isn't really
         special in this regard. Since this is also getting in the way of
         cleanups and improvements to the GUP code, get rid of it (me)
      
       - A few series greatly reducing the complexity of registered resources,
         like buffers or files. Not only does this clean up the code a lot,
         the simplified code is also a LOT more efficient (Pavel)
      
       - Series optimizing how we wait for events and run task_work related to
         it (Pavel)
      
       - Fixes for file/buffer unregistration with DEFER_TASKRUN (Pavel)
      
       - Misc cleanups and improvements (Pavel, me)
      
      * tag 'for-6.4/io_uring-2023-04-21' of git://git.kernel.dk/linux: (71 commits)
        Revert "io_uring/rsrc: disallow multi-source reg buffers"
        io_uring: add support for multishot timeouts
        io_uring/rsrc: disassociate nodes and rsrc_data
        io_uring/rsrc: devirtualise rsrc put callbacks
        io_uring/rsrc: pass node to io_rsrc_put_work()
        io_uring/rsrc: inline io_rsrc_put_work()
        io_uring/rsrc: add empty flag in rsrc_node
        io_uring/rsrc: merge nodes and io_rsrc_put
        io_uring/rsrc: infer node from ctx on io_queue_rsrc_removal
        io_uring/rsrc: remove unused io_rsrc_node::llist
        io_uring/rsrc: refactor io_queue_rsrc_removal
        io_uring/rsrc: simplify single file node switching
        io_uring/rsrc: clean up __io_sqe_buffers_update()
        io_uring/rsrc: inline switch_start fast path
        io_uring/rsrc: remove rsrc_data refs
        io_uring/rsrc: fix DEFER_TASKRUN rsrc quiesce
        io_uring/rsrc: use wq for quiescing
        io_uring/rsrc: refactor io_rsrc_ref_quiesce
        io_uring/rsrc: remove io_rsrc_node::done
        io_uring/rsrc: use nospec'ed indexes
        ...
      5b9a7bb7
    • Merge tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 5c7ecada
      Linus Torvalds authored
      Pull f2fs update from Jaegeuk Kim:
       "In this round, we've mainly modified to support non-power-of-two zone
        size, as f2fs by design does not require a power-of-two zone size.
        To avoid arch dependency, we refactored the messy rb_entry structure
        shared across the different extent caches. In addition to these
        improvements, we've also fixed several subtle bugs and error cases.
      
        Enhancements:
         - support non-power-of-two zone size for zoned device
         - remove sharing the rb_entry structure in extent cache
         - refactor f2fs_gc to call checkpoint in urgent condition
         - support iopoll
      
        Bug fixes:
         - fix potential corruption when moving a directory
         - fix to avoid use-after-free for cached IPU bio
         - fix the folio private usage
         - avoid kernel warnings or panics in the cp_error case
         - fix to recover quota data correctly
         - fix some bugs in atomic operations
         - fix system crash due to lack of free space in LFS
         - fix null pointer panic in tracepoint in __replace_atomic_write_block
         - fix iostat lock protection
         - fix scheduling while atomic in decompression path
         - preserve direct write semantics when buffering is forced
         - fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()"
      
      * tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
        f2fs: remove unnessary comment in __may_age_extent_tree
        f2fs: allocate node blocks for atomic write block replacement
        f2fs: use cow inode data when updating atomic write
        f2fs: remove power-of-two limitation of zoned device
        f2fs: allocate trace path buffer from names_cache
        f2fs: add has_enough_free_secs()
        f2fs: relax sanity check if checkpoint is corrupted
        f2fs: refactor f2fs_gc to call checkpoint in urgent condition
        f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio
        f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del()
        f2fs: support iopoll method
        f2fs: remove batched_trim_sections node description
        f2fs: fix to check return value of inc_valid_block_count()
        f2fs: fix to check return value of f2fs_do_truncate_blocks()
        f2fs: fix passing relative address when discard zones
        f2fs: fix potential corruption when moving a directory
        f2fs: add radix_tree_preload_end in error case
        f2fs: fix to recover quota data correctly
        f2fs: fix to check readonly condition correctly
        docs: f2fs: Correct instruction to disable checkpoint
        ...
      5c7ecada
    • Merge tag 'dlm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · fbfaf03e
      Linus Torvalds authored
      Pull dlm updates from David Teigland:
      
       - Remove some unused features (related to lock timeouts) that have been
         previously scheduled for removal
      
       - Fix a bug where the pending callback flag would be incorrectly
         cleared, which could potentially result in missing a completion
         callback
      
       - Use an unbound workqueue for dlm socket handling so that socket
         operations can be processed with less delay
      
       - Fix possible lockspace join connection errors with large clusters
         (e.g. over 16 nodes) caused by a small socket backlog setting
      
       - Use atomic bit ops for internal flags to help avoid mistakes copying
         flag values from messages
      
       - Fix recently introduced bug where memory for lvb data could be
         unnecessarily allocated for a lock
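
The move to atomic bit ops for internal flags can be pictured with a small userspace sketch. The kernel itself uses helpers such as set_bit(), clear_bit(), test_bit() and test_and_clear_bit() on an unsigned long; the version below is only an analogue built on C11 <stdatomic.h>, and every demo_* name and flag number in it is hypothetical rather than taken from the dlm code. The point it illustrates is that each update touches a single bit atomically, so one flag cannot be lost because another writer copied or overwrote the whole word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, for illustration only (not dlm's flags). */
enum demo_flag_bit {
	DEMO_FLAG_CB_PENDING = 0,
	DEMO_FLAG_OTHER      = 1,
};

#define DEMO_FLAG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set one bit; all other bits are left untouched. */
static inline void demo_set_bit(unsigned int nr, atomic_ulong *flags)
{
	assert(nr < DEMO_FLAG_BITS);
	atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear one bit; all other bits are left untouched. */
static inline void demo_clear_bit(unsigned int nr, atomic_ulong *flags)
{
	assert(nr < DEMO_FLAG_BITS);
	atomic_fetch_and(flags, ~(1UL << nr));
}

/* Read the current value of one bit. */
static inline bool demo_test_bit(unsigned int nr, atomic_ulong *flags)
{
	assert(nr < DEMO_FLAG_BITS);
	return ((atomic_load(flags) >> nr) & 1UL) != 0;
}

/* Atomically clear one bit and report whether it was set beforehand. */
static inline bool demo_test_and_clear_bit(unsigned int nr, atomic_ulong *flags)
{
	unsigned long mask;

	assert(nr < DEMO_FLAG_BITS);
	mask = 1UL << nr;
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}
```

A caller would keep one atomic_ulong per object and pass bit numbers, never precomputed masks, so that a whole-word assignment cannot slip in by accident.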
      
      * tag 'dlm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
        fs: dlm: stop unnecessarily filling zero ms_extra bytes
        fs: dlm: switch lkb_sbflags to atomic ops
        fs: dlm: rsb hash table flag value to atomic ops
        fs: dlm: move internal flags to atomic ops
        fs: dlm: change dflags to use atomic bits
        fs: dlm: store lkb distributed flags into own value
        fs: dlm: remove DLM_IFL_LOCAL_MS flag
        fs: dlm: rename stub to local message flag
        fs: dlm: remove deprecated code parts
        DLM: increase socket backlog to avoid hangs with 16 nodes
        fs: dlm: add unbound flag to dlm_io workqueue
        fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten
      fbfaf03e
    • Merge tag 'gfs2-v6.3-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · e0fcc9c6
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Fix revoke processing at unmount and on read-only remount
      
       - Refuse reading in inodes with an impossible indirect block height
      
       - Various minor cleanups
      
      * tag 'gfs2-v6.3-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: gfs2_ail_empty_gl no log flush on error
        gfs2: Issue message when revokes cannot be written
        gfs2: Perform second log flush in gfs2_make_fs_ro
        gfs2: return errors from gfs2_ail_empty_gl
        gfs2: Move variable assignment behind a null pointer check in inode_go_dump
        gfs2: Use gfs2_holder_initialized for jindex
        gfs2: Eliminate gfs2_trim_blocks
        gfs2: Fix inode height consistency check
        gfs2: Remove ghs[] from gfs2_unlink
        gfs2: Remove ghs[] from gfs2_link
        gfs2: Remove duplicate i_nlink check from gfs2_link()
      e0fcc9c6
    • Merge tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 85d7ab24
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Mostly core changes and cleanups, some notable fixes and two
        performance improvements in directory logging.
      
        The IO path cleanups remove or refactor old code, and the scrub main
        loop has been completely rewritten, also refactoring old code.
      
        There are some changes to non-btrfs code, mostly trivial; the cgroup
        punt bio logic is only moved from generic code.
      
        Performance improvements:
      
         - improve logging changes in a directory during one transaction,
           avoid iterating over items and reduce lock contention (fsync time
           4x lower)
      
         - when logging directory entries during one transaction, reduce
           locking of subvolume trees by checking tree-log instead
           (improvement in throughput and latency for concurrent access to a
           subvolume)
      
        Notable fixes:
      
         - dev-replace:
            - properly honor read mode when requested to avoid reading from
              source device
            - target device won't be used for eventual read repair, as this
              is unreliable for NODATASUM files
            - when there is unrepaired (and unrepairable) metadata during
              replace, exit early with an error and don't try to finish the
              whole operation
      
         - scrub ioctl properly rejects unknown flags
      
         - fix global block reserve calculations
      
         - fix partial direct io write when there's a page fault in the
           middle: iomap will try to continue with the partial request but
           the btrfs part did not match that, which can lead to zeros being
           written instead of data
      
        Core changes:
      
         - io path:
            - continued cleanups and refactoring around bio handling
            - extent io submit path simplifications and cleanups
            - flush write path simplifications and cleanups
            - rework logic of passing sync mode of bio, with further cleanups
      
         - rewrite scrub code flow, restructure how the stripes are enumerated
           and verified in a more unified way
      
         - allow setting a lower threshold for block group reclaim in debug
           mode to aid zoned mode testing
      
         - remove obsolete time-based delayed ref throttling logic when
           truncating items
      
         - DREW locks are not using percpu variables anymore
      
         - more warning fixes (-Wmaybe-uninitialized)
      
         - u64 division simplifications
      
         - error handling improvements
      
        Non-btrfs code changes:
      
         - push cgroup punt bio logic to btrfs code (there was no other user
           of that); the functionality can now be selected separately by
           BLK_CGROUP_PUNT_BIO
      
         - crc32c_impl removed after removing last uses in btrfs code
      
         - add btrfs_assertfail() to objtool table"
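
The "scrub ioctl properly rejects unknown flags" fix follows a common pattern for ioctl arguments: mask off the bits the implementation knows about and fail if anything is left. The sketch below shows only that pattern. The DEMO_* flag values, the function name and the choice of -EOPNOTSUPP as the error are assumptions made for illustration, not the btrfs definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the btrfs UAPI ones. */
#define DEMO_SCRUB_READONLY        (1ULL << 0)
#define DEMO_SCRUB_SUPPORTED_FLAGS (DEMO_SCRUB_READONLY)

static_assert(DEMO_SCRUB_SUPPORTED_FLAGS != 0,
	      "at least one flag must be supported");

/*
 * Return 0 when every bit set in 'flags' is a known flag, or a negative
 * errno when any unknown bit is set. The errno used here is an assumption
 * of this sketch.
 */
static inline int demo_check_scrub_flags(uint64_t flags)
{
	if (flags & ~DEMO_SCRUB_SUPPORTED_FLAGS)
		return -EOPNOTSUPP;
	return 0;
}
```

Rejecting unknown bits up front keeps them available for future extensions, because no existing caller can come to depend on them being silently ignored.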
      
      * tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
        btrfs: mark btrfs_assertfail() __noreturn
        btrfs: fix uninitialized variable warnings
        btrfs: use log root when iterating over index keys when logging directory
        btrfs: avoid iterating over all indexes when logging directory
        btrfs: dev-replace: error out if we have unrepaired metadata error during
        btrfs: remove pointless loop at btrfs_get_next_valid_item()
        btrfs: scrub: reject unsupported scrub flags
        btrfs: reinterpret async discard iops_limit=0 as no delay
        btrfs: set default discard iops_limit to 1000
        btrfs: remove unused raid56 functions which were dedicated for scrub
        btrfs: scrub: remove scrub_bio structure
        btrfs: scrub: remove scrub_block and scrub_sector structures
        btrfs: scrub: remove the old scrub recheck code
        btrfs: scrub: remove the old writeback infrastructure
        btrfs: scrub: remove scrub_parity structure
        btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub
        btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
        btrfs: scrub: introduce helper to queue a stripe for scrub
        btrfs: scrub: introduce error reporting functionality for scrub_stripe
        btrfs: scrub: introduce a writeback helper for scrub_stripe
        ...
      85d7ab24
    • Merge tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 94fc0792
      Linus Torvalds authored
      Pull ext2, reiserfs, udf, and quota updates from Jan Kara:
       "A couple of small fixes and cleanups for ext2, udf, reiserfs, and
        quota.
      
        The biggest change is making CONFIG_PRINT_QUOTA_WARNING depend on
        BROKEN with an outlook for removing it completely in a year or so"
      
      * tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        quota: mark PRINT_QUOTA_WARNING as BROKEN
        quota: update Kconfig comment
        reiserfs: remove unused iter variable
        quota: Use register_sysctl_init() for registering fs_dqstats_table
        reiserfs: remove unused sched_count variable
        ext2: remove redundant assignment to pointer end
        quota: make dquot_set_dqinfo return errors from ->write_info
        quota: fixup *_write_file_info() to return proper error code
        quota: simplify two-level sysctl registration for fs_dqstats_table
        udf: use wrapper i_blocksize() in udf_discard_prealloc()
        udf: Use folios in udf_adinicb_writepage()
        ext2: Check block size validity during mount
        ext2: Correct maximum ext2 filesystem block size
      94fc0792
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 0cfcde1f
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "There are a number of major cleanups in ext4 this cycle:
      
         - Jan Kara has significantly cleaned up and simplified the
           data=journal write path, removing a large number of data=journal
           special cases.
      
         - Ojaswin Mujoo has replaced the linked list used to track extents
           that have been used for inode preallocation with a red-black tree
           in the multi-block allocator. This improves performance for
           workloads which do a large number of random allocating writes.
      
         - Thanks to Kemeng Shi for a lot of cleanup and bug fixes in the
           multi-block allocator.
      
         - Matthew Wilcox has converted the code paths for reading and writing
           ext4 pages to use folios.
      
         - Jason Yan has continued to factor out ext4_fill_super() into
           smaller functions to improve ease of maintenance and
           comprehension.
      
         - Josh Triplett has created a uapi header for ext4 userspace APIs"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (105 commits)
        ext4: Add a uapi header for ext4 userspace APIs
        ext4: remove useless conditional branch code
        ext4: remove unneeded check of nr_to_submit
        ext4: move dax and encrypt checking into ext4_check_feature_compatibility()
        ext4: factor out ext4_block_group_meta_init()
        ext4: move s_reserved_gdt_blocks and addressable checking into ext4_check_geometry()
        ext4: rename two functions with 'check'
        ext4: factor out ext4_flex_groups_free()
        ext4: use ext4_group_desc_free() in ext4_put_super() to save some duplicated code
        ext4: factor out ext4_percpu_param_init() and ext4_percpu_param_destroy()
        ext4: factor out ext4_hash_info_init()
        Revert "ext4: Fix warnings when freezing filesystem with journaled data"
        ext4: Update comment in mpage_prepare_extent_to_map()
        ext4: Simplify handling of journalled data in ext4_bmap()
        ext4: Drop special handling of journalled data from ext4_quota_on()
        ext4: Drop special handling of journalled data from ext4_evict_inode()
        ext4: Fix special handling of journalled data from extent zeroing
        ext4: Drop special handling of journalled data from extent shifting operations
        ext4: Drop special handling of journalled data from ext4_sync_file()
        ext4: Commit transaction before writing back pages in data=journal mode
        ...
      0cfcde1f
    • Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux · c3558a6b
      Linus Torvalds authored
      Pull fsverity updates from Eric Biggers:
       "Several cleanups and fixes for fs/verity/, including a couple minor
        fixes to the changes in 6.3 that added support for Merkle tree block
        sizes less than the page size"
      
      * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
        fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds
        fsverity: explicitly check for buffer overflow in build_merkle_tree()
        fsverity: use WARN_ON_ONCE instead of WARN_ON
        fs-verity: simplify sysctls with register_sysctl()
        fs/buffer.c: use b_folio for fsverity work
      c3558a6b