1. 23 Dec, 2022 6 commits
  2. 22 Dec, 2022 12 commits
    • veth: Fix race with AF_XDP exposing old or uninitialized descriptors · fa349e39
      Shawn Bohrer authored
      When AF_XDP is used on a veth interface the RX ring is updated in two
      steps.  veth_xdp_rcv() removes packet descriptors from the FILL ring,
      fills them, and places them in the RX ring, updating the cached_prod
      pointer.  Later xdp_do_flush() syncs the RX ring prod pointer with the
      cached_prod pointer, allowing user-space to see the recently filled-in
      descriptors.  The rings are intended to be SPSC; however, the existing
      order in veth_poll allows xdp_do_flush() to run concurrently with
      another CPU, creating a race condition that allows user-space to see old
      or uninitialized descriptors in the RX ring.  This bug has been observed
      in production systems.
      
      To summarize, we are expecting this ordering:
      
      CPU 0 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_rcv_zc()
      CPU 2 __xsk_map_flush()
      
      But we are seeing this order:
      
      CPU 0 __xsk_rcv_zc()
      CPU 2 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_map_flush()
      
      This occurs because we rely on NAPI to ensure that only one napi_poll
      handler is running at a time for the given veth receive queue.
      napi_schedule_prep() will prevent multiple instances from getting
      scheduled. However, calling napi_complete_done() signals that this
      napi_poll is complete and allows subsequent calls to
      napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
      concurrent napi_poll before xdp_do_flush() has been called.  For the
      veth driver, a concurrent call to napi_schedule_prep() and
      __napi_schedule() can occur on a different CPU because the veth xmit
      path can additionally schedule a napi_poll, creating the race.
      
      The fix, as suggested by Magnus Karlsson, is to simply move the
      xdp_do_flush() call before napi_complete_done().  This syncs the
      producer ring pointers before another instance of napi_poll can be
      scheduled on another CPU.  It will also slightly improve performance by
      moving the flush closer to when the descriptors were placed in the
      RX ring.
      
      Fixes: d1396004 ("veth: Add XDP TX and REDIRECT")
      Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
      Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
      Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
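The ordering problem in the veth commit above can be modeled in user space. The following is a minimal sketch, not the kernel code: every `model_*` name is hypothetical, the two CPUs are replayed sequentially in a single thread, and it assumes the producer bumps its cached index before the descriptor is fully written.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RING_SIZE 8u

/* Hypothetical, simplified stand-in for an AF_XDP RX ring. */
struct model_desc {
    uint64_t addr;
    uint32_t len;
    bool filled;
};

struct model_ring {
    uint32_t cached_prod; /* producer-private index */
    uint32_t prod;        /* index published to user space */
    struct model_desc desc[MODEL_RING_SIZE];
};

/* Step 1 of receive: claim a slot by bumping cached_prod. */
uint32_t model_reserve(struct model_ring *r)
{
    return r->cached_prod++ & (MODEL_RING_SIZE - 1);
}

/* Step 2 of receive: write the descriptor into the claimed slot. */
void model_fill(struct model_ring *r, uint32_t idx, uint64_t addr, uint32_t len)
{
    r->desc[idx].addr = addr;
    r->desc[idx].len = len;
    r->desc[idx].filled = true;
}

/* Flush: publish everything the producer has claimed so far. */
void model_flush(struct model_ring *r)
{
    r->prod = r->cached_prod;
}

/* True if the consumer could read a published slot that was never written. */
bool model_consumer_sees_unfilled(const struct model_ring *r)
{
    for (uint32_t i = 0; i < r->prod; i++)
        if (!r->desc[i & (MODEL_RING_SIZE - 1)].filled)
            return true;
    return false;
}

/* Old order: a second poll starts before the first one has flushed. */
bool model_racy_order(void)
{
    struct model_ring r = {0};
    uint32_t a = model_reserve(&r);   /* CPU 0 receive */
    model_fill(&r, a, 0x1000, 64);
    uint32_t b = model_reserve(&r);   /* CPU 2 receive begins */
    model_flush(&r);                  /* CPU 0 flush also publishes slot b */
    bool exposed = model_consumer_sees_unfilled(&r);
    model_fill(&r, b, 0x2000, 64);    /* CPU 2 finishes too late */
    model_flush(&r);
    return exposed;
}

/* Fixed order: each flush completes before another poll may start. */
bool model_fixed_order(void)
{
    struct model_ring r = {0};
    uint32_t a = model_reserve(&r);   /* CPU 0 receive */
    model_fill(&r, a, 0x1000, 64);
    model_flush(&r);                  /* CPU 0 flush */
    bool exposed = model_consumer_sees_unfilled(&r);
    uint32_t b = model_reserve(&r);   /* CPU 2 receive */
    model_fill(&r, b, 0x2000, 64);
    model_flush(&r);                  /* CPU 2 flush */
    return exposed || model_consumer_sees_unfilled(&r);
}
```

In this model `model_racy_order()` reports an exposed slot and `model_fixed_order()` does not, which mirrors the two call orderings listed in the commit message.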
    • net: lan966x: Fix configuration of the PCS · d717f947
      Horatiu Vultur authored
      When the PCS was taken out of reset, we were also changing the speed
      to 100 Mbit by mistake. If the link went down, the link-up routine
      set the link speed correctly. But if the link never went down, the
      speed was forced to 100 even if the configured speed was something
      else.
      On lan966x, to set the link speed to 1G or 2.5G, a value of 1 needs to be
      written to DEV_CLOCK_CFG_LINK_SPEED. This is similar to the procedure in
      lan966x_port_init.
      
      The issue was reproduced using a 1000base-x SFP module with the commands:
      ip link set dev eth2 up
      ip addr add 10.97.10.2/24 dev eth2
      ethtool -s eth2 speed 1000 autoneg off
      
      Fixes: d28d6d2e ("net: lan966x: add port module support")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Link: https://lore.kernel.org/r/20221221093315.939133-1-horatiu.vultur@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bonding: fix lockdep splat in bond_miimon_commit() · 42c7ded0
      Eric Dumazet authored
      bond_miimon_commit() is run while RTNL is held, not RCU.
      
      WARNING: suspicious RCU usage
      6.1.0-syzkaller-09671-g89529367 #0 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:2704 suspicious rcu_dereference_check() usage!
      
      Fixes: e95cc447 ("bonding: do failover when high prio link up")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Link: https://lore.kernel.org/r/20221220130831.1480888-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-locking-fixes' · 43ae218f
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Locking fixes
      
      Two separate locking fixes for the networking tree:
      
      Patch 1 addresses an MPTCP fastopen error-path deadlock that was found
      with syzkaller.
      
      Patch 2 works around a lockdep false-positive between MPTCP listening and
      non-listening sockets at socket destruct time.
      ====================
      
      Link: https://lore.kernel.org/r/20221220195215.238353-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix lockdep false positive · fec3adfd
      Paolo Abeni authored
      MattB reported a lockdep splat in the mptcp listener code cleanup:
      
       WARNING: possible circular locking dependency detected
       packetdrill/14278 is trying to acquire lock:
       ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
      
       but task is already holding lock:
       ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              lock_sock_nested (net/core/sock.c:3463)
              mptcp_worker (net/mptcp/protocol.c:2614)
              process_one_work (kernel/workqueue.c:2294)
              worker_thread (include/linux/list.h:292)
              kthread (kernel/kthread.c:376)
              ret_from_fork (arch/x86/entry/entry_64.S:312)
      
       -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
              check_prev_add (kernel/locking/lockdep.c:3098)
              validate_chain (kernel/locking/lockdep.c:3217)
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              __flush_work (kernel/workqueue.c:3070)
              __cancel_work_timer (kernel/workqueue.c:3160)
              mptcp_cancel_work (net/mptcp/protocol.c:2758)
              mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
              __mptcp_close_ssk (net/mptcp/protocol.c:2363)
              mptcp_destroy_common (net/mptcp/protocol.c:3170)
              mptcp_destroy (include/net/sock.h:1495)
              __mptcp_destroy_sock (net/mptcp/protocol.c:2886)
              __mptcp_close (net/mptcp/protocol.c:2959)
              mptcp_close (net/mptcp/protocol.c:2974)
              inet_release (net/ipv4/af_inet.c:432)
              __sock_release (net/socket.c:651)
              sock_close (net/socket.c:1367)
              __fput (fs/file_table.c:320)
              task_work_run (kernel/task_work.c:181 (discriminator 1))
              exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
              syscall_exit_to_user_mode (kernel/entry/common.c:130)
              do_syscall_64 (arch/x86/entry/common.c:87)
              entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(sk_lock-AF_INET);
                                      lock((work_completion)(&msk->work));
                                      lock(sk_lock-AF_INET);
         lock((work_completion)(&msk->work));
      
        *** DEADLOCK ***
      
      The report is actually a false positive, since the only existing lock
      nesting is the msk socket lock acquired by the mptcp work.
      cancel_work_sync() is invoked without the relevant socket lock being
      held, but under a different (the msk listener) socket lock.
      
      We could silence the splat by adding a per-workqueue dynamic lockdep key,
      but that looks like overkill. Instead, just tell lockdep the msk socket lock
      is not held around cancel_work_sync().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
      Fixes: 30e51b92 ("mptcp: fix unreleased socket in accept queue")
      Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix deadlock in fastopen error path · 7d803344
      Paolo Abeni authored
      MatM reported a deadlock at fastopening time:
      
      INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
            Tainted: G S                 6.1.0-rc5-03226-gdb0157db5153 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor.0  state:D stack:25104 pid:11454 ppid:424    flags:0x00004006
      Call Trace:
       <TASK>
       context_switch kernel/sched/core.c:5191 [inline]
       __schedule+0x5c2/0x1550 kernel/sched/core.c:6503
       schedule+0xe8/0x1c0 kernel/sched/core.c:6579
       __lock_sock+0x142/0x260 net/core/sock.c:2896
       lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
       __mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
       mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
       mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
       __inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
       tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
       mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
       mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
       inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0xf7/0x190 net/socket.c:734
       ____sys_sendmsg+0x336/0x970 net/socket.c:2476
       ___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
       __sys_sendmmsg+0x18d/0x460 net/socket.c:2616
       __do_sys_sendmmsg net/socket.c:2645 [inline]
       __se_sys_sendmmsg net/socket.c:2642 [inline]
       __x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5920a75e7d
      RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
      RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
      RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
       </TASK>
      
      In the error path, tcp_sendmsg_fastopen() ends up calling
      mptcp_disconnect(), and the latter tries to close each
      subflow, acquiring the socket lock on each of them.
      
      At fastopen time, we have a single subflow, and that subflow's
      socket lock is already held by the caller, causing the deadlock.
      
      We already track the 'fastopen in progress' status inside the msk
      socket. Use it to address the issue, making mptcp_disconnect() a
      no op when invoked from the fastopen (error) path and doing the
      relevant cleanup after releasing the subflow socket lock.
      
      While at it, rename the fastopen status bit to something
      more meaningful.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
      Fixes: fa9e5746 ("mptcp: fix abba deadlock on fastopen")
      Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
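The self-deadlock described in the fastopen commit above, and the shape of its fix, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: all `model_*` names are hypothetical, the socket lock is reduced to a boolean, and a lock attempt that would block forever is reported as `-1` instead of actually hanging.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the subflow and msk sockets. */
struct model_subflow {
    bool locked;
    bool closed;
};

struct model_msk {
    struct model_subflow ssk;
    bool fastopening; /* models the "fastopen in progress" status bit */
};

/* Returns false where a real non-recursive lock would block forever. */
bool model_try_lock(struct model_subflow *s)
{
    if (s->locked)
        return false;
    s->locked = true;
    return true;
}

void model_unlock(struct model_subflow *s)
{
    s->locked = false;
}

/* Returns 0 on success, -1 where the real code would self-deadlock. */
int model_disconnect(struct model_msk *m, bool honor_flag)
{
    if (honor_flag && m->fastopening)
        return 0; /* no-op: the caller cleans up after unlocking */
    if (!model_try_lock(&m->ssk))
        return -1;
    m->ssk.closed = true;
    model_unlock(&m->ssk);
    return 0;
}

/* Replays the fastopen error path with or without the flag check. */
int model_fastopen_error_path(bool honor_flag)
{
    struct model_msk m = { .fastopening = false };
    int ret;

    model_try_lock(&m.ssk);     /* caller already holds the subflow lock */
    m.fastopening = true;
    ret = model_disconnect(&m, honor_flag);
    m.fastopening = false;
    model_unlock(&m.ssk);

    /* Deferred cleanup once the subflow lock has been released. */
    if (ret == 0 && !m.ssk.closed)
        ret = model_disconnect(&m, honor_flag);
    return ret;
}
```

With `honor_flag` set to `false` the model reports the deadlock; with `true` the disconnect is skipped while the lock is held and the cleanup runs afterwards.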
    • nfp: fix schedule in atomic context when sync mc address · e20aa071
      Yinjun Zhang authored
      The callback `.ndo_set_rx_mode` is called in atomic context, so sleeping
      is not allowed in its implementation. Use the workqueue mechanism
      to avoid this issue.
      
      Fixes: de624864 ("nfp: add support for multicast filter")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20221220152100.1042774-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vmxnet3: correctly report csum_level for encapsulated packet · 3d8f2c42
      Ronak Doshi authored
      Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload
      support") added support for encapsulation offload. However, the
      patch did not correctly report the csum_level for encapsulated packets.
      
      This patch fixes this issue by reporting correct csum level for the
      encapsulated packet.
      
      Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Peng Li <lpeng@vmware.com>
      Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: release vport resources on failure · 95637d91
      Aaron Conole authored
      A recent commit introducing upcall packet accounting failed to properly
      release the vport object when the per-cpu stats struct couldn't be
      allocated.  This can cause dangling pointers to dp objects long after
      they've been released.
      
      Cc: wangchuanlei <wangchuanlei@inspur.com>
      Fixes: 1933ea36 ("net: openvswitch: Add support to count upcall packets")
      Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
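The failure-path rule behind the openvswitch commit above (release the outer object when a later allocation fails) can be sketched in plain C. The types and `model_*` functions below are hypothetical and use `calloc()`/`free()` rather than kernel allocators; the `fail_stats` parameter exists only to simulate the per-CPU stats allocation failing.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vport with separately allocated stats. */
struct model_vport {
    int port_no;
    long *upcall_stats;
};

static int model_live_vports; /* objects allocated but not yet released */

struct model_vport *model_vport_alloc(void)
{
    struct model_vport *p = calloc(1, sizeof(*p));

    if (p)
        model_live_vports++;
    return p;
}

void model_vport_free(struct model_vport *p)
{
    if (!p)
        return;
    free(p->upcall_stats);
    free(p);
    model_live_vports--;
}

/* fail_stats simulates the stats allocation failing. */
struct model_vport *model_vport_create(bool fail_stats)
{
    struct model_vport *vport = model_vport_alloc();

    if (!vport)
        return NULL;

    vport->upcall_stats = fail_stats ? NULL
                                     : calloc(1, sizeof(*vport->upcall_stats));
    if (!vport->upcall_stats) {
        /* Release the half-built object instead of leaking it. */
        model_vport_free(vport);
        return NULL;
    }
    return vport;
}

int model_live_count(void)
{
    return model_live_vports;
}
```

After a simulated failure, `model_live_count()` returns to zero, which is the property the fix restores: no object outlives a failed creation.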
    • net: vrf: determine the dst using the original ifindex for multicast · f2575c8f
      Antoine Tenart authored
      Multicast packets received on an interface bound to a VRF are marked as
      belonging to the VRF and the skb device is updated to point to the VRF
      device itself. This was fine even when a route was associated to a
      device as when performing a fib table lookup 'oif' in fib6_table_lookup
      (coming from 'skb->dev->ifindex' in ip6_route_input) was set to 0 when
      FLOWI_FLAG_SKIP_NH_OIF was set.
      
      With commit 40867d74 ("net: Add l3mdev index to flow struct and
      avoid oif reset for port devices") this is no longer true and multicast
      traffic is not received on the original interface.
      
      Instead of adding back a similar check in fib6_table_lookup determine
      the dst using the original ifindex for multicast VRF traffic. To make
      things consistent across the function do the above for all strict
      packets, which was the logic before commit 6f12fa77 ("vrf: mark skb
      for multicast or link-local as enslaved to VRF"). Note that reverting to
      this behavior should be fine as the change was about marking packets
      belonging to the VRF, not about their dst.
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf · 53fc61be
      Maciej Fijalkowski authored
      Previously the ice XDP xmit routine was changed so that it avoids the
      xdp_buff->xdp_frame conversion, as it is simply not needed for handling
      the XDP_TX action and, what is more, skipping it saves CPU cycles. This
      routine is re-used on the ZC driver to handle the XDP_TX action.
      
      Although for XDP_TX on Rx ZC the xdp_buff that comes from xsk_buff_pool is
      converted to xdp_frame, the xdp_frame itself is not stored inside
      ice_tx_buf; we only store the raw data pointer. Casting this pointer to
      xdp_frame and calling xdp_return_frame() on it in
      ice_clean_xdp_tx_buf() results in undefined behavior.
      
      To fix this, simply call page_frag_free() on tx_buf->raw_buf.
      The later intention is to remove the buff->frame conversion in order to
      simplify the codebase and improve XDP_TX performance on ZC.
      
      Fixes: 126cdfe1 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
      Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · aa6c3961
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.2
      
      First set of fixes for v6.2. Fix for a link error in mt76, fix for an
      iwlwifi firmware crash and two cleanups.
      
      * tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: ath9k: use proper statements in conditionals
        wifi: mt76: mt7996: select CONFIG_RELAY
        wifi: iwlwifi: fw: skip PPAG for JF
        wifi: ti: remove obsolete lines in the Makefile
      ====================
      
      Link: https://lore.kernel.org/r/20221221180808.96A8AC433EF@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 21 Dec, 2022 4 commits
    • Merge tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 609d3bc6
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, netfilter and can.
      
        Current release - regressions:
      
         - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func
      
         - rxrpc:
            - fix security setting propagation
            - fix null-deref in rxrpc_unuse_local()
            - fix switched parameters in peer tracing
      
        Current release - new code bugs:
      
         - rxrpc:
            - fix I/O thread startup getting skipped
            - fix locking issues in rxrpc_put_peer_locked()
            - fix I/O thread stop
            - fix uninitialised variable in rxperf server
            - fix the return value of rxrpc_new_incoming_call()
      
         - microchip: vcap: fix initialization of value and mask
      
         - nfp: fix unaligned io read of capabilities word
      
        Previous releases - regressions:
      
         - stop in-kernel socket users from corrupting socket's task_frag
      
         - stream: purge sk_error_queue in sk_stream_kill_queues()
      
         - openvswitch: fix flow lookup to use unmasked key
      
         - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()
      
         - devlink:
            - hold region lock when flushing snapshots
            - protect devlink dump by the instance lock
      
        Previous releases - always broken:
      
         - bpf:
            - prevent leak of lsm program after failed attach
            - resolve fext program type when checking map compatibility
      
         - skbuff: account for tail adjustment during pull operations
      
         - macsec: fix net device access prior to holding a lock
      
         - bonding: switch back when high prio link up
      
         - netfilter: flowtable: really fix NAT IPv6 offload
      
         - enetc: avoid buffer leaks on xdp_do_redirect() failure
      
         - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()
      
         - dsa: microchip: remove IRQF_TRIGGER_FALLING in
           request_threaded_irq"
      
      * tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
        net: fec: check the return value of build_skb()
        net: simplify sk_page_frag
        Treewide: Stop corrupting socket's task_frag
        net: Introduce sk_use_task_frag in struct sock.
        mctp: Remove device type check at unregister
        net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
        can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
        can: flexcan: avoid unbalanced pm_runtime_enable warning
        Documentation: devlink: add missing toc entry for etas_es58x devlink doc
        mctp: serial: Fix starting value for frame check sequence
        nfp: fix unaligned io read of capabilities word
        net: stream: purge sk_error_queue in sk_stream_kill_queues()
        myri10ge: Fix an error handling path in myri10ge_probe()
        net: microchip: vcap: Fix initialization of value and mask
        rxrpc: Fix the return value of rxrpc_new_incoming_call()
        rxrpc: rxperf: Fix uninitialised variable
        rxrpc: Fix I/O thread stop
        rxrpc: Fix switched parameters in peer tracing
        rxrpc: Fix locking issues in rxrpc_put_peer_locked()
        rxrpc: Fix I/O thread startup getting skipped
        ...
    • Merge tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 878cf96f
      Linus Torvalds authored
      Pull vfsuid cleanup from Christian Brauner:
       "This moves the ima specific vfs{g,u}id_t comparison helpers out of the
        header and into the one file in ima where they are used.
      
        We shouldn't incentivize people to use them by placing them into the
        header. As discussed and suggested by Linus in [1] let's just define
        them locally in the one file in ima where they are used"
      
      Link: https://lore.kernel.org/lkml/CAHk-=wj4BpEwUd=OkTv1F9uykvSrsBNZJVHMp+p_+e2kiV71_A@mail.gmail.com [1]
      
      * tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        mnt_idmapping: move ima-only helpers to ima
    • Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 222882c2
      Linus Torvalds authored
      Pull more random number generator updates from Jason Donenfeld:
       "Two remaining changes that are now possible after you merged a few
        other trees:
      
         - #include <asm/archrandom.h> can be removed from random.h now,
           making the direct use of the arch_random_* API more of a private
           implementation detail between the archs and random.c, rather than
           something for general consumers.
      
         - Two additional uses of prandom_u32_max() snuck in during the
           initial phase of pulls, so these have been converted to
           get_random_u32_below(), and now the deprecated prandom_u32_max()
           alias -- which was just a wrapper around get_random_u32_below() --
           can be removed.
      
        In addition, there is one fix:
      
         - Check efi_rt_services_supported() before attempting to use an EFI
           runtime function.
      
           This affected EFI systems that disable runtime services yet still
           boot via EFI (e.g. the reporter's Lenovo Thinkpad X13s laptop), as
           well as systems where EFI runtime services have been forcibly
           disabled, such as on PREEMPT_RT.
      
           On those machines, a very early and hard to diagnose crash would
           happen, preventing boot"
      
      * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        prandom: remove prandom_u32_max()
        efi: random: fix NULL-deref when refreshing seed
        random: do not include <asm/archrandom.h> from random.h
    • Merge tag 'rcu-urgent.2022.12.17a' of... · 19822e3e
      Linus Torvalds authored
      Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
      
      Pull RCU fix from Paul McKenney:
       "This fixes a lockdep false positive in synchronize_rcu() that can
        otherwise occur during early boot.
      
        The fix simply avoids invoking lockdep if the scheduler has not yet
        been initialized, that is, during that portion of boot when interrupts
        are disabled"
      
      * tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Don't assert interrupts enabled too early in boot
  4. 20 Dec, 2022 18 commits
    • net: fec: check the return value of build_skb() · 19e72b06
      Wei Fang authored
      build_skb() might return a null pointer, but fec_enet_rx_queue() does
      not check the return value, so a null pointer dereference
      might occur. To avoid this, check the return value of build_skb(). If
      it is a null pointer, the driver recycles the page and
      updates the ndev statistics, then jumps to rx_processing_done to clear
      the status flags of the BD so that the hardware can recycle the BD.
      
      Fixes: 95698ff6 ("net: fec: using page pool to manage RX buffers")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
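The control flow described in the fec commit above (check the allocation, recycle the page, count the drop, then fall through to the common completion label) can be sketched as a user-space model. Everything here is hypothetical: the `model_*` types and functions are not the driver's, `simulate_failure` stands in for build_skb() returning NULL, and `rx_dropped` is an assumed name for the statistic the commit says is updated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page-pool page, a buffer descriptor
 * and the device statistics. */
struct model_page { bool in_pool; };
struct model_bd { bool status_cleared; };
struct model_stats { unsigned long rx_dropped; };

/* Stand-in for an skb constructor that can fail. */
void *model_build_skb(struct model_page *page, bool simulate_failure)
{
    return simulate_failure ? NULL : (void *)page;
}

/* Processes one received buffer; returns true if a frame was delivered. */
bool model_rx_one(struct model_page *page, struct model_bd *bd,
                  struct model_stats *stats, bool simulate_failure)
{
    bool delivered = false;
    void *skb = model_build_skb(page, simulate_failure);

    if (!skb) {
        page->in_pool = true;  /* recycle the page */
        stats->rx_dropped++;   /* account for the lost frame */
        goto rx_processing_done;
    }
    delivered = true;

rx_processing_done:
    /* Always hand the descriptor back to the hardware. */
    bd->status_cleared = true;
    return delivered;
}

/* Checks the failure path: nothing delivered, page recycled, BD cleared. */
bool model_failure_is_handled(void)
{
    struct model_page page = { .in_pool = false };
    struct model_bd bd = { .status_cleared = false };
    struct model_stats stats = { .rx_dropped = 0 };
    bool delivered = model_rx_one(&page, &bd, &stats, true);

    return !delivered && page.in_pool && bd.status_cleared &&
           stats.rx_dropped == 1;
}

/* Checks the normal path: frame delivered, BD cleared, no drop counted. */
bool model_success_is_delivered(void)
{
    struct model_page page = { .in_pool = false };
    struct model_bd bd = { .status_cleared = false };
    struct model_stats stats = { .rx_dropped = 0 };
    bool delivered = model_rx_one(&page, &bd, &stats, false);

    return delivered && !page.in_pool && bd.status_cleared &&
           stats.rx_dropped == 0;
}
```

The point of the shared label is that the descriptor is returned to the hardware on both paths, so a failed allocation costs one frame rather than stalling the ring.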
    • Merge tag 'm68knommu-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · b6bb9676
      Linus Torvalds authored
      Pull m68knommu update from Greg Ungerer:
       "Only a single change to use the safer strscpy() instead of strncpy()
        when setting up the cmdline"
      
      * tag 'm68knommu-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: use strscpy() to instead of strncpy()
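The m68knommu pull above calls strscpy() safer than strncpy() without saying why. The sketch below approximates the difference in user space: `sketch_strscpy()` is a hypothetical re-implementation of the kernel helper's documented behavior (always NUL-terminate, return the copied length or `-E2BIG` on truncation), not the kernel's code, and the 4-byte buffer is chosen only for the demonstration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* User-space approximation of strscpy(): the destination is always
 * NUL-terminated; returns the number of characters copied, or -E2BIG
 * if the source did not fit (or size is 0). */
long sketch_strscpy(char *dst, const char *src, size_t size)
{
    size_t i;

    if (size == 0)
        return -E2BIG;

    for (i = 0; i + 1 < size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return src[i] == '\0' ? (long)i : -E2BIG;
}

/* Copies src into a 4-byte buffer and returns the helper's result. */
long sketch_copy_result(const char *src)
{
    char buf[4];

    return sketch_strscpy(buf, src, sizeof(buf));
}

/* strncpy() fills the buffer without a terminator when src is too long. */
bool strncpy_leaves_unterminated(void)
{
    char buf[4];

    memset(buf, 'X', sizeof(buf));
    strncpy(buf, "abcdef", sizeof(buf));
    return buf[sizeof(buf) - 1] != '\0';
}
```

A caller can therefore detect truncation from the return value and can always treat the destination as a valid C string, neither of which strncpy() guarantees.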
    • Merge tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx · 32d528c4
      Linus Torvalds authored
      Pull SPDX/License additions from Greg KH:
       "Here are two small updates for LICENSES and some kernel files that add
        the Copyleft-next license and use it in a SPDX tag as a dual-license
        for some kernel files.
      
        These have been discussed thoroughly in public on the linux-spdx
        mailing list, and have the needed acks on them, as well as having been
        in linux-next with no reported issues for quite some time"
      
      * tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
        testing: use the copyleft-next-0.3.1 SPDX tag
        LICENSES: Add the copyleft-next-0.3.1 license
    • Merge tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 3e0caea7
      Linus Torvalds authored
      Pull more devicetree updates from Rob Herring:
       "This is mostly a treewide clean-up from Krzysztof. There's also a
        couple of fixes and things that fell thru the cracks.
      
        I must say this has been a nice merge window without bindings dumped
        in at the last minute introducing warnings.
      
        Summary:
      
         - Treewide dropping of redundant 'binding' or 'schema' from schema
           titles. This will be followed up with an automated check to catch
           these.
      
         - Re-sort vendor-prefixes
      
         - Convert GPIO based watchdog to schema
      
         - Handle all the variations for clocks, resets, power domains in i.MX
           PCIe binding
      
         - Document missing 'power-domains' property in mxsfb
      
         - Fix error with path references in Tegra XUSB example
      
         - Honor CONFIG_CMDLINE* even without /chosen node"
      
      * tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: drop redundant part of title (manual)
        dt-bindings: clock: drop redundant part of title
        dt-bindings: drop redundant part of title (beginning)
        dt-bindings: drop redundant part of title (end, part three)
        dt-bindings: drop redundant part of title (end, part two)
        dt-bindings: drop redundant part of title (end)
        dt-bindings: clock: st,stm32mp1-rcc: add proper title
        dt-bindings: memory-controllers: ti,gpmc-child: drop redundant part of title
        dt-bindings: drop redundant part of title of shared bindings
        dt-bindings: watchdog: gpio: Convert bindings to YAML
        dt-bindings: imx6q-pcie: Handle more resets on legacy platforms
        dt-bindings: imx6q-pcie: Handle various PD configurations
        dt-bindings: imx6q-pcie: Handle various clock configurations
        dt-bindings: hwmon: ntc-thermistor: drop Naveen Krishna Chatradhi from maintainers
        dt-bindings: mxsfb: Document i.MX8M/i.MX6SX/i.MX6SL power-domains property
        dt-bindings: vendor-prefixes: sort entries alphabetically
        dt-bindings: usb: tegra-xusb: Remove path references
        of: fdt: Honor CONFIG_CMDLINE* even without /chosen node
      3e0caea7
    • Linus Torvalds's avatar
      Merge tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 35f79d0e
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "There is one notable patch, which allows the parisc kernel to use the
        same MADV_xxx constants as the other architectures going forward. With
        that change only alpha has one entry left (MADV_DONTNEED is 6 vs 4 on
        others) which is different. To prevent an ABI breakage, a wrapper is
        included which translates old MADV values to the new ones, so existing
        userspace isn't affected. The reason for the patch is that some
        applications wrongly used the standard MADV_xxx values even on some
        non-x86 platforms, and as a result those programs failed to run
        correctly on parisc (examples are qemu-user, tor browser and boringssl).
      
        Then the kgdb console and the LED code received some fixes, and some
        0-day warnings are now gone. Finally, the very last compile warning
        which was visible during a kernel build is now fixed too (in the vDSO
        code).
      
        The majority of the patches are tagged for stable series and in
        summary this patchset is quite small and drops more code than it adds:
      
        Fixes:
         - Fix potential null-ptr-deref in start_task()
         - Fix kgdb console on serial port
         - Add missing FORCE prerequisites in Makefile
         - Drop PMD_SHIFT from calculation in pgtable.h
      
        Enhancements:
         - Implement a wrapper to align madvise() MADV_* constants with other
           architectures
         - If machine supports running MPE/XL, show the MPE model string
      
        Cleanups:
         - Drop duplicate kgdb console code
         - Indenting fixes in setup_cmdline()"
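
      The madvise() wrapper described in this pull request can be modelled
      in userspace as below. This is a minimal sketch, not the parisc
      implementation: the function name translate_old_madv() and the MODEL_/
      OLD_ constants are hypothetical, and the legacy range 65..72 mapping
      onto the generic values 12..19 is an assumption made for illustration.

```c
/* Generic MADV_* numbers used by most architectures. The MODEL_ prefix
 * keeps them from clashing with <sys/mman.h>. */
enum {
	MODEL_MADV_MERGEABLE   = 12,
	MODEL_MADV_UNMERGEABLE = 13,
	MODEL_MADV_HUGEPAGE    = 14,
	MODEL_MADV_NOHUGEPAGE  = 15,
	MODEL_MADV_DONTDUMP    = 16,
	MODEL_MADV_DODUMP      = 17,
	MODEL_MADV_WIPEONFORK  = 18,
	MODEL_MADV_KEEPONFORK  = 19,
};

/* Assumed legacy numbering: the same eight hints, in the same order,
 * occupying 65..72. */
#define OLD_MADV_FIRST 65
#define OLD_MADV_LAST  72

/* Map a legacy value onto the generic one; pass everything else through
 * unchanged so binaries built against the new headers keep working. */
static int translate_old_madv(int behavior)
{
	if (behavior >= OLD_MADV_FIRST && behavior <= OLD_MADV_LAST)
		return behavior - OLD_MADV_FIRST + MODEL_MADV_MERGEABLE;
	return behavior;
}
```

      Translating at the syscall boundary means old binaries keep passing
      the values they were compiled with, while the rest of the kernel only
      ever sees one set of numbers.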
      
      * tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Show MPE/iX model string at bootup
        parisc: Add missing FORCE prerequisites in Makefile
        parisc: Move pdc_result struct to firmware.c
        parisc: Drop locking in pdc console code
        parisc: Drop duplicate kgdb_pdc console
        parisc: Fix locking in pdc_iodc_print() firmware call
        parisc: Drop PMD_SHIFT from calculation in pgtable.h
        parisc: Align parisc MADV_XXX constants with all other architectures
        parisc: led: Fix potential null-ptr-deref in start_task()
        parisc: Fix inconsistent indenting in setup_cmdline()
      35f79d0e
    • Linus Torvalds's avatar
      Merge tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · 70b07bec
      Linus Torvalds authored
      Pull asm-generic updates from Arnd Bergmann:
       "There are only three fairly simple patches.
      
        The #include change to linux/swab.h addresses a userspace build issue,
        and the change to the mmio tracing logic helps provide more useful
        traces"
      
      * tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        uapi: Add missing _UAPI prefix to <asm-generic/types.h> include guard
        asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info
        include/uapi/linux/swab: Fix potentially missing __always_inline
      70b07bec
    • Arnd Bergmann's avatar
      wifi: ath9k: use proper statements in conditionals · b7dc753f
      Arnd Bergmann authored
      A previous cleanup patch accidentally broke some conditional
      expressions by replacing the safe "do {} while (0)" constructs
      with empty macros. gcc points this out when extra warnings
      are enabled:
      
      drivers/net/wireless/ath/ath9k/hif_usb.c: In function 'ath9k_skb_queue_complete':
      drivers/net/wireless/ath/ath9k/hif_usb.c:251:57: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
        251 |                         TX_STAT_INC(hif_dev, skb_failed);
      
      Make both sets of macros proper expressions again.
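
      The difference between an empty macro and a "do {} while (0)" macro
      can be sketched as follows. The names STATS_ENABLED, STAT_INC, struct
      stats and queue_complete() are hypothetical stand-ins, not the ath9k
      definitions.

```c
/* Hypothetical switch standing in for a debug configuration option. */
#define STATS_ENABLED 1

struct stats {
	unsigned int skb_success;
	unsigned int skb_failed;
};

#if STATS_ENABLED
#define STAT_INC(st, field) do { (st)->field++; } while (0)
#else
/* Still exactly one statement. An empty definition here would turn
 * "else STAT_INC(...);" into "else ;", which gcc reports under
 * -Wempty-body. */
#define STAT_INC(st, field) do { } while (0)
#endif

/* Unbraced if/else is safe because each macro use is a single statement. */
static void queue_complete(struct stats *st, int txok)
{
	if (txok)
		STAT_INC(st, skb_success);
	else
		STAT_INC(st, skb_failed);
}
```

      With both variants expanding to one statement, the caller compiles
      the same way whether or not statistics are built in.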
      
      Fixes: d7fc7603 ("ath9k: htc: clean up statistics macros")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20221215165553.1950307-1-arnd@kernel.org
      b7dc753f
    • Arnd Bergmann's avatar
      wifi: mt76: mt7996: select CONFIG_RELAY · 37fc9ad1
      Arnd Bergmann authored
      Without CONFIG_RELAY, the driver fails to link:
      
      ERROR: modpost: "relay_flush" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
      ERROR: modpost: "relay_switch_subbuf" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
      ERROR: modpost: "relay_open" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
      ERROR: modpost: "relay_reset" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
      ERROR: modpost: "relay_file_operations" [drivers/net/wireless/mediatek/mt76/mt7996/mt7996e.ko] undefined!
      
      The same change was done in mt7915 for the corresponding copy of the code.
      
      Fixes: 98686cd2 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
      See-also: 988845c9 ("mt76: mt7915: add support for passing chip/firmware debug data to user space")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20221215163133.4152299-1-arnd@kernel.org
      37fc9ad1
    • Johannes Berg's avatar
      wifi: iwlwifi: fw: skip PPAG for JF · 1c4c0b28
      Johannes Berg authored
      For JF RFs we don't support PPAG, but many firmware
      images lie about it. Always skip support for JF to
      avoid firmware errors when sending the command.
      Reported-and-tested-by: Íñigo Huguet <ihuguet@redhat.com>
      Link: https://lore.kernel.org/linux-wireless/CACT4oufQsqHGp6bah2c4+jPn2wG1oZqY=UKa_TmPx=F6Lxng8Q@mail.gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20221213225723.2a43415d8990.I9ac210740a45b41f1b2e15274e1daf4284f2808a@changeid
      1c4c0b28
    • Jason A. Donenfeld's avatar
      prandom: remove prandom_u32_max() · 3c202d14
      Jason A. Donenfeld authored
      Convert the final two users of prandom_u32_max() that slipped in during
      6.2-rc1 to use get_random_u32_below().
      
      Then, with no more users left, we can finally remove the deprecated
      function.
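
      Both the removed helper and its replacement return a value in the
      half-open range [0, ceil). A userspace sketch of the multiply-and-shift
      scaling that such a bounded helper can be built on is shown below.
      The name scale_u32_below() is hypothetical, and the random input is
      passed in by the caller so that the behaviour is deterministic. Unlike
      a production helper, this sketch makes no attempt to remove the slight
      bias that occurs when ceil does not divide 2^32.

```c
#include <stdint.h>

/* Scale a uniformly distributed 32-bit value into [0, ceil) by taking
 * the high 32 bits of the 64-bit product rnd * ceil. */
static uint32_t scale_u32_below(uint32_t rnd, uint32_t ceil)
{
	return (uint32_t)(((uint64_t)rnd * ceil) >> 32);
}
```

      For example, with ceil = 10 an input of 0 maps to 0, 0x80000000 maps
      to 5, and 0xFFFFFFFF maps to 9, so the result never reaches ceil.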
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      3c202d14
    • Johan Hovold's avatar
      efi: random: fix NULL-deref when refreshing seed · 41a15855
      Johan Hovold authored
      Do not try to refresh the RNG seed in case the firmware does not support
      setting variables.
      
      This is specifically needed to prevent a NULL-pointer dereference on the
      Lenovo X13s with some firmware revisions, or more generally, whenever
      the runtime services have been disabled (e.g. efi=noruntime or with
      PREEMPT_RT).
      
      Fixes: e7b813b3 ("efi: random: refresh non-volatile random seed when RNG is initialized")
      Reported-by: Steev Klimaszewski <steev@kali.org>
      Reported-by: Bjorn Andersson <andersson@kernel.org>
      Tested-by: Steev Klimaszewski <steev@kali.org>
      Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      41a15855
    • Jason A. Donenfeld's avatar
      random: do not include <asm/archrandom.h> from random.h · 6bb20c15
      Jason A. Donenfeld authored
      The <asm/archrandom.h> header is a random.c private detail, not
      something to be called by other code. As such, don't make it
      automatically available by way of random.h.
      
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      6bb20c15
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-6.2-20221219' of... · 4be84df3
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-12-19
      
      The first patch is by Vincent Mailhol and adds the etas_es58x
      devlink documentation to the index.
      
      Haibo Chen's patch for the flexcan driver fixes an unbalanced
      pm_runtime_enable warning.
      
      The last patch is by me, targets the kvaser_usb driver and fixes
      an error occurring with gcc-13.
      
      * tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
        can: flexcan: avoid unbalanced pm_runtime_enable warning
        Documentation: devlink: add missing toc entry for etas_es58x devlink doc
      ====================
      
      Link: https://lore.kernel.org/r/20221219155210.1143439-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4be84df3
    • Jakub Kicinski's avatar
      Merge branch 'stop-corrupting-socket-s-task_frag' · 918fb1aa
      Jakub Kicinski authored
      Benjamin Coddington says:
      
      ====================
      Stop corrupting socket's task_frag
      
      The networking code uses flags in sk_allocation to determine if it can use
      current->task_frag, however in-kernel users of sockets may stop setting
      sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
      as SUNRPC has done in commit a1231fda ("SUNRPC: Set memalloc_nofs_save()
      on all rpciod/xprtiod jobs").
      
      This will cause corruption in current->task_frag when recursing into the
      network layer for those subsystems during page fault or reclaim.  The
      corruption is difficult to diagnose because stack traces may not contain the
      offending subsystem at all.  The corruption is unlikely to show up in
      testing because it requires memory pressure, and so subsystems that
      convert to memalloc_nofs_save/restore are likely to continue to run into
      this issue.
      
      Previous reports and proposed fixes:
      https://lore.kernel.org/netdev/96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com/
      https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      https://lore.kernel.org/linux-nfs/de6d99321d1dcaa2ad456b92b3680aa77c07a747.1665401788.git.gnault@redhat.com/
      
      Guillaume Nault has done all of the hard work tracking this problem down and
      finding the best fix for this issue.  I'm just taking a turn posting another
      fix.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1671194454.git.bcodding@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      918fb1aa
    • Benjamin Coddington's avatar
      net: simplify sk_page_frag · 08f65892
      Benjamin Coddington authored
      Now that in-kernel socket users that may recurse during reclaim have been
      converted to sk_use_task_frag = false, we can have sk_page_frag() simply
      check that value.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      08f65892
    • Benjamin Coddington's avatar
      Treewide: Stop corrupting socket's task_frag · 98123866
      Benjamin Coddington authored
      Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
      GFP_NOIO flag on sk_allocation which the networking system uses to decide
      when it is safe to use current->task_frag.  The result is unexpected
      corruption in task_frag when SUNRPC is involved in memory reclaim.
      
      The corruption can be seen in crashes, but the root cause is often
      difficult to ascertain as a crashing machine's stack trace will have no
      evidence of being near NFS or SUNRPC code.  I believe this problem to
      be much more pervasive than reports to the community may indicate.
      
      Fix this by having kernel users of sockets that may corrupt task_frag due
      to reclaim set sk_use_task_frag = false.  Preemptively correcting this
      situation for users that still set sk_allocation allows them to convert to
      memalloc_nofs_save/restore without the same unexpected corruptions that are
      sure to follow, unlikely to show up in testing, and difficult to bisect.
      
      CC: Philipp Reisner <philipp.reisner@linbit.com>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Josef Bacik <josef@toxicpanda.com>
      CC: Keith Busch <kbusch@kernel.org>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Sagi Grimberg <sagi@grimberg.me>
      CC: Lee Duncan <lduncan@suse.com>
      CC: Chris Leech <cleech@redhat.com>
      CC: Mike Christie <michael.christie@oracle.com>
      CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
      CC: "Martin K. Petersen" <martin.petersen@oracle.com>
      CC: Valentina Manea <valentina.manea.m@gmail.com>
      CC: Shuah Khan <shuah@kernel.org>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: David Howells <dhowells@redhat.com>
      CC: Marc Dionne <marc.dionne@auristor.com>
      CC: Steve French <sfrench@samba.org>
      CC: Christine Caulfield <ccaulfie@redhat.com>
      CC: David Teigland <teigland@redhat.com>
      CC: Mark Fasheh <mark@fasheh.com>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: Joseph Qi <joseph.qi@linux.alibaba.com>
      CC: Eric Van Hensbergen <ericvh@gmail.com>
      CC: Latchesar Ionkov <lucho@ionkov.net>
      CC: Dominique Martinet <asmadeus@codewreck.org>
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Xiubo Li <xiubli@redhat.com>
      CC: Chuck Lever <chuck.lever@oracle.com>
      CC: Jeff Layton <jlayton@kernel.org>
      CC: Trond Myklebust <trond.myklebust@hammerspace.com>
      CC: Anna Schumaker <anna@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Suggested-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      98123866
    • Guillaume Nault's avatar
      net: Introduce sk_use_task_frag in struct sock. · fb87bd47
      Guillaume Nault authored
      Sockets that can be used while recursing into memory reclaim, like
      those used by network block devices and file systems, mustn't use
      current->task_frag: if the current process is already using it, then
      the inner memory reclaim call would corrupt the task_frag structure.
      
      To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
      that mustn't use current->task_frag, assuming that those used during
      memory reclaim had their allocation constraints reflected in
      ->sk_allocation.
      
      This unfortunately doesn't cover all cases: in an attempt to remove all
      usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
      ->sk_allocation, and used memalloc_nofs critical sections instead.
      This breaks the sk_page_frag() heuristic since the allocation
      constraints are now stored in current->flags, which sk_page_frag()
      can't read without risking triggering a cache miss and slowing down
      TCP's fast path.
      
      This patch creates a new field in struct sock, named sk_use_task_frag,
      which sockets with memory reclaim constraints can set to false if they
      can't safely use current->task_frag. In such cases, sk_page_frag() now
      always returns the socket's page_frag (->sk_frag). The first user is
      sunrpc, which needs to avoid using current->task_frag but can keep
      ->sk_allocation set to GFP_KERNEL otherwise.
      
      Eventually, it might be possible to simplify sk_page_frag() by only
      testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
      heuristic entirely (assuming other sockets will set ->sk_use_task_frag
      according to their constraints in the future).
      
      The new ->sk_use_task_frag field is placed in a hole in struct sock and
      belongs to a cache line shared with ->sk_shutdown. Therefore it should
      be hot and shouldn't have negative performance impacts on TCP's fast
      path (sk_shutdown is tested just before the while() loop in
      tcp_sendmsg_locked()).
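
      The selection logic described above can be modelled in userspace as
      follows. This is a sketch under stated assumptions rather than the
      kernel code: the *_model types and pick_page_frag() are hypothetical
      names, and the boolean alloc_allows_task_frag stands in for the
      existing ->sk_allocation heuristic.

```c
#include <stdbool.h>

struct page_frag_model {
	int id;
};

/* Stands in for the current task, which owns one shared page_frag. */
struct task_model {
	struct page_frag_model task_frag;
};

struct sock_model {
	struct page_frag_model sk_frag;  /* per-socket fallback */
	bool sk_use_task_frag;           /* new opt-out flag */
	bool alloc_allows_task_frag;     /* placeholder for the ->sk_allocation check */
};

/* Use the task's page_frag only if the socket has not opted out and the
 * allocation heuristic also permits it; otherwise use the socket's own. */
static struct page_frag_model *
pick_page_frag(struct sock_model *sk, struct task_model *cur)
{
	if (sk->sk_use_task_frag && sk->alloc_allows_task_frag)
		return &cur->task_frag;
	return &sk->sk_frag;
}
```

      A socket used during memory reclaim clears sk_use_task_frag once at
      setup time, so the per-call check stays a single field read on the
      fast path.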
      
      Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fb87bd47
    • Matt Johnston's avatar
      mctp: Remove device type check at unregister · b389a902
      Matt Johnston authored
      The unregister check could be incorrectly triggered if a netdev
      changes its type after register. That is possible for a tun device
      using TUNSETLINK ioctl, resulting in mctp unregister failing
      and the netdev unregister waiting forever.
      
      This was encountered by https://github.com/openthread/openthread/issues/8523
      
      Neither the check at register nor the one at unregister is required. They were added in
      an attempt to track down mctp_ptr being set unexpectedly, which should
      not happen in normal operation.
      
      Fixes: 7b1871af ("mctp: Warn if pointer is set for a wrong dev type")
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b389a902