1. 21 May, 2024 8 commits
    • Merge branch 'af_unix-fix-gc-and-improve-selftest' · 580acf6c
      Paolo Abeni authored
      Michal Luczaj says:
      
      ====================
      af_unix: Fix GC and improve selftest
      
      Series deals with AF_UNIX garbage collector mishandling some in-flight
      graph cycles. Embryos carrying OOB packets with SCM_RIGHTS cause issues.
      
      Patch 1/2 fixes the memory leak.
      Patch 2/2 tweaks the selftest for a better OOB coverage.
      
      v3:
        - Patch 1/2: correct the commit message (Kuniyuki)
      
      v2: https://lore.kernel.org/netdev/20240516145457.1206847-1-mhal@rbox.co/
        - Patch 1/2: remove WARN_ON_ONCE() (Kuniyuki)
        - Combine both patches into a series (Kuniyuki)
      
      v1: https://lore.kernel.org/netdev/20240516103049.1132040-1-mhal@rbox.co/
      ====================
      
      Link: https://lore.kernel.org/r/20240517093138.1436323-1-mhal@rbox.co
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftest: af_unix: Make SCM_RIGHTS into OOB data. · e060e433
      Kuniyuki Iwashima authored
      scm_rights.c covers various test cases for inflight file descriptors
      and garbage collector for AF_UNIX sockets.
      
      Currently, SCM_RIGHTS messages are sent with a 3-byte string, which is
      not good for MSG_OOB cases, as the SCM_RIGHTS cmsg goes with the first
      2 bytes, which are non-OOB data.
      
      Let's send SCM_RIGHTS messages with 1-byte character to pack SCM_RIGHTS
      into OOB data.

      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS · 041933a1
      Michal Luczaj authored
      GC attempts to explicitly drop oob_skb's reference before purging the hit
      list.
      
      The problem is with embryos: kfree_skb(u->oob_skb) is never called on an
      embryo socket.
      
      The python script below [0] sends a listener's fd to its embryo as OOB
      data.  While GC does collect the embryo's queue, it fails to drop the OOB
      skb's refcount.  The skb which was in embryo's receive queue stays as
      unix_sk(sk)->oob_skb and keeps the listener's refcount [1].
      
      Tell GC to dispose of the embryo's oob_skb.
      
      [0]:
      from array import array
      from socket import *
      
      addr = '\x00unix-oob'
      lis = socket(AF_UNIX, SOCK_STREAM)
      lis.bind(addr)
      lis.listen(1)
      
      s = socket(AF_UNIX, SOCK_STREAM)
      s.connect(addr)
      scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
      s.sendmsg([b'x'], [scm], MSG_OOB)
      lis.close()
      
      [1]
      $ grep unix-oob /proc/net/unix
      $ ./unix-oob.py
      $ grep unix-oob /proc/net/unix
      0000000000000000: 00000002 00000000 00000000 0001 02     0 @unix-oob
      0000000000000000: 00000002 00000000 00010000 0001 01  6072 @unix-oob
      
      Fixes: 4090fa37 ("af_unix: Replace garbage collection algorithm.")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). · 3ebc46ca
      Kuniyuki Iwashima authored
      In dctcp_update_alpha(), we use a module parameter dctcp_shift_g
      as follows:
      
        alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g);
        ...
        delivered_ce <<= (10 - dctcp_shift_g);
      
      It seems syzkaller started fuzzing module parameters and triggered
      shift-out-of-bounds [0] by setting dctcp_shift_g to 100:
      
        memcpy((void*)0x20000080,
               "/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47);
        res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
                      /*flags=*/2ul, /*mode=*/0ul);
        memcpy((void*)0x20000000, "100\000", 4);
        syscall(__NR_write, /*fd=*/r[0], /*val=*/0x20000000ul, /*len=*/4ul);
      
      Let's limit the max value of dctcp_shift_g by param_set_uint_minmax().
      
      With this patch:
      
        # echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        # cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        10
        # echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
        -bash: echo: write error: Invalid argument
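
The effect of the bound can be modelled outside the kernel. The following is a minimal Python sketch, not the kernel code: `set_shift_g` is a hypothetical stand-in for a setter limited to 10 (the value the transcript above accepts), and `update_alpha` mirrors only the two quoted lines. The division by the delivered packet count and the cap of 1024 on alpha are assumptions about the surrounding code.

```python
U32_MASK = 0xFFFFFFFF
MAX_SHIFT_G = 10      # upper bound taken from the transcript above
MAX_ALPHA = 1024      # assumed cap on alpha (not stated in the commit message)


def set_shift_g(value: int) -> int:
    """Hypothetical stand-in for a bounded module-parameter setter."""
    if not 0 <= value <= MAX_SHIFT_G:
        raise ValueError("Invalid argument")  # models the rejected write
    return value


def min_not_zero(x: int, y: int) -> int:
    """Smaller of x and y, ignoring an operand that is zero."""
    if x == 0:
        return y
    if y == 0:
        return x
    return min(x, y)


def update_alpha(alpha: int, delivered: int, delivered_ce: int, shift_g: int) -> int:
    """Toy version of the alpha update using 32-bit unsigned arithmetic."""
    alpha -= min_not_zero(alpha, alpha >> shift_g)
    if delivered_ce:
        # With shift_g <= 10 the shift count below is never negative.
        delivered_ce = (delivered_ce << (10 - shift_g)) & U32_MASK
        delivered_ce //= max(1, delivered)
        alpha = min(alpha + delivered_ce, MAX_ALPHA)
    return alpha
```

Rejecting the value when it is written keeps every later shift count inside the range 0..10, so the update path itself needs no extra checks.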
      
      [0]:
      UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12
      shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int')
      CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114
       ubsan_epilogue lib/ubsan.c:231 [inline]
       __ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468
       dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143
       tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline]
       tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948
       tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711
       tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937
       sk_backlog_rcv include/net/sock.h:1106 [inline]
       __release_sock+0x20f/0x350 net/core/sock.c:2983
       release_sock+0x61/0x1f0 net/core/sock.c:3549
       mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907
       mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976
       __mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072
       mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127
       inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437
       __sock_release net/socket.c:659 [inline]
       sock_close+0xc0/0x240 net/socket.c:1421
       __fput+0x41b/0x890 fs/file_table.c:422
       task_work_run+0x23b/0x300 kernel/task_work.c:180
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0x9c8/0x2540 kernel/exit.c:878
       do_group_exit+0x201/0x2b0 kernel/exit.c:1027
       __do_sys_exit_group kernel/exit.c:1038 [inline]
       __se_sys_exit_group kernel/exit.c:1036 [inline]
       __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x67/0x6f
      RIP: 0033:0x7f6c2b5005b6
      Code: Unable to access opcode bytes at 0x7f6c2b50058c.
      RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6
      RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
      RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
      R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Yue Sun <samsun1006219@gmail.com>
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/
      Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240517091626.32772-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/net: use tc rule to filter the na packet · ea63ac14
      Hangbin Liu authored
      Test arp_ndisc_untracked_subnets uses tcpdump to filter the unsolicited
      and untracked na messages. It sets -e before calling tcpdump. But if
      tcpdump filters 0 packets, it will return non-zero and cause the script
      to exit.
      
      Instead of using slow tcpdump to capture packets, let's use a tc rule
      to filter out the na message.
      
      At the same time, fix function setup_v6 which only needs one parameter.
      Move all the related helpers from forwarding lib.sh to net lib.sh.
      
      Fixes: 0ea7b0a4 ("selftests: net: arp_ndisc_untracked_subnets: test for arp_accept and accept_untracked_na")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240517010327.2631319-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: sr: fix memleak in seg6_hmac_init_algo · efb9f4f1
      Hangbin Liu authored
      seg6_hmac_init_algo returns without cleaning up the previous allocations
      if one fails, so it's going to leak all that memory and the crypto tfms.
      
      Update seg6_hmac_exit to only free the memory when allocated, so we can
      reuse the code directly.
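
The unwinding pattern described above can be sketched in Python. This is an illustration under stated assumptions, not the kernel code: `init_algos`, `exit_algos`, the `alloc` callback and the `freed` list are hypothetical names, and appending to `freed` stands in for releasing memory or a crypto tfm.

```python
def exit_algos(slots: list, freed: list) -> None:
    """Release every slot that was actually allocated; skip empty ones."""
    for i, res in enumerate(slots):
        if res is not None:
            freed.append(res)   # stands in for freeing the resource
            slots[i] = None


def init_algos(names: list, alloc, freed: list) -> list:
    """Allocate one resource per name; on failure, undo the partial work."""
    slots = [None] * len(names)
    try:
        for i, name in enumerate(names):
            slots[i] = alloc(name)
    except MemoryError:
        exit_algos(slots, freed)   # reuse the exit path for error unwinding
        raise
    return slots
```

Because the exit routine tolerates slots that were never filled, the same function serves both normal teardown and the error path.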
      
      Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support")
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/20240517005435.2600277-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock. · 9841991a
      Kuniyuki Iwashima authored
      Billy Jheng Bing-Jhong reported a race between __unix_gc() and
      queue_oob().
      
      __unix_gc() tries to garbage-collect close()d inflight sockets,
      and then if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC
      will drop the reference and set it to NULL locklessly.
      
      However, the peer socket can still send a MSG_OOB message, and
      queue_oob() can update unix_sk(sk)->oob_skb concurrently, leading
      to a NULL pointer dereference. [0]
      
      To fix the issue, let's update unix_sk(sk)->oob_skb under the
      sk_receive_queue's lock and take it everywhere we touch oob_skb.
      
      Note that we defer kfree_skb() in manage_oob() to silence lockdep
      false-positive (See [1]).
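
The locking rule can be illustrated with a toy Python model. This is only a sketch: `ToySock`, `queue_lock`, `gc_drop_oob` and the `freed` list are hypothetical names, the lock stands in for the sk_receive_queue lock, and appending to `freed` stands in for kfree_skb(). The pointer is swapped while the lock is held and the old object is released only after unlocking, which is one way to model a deferred free.

```python
import threading


class ToySock:
    """Toy model: oob_skb is only read or written with queue_lock held."""

    def __init__(self) -> None:
        self.queue_lock = threading.Lock()  # stands in for the receive queue lock
        self.oob_skb = None
        self.freed = []

    def queue_oob(self, skb) -> None:
        """Sender side: replace oob_skb under the lock, free the old one after."""
        with self.queue_lock:
            old, self.oob_skb = self.oob_skb, skb
        if old is not None:
            self.freed.append(old)  # deferred release, outside the lock

    def gc_drop_oob(self):
        """GC side: detach oob_skb under the lock, release it afterwards."""
        with self.queue_lock:
            skb, self.oob_skb = self.oob_skb, None
        if skb is not None:
            self.freed.append(skb)
        return skb
```

With both paths serialised on the same lock, each queued object is released exactly once, whichever side detaches it.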
      
      [0]:
      BUG: kernel NULL pointer dereference, address: 0000000000000008
       PF: supervisor write access in kernel mode
       PF: error_code(0x0002) - not-present page
      PGD 8000000009f5e067 P4D 8000000009f5e067 PUD 9f5d067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP PTI
      CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc5-00191-gd091e579 #110
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events delayed_fput
      RIP: 0010:skb_dequeue (./include/linux/skbuff.h:2386 ./include/linux/skbuff.h:2402 net/core/skbuff.c:3847)
      Code: 39 e3 74 3e 8b 43 10 48 89 ef 83 e8 01 89 43 10 49 8b 44 24 08 49 c7 44 24 08 00 00 00 00 49 8b 14 24 49 c7 04 24 00 00 00 00 <48> 89 42 08 48 89 10 e8 e7 c5 42 00 4c 89 e0 5b 5d 41 5c c3 cc cc
      RSP: 0018:ffffc900001bfd48 EFLAGS: 00000002
      RAX: 0000000000000000 RBX: ffff8880088f5ae8 RCX: 00000000361289f9
      RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff8880088f5b00
      RBP: ffff8880088f5b00 R08: 0000000000080000 R09: 0000000000000001
      R10: 0000000000000003 R11: 0000000000000001 R12: ffff8880056b6a00
      R13: ffff8880088f5280 R14: 0000000000000001 R15: ffff8880088f5a80
      FS:  0000000000000000(0000) GS:ffff88807dd80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 0000000006314000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <TASK>
       unix_release_sock (net/unix/af_unix.c:654)
       unix_release (net/unix/af_unix.c:1050)
       __sock_release (net/socket.c:660)
       sock_close (net/socket.c:1423)
       __fput (fs/file_table.c:423)
       delayed_fput (fs/file_table.c:444 (discriminator 3))
       process_one_work (kernel/workqueue.c:3259)
       worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
       kthread (kernel/kthread.c:388)
       ret_from_fork (arch/x86/kernel/process.c:153)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
       </TASK>
      Modules linked in:
      CR2: 0000000000000008
      
      Link: https://lore.kernel.org/netdev/a00d3993-c461-43f2-be6d-07259c98509a@rbox.co/ [1]
      Fixes: 1279f9d9 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
      Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240516134835.8332-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "r8169: don't try to disable interrupts if NAPI is scheduled already" · eabb8a9b
      Heiner Kallweit authored
      This reverts commit 7274c414.
      
      Ken reported that RTL8125b can lock up if gro_flush_timeout has the
      default value of 20000 and napi_defer_hard_irqs is set to 0.
      In this scenario device interrupts aren't disabled, which seems to
      trigger some silicon bug under heavy load. I was able to reproduce this
      behavior on RTL8168h. Fix this by reverting 7274c414.
      
      Fixes: 7274c414 ("r8169: don't try to disable interrupts if NAPI is scheduled already")
      Cc: stable@vger.kernel.org
      Reported-by: Ken Milmore <ken.milmore@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 20 May, 2024 4 commits
  3. 18 May, 2024 10 commits
    • kprobe/ftrace: fix build error due to bad function definition · 4b377b48
      Linus Torvalds authored
      Commit 1a7d0890 ("kprobe/ftrace: bail out if ftrace was killed")
      introduced a bad K&R function definition, which we haven't accepted in a
      long long time.
      
      Gcc seems to let it slide, but clang notices with the appropriate error:
      
        kernel/kprobes.c:1140:24: error: a function declaration without a prototype is deprecated in all >
         1140 | void kprobe_ftrace_kill()
              |                        ^
              |                         void
      
      but this commit was apparently never in linux-next before it was sent
      upstream, so it didn't get the appropriate build test coverage.
      
      Fixes: 1a7d0890 kprobe/ftrace: bail out if ftrace was killed
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f08a1e91
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - virtio_net: fix missed error path rtnl_unlock after control queue
           locking rework
      
        Current release - new code bugs:
      
         - bpf: fix KASAN slab-out-of-bounds in percpu_array_map_gen_lookup,
           caused by missing nested map handling
      
         - drv: dsa: correct initialization order for KSZ88x3 ports
      
        Previous releases - regressions:
      
         - af_packet: do not call packet_read_pending() from
           tpacket_destruct_skb(), fixing a performance regression
      
         - ipv6: fix route deleting failure when metric equals 0, don't assume
           0 means not set / default in this case
      
        Previous releases - always broken:
      
         - bridge: couple of syzbot-driven fixes"
      
      * tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
        selftests: net: local_termination: annotate the expected failures
        net: dsa: microchip: Correct initialization order for KSZ88x3 ports
        MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
        dt-bindings: net: ti: Update maintainers list
        l2tp: fix ICMP error handling for UDP-encap sockets
        net: txgbe: fix to control VLAN strip
        net: wangxun: match VLAN CTAG and STAG features
        net: wangxun: fix to change Rx features
        af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
        virtio_net: Fix missed rtnl_unlock
        netrom: fix possible dead-lock in nr_rt_ioctl()
        idpf: don't skip over ethtool tcp-data-split setting
        dt-bindings: net: qcom: ethernet: Allow dma-coherent
        bonding: fix oops during rmmod
        net/ipv6: Fix route deleting failure when metric equals 0
        selftests/net: reduce xfrm_policy test time
        selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations
        selftests/bpf: Adjust test_access_variable_array after a kernel function name change
        selftests/net/lib: no need to record ns name if it already exist
        net: qrtr: ns: Fix module refcnt
        ...
    • Merge tag 'trace-tools-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 26aa834f
      Linus Torvalds authored
      Pull tracing tool updates from Steven Rostedt:
       "Specific for timerlat:
      
         - Improve the output of timerlat top by adding a missing \n, and by
           avoiding printing color-formatting characters where they are
           translated to regular characters.
      
         - Improve timerlat auto-analysis output by replacing '\t' with spaces
           to avoid copy-and-paste issues when reporting problems.
      
         - Make the user-space (-u) option the default, as it is the most
           complete test. Add a -k option to use the in-kernel workload.
      
         - On timerlat top and hist, add a summary with the overall results.
           For instance, the minimum value for all CPUs, the overall average
           and the maximum value from all CPUs.
      
         - timerlat hist was printing initial values (i.e., 0 as max, and ~0
           as min) if the trace stopped before the first Ret-User event. This
           problem was fixed by printing the " - " no value string to the
           output if that was the case.
      
        For all RTLA tools:
      
         - Add a --warm-up <seconds> option, allowing the workload to run for
           <seconds> before starting to collect results.
      
         - Add a --trace-buffer-size option, allowing the user to set the
           tracing buffer size for -t option. This option is mainly useful for
           reducing the trace file. Now rtla depends on libtracefs >= 1.6.
      
         - Fix the -t [trace_file] parsing, now it does not require the '='
           before the option parameter, and better handles the multiple ways a
           user can pass the trace_file.txt"
      
      * tag 'trace-tools-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        rtla: Documentation: Fix -t, --trace
        rtla: Fix -t\--trace[=file]
        rtla/timerlat: Fix histogram report when a cpu count is 0
        rtla: Add --trace-buffer-size option
        rtla/timerlat: Make user-space threads the default
        rtla: Add the --warm-up option
        rtla/timerlat: Add a summary for hist mode
        rtla/timerlat: Add a summary for top mode
        rtla/timerlat: Use pretty formatting only on interactive tty
        rtla/auto-analysis: Replace \t with spaces
        rtla/timerlat: Simplify "no value" printing on top
    • Merge tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · fa3889d9
      Linus Torvalds authored
      
      Pull tracing user-event updates from Steven Rostedt:
      
       - Minor update to the user_events interface
      
        The ABI of creating a user event states that the fields are separated
        by semicolons, and spaces should be ignored.
      
        But the parsing expected at least one space to be there (which was
        incorrect). Fix the reading of the string to handle fields separated
        by semicolons but no space between them.
      
        This does extend the API slightly, as "field;field" will now be
        parsed and not cause an error. But it should not cause any regressions
        as no logic should expect it to fail.
      
        Note that the logic that parses the event fields to create the
        trace_event works with no spaces after the semicolon. It is
        the logic that tests against existing events that is inconsistent.
        As a result, registering an event without spaces succeeds if the
        event does not exist yet, but a repeated call that registers the
        same event, again without spaces, fails.
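
The separator rule (fields separated by semicolons, spaces ignored) can be sketched as a small parser. This is a Python illustration, not the kernel parser: `parse_fields` is a hypothetical helper and the field strings in the usage note are made up.

```python
def parse_fields(spec: str) -> list:
    """Split a field list on ';' and ignore surrounding whitespace."""
    fields = []
    for part in spec.split(";"):
        part = " ".join(part.split())   # trim and collapse inner whitespace
        if part:
            fields.append(part)
    return fields
```

Under this rule `parse_fields("u32 a;u32 b")` and `parse_fields("u32 a; u32 b")` yield the same list, so a comparison against an existing event gives the same answer with or without the space.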
      
      * tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        selftests/user_events: Add non-spacing separator check
        tracing/user_events: Fix non-spaced field matching
    • Merge tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 53683e40
      Linus Torvalds authored
      
      Pull tracing ring buffer updates from Steven Rostedt:
       "Add ring_buffer memory mappings.
      
        The tracing ring buffer was created based on being mostly used with
        the splice system call. It is broken up into page ordered sub-buffers
        and the reader swaps a new sub-buffer with an existing sub-buffer
        that's part of the write buffer. It then has total access to the
        swapped out sub-buffer and can do copyless movements of the memory
        into other mediums (file system, network, etc).
      
        The buffer is great for passing around the ring buffer contents in the
        kernel, but is not so good for when the consumer is the user space
        task itself.
      
        A new interface is added that allows user space to memory map the ring
        buffer. It will get all the write sub-buffers as well as reader
        sub-buffer (that is not written to). It can send an ioctl to change
        which sub-buffer is the new reader sub-buffer.
      
        The ring buffer is read only to user space. It only needs to call the
        ioctl when it is finished with a sub-buffer and needs a new sub-buffer
        that the writer will not write over.
      
        A self test program was also created for testing and can be used as an
        example for the interface to user space. The libtracefs (external to
        the kernel) also has code that interacts with this, although it is
        disabled until the interface is in an official release. It can be
        enabled by compiling the library with a special flag. This was used
        for testing applications that perform better with the buffer being
        mapped.
      
        Memory mapped buffers have limitations. The main one is that they
        cannot be used with the snapshot logic. If the buffer is mapped,
        snapshots will be disabled. If any logic is set to trigger snapshots
        on a buffer, that buffer will not be allowed to be mapped"
      
      * tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Add cast to unsigned long addr passed to virt_to_page()
        ring-buffer: Have mmapped ring buffer keep track of missed events
        ring-buffer/selftest: Add ring-buffer mapping test
        Documentation: tracing: Add ring-buffer mapping
        tracing: Allow user-space mapping of the ring-buffer
        ring-buffer: Introducing ring-buffer mapping functions
        ring-buffer: Allocate sub-buffers with __GFP_COMP
    • Merge tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 594d2815
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
      
       - Remove unused ftrace_direct_funcs variables
      
       - Fix a possible NULL pointer dereference race in eventfs
      
       - Update do_div() usage in trace event benchmark test
      
       - Speedup direct function registration with asynchronous RCU callback.
      
         The synchronization was done in the registration code and this caused
         delays when registering direct callbacks. Move the freeing to a
         call_rcu() that will prevent delaying of the registering.
      
       - Replace simple_strtoul() usage with kstrtoul()
      
      * tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Fix a possible null pointer dereference in eventfs_find_events()
        ftrace: Fix possible use-after-free issue in ftrace_location()
        ftrace: Remove unused global 'ftrace_direct_func_count'
        ftrace: Remove unused list 'ftrace_direct_funcs'
        tracing: Improve benchmark test performance by using do_div()
        ftrace: Use asynchronous grace period for register_ftrace_direct()
        ftrace: Replaces simple_strtoul in ftrace
    • Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 70a66320
      Linus Torvalds authored
      Pull probes updates from Masami Hiramatsu:
      
       - tracing/probes: Add new pseudo-types %pd and %pD support for dumping
         dentry name from 'struct dentry *' and file name from 'struct file *'
      
       - uprobes performance optimizations:
          - Speed up the BPF uprobe event by delaying the fetching of the
            uprobe event arguments that are not used in BPF
          - Avoid locking by speculatively checking whether uprobe event is
            valid
          - Reduce lock contention by using read/write_lock instead of
            spinlock for uprobe list operation. This improved the BPF uprobe
            benchmark result by 43% on average
      
       - rethook: Remove non-fatal warning messages when tracing stack from
         BPF and skip rcu_is_watching() validation in rethook if possible
      
       - objpool: Optimize objpool (which is used by kretprobes and fprobe as
         rethook backend storage) by inlining functions and avoid caching
         nr_cpu_ids because it is a const value
      
       - fprobe: Add entry/exit callbacks types (code cleanup)
      
       - kprobes: Check ftrace was killed in kprobes if it uses ftrace
      
      * tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        kprobe/ftrace: bail out if ftrace was killed
        selftests/ftrace: Fix required features for VFS type test case
        objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
        objpool: enable inlining objpool_push() and objpool_pop() operations
        rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
        ftrace: make extra rcu_is_watching() validation check optional
        uprobes: reduce contention on uprobes_tree access
        rethook: Remove warning messages printed for finding return address of a frame.
        fprobe: Add entry/exit callbacks types
        selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
        selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
        Documentation: tracing: add new type '%pd' and '%pD' for kprobe
        tracing/probes: support '%pD' type for print struct file's name
        tracing/probes: support '%pd' type for print struct dentry's name
        uprobes: add speculative lockless system-wide uprobe filter check
        uprobes: prepare uprobe args buffer lazily
        uprobes: encapsulate preparation of uprobe args buffer
    • Merge tag 'bootconfig-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e9d68251
      Linus Torvalds authored
      Pull bootconfig updates from Masami Hiramatsu:
      
       - Do not put unneeded quotes on the extra command line items which were
         inserted from the bootconfig.
      
       - Remove redundant spaces from the extra command line.
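
Both points can be sketched together in Python. This is an illustration only: `build_extra_cmdline` is a hypothetical helper, and the rule "quote a value only when it contains whitespace" is an assumption about what counts as necessary, not something the pull message states.

```python
def build_extra_cmdline(items: list) -> str:
    """Join (key, value) pairs; quote a value only if it contains whitespace."""
    parts = []
    for key, value in items:
        if value is None:
            parts.append(key)                      # bare flag, no value
        elif any(ch.isspace() for ch in value):
            parts.append(f'{key}="{value}"')       # quoting is needed here
        else:
            parts.append(f"{key}={value}")         # no quotes for plain values
    return " ".join(parts)   # single separators, no leading or trailing space
```

Joining the parts with a single separator, rather than appending a trailing space after each item, avoids redundant spaces in the result.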
      
      * tag 'bootconfig-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        init/main.c: Minor cleanup for the setup_command_line() function
        init/main.c: Remove redundant space from saved_command_line
        bootconfig: do not put quotes on cmdline items unless necessary
    • Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl · 91b6163b
      Linus Torvalds authored
      Pull sysctl updates from Joel Granados:
      
       - Remove sentinel elements from ctl_table structs in kernel/*
      
         Removing sentinels in ctl_table arrays reduces the build time size
         and runtime memory consumed by ~64 bytes per array. Removals for
         net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
         through their respective subsystems making the next release the most
         likely place where the final series that removes the check for
         proc_name == NULL will land.
      
         This adds to removals already in arch/, drivers/ and fs/.
      
       - Adjust ctl_table definitions and references to allow constification
           - Remove unused ctl_table function arguments
           - Move non-const elements from ctl_table to ctl_table_header
           - Make ctl_table pointers const in ctl_table_root structure
      
         Making the static ctl_table structs const will increase safety by
         keeping the pointers to proc_handler functions in .rodata. Though no
         ctl_tables were made const in this PR, the groundwork for making
         that possible has started with these changes sent by Thomas
         Weißschuh.
      
      * tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
        sysctl: drop now unnecessary out-of-bounds check
        sysctl: move sysctl type to ctl_table_header
        sysctl: drop sysctl_is_perm_empty_ctl_table
        sysctl: treewide: constify argument ctl_table_root::permissions(table)
        sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
        bpf: Remove the now superfluous sentinel elements from ctl_table array
        delayacct: Remove the now superfluous sentinel elements from ctl_table array
        kprobes: Remove the now superfluous sentinel elements from ctl_table array
        printk: Remove the now superfluous sentinel elements from ctl_table array
        scheduler: Remove the now superfluous sentinel elements from ctl_table array
        seccomp: Remove the now superfluous sentinel elements from ctl_table array
        timekeeping: Remove the now superfluous sentinel elements from ctl_table array
        ftrace: Remove the now superfluous sentinel elements from ctl_table array
        umh: Remove the now superfluous sentinel elements from ctl_table array
        kernel misc: Remove the now superfluous sentinel elements from ctl_table array
      91b6163b
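      The sentinel removal described above can be illustrated with a minimal
      userspace sketch. Everything here is hypothetical: struct demo_table only
      stands in for the kernel's struct ctl_table, and the helper names are
      invented for illustration. It contrasts walking an array up to a
      NULL-named terminator with passing the element count explicitly, which is
      what makes the extra entry unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct ctl_table. */
struct demo_table {
    const char *procname;
    int *data;
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int demo_a, demo_b;

/* Old style: the array ends with an empty sentinel entry. */
static const struct demo_table with_sentinel[] = {
    { "a", &demo_a },
    { "b", &demo_b },
    { NULL, NULL },
};

/* New style: no sentinel; the size is known at registration time. */
static const struct demo_table without_sentinel[] = {
    { "a", &demo_a },
    { "b", &demo_b },
};

/* Old-style walk: stop at the first entry whose procname is NULL. */
static size_t count_with_sentinel(const struct demo_table *t)
{
    size_t n = 0;

    while (t[n].procname != NULL)
        n++;
    return n;
}

/* New-style lookup: the caller supplies the element count. */
static const struct demo_table *find_entry(const struct demo_table *t,
                                           size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].procname, name) == 0)
            return &t[i];
    return NULL;
}
```

      In this model the saving per array is exactly one struct demo_table; the
      ~64 bytes quoted in the pull request refers to the real kernel structure.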
    • Linus Torvalds's avatar
      Merge tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 06f054b1
      Linus Torvalds authored
      Pull devicetree updates from Rob Herring:
       "DT Bindings:
      
         - Convert samsung,exynos5-dp, atmel,lcdc, aspeed,ast2400-wdt bindings
           to schemas
      
         - Add bindings for Allwinner H616 NMI controller, Renesas r8a779g0
           irqc, Renesas R-Car V4M TMU and CMT timers, Freescale S32G3
           linflexuart, and Mediatek MT7988 XHCI
      
         - Add 'reg' constraints on DSI and SPI display panels
      
         - More dropping of unnecessary quotes in schemas
      
         - Use full paths rather than relative paths in schema $refs
      
         - Drop redundant storing of phandle for reserved memory
      
        DT Core:
      
         - Use scope based cleanups for kfree() and of_node_put()
      
         - Track interrupt-map and power-supplies for fw_devlink
      
         - Add buffer overflow check in of_modalias()
      
         - Add and use __of_prop_free() helper for freeing struct property"
      
      * tag 'devicetree-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (25 commits)
        of: property: Add fw_devlink support for interrupt-map property
        dt-bindings: display: panel: constrain 'reg' in DSI panels
        dt-bindings: display: panel: constrain 'reg' in SPI panels
        dt-bindings: display: samsung,ams495qa01: add missing SPI properties ref
        dt-bindings: Use full path to other schemas
        dt-bindings: PCI: qcom,pcie-sm8350: Drop redundant 'oneOf' sub-schema
        of: module: add buffer overflow check in of_modalias()
        dt-bindings: PCI: microchip: increase number of items in ranges property
        dt-bindings: Drop unnecessary quotes on keys
        dt-bindings: interrupt-controller: mediatek,mt6577-sysirq: Drop unnecessary quotes
        of: property: Use scope based cleanup on port_node
        of: reserved_mem: Remove the use of phandle from the reserved_mem APIs
        of: property: fw_devlink: Add support for "power-supplies" binding
        dt-bindings: watchdog: aspeed,ast2400-wdt: Convert to DT schema
        dt-bindings: irq: sun7i-nmi: Add binding for the H616 NMI controller
        dt-bindings: interrupt-controller: renesas,irqc: Add r8a779g0 support
        dt-bindings: timer: renesas,tmu: Add R-Car V4M support
        dt-bindings: timer: renesas,cmt: Add R-Car V4M support
        of: Use scope based of_node_put() cleanups
        of: Use scope based kfree() cleanups
        ...
      06f054b1
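      The "scope based cleanups" item above relies on the compiler's cleanup
      attribute. The following is a minimal userspace sketch of that idea, not
      the kernel's own helpers: DEMO_AUTO_FREE, demo_free() and demo_copy_len()
      are hypothetical names, and the attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts how many buffers the cleanup handler actually freed. */
static int demo_free_calls;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void demo_free(char **p)
{
    if (*p) {
        free(*p);
        demo_free_calls++;
    }
}

/* Hypothetical macro wrapping the GCC/Clang cleanup attribute. */
#define DEMO_AUTO_FREE __attribute__((cleanup(demo_free)))

/* Returns the length of a private copy of s, or -1 on allocation failure. */
static int demo_copy_len(const char *s)
{
    char *buf DEMO_AUTO_FREE = malloc(strlen(s) + 1);

    if (!buf)
        return -1;           /* handler runs, sees NULL, frees nothing */

    strcpy(buf, s);
    if (buf[0] == '\0')
        return 0;            /* early return: buf is freed automatically */

    return (int)strlen(buf); /* value is computed before buf is freed */
}
```

      Every return path releases the buffer without an explicit free() or a
      goto label, which is the kind of simplification such conversions aim for.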
  4. 17 May, 2024 18 commits
    • Jakub Kicinski's avatar
      selftests: net: local_termination: annotate the expected failures · fe56d6e4
      Jakub Kicinski authored
      Vladimir said when adding this test:
      
        The bridge driver fares particularly badly [...] mainly because
        it does not implement IFF_UNICAST_FLT.
      
      See commit 90b9566a ("selftests: forwarding: add a test for
      local_termination.sh").
      
      We don't want to hide the known gaps, but having a test which
      always fails prevents us from catching regressions. Report
      the cases we know may fail as XFAIL.
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20240516152513.1115270-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fe56d6e4
    • Oleksij Rempel's avatar
      net: dsa: microchip: Correct initialization order for KSZ88x3 ports · f0fa8411
      Oleksij Rempel authored
      Adjust the initialization sequence of KSZ88x3 switches to enable
      802.1p priority control on Port 2 before configuring Port 1. This
      change ensures the apptrust functionality on Port 1 operates
      correctly, as it depends on the priority settings of Port 2. The
      prior initialization sequence incorrectly configured Port 1 first,
      which could lead to functional discrepancies.
      
      Fixes: a1ea5771 ("net: dsa: microchip: dcb: add special handling for KSZ88X3 family")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
      Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Link: https://lore.kernel.org/r/20240517050121.2174412-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f0fa8411
    • Ravi Gunasekaran's avatar
      dt-bindings: net: ti: Update maintainers list · ce08eeb5
      Ravi Gunasekaran authored
      Update the list with the current maintainers of TI's CPSW ethernet
      peripheral.
      Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
      Acked-by: Conor Dooley <conor.dooley@microchip.com>
      Acked-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/20240516054932.27597-1-r-gunasekaran@ti.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ce08eeb5
    • Tom Parkin's avatar
      l2tp: fix ICMP error handling for UDP-encap sockets · 6e828dc6
      Tom Parkin authored
      Since commit a36e185e
      ("udp: Handle ICMP errors for tunnels with same destination port on both endpoints")
      UDP's handling of ICMP errors has allowed for UDP-encap tunnels to
      determine socket associations in scenarios where the UDP hash lookup
      could not.
      
      Subsequently, commit d26796ae
      ("udp: check udp sock encap_type in __udp_lib_err")
      subtly tweaked the approach such that UDP ICMP error handling would be
      skipped for any UDP socket which has encapsulation enabled.
      
      In the case of L2TP tunnel sockets using UDP-encap, this latter
      modification effectively broke ICMP error reporting for the L2TP
      control plane.
      
      To a degree this isn't catastrophic inasmuch as the L2TP control
      protocol defines a reliable transport on top of the underlying packet
      switching network which will eventually detect errors and time out.
      
      However, paying attention to the ICMP error reporting allows for more
      timely detection of errors in L2TP userspace, and aids in debugging
      connectivity issues.
      
      Reinstate ICMP error handling for UDP encap L2TP tunnels:
      
       * implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow
         the L2TP code to handle ICMP errors;
      
       * only implement error-handling for tunnels which have a managed
         socket: unmanaged tunnels using a kernel socket have no userspace to
         report errors back to;
      
       * flag the error on the socket, which allows for userspace to get an
         error such as -ECONNREFUSED back from sendmsg/recvmsg;
      
       * pass the error into ip[v6]_icmp_error() which allows for userspace to
         get extended error information via MSG_ERRQUEUE.
      
      Fixes: d26796ae ("udp: check udp sock encap_type in __udp_lib_err")
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6e828dc6
    • Linus Torvalds's avatar
      Merge tag 'parisc-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 7ee332c9
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
      
       -  define sigset_t in parisc uapi header to fix build of util-linux
      
       -  define HAVE_ARCH_HUGETLB_UNMAPPED_AREA to avoid compiler warning
      
       -  drop unused 'exc_reg' struct in math-emu code
      
      * tag 'parisc-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
        parisc/math-emu: Remove unused struct 'exc_reg'
        parisc: Define sigset_t in parisc uapi header
      7ee332c9
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ff2632d7
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
      
       - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
      
       - Allow per-process DEXCR (Dynamic Execution Control Register) settings
         via prctl, notably NPHIE which controls hashst/hashchk for ROP
         protection.
      
       - Install powerpc selftests in sub-directories. Note this changes the
         way run_kselftest.sh needs to be invoked for powerpc selftests.
      
       - Change fadump (Firmware Assisted Dump) to better handle memory
         add/remove.
      
       - Add support for passing additional parameters to the fadump kernel.
      
       - Add support for updating the kdump image on CPU/memory add/remove
         events.
      
       - Other small features, cleanups and fixes.
      
      Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
      Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
      Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
      Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
      Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
      Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
      Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
      Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
      Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
      Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
      Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
      
      * tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
        powerpc/fadump: Fix section mismatch warning
        powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
        powerpc/fadump: update documentation about bootargs_append
        powerpc/fadump: pass additional parameters when fadump is active
        powerpc/fadump: setup additional parameters for dump capture kernel
        powerpc/pseries/fadump: add support for multiple boot memory regions
        selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
        KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
        KVM: PPC: Fix documentation for ppc mmu caps
        KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
        KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
        powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
        powerpc/code-patching: Use dedicated memory routines for patching
        powerpc/code-patching: Test patch_instructions() during boot
        powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
        powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
        powerpc: Fix typos
        powerpc/eeh: Fix spelling of the word "auxillary" and update comment
        macintosh/ams: Fix unused variable warning
        powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
        ...
      ff2632d7
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux · 4853f1f6
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - Updates to AMBA bus subsystem to drop .owner struct device_driver
         initialisations, moving that to code instead.
      
       - Add LPAE privileged-access-never support
      
       - Add support for Clang CFI
      
       - clkdev: report over-sized device or connection strings
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: (36 commits)
        ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y
        clkdev: report over-sized strings when creating clkdev entries
        ARM: 9393/1: mm: Use conditionals for CFI branches
        ARM: 9392/2: Support CLANG CFI
        ARM: 9391/2: hw_breakpoint: Handle CFI breakpoints
        ARM: 9390/2: lib: Annotate loop delay instructions for CFI
        ARM: 9389/2: mm: Define prototypes for all per-processor calls
        ARM: 9388/2: mm: Type-annotate all per-processor assembly routines
        ARM: 9387/2: mm: Rewrite cacheflush vtables in CFI safe C
        ARM: 9386/2: mm: Use symbol alias for cache functions
        ARM: 9385/2: mm: Type-annotate all cache assembly routines
        ARM: 9384/2: mm: Make tlbflush routines CFI safe
        ARM: 9382/1: ftrace: Define ftrace_stub_graph
        ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
        ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
        ARM: 9356/2: Move asm statements accessing TTBCR into C functions
        ARM: 9355/2: Add TTBCR_* definitions to pgtable-3level-hwdef.h
        ARM: 9379/1: coresight: tpda: drop owner assignment
        ARM: 9378/1: coresight: etm4x: drop owner assignment
        ARM: 9377/1: hwrng: nomadik: drop owner assignment
        ...
      4853f1f6
    • David S. Miller's avatar
      Merge branch 'wangxun-fixes' · f6f25eeb
      David S. Miller authored
      Jiawen Wu says:
      
      ====================
      Wangxun fixes
      
      Fixed some bugs when using ethtool to operate network devices.
      
      v4 -> v5:
      - Simplify if...else... to fix features.
      
      v3 -> v4:
      - Require both ctag and stag to be enabled or disabled.
      
      v2 -> v3:
      - Drop the first patch.
      
      v1 -> v2:
      - Factor out the same code.
      - Remove statistics printing with more than 64 queues.
      - Detail the commit logs to describe issues.
      - Remove reset flag check in wx_update_stats().
      - Change to set VLAN CTAG and STAG to be consistent.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6f25eeb
    • Jiawen Wu's avatar
      net: txgbe: fix to control VLAN strip · 1d3c6414
      Jiawen Wu authored
      When VLAN tag stripping is enabled or disabled, the hardware requires
      the Rx ring to be in a disabled state; otherwise the feature cannot be
      changed.
      
      Fixes: f3b03c65 ("net: wangxun: Implement vlan add and kill functions")
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d3c6414
    • Jiawen Wu's avatar
      net: wangxun: match VLAN CTAG and STAG features · ac71ab78
      Jiawen Wu authored
      Hardware requires the VLAN CTAG and STAG configurations to always match.
      Whenever either VLAN CTAG or STAG changes, the other needs to be changed
      as well.
      
      Fixes: 6670f1ec ("net: txgbe: Add netdev features support")
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac71ab78
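      The matching requirement above can be sketched as a feature fix-up step.
      This is only an illustration under stated assumptions: the DEMO_F_* bits
      and demo_fix_features() are hypothetical and do not reproduce the
      driver's code. Whichever of the two bits the caller toggles, the other is
      forced to the same state.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for the CTAG and STAG Rx flags. */
#define DEMO_F_CTAG_RX   0x1u
#define DEMO_F_STAG_RX   0x2u
#define DEMO_F_VLAN_BOTH (DEMO_F_CTAG_RX | DEMO_F_STAG_RX)

/* Force both bits on or off according to the bit the caller changed. */
static unsigned int demo_fix_features(unsigned int current,
                                      unsigned int requested)
{
    unsigned int changed = current ^ requested;

    /* This model assumes the current state is already consistent. */
    assert((current & DEMO_F_VLAN_BOTH) == 0 ||
           (current & DEMO_F_VLAN_BOTH) == DEMO_F_VLAN_BOTH);

    if (changed & DEMO_F_CTAG_RX) {
        if (requested & DEMO_F_CTAG_RX)
            requested |= DEMO_F_VLAN_BOTH;
        else
            requested &= ~DEMO_F_VLAN_BOTH;
    } else if (changed & DEMO_F_STAG_RX) {
        if (requested & DEMO_F_STAG_RX)
            requested |= DEMO_F_VLAN_BOTH;
        else
            requested &= ~DEMO_F_VLAN_BOTH;
    }
    return requested;
}
```

      Unrelated feature bits pass through unchanged, so only the CTAG/STAG pair
      is affected by the fix-up.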
    • Jiawen Wu's avatar
      net: wangxun: fix to change Rx features · 68067f06
      Jiawen Wu authored
      Fix the issue where some Rx features cannot be changed.
      
      When using ethtool -K to turn off rx offload, it returns an error and
      displays "Could not change any device features". In addition,
      netdev->features is not assigned a new value to actually configure the
      hardware.
      
      Fixes: 6dbedcff ("net: libwx: Implement xx_set_features ops")
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68067f06
    • Eric Dumazet's avatar
      af_packet: do not call packet_read_pending() from tpacket_destruct_skb() · 581073f6
      Eric Dumazet authored
      trafgen performance considerably sank on hosts with many cores
      after the blamed commit.
      
      packet_read_pending() is very expensive, and calling it
      in the af_packet fast path defeats Daniel's intent in commit
      b0138408 ("packet: use percpu mmap tx frame pending refcount").
      
      tpacket_destruct_skb() makes room for one packet, so we can immediately
      wake up a producer; there is no need to completely drain the tx ring.
      
      Fixes: 89ed5b51 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      581073f6
    • Daniel Jurgens's avatar
      virtio_net: Fix missed rtnl_unlock · fa033def
      Daniel Jurgens authored
      The rtnl_lock would stay locked if allocating promisc_allmulti failed.
      Also changed the allocation to GFP_KERNEL.
      
      Fixes: ff7c7d9f ("virtio_net: Remove command data from control_buf")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/netdev/CANn89iLazVaUCvhPm6RPJJ0owra_oFnx7Fhc8d60gV-65ad3WQ@mail.gmail.com/
      Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20240515163125.569743-1-danielj@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fa033def
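      The class of bug fixed above, a lock left held when an allocation fails,
      can be shown with a small userspace model. This is a sketch only: struct
      demo_lock and demo_update() are hypothetical and stand in for rtnl_lock
      and the driver code, and the fail_alloc flag merely simulates a failed
      allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical single-threaded lock model; it only tracks ownership. */
struct demo_lock {
    int held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Does some work under the lock; fail_alloc simulates malloc() failing. */
static int demo_update(struct demo_lock *l, int fail_alloc)
{
    int ret = 0;
    int *buf;

    demo_lock_acquire(l);

    buf = fail_alloc ? NULL : malloc(sizeof(*buf));
    if (!buf) {
        ret = -ENOMEM;
        goto out_unlock; /* a bare "return -ENOMEM" here would leak the lock */
    }

    *buf = 1;
    free(buf);

out_unlock:
    demo_lock_release(l);
    return ret;
}
```

      Routing every exit through a single unlock label keeps the lock balanced
      on both the success and the failure path.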
    • Eric Dumazet's avatar
      netrom: fix possible dead-lock in nr_rt_ioctl() · e03e7f20
      Eric Dumazet authored
      syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1]
      
      Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node)
      
      [1]
      WARNING: possible circular locking dependency detected
      6.9.0-rc7-syzkaller-02147-g654de42f #0 Not tainted
      ------------------------------------------------------
      syz-executor350/5129 is trying to acquire lock:
       ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
       ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline]
       ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline]
       ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
      
      but task is already holding lock:
       ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
       ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
       ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (nr_node_list_lock){+...}-{2:2}:
              lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
              __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
              _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
              spin_lock_bh include/linux/spinlock.h:356 [inline]
              nr_remove_node net/netrom/nr_route.c:299 [inline]
              nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355
              nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683
              sock_do_ioctl+0x158/0x460 net/socket.c:1222
              sock_ioctl+0x629/0x8e0 net/socket.c:1341
              vfs_ioctl fs/ioctl.c:51 [inline]
              __do_sys_ioctl fs/ioctl.c:904 [inline]
              __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
              do_syscall_x64 arch/x86/entry/common.c:52 [inline]
              do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      -> #0 (&nr_node->node_lock){+...}-{2:2}:
              check_prev_add kernel/locking/lockdep.c:3134 [inline]
              check_prevs_add kernel/locking/lockdep.c:3253 [inline]
              validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
              __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
              lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
              __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
              _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
              spin_lock_bh include/linux/spinlock.h:356 [inline]
              nr_node_lock include/net/netrom.h:152 [inline]
              nr_dec_obs net/netrom/nr_route.c:464 [inline]
              nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
              sock_do_ioctl+0x158/0x460 net/socket.c:1222
              sock_ioctl+0x629/0x8e0 net/socket.c:1341
              vfs_ioctl fs/ioctl.c:51 [inline]
              __do_sys_ioctl fs/ioctl.c:904 [inline]
              __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
              do_syscall_x64 arch/x86/entry/common.c:52 [inline]
              do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(nr_node_list_lock);
                                     lock(&nr_node->node_lock);
                                     lock(nr_node_list_lock);
        lock(&nr_node->node_lock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor350/5129:
        #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
        #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
        #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
      
      stack backtrace:
      CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
        __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
        _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
        spin_lock_bh include/linux/spinlock.h:356 [inline]
        nr_node_lock include/net/netrom.h:152 [inline]
        nr_dec_obs net/netrom/nr_route.c:464 [inline]
        nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
        sock_do_ioctl+0x158/0x460 net/socket.c:1222
        sock_ioctl+0x629/0x8e0 net/socket.c:1341
        vfs_ioctl fs/ioctl.c:51 [inline]
        __do_sys_ioctl fs/ioctl.c:904 [inline]
        __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e03e7f20
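      The rule stated in this commit, always take nr_node_list_lock before a
      node lock, is an instance of lock ordering. The sketch below is a
      hypothetical single-threaded model, not lockdep and not the netrom code:
      each lock gets a rank, and an acquisition is refused if a lock of equal
      or higher rank is already held.

```c
#include <assert.h>

/* Hypothetical ranks: the list lock must be taken before any node lock. */
enum { DEMO_RANK_LIST = 1, DEMO_RANK_NODE = 2 };

struct demo_lock {
    int rank;
};

/* Bit N is set while a lock of rank N is held. */
static unsigned int demo_held_mask;

/* Returns 0 on success, -1 if the acquisition would violate the order. */
static int demo_acquire(const struct demo_lock *l)
{
    if ((demo_held_mask >> l->rank) != 0)
        return -1; /* an equal or higher rank is already held */
    demo_held_mask |= 1u << l->rank;
    return 0;
}

static void demo_release(const struct demo_lock *l)
{
    assert(demo_held_mask & (1u << l->rank));
    demo_held_mask &= ~(1u << l->rank);
}

/* Takes two locks in the given order, then drops them in reverse. */
static int demo_take_both(const struct demo_lock *first,
                          const struct demo_lock *second)
{
    if (demo_acquire(first) != 0)
        return -1;
    if (demo_acquire(second) != 0) {
        demo_release(first);
        return -1;
    }
    demo_release(second);
    demo_release(first);
    return 0;
}
```

      With every code path following the list-then-node order, the circular
      dependency in the report above cannot form.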
    • Michal Schmidt's avatar
      idpf: don't skip over ethtool tcp-data-split setting · 67708158
      Michal Schmidt authored
      Disabling tcp-data-split on idpf silently fails:
        # ethtool -G $NETDEV tcp-data-split off
        # ethtool -g $NETDEV | grep 'TCP data split'
        TCP data split:        on
      
      But it works if you also change 'tx' or 'rx':
        # ethtool -G $NETDEV tcp-data-split off tx 256
        # ethtool -g $NETDEV | grep 'TCP data split'
        TCP data split:        off
      
      The bug is in idpf_set_ringparam, where it takes a shortcut out if the
      TX and RX sizes are not changing. Fix it by also checking whether the
      tcp-data-split setting remains unchanged. Only then can the soft reset
      be skipped.
      
      Fixes: 9b1aa3ef ("idpf: add get/set for Ethtool's header split ringparam")
      Reported-by: Xu Du <xudu@redhat.com>
      Closes: https://issues.redhat.com/browse/RHEL-36182
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20240515092414.158079-1-mschmidt@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      67708158
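      The shortcut described in this commit can be modelled as a predicate that
      decides whether a reset may be skipped. The structure and function names
      below are hypothetical and only illustrate the logic; they are not the
      idpf code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring configuration with the three settings involved. */
struct demo_ring_cfg {
    unsigned int tx_desc;
    unsigned int rx_desc;
    bool tcp_data_split;
};

/* Buggy shortcut: only compares ring sizes, so a split change is lost. */
static bool demo_can_skip_reset_buggy(const struct demo_ring_cfg *cur,
                                      const struct demo_ring_cfg *req)
{
    return cur->tx_desc == req->tx_desc && cur->rx_desc == req->rx_desc;
}

/* Fixed shortcut: skip only if the split setting is unchanged as well. */
static bool demo_can_skip_reset_fixed(const struct demo_ring_cfg *cur,
                                      const struct demo_ring_cfg *req)
{
    return cur->tx_desc == req->tx_desc &&
           cur->rx_desc == req->rx_desc &&
           cur->tcp_data_split == req->tcp_data_split;
}
```

      With identical ring sizes and a changed split flag, the buggy predicate
      reports that nothing needs to be done, which matches the silent failure
      shown in the ethtool output above.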
    • Tony Battersby's avatar
      bonding: fix oops during rmmod · a45835a0
      Tony Battersby authored
      "rmmod bonding" causes an oops ever since commit cc317ea3 ("bonding:
      remove redundant NULL check in debugfs function").  Here are the relevant
      functions being called:
      
      bonding_exit()
        bond_destroy_debugfs()
          debugfs_remove_recursive(bonding_debug_root);
          bonding_debug_root = NULL; <--------- SET TO NULL HERE
        bond_netlink_fini()
          rtnl_link_unregister()
            __rtnl_link_unregister()
              unregister_netdevice_many_notify()
                bond_uninit()
                  bond_debug_unregister()
                    (commit removed check for bonding_debug_root == NULL)
                    debugfs_remove()
                    simple_recursive_removal()
                      down_write() -> OOPS
      
      However, reverting the bad commit does not solve the problem completely
      because the original code contains a race that could cause the same
      oops, although it was much less likely to be triggered unintentionally:
      
      CPU1
        rmmod bonding
          bonding_exit()
            bond_destroy_debugfs()
              debugfs_remove_recursive(bonding_debug_root);
      
      CPU2
        echo -bond0 > /sys/class/net/bonding_masters
          bond_uninit()
            bond_debug_unregister()
              if (!bonding_debug_root)
      
      CPU1
              bonding_debug_root = NULL;
      
      So do NOT revert the bad commit (since the removed checks were racy
      anyway), and instead change the order of actions taken during module
      removal.  The same oops can also happen if there is an error during
      module init, so apply the same fix there.
      
      Fixes: cc317ea3 ("bonding: remove redundant NULL check in debugfs function")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a45835a0
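      The teardown-order problem in this commit can be modelled in a few lines.
      This is a hypothetical sketch, assuming the reordering means per-device
      debug entries are removed before the shared root is destroyed; the names
      below are invented and the model simply counts accesses to a root that no
      longer exists.

```c
#include <assert.h>

/* Hypothetical module state: a shared debug root plus per-device entries. */
struct demo_state {
    int debug_root_alive; /* 1 while the shared root still exists */
    int devices;          /* devices that still have an entry under it */
    int bad_accesses;     /* removals attempted after the root was gone */
};

static void demo_destroy_debugfs(struct demo_state *s)
{
    s->debug_root_alive = 0;
}

/* Each device removes its own entry, which needs the root to exist. */
static void demo_unregister_devices(struct demo_state *s)
{
    while (s->devices > 0) {
        if (!s->debug_root_alive)
            s->bad_accesses++; /* corresponds to the oops in the trace */
        s->devices--;
    }
}

/* Order from the call chain above: the root is destroyed first. */
static void demo_exit_old_order(struct demo_state *s)
{
    demo_destroy_debugfs(s);
    demo_unregister_devices(s);
}

/* Reordered: the devices go first, then the root. */
static void demo_exit_new_order(struct demo_state *s)
{
    demo_unregister_devices(s);
    demo_destroy_debugfs(s);
}
```

      In the reordered variant no device ever observes a missing root, so no
      NULL check (and no race on that check) is needed.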