1. 10 Feb, 2022 3 commits
    • tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. · 4f9bf2a2
      Sebastian Andrzej Siewior authored
      Commit
         9652dc2e ("tcp: relax listening_hash operations")
      
      removed the need to disable bottom half while acquiring
      listening_hash.lock. There are still two callers left which disable
      bottom half before the lock is acquired.
      
      On PREEMPT_RT, softirqs are preemptible and local_bh_disable() acts
      as a lock to ensure that resources protected by disabling bottom
      halves remain protected.
      This leads to a circular locking dependency if a lock that is acquired
      with bottom halves disabled is also acquired with bottom halves enabled
      and then followed by disabling bottom halves. That is the reverse
      locking order.
      It has been observed with inet_listen_hashbucket::lock:
      
      local_bh_disable() + spin_lock(&ilb->lock):
        inet_listen()
          inet_csk_listen_start()
            sk->sk_prot->hash() := inet_hash()
              local_bh_disable()
              __inet_hash()
                spin_lock(&ilb->lock);
                  acquire(&ilb->lock);
      
      Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
        tcp_seq_next()
          listening_get_next()
            spin_lock(&ilb2->lock);
              acquire(&ilb2->lock);
      
        tcp4_seq_show()
          get_tcp4_sock()
            sock_i_ino()
              read_lock_bh(&sk->sk_callback_lock);
                acquire(softirq_ctrl)   // <---- whoops
                acquire(&sk->sk_callback_lock)
      
      Drop local_bh_disable() around __inet_hash(), which acquires
      listening_hash->lock. Split inet_unhash(): acquire the
      listen_hashbucket lock without disabling bottom halves, and acquire
      the inet_ehash lock with bottom halves disabled.
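
The ordering problem described above can be modelled in user space with a tiny order tracker, loosely in the spirit of a lock dependency checker. This is a minimal sketch and not kernel code: `order_tracker`, `track_acquire()`, `has_inversion()` and the two lock classes are hypothetical names chosen for illustration, and `ilb`/`ilb2` are treated as a single class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes; the names only mirror the commit text. */
enum lock_class { SOFTIRQ_CTRL, ILB_LOCK, NR_CLASSES };

struct order_tracker {
    bool held[NR_CLASSES];
    /* seen[a][b]: class b was acquired at some point while a was held */
    bool seen[NR_CLASSES][NR_CLASSES];
};

/* Record an acquisition. Returns false if it reverses an order seen before. */
static bool track_acquire(struct order_tracker *t, enum lock_class c)
{
    bool ok = true;

    assert((int)c >= 0 && (int)c < NR_CLASSES);
    for (int h = 0; h < NR_CLASSES; h++) {
        if (!t->held[h] || h == (int)c)
            continue;
        if (t->seen[c][h])      /* earlier: h was taken while c was held */
            ok = false;
        t->seen[h][c] = true;
    }
    t->held[c] = true;
    return ok;
}

static void track_release(struct order_tracker *t, enum lock_class c)
{
    t->held[c] = false;
}

/* Hash path; before the fix it disabled BH around the bucket lock. */
static bool hash_path(struct order_tracker *t, bool disable_bh)
{
    bool ok = true;

    if (disable_bh)
        ok = track_acquire(t, SOFTIRQ_CTRL) && ok;
    ok = track_acquire(t, ILB_LOCK) && ok;
    track_release(t, ILB_LOCK);
    if (disable_bh)
        track_release(t, SOFTIRQ_CTRL);
    return ok;
}

/* seq_file path: bucket lock held, then read_lock_bh() on PREEMPT_RT. */
static bool seq_path(struct order_tracker *t)
{
    bool ok = track_acquire(t, ILB_LOCK);

    ok = track_acquire(t, SOFTIRQ_CTRL) && ok;
    track_release(t, SOFTIRQ_CTRL);
    track_release(t, ILB_LOCK);
    return ok;
}

/* True if running both paths yields a reversed lock order. */
static bool has_inversion(bool hash_disables_bh)
{
    struct order_tracker t = { 0 };
    bool ok = hash_path(&t, hash_disables_bh);

    ok = seq_path(&t) && ok;
    return !ok;
}
```

In this model `has_inversion(true)` reports the reversed order and `has_inversion(false)` does not, which corresponds to dropping local_bh_disable() around the bucket lock.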

      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
      Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
      Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 1127170d
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2022-02-09
      
      We've added 126 non-merge commits during the last 16 day(s) which contain
      a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
      
      The main changes are:
      
      1) Add custom BPF allocator for JITs that pack multiple programs into a huge
         page to reduce iTLB pressure, from Song Liu.
      
      2) Add __user tagging support in vmlinux BTF and utilize it from BPF
         verifier when generating loads, from Yonghong Song.
      
      3) Add per-socket fast path check guarding from cgroup/BPF overhead when
         used by only some sockets, from Pavel Begunkov.
      
      4) Continued libbpf deprecation work of APIs/features and removal of their
         usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
         and various others.
      
      5) Improve BPF instruction set documentation by adding byte swap
         instructions and cleaning up load/store section, from Christoph Hellwig.
      
      6) Switch BPF preload infra to light skeleton and remove libbpf dependency
         from it, from Alexei Starovoitov.
      
      7) Fix architecture-agnostic macros in libbpf for accessing syscall
         arguments from BPF progs for non-x86 architectures,
         from Ilya Leoshkevich.
      
      8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
         of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
      
      9) Add new bpf_copy_from_user_task() helper to read memory from a different
         task than current. Add ability to create sleepable BPF iterator progs,
         from Kenny Yu.
      
      10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
          utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
      
      11) Generate temporary netns names for BPF selftests to avoid naming
          collisions, from Hangbin Liu.
      
      12) Implement bpf_core_types_are_compat() with limited recursion for
          in-kernel usage, from Matteo Croce.
      
      13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
          to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
      
      14) Misc minor fixes to libbpf and selftests from various folks.
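
As an illustration of item 8, the following sketch shows how a 32-bit port member can be narrowed to 16 bits plus anonymous padding without moving the fields that follow it. The struct names and the `next` member are hypothetical stand-ins, not the actual UAPI definitions, and a `uint16_t` bit-field is assumed to be accepted by the compiler (GCC and Clang accept it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: the port occupies a full 32-bit slot. */
struct demo_ctx_old {
    uint32_t port;
    uint32_t next;
};

/* Hypothetical "after" layout: 16-bit port plus 16 anonymous padding bits. */
struct demo_ctx_new {
    uint16_t port;
    uint16_t :16;       /* unnamed bit-field, cannot be referenced by name */
    uint32_t next;
};

/* The overall size and the offset of the following member are unchanged. */
static_assert(sizeof(struct demo_ctx_new) == sizeof(struct demo_ctx_old),
              "size must not change");
static_assert(offsetof(struct demo_ctx_new, next) ==
              offsetof(struct demo_ctx_old, next),
              "following member must not move");

/* Helper so the layout can also be checked at run time. */
static size_t demo_next_offset_new(void)
{
    return offsetof(struct demo_ctx_new, next);
}
```

Because the padding is anonymous, code using such a struct can only name the 16-bit member, while existing offsets stay as they were.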
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
        selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
        bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
        libbpf: Fix compilation warning due to mismatched printf format
        selftests/bpf: Test BPF_KPROBE_SYSCALL macro
        libbpf: Add BPF_KPROBE_SYSCALL macro
        libbpf: Fix accessing the first syscall argument on s390
        libbpf: Fix accessing the first syscall argument on arm64
        libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
        selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
        libbpf: Fix accessing syscall arguments on riscv
        libbpf: Fix riscv register names
        libbpf: Fix accessing syscall arguments on powerpc
        selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
        libbpf: Add PT_REGS_SYSCALL_REGS macro
        selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
        bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
        bpf: Fix leftover header->pages in sparc and powerpc code.
        libbpf: Fix signedness bug in btf_dump_array_data()
        selftests/bpf: Do not export subtest as standalone test
        bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: drop_monitor: support drop reason · 5cad527d
      Menglong Dong authored
      Commit c504e5c2 ("net: skb: introduce kfree_skb_reason()") introduced
      a drop reason to the kfree_skb tracepoint. drop_monitor can therefore
      report the drop reason to users via netlink.

      The drop reasons are reported to users as strings, exactly as they
      are when reported to ftrace.
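
Reporting a reason as a string usually comes down to a lookup table indexed by the enum value. The sketch below shows that pattern in plain C; the enum, the table, and `demo_reason_str()` are hypothetical names, the three entries are only an illustrative subset, and mapping out-of-range values to the "not specified" entry is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical subset of drop reasons, for illustration only. */
enum demo_drop_reason {
    DEMO_NOT_SPECIFIED,
    DEMO_NO_SOCKET,
    DEMO_PKT_TOO_SMALL,
    DEMO_MAX
};

/* One name per reason, indexed by the enum value. */
static const char *const demo_reason_names[] = {
    [DEMO_NOT_SPECIFIED] = "NOT_SPECIFIED",
    [DEMO_NO_SOCKET]     = "NO_SOCKET",
    [DEMO_PKT_TOO_SMALL] = "PKT_TOO_SMALL",
};

static_assert(sizeof(demo_reason_names) / sizeof(demo_reason_names[0]) ==
              DEMO_MAX, "one name per reason");

/* Map a reason to its name; unknown values fall back to NOT_SPECIFIED. */
static const char *demo_reason_str(unsigned int reason)
{
    if (reason >= DEMO_MAX)
        reason = DEMO_NOT_SPECIFIED;
    return demo_reason_names[reason];
}
```

A netlink reporter would then attach the returned string as an attribute, so user space does not need to know the numeric values.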

      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 09 Feb, 2022 37 commits