1. 28 Oct, 2021 5 commits
    • bpf/benchs: Add benchmark tests for bloom filter throughput + false positive · 57fd1c63
      Joanne Koong authored
      This patch adds benchmark tests for the throughput (for lookups + updates)
      and the false positive rate of bloom filter lookups, as well as some
      minor refactoring of the bash script for running the benchmarks.
      
      These benchmarks show that as the number of hash functions increases,
      both the throughput and the false positive rate of the bloom filter
      decrease. From the benchmark data, the average false positive rates
      are roughly as follows:
      
      1 hash function = ~30%
      2 hash functions = ~15%
      3 hash functions = ~5%
      4 hash functions = ~2.5%
      5 hash functions = ~1%
      6 hash functions = ~0.5%
      7 hash functions = ~0.35%
      8 hash functions = ~0.15%
      9 hash functions = ~0.1%
      10 hash functions = ~0%
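
As a rough cross-check of the trend (not derived from the benchmark itself), the textbook approximation for a bloom filter with m bits, n inserted entries and k independent, uniform hash functions is sketched below. The symbols m, n, k and c are introduced here for illustration only, and the measured rates above need not match the formula exactly, since they depend on how the implementation sizes and rounds the bitmap.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
\qquad\text{and, if } m = c\,k\,n,\qquad
p \approx \left(1 - e^{-1/c}\right)^{k}
```

Under the second assumption (the bitmap grows in proportion to k times n), the false positive rate falls geometrically as k increases, which is the qualitative behaviour in the list above.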
      
      For reference, the results of running the benchmarks on one thread on
      a machine with one NUMA node, for 1 to 5 hash functions and for 8-byte
      and 64-byte values, are as follows:
      
      1 hash function:
        50k entries
      	8-byte value
      	    Lookups - 51.1 M/s operations
      	    Updates - 33.6 M/s operations
      	    False positive rate: 24.15%
      	64-byte value
      	    Lookups - 15.7 M/s operations
      	    Updates - 15.1 M/s operations
      	    False positive rate: 24.2%
        100k entries
      	8-byte value
      	    Lookups - 51.0 M/s operations
      	    Updates - 33.4 M/s operations
      	    False positive rate: 24.04%
      	64-byte value
      	    Lookups - 15.6 M/s operations
      	    Updates - 14.6 M/s operations
      	    False positive rate: 24.06%
        500k entries
      	8-byte value
      	    Lookups - 50.5 M/s operations
      	    Updates - 33.1 M/s operations
      	    False positive rate: 27.45%
      	64-byte value
      	    Lookups - 15.6 M/s operations
      	    Updates - 14.2 M/s operations
      	    False positive rate: 27.42%
        1 mil entries
      	8-byte value
      	    Lookups - 49.7 M/s operations
      	    Updates - 32.9 M/s operations
      	    False positive rate: 27.45%
      	64-byte value
      	    Lookups - 15.4 M/s operations
      	    Updates - 13.7 M/s operations
      	    False positive rate: 27.58%
        2.5 mil entries
      	8-byte value
      	    Lookups - 47.2 M/s operations
      	    Updates - 31.8 M/s operations
      	    False positive rate: 30.94%
      	64-byte value
      	    Lookups - 15.3 M/s operations
      	    Updates - 13.2 M/s operations
      	    False positive rate: 30.95%
        5 mil entries
      	8-byte value
      	    Lookups - 41.1 M/s operations
      	    Updates - 28.1 M/s operations
      	    False positive rate: 31.01%
      	64-byte value
      	    Lookups - 13.3 M/s operations
      	    Updates - 11.4 M/s operations
      	    False positive rate: 30.98%
      
      2 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 34.1 M/s operations
      	    Updates - 20.1 M/s operations
      	    False positive rate: 9.13%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.9 M/s operations
      	    False positive rate: 9.21%
        100k entries
      	8-byte value
      	    Lookups - 33.7 M/s operations
      	    Updates - 18.9 M/s operations
      	    False positive rate: 9.13%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.7 M/s operations
      	    False positive rate: 9.19%
        500k entries
      	8-byte value
      	    Lookups - 32.7 M/s operations
      	    Updates - 18.1 M/s operations
      	    False positive rate: 12.61%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.5 M/s operations
      	    False positive rate: 12.61%
        1 mil entries
      	8-byte value
      	    Lookups - 30.6 M/s operations
      	    Updates - 18.9 M/s operations
      	    False positive rate: 12.54%
      	64-byte value
      	    Lookups - 8.0 M/s operations
      	    Updates - 7.0 M/s operations
      	    False positive rate: 12.52%
        2.5 mil entries
      	8-byte value
      	    Lookups - 25.3 M/s operations
      	    Updates - 16.7 M/s operations
      	    False positive rate: 16.77%
      	64-byte value
      	    Lookups - 7.9 M/s operations
      	    Updates - 6.5 M/s operations
      	    False positive rate: 16.88%
        5 mil entries
      	8-byte value
      	    Lookups - 20.8 M/s operations
      	    Updates - 14.7 M/s operations
      	    False positive rate: 16.78%
      	64-byte value
      	    Lookups - 7.0 M/s operations
      	    Updates - 6.0 M/s operations
      	    False positive rate: 16.78%
      
      3 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 25.1 M/s operations
      	    Updates - 14.6 M/s operations
      	    False positive rate: 7.65%
      	64-byte value
      	    Lookups - 5.8 M/s operations
      	    Updates - 5.5 M/s operations
      	    False positive rate: 7.58%
        100k entries
      	8-byte value
      	    Lookups - 24.7 M/s operations
      	    Updates - 14.1 M/s operations
      	    False positive rate: 7.71%
      	64-byte value
      	    Lookups - 5.8 M/s operations
      	    Updates - 5.3 M/s operations
      	    False positive rate: 7.62%
        500k entries
      	8-byte value
      	    Lookups - 22.9 M/s operations
      	    Updates - 13.9 M/s operations
      	    False positive rate: 2.62%
      	64-byte value
      	    Lookups - 5.6 M/s operations
      	    Updates - 4.8 M/s operations
      	    False positive rate: 2.7%
        1 mil entries
      	8-byte value
      	    Lookups - 19.8 M/s operations
      	    Updates - 12.6 M/s operations
      	    False positive rate: 2.60%
      	64-byte value
      	    Lookups - 5.3 M/s operations
      	    Updates - 4.4 M/s operations
      	    False positive rate: 2.69%
        2.5 mil entries
      	8-byte value
      	    Lookups - 16.2 M/s operations
      	    Updates - 10.7 M/s operations
      	    False positive rate: 4.49%
      	64-byte value
      	    Lookups - 4.9 M/s operations
      	    Updates - 4.1 M/s operations
      	    False positive rate: 4.41%
        5 mil entries
      	8-byte value
      	    Lookups - 18.8 M/s operations
      	    Updates - 9.2 M/s operations
      	    False positive rate: 4.45%
      	64-byte value
      	    Lookups - 5.2 M/s operations
      	    Updates - 3.9 M/s operations
      	    False positive rate: 4.54%
      
      4 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 19.7 M/s operations
      	    Updates - 11.1 M/s operations
      	    False positive rate: 1.01%
      	64-byte value
      	    Lookups - 4.4 M/s operations
      	    Updates - 4.0 M/s operations
      	    False positive rate: 1.00%
        100k entries
      	8-byte value
      	    Lookups - 19.5 M/s operations
      	    Updates - 10.9 M/s operations
      	    False positive rate: 1.00%
      	64-byte value
      	    Lookups - 4.3 M/s operations
      	    Updates - 3.9 M/s operations
      	    False positive rate: 0.97%
        500k entries
      	8-byte value
      	    Lookups - 18.2 M/s operations
      	    Updates - 10.6 M/s operations
      	    False positive rate: 2.05%
      	64-byte value
      	    Lookups - 4.3 M/s operations
      	    Updates - 3.7 M/s operations
      	    False positive rate: 2.05%
        1 mil entries
      	8-byte value
      	    Lookups - 15.5 M/s operations
      	    Updates - 9.6 M/s operations
      	    False positive rate: 1.99%
      	64-byte value
      	    Lookups - 4.0 M/s operations
      	    Updates - 3.4 M/s operations
      	    False positive rate: 1.99%
        2.5 mil entries
      	8-byte value
      	    Lookups - 13.8 M/s operations
      	    Updates - 7.7 M/s operations
      	    False positive rate: 3.91%
      	64-byte value
      	    Lookups - 3.7 M/s operations
      	    Updates - 3.6 M/s operations
      	    False positive rate: 3.78%
        5 mil entries
      	8-byte value
      	    Lookups - 13.0 M/s operations
      	    Updates - 6.9 M/s operations
      	    False positive rate: 3.93%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.7 M/s operations
      	    False positive rate: 3.39%
      
      5 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 16.4 M/s operations
      	    Updates - 9.1 M/s operations
      	    False positive rate: 0.78%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.2 M/s operations
      	    False positive rate: 0.77%
        100k entries
      	8-byte value
      	    Lookups - 16.3 M/s operations
      	    Updates - 9.0 M/s operations
      	    False positive rate: 0.79%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.2 M/s operations
      	    False positive rate: 0.78%
        500k entries
      	8-byte value
      	    Lookups - 15.1 M/s operations
      	    Updates - 8.8 M/s operations
      	    False positive rate: 1.82%
      	64-byte value
      	    Lookups - 3.4 M/s operations
      	    Updates - 3.0 M/s operations
      	    False positive rate: 1.78%
        1 mil entries
      	8-byte value
      	    Lookups - 13.2 M/s operations
      	    Updates - 7.8 M/s operations
      	    False positive rate: 1.81%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.8 M/s operations
      	    False positive rate: 1.80%
        2.5 mil entries
      	8-byte value
      	    Lookups - 10.5 M/s operations
      	    Updates - 5.9 M/s operations
      	    False positive rate: 0.29%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.4 M/s operations
      	    False positive rate: 0.28%
        5 mil entries
      	8-byte value
      	    Lookups - 9.6 M/s operations
      	    Updates - 5.7 M/s operations
      	    False positive rate: 0.30%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.7 M/s operations
      	    False positive rate: 0.30%
      Signed-off-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com
    • selftests/bpf: Add bloom filter map test cases · ed9109ad
      Joanne Koong authored
      This patch adds test cases for bpf bloom filter maps. They include tests
      checking against invalid operations by userspace, tests for using the
      bloom filter map as an inner map, and a bpf program that queries the
      bloom filter map for values added by a userspace program.
      Signed-off-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com
    • libbpf: Add "map_extra" as a per-map-type extra flag · 47512102
      Joanne Koong authored
      This patch adds the libbpf infrastructure for supporting a
      per-map-type "map_extra" field, whose meaning depends on the
      map type.
      
      For example, for the bloom filter map, the lower 4 bits of
      map_extra are used to denote the number of hash functions.
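
The encoding described above can be sketched in plain C as follows. This is a minimal illustration, not the libbpf or kernel code: the helper names and macros are hypothetical, the default of 5 is taken from the kernel-side patch description in this series, and silently masking off the upper bits is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the patch descriptions: the low 4 bits of map_extra
 * carry the hash-function count, and 0 selects a default of 5. */
#define BLOOM_NR_HASH_MASK    0xFu
#define BLOOM_DEFAULT_NR_HASH 5u

/* Hypothetical helper: build a map_extra value for a bloom filter map. */
static uint64_t bloom_map_extra(unsigned int nr_hash_funcs)
{
	/* Only 4 bits are available, so the count must fit in 0..15. */
	assert(nr_hash_funcs <= BLOOM_NR_HASH_MASK);
	return nr_hash_funcs;
}

/* Hypothetical helper: decode the hash-function count from map_extra.
 * Upper bits are ignored here; real code may reject them instead. */
static unsigned int bloom_nr_hash_funcs(uint64_t map_extra)
{
	unsigned int nr = (unsigned int)(map_extra & BLOOM_NR_HASH_MASK);

	return nr ? nr : BLOOM_DEFAULT_NR_HASH;
}
```

For instance, `bloom_nr_hash_funcs(bloom_map_extra(3))` yields 3, while a map_extra of 0 falls back to the default.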
      
      Please note that until libbpf 1.0 is here, the
      "bpf_create_map_params" struct is used as a temporary
      means for propagating the map_extra field to the kernel.
      Signed-off-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
    • bpf: Add bloom filter map implementation · 9330986c
      Joanne Koong authored
      This patch adds the kernel-side changes for the implementation of
      a bpf bloom filter map.
      
      The bloom filter map supports peek (determining whether an element
      is present in the map) and push (adding an element to the map)
      operations. These operations are exposed to userspace applications
      through the already existing syscalls in the following way:
      
      BPF_MAP_LOOKUP_ELEM -> peek
      BPF_MAP_UPDATE_ELEM -> push
      
      The bloom filter map does not have keys, only values. In light of
      this, the bloom filter map's API matches that of queue and stack maps:
      user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
      which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
      and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
      APIs to query or add an element to the bloom filter map. When the
      bloom filter map is created, it must be created with a key_size of 0.
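
To make the keyless peek/push semantics concrete, here is a small user-space model of a bloom filter. It is only a sketch under stated assumptions: the fixed bitmap size, the seeded FNV-1a hash, the struct and function names, and the 0 / -ENOENT return convention for peek are all illustrative choices, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BITS   1024u  /* power of two, so (NR_BITS - 1) works as a bit mask */
#define NR_HASHES 5u     /* illustrative hash count */

struct bloom {
	uint8_t bits[NR_BITS / 8];
};

/* Seeded FNV-1a; a stand-in hash chosen for brevity, not the kernel's. */
static uint32_t bloom_hash(const void *val, size_t len, uint32_t seed)
{
	const uint8_t *p = val;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* push: there is no key, only a value; set one bit per hash function. */
static void bloom_push(struct bloom *b, const void *val, size_t len)
{
	assert(b && val);
	for (uint32_t i = 0; i < NR_HASHES; i++) {
		uint32_t bit = bloom_hash(val, len, i) & (NR_BITS - 1);

		b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* peek: 0 if the value may be present, -ENOENT if it is definitely absent. */
static int bloom_peek(const struct bloom *b, const void *val, size_t len)
{
	assert(b && val);
	for (uint32_t i = 0; i < NR_HASHES; i++) {
		uint32_t bit = bloom_hash(val, len, i) & (NR_BITS - 1);

		if (!(b->bits[bit / 8] & (1u << (bit % 8))))
			return -ENOENT;
	}
	return 0;
}
```

A value that has been pushed is always reported as possibly present (no false negatives), whereas a value that was never pushed may still be reported as present if all of its bits happen to be set by other values.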
      
      For updates, the user will pass in the element to add to the map
      as the value, with a NULL key. For lookups, the user will pass in the
      element to query in the map as the value, with a NULL key. In the
      verifier layer, this requires us to modify the argument type of
      a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE.
      In the syscall layer, we also need to copy over the user value
      so that bpf_map_peek_elem knows which specific value to query.
      
      A few things to note:
       * If there are any concurrent lookups + updates, the user is
      responsible for synchronizing this to ensure no false negative lookups
      occur.
       * The number of hashes to use for the bloom filter is configurable from
      userspace. If no number is specified, the default is 5 hash
      functions. The benchmarks later in this patchset can help compare the
      performance of different numbers of hashes on different entry
      sizes. In general, using more hashes decreases both the false positive
      rate and the speed of a lookup.
       * Deleting an element in the bloom filter map is not supported.
       * The bloom filter map may be used as an inner map.
       * The "max_entries" size that is specified at map creation time is used
      to approximate a reasonable bitmap size for the bloom filter, and is not
      otherwise strictly enforced. If the user wishes to insert more entries
      into the bloom filter than "max_entries", they may do so but they should
      be aware that this may lead to a higher false positive rate.
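
One common way to turn an expected entry count into a bitmap size is sketched below. The patch description only states that "max_entries" is used to approximate a reasonable bitmap size; the n * k / ln(2) rule, the 7/5 approximation of 1 / ln(2), the rounding to a power of two, and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (v itself if already one). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v > 0 && v <= (UINT64_C(1) << 62));
	while (p < v)
		p <<= 1;
	return p;
}

/* Hypothetical sizing rule: about n * k / ln(2) bits, with 7/5 standing in
 * for 1 / ln(2), rounded up so that (nr_bits - 1) can serve as a mask. */
static uint64_t bloom_nr_bits(uint32_t max_entries, uint32_t nr_hash_funcs)
{
	uint64_t bits = (uint64_t)max_entries * nr_hash_funcs;

	bits = bits / 5 * 7;
	return roundup_pow2(bits ? bits : 1);
}
```

With this rule, inserting more than "max_entries" values simply fills a bitmap that was sized for fewer, which is why the false positive rate rises rather than the insert failing.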
      Signed-off-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
    • bpf, tests: Add module parameter test_suite to test_bpf module · b066abba
      Tiezhu Yang authored
      After commit 9298e63e ("bpf/tests: Add exhaustive tests of ALU
      operand magnitudes"), loading test_bpf.ko with JIT enabled on mips64
      results in a segmentation fault, as the following log shows:
      
        [...]
        ALU64_MOV_X: all register value magnitudes jited:1
        Break instruction in kernel code[#1]
        [...]
      
      It seems that the JIT implementation has problems with some test
      cases in test_bpf(). For now, I am not concerned with the
      segmentation fault itself; I just want to verify the tail call
      test cases.
      
      Based on the above background and motivation, add the following
      module parameter test_suite to the test_bpf.ko:
      
        test_suite=<string>: only the specified test suite will be run, the
        string can be "test_bpf", "test_tail_calls" or "test_skb_segment".
      
      If test_suite is not specified but test_id, test_name or test_range
      is, 'test_bpf' is used as the default test suite. Specifying a valid
      test_suite string makes it possible to run only the corresponding
      test suite.
      
      Any invalid test suite will result in -EINVAL being returned and no
      tests being run. If test_suite is not specified, or is specified as
      an empty string, the current logic is unchanged: all of the test
      cases will be run.
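
The selection rules above can be sketched as a small stand-alone C function. This is an illustration only: the enum, the function name and the has_filter flag (standing for "test_id, test_name or test_range was given") are hypothetical and are not taken from test_bpf.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum suite { SUITE_ALL, SUITE_BPF, SUITE_TAIL_CALLS, SUITE_SKB_SEGMENT };

/* Hypothetical resolver: returns 0 and stores the suite to run in *out,
 * or -EINVAL (leaving *out untouched) for an unknown suite name. */
static int resolve_test_suite(const char *test_suite, bool has_filter,
			      enum suite *out)
{
	assert(out != NULL);

	/* Unset or empty: run everything, unless a test filter was given,
	 * in which case 'test_bpf' becomes the default suite. */
	if (!test_suite || !*test_suite) {
		*out = has_filter ? SUITE_BPF : SUITE_ALL;
		return 0;
	}

	if (!strcmp(test_suite, "test_bpf"))
		*out = SUITE_BPF;
	else if (!strcmp(test_suite, "test_tail_calls"))
		*out = SUITE_TAIL_CALLS;
	else if (!strcmp(test_suite, "test_skb_segment"))
		*out = SUITE_SKB_SEGMENT;
	else
		return -EINVAL;

	return 0;
}
```

Under these assumptions, `modprobe test_bpf test_id=1` would map to `resolve_test_suite(NULL, true, &s)` and select the test_bpf suite, matching the "set 'test_bpf' as the default test_suite" log line in the results.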
      
      Here are some test results:
      
       # dmesg -c
       # modprobe test_bpf
       # dmesg | grep Summary
       test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
       test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_bpf
       # dmesg | tail -1
       test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_tail_calls
       # dmesg
       test_bpf: #0 Tail call leaf jited:0 21 PASS
       [...]
       test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_skb_segment
       # dmesg
       test_bpf: #0 gso_with_rx_frags PASS
       test_bpf: #1 gso_linear_no_head_frag PASS
       test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_id=1
       # dmesg
       test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
       test_bpf: #1 TXA jited:0 54 51 50 PASS
       test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_bpf test_name=TXA
       # dmesg
       test_bpf: #1 TXA jited:0 54 50 51 PASS
       test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_tail_calls test_range=6,7
       # dmesg
       test_bpf: #6 Tail call error path, NULL target jited:0 41 PASS
       test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
       test_bpf: test_tail_calls: Summary: 2 PASSED, 0 FAILED, [0/2 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_skb_segment test_id=1
       # dmesg
       test_bpf: #1 gso_linear_no_head_frag PASS
       test_bpf: test_skb_segment: Summary: 1 PASSED, 0 FAILED
      
      By the way, the above segmentation fault has been fixed in the latest
      bpf-next tree, which contains the mips64 JIT rework.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Link: https://lore.kernel.org/bpf/1635384321-28128-1-git-send-email-yangtiezhu@loongson.cn
  2. 27 Oct, 2021 10 commits
    • riscv, bpf: Add BPF exception tables · 252c765b
      Tong Tiangen authored
      When a tracing BPF program attempts to read memory without using the
      bpf_probe_read() helper, the verifier marks the load instruction with
      the BPF_PROBE_MEM flag. Since the riscv JIT does not currently recognize
      this flag, it falls back to the interpreter.
      
      Add support for BPF_PROBE_MEM, by appending an exception table to the
      BPF program. If the load instruction causes a data abort, the fixup
      infrastructure finds the exception table and fixes up the fault, by
      clearing the destination register and jumping over the faulting
      instruction.
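
The fixup described above can be modeled in user-space C roughly as follows. This is a simplified sketch: the table entry layout, the register structure, the fixed 4-byte instruction length and all names are assumptions made for illustration, and the real JIT encodes its exception table entries differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_LEN 4u /* assumes an uncompressed 4-byte load instruction */

/* Hypothetical table entry: which instruction may fault, and which
 * register it was loading into. */
struct ex_entry {
	uint64_t insn_pc;
	unsigned int dst_reg;
};

struct cpu_regs {
	uint64_t pc;
	uint64_t gpr[32];
};

/* On a fault at regs->pc, look the address up in the table. If found,
 * clear the destination register and skip the faulting instruction. */
static bool fixup_probe_mem(const struct ex_entry *tbl, size_t n,
			    struct cpu_regs *regs)
{
	assert(tbl && regs);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].insn_pc != regs->pc)
			continue;
		assert(tbl[i].dst_reg < 32);
		regs->gpr[tbl[i].dst_reg] = 0;
		regs->pc += INSN_LEN;
		return true;
	}
	return false; /* not a BPF_PROBE_MEM load: fault is handled elsewhere */
}
```

The effect is that a faulting probe load behaves as if it had read zero, and the program continues with the next instruction.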
      
      A more generic solution would add a "handler" field to the table entry,
      like on x86 and s390. The same issue in ARM64 is fixed in 80083428
      ("bpf, arm64: Add BPF exception tables").
      Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Pu Lehui <pulehui@huawei.com>
      Tested-by: Björn Töpel <bjorn@kernel.org>
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027111822.3801679-1-tongtiangen@huawei.com
    • Merge branch 'selftests/bpf: parallel mode improvement' · 03e6a7a9
      Andrii Nakryiko authored
      Yucong Sun says:
      
      ====================
      
      Several patches that improve parallel execution mode, update vmtest.sh,
      and fix two previously dropped patches according to feedback.
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Adding a namespace reset for tc_redirect · e1ef62a4
      Yucong Sun authored
      This patch deletes the ns_src/ns_dst/ns_redir namespaces before
      recreating them, making the test more robust.
      Signed-off-by: Yucong Sun <sunyucong@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com
    • selftests/bpf: Fix attach_probe in parallel mode · 9e7240fb
      Yucong Sun authored
      This patch makes attach_probe use its own method as the attach point,
      avoiding conflicts with other tests like bpf_cookie.
      Signed-off-by: Yucong Sun <sunyucong@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com
    • selftests/bpf: Update vmtest.sh defaults · 547208a3
      Yucong Sun authored
      Increase memory to 4G and use 8 SMP cores with host CPU passthrough.
      This makes it run faster in parallel mode and more likely to succeed.
      Signed-off-by: Yucong Sun <sunyucong@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com
    • Merge branch 'bpf: use 32bit safe version of u64_stats' · f9d532fc
      Alexei Starovoitov authored
      Eric Dumazet says:
      
      ====================
      
      From: Eric Dumazet <edumazet@google.com>
      
      The first two patches fix bugs added in 5.1 and 5.5.
      
      Third patch replaces the u64 fields in struct bpf_prog_stats
      with u64_stats_t ones to avoid possible sampling errors,
      in case of load/store tearing.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Use u64_stats_t in struct bpf_prog_stats · 61a0abae
      Eric Dumazet authored
      Commit 316580b6 ("u64_stats: provide u64_stats_t type")
      fixed possible load/store tearing on 64bit arches.
      
      For instance, the following C code

      stats->nsecs += sched_clock() - start;

      could legitimately be compiled as follows, confusing
      concurrent readers a lot:
      
      stats->nsecs += sched_clock();
      // arbitrary delay
      stats->nsecs -= start;
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
    • bpf: Fixes possible race in update_prog_stats() for 32bit arches · d979617a
      Eric Dumazet authored
      It seems update_prog_stats() suffers from the same issue fixed
      in the prior patch:
      
      As it can run while interrupts are enabled, it could
      be re-entered and the u64_stats syncp could be mangled.
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com
    • bpf: Avoid races in __bpf_prog_run() for 32bit arches · f941eadd
      Eric Dumazet authored
      __bpf_prog_run() can run from non-IRQ contexts, meaning
      it could be re-entered if interrupted.
      
      This calls for the irq safe variant of u64_stats_update_{begin|end},
      or risk a deadlock.
      
      This patch is a nop on 64bit arches, fortunately.
      
      syzbot report:
      
      WARNING: inconsistent lock state
      5.12.0-rc3-syzkaller #0 Not tainted
      --------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      udevd/4013 [HC0[0]:SC0[0]:HE1:SE1] takes:
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: sk_filter include/linux/filter.h:867 [inline]
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: do_one_broadcast net/netlink/af_netlink.c:1468 [inline]
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520
      {IN-SOFTIRQ-W} state was registered at:
        lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510
        lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483
        do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline]
        do_write_seqcount_begin include/linux/seqlock.h:545 [inline]
        u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline]
        bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
        bpf_prog_run_clear_cb+0x1bc/0x270 include/linux/filter.h:755
        run_filter+0xa0/0x17c net/packet/af_packet.c:2031
        packet_rcv+0xc0/0x3e0 net/packet/af_packet.c:2104
        dev_queue_xmit_nit+0x2bc/0x39c net/core/dev.c:2387
        xmit_one net/core/dev.c:3588 [inline]
        dev_hard_start_xmit+0x94/0x518 net/core/dev.c:3609
        sch_direct_xmit+0x11c/0x1f0 net/sched/sch_generic.c:313
        qdisc_restart net/sched/sch_generic.c:376 [inline]
        __qdisc_run+0x194/0x7f8 net/sched/sch_generic.c:384
        qdisc_run include/net/pkt_sched.h:136 [inline]
        qdisc_run include/net/pkt_sched.h:128 [inline]
        __dev_xmit_skb net/core/dev.c:3795 [inline]
        __dev_queue_xmit+0x65c/0xf84 net/core/dev.c:4150
        dev_queue_xmit+0x14/0x18 net/core/dev.c:4215
        neigh_resolve_output net/core/neighbour.c:1491 [inline]
        neigh_resolve_output+0x170/0x228 net/core/neighbour.c:1471
        neigh_output include/net/neighbour.h:510 [inline]
        ip6_finish_output2+0x2e4/0x9fc net/ipv6/ip6_output.c:117
        __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
        __ip6_finish_output+0x164/0x3f8 net/ipv6/ip6_output.c:161
        ip6_finish_output+0x2c/0xb0 net/ipv6/ip6_output.c:192
        NF_HOOK_COND include/linux/netfilter.h:290 [inline]
        ip6_output+0x74/0x294 net/ipv6/ip6_output.c:215
        dst_output include/net/dst.h:448 [inline]
        NF_HOOK include/linux/netfilter.h:301 [inline]
        NF_HOOK include/linux/netfilter.h:295 [inline]
        mld_sendpack+0x2a8/0x7e4 net/ipv6/mcast.c:1679
        mld_send_cr net/ipv6/mcast.c:1975 [inline]
        mld_ifc_timer_expire+0x1e8/0x494 net/ipv6/mcast.c:2474
        call_timer_fn+0xd0/0x570 kernel/time/timer.c:1431
        expire_timers kernel/time/timer.c:1476 [inline]
        __run_timers kernel/time/timer.c:1745 [inline]
        run_timer_softirq+0x2e4/0x384 kernel/time/timer.c:1758
        __do_softirq+0x204/0x7ac kernel/softirq.c:345
        do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
        invoke_softirq kernel/softirq.c:228 [inline]
        __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
        irq_exit+0x10/0x3c kernel/softirq.c:446
        __handle_domain_irq+0xb4/0x120 kernel/irq/irqdesc.c:692
        handle_domain_irq include/linux/irqdesc.h:176 [inline]
        gic_handle_irq+0x84/0xac drivers/irqchip/irq-gic.c:370
        __irq_svc+0x5c/0x94 arch/arm/kernel/entry-armv.S:205
        debug_smp_processor_id+0x0/0x24 lib/smp_processor_id.c:53
        rcu_read_lock_held_common kernel/rcu/update.c:108 [inline]
        rcu_read_lock_sched_held+0x24/0x7c kernel/rcu/update.c:123
        trace_lock_acquire+0x24c/0x278 include/trace/events/lock.h:13
        lock_acquire+0x3c/0x74 kernel/locking/lockdep.c:5481
        rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
        rcu_read_lock include/linux/rcupdate.h:656 [inline]
        avc_has_perm_noaudit+0x6c/0x260 security/selinux/avc.c:1150
        selinux_inode_permission+0x140/0x220 security/selinux/hooks.c:3141
        security_inode_permission+0x44/0x60 security/security.c:1268
        inode_permission.part.0+0x5c/0x13c fs/namei.c:521
        inode_permission fs/namei.c:494 [inline]
        may_lookup fs/namei.c:1652 [inline]
        link_path_walk.part.0+0xd4/0x38c fs/namei.c:2208
        link_path_walk fs/namei.c:2189 [inline]
        path_lookupat+0x3c/0x1b8 fs/namei.c:2419
        filename_lookup+0xa8/0x1a4 fs/namei.c:2453
        user_path_at_empty+0x74/0x90 fs/namei.c:2733
        do_readlinkat+0x5c/0x12c fs/stat.c:417
        __do_sys_readlink fs/stat.c:450 [inline]
        sys_readlink+0x24/0x28 fs/stat.c:447
        ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64
        0x7eaa4974
      irq event stamp: 298277
      hardirqs last  enabled at (298277): [<802000d0>] no_work_pending+0x4/0x34
      hardirqs last disabled at (298276): [<8020c9b8>] do_work_pending+0x9c/0x648 arch/arm/kernel/signal.c:676
      softirqs last  enabled at (298216): [<8020167c>] __do_softirq+0x584/0x7ac kernel/softirq.c:372
      softirqs last disabled at (298201): [<8024dff4>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
      softirqs last disabled at (298201): [<8024dff4>] invoke_softirq kernel/softirq.c:228 [inline]
      softirqs last disabled at (298201): [<8024dff4>] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&pstats->syncp)->seq);
        <Interrupt>
          lock(&(&pstats->syncp)->seq);
      
       *** DEADLOCK ***
      
      1 lock held by udevd/4013:
       #0: 82b09c5c (rcu_read_lock){....}-{1:2}, at: sk_filter_trim_cap+0x54/0x434 net/core/filter.c:139
      
      stack backtrace:
      CPU: 1 PID: 4013 Comm: udevd Not tainted 5.12.0-rc3-syzkaller #0
      Hardware name: ARM-Versatile Express
      Backtrace:
      [<81802550>] (dump_backtrace) from [<818027c4>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:600d0093 r5:00000000 r4:82b58344
      [<818027ac>] (show_stack) from [<81809e98>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<818027ac>] (show_stack) from [<81809e98>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<81809de0>] (dump_stack) from [<81804a00>] (print_usage_bug.part.0+0x228/0x230 kernel/locking/lockdep.c:3806)
       r7:86bcb768 r6:81a0326c r5:830f96a8 r4:86bcb0c0
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (print_usage_bug kernel/locking/lockdep.c:3776 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (valid_state kernel/locking/lockdep.c:3818 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock_irq kernel/locking/lockdep.c:4021 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock.part.0+0xc34/0x136c kernel/locking/lockdep.c:4478)
       r10:83278fe8 r9:82c6d748 r8:00000000 r7:82c6d2d4 r6:00000004 r5:86bcb768
       r4:00000006
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_lock kernel/locking/lockdep.c:4442 [inline])
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_usage kernel/locking/lockdep.c:4391 [inline])
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (__lock_acquire+0x9bc/0x3318 kernel/locking/lockdep.c:4854)
       r10:86bcb768 r9:86bcb0c0 r8:00000001 r7:00040000 r6:0000075a r5:830f96a8
       r4:00000000
      [<802bbc88>] (__lock_acquire) from [<802bfb90>] (lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510)
       r10:00000000 r9:600d0013 r8:00000000 r7:00000000 r6:828a2680 r5:828a2680
       r4:861e5bc8
      [<802bfaa0>] (lock_acquire.part.0) from [<802bff28>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483)
       r10:8146137c r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:00000000
       r4:ff7c9dec
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin include/linux/seqlock.h:545 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (__bpf_prog_run_save_cb include/linux/filter.h:727 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (bpf_prog_run_save_cb include/linux/filter.h:741 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (sk_filter_trim_cap+0x26c/0x434 net/core/filter.c:149)
       r10:a4095dd0 r9:ff7c9dd0 r8:e44be000 r7:8146137c r6:00000001 r5:8611ba80
       r4:00000000
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (sk_filter include/linux/filter.h:867 [inline])
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (do_one_broadcast net/netlink/af_netlink.c:1468 [inline])
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520)
       r10:00000001 r9:833d6b1c r8:00000000 r7:8572f864 r6:8611ba80 r5:8698d800
       r4:8572f800
      [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_broadcast net/netlink/af_netlink.c:1544 [inline])
      [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_sendmsg+0x3d0/0x478 net/netlink/af_netlink.c:1925)
       r10:00000000 r9:00000002 r8:8698d800 r7:000000b7 r6:8611b900 r5:861e5f50
       r4:86aa3000
      [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg_nosec net/socket.c:654 [inline])
      [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg+0x3c/0x4c net/socket.c:674)
       r10:00000000 r9:861e5dd4 r8:00000000 r7:86570000 r6:00000000 r5:86570000
       r4:861e5f50
      [<81321f18>] (sock_sendmsg) from [<813234d0>] (____sys_sendmsg+0x230/0x29c net/socket.c:2350)
       r5:00000040 r4:861e5f50
      [<813232a0>] (____sys_sendmsg) from [<8132549c>] (___sys_sendmsg+0xac/0xe4 net/socket.c:2404)
       r10:00000128 r9:861e4000 r8:00000000 r7:00000000 r6:86570000 r5:861e5f50
       r4:00000000
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (__sys_sendmsg net/socket.c:2433 [inline])
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (__do_sys_sendmsg net/socket.c:2442 [inline])
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (sys_sendmsg+0x58/0xa0 net/socket.c:2440)
       r8:80200224 r7:00000128 r6:00000000 r5:7eaa541c r4:86570000
      [<8132562c>] (sys_sendmsg) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      Exception stack(0x861e5fa8 to 0x861e5ff0)
      5fa0:                   00000000 00000000 0000000c 7eaa541c 00000000 00000000
      5fc0: 00000000 00000000 76fbf840 00000128 00000000 0000008f 7eaa541c 000563f8
      5fe0: 00056110 7eaa53e0 00036cec 76c9bf44
       r6:76fbf840 r5:00000000 r4:00000000
      
      Fixes: 492ecee8 ("bpf: enable program stats")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-2-eric.dumazet@gmail.com
    • libbpf: Deprecate bpf_objects_list · 689624f0
      Joe Burton authored
      Add a flag to `enum libbpf_strict_mode' to disable the global
      `bpf_objects_list', preventing race conditions when concurrent threads
      call bpf_object__open() or bpf_object__close().
      
      bpf_object__next() will return NULL if this option is set.
      
      Callers may achieve the same workflow by tracking bpf_objects in
      application code.
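
A minimal sketch of what "tracking bpf_objects in application code" could look like, assuming the application keeps its own mutex-protected table of handles. Nothing here calls libbpf: `app_obj_registry`, `app_registry_add()`, `app_registry_remove()`, `app_registry_next()` and `APP_MAX_OBJECTS` are hypothetical names chosen for illustration, and the `void *` entries stand in for the `struct bpf_object *` pointers a real program would store after bpf_object__open() and drop before bpf_object__close().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical application-side registry; not part of libbpf. */
#define APP_MAX_OBJECTS 16

struct app_obj_registry {
	pthread_mutex_t lock;
	void *objs[APP_MAX_OBJECTS]; /* stand-ins for struct bpf_object * */
	size_t count;
};

static struct app_obj_registry registry = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Record a handle. Returns 0 on success, -1 if the table is full. */
static int app_registry_add(struct app_obj_registry *reg, void *obj)
{
	int ret = -1;

	assert(reg != NULL && obj != NULL);
	pthread_mutex_lock(&reg->lock);
	if (reg->count < APP_MAX_OBJECTS) {
		reg->objs[reg->count++] = obj;
		ret = 0;
	}
	pthread_mutex_unlock(&reg->lock);
	return ret;
}

/* Forget a handle. Returns 0 if it was tracked, -1 otherwise. */
static int app_registry_remove(struct app_obj_registry *reg, void *obj)
{
	int ret = -1;

	pthread_mutex_lock(&reg->lock);
	for (size_t i = 0; i < reg->count; i++) {
		if (reg->objs[i] == obj) {
			/* Fill the hole with the last entry; order is not preserved. */
			reg->objs[i] = reg->objs[--reg->count];
			reg->objs[reg->count] = NULL;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&reg->lock);
	return ret;
}

/*
 * Iterate over tracked handles: pass NULL to get the first entry, or the
 * previous entry to get the one after it. Returns NULL at the end.
 */
static void *app_registry_next(struct app_obj_registry *reg, void *prev)
{
	void *next = NULL;
	size_t i = 0;

	pthread_mutex_lock(&reg->lock);
	if (prev != NULL) {
		while (i < reg->count && reg->objs[i] != prev)
			i++;
		i++; /* step past prev, or past the end if prev is unknown */
	}
	if (i < reg->count)
		next = reg->objs[i];
	pthread_mutex_unlock(&reg->lock);
	return next;
}
```

The lock only protects each individual call. A caller that iterates while other threads add or remove entries would still need to hold its own lock across the whole loop, or take a snapshot first.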
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/293

      Signed-off-by: Joe Burton <jevburton@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
  3. 26 Oct, 2021 20 commits
  4. 25 Oct, 2021 5 commits