Commits · b66ccae01f1ddce47fe2c7f393a3a5c5ab3d7f06 · Kirill Smelkov / linux

21 Sep, 2022 6 commits

bpf: Add libbpf logic for user-space ring buffer · b66ccae0

David Vernet authored 2 years ago


Now that all of the logic is in place in the kernel to support user-space
produced ring buffers, we can add the user-space logic to libbpf. This
patch therefore adds the following public symbols to libbpf:

struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
		      const struct user_ring_buffer_opts *opts);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__discard(struct user_ring_buffer *rb,
void user_ring_buffer__free(struct user_ring_buffer *rb);

A user-space producer must first create a struct user_ring_buffer * object
with user_ring_buffer__new(), and can then reserve samples in the
ring buffer using one of the following two symbols:

void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);

With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
buffer will be returned if sufficient space is available in the buffer.
user_ring_buffer__reserve_blocking() provides similar semantics, but will
block for up to 'timeout_ms' in epoll_wait if there is insufficient space
in the buffer. This function has the guarantee from the kernel that it will
receive at least one event-notification per invocation to
bpf_ringbuf_drain(), provided that at least one sample is drained, and the
BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain().

Once a sample is reserved, it must either be committed to the ring buffer
with user_ring_buffer__submit(), or discarded with
user_ring_buffer__discard().
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com

b66ccae0

bpf: Add bpf_user_ringbuf_drain() helper · 20571567

David Vernet authored 2 years ago


In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
will allow user-space applications to publish messages to a ring buffer
that is consumed by a BPF program in kernel-space. In order for this
map-type to be useful, it will require a BPF helper function that BPF
programs can invoke to drain samples from the ring buffer, and invoke
callbacks on those samples. This change adds that capability via a new BPF
helper function:

bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                       u64 flags)

BPF programs may invoke this function to run callback_fn() on a series of
samples in the ring buffer. callback_fn() has the following signature:

long callback_fn(struct bpf_dynptr *dynptr, void *context);

Samples are provided to the callback in the form of struct bpf_dynptr *'s,
which the program can read using BPF helper functions for querying
struct bpf_dynptr's.

In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register
type is added to the verifier to reflect a dynptr that was allocated by
a helper function and passed to a BPF program. Unlike PTR_TO_STACK
dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
dynptrs need not use reference tracking, as the BPF helper is trusted to
properly free the dynptr before returning. The verifier currently only
supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.

Note that while the corresponding user-space libbpf logic will be added
in a subsequent patch, this patch does contain an implementation of the
.map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
.map_poll() callback guarantees that an epoll-waiting user-space
producer will receive at least one event notification whenever at least
one sample is drained in an invocation of bpf_user_ringbuf_drain(),
provided that the function is not invoked with the BPF_RB_NO_WAKEUP
flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
notification is sent even if no sample was drained.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com

20571567

bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type · 583c1f42

David Vernet authored 2 years ago


We want to support a ringbuf map type where samples are published from
user-space, to be consumed by BPF programs. BPF currently supports a
kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
map type.  We'll need to define a new map type for user-space -> kernel,
as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
to a user-space producer ring buffer, and we'll want to add one or
more helper functions that would not apply for a kernel-producer
ring buffer.

This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
definition. The map type is useless in its current form, as there is no
way to access or use it for anything until we one or more BPF helpers. A
follow-on patch will therefore add a new helper function that allows BPF
programs to run callbacks on samples that are published to the ring
buffer.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com

583c1f42

bpf: simplify code in btf_parse_hdr · 3a74904c

William Dean authored 2 years ago


It could directly return 'btf_check_sec_info' to simplify code.
Signed-off-by: William Dean <williamsukatube@163.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220917084248.3649-1-williamsukatube@163.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

3a74904c

libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data · 7620bffb

Xin Liu authored 2 years ago

We found that function btf_dump__dump_type_data can be called by the
user as an API, but in this function, the `opts` parameter may be used
as a null pointer.This causes `opts->indent_str` to trigger a NULL
pointer exception.

Fixes: 2ce8450e

 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com

7620bffb

samples/bpf: Replace blk_account_io_done() with __blk_account_io_done() · bc069da6

Rong Tao authored 2 years ago

Since commit be6bfe36

 ("block: inline hot paths of blk_account_io_*()")
blk_account_io_*() become inline functions.
Signed-off-by: Rong Tao <rtoax@foxmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/tencent_1CC476835C219FACD84B6715F0D785517E07@qq.com

bc069da6

20 Sep, 2022 5 commits

Merge branch 'bpf: Small nf_conn cleanups' · bfa8fe95

Martin KaFai Lau authored 2 years ago

Daniel Xu says:

====================

This patchset cleans up a few small things:

* Delete unused stub
* Rename variable to be more descriptive
* Fix some `extern` declaration warnings

Past discussion:
- v2: https://lore.kernel.org/bpf/cover.1663616584.git.dxu@dxuuu.xyz/



Changes since v2:
- Remove unused #include's
- Move #include <linux/filter.h> to .c
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bfa8fe95

bpf: Move nf_conn extern declarations to filter.h · fdf21497

Daniel Xu authored 2 years ago

We're seeing the following new warnings on netdev/build_32bit and
netdev/build_allmodconfig_warn CI jobs:

    ../net/core/filter.c:8608:1: warning: symbol
    'nf_conn_btf_access_lock' was not declared. Should it be static?
    ../net/core/filter.c:8611:5: warning: symbol 'nfct_bsa' was not
    declared. Should it be static?

Fix by ensuring extern declaration is present while compiling filter.o.

Fixes: 864b656f

 ("bpf: Add support for writing to nf_conn:mark")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/2bd2e0283df36d8a4119605878edb1838d144174.1663683114.git.dxu@dxuuu.xyz

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

fdf21497

bpf: Rename nfct_bsa to nfct_btf_struct_access · 5a090aa3

Daniel Xu authored 2 years ago


The former name was a little hard to guess.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/73adc72385c8b162391fbfb404f0b6d4c5cc55d7.1663683114.git.dxu@dxuuu.xyz

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

5a090aa3

bpf: Remove unused btf_struct_access stub · 52bdae37

Daniel Xu authored 2 years ago


This stub was not being used anywhere.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/590e7bd6172ffe0f3d7b51cd40e8ded941aaf7e8.1663683114.git.dxu@dxuuu.xyz

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

52bdae37

bpf: Check whether or not node is NULL before free it in free_bulk · c31b38cb

Hou Tao authored 2 years ago

llnode could be NULL if there are new allocations after the checking of
c-free_cnt > c->high_watermark in bpf_mem_refill() and before the
calling of __llist_del_first() in free_bulk (e.g. a PREEMPT_RT kernel
or allocation in NMI context). And it will incur oops as shown below:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] PREEMPT_RT SMP
 CPU: 39 PID: 373 Comm: irq_work/39 Tainted: G        W          6.0.0-rc6-rt9+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:bpf_mem_refill+0x66/0x130
 ......
 Call Trace:
  <TASK>
  irq_work_single+0x24/0x60
  irq_work_run_list+0x24/0x30
  run_irq_workd+0x18/0x20
  smpboot_thread_fn+0x13f/0x2c0
  kthread+0x121/0x140
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Simply fixing it by checking whether or not llnode is NULL in free_bulk().

Fixes: 8d5a8011

 ("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220919144811.3570825-1-houtao@huaweicloud.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c31b38cb

19 Sep, 2022 1 commit

selftests/bpf: Add test result messages for test_task_storage_map_stress_lookup · a7e85406

Hou Tao authored 2 years ago

Add test result message when test_task_storage_map_stress_lookup()
succeeds or is skipped. The test case can be skipped due to the choose
of preemption model in kernel config, so export skips in test_maps.c and
increase it when needed.

The following is the output of test_maps when the test case succeeds or
is skipped:

  test_task_storage_map_stress_lookup:PASS
  test_maps: OK, 0 SKIPPED

  test_task_storage_map_stress_lookup SKIP (no CONFIG_PREEMPT)
  test_maps: OK, 1 SKIPPED

Fixes: 73b97bc7

 ("selftests/bpf: Test concurrent updates on bpf_task_storage_busy")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220919035714.2195144-1-houtao@huaweicloud.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

a7e85406

17 Sep, 2022 1 commit

bpf/btf: Use btf_type_str() whenever possible · 571f9738

Peilin Ye authored 2 years ago


We have btf_type_str().  Use it whenever possible in btf.c, instead of
"btf_kind_str[BTF_INFO_KIND(t->info)]".
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/20220916202800.31421-1-yepeilin.cs@gmail.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

571f9738

16 Sep, 2022 8 commits

libbpf: Clean up legacy bpf maps declaration in bpf_helpers · dc567045

Xin Liu authored 2 years ago

Legacy BPF map declarations are no longer supported in libbpf v1.0 [0].
Only BTF-defined maps are supported starting from v1.0, so it is time to
remove the definition of bpf_map_def in bpf_helpers.h.

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220913073643.19960-1-liuxin350@huawei.com

dc567045

selftests/bpf: Add veristat tool for mass-verifying BPF object files · c8bc5e05

Andrii Nakryiko authored 2 years ago

Add a small tool, veristat, that allows mass-verification of
a set of *libbpf-compatible* BPF ELF object files. For each such object
file, veristat will attempt to verify each BPF program *individually*.
Regardless of success or failure, it parses BPF verifier stats and
outputs them in human-readable table format. In the future we can also
add CSV and JSON output for more scriptable post-processing, if necessary.

veristat allows to specify a set of stats that should be output and
ordering between multiple objects and files (e.g., so that one can
easily order by total instructions processed, instead of default file
name, prog name, verdict, total instructions order).

This tool should be useful for validating various BPF verifier changes
or even validating different kernel versions for regressions.

Here's an example for some of the heaviest selftests/bpf BPF object
files:

$ sudo ./veristat -s insns,file,prog {pyperf,loop,test_verif_scale,strobemeta,test_cls_redirect,profiler}*.linked3.o
File Program Verdict Duration, us Total insns Total states Peak states
------------------------------------ ------------------------------------ ------- ------------ ----------- ------------ -----------
loop3.linked3.o while_true failure 350990 1000001 9663 9663
test_verif_scale3.linked3.o balancer_ingress success 115244 845499 8636 2141
test_verif_scale2.linked3.o balancer_ingress success 77688 773445 3048 788
pyperf600.linked3.o on_event success 2079872 624585 30335 30241
pyperf600_nounroll.linked3.o on_event success 353972 568128 37101 2115
strobemeta.linked3.o on_event success 455230 557149 15915 13537
test_verif_scale1.linked3.o balancer_ingress success 89880 554754 8636 2141
strobemeta_nounroll2.linked3.o on_event success 433906 501725 17087 1912
loop6.linked3.o trace_virtqueue_add_sgs success 282205 398057 8717 919
loop1.linked3.o nested_loops success 125630 361349 5504 5504
pyperf180.linked3.o on_event success 2511740 160398 11470 11446
pyperf100.linked3.o on_event success 744329 87681 6213 6191
test_cls_redirect.linked3.o cls_redirect success 54087 78925 4782 903
strobemeta_subprogs.linked3.o on_event success 57898 65420 1954 403
test_cls_redirect_subprogs.linked3.o cls_redirect success 54522 64965 4619 958
strobemeta_nounroll1.linked3.o on_event success 43313 57240 1757 382
pyperf50.linked3.o on_event success 194355 46378 3263 3241
profiler2.linked3.o tracepoint__syscalls__sys_enter_kill success 23869 43372 1423 542
pyperf_subprogs.linked3.o on_event success 29179 36358 2499 2499
profiler1.linked3.o tracepoint__syscalls__sys_enter_kill success 13052 27036 1946 936
profiler3.linked3.o tracepoint__syscalls__sys_enter_kill success 21023 26016 2186 915
profiler2.linked3.o kprobe__vfs_link success 5255 13896 303 271
profiler1.linked3.o kprobe__vfs_link success 7792 12687 1042 1041
profiler3.linked3.o kprobe__vfs_link success 7332 10601 865 865
profiler2.linked3.o kprobe_ret__do_filp_open success 3417 8900 216 199
profiler2.linked3.o kprobe__vfs_symlink success 3548 8775 203 186
pyperf_global.linked3.o on_event success 10007 7563 520 520
profiler3.linked3.o kprobe_ret__do_filp_open success 4708 6464 532 532
profiler1.linked3.o kprobe_ret__do_filp_open success 3090 6445 508 508
profiler3.linked3.o kprobe__vfs_symlink success 4477 6358 521 521
profiler1.linked3.o kprobe__vfs_symlink success 3381 6347 507 507
profiler2.linked3.o raw_tracepoint__sched_process_exec success 2464 5874 292 189
profiler3.linked3.o raw_tracepoint__sched_process_exec success 2677 4363 397 283
profiler2.linked3.o kprobe__proc_sys_write success 1800 4355 143 138
profiler1.linked3.o raw_tracepoint__sched_process_exec success 1649 4019 333 240
pyperf600_bpf_loop.linked3.o on_event success 2711 3966 306 306
profiler2.linked3.o raw_tracepoint__sched_process_exit success 1234 3138 83 66
profiler3.linked3.o kprobe__proc_sys_write success 1755 2623 223 223
profiler1.linked3.o kprobe__proc_sys_write success 1222 2456 193 193
loop2.linked3.o while_true success 608 1783 57 30
profiler3.linked3.o raw_tracepoint__sched_process_exit success 789 1680 146 146
profiler1.linked3.o raw_tracepoint__sched_process_exit success 592 1526 133 133
strobemeta_bpf_loop.linked3.o on_event success 1015 1512 106 106
loop4.linked3.o combinations success 165 524 18 17
profiler3.linked3.o raw_tracepoint__sched_process_fork success 196 299 25 25
profiler1.linked3.o raw_tracepoint__sched_process_fork success 109 265 19 19
profiler2.linked3.o raw_tracepoint__sched_process_fork success 111 265 19 19
loop5.linked3.o while_true success 47 84 9 9
------------------------------------ ------------------------------------ ------- ------------ ----------- ------------ -----------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-4-andrii@kernel.org

c8bc5e05

libbpf: Fix crash if SEC("freplace") programs don't have attach_prog_fd set · 749c202c

Andrii Nakryiko authored 2 years ago

Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF
for freplace programs. It's wrong to search in vmlinux BTF and libbpf
doesn't even mark vmlinux BTF as required for freplace programs. So
trying to search anything in obj->vmlinux_btf might cause NULL
dereference if nothing else in BPF object requires vmlinux BTF.

Instead, error out if freplace (EXT) program doesn't specify
attach_prog_fd during at the load time.

Fixes: 91abb4a6

 ("libbpf: Support attachment of BPF tracing programs to kernel modules")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org

749c202c

selftests/bpf: Fix test_verif_scale{1,3} SEC() annotations · cf060c2c

Andrii Nakryiko authored 2 years ago


Use proper SEC("tc") for test_verif_scale{1,3} programs. It's not
a problem for selftests right now because we manually set type
programmatically, but not having correct SEC() definitions makes it
harded to generically load BPF object files.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-2-andrii@kernel.org

cf060c2c

bpf: Move bpf_dispatcher function out of ftrace locations · ceea991a

Jiri Olsa authored 2 years ago

The dispatcher function is attached/detached to trampoline by
dispatcher update function. At the same time it's available as
ftrace attachable function.

After discussion [1] the proposed solution is to use compiler
attributes to alter bpf_dispatcher_##name##_func function:

  - remove it from being instrumented with __no_instrument_function__
    attribute, so ftrace has no track of it

  - but still generate 5 nop instructions with patchable_function_entry(5)
    attribute, which are expected by bpf_arch_text_poke used by
    dispatcher update function

Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so
__patchable_function_entries functions are not part of ftrace/mcount
locations.

Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's
kept out of ftrace locations and has 5 byte nop generated at entry.

These attributes need to be arch specific as pointed out by Ilya
Leoshkevic in here [2].

The dispatcher image is generated only for x86_64 arch, so the
code can stay as is for other archs.

  [1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/
  [2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org

ceea991a

ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE · 9440155c

Peter Zijlstra (Intel) authored 2 years ago


x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace, make sure the __patchable_function_entry section
isn't merged in the mcount_loc section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org

9440155c

bpf: Use bpf_capable() instead of CAP_SYS_ADMIN for blinding decision · bfeb7e39

Yauheni Kaliuta authored 2 years ago

The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays.
These days given unprivileged BPF is disabled by default, the main users
for constant blinding coming from unprivileged in particular via cBPF -> eBPF
migration (e.g. old-style socket filters).
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com
Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com

bfeb7e39

bpf: use kvmemdup_bpfptr helper · a02c118e

Wang Yufen authored 2 years ago


Use kvmemdup_bpfptr helper instead of open-coding to
simplify the code.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1663058433-14089-1-git-send-email-wangyufen@huawei.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

a02c118e

15 Sep, 2022 1 commit

bpf: Add verifier check for BPF_PTR_POISON retval and arg · 47e34cb7

Dave Marchevsky authored 2 years ago

BPF_PTR_POISON was added in commit c0a5a21c

 ("bpf: Allow storing
referenced kptr in map") to denote a bpf_func_proto btf_id which the
verifier will replace with a dynamically-determined btf_id at verification
time.

This patch adds verifier 'poison' functionality to BPF_PTR_POISON in
order to prepare for expanded use of the value to poison ret- and
arg-btf_id in ongoing work, namely rbtree and linked list patchsets
[0, 1]. Specifically, when the verifier checks helper calls, it assumes
that BPF_PTR_POISON'ed ret type will be replaced with a valid type before
- or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id.

If poisoned btf_id reaches default handling block for either, consider
this a verifier internal error and fail verification. Otherwise a helper
w/ poisoned btf_id but no verifier logic replacing the type will cause a
crash as the invalid pointer is dereferenced.

Also move BPF_PTR_POISON to existing include/linux/posion.h header and
remove unnecessary shift.

  [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

47e34cb7

11 Sep, 2022 9 commits

bpf: Add verifier support for custom callback return range · 1bfe26fb

Dave Marchevsky authored 2 years ago

Verifier logic to confirm that a callback function returns 0 or 1 was
added in commit 69c087ba

 ("bpf: Add bpf_for_each_map_elem() helper").
At the time, callback return value was only used to continue or stop
iteration.

In order to support callbacks with a broader return value range, such as
those added in rbtree series[0] and others, add a callback_ret_range to
bpf_func_state. Verifier's helpers which set in_callback_fn will also
set the new field, which the verifier will later use to check return
value bounds.

Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel
value as the latter would prevent the valid range (0, U64_MAX) being
used. Previous global default tnum_range(0, 1) is explicitly set for
extant callback helpers. The change to global default was made after
discussion around this patch in rbtree series [1], goal here is to make
it more obvious that callback_ret_range should be explicitly set.

  [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com/
  [1]: lore.kernel.org/bpf/20220830172759.4069786-2-davemarchevsky@fb.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220908230716.2751723-1-davemarchevsky@fb.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1bfe26fb

selftests/bpf: fix ct status check in bpf_nf selftests · f7c946f2

Lorenzo Bianconi authored 2 years ago

Check properly the connection tracking entry status configured running
bpf_ct_change_status kfunc.
Remove unnecessary IPS_CONFIRMED status configuration since it is
already done during entry allocation.

Fixes: 6eb7fba0

 ("selftests/bpf: Add tests for new nf_conntrack kfuncs")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/813a5161a71911378dfac8770ec890428e4998aa.1662623574.git.lorenzo@kernel.org

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f7c946f2

Merge branch 'Support direct writes to nf_conn:mark' · b8c62fe2

Alexei Starovoitov authored 2 years ago

Daniel Xu says:

====================

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.

Past discussion:
- v4: https://lore.kernel.org/bpf/cover.1661192455.git.dxu@dxuuu.xyz/
- v3: https://lore.kernel.org/bpf/cover.1660951028.git.dxu@dxuuu.xyz/
- v2: https://lore.kernel.org/bpf/CAP01T74Sgn354dXGiFWFryu4vg+o8b9s9La1d9zEbC4LGvH4qg@mail.gmail.com/T/
- v1: https://lore.kernel.org/bpf/cover.1660592020.git.dxu@dxuuu.xyz/



Changes since v4:
- Use exported function pointer + mutex to handle CONFIG_NF_CONNTRACK=m
  case

Changes since v3:
- Use a mutex to protect module load/unload critical section

Changes since v2:
- Remove use of NOT_INIT for btf_struct_access write path
- Disallow nf_conn writing when nf_conntrack module not loaded
- Support writing to nf_conn___init:mark

Changes since v1:
- Add unimplemented stub for when !CONFIG_BPF_SYSCALL
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b8c62fe2

selftests/bpf: Add tests for writing to nf_conn:mark · e2d75e95

Daniel Xu authored 2 years ago

Add a simple extension to the existing selftest to write to
nf_conn:mark. Also add a failure test for writing to unsupported field.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/f78966b81b9349d2b8ebb4cee2caf15cb6b38ee2.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e2d75e95

bpf: Add support for writing to nf_conn:mark · 864b656f

Daniel Xu authored 2 years ago

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

864b656f

bpf: Export btf_type_by_id() and bpf_log() · 84c6ac41

Daniel Xu authored 2 years ago


These symbols will be used in nf_conntrack.ko to support direct writes
to `nf_conn`.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/3c98c19dc50d3b18ea5eca135b4fc3a5db036060.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

84c6ac41

bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes · 896f07c0

Daniel Xu authored 2 years ago


Returning a bpf_reg_type only makes sense in the context of a BPF_READ.
For writes, prefer to explicitly return 0 for clarity.

Note that is non-functional change as it just so happened that NOT_INIT
== 0.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/01772bc1455ae16600796ac78c6cc9fff34f95ff.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

896f07c0

bpf: Add stub for btf_struct_access() · d4f7bdb2

Daniel Xu authored 2 years ago

Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/4021398e884433b1fef57a4d28361bb9fcf1bd05.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d4f7bdb2

bpf: Remove duplicate PTR_TO_BTF_ID RO check · 65269888

Daniel Xu authored 2 years ago

Since commit 27ae7997

 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS")
there has existed bpf_verifier_ops:btf_struct_access. When
btf_struct_access is _unset_ for a prog type, the verifier runs the
default implementation, which is to enforce read only:

        if (env->ops->btf_struct_access) {
                [...]
        } else {
                if (atype != BPF_READ) {
                        verbose(env, "only read is supported\n");
                        return -EACCES;
                }

                [...]
        }

When btf_struct_access is _set_, the expectation is that
btf_struct_access has full control over accesses, including if writes
are allowed.

Rather than carve out an exception for each prog type that may write to
BTF ptrs, delete the redundant check and give full control to
btf_struct_access.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/962da2bff1238746589e332ff1aecc49403cd7ce.1662568410.git.dxu@dxuuu.xyz

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

65269888

10 Sep, 2022 2 commits

bpf: Simplify code by using for_each_cpu_wrap() · 57c92f11

Punit Agrawal authored 2 years ago


In the percpu freelist code, it is a common pattern to iterate over
the possible CPUs mask starting with the current CPU. The pattern is
implemented using a hand rolled while loop with the loop variable
increment being open-coded.

Simplify the code by using for_each_cpu_wrap() helper to iterate over
the possible cpus starting with the current CPU. As a result, some of
the special-casing in the loop also gets simplified.

No functional change intended.
Signed-off-by: Punit Agrawal <punit.agrawal@bytedance.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220907155746.1750329-1-punit.agrawal@bytedance.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

57c92f11

bpf: add missing percpu_counter_destroy() in htab_map_alloc() · cf7de6a5

Tetsuo Handa authored 2 years ago

syzbot is reporting ODEBUG bug in htab_map_alloc() [1], for
commit 86fe28f7 ("bpf: Optimize element count in non-preallocated
hash map.") added percpu_counter_init() to htab_map_alloc() but forgot to
add percpu_counter_destroy() to the error path.

Link: https://syzkaller.appspot.com/bug?extid=5d1da78b375c3b5e6c2b

 [1]
Reported-by: syzbot <syzbot+5d1da78b375c3b5e6c2b@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 86fe28f7

 ("bpf: Optimize element count in non-preallocated hash map.")
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/e2e4cc0e-9d36-4ca1-9bfa-ce23e6f8310b@I-love.SAKURA.ne.jp

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cf7de6a5

09 Sep, 2022 5 commits

Merge branch 'cgroup/connect{4,6} programs for unprivileged ICMP ping' · 2fae6771

Martin KaFai Lau authored 2 years ago


YiFei Zhu says:

====================

Usually when a TCP/UDP connection is initiated, we can bind the socket
to a specific IP attached to an interface in a cgroup/connect hook.
But for pings, this is impossible, as the hook is not being called.

This series adds the invocation for cgroup/connect{4,6} programs to
unprivileged ICMP ping (i.e. ping sockets created with SOCK_DGRAM
IPPROTO_ICMP(V6) as opposed to SOCK_RAW). This also adds a test to
verify that the hooks are being called and invoking bpf_bind() from
within the hook actually binds the socket.

Patch 1 adds the invocation of the hook.
Patch 2 deduplicates write_sysctl in BPF test_progs.
Patch 3 adds the tests for this hook.

v1 -> v2:
* Added static to bindaddr_v6 in prog_tests/connect_ping.c
* Deduplicated much of the test logic in prog_tests/connect_ping.c
* Deduplicated write_sysctl() to test_progs.c

v2 -> v3:
* Renamed variable "obj" to "skel" for the BPF skeleton object in
  prog_tests/connect_ping.c

v3 -> v4:
* Fixed error path to destroy skel in prog_tests/connect_ping.c
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

2fae6771

selftests/bpf: Ensure cgroup/connect{4,6} programs can bind unpriv ICMP ping · 58c449a9

YiFei Zhu authored 2 years ago


This tests that when an unprivileged ICMP ping socket connects,
the hooks are actually invoked. We also ensure that if the hook does
not call bpf_bind(), the bound address is unmodified, and if the
hook calls bpf_bind(), the bound address is exactly what we provided
to the helper.

A new netns is used to enable ping_group_range in the test without
affecting ouside of the test, because by default, not even root is
permitted to use unprivileged ICMP ping...
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/086b227c1b97f4e94193e58aae7576d0261b68a4.1662682323.git.zhuyifei@google.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

58c449a9

selftests/bpf: Deduplicate write_sysctl() to test_progs.c · e42921c3

YiFei Zhu authored 2 years ago

This helper is needed in multiple tests. Instead of copying it over
and over, better to deduplicate this helper to test_progs.c.

test_progs.c is chosen over testing_helpers.c because of this helper's
use of CHECK / ASSERT_*, and the CHECK was modified to use ASSERT_*
so it does not rely on a duration variable.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/9b4fc9a27bd52f771b657b4c4090fc8d61f3a6b5.1662682323.git.zhuyifei@google.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

e42921c3

bpf: Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping · 0ffe2412

YiFei Zhu authored 2 years ago

Usually when a TCP/UDP connection is initiated, we can bind the socket
to a specific IP attached to an interface in a cgroup/connect hook.
But for pings, this is impossible, as the hook is not being called.

This adds the hook invocation to unprivileged ICMP ping (i.e. ping
sockets created with SOCK_DGRAM IPPROTO_ICMP(V6) as opposed to
SOCK_RAW. Logic is mirrored from UDP sockets where the hook is invoked
during pre_connect, after a check for suficiently sized addr_len.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/5764914c252fad4cd134fb6664c6ede95f409412.1662682323.git.zhuyifei@google.com

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

0ffe2412

libbpf: Remove gcc support for bpf_tail_call_static for now · 665f5d35

Daniel Borkmann authored 2 years ago

This reverts commit 14e5ce79 ("libbpf: Add GCC support for
bpf_tail_call_static"). Reason is that gcc invented their own BPF asm
which is not conform with LLVM one, and going forward this would be
more painful to maintain here and in other areas of the library. Thus
remove it; ask to gcc folks is to align with LLVM one to use exact
same syntax.

Fixes: 14e5ce79

 ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>

665f5d35

07 Sep, 2022 2 commits

bpf: Add helper macro bpf_for_each_reg_in_vstate · b239da34

Kumar Kartikeya Dwivedi authored 2 years ago


For a lot of use cases in future patches, we will want to modify the
state of registers part of some same 'group' (e.g. same ref_obj_id). It
won't just be limited to releasing reference state, but setting a type
flag dynamically based on certain actions, etc.

Hence, we need a way to easily pass a callback to the function that
iterates over all registers in current bpf_verifier_state in all frames
upto (and including) the curframe.

While in C++ we would be able to easily use a lambda to pass state and
the callback together, sadly we aren't using C++ in the kernel. The next
best thing to avoid defining a function for each case seems like
statement expressions in GNU C. The kernel already uses them heavily,
hence they can passed to the macro in the style of a lambda. The
statement expression will then be substituted in the for loop bodies.

Variables __state and __reg are set to current bpf_func_state and reg
for each invocation of the expression inside the passed in verifier
state.

Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers,
release_reference, find_good_pkt_pointers, find_equal_scalars to
use bpf_for_each_reg_in_vstate.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b239da34

bpf: Add zero_map_value to zero map value with special fields · cc487558

Kumar Kartikeya Dwivedi authored 2 years ago


We need this helper to skip over special fields (bpf_spin_lock,
bpf_timer, kptrs) while zeroing a map value. Use the same logic as
copy_map_value but memset instead of memcpy.

Currently, the code zeroing map value memory does not have to deal with
special fields, hence this is a prerequisite for introducing such
support.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-4-memxor@gmail.com

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cc487558