- 21 Sep, 2022 6 commits
-
-
David Vernet authored
Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
-
David Vernet authored
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which will allow user-space applications to publish messages to a ring buffer that is consumed by a BPF program in kernel-space. In order for this map-type to be useful, it will require a BPF helper function that BPF programs can invoke to drain samples from the ring buffer, and invoke callbacks on those samples. This change adds that capability via a new BPF helper function: bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) BPF programs may invoke this function to run callback_fn() on a series of samples in the ring buffer. callback_fn() has the following signature: long callback_fn(struct bpf_dynptr *dynptr, void *context); Samples are provided to the callback in the form of struct bpf_dynptr *'s, which the program can read using BPF helper functions for querying struct bpf_dynptr's. In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register type is added to the verifier to reflect a dynptr that was allocated by a helper function and passed to a BPF program. Unlike PTR_TO_STACK dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR dynptrs need not use reference tracking, as the BPF helper is trusted to properly free the dynptr before returning. The verifier currently only supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL. Note that while the corresponding user-space libbpf logic will be added in a subsequent patch, this patch does contain an implementation of the .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This .map_poll() callback guarantees that an epoll-waiting user-space producer will receive at least one event notification whenever at least one sample is drained in an invocation of bpf_user_ringbuf_drain(), provided that the function is not invoked with the BPF_RB_NO_WAKEUP flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup notification is sent even if no sample was drained. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
-
David Vernet authored
We want to support a ringbuf map type where samples are published from user-space, to be consumed by BPF programs. BPF currently supports a kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ring buffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ring buffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we one or more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ring buffer. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
-
William Dean authored
It could directly return 'btf_check_sec_info' to simplify code. Signed-off-by: William Dean <williamsukatube@163.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220917084248.3649-1-williamsukatube@163.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Xin Liu authored
We found that function btf_dump__dump_type_data can be called by the user as an API, but in this function, the `opts` parameter may be used as a null pointer.This causes `opts->indent_str` to trigger a NULL pointer exception. Fixes: 2ce8450e ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts") Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Weibin Kong <kongweibin2@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
-
Rong Tao authored
Since commit be6bfe36 ("block: inline hot paths of blk_account_io_*()") blk_account_io_*() become inline functions. Signed-off-by: Rong Tao <rtoax@foxmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/tencent_1CC476835C219FACD84B6715F0D785517E07@qq.com
-
- 20 Sep, 2022 5 commits
-
-
Martin KaFai Lau authored
Daniel Xu says: ==================== This patchset cleans up a few small things: * Delete unused stub * Rename variable to be more descriptive * Fix some `extern` declaration warnings Past discussion: - v2: https://lore.kernel.org/bpf/cover.1663616584.git.dxu@dxuuu.xyz/ Changes since v2: - Remove unused #include's - Move #include <linux/filter.h> to .c ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Xu authored
We're seeing the following new warnings on netdev/build_32bit and netdev/build_allmodconfig_warn CI jobs: ../net/core/filter.c:8608:1: warning: symbol 'nf_conn_btf_access_lock' was not declared. Should it be static? ../net/core/filter.c:8611:5: warning: symbol 'nfct_bsa' was not declared. Should it be static? Fix by ensuring extern declaration is present while compiling filter.o. Fixes: 864b656f ("bpf: Add support for writing to nf_conn:mark") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/2bd2e0283df36d8a4119605878edb1838d144174.1663683114.git.dxu@dxuuu.xyzSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Xu authored
The former name was a little hard to guess. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/73adc72385c8b162391fbfb404f0b6d4c5cc55d7.1663683114.git.dxu@dxuuu.xyzSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Xu authored
This stub was not being used anywhere. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/590e7bd6172ffe0f3d7b51cd40e8ded941aaf7e8.1663683114.git.dxu@dxuuu.xyzSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Hou Tao authored
llnode could be NULL if there are new allocations after the checking of c-free_cnt > c->high_watermark in bpf_mem_refill() and before the calling of __llist_del_first() in free_bulk (e.g. a PREEMPT_RT kernel or allocation in NMI context). And it will incur oops as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT_RT SMP CPU: 39 PID: 373 Comm: irq_work/39 Tainted: G W 6.0.0-rc6-rt9+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:bpf_mem_refill+0x66/0x130 ...... Call Trace: <TASK> irq_work_single+0x24/0x60 irq_work_run_list+0x24/0x30 run_irq_workd+0x18/0x20 smpboot_thread_fn+0x13f/0x2c0 kthread+0x121/0x140 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Simply fixing it by checking whether or not llnode is NULL in free_bulk(). Fixes: 8d5a8011 ("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20220919144811.3570825-1-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 19 Sep, 2022 1 commit
-
-
Hou Tao authored
Add test result message when test_task_storage_map_stress_lookup() succeeds or is skipped. The test case can be skipped due to the choose of preemption model in kernel config, so export skips in test_maps.c and increase it when needed. The following is the output of test_maps when the test case succeeds or is skipped: test_task_storage_map_stress_lookup:PASS test_maps: OK, 0 SKIPPED test_task_storage_map_stress_lookup SKIP (no CONFIG_PREEMPT) test_maps: OK, 1 SKIPPED Fixes: 73b97bc7 ("selftests/bpf: Test concurrent updates on bpf_task_storage_busy") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20220919035714.2195144-1-houtao@huaweicloud.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 17 Sep, 2022 1 commit
-
-
Peilin Ye authored
We have btf_type_str(). Use it whenever possible in btf.c, instead of "btf_kind_str[BTF_INFO_KIND(t->info)]". Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/20220916202800.31421-1-yepeilin.cs@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 16 Sep, 2022 8 commits
-
-
Xin Liu authored
Legacy BPF map declarations are no longer supported in libbpf v1.0 [0]. Only BTF-defined maps are supported starting from v1.0, so it is time to remove the definition of bpf_map_def in bpf_helpers.h. [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20220913073643.19960-1-liuxin350@huawei.com
-
Andrii Nakryiko authored
Add a small tool, veristat, that allows mass-verification of a set of *libbpf-compatible* BPF ELF object files. For each such object file, veristat will attempt to verify each BPF program *individually*. Regardless of success or failure, it parses BPF verifier stats and outputs them in human-readable table format. In the future we can also add CSV and JSON output for more scriptable post-processing, if necessary. veristat allows to specify a set of stats that should be output and ordering between multiple objects and files (e.g., so that one can easily order by total instructions processed, instead of default file name, prog name, verdict, total instructions order). This tool should be useful for validating various BPF verifier changes or even validating different kernel versions for regressions. Here's an example for some of the heaviest selftests/bpf BPF object files: $ sudo ./veristat -s insns,file,prog {pyperf,loop,test_verif_scale,strobemeta,test_cls_redirect,profiler}*.linked3.o File Program Verdict Duration, us Total insns Total states Peak states ------------------------------------ ------------------------------------ ------- ------------ ----------- ------------ ----------- loop3.linked3.o while_true failure 350990 1000001 9663 9663 test_verif_scale3.linked3.o balancer_ingress success 115244 845499 8636 2141 test_verif_scale2.linked3.o balancer_ingress success 77688 773445 3048 788 pyperf600.linked3.o on_event success 2079872 624585 30335 30241 pyperf600_nounroll.linked3.o on_event success 353972 568128 37101 2115 strobemeta.linked3.o on_event success 455230 557149 15915 13537 test_verif_scale1.linked3.o balancer_ingress success 89880 554754 8636 2141 strobemeta_nounroll2.linked3.o on_event success 433906 501725 17087 1912 loop6.linked3.o trace_virtqueue_add_sgs success 282205 398057 8717 919 loop1.linked3.o nested_loops success 125630 361349 5504 5504 pyperf180.linked3.o on_event success 2511740 160398 11470 11446 pyperf100.linked3.o on_event success 744329 87681 6213 6191 test_cls_redirect.linked3.o cls_redirect success 54087 78925 4782 903 strobemeta_subprogs.linked3.o on_event success 57898 65420 1954 403 test_cls_redirect_subprogs.linked3.o cls_redirect success 54522 64965 4619 958 strobemeta_nounroll1.linked3.o on_event success 43313 57240 1757 382 pyperf50.linked3.o on_event success 194355 46378 3263 3241 profiler2.linked3.o tracepoint__syscalls__sys_enter_kill success 23869 43372 1423 542 pyperf_subprogs.linked3.o on_event success 29179 36358 2499 2499 profiler1.linked3.o tracepoint__syscalls__sys_enter_kill success 13052 27036 1946 936 profiler3.linked3.o tracepoint__syscalls__sys_enter_kill success 21023 26016 2186 915 profiler2.linked3.o kprobe__vfs_link success 5255 13896 303 271 profiler1.linked3.o kprobe__vfs_link success 7792 12687 1042 1041 profiler3.linked3.o kprobe__vfs_link success 7332 10601 865 865 profiler2.linked3.o kprobe_ret__do_filp_open success 3417 8900 216 199 profiler2.linked3.o kprobe__vfs_symlink success 3548 8775 203 186 pyperf_global.linked3.o on_event success 10007 7563 520 520 profiler3.linked3.o kprobe_ret__do_filp_open success 4708 6464 532 532 profiler1.linked3.o kprobe_ret__do_filp_open success 3090 6445 508 508 profiler3.linked3.o kprobe__vfs_symlink success 4477 6358 521 521 profiler1.linked3.o kprobe__vfs_symlink success 3381 6347 507 507 profiler2.linked3.o raw_tracepoint__sched_process_exec success 2464 5874 292 189 profiler3.linked3.o raw_tracepoint__sched_process_exec success 2677 4363 397 283 profiler2.linked3.o kprobe__proc_sys_write success 1800 4355 143 138 profiler1.linked3.o raw_tracepoint__sched_process_exec success 1649 4019 333 240 pyperf600_bpf_loop.linked3.o on_event success 2711 3966 306 306 profiler2.linked3.o raw_tracepoint__sched_process_exit success 1234 3138 83 66 profiler3.linked3.o kprobe__proc_sys_write success 1755 2623 223 223 profiler1.linked3.o kprobe__proc_sys_write success 1222 2456 193 193 loop2.linked3.o while_true success 608 1783 57 30 profiler3.linked3.o raw_tracepoint__sched_process_exit success 789 1680 146 146 profiler1.linked3.o raw_tracepoint__sched_process_exit success 592 1526 133 133 strobemeta_bpf_loop.linked3.o on_event success 1015 1512 106 106 loop4.linked3.o combinations success 165 524 18 17 profiler3.linked3.o raw_tracepoint__sched_process_fork success 196 299 25 25 profiler1.linked3.o raw_tracepoint__sched_process_fork success 109 265 19 19 profiler2.linked3.o raw_tracepoint__sched_process_fork success 111 265 19 19 loop5.linked3.o while_true success 47 84 9 9 ------------------------------------ ------------------------------------ ------- ------------ ----------- ------------ ----------- Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220909193053.577111-4-andrii@kernel.org
-
Andrii Nakryiko authored
Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF for freplace programs. It's wrong to search in vmlinux BTF and libbpf doesn't even mark vmlinux BTF as required for freplace programs. So trying to search anything in obj->vmlinux_btf might cause NULL dereference if nothing else in BPF object requires vmlinux BTF. Instead, error out if freplace (EXT) program doesn't specify attach_prog_fd during at the load time. Fixes: 91abb4a6 ("libbpf: Support attachment of BPF tracing programs to kernel modules") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
-
Andrii Nakryiko authored
Use proper SEC("tc") for test_verif_scale{1,3} programs. It's not a problem for selftests right now because we manually set type programmatically, but not having correct SEC() definitions makes it harded to generically load BPF object files. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220909193053.577111-2-andrii@kernel.org
-
Jiri Olsa authored
The dispatcher function is attached/detached to trampoline by dispatcher update function. At the same time it's available as ftrace attachable function. After discussion [1] the proposed solution is to use compiler attributes to alter bpf_dispatcher_##name##_func function: - remove it from being instrumented with __no_instrument_function__ attribute, so ftrace has no track of it - but still generate 5 nop instructions with patchable_function_entry(5) attribute, which are expected by bpf_arch_text_poke used by dispatcher update function Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so __patchable_function_entries functions are not part of ftrace/mcount locations. Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's kept out of ftrace locations and has 5 byte nop generated at entry. These attributes need to be arch specific as pointed out by Ilya Leoshkevic in here [2]. The dispatcher image is generated only for x86_64 arch, so the code can stay as is for other archs. [1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/ [2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org
-
Peter Zijlstra (Intel) authored
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
-
Yauheni Kaliuta authored
The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays. These days given unprivileged BPF is disabled by default, the main users for constant blinding coming from unprivileged in particular via cBPF -> eBPF migration (e.g. old-style socket filters). Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com
-
Wang Yufen authored
Use kvmemdup_bpfptr helper instead of open-coding to simplify the code. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/1663058433-14089-1-git-send-email-wangyufen@huawei.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 15 Sep, 2022 1 commit
-
-
Dave Marchevsky authored
BPF_PTR_POISON was added in commit c0a5a21c ("bpf: Allow storing referenced kptr in map") to denote a bpf_func_proto btf_id which the verifier will replace with a dynamically-determined btf_id at verification time. This patch adds verifier 'poison' functionality to BPF_PTR_POISON in order to prepare for expanded use of the value to poison ret- and arg-btf_id in ongoing work, namely rbtree and linked list patchsets [0, 1]. Specifically, when the verifier checks helper calls, it assumes that BPF_PTR_POISON'ed ret type will be replaced with a valid type before - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id. If poisoned btf_id reaches default handling block for either, consider this a verifier internal error and fail verification. Otherwise a helper w/ poisoned btf_id but no verifier logic replacing the type will cause a crash as the invalid pointer is dereferenced. Also move BPF_PTR_POISON to existing include/linux/posion.h header and remove unnecessary shift. [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 11 Sep, 2022 9 commits
-
-
Dave Marchevsky authored
Verifier logic to confirm that a callback function returns 0 or 1 was added in commit 69c087ba ("bpf: Add bpf_for_each_map_elem() helper"). At the time, callback return value was only used to continue or stop iteration. In order to support callbacks with a broader return value range, such as those added in rbtree series[0] and others, add a callback_ret_range to bpf_func_state. Verifier's helpers which set in_callback_fn will also set the new field, which the verifier will later use to check return value bounds. Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel value as the latter would prevent the valid range (0, U64_MAX) being used. Previous global default tnum_range(0, 1) is explicitly set for extant callback helpers. The change to global default was made after discussion around this patch in rbtree series [1], goal here is to make it more obvious that callback_ret_range should be explicitly set. [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com/ [1]: lore.kernel.org/bpf/20220830172759.4069786-2-davemarchevsky@fb.com/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220908230716.2751723-1-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Lorenzo Bianconi authored
Check properly the connection tracking entry status configured running bpf_ct_change_status kfunc. Remove unnecessary IPS_CONFIRMED status configuration since it is already done during entry allocation. Fixes: 6eb7fba0 ("selftests/bpf: Add tests for new nf_conntrack kfuncs") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/813a5161a71911378dfac8770ec890428e4998aa.1662623574.git.lorenzo@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Daniel Xu says: ==================== Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also particularly useful for applications that run both bpf and iptables/nftables because the latter can trivially access this metadata. One example use case would be if a bpf prog is responsible for advanced packet classification and iptables/nftables is later used for routing due to pre-existing/legacy code. Past discussion: - v4: https://lore.kernel.org/bpf/cover.1661192455.git.dxu@dxuuu.xyz/ - v3: https://lore.kernel.org/bpf/cover.1660951028.git.dxu@dxuuu.xyz/ - v2: https://lore.kernel.org/bpf/CAP01T74Sgn354dXGiFWFryu4vg+o8b9s9La1d9zEbC4LGvH4qg@mail.gmail.com/T/ - v1: https://lore.kernel.org/bpf/cover.1660592020.git.dxu@dxuuu.xyz/ Changes since v4: - Use exported function pointer + mutex to handle CONFIG_NF_CONNTRACK=m case Changes since v3: - Use a mutex to protect module load/unload critical section Changes since v2: - Remove use of NOT_INIT for btf_struct_access write path - Disallow nf_conn writing when nf_conntrack module not loaded - Support writing to nf_conn___init:mark Changes since v1: - Add unimplemented stub for when !CONFIG_BPF_SYSCALL ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
Add a simple extension to the existing selftest to write to nf_conn:mark. Also add a failure test for writing to unsupported field. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/f78966b81b9349d2b8ebb4cee2caf15cb6b38ee2.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also particularly useful for applications that run both bpf and iptables/nftables because the latter can trivially access this metadata. One example use case would be if a bpf prog is responsible for advanced packet classification and iptables/nftables is later used for routing due to pre-existing/legacy code. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
These symbols will be used in nf_conntrack.ko to support direct writes to `nf_conn`. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/3c98c19dc50d3b18ea5eca135b4fc3a5db036060.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
Returning a bpf_reg_type only makes sense in the context of a BPF_READ. For writes, prefer to explicitly return 0 for clarity. Note that is non-functional change as it just so happened that NOT_INIT == 0. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/01772bc1455ae16600796ac78c6cc9fff34f95ff.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/4021398e884433b1fef57a4d28361bb9fcf1bd05.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Xu authored
Since commit 27ae7997 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS") there has existed bpf_verifier_ops:btf_struct_access. When btf_struct_access is _unset_ for a prog type, the verifier runs the default implementation, which is to enforce read only: if (env->ops->btf_struct_access) { [...] } else { if (atype != BPF_READ) { verbose(env, "only read is supported\n"); return -EACCES; } [...] } When btf_struct_access is _set_, the expectation is that btf_struct_access has full control over accesses, including if writes are allowed. Rather than carve out an exception for each prog type that may write to BTF ptrs, delete the redundant check and give full control to btf_struct_access. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/962da2bff1238746589e332ff1aecc49403cd7ce.1662568410.git.dxu@dxuuu.xyzSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 10 Sep, 2022 2 commits
-
-
Punit Agrawal authored
In the percpu freelist code, it is a common pattern to iterate over the possible CPUs mask starting with the current CPU. The pattern is implemented using a hand rolled while loop with the loop variable increment being open-coded. Simplify the code by using for_each_cpu_wrap() helper to iterate over the possible cpus starting with the current CPU. As a result, some of the special-casing in the loop also gets simplified. No functional change intended. Signed-off-by: Punit Agrawal <punit.agrawal@bytedance.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220907155746.1750329-1-punit.agrawal@bytedance.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Tetsuo Handa authored
syzbot is reporting ODEBUG bug in htab_map_alloc() [1], for commit 86fe28f7 ("bpf: Optimize element count in non-preallocated hash map.") added percpu_counter_init() to htab_map_alloc() but forgot to add percpu_counter_destroy() to the error path. Link: https://syzkaller.appspot.com/bug?extid=5d1da78b375c3b5e6c2b [1] Reported-by: syzbot <syzbot+5d1da78b375c3b5e6c2b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 86fe28f7 ("bpf: Optimize element count in non-preallocated hash map.") Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/e2e4cc0e-9d36-4ca1-9bfa-ce23e6f8310b@I-love.SAKURA.ne.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 09 Sep, 2022 5 commits
-
-
Martin KaFai Lau authored
YiFei Zhu says: ==================== Usually when a TCP/UDP connection is initiated, we can bind the socket to a specific IP attached to an interface in a cgroup/connect hook. But for pings, this is impossible, as the hook is not being called. This series adds the invocation for cgroup/connect{4,6} programs to unprivileged ICMP ping (i.e. ping sockets created with SOCK_DGRAM IPPROTO_ICMP(V6) as opposed to SOCK_RAW). This also adds a test to verify that the hooks are being called and invoking bpf_bind() from within the hook actually binds the socket. Patch 1 adds the invocation of the hook. Patch 2 deduplicates write_sysctl in BPF test_progs. Patch 3 adds the tests for this hook. v1 -> v2: * Added static to bindaddr_v6 in prog_tests/connect_ping.c * Deduplicated much of the test logic in prog_tests/connect_ping.c * Deduplicated write_sysctl() to test_progs.c v2 -> v3: * Renamed variable "obj" to "skel" for the BPF skeleton object in prog_tests/connect_ping.c v3 -> v4: * Fixed error path to destroy skel in prog_tests/connect_ping.c ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
YiFei Zhu authored
This tests that when an unprivileged ICMP ping socket connects, the hooks are actually invoked. We also ensure that if the hook does not call bpf_bind(), the bound address is unmodified, and if the hook calls bpf_bind(), the bound address is exactly what we provided to the helper. A new netns is used to enable ping_group_range in the test without affecting ouside of the test, because by default, not even root is permitted to use unprivileged ICMP ping... Signed-off-by: YiFei Zhu <zhuyifei@google.com> Link: https://lore.kernel.org/r/086b227c1b97f4e94193e58aae7576d0261b68a4.1662682323.git.zhuyifei@google.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
YiFei Zhu authored
This helper is needed in multiple tests. Instead of copying it over and over, better to deduplicate this helper to test_progs.c. test_progs.c is chosen over testing_helpers.c because of this helper's use of CHECK / ASSERT_*, and the CHECK was modified to use ASSERT_* so it does not rely on a duration variable. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: YiFei Zhu <zhuyifei@google.com> Link: https://lore.kernel.org/r/9b4fc9a27bd52f771b657b4c4090fc8d61f3a6b5.1662682323.git.zhuyifei@google.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
YiFei Zhu authored
Usually when a TCP/UDP connection is initiated, we can bind the socket to a specific IP attached to an interface in a cgroup/connect hook. But for pings, this is impossible, as the hook is not being called. This adds the hook invocation to unprivileged ICMP ping (i.e. ping sockets created with SOCK_DGRAM IPPROTO_ICMP(V6) as opposed to SOCK_RAW. Logic is mirrored from UDP sockets where the hook is invoked during pre_connect, after a check for suficiently sized addr_len. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Link: https://lore.kernel.org/r/5764914c252fad4cd134fb6664c6ede95f409412.1662682323.git.zhuyifei@google.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Daniel Borkmann authored
This reverts commit 14e5ce79 ("libbpf: Add GCC support for bpf_tail_call_static"). Reason is that gcc invented their own BPF asm which is not conform with LLVM one, and going forward this would be more painful to maintain here and in other areas of the library. Thus remove it; ask to gcc folks is to align with LLVM one to use exact same syntax. Fixes: 14e5ce79 ("libbpf: Add GCC support for bpf_tail_call_static") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: James Hilliard <james.hilliard1@gmail.com> Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
-
- 07 Sep, 2022 2 commits
-
-
Kumar Kartikeya Dwivedi authored
For a lot of use cases in future patches, we will want to modify the state of registers part of some same 'group' (e.g. same ref_obj_id). It won't just be limited to releasing reference state, but setting a type flag dynamically based on certain actions, etc. Hence, we need a way to easily pass a callback to the function that iterates over all registers in current bpf_verifier_state in all frames upto (and including) the curframe. While in C++ we would be able to easily use a lambda to pass state and the callback together, sadly we aren't using C++ in the kernel. The next best thing to avoid defining a function for each case seems like statement expressions in GNU C. The kernel already uses them heavily, hence they can passed to the macro in the style of a lambda. The statement expression will then be substituted in the for loop bodies. Variables __state and __reg are set to current bpf_func_state and reg for each invocation of the expression inside the passed in verifier state. Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers, release_reference, find_good_pkt_pointers, find_equal_scalars to use bpf_for_each_reg_in_vstate. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
We need this helper to skip over special fields (bpf_spin_lock, bpf_timer, kptrs) while zeroing a map value. Use the same logic as copy_map_value but memset instead of memcpy. Currently, the code zeroing map value memory does not have to deal with special fields, hence this is a prerequisite for introducing such support. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220904204145.3089-4-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-