- 22 Jul, 2022 14 commits
-
-
Alexei Starovoitov authored
Kumar Kartikeya Dwivedi says: ==================== Introduce the following new kfuncs: - bpf_{xdp,skb}_ct_alloc - bpf_ct_insert_entry - bpf_ct_{set,change}_timeout - bpf_ct_{set,change}_status The setting of timeout and status on allocated or inserted/looked up CT is same as the ctnetlink interface, hence code is refactored and shared with the kfuncs. It is ensured allocated CT cannot be passed to kfuncs that expected inserted CT, and vice versa. Please see individual patches for details. Changelog: ---------- v6 -> v7: v6: https://lore.kernel.org/bpf/20220719132430.19993-1-memxor@gmail.com * Use .long to encode flags (Alexei) * Fix description of KF_RET_NULL in documentation (Toke) v5 -> v6: v5: https://lore.kernel.org/bpf/20220623192637.3866852-1-memxor@gmail.com * Introduce kfunc flags, rework verifier to work with them * Add documentation for kfuncs * Add comment explaining TRUSTED_ARGS kfunc flag (Alexei) * Fix missing offset check for trusted arguments (Alexei) * Change nf_conntrack test minimum delta value to 8 v4 -> v5: v4: https://lore.kernel.org/bpf/cover.1653600577.git.lorenzo@kernel.org * Drop read-only PTR_TO_BTF_ID approach, use struct nf_conn___init (Alexei) * Drop acquire release pair code that is no longer required (Alexei) * Disable writes into nf_conn, use dedicated helpers (Florian, Alexei) * Refactor and share ctnetlink code for setting timeout and status * Do strict type matching on finding __ref suffix on argument to prevent passing nf_conn___init as nf_conn (offset = 0, match on walk) * Remove bpf_ct_opts parameter from bpf_ct_insert_entry * Update selftests for new additions, add more negative tests v3 -> v4: v3: https://lore.kernel.org/bpf/cover.1652870182.git.lorenzo@kernel.org * split bpf_xdp_ct_add in bpf_xdp_ct_alloc/bpf_skb_ct_alloc and bpf_ct_insert_entry * add verifier code to properly populate/configure ct entry * improve selftests v2 -> v3: v2: https://lore.kernel.org/bpf/cover.1652372970.git.lorenzo@kernel.org * add bpf_xdp_ct_add and bpf_ct_refresh_timeout kfunc helpers * remove conntrack dependency from selftests * add support for forcing kfunc args to be referenced and related selftests v1 -> v2: v1: https://lore.kernel.org/bpf/1327f8f5696ff2bc60400e8f3b79047914ccc837.1651595019.git.lorenzo@kernel.org * add bpf_ct_refresh_timeout kfunc selftest Kumar Kartikeya Dwivedi (10): bpf: Introduce 8-byte BTF set tools/resolve_btfids: Add support for 8-byte BTF sets bpf: Switch to new kfunc flags infrastructure bpf: Add support for forcing kfunc args to be trusted bpf: Add documentation for kfuncs net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup net: netfilter: Add kfuncs to set and change CT timeout selftests/bpf: Add verifier tests for trusted kfunc args selftests/bpf: Add negative tests for new nf_conntrack kfuncs selftests/bpf: Fix test_verifier failed test in unprivileged mode ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Loading the BTF won't be permitted without privileges, hence only test for privileged mode by setting the prog type. This makes the test_verifier show 0 failures when unprivileged BPF is enabled. Fixes: 41188e9e ("selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-14-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Test cases we care about and ensure improper usage is caught and rejected by the verifier. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-13-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Lorenzo Bianconi authored
Introduce selftests for the following kfunc helpers: - bpf_xdp_ct_alloc - bpf_skb_ct_alloc - bpf_ct_insert_entry - bpf_ct_set_timeout - bpf_ct_change_timeout - bpf_ct_set_status - bpf_ct_change_status Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-12-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Make sure verifier rejects the bad cases and ensure the good case keeps working. The selftests make use of the bpf_kfunc_call_test_ref kfunc added in the previous patch only for verification. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-11-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Lorenzo Bianconi authored
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in order to set nf_conn field of allocated entry or update nf_conn status field of existing inserted entry. Use nf_ct_change_status_common to share the permitted status field changes between netlink and BPF side by refactoring ctnetlink_change_status. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in order to change nf_conn timeout. This is same as ctnetlink_change_timeout, hence code is shared between both by extracting it out to __nf_ct_change_timeout. It is also updated to return an error when it sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Apart from this, bpf_ct_set_timeout is only called for newly allocated CT so it doesn't need to inspect the status field just yet. Sharing the helpers even if it was possible would make timeout setting helper sensitive to order of setting status and timeout after allocation. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Lorenzo Bianconi authored
Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry kfuncs in order to insert a new entry from XDP and TC programs. Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common code. We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK. Later this helper will be reused as a helper to set timeout of allocated but not yet inserted CT entry. The allocation functions return struct nf_conn___init instead of nf_conn, to distinguish allocated CT from an already inserted or looked up CT. This is later used to enforce restrictions on what kfuncs allocated CT can be used with. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Move common checks inside the common function, and maintain the only difference the two being how to obtain the struct net * from ctx. No functional change intended. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
As the usage of kfuncs grows, we are starting to form consensus on the kinds of attributes and annotations that kfuncs can have. To better help developers make sense of the various options available at their disposal to present an unstable API to the BPF users, document the various kfunc flags and annotations, their expected usage, and explain the process of defining and registering a kfunc set. Cc: KP Singh <kpsingh@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-6-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which means each pointer argument must be trusted, which we define as a pointer that is referenced (has non-zero ref_obj_id) and also needs to have its offset unchanged, similar to how release functions expect their argument. This allows a kfunc to receive pointer arguments unchanged from the result of the acquire kfunc. This is required to ensure that kfunc that operate on some object only work on acquired pointers and not normal PTR_TO_BTF_ID with same type which can be obtained by pointer walking. The restrictions applied to release arguments also apply to trusted arguments. This implies that strict type matching (not deducing type by recursively following members at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are used for trusted pointer arguments. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
A flag is a 4-byte symbol that may follow a BTF ID in a set8. This is used in the kernel to tag kfuncs in BTF sets with certain flags. Add support to adjust the sorting code so that it passes size as 8 bytes for 8-byte BTF sets. Cc: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220721134245.2450-3-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Introduce support for defining flags for kfuncs using a new set of macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains 8 byte elements (each of which consists of a pair of BTF ID and flags), using a new BTF_ID_FLAGS macro. This will be used to tag kfuncs registered for a certain program type as acquire, release, sleepable, ret_null, etc. without having to create more and more sets which was proving to be an unscalable solution. Now, when looking up whether a kfunc is allowed for a certain program, we can also obtain its kfunc flags in the same call and avoid further lookups. The resolve_btfids change is split into a separate patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-2-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 21 Jul, 2022 5 commits
-
-
Alejandro Colomar authored
The Linux man-pages project now uses SPDX tags, instead of the full license text. Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://www.kernel.org/doc/man-pages/licenses.html Link: https://spdx.org/licenses/Linux-man-pages-copyleft.html Link: https://lore.kernel.org/bpf/20220721110821.8240-1-alx.manpages@gmail.com
-
Xu Kuohai authored
dummy_tramp() uses "lr" to refer to the x30 register, but some assembler does not recognize "lr" and reports a build failure: /tmp/cc52xO0c.s: Assembler messages: /tmp/cc52xO0c.s:8: Error: operand 1 should be an integer register -- `mov lr,x9' /tmp/cc52xO0c.s:7: Error: undefined symbol lr used as an immediate value make[2]: *** [scripts/Makefile.build:250: arch/arm64/net/bpf_jit_comp.o] Error 1 make[1]: *** [scripts/Makefile.build:525: arch/arm64/net] Error 2 So replace "lr" with "x30" to fix it. Fixes: b2ad54e1 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Reported-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/bpf/20220721121319.2999259-1-xukuohai@huaweicloud.com
-
Stanislav Fomichev authored
Syzkaller found a problem similar to d1a6edec ("bpf: Check attach_func_proto more carefully in check_return_code") where attach_func_proto might be NULL: RIP: 0010:check_helper_call+0x3dcb/0x8d50 kernel/bpf/verifier.c:7330 do_check kernel/bpf/verifier.c:12302 [inline] do_check_common+0x6e1e/0xb980 kernel/bpf/verifier.c:14610 do_check_main kernel/bpf/verifier.c:14673 [inline] bpf_check+0x661e/0xc520 kernel/bpf/verifier.c:15243 bpf_prog_load+0x11ae/0x1f80 kernel/bpf/syscall.c:2620 With the following reproducer: bpf$BPF_PROG_RAW_TRACEPOINT_LOAD(0x5, &(0x7f0000000780)={0xf, 0x4, &(0x7f0000000040)=@framed={{}, [@call={0x85, 0x0, 0x0, 0xbb}]}, &(0x7f0000000000)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80) Let's do the same here, only check attach_func_proto for the prog types where we are certain that attach_func_proto is defined. Fixes: 69fd337a ("bpf: per-cgroup lsm flavor") Reported-by: syzbot+0f8d989b1fba1addc5e0@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220720164729.147544-1-sdf@google.com
-
Dan Carpenter authored
The return from strcmp() is inverted so it wrongly returns true instead of false and vice versa. Fixes: a1c9d61b ("libbpf: Improve library identification for uprobe binary path resolution") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili
-
Dan Carpenter authored
The code here is supposed to take a signed int and store it in a signed long long. Unfortunately, the way that the type promotion works with this conditional statement is that it takes a signed int, type promotes it to a __u32, and then stores that as a signed long long. The result is never negative. This is from static analysis, but I made a little test program just to test it before I sent the patch: #include <stdio.h> int main(void) { unsigned long long src = -1ULL; signed long long dst1, dst2; int is_signed = 1; dst1 = is_signed ? *(int *)&src : *(unsigned int *)0; dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0; printf("%lld\n", dst1); printf("%lld\n", dst2); return 0; } Fixes: d90ec262 ("libbpf: Add enum64 support for btf_dump") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili
-
- 20 Jul, 2022 1 commit
-
-
Stanislav Fomichev authored
They were updated in kernel/bpf/trampoline.c to fix another build issue. We should to do the same for include/linux/bpf.h header. Fixes: 3908fcdd ("bpf: fix lsm_cgroup build errors on esoteric configs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com
-
- 19 Jul, 2022 20 commits
-
-
Dan Carpenter authored
The snprintf() function returns the number of bytes it *would* have copied if there were enough space. So it can return > the sizeof(gen->attach_target). Fixes: 67234743 ("libbpf: Generate loader program out of BPF ELF file.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/YtZ+oAySqIhFl6/J@kiliSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dan Carpenter authored
The snprintf() function returns the number of bytes which *would* have been copied if there were space. In other words, it can be > sizeof(pin_path). Fixes: c0fa1b6c ("bpf: btf: Add BTF tests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/YtZ+aD/tZMkgOUw+@kiliSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Donald Hunter authored
Add documentation for BPF_MAP_TYPE_HASH including kernel version introduced, usage and examples. Document BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_LRU_PERCPU_HASH variations. Note that this file is included in the BPF documentation by the glob in Documentation/bpf/maps.rst v3: Fix typos reported by Stanislav Fomichev and Yonghong Song. Add note about iteration and deletion as requested by Yonghong Song. v2: Describe memory allocation semantics as suggested by Stanislav Fomichev. Fix u64 typo reported by Stanislav Fomichev. Cut down usage examples to only show usage in context. Updated patch description to follow style recommendation, reported by Bagas Sanjaya. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220718125847.1390-1-donald.hunter@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add test validating that libbpf adjusts (and reflects adjusted) ringbuf size early, before bpf_object is loaded. Also make sure we can't successfully resize ringbuf map after bpf_object is loaded. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220715230952.2219271-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2 of page_size) more eagerly: during open phase when initializing the map and on explicit calls to bpf_map__set_max_entries(). Such approach allows user to check actual size of BPF ringbuf even before it's created in the kernel, but also it prevents various edge case scenarios where BPF ringbuf size can get out of sync with what it would be in kernel. One of them (reported in [0]) is during an attempt to pin/reuse BPF ringbuf. Move adjust_ringbuf_sz() helper closer to its first actual use. The implementation of the helper is unchanged. Also make detection of whether bpf_object is already loaded more robust by checking obj->loaded explicitly, given that map->fd can be < 0 even if bpf_object is already loaded due to ability to disable map creation with bpf_map__set_autocreate(map, false). [0] Closes: https://github.com/libbpf/libbpf/pull/530 Fixes: 0087a681 ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Joanne Koong authored
Fix documentation for bpf_skb_pull_data() helper for when len == 0. Fixes: fa15601a ("bpf: add documentation for eBPF helpers (33-41)") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220715193800.3940070-1-joannelkoong@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Teach libbpf to fallback to tracefs mount point (/sys/kernel/tracing) if debugfs (/sys/kernel/debug/tracing) isn't mounted. Acked-by: Yonghong Song <yhs@fb.com> Suggested-by: Connor O'Brien <connoro@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Zhengchao Shao authored
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Fix 32-bit overflow in value pointer calculations in BPF array map. And then raise obsolete limit on array map value size. Add selftest making sure this is working as intended. v1->v2: - fix broken patch #1 (no mask_index use in helper, as stated in commit message; and add missing semicolon). ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add a simple big 16MB array and validate access to the very last byte of it to make sure that kernel supports > KMALLOC_MAX_SIZE value_size for BPF array maps (which are backing .bss in this case). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715053146.1291891-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Syscall-side map_lookup_elem() and map_update_elem() used to use kmalloc() to allocate temporary buffers of value_size, so KMALLOC_MAX_SIZE limit on value_size made sense to prevent creation of array map that won't be accessible through syscall interface. But this limitation since has been lifted by relying on kvmalloc() in syscall handling code. So remove KMALLOC_MAX_SIZE, which among other things means that it's possible to have BPF global variable sections (.bss, .data, .rodata) bigger than 8MB now. Keep the sanity check to prevent trivial overflows like round_up(map->value_size, 8) and restrict value size to <= INT_MAX (2GB). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715053146.1291891-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
BPF_MAP_TYPE_ARRAY is rounding value_size to closest multiple of 8 and stores that as array->elem_size for various memory allocations and accesses. But the code tends to re-calculate round_up(map->value_size, 8) in multiple places instead of using array->elem_size. Cleaning this up and making sure we always use array->size to avoid duplication of this (admittedly simple) logic for consistency. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715053146.1291891-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
If BPF array map is bigger than 4GB, element pointer calculation can overflow because both index and elem_size are u32. Fix this everywhere by forcing 64-bit multiplication. Extract this formula into separate small helper and use it consistently in various places. Speculative-preventing formula utilizing index_mask trick is left as is, but explicit u64 casts are added in both places. Fixes: c85d6913 ("bpf: move memory size checks to bpf_map_charge_init()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715053146.1291891-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Indu Bhagat authored
The vlen bits in the BTF type of kind BTF_KIND_FUNC are used to convey the linkage information for functions. The Linux kernel only supports linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL at this time. Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220714223310.1140097-1-indu.bhagat@oracle.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
This particular ones is about having the following: CONFIG_BPF_LSM=y # CONFIG_CGROUP_BPF is not set Also, add __maybe_unused to the args for the !CONFIG_NET cases. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220714185404.3647772-1-sdf@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Add SEC("ksyscall")/SEC("kretsyscall") sections and corresponding bpf_program__attach_ksyscall() API that simplifies tracing kernel syscalls through kprobe mechanism. Kprobing syscalls isn't trivial due to varying syscall handler names in the kernel and various ways syscall argument are passed, depending on kernel architecture and configuration. SEC("ksyscall") allows user to not care about such details and just get access to syscall input arguments, while libbpf takes care of necessary feature detection logic. There are still more quirks that are not straightforward to hide completely (see comments about mmap(), clone() and compat syscalls), so in such more advanced scenarios user might need to fall back to plain SEC("kprobe") approach, but for absolute majority of users SEC("ksyscall") is a big improvement. As part of this patch set libbpf adds two more virtual __kconfig externs, in addition to existing LINUX_KERNEL_VERSION: LINUX_HAS_BPF_COOKIE and LINUX_HAS_SYSCALL_WRAPPER, which let's libbpf-provided BPF-side code minimize external dependencies and assumptions and let's user-space part of libbpf to perform all the feature detection logic. This benefits USDT support code, which now doesn't depend on BPF CO-RE for its functionality. v1->v2: - normalize extern variable-related warn and debug message formats (Alan); rfc->v1: - drop dependency on kallsyms and speed up SYSCALL_WRAPPER detection (Alexei); - drop dependency on /proc/config.gz in bpf_tracing.h (Yaniv); - add doc comment and ephasize mmap(), clone() and compat quirks that are not supported (Ilya); - use mechanism similar to LINUX_KERNEL_VERSION to also improve USDT code. ==================== Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Convert few selftest that used plain SEC("kprobe") with arch-specific syscall wrapper prefix to ksyscall/kretsyscall and corresponding BPF_KSYSCALL macro. test_probe_user.c is especially benefiting from this simplification. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-6-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding kretsyscall variants (for return kprobes) to allow users to kprobe syscall functions in kernel. These special sections allow to ignore complexities and differences between kernel versions and host architectures when it comes to syscall wrapper and corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf itself doesn't rely on /proc/config.gz for detecting this, see BPF_KSYSCALL patch for how it's done internally). Combined with the use of BPF_KSYSCALL() macro, this allows to just specify intended syscall name and expected input arguments and leave dealing with all the variations to libbpf. In addition to SEC("ksyscall+") and SEC("kretsyscall+") add bpf_program__attach_ksyscall() API which allows to specify syscall name at runtime and provide associated BPF cookie value. At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not handle all the calling convention quirks for mmap(), clone() and compat syscalls. It also only attaches to "native" syscall interfaces. If host system supports compat syscalls or defines 32-bit syscalls in 64-bit kernel, such syscall interfaces won't be attached to by libbpf. These limitations may or may not change in the future. Therefore it is recommended to use SEC("kprobe") for these syscalls or if working with compat and 32-bit interfaces is required. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Improve BPF_KPROBE_SYSCALL (and rename it to shorter BPF_KSYSCALL to match libbpf's SEC("ksyscall") section name, added in next patch) to use __kconfig variable to determine how to properly fetch syscall arguments. Instead of relying on hard-coded knowledge of whether kernel's architecture uses syscall wrapper or not (which only reflects the latest kernel versions, but is not necessarily true for older kernels and won't necessarily hold for later kernel versions on some particular host architecture), determine this at runtime by attempting to create perf_event (with fallback to kprobe event creation through tracefs on legacy kernels, just like kprobe attachment code is doing) for kernel function that would correspond to bpf() syscall on a system that has CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would try '__x64_sys_bpf'). If host kernel uses syscall wrapper, syscall kernel function's first argument is a pointer to struct pt_regs that then contains syscall arguments. In such case we need to use bpf_probe_read_kernel() to fetch actual arguments (which we do through BPF_CORE_READ() macro) from inner pt_regs. But if the kernel doesn't use syscall wrapper approach, input arguments can be read from struct pt_regs directly with no probe reading. All this feature detection is done without requiring /proc/config.gz existence and parsing, and BPF-side helper code uses newly added LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to keep in sync with user-side feature detection of libbpf. BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs that define syscall function explicitly (e.g., SEC("kprobe/__x64_sys_bpf")) and SEC("ksyscall") program added in the next patch (which are the same kprobe program with added benefit of libbpf determining correct kernel function name automatically). Kretprobe and kretsyscall (added in next patch) programs don't need BPF_KSYSCALL as they don't provide access to input arguments. Normal BPF_KRETPROBE is completely sufficient and is recommended. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Exercise libbpf's logic for unknown __weak virtual __kconfig externs. USDT selftests are already excercising non-weak known virtual extern already (LINUX_HAS_BPF_COOKIE), so no need to add explicit tests for it. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-