- 26 Apr, 2022 22 commits
-
-
Andrii Nakryiko authored
Previously, libbpf recorded CO-RE relocations with insns_idx resolved according to finalized subprog locations (which are appended at the end of entry BPF program) to simplify the job of light skeleton generator. This is necessary because once subprogs' instructions are appended to main entry BPF program all the subprog instruction indices are shifted and that shift is different for each entry (main) BPF program, so it's generally impossible to map final absolute insn_idx of the finalized BPF program to their original locations inside subprograms. This information is now going to be used not only during light skeleton generation, but also to map absolute instruction index to subprog's instruction and its corresponding CO-RE relocation. So start recording these relocations always, not just when obj->gen_loader is set. This information is going to be freed at the end of bpf_object__load() step, as before (but this can change in the future if there will be a need for this information post load step). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-7-andrii@kernel.org
-
Andrii Nakryiko authored
Enhance linked_funcs selftest with two tricky features that might not obviously work correctly together. We add CO-RE relocations to entry BPF programs and mark those programs as non-autoloadable with SEC("?...") annotation. This makes sure that libbpf itself handles .BTF.ext CO-RE relocation data matching correctly for SEC("?...") programs, as well as ensures that BPF static linker handles this correctly (this was the case before, no changes are necessary, but it wasn't explicitly tested). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-6-andrii@kernel.org
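To illustrate the pattern being exercised, a minimal sketch (not the actual selftest source; program and section names here are made up):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_core_read.h>

	char _license[] SEC("license") = "GPL";

	/* Entry program that is not auto-loaded (leading '?') yet still
	 * carries a CO-RE relocation against kernel BTF. */
	SEC("?raw_tp/sys_enter")
	int handler(void *ctx)
	{
		struct task_struct *task = (void *)bpf_get_current_task();

		/* BPF_CORE_READ() records a CO-RE relocation in .BTF.ext */
		return BPF_CORE_READ(task, pid) ? 0 : 1;
	}

At runtime such a program can be enabled with bpf_program__set_autoload() before load.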
-
Andrii Nakryiko authored
Instead of using ELF section names as a joining key between .BTF.ext and corresponding BPF programs, pre-build a .BTF.ext section number to ELF section index mapping during bpf_object__open() and use it later for matching .BTF.ext information (func/line info or CO-RE relocations) to their respective BPF programs and subprograms. This simplifies the corresponding joining logic and lets libbpf do manipulations with a BPF program's ELF sections, like dropping the leading '?' character for non-autoloaded programs. The original joining logic in bpf_object__relocate_core() (see the relevant comment that's now removed) was never elegant, so it's a good improvement regardless. But it also avoids unnecessary internal assumptions about preserving the original ELF section name as the BPF program's section name (which was broken when SEC("?abc") support was added). Fixes: a3820c48 ("libbpf: Support opting out from autoloading BPF programs declaratively") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-5-andrii@kernel.org
-
Andrii Nakryiko authored
Fix the bug in bpf_object__relocate_core() which can lead to finding an invalid matching BPF program when processing a CO-RE relocation. If no matching program is found, the last encountered program is silently assumed to be the correct one, so the problem goes undetected. Fixes: 9c82a63c ("libbpf: Fix CO-RE relocs against .text section") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-4-andrii@kernel.org
-
Andrii Nakryiko authored
libbpf pretends it knows the actual limit on BPF program instructions based on the UAPI headers it was compiled with. There is no guarantee that the UAPI headers match the host kernel, nor does the BPF verifier actually use the BPF_MAXINSNS constant anymore. Just drop the unhelpful "guess"; the BPF verifier will emit the actual reason for failure in its log anyway. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-3-andrii@kernel.org
-
Andrii Nakryiko authored
Use the type name for checking whether a CO-RE relocation refers to an anonymous type. Using the spec string makes no sense. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-2-andrii@kernel.org
-
Menglong Dong authored
For now, the 'map_btf_id' field in 'struct bpf_map_ops' is computed for all map types during vmlinux BTF init:

	btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()

It looks up the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This process can instead be done at build time, thanks to Jiri's resolve_btfids.

The map_ptr selftest passes:

	#96 map_ptr:OK
	Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
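The build-time pattern then looks roughly like the following sketch (using the hash map as an example; the exact set of converted ops structures is in the patch itself):

	#include <linux/btf_ids.h>

	/* resolve_btfids patches this list at build time, replacing the
	 * runtime lookup by 'map_btf_name' during btf_parse_vmlinux() */
	BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)

	const struct bpf_map_ops htab_map_ops = {
		/* ... */
		.map_btf_id = &htab_map_btf_ids[0],
	};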
-
Alexei Starovoitov authored
Kumar Kartikeya Dwivedi says:
====================
This set enables storing pointers of a certain type in BPF maps, and extends the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing any kind of pointer whose type is available using BTF (user or kernel) in the future (e.g. strongly typed memory allocation in BPF programs), internally tracked in the verifier as PTR_TO_BTF_ID, but for now the series limits them to two kinds of pointers obtained from the kernel. Obviously, use of this feature depends on map BTF.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored must match the type declared in the map value. However, such a pointer, when loaded from the map, can only be dereferenced, not passed to any in-kernel helpers or kernel functions available to the program. This is because while the verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads, which are then handled specially by the JIT implementation, the same liberty is not available to accesses inside the kernel. By the time the pointer is passed into a helper, it has no lifetime-related guarantees about the object it is pointing to, and may well be referencing invalid memory.

2. Referenced kernel pointer

This case imposes a lot of restrictions on the programmer, to ensure safety. To transfer the ownership of a reference in the BPF program to the map, the user must use the bpf_kptr_xchg helper, which returns the old pointer contained in the map as an acquired reference, and releases verifier state for the referenced pointer being exchanged, as it moves into the map. This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and kernel functions callable by the program.

However, if BPF_LDX is used to load a referenced pointer from the map, it is still not permitted to pass it to in-kernel helpers or kernel functions. To obtain a reference usable with helpers, the user must invoke a kfunc helper which returns a usable reference (which also must be eventually released before BPF_EXIT, or moved into a map). Since the load of the pointer (preserving data dependency ordering) must happen inside the RCU read section, the kfunc helper will take a pointer to the map value, which must point to the actual pointer of the object whose reference is to be raised. The type will be verified from the BTF information of the kfunc, as the prototype must be:

	T *func(T **, ... /* other arguments */);

Then, the verifier checks whether the pointer at the offset of the map value points to the type T, and permits the call. This convention is followed so that such helpers may also be called from sleepable BPF programs, where the RCU read lock is not necessarily held in the BPF program context, hence necessitating the need to pass in a pointer to the actual pointer to perform the load inside the RCU read section.

Notes
-----

* C selftests require https://reviews.llvm.org/D119799 to pass.
* Unlike BPF timers, kptr is not reset or freed on map_release_uref.
* Referenced kptr storage is always treated as unsigned long * on the kernel side, as the BPF side cannot mutate it. The storage (8 bytes) is sufficient for both 32-bit and 64-bit platforms.
* Use of WRITE_ONCE to reset unreferenced kptr on 32-bit systems is fine, as the actual pointer is always word sized, so the store tearing into two 32-bit stores won't be a problem as the other half is always zeroed out.
Changelog:
----------
v5 -> v6
v5: https://lore.kernel.org/bpf/20220415160354.1050687-1-memxor@gmail.com

* Address comments from Alexei
* Drop 'Revisit stack usage' comment
* Rename off_btf to kernel_btf
* Add comment about searching using type from map BTF
* Do kmemdup + btf_get instead of get + kmemdup + put
* Add comment for btf_struct_ids_match
* Add comment for assigning non-zero id for mark_ptr_or_null_reg
* Rename PTR_RELEASE to OBJ_RELEASE
* Rename BPF_MAP_OFF_DESC_TYPE_XXX_KPTR to BPF_KPTR_XXX
* Remove unneeded likely/unlikely in cold functions
* Fix other misc nits
* Keep release_regno instead of replacing with bool + regno
* Add a patch to prevent type match for first member when off == 0 for release functions (kfunc + BPF helpers)
* Guard kptr/kptr_ref definition in libbpf header with __has_attribute to prevent selftests compilation error with old clang not supporting type tags

v4 -> v5
v4: https://lore.kernel.org/bpf/20220409093303.499196-1-memxor@gmail.com

* Address comments from Joanne
* Move __btf_member_bit_offset before strcmp
* Move strcmp conditional on name to unref kptr patch
* Directly return from btf_find_struct in patch 1
* Use enum btf_field_type vs int field_type
* Put btf and btf_id in off_desc in named struct 'kptr'
* Switch order for BTF_FIELD_IGNORE check
* Drop dead tab->nr_off = 0 store
* Use i instead of tab->nr_off to btf_put on failure
* Replace kzalloc + memcpy with kmemdup (kernel test robot)
* Reject both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
* Add logging statement for reject BPF_MODE(insn->code) != BPF_MEM
* Rename off_desc -> kptr_off_desc in check_mem_access
* Drop check for err, fallthrough to end of function
* Remove is_release_function, use meta.release_regno to detect release function, release reference state, and remove check_release_regno
* Drop off_desc->flags, use off_desc->type
* Update comment for ARG_PTR_TO_KPTR
* Distinguish between direct/indirect access to kptr
* Drop check_helper_mem_access from process_kptr_func, check_mem_reg in kptr_get
* Add verifier test for helper accessing kptr indirectly
* Fix other misc nits, add Acked-by for patch 2

v3 -> v4
v3: https://lore.kernel.org/bpf/20220320155510.671497-1-memxor@gmail.com

* Use btf_parse_kptrs, plural kptrs naming (Joanne, Andrii)
* Remove unused parameters in check_map_kptr_access (Joanne)
* Handle idx < info_cnt kludge using tmp variable (Andrii)
* Validate tags always precede modifiers in BTF (Andrii)
* Split out into https://lore.kernel.org/bpf/20220406004121.282699-1-memxor@gmail.com
* Store u32 type_id in btf_field_info (Andrii)
* Use base_type in map_kptr_match_type (Andrii)
* Free kptr_off_tab when not bpf_capable (Martin)
* Use PTR_RELEASE flag instead of bools in bpf_func_proto (Joanne)
* Drop extra reg->off and reg->ref_obj_id checks in map_kptr_match_type (Martin)
* Use separate u32 and u8 arrays for offs and sizes in off_arr (Andrii)
* Simplify and remove map->value_size sentinel in copy_map_value (Andrii)
* Use sort_r to keep both arrays in sync while sorting (Andrii)
* Rename check_and_free_timers_and_kptr to check_and_free_fields (Andrii)
* Move dtor prototype checks to registration phase (Alexei)
* Use ret variable for checking ASSERT_XXX, use shorter strings (Andrii)
* Fix missing checks for other maps (Jiri)
* Fix various other nits, and bugs noticed during self review

v2 -> v3
v2: https://lore.kernel.org/bpf/20220317115957.3193097-1-memxor@gmail.com

* Address comments from Alexei
* Set name, sz, align in btf_find_field
* Do idx >= info_cnt check in caller of btf_find_field_*
* Use extra element in the info_arr to make this safe
* Remove while loop, reject extra tags
* Remove cases of defensive programming
* Move bpf_capable() check to map_check_btf
* Put check_ptr_off_reg reordering hunk into separate patch
* Warn for ref_ptr once
* Make the meta.ref_obj_id == 0 case simpler to read
* Remove kptr_percpu and kptr_user support, remove their tests
* Store size of field at offset in off_arr
* Fix BPF_F_NO_PREALLOC set wrongly for hash map in C selftest
* Add missing check_mem_reg call for kptr_get kfunc arg#0 check

v1 -> v2
v1: https://lore.kernel.org/bpf/20220220134813.3411982-1-memxor@gmail.com

* Address comments from Alexei
* Rename bpf_btf_find_by_name_kind_all to bpf_find_btf_id
* Reduce indentation level in that function
* Always take reference regardless of module or vmlinux BTF
* Also made it the same for btf_get_module_btf
* Use kptr, kptr_ref, kptr_percpu, kptr_user type tags
* Don't reserve tag namespace
* Refactor btf_find_field to be side effect free, allocate and populate kptr_off_tab in caller
* Move module reference to dtor patch
* Remove support for BPF_XCHG, BPF_CMPXCHG insn
* Introduce bpf_kptr_xchg helper
* Embed offset array in struct bpf_map, populate and sort it once
* Adjust copy_map_value to memcpy directly using this offset array
* Removed size member from offset array to save space
* Fix some problems pointed out by kernel test robot
* Tidy selftests
* Lots of other minor fixes
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Ensure that the edge case, where the first member type was matched successfully even though it didn't match the BTF type of the register, is caught and rejected by the verifier. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-14-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
Reuse bpf_prog_test functions to test the support for PTR_TO_BTF_ID in BPF map case, including some tests that verify implementation sanity and corner cases. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-13-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
This uses the __kptr and __kptr_ref macros as well, and tries to test the stuff that is supposed to work, since we have negative tests in test_verifier suite. Also include some code to test map-in-map support, such that the inner_map_meta matches the kptr_off_tab of map added as element. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-12-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
Include convenience definitions:

	__kptr:     Unreferenced kptr
	__kptr_ref: Referenced kptr

Users can use them to tag the pointer type meant to be used with the new support directly in the map value definition. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-11-memxor@gmail.com
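Per the v6 changelog of the series, the definitions are guarded with __has_attribute so that older clang without type tag support still compiles the selftests; roughly (a sketch, the exact fallback behavior is in the patch):

	#if __has_attribute(btf_type_tag)
	#define __kptr		__attribute__((btf_type_tag("kptr")))
	#define __kptr_ref	__attribute__((btf_type_tag("kptr_ref")))
	#else
	#define __kptr
	#define __kptr_ref
	#endif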
-
Kumar Kartikeya Dwivedi authored
The current behavior of btf_struct_ids_match for release arguments is that when the type match fails, it retries with the first member type again (recursively). Since the offset is already 0, this is akin to just casting the pointer in normal C, since if the type matches it was just embedded inside the parent struct as an object. However, we want to reject such cases for release function type matching, be it a kfunc or a BPF helper. An example is the following:

	struct foo {
		struct bar b;
	};
	struct foo *v = acq_foo();
	rel_bar(&v->b); // btf_struct_ids_match fails btf_types_are_same, then
			// retries with first member type and succeeds, while
			// it should fail.

Hence, don't walk the struct and only rely on btf_types_are_same for strict mode. All users of strict mode must be dealing with zero offset anyway, since otherwise they would want the struct to be walked. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-10-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
We introduce a new style of kfunc helpers, namely *_kptr_get, where they take a pointer to the map value which points to a referenced kernel pointer contained in the map. Since this is referenced, only bpf_kptr_xchg from the BPF side and xchg from the kernel side are allowed to change the current value, and each pointer that resides in that location would be referenced and RCU protected (this must be kept in mind while adding kernel types embeddable as reference kptr in BPF maps). This means that if we do the load of the pointer value in an RCU read section, and find a live pointer, then as long as we hold the RCU read lock, it won't be freed by a parallel xchg + release operation. This allows us to implement a safe refcount increment scheme. Hence, enforce that the first argument of all such kfuncs is a proper PTR_TO_MAP_VALUE pointing at the right offset to the referenced pointer. The rest of the arguments are subjected to typical kfunc argument checks, hence allowing some flexibility in passing more intent into how the reference should be taken. For instance, in the case of struct nf_conn, it is not freed until the RCU grace period ends, but can still be reused for another tuple once its refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not only needs to call refcount_inc_not_zero, but also do a tuple match after incrementing the reference, and when it fails to match, put the reference again and return NULL. This can be implemented easily if we allow passing additional parameters to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a tuple__sz pair. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-9-memxor@gmail.com
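A skeletal kptr_get-style kfunc could look like this (a hypothetical RCU-protected, refcounted type T; an illustration of the convention, not code from the patch):

	struct T *bpf_T_kptr_get(struct T **kptrp)
	{
		struct T *p;

		rcu_read_lock();
		/* the load of the kptr slot happens under RCU */
		p = READ_ONCE(*kptrp);
		if (p && !refcount_inc_not_zero(&p->refcnt))
			p = NULL;
		rcu_read_unlock();
		/* acquired reference or NULL; the program must release it
		 * before BPF_EXIT or move it into a map */
		return p;
	}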
-
Kumar Kartikeya Dwivedi authored
A destructor kfunc can be defined as void func(type *), where type may be void or any other pointer type as per convenience. In this patch, we ensure that the type is sane and capture the function pointer into the off_desc of ptr_off_tab for the specific pointer offset, with the invariant that the dtor pointer is always set when the 'kptr_ref' tag is applied to the pointer's pointee type, which is indicated by the flag BPF_MAP_VALUE_OFF_F_REF. Note that only BTF IDs whose destructor kfunc is registered become allowed for embedding as a referenced kptr; hence the registry serves both for finding the dtor kfunc BTF ID and as a whitelist of BTF IDs allowed for this purpose. Finally, wire up the actual freeing of the referenced pointer, if any, at all available offsets, so that no references are leaked after the BPF map goes away even when a BPF program previously moved ownership of a referenced pointer into it. The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem will free any existing referenced kptr. The same goes for the LRU map's bpf_lru_push_free/htab_lru_push_free functions, which are extended to reset unreferenced and free referenced kptrs. Note that unlike BPF timers, a kptr is not reset or freed when the map uref drops to zero. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
To support storing referenced PTR_TO_BTF_ID in maps, we require associating a specific BTF ID with a 'destructor' kfunc. This is because we need to release a live referenced pointer at a certain offset in the map value from the map destruction path, otherwise we end up leaking resources. Hence, introduce support for passing an array of btf_id, kfunc_btf_id pairs that denote a BTF ID and its associated release function. Then, add an accessor 'btf_find_dtor_kfunc' which can be used to look up the destructor kfunc of a certain BTF ID. If found, we can use it to free the object from the map free path. The registration of these pairs also serves as a whitelist of structures allowed as referenced PTR_TO_BTF_ID in a BPF map, because without finding the destructor kfunc, we bail and return an error. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-7-memxor@gmail.com
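Registration follows a pattern along these lines (sketch modelled on the selftest usage; treat the names as illustrative):

	BTF_ID_LIST(dtor_ids)
	BTF_ID(struct, prog_test_ref_kfunc)
	BTF_ID(func, bpf_kfunc_call_test_release)

	static const struct btf_id_dtor_kfunc test_dtors[] = {
		{
			.btf_id	      = dtor_ids[0], /* type allowed as kptr_ref */
			.kfunc_btf_id = dtor_ids[1], /* its destructor kfunc */
		},
	};

	/* failure to register means the type cannot be embedded as a
	 * referenced kptr in a map value */
	err = register_btf_id_dtor_kfuncs(test_dtors, ARRAY_SIZE(test_dtors),
					  THIS_MODULE);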
-
Kumar Kartikeya Dwivedi authored
Since there may now be up to 10 offsets that need handling in copy_map_value, the manual shuffling and special casing is no longer going to work. Hence, generalise the copy_map_value function by using a sorted array of offsets to skip regions that must be avoided while copying into and out of a map value. When the map is created, we populate this offset array in struct bpf_map. Then, copy_map_value uses the sorted offset array to memcpy while skipping timer, spin lock, and kptr fields. The array is allocated dynamically, since in most cases none of these special fields are present in the map value, so we save space in the common case by not embedding the entire object inside struct bpf_map. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-6-memxor@gmail.com
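The resulting copy is essentially the following sketch (field names approximate those in the patch):

	/* copy a map value, skipping the sorted special-field regions */
	static inline void copy_map_value(struct bpf_map *map, void *dst,
					  void *src)
	{
		u32 cur = 0;
		int i;

		for (i = 0; i < map->off_arr->cnt; i++) {
			u32 next = map->off_arr->field_off[i];

			memcpy(dst + cur, src + cur, next - cur);
			cur = next + map->off_arr->field_sz[i];
		}
		memcpy(dst + cur, src + cur, map->value_size - cur);
	}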
-
Kumar Kartikeya Dwivedi authored
While we can guarantee that, even for an unreferenced kptr, the case of the object it points to being freed can be handled by the verifier's exception handling (normal loads are patched to PROBE_MEM loads), we still cannot allow the user to pass these pointers to BPF helpers and kfuncs, because the same exception handling won't be done for accesses inside the kernel. The same is true if a referenced pointer is loaded using a normal load instruction. Since the reference is not guaranteed to be held while the pointer is used, it must be marked as untrusted. Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark all registers loading unreferenced and referenced kptr from BPF maps, and ensure they can never escape the BPF program into the kernel by way of calling stable/unstable helpers. In check_ptr_to_btf_access, the !type_may_be_null check to reject type flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER, MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first two are checked inside the function and rejected using a proper error message, but we still want to allow dereference of the untrusted case. Also, we make sure to inherit PTR_UNTRUSTED when a chain of pointers is walked, so that this flag is never dropped once it has been set on a PTR_TO_BTF_ID (i.e. the trusted to untrusted transition can only go in one direction). In convert_ctx_accesses, extend the switch case to consider untrusted PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM conversion for BPF_LDX. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-5-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
Extending the code in previous commits, introduce referenced kptr support, which needs to be tagged using the 'kptr_ref' tag instead. Unlike unreferenced kptr, referenced kptr have a lot more restrictions. In addition to the type matching, only the newly introduced bpf_kptr_xchg helper is allowed to modify the map value at that offset. This transfers the referenced pointer being stored into the map, releasing the reference state held by the program, returning the old value, and creating a new reference state for the returned pointer. Similar to the unreferenced pointer case, the return value for this case will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must eventually either be released by calling the corresponding release function, or be transferred into another map. It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear the value, and obtain the old value if any. BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future commit will permit using BPF_LDX for such pointers, while attempting to make it safe, since the lifetime of the object won't be guaranteed. There are valid reasons to enforce the restriction of permitting only bpf_kptr_xchg to operate on referenced kptr. The pointer value must be consistent in the face of concurrent modification, and any prior value contained in the map must also be released before a new one is moved into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg returns the old value, which the verifier would require the user to either free or move into another map, and releases the reference held for the pointer being moved in. In the future, a direct BPF_XCHG instruction may also be permitted to work like the bpf_kptr_xchg helper. Note that process_kptr_func doesn't have to call check_helper_mem_access, since we already disallow rdonly/wronly flags for the map, which is what check_map_access_type checks, and we already ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc, so check_map_access is also not required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
Add a new type flag for bpf_arg_type that, when set, tells the verifier that for a release function, that argument's register will be the one for which meta.ref_obj_id will be set, and which will then be released using release_reference. To capture the regno, introduce a new field release_regno in bpf_call_arg_meta. This is required in the next patch, where we may either pass NULL or a refcounted pointer as an argument to the release function bpf_kptr_xchg. Releasing only when meta.ref_obj_id is set is not enough, as there is a case where the type of the argument matches, but ref_obj_id is set to 0. Hence, we must enforce that whenever meta.ref_obj_id is zero, the register to be released can only be NULL for a release function. Since we now indicate in bpf_func_proto itself whether an argument is to be released, the is_release_function helper has lost its utility; hence refactor the code to work without it, and just rely on meta.release_regno to know when to release state for a ref_obj_id. Still, the restriction of one release argument, and only one ref_obj_id passed to a BPF helper or kfunc, remains. This may be lifted in the future. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
This commit introduces a new pointer type 'kptr' which can be embedded in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during its invocation. When storing such a kptr, the BPF program's PTR_TO_BTF_ID register must have the same type as in the map value's BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel BTF and BTF ID. Such kptr are unreferenced, i.e. by the time another invocation of the BPF program loads this pointer, the object which the pointer points to may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to PROBE_MEM loads by the verifier, it would be safe to allow the user to still access such an invalid pointer, but passing such pointers into BPF helpers and kfuncs should not be permitted. A future patch in this series will close this gap. The flexibility offered by allowing programs to dereference such invalid pointers while being safe at runtime frees the verifier from doing complex lifetime tracking. As long as the user can ensure that the object remains valid, it can ensure data read by it from the kernel object is valid. The user indicates that a certain pointer must be treated as a kptr capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using the BTF type tag 'kptr' on the pointed-to type of the pointer. Then, this information is recorded in the object BTF which will be passed into the kernel by way of the map's BTF information. The name and kind from the map value BTF are used to look up the in-kernel type, and the actual BTF and BTF ID are recorded in the map struct in a new kptr_off_tab member. For now, only storing pointers to structs is permitted. An example of this specification is shown below:

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		...
		struct task_struct __kptr *task;
		...
	};

Then, in a BPF program, the user may store a PTR_TO_BTF_ID with the type task_struct into the map, and then load it later. Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as the verifier cannot statically know whether the value is NULL or not; it must treat all potential loads at that map value offset as loading a possibly NULL pointer. Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are allowed instructions that can access such a pointer. On BPF_LDX, the destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the source register type is a PTR_TO_BTF_ID with the same BTF type as specified in the map BTF. The access size must always be BPF_DW. For the map-in-map support, the kptr_off_tab of the outer map is copied from the inner map's kptr_off_tab. A deep copy was chosen instead of introducing a refcount on kptr_off_tab, because the copy only needs to be done when parameterizing using inner_map_fd in the map-in-map case, and would be unnecessary for all other users. It is not permitted to use the MAP_FREEZE command or mmap for a BPF map having kptrs, similar to the bpf_timer case. A kptr also requires that the BPF program has both read and write access to the map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed). Note that check_map_access must be called from both check_helper_mem_access and for the BPF instructions, hence the kptr check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src and reuse it for this purpose.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
-
Stanislav Fomichev authored
Rename bpf_prog_run_array_cg_flags to bpf_prog_run_array_cg and use it everywhere. check_return_code already enforces sane return ranges for all cgroup types. (only egress and bind hooks have uncanonical return ranges, the rest is using [0, 1]) No functional changes. v2: - 'func_ret & 1' under explicit test (Andrii & Martin) Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220425220448.3669032-1-sdf@google.com
-
- 25 Apr, 2022 3 commits
-
-
Dominique Martinet authored
musl does not like including sys/fcntl.h directly: [...] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [...] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220424051022.2619648-5-asmadeus@codewreck.org
-
Dominique Martinet authored
musl nftw implementation does not support FTW_ACTIONRETVAL. There have been multiple attempts at pushing the feature in musl upstream, but it has been refused or ignored all the times: https://www.openwall.com/lists/musl/2021/03/26/1 https://www.openwall.com/lists/musl/2022/01/22/1 In this case we only care about /proc/<pid>/fd/<fd>, so it's not too difficult to reimplement directly instead, and the new implementation makes 'bpftool perf' slightly faster because it doesn't needlessly stat/readdir unneeded directories (54ms -> 13ms on my machine). Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220424051022.2619648-4-asmadeus@codewreck.org
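The replacement boils down to a plain readdir() loop over /proc/<pid>/fd, along these lines (simplified sketch, not the exact bpftool code):

	#include <dirent.h>
	#include <stdio.h>

	static void walk_proc_fds(int pid)
	{
		struct dirent *ent;
		char path[64];
		DIR *dir;

		snprintf(path, sizeof(path), "/proc/%d/fd", pid);
		dir = opendir(path);
		if (!dir)
			return;
		while ((ent = readdir(dir))) {
			if (ent->d_name[0] == '.')
				continue;
			/* inspect the fd, e.g. via /proc/<pid>/fdinfo/<fd>,
			 * to see if it is a perf event with a BPF program */
		}
		closedir(dir);
	}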
-
Yuntao Wang authored
The link variable is already of type 'struct bpf_link *', casting it to 'struct bpf_link *' is redundant, drop it. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220424143420.457082-1-ytcoode@gmail.com
-
- 22 Apr, 2022 4 commits
-
-
Andrii Nakryiko authored
Use bpf_link_create() API in fexit_stress test to attach FEXIT programs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Kui-Feng Lee <kuifeng@fb.com> Link: https://lore.kernel.org/bpf/20220421033945.3602803-4-andrii@kernel.org
-
Andrii Nakryiko authored
Teach bpf_link_create() to fall back to bpf_raw_tracepoint_open() on older kernels for programs that are attachable through BPF_RAW_TRACEPOINT_OPEN. This makes bpf_link_create() a more unified and convenient interface for creating bpf_link-based attachments. With this approach, end users can just use bpf_link_create() for tp_btf/fentry/fexit/fmod_ret/lsm program attachments without needing to care about kernel support, as libbpf will handle this transparently. On the other hand, as newer features (like BPF cookie) are added to the LINK_CREATE interface, they will be readily usable through the same bpf_link_create() API without any major refactoring from the user's standpoint. bpf_program__attach_btf_id() now uses bpf_link_create() internally as well and will take advantage of this unified interface when BPF cookie is added for fentry/fexit. Doing proactive feature detection of LINK_CREATE support for fentry/tp_btf/etc is quite involved. It requires parsing vmlinux BTF, determining some stable target BTF type guaranteed to exist in all kernel versions (either a raw tracepoint or an fentry target function), actually attaching this program and thus potentially briefly affecting the performance of the host kernel, etc. So instead we take the much simpler "lazy" approach of falling back to the bpf_raw_tracepoint_open() call only if the initial LINK_CREATE command fails. For modern kernels this means zero added overhead, while older kernels incur minimal overhead with a single fast-failing LINK_CREATE call. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Kui-Feng Lee <kuifeng@fb.com> Link: https://lore.kernel.org/bpf/20220421033945.3602803-3-andrii@kernel.org
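Internally the fallback amounts to something like this sketch (simplified; the real code lives in libbpf and also plumbs through options):

	#include <bpf/bpf.h>

	static int attach_tracing_prog(int prog_fd, enum bpf_attach_type type)
	{
		int fd;

		fd = bpf_link_create(prog_fd, 0 /* target_fd */, type, NULL);
		if (fd < 0)
			/* older kernel without LINK_CREATE support for
			 * tracing programs: fall back lazily */
			fd = bpf_raw_tracepoint_open(NULL, prog_fd);
		return fd;
	}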
-
Andrii Nakryiko authored
Allow attaching BTF-aware TRACING programs, previously attachable only through BPF_RAW_TRACEPOINT_OPEN command, through LINK_CREATE command: - BTF-aware raw tracepoints (tp_btf in libbpf lingo); - fentry/fexit/fmod_ret programs; - BPF LSM programs. This change converges all bpf_link-based attachments under LINK_CREATE command allowing to further extend the API with features like BPF cookie under "multiplexed" link_create section of bpf_attr. Non-BTF-aware raw tracepoints are left under BPF_RAW_TRACEPOINT_OPEN, but there is nothing preventing opening them up to LINK_CREATE as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Kuifeng Lee <kuifeng@fb.com> Link: https://lore.kernel.org/bpf/20220421033945.3602803-2-andrii@kernel.org
-
Grant Seltzer authored
This puts the link for libbpf API documentation into the sidebar for much easier navigation. You can preview this change at: https://libbpf-test.readthedocs.io/en/latest/ Note that the link is hardcoded to the production version, so you can see that it self references itself here for now: https://libbpf-test.readthedocs.io/en/latest/api.html This will need to make its way into the libbpf mirror, before being deployed to libbpf.readthedocs.org Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220422031050.303984-1-grantseltzer@gmail.com
-
- 21 Apr, 2022 11 commits
-
-
Gaosheng Cui authored
Obj_elf is already non-null checked at the function entry, so remove redundant non-null checks on obj_elf. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220421031803.2283974-1-cuigaosheng1@huawei.com
-
Artem Savkov authored
Switching to the libbpf 1.0 API broke test_lpm_map and test_lru_map as error reporting changed. Instead of setting errno and returning -1, bpf calls now return -Exxx directly. Drop the errno checks and look at the return code directly. Fixes: b858ba8c ("selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20220421094320.1563570-1-asavkov@redhat.com
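The shape of the fix, on a hypothetical lookup:

	/* before (pre-1.0 behavior): */
	err = bpf_map_lookup_elem(fd, &key, &value);
	assert(err == -1 && errno == ENOENT);

	/* after (libbpf 1.0 API mode): */
	err = bpf_map_lookup_elem(fd, &key, &value);
	assert(err == -ENOENT);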
-
Artem Savkov authored
I am getting the following compilation error for prog_tests/uprobe_autoattach.c:

	tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
	./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]

The value of mem is now used in one of the asserts, which may confuse compilers. However, it is not dereferenced. Silence this by moving free(mem) after the assert block. Fixes: 1717e248 ("selftests/bpf: Uprobe tests should verify param/return values") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132317.1583867-1-asavkov@redhat.com
-
Artem Savkov authored
Switching to libbpf 1.0 API broke test_sock and test_sysctl as they check for return of bpf_prog_attach to be exactly -1. Switch the check to '< 0' instead. Fixes: b858ba8c ("selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20220421130104.1582053-1-asavkov@redhat.com
-
Maciej Fijalkowski authored
Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping:

	XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED}
	XDP_TX       -> I40E_XDP_{TX,CONSUMED}
	XDP_DROP     -> I40E_XDP_CONSUMED
	XDP_ABORTED  -> I40E_XDP_CONSUMED
	XDP_PASS     -> I40E_XDP_PASS

Commit b8aef650 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced the new translation XDP_REDIRECT -> I40E_XDP_EXIT, which is set when the XSK RQ gets full, to indicate that the driver should stop further Rx processing. This happens for an unsuccessful xdp_do_redirect(), so it is valuable to call trace_xdp_exception() for this case. In order to avoid an I40E_XDP_EXIT -> I40E_XDP_CONSUMED overwrite, the XDP_DROP case was moved above, which in turn made the 'fallthrough' in XDP_ABORTED useless, as it became the last label in the switch statement. Simply drop this leftover. Fixes: b8aef650 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-3-maciej.fijalkowski@intel.com
-
Maciej Fijalkowski authored
Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping:

	XDP_REDIRECT -> IXGBE_XDP_{REDIR,CONSUMED}
	XDP_TX       -> IXGBE_XDP_{TX,CONSUMED}
	XDP_DROP     -> IXGBE_XDP_CONSUMED
	XDP_ABORTED  -> IXGBE_XDP_CONSUMED
	XDP_PASS     -> IXGBE_XDP_PASS

Commit c7dd09fd ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced the new translation XDP_REDIRECT -> IXGBE_XDP_EXIT, which is set when the XSK RQ gets full, to indicate that the driver should stop further Rx processing. This happens for an unsuccessful xdp_do_redirect(), so it is valuable to call trace_xdp_exception() for this case. In order to avoid an IXGBE_XDP_EXIT -> IXGBE_XDP_CONSUMED overwrite, the XDP_DROP case was moved above, which in turn made the 'fallthrough' in XDP_ABORTED useless, as it became the last label in the switch statement. Simply drop this leftover. Fixes: c7dd09fd ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-2-maciej.fijalkowski@intel.com
-
Kumar Kartikeya Dwivedi authored
Some functions in the next patch want to use this function, and those functions will be called by check_map_access; hence move it before check_map_access. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/bpf/20220415160354.1050687-3-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
The next commit introduces a field type 'kptr' whose kind will not be struct, but pointer, and which will not be limited to one offset, but multiple ones. Make the existing btf_find_struct_field and btf_find_datasec_var functions amenable to use for finding kptrs in the map value, by moving the spin_lock and timer specific checks into their own function. The alignment and name are checked before the function is called, so it is the last point where we can skip a field or return an error before the next loop iteration happens. The size of the field and its type are meant to be checked inside the function. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220415160354.1050687-2-memxor@gmail.com
-
Grant Seltzer authored
This adds documentation for the following API functions: - bpf_program__set_expected_attach_type() - bpf_program__set_type() - bpf_program__set_attach_target() - bpf_program__attach() - bpf_program__pin() - bpf_program__unpin() Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-3-grantseltzer@gmail.com
-
Grant Seltzer authored
This updates usage of the following API functions within libbpf so their newly added error return is checked: - bpf_program__set_expected_attach_type() - bpf_program__set_type() Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-2-grantseltzer@gmail.com
-
Grant Seltzer authored
This adds an error return to the following API functions: - bpf_program__set_expected_attach_type() - bpf_program__set_type() In both cases, the error occurs when the BPF object has already been loaded when the function is called. In this case -EBUSY is returned. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-1-grantseltzer@gmail.com
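The added check presumably reduces to the following shape (a sketch of libbpf-internal logic, not the verbatim patch):

	int bpf_program__set_type(struct bpf_program *prog,
				  enum bpf_prog_type type)
	{
		if (prog->obj->loaded)
			return libbpf_err(-EBUSY);

		prog->type = type;
		return 0;
	}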
-