1. 26 Apr, 2022 27 commits
    • Merge branch 'Teach libbpf to "fix up" BPF verifier log' · d54d06a4
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      This patch set teaches libbpf to enhance BPF verifier log with human-readable
      and relevant information about failed CO-RE relocation. Patch #9 is the main
      one with the new logic. See relevant commit messages for some more details.
      
      All the other patches are either fixing various bugs detected
      while working on this feature, most prominently a bug with libbpf not handling
      CO-RE relocations for SEC("?...") programs, or are refactoring libbpf
      internals to allow for easier reuse of CO-RE relo lookup and formatting logic.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add libbpf's log fixup logic selftests · ea4128eb
      Andrii Nakryiko authored
      Add tests validating that libbpf is indeed patching up BPF verifier log
      with CO-RE relocation details. Also test partial and full truncation
      scenarios.
      
      This test might be a bit fragile due to changing BPF verifier log
      format. If that proves to be frequently breaking, we can simplify tests
      or remove the truncation subtests. But for now it seems useful to test
      it in those conditions that are otherwise rarely occurring in practice.
      
      Also test CO-RE relo failure in a subprog as that exercises subprogram CO-RE
      relocation mapping logic which doesn't work out of the box without extra
      relo storage previously done only for gen_loader case.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-11-andrii@kernel.org
    • libbpf: Fix up verifier log for unguarded failed CO-RE relos · 9fdc4273
      Andrii Nakryiko authored
      Teach libbpf to post-process BPF verifier log on BPF program load
      failure and detect known error patterns to provide user with more
      context.
      
      Currently there is one such common situation: an "unguarded" failed BPF
      CO-RE relocation. While failing CO-RE relocation is expected, it is
      expected to be properly guarded in BPF code such that BPF verifier
      always eliminates BPF instructions corresponding to such failed CO-RE
      relos as dead code. In cases when user failed to take such precautions,
      BPF verifier provides the best log it can:
      
        123: (85) call unknown#195896080
        invalid func unknown#195896080
      
      Such incomprehensible log error is due to libbpf "poisoning" BPF
      instruction that corresponds to failed CO-RE relocation by replacing it
      with invalid `call 0xbad2310` instruction (195896080 == 0xbad2310 reads
      "bad relo" if you squint hard enough).
      
      Luckily, libbpf has all the necessary information to look up CO-RE
      relocation that failed and provide more human-readable description of
      what's going on:
      
        5: <invalid CO-RE relocation>
        failed to resolve CO-RE relocation <byte_off> [6] struct task_struct___bad.fake_field_subprog (0:2 @ offset 8)
      
      This hopefully makes it much easier to understand what's wrong with
      user's BPF program without googling magic constants.
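
      As a rough illustration of the pattern recognition described above, the
      following minimal sketch checks whether a log line is the poisoned call
      and extracts its instruction index. The helper name is_poisoned_call()
      and the macro POISON_IMM are hypothetical and are not libbpf's actual
      code; only the constant 0xbad2310 and the log line format come from the
      text above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Immediate that marks a poisoned call instruction (0xbad2310 == 195896080). */
#define POISON_IMM 0xbad2310u

/* Hypothetical helper: returns true and stores the instruction index if
 * `line` looks like "<idx>: (85) call unknown#195896080".
 */
bool is_poisoned_call(const char *line, int *insn_idx)
{
	unsigned int imm;
	int idx;

	/* Both the index and the immediate must parse for a match. */
	if (sscanf(line, "%d: (85) call unknown#%u", &idx, &imm) != 2)
		return false;
	if (imm != POISON_IMM)
		return false;

	*insn_idx = idx;
	return true;
}
```

      The recovered index is what would then be used to look up the CO-RE
      relocation that poisoned the instruction.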
      
      This BPF verifier log fixup is set up to be extensible and is going to be
      used for at least one other upcoming feature of libbpf in follow-up patches.
      Libbpf parses lines of the BPF verifier log starting from the very end.
      Currently it processes up to 10 log lines looking for familiar
      patterns. This avoids wasting lots of CPU processing huge verifier logs
      (especially for log_level=2 verbosity level). Actual verification error
      should normally be found in last few lines, so this should work
      reliably.
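
      The idea of scanning only the tail of the log can be sketched as
      follows. This is an illustration under assumptions, not the libbpf
      implementation: tail_start() is a hypothetical helper that walks
      backwards from the end of the buffer and stops once it has found the
      start of the requested number of trailing lines.

```c
#include <stddef.h>

/* Return a pointer to the start of the last `max_lines` lines of `buf`
 * (or `buf` itself if it holds fewer lines). `len` excludes any NUL.
 */
const char *tail_start(const char *buf, size_t len, int max_lines)
{
	const char *p = buf + len;
	int seen = 0;

	/* Skip one trailing newline so it is not counted as an empty line. */
	if (p > buf && p[-1] == '\n')
		p--;

	while (p > buf) {
		/* A newline just before p means a line starts at p. */
		if (p[-1] == '\n' && ++seen == max_lines)
			return p;
		p--;
	}
	return buf;
}
```

      The cost is bounded by the length of the last few lines rather than by
      the size of the whole log.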
      
      If libbpf needs to expand log beyond available log_buf_size, it
      truncates the end of the verifier log. Given verifier log normally ends
      with something like:
      
        processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
      
      ... truncating this on program load error isn't too bad (end user can
      always increase log size if they need to get the complete log).
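
      A minimal sketch of such in-place patching with tail truncation, under
      the assumption that the log is a NUL-terminated string in a fixed-size
      buffer. The function replace_span() and its parameters are hypothetical
      and only illustrate the approach.

```c
#include <stddef.h>
#include <string.h>

/* Replace buf[off, off + old_len) with `patch` inside a buffer of `cap`
 * bytes. If the result would not fit, the end of the log is cut off.
 * Assumes off + old_len <= strlen(buf) and off < cap.
 */
void replace_span(char *buf, size_t cap, size_t off, size_t old_len,
		  const char *patch)
{
	size_t len = strlen(buf);
	size_t patch_len = strlen(patch);
	size_t tail_len = len - (off + old_len);
	size_t room = cap - 1 - off;	/* bytes usable from off, minus NUL */

	if (patch_len > room)
		patch_len = room;		/* the patch itself is cut */
	if (tail_len > room - patch_len)
		tail_len = room - patch_len;	/* the end of the log is cut */

	/* Move the surviving tail first, then drop the patch into place. */
	memmove(buf + off + patch_len, buf + off + old_len, tail_len);
	memcpy(buf + off, patch, patch_len);
	buf[off + patch_len + tail_len] = '\0';
}
```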
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-10-andrii@kernel.org
    • libbpf: Simplify bpf_core_parse_spec() signature · 14032f26
      Andrii Nakryiko authored
      Simplify bpf_core_parse_spec() signature to take struct bpf_core_relo as
      an input instead of requiring callers to decompose them into type_id,
      relo, spec_str, etc. This makes using and reusing this helper easier.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-9-andrii@kernel.org
    • libbpf: Refactor CO-RE relo human description formatting routine · b58af63a
      Andrii Nakryiko authored
      Refactor how CO-RE relocation is formatted. Now it dumps human-readable
      representation, currently used by libbpf in either debug or error
      message output during CO-RE relocation resolution process, into provided
      buffer. This approach allows for better reuse of this functionality
      outside of CO-RE relocation resolution, which we'll use in next patch
      for providing better error message for BPF verifier rejecting BPF
      program due to unguarded failed CO-RE relocation.
      
      It also gets rid of annoying "stitching" of libbpf_print() calls, which
      was the only place where we did this.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-8-andrii@kernel.org
    • libbpf: Record subprog-resolved CO-RE relocations unconditionally · 185cfe83
      Andrii Nakryiko authored
      Previously, libbpf recorded CO-RE relocations with insns_idx resolved
      according to finalized subprog locations (which are appended at the end
      of entry BPF program) to simplify the job of light skeleton generator.
      
      This is necessary because once subprogs' instructions are appended to
      main entry BPF program all the subprog instruction indices are shifted
      and that shift is different for each entry (main) BPF program, so it's
      generally impossible to map final absolute insn_idx of the finalized BPF
      program to their original locations inside subprograms.
      
      This information is now going to be used not only during light skeleton
      generation, but also to map absolute instruction index to subprog's
      instruction and its corresponding CO-RE relocation. So start recording
      these relocations always, not just when obj->gen_loader is set.
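
      To make the index shift concrete, here is a small sketch of mapping an
      absolute instruction index of a finalized program back to a
      subprogram-local index. The struct appended_subprog and the function
      map_to_subprog() are hypothetical names for illustration; libbpf keeps
      this bookkeeping in its own internal structures.

```c
#include <stddef.h>

/* Hypothetical record of where one subprogram landed in a finalized program. */
struct appended_subprog {
	int sub_insn_off;	/* index of its first insn in the final program */
	int insn_cnt;		/* number of instructions it contributes */
};

/* Return the index into `subs` that contains abs_idx and store the
 * subprog-local instruction index, or return -1 if abs_idx is not inside
 * any appended subprogram (e.g. it belongs to the entry program).
 */
int map_to_subprog(const struct appended_subprog *subs, size_t cnt,
		   int abs_idx, int *local_idx)
{
	for (size_t i = 0; i < cnt; i++) {
		int start = subs[i].sub_insn_off;
		int end = start + subs[i].insn_cnt;	/* exclusive */

		if (abs_idx >= start && abs_idx < end) {
			*local_idx = abs_idx - start;
			return (int)i;
		}
	}
	return -1;
}
```

      Because the same subprogram lands at a different offset in each entry
      program, such a table has to be kept per entry program.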
      
      This information is going to be freed at the end of bpf_object__load()
      step, as before (but this can change in the future if there will be
      a need for this information post load step).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-7-andrii@kernel.org
    • selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests · b82bb1ff
      Andrii Nakryiko authored
      Enhance linked_funcs selftest with two tricky features that might not
      obviously work correctly together. We add CO-RE relocations to entry BPF
      programs and mark those programs as non-autoloadable with SEC("?...")
      annotation. This makes sure that libbpf itself handles .BTF.ext CO-RE
      relocation data matching correctly for SEC("?...") programs, as well as
      ensures that BPF static linker handles this correctly (this was the case
      before, no changes are necessary, but it wasn't explicitly tested).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-6-andrii@kernel.org
    • libbpf: Avoid joining .BTF.ext data with BPF programs by section name · 11d5daa8
      Andrii Nakryiko authored
      Instead of using ELF section names as a joining key between .BTF.ext and
      corresponding BPF programs, pre-build .BTF.ext section number to ELF
      section index mapping during bpf_object__open() and use it later for
      matching .BTF.ext information (func/line info or CO-RE relocations) to
      their respective BPF programs and subprograms.
      
      This simplifies corresponding joining logic and lets libbpf do
      manipulations with BPF program's ELF sections like dropping leading '?'
      character for non-autoloaded programs. Original joining logic in
      bpf_object__relocate_core() (see relevant comment that's now removed)
      was never elegant, so it's a good improvement regardless. But it also
      avoids unnecessary internal assumptions about preserving original ELF
      section name as BPF program's section name (which was broken when
      SEC("?abc") support was added).
      
      Fixes: a3820c48 ("libbpf: Support opting out from autoloading BPF programs declaratively")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-5-andrii@kernel.org
    • libbpf: Fix logic for finding matching program for CO-RE relocation · 966a7509
      Andrii Nakryiko authored
      Fix the bug in bpf_object__relocate_core() which can lead to finding
      invalid matching BPF program when processing CO-RE relocation. If
      matching program is not found, last encountered program will be assumed
      to be correct program and thus error detection won't detect the problem.
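
      The failure mode described above is the familiar one of reading a loop
      cursor after the loop has finished without a match. A minimal sketch of
      the safe shape, using a hypothetical struct prog and find_prog() rather
      than the actual libbpf code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a BPF program descriptor. */
struct prog {
	const char *sec_name;
};

/* Return the program matching sec_name, or NULL when there is none.
 * Returning from inside the loop means a leftover cursor that still
 * points at the last program can never be mistaken for a match.
 */
const struct prog *find_prog(const struct prog *progs, size_t cnt,
			     const char *sec_name)
{
	for (size_t i = 0; i < cnt; i++) {
		if (strcmp(progs[i].sec_name, sec_name) == 0)
			return &progs[i];
	}
	return NULL;
}
```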
      
      Fixes: 9c82a63c ("libbpf: Fix CO-RE relocs against .text section")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-4-andrii@kernel.org
    • libbpf: Drop unhelpful "program too large" guess · 0994a54c
      Andrii Nakryiko authored
      libbpf pretends it knows the actual limit on BPF program instructions based
      on the UAPI headers it was compiled with. There is no guarantee that the
      UAPI headers match the host kernel, and the BPF verifier doesn't actually
      use the BPF_MAXINSNS constant anymore. Just drop the unhelpful "guess"; the
      BPF verifier will emit the actual reason for failure in its log anyway.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-3-andrii@kernel.org
    • libbpf: Fix anonymous type check in CO-RE logic · afe98d46
      Andrii Nakryiko authored
      Use type name for checking whether CO-RE relocation is referring to
      anonymous type. Using spec string makes no sense.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220426004511.2691730-2-andrii@kernel.org
    • bpf: Compute map_btf_id during build time · c317ab71
      Menglong Dong authored
      For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
      types is computed during vmlinux-btf init:
      
        btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()
      
      It will look up the btf_type according to the 'map_btf_name' field in
      'struct bpf_map_ops'. This process can be done during build time,
      thanks to Jiri's resolve_btfids.
      
      selftest of map_ptr has passed:
      
        $96 map_ptr:OK
        Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Introduce typed pointer support in BPF maps' · 367590b7
      Alexei Starovoitov authored
      Kumar Kartikeya Dwivedi says:
      
      ====================
      
      This set enables storing pointers of a certain type in BPF map, and extends the
      verifier to enforce type safety and lifetime correctness properties.
      
      The infrastructure being added is generic enough for allowing storing any kind
      of pointers whose type is available using BTF (user or kernel) in the future
      (e.g. strongly typed memory allocation in BPF program), which are internally
      tracked in the verifier as PTR_TO_BTF_ID, but for now the series limits them to
      two kinds of pointers obtained from the kernel.
      
      Obviously, use of this feature depends on map BTF.
      
      1. Unreferenced kernel pointer
      
      In this case, there are very few restrictions. The pointer type being stored
      must match the type declared in the map value. However, such a pointer when
      loaded from the map can only be dereferenced, but not passed to any in-kernel
      helpers or kernel functions available to the program. This is because while the
      verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads,
      which are then handled specially by the JIT implementation, the same liberty is
      not available to accesses inside the kernel. The pointer by the time it is
      passed into a helper has no lifetime related guarantees about the object it is
      pointing to, and may well be referencing invalid memory.
      
      2. Referenced kernel pointer
      
      This case imposes a lot of restrictions on the programmer, to ensure safety. To
      transfer the ownership of a reference in the BPF program to the map, the user
      must use the bpf_kptr_xchg helper, which returns the old pointer contained in
      the map, as an acquired reference, and releases verifier state for the
      referenced pointer being exchanged, as it moves into the map.
      
      This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and kernel
      functions callable by the program.
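
      The ownership-transfer semantics of an exchange can be illustrated with
      a userspace analogy. This is not BPF code and does not use the
      bpf_kptr_xchg helper; struct slot and slot_xchg() are hypothetical
      stand-ins built on C11 atomics, showing only that whoever swaps a
      pointer in gives up ownership of it and takes ownership of whatever
      comes back out.

```c
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	int payload;
};

/* Hypothetical stand-in for a referenced pointer field in a map value. */
struct slot {
	_Atomic(struct obj *) ptr;
};

/* Move new_ptr into the slot and hand the previous occupant (possibly
 * NULL) back to the caller, who now owns it and must release it.
 */
struct obj *slot_xchg(struct slot *s, struct obj *new_ptr)
{
	return atomic_exchange(&s->ptr, new_ptr);
}
```

      Swapping in NULL is the way to take the stored pointer back out.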
      
      However, if BPF_LDX is used to load a referenced pointer from the map, it is
      still not permitted to pass it to in-kernel helpers or kernel functions. To
      obtain a reference usable with helpers, the user must invoke a kfunc helper
      which returns a usable reference (which also must be eventually released before
      BPF_EXIT, or moved into a map).
      
      Since the load of the pointer (preserving data dependency ordering) must happen
      inside the RCU read section, the kfunc helper will take a pointer to the map
      value, which must point to the actual pointer of the object whose reference is
      to be raised. The type will be verified from the BTF information of the kfunc,
      as the prototype must be:
      
      	T *func(T **, ... /* other arguments */);
      
      Then, the verifier checks whether pointer at offset of the map value points to
      the type T, and permits the call.
      
      This convention is followed so that such helpers may also be called from
      sleepable BPF programs, where RCU read lock is not necessarily held in the BPF
      program context, hence the need to pass in a pointer to the actual
      pointer to perform the load inside the RCU read section.
      
      Notes
      -----
      
       * C selftests require https://reviews.llvm.org/D119799 to pass.
       * Unlike BPF timers, kptr is not reset or freed on map_release_uref.
       * Referenced kptr storage is always treated as unsigned long * on kernel side,
         as BPF side cannot mutate it. The storage (8 bytes) is sufficient for both
         32-bit and 64-bit platforms.
       * Use of WRITE_ONCE to reset unreferenced kptr on 32-bit systems is fine, as
         the actual pointer is always word sized, so the store tearing into two 32-bit
         stores won't be a problem as the other half is always zeroed out.
      
      Changelog:
      ----------
      v5 -> v6
      v5: https://lore.kernel.org/bpf/20220415160354.1050687-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Drop 'Revisit stack usage' comment
         * Rename off_btf to kernel_btf
         * Add comment about searching using type from map BTF
         * Do kmemdup + btf_get instead of get + kmemdup + put
         * Add comment for btf_struct_ids_match
         * Add comment for assigning non-zero id for mark_ptr_or_null_reg
         * Rename PTR_RELEASE to OBJ_RELEASE
         * Rename BPF_MAP_OFF_DESC_TYPE_XXX_KPTR to BPF_KPTR_XXX
         * Remove unneeded likely/unlikely in cold functions
         * Fix other misc nits
       * Keep release_regno instead of replacing with bool + regno
       * Add a patch to prevent type match for first member when off == 0 for
         release functions (kfunc + BPF helpers)
       * Guard kptr/kptr_ref definition in libbpf header with __has_attribute
         to prevent selftests compilation error with old clang not supporting
         type tags
      
      v4 -> v5
      v4: https://lore.kernel.org/bpf/20220409093303.499196-1-memxor@gmail.com
      
       * Address comments from Joanne
         * Move __btf_member_bit_offset before strcmp
         * Move strcmp conditional on name to unref kptr patch
         * Directly return from btf_find_struct in patch 1
         * Use enum btf_field_type vs int field_type
         * Put btf and btf_id in off_desc in named struct 'kptr'
         * Switch order for BTF_FIELD_IGNORE check
         * Drop dead tab->nr_off = 0 store
         * Use i instead of tab->nr_off to btf_put on failure
         * Replace kzalloc + memcpy with kmemdup (kernel test robot)
         * Reject both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
         * Add logging statement for reject BPF_MODE(insn->code) != BPF_MEM
         * Rename off_desc -> kptr_off_desc in check_mem_access
         * Drop check for err, fallthrough to end of function
         * Remove is_release_function, use meta.release_regno to detect release
           function, release reference state, and remove check_release_regno
         * Drop off_desc->flags, use off_desc->type
         * Update comment for ARG_PTR_TO_KPTR
       * Distinguish between direct/indirect access to kptr
       * Drop check_helper_mem_access from process_kptr_func, check_mem_reg in kptr_get
       * Add verifier test for helper accessing kptr indirectly
       * Fix other misc nits, add Acked-by for patch 2
      
      v3 -> v4
      v3: https://lore.kernel.org/bpf/20220320155510.671497-1-memxor@gmail.com
      
       * Use btf_parse_kptrs, plural kptrs naming (Joanne, Andrii)
       * Remove unused parameters in check_map_kptr_access (Joanne)
       * Handle idx < info_cnt kludge using tmp variable (Andrii)
       * Validate tags always precede modifiers in BTF (Andrii)
         * Split out into https://lore.kernel.org/bpf/20220406004121.282699-1-memxor@gmail.com
       * Store u32 type_id in btf_field_info (Andrii)
       * Use base_type in map_kptr_match_type (Andrii)
       * Free kptr_off_tab when not bpf_capable (Martin)
       * Use PTR_RELEASE flag instead of bools in bpf_func_proto (Joanne)
       * Drop extra reg->off and reg->ref_obj_id checks in map_kptr_match_type (Martin)
       * Use separate u32 and u8 arrays for offs and sizes in off_arr (Andrii)
       * Simplify and remove map->value_size sentinel in copy_map_value (Andrii)
       * Use sort_r to keep both arrays in sync while sorting (Andrii)
       * Rename check_and_free_timers_and_kptr to check_and_free_fields (Andrii)
       * Move dtor prototype checks to registration phase (Alexei)
       * Use ret variable for checking ASSERT_XXX, use shorter strings (Andrii)
       * Fix missing checks for other maps (Jiri)
       * Fix various other nits, and bugs noticed during self review
      
      v2 -> v3
      v2: https://lore.kernel.org/bpf/20220317115957.3193097-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Set name, sz, align in btf_find_field
         * Do idx >= info_cnt check in caller of btf_find_field_*
           * Use extra element in the info_arr to make this safe
         * Remove while loop, reject extra tags
         * Remove cases of defensive programming
         * Move bpf_capable() check to map_check_btf
         * Put check_ptr_off_reg reordering hunk into separate patch
         * Warn for ref_ptr once
         * Make the meta.ref_obj_id == 0 case simpler to read
         * Remove kptr_percpu and kptr_user support, remove their tests
         * Store size of field at offset in off_arr
       * Fix BPF_F_NO_PREALLOC set wrongly for hash map in C selftest
       * Add missing check_mem_reg call for kptr_get kfunc arg#0 check
      
      v1 -> v2
      v1: https://lore.kernel.org/bpf/20220220134813.3411982-1-memxor@gmail.com
      
       * Address comments from Alexei
         * Rename bpf_btf_find_by_name_kind_all to bpf_find_btf_id
         * Reduce indentation level in that function
         * Always take reference regardless of module or vmlinux BTF
         * Also made it the same for btf_get_module_btf
         * Use kptr, kptr_ref, kptr_percpu, kptr_user type tags
         * Don't reserve tag namespace
         * Refactor btf_find_field to be side effect free, allocate and populate
           kptr_off_tab in caller
         * Move module reference to dtor patch
         * Remove support for BPF_XCHG, BPF_CMPXCHG insn
         * Introduce bpf_kptr_xchg helper
         * Embed offset array in struct bpf_map, populate and sort it once
         * Adjust copy_map_value to memcpy directly using this offset array
         * Removed size member from offset array to save space
       * Fix some problems pointed out by kernel test robot
       * Tidy selftests
       * Lots of other minor fixes
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for strict BTF type check · 792c0a34
      Kumar Kartikeya Dwivedi authored
      Ensure that the edge case where first member type was matched
      successfully even if it didn't match BTF type of register is caught and
      rejected by the verifier.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-14-memxor@gmail.com
    • selftests/bpf: Add verifier tests for kptr · 05a945de
      Kumar Kartikeya Dwivedi authored
      Reuse bpf_prog_test functions to test the support for PTR_TO_BTF_ID in
      BPF map case, including some tests that verify implementation sanity and
      corner cases.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-13-memxor@gmail.com
    • selftests/bpf: Add C tests for kptr · 2cbc469a
      Kumar Kartikeya Dwivedi authored
      This uses the __kptr and __kptr_ref macros as well, and tries to test
      the stuff that is supposed to work, since we have negative tests in
      test_verifier suite. Also include some code to test map-in-map support,
      such that the inner_map_meta matches the kptr_off_tab of map added as
      element.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-12-memxor@gmail.com
    • libbpf: Add kptr type tag macros to bpf_helpers.h · ef89654f
      Kumar Kartikeya Dwivedi authored
      Include convenience definitions:
      __kptr:	Unreferenced kptr
      __kptr_ref: Referenced kptr
      
      Users can use them to tag the pointer type meant to be used with the new
      support directly in the map value definition.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-11-memxor@gmail.com
    • bpf: Make BTF type match stricter for release arguments · 2ab3b380
      Kumar Kartikeya Dwivedi authored
      The current behavior of btf_struct_ids_match for release arguments is
      that when type match fails, it retries with first member type again
      (recursively). Since the offset is already 0, this is akin to just
      casting the pointer in normal C, since if type matches it was just
      embedded inside parent struct as an object. However, we want to reject
      such cases for release function type matching, be it kfunc or BPF helpers.
      
      An example is the following:
      
      struct foo {
      	struct bar b;
      };
      
      struct foo *v = acq_foo();
      rel_bar(&v->b); // btf_struct_ids_match fails btf_types_are_same, then
      		// retries with first member type and succeeds, while
      		// it should fail.
      
      Hence, don't walk the struct and only rely on btf_types_are_same for
      strict mode. All users of strict mode must be dealing with zero offset
      anyway, since otherwise they would want the struct to be walked.
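
      The difference between the two modes can be sketched with a toy type
      table. The struct toy_type and ids_match() below are hypothetical and
      greatly simplified compared to btf_struct_ids_match; they only model
      "retry with the first member" versus "exact type only".

```c
#include <stdbool.h>

/* Toy type description: the type ID of the member at offset 0, or -1 if
 * the type has no struct member there. Types are indexed by their ID.
 */
struct toy_type {
	int first_member;
};

/* Non-strict mode walks into the first member until the IDs match.
 * Strict mode accepts only an exact match and never walks.
 */
bool ids_match(const struct toy_type *types, int have, int want, bool strict)
{
	while (have != want) {
		if (strict || types[have].first_member < 0)
			return false;
		have = types[have].first_member;
	}
	return true;
}
```

      With ID 0 standing for struct foo and ID 1 for struct bar from the
      example above, passing a foo where a bar is wanted matches in
      non-strict mode and is rejected in strict mode.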
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-10-memxor@gmail.com
    • bpf: Teach verifier about kptr_get kfunc helpers · a1ef1959
      Kumar Kartikeya Dwivedi authored
      We introduce a new style of kfunc helpers, namely *_kptr_get, where they
      take pointer to the map value which points to a referenced kernel
      pointer contained in the map. Since this is referenced, only
      bpf_kptr_xchg from BPF side and xchg from kernel side is allowed to
      change the current value, and each pointer that resides in that location
      would be referenced, and RCU protected (this must be kept in mind while
      adding kernel types embeddable as reference kptr in BPF maps).
      
      This means that if we do the load of the pointer value in an RCU read
      section, and find a live pointer, then as long as we hold RCU read lock,
      it won't be freed by a parallel xchg + release operation. This allows us
      to implement a safe refcount increment scheme. Hence, enforce that first
      argument of all such kfunc is a proper PTR_TO_MAP_VALUE pointing at the
      right offset to referenced pointer.
      
      For the rest of the arguments, they are subjected to typical kfunc
      argument checks, hence allowing some flexibility in passing more intent
      into how the reference should be taken.
      
      For instance, in case of struct nf_conn, it is not freed until RCU grace
      period ends, but can still be reused for another tuple once refcount has
      dropped to zero. Hence, a bpf_ct_kptr_get helper not only needs to call
      refcount_inc_not_zero, but also do a tuple match after incrementing the
      reference, and when it fails to match it, put the reference again and
      return NULL.
      
      This can be implemented easily if we allow passing additional parameters
      to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a
      tuple__sz pair.
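
      The "increment unless zero, then re-check identity" scheme can be
      sketched in userspace with C11 atomics. Everything here is a
      hypothetical analogy: struct conn, ref_inc_not_zero() and
      conn_kptr_get() are illustrative names, an int stands in for the
      tuple, and there is no RCU, so the sketch simply assumes the object's
      memory stays valid while it runs (which is what the RCU read section
      provides in the kernel).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
	atomic_int refcnt;
	int tuple;	/* stand-in for the connection tuple */
};

/* Take a reference only if the count has not already dropped to zero. */
bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Load the pointer from the slot, try to take a reference, then verify
 * that the object is still the one the caller asked for.
 */
struct conn *conn_kptr_get(_Atomic(struct conn *) *slot, int want_tuple)
{
	struct conn *c = atomic_load(slot);

	if (!c || !ref_inc_not_zero(&c->refcnt))
		return NULL;
	if (c->tuple != want_tuple) {
		atomic_fetch_sub(&c->refcnt, 1);	/* put the reference */
		return NULL;
	}
	return c;
}
```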
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-9-memxor@gmail.com
    • bpf: Wire up freeing of referenced kptr · 14a324f6
      Kumar Kartikeya Dwivedi authored
      A destructor kfunc can be defined as void func(type *), where type may
      be void or any other pointer type as per convenience.
      
      In this patch, we ensure that the type is sane and capture the function
      pointer into off_desc of ptr_off_tab for the specific pointer offset,
      with the invariant that the dtor pointer is always set when 'kptr_ref'
      tag is applied to the pointer's pointee type, which is indicated by the
      flag BPF_MAP_VALUE_OFF_F_REF.
      
      Note that only BTF IDs whose destructor kfunc is registered become
      the allowed BTF IDs for embedding as referenced kptr. Hence the
      registration serves the purpose of finding dtor kfunc BTF ID, as well as
      acting as a check against the whitelist of allowed BTF IDs for this purpose.
      
      Finally, wire up the actual freeing of the referenced pointer if any at
      all available offsets, so that no references are leaked after the BPF
      map goes away and the BPF program previously moved the ownership of a
      referenced pointer into it.
      
      The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
      will free any existing referenced kptr. The same case is with LRU map's
      bpf_lru_push_free/htab_lru_push_free functions, which are extended to
      reset unreferenced and free referenced kptr.
      
      Note that unlike BPF timers, kptr is not reset or freed when map uref
      drops to zero.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com
    • bpf: Populate pairs of btf_id and destructor kfunc in btf · 5ce937d6
      Kumar Kartikeya Dwivedi authored
      To support storing referenced PTR_TO_BTF_ID in maps, we require
      associating a specific BTF ID with a 'destructor' kfunc. This is because
      we need to release a live referenced pointer at a certain offset in map
      value from the map destruction path, otherwise we end up leaking
      resources.
      
      Hence, introduce support for passing an array of btf_id, kfunc_btf_id
      pairs that denote a BTF ID and its associated release function. Then,
      add an accessor 'btf_find_dtor_kfunc' which can be used to look up the
      destructor kfunc of a certain BTF ID. If found, we can use it to free
      the object from the map free path.
      
      The registration of these pairs also serves as a whitelist of structures
      which are allowed as referenced PTR_TO_BTF_ID in a BPF map, because
      without finding the destructor kfunc, we will bail and return an error.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-7-memxor@gmail.com
      5ce937d6
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Adapt copy_map_value for multiple offset case · 4d7d7f69
      Kumar Kartikeya Dwivedi authored
      Since now there might be at most 10 offsets that need handling in
      copy_map_value, the manual shuffling and special case is no longer going
      to work. Hence, let's generalise the copy_map_value function by using
      a sorted array of offsets to skip regions that must be avoided while
      copying into and out of a map value.
      
      When the map is created, we populate the offset array in struct map.
      Then, copy_map_value uses this sorted offset array to memcpy
      while skipping timer, spin lock, and kptr. The array is allocated
      separately, as in most cases none of these special fields would be
      present in map value, hence we can save on space for the common case by
      not embedding the entire object inside bpf_map struct.
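
The idea of copying around a sorted list of special regions can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's copy_map_value: `skip_region` and `copy_value_skipping` are hypothetical names, and the sketch assumes the regions are sorted by offset, do not overlap, and lie inside the value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor of a special region (timer, spin lock, kptr). */
struct skip_region {
	uint32_t off;
	uint32_t size;
};

/*
 * Copy value_size bytes from src to dst, leaving the bytes of every skip
 * region in dst untouched. Assumes regions are sorted by offset, do not
 * overlap, and lie within the value.
 */
static void copy_value_skipping(void *dst, const void *src, uint32_t value_size,
				const struct skip_region *skip, size_t cnt)
{
	uint32_t cur = 0;
	size_t i;

	for (i = 0; i < cnt; i++) {
		/* Copy the plain bytes between the previous region and this one. */
		memcpy((char *)dst + cur, (const char *)src + cur,
		       skip[i].off - cur);
		cur = skip[i].off + skip[i].size;
	}
	/* Copy the tail after the last special region. */
	memcpy((char *)dst + cur, (const char *)src + cur, value_size - cur);
}
```

With an empty region list the loop body never runs and the function degenerates to a single memcpy, which matches the common case of a map value without special fields.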
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-6-memxor@gmail.com
      4d7d7f69
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Prevent escaping of kptr loaded from maps · 6efe152d
      Kumar Kartikeya Dwivedi authored
      While we can guarantee that, even for unreferenced kptr, cases such as
      the object the pointer points to being freed can be handled by the
      verifier's exception handling (normal loads being patched to PROBE_MEM
      loads), we still cannot allow the user to pass these pointers to BPF
      helpers and kfuncs, because the same exception handling won't be done for
      accesses inside the kernel. The same is true if a referenced pointer is
      loaded using a normal load instruction. Since the reference is not
      guaranteed to be held while the pointer is used, it must be marked as
      untrusted.
      
      Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
      all registers loading unreferenced and referenced kptr from BPF maps,
      and ensure they can never escape the BPF program into the kernel by
      way of calling stable/unstable helpers.
      
      In check_ptr_to_btf_access, the !type_may_be_null check to reject type
      flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
      MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
      two are checked inside the function and rejected using a proper error
      message, but we still want to allow dereference in the untrusted case.
      
      Also, we make sure to inherit PTR_UNTRUSTED when a chain of pointers is
      walked, so that this flag is never dropped once it has been set on a
      PTR_TO_BTF_ID (i.e. the trusted to untrusted transition can only happen
      in one direction).
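
The one-way nature of this flag can be illustrated with a small userspace C sketch. It is not the verifier's code: the type encoding, `FLAG_UNTRUSTED`, `walk_ptr_type` and `may_pass_to_helper` are hypothetical stand-ins chosen only to show that the flag is copied from parent to child on every pointer walk and is never cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: low byte is the base type, higher bits are flags. */
#define BASE_TYPE_MASK  0xffu
#define PTR_TO_OBJ      2u
#define FLAG_UNTRUSTED  (1u << 9)

/* Strip the flags and return only the base type. */
static uint32_t base_of(uint32_t type)
{
	return type & BASE_TYPE_MASK;
}

/*
 * Type of the register produced by loading a pointer field through a
 * register of type parent. The untrusted flag is inherited, so it can
 * be gained during a walk but never lost.
 */
static uint32_t walk_ptr_type(uint32_t parent)
{
	return PTR_TO_OBJ | (parent & FLAG_UNTRUSTED);
}

/* Untrusted pointers may be dereferenced but not passed into the kernel. */
static bool may_pass_to_helper(uint32_t type)
{
	return !(type & FLAG_UNTRUSTED);
}
```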
      
      In convert_ctx_accesses, extend the switch case to consider untrusted
      PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
      conversion for BPF_LDX.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-5-memxor@gmail.com
      6efe152d
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Allow storing referenced kptr in map · c0a5a21c
      Kumar Kartikeya Dwivedi authored
      Extending the code in previous commits, introduce referenced kptr
      support, which needs to be tagged using the 'kptr_ref' tag instead. Unlike
      unreferenced kptr, referenced kptr have a lot more restrictions. In
      addition to the type matching, only a newly introduced bpf_kptr_xchg
      helper is allowed to modify the map value at that offset. This transfers
      the referenced pointer being stored into the map, releasing the
      reference state for the program, and returns the old value,
      creating new reference state for the returned pointer.
      
      Similar to the unreferenced pointer case, the return value for this case
      will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
      must either be eventually released by calling the corresponding release
      function, or be transferred into another map.
      
      It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
      the value, and obtain the old value if any.
      
      BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
      commit will permit using BPF_LDX for such pointers, while attempting to
      make it safe, since the lifetime of the object won't be guaranteed.
      
      There are valid reasons to enforce the restriction of permitting only
      bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
      consistent in the face of concurrent modification, and any prior values
      contained in the map must also be released before a new one is moved
      into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
      returns the old value, which the verifier would require the user to
      either free or move into another map, and releases the reference held
      for the pointer being moved in.
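
The ownership rule behind this exchange can be sketched in userspace C with C11 atomics. This is an analogy under assumptions, not the helper itself: `struct object`, `struct slot`, `slot_xchg` and `object_put` are hypothetical, and the manual refcount stands in for whatever release function the real object type has.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a kernel object. */
struct object {
	int refcount;
};

/* Hypothetical map value slot that owns at most one object reference. */
struct slot {
	_Atomic(struct object *) ptr;
};

/*
 * Atomically move new_obj (which may be NULL) into the slot. The slot
 * takes over the caller's reference to new_obj, and the caller receives
 * ownership of whatever was stored before (possibly NULL).
 */
static struct object *slot_xchg(struct slot *s, struct object *new_obj)
{
	return atomic_exchange(&s->ptr, new_obj);
}

/* Drop a reference; the caller must do this for every non-NULL old value. */
static void object_put(struct object *obj)
{
	if (obj)
		obj->refcount--;
}
```

Because the old value is always handed back, a concurrent writer can never silently overwrite a stored reference; each caller either releases what it got or moves it elsewhere.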
      
      In the future, direct BPF_XCHG instruction may also be permitted to work
      like bpf_kptr_xchg helper.
      
      Note that process_kptr_func doesn't have to call
      check_helper_mem_access, since we already disallow rdonly/wronly flags
      for map, which is what check_map_access_type checks, and we already
      ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
      so check_map_access is also not required.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
      c0a5a21c
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Tag argument to be released in bpf_func_proto · 8f14852e
      Kumar Kartikeya Dwivedi authored
      Add a new type flag for bpf_arg_type that, when set, tells the verifier
      that for a release function, that argument's register will be the one for
      which meta.ref_obj_id will be set, and which will then be released
      using release_reference. To capture the regno, introduce a new field
      release_regno in bpf_call_arg_meta.
      
      This is required in the next patch, where we may either pass NULL
      or a refcounted pointer as an argument to the release function
      bpf_kptr_xchg. Releasing only when meta.ref_obj_id is set is not
      enough, as there is a case where the type of argument needed matches,
      but the ref_obj_id is set to 0. Hence, we must enforce that whenever
      meta.ref_obj_id is zero, the register that is to be released can only
      be NULL for a release function.
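
The rule stated above can be modelled with a short userspace C sketch. It does not reproduce the verifier's data structures: `arg_state` and `find_release_regno` are hypothetical, and registers are numbered from 1 purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-argument state, loosely modelled on the description. */
struct arg_state {
	bool is_release_arg;  /* argument is tagged "to be released" */
	bool is_null;         /* register is known to hold NULL */
	uint32_t ref_obj_id;  /* non-zero when a tracked reference is held */
};

/*
 * Return the 1-based register number whose reference must be released,
 * 0 when the release argument is NULL (nothing to release), or -1 when
 * the call must be rejected.
 */
static int find_release_regno(const struct arg_state *args, int nargs)
{
	bool seen_release_arg = false;
	int i, release_regno = 0;

	for (i = 0; i < nargs; i++) {
		if (!args[i].is_release_arg)
			continue;
		/* Only one release argument is supported. */
		if (seen_release_arg)
			return -1;
		seen_release_arg = true;
		if (args[i].ref_obj_id)
			release_regno = i + 1;
		else if (!args[i].is_null)
			return -1; /* unreferenced, non-NULL pointer */
	}
	return release_regno;
}
```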
      
      Since we now indicate whether an argument is to be released in
      bpf_func_proto itself, the is_release_function helper has lost its utility,
      hence refactor code to work without it, and just rely on
      meta.release_regno to know when to release state for a ref_obj_id.
      Still, the restriction of one release argument and only one ref_obj_id
      passed to BPF helper or kfunc remains. This may be lifted in the future.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
      8f14852e
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Allow storing unreferenced kptr in map · 61df10c7
      Kumar Kartikeya Dwivedi authored
      This commit introduces a new pointer type 'kptr' which can be embedded
      in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
      its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID
      register must have the same type as in the map value's BTF, and loading
      a kptr marks the destination register as PTR_TO_BTF_ID with the correct
      kernel BTF and BTF ID.
      
      Such kptr are unreferenced, i.e. by the time another invocation of the
      BPF program loads this pointer, the object which the pointer points to
      may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
      patched to PROBE_MEM loads by the verifier, it would be safe to allow the
      user to still access such an invalid pointer, but passing such pointers into
      BPF helpers and kfuncs should not be permitted. A future patch in this
      series will close this gap.
      
      The flexibility offered by allowing programs to dereference such invalid
      pointers while being safe at runtime frees the verifier from doing
      complex lifetime tracking. As long as the user ensures that the
      object remains valid, the data it reads from the kernel object is
      also valid.
      
      The user indicates that a certain pointer must be treated as kptr
      capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
      a BTF type tag 'kptr' on the pointed-to type of the pointer. Then, this
      information is recorded in the object BTF which will be passed into the
      kernel by way of map's BTF information. The name and kind from the map
      value BTF are used to look up the in-kernel type, and the actual BTF and
      BTF ID are recorded in the map struct in a new kptr_off_tab member. For
      now, only storing pointers to structs is permitted.
      
      An example of this specification is shown below:
      
      	#define __kptr __attribute__((btf_type_tag("kptr")))
      
      	struct map_value {
      		...
      		struct task_struct __kptr *task;
      		...
      	};
      
      Then, in a BPF program, the user may store PTR_TO_BTF_ID with the type
      task_struct into the map, and then load it later.
      
      Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL. As
      the verifier cannot know statically whether the value is NULL, it
      must treat all potential loads at that map value offset as loading a
      possibly NULL pointer.
      
      Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL)
      are allowed instructions that can access such a pointer. On BPF_LDX, the
      destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
      it is checked whether the source register type is a PTR_TO_BTF_ID with
      same BTF type as specified in the map BTF. The access size must always
      be BPF_DW.
      
      For the map in map support, the kptr_off_tab for outer map is copied
      from the inner map's kptr_off_tab. It was chosen to do a deep copy
      instead of introducing a refcount to kptr_off_tab, because the copy only
      needs to be done when parameterizing using inner_map_fd in the map in map
      case, hence would be unnecessary for all other users.
      
      It is not permitted to use MAP_FREEZE command and mmap for BPF map
      having kptrs, similar to the bpf_timer case. A kptr also requires that
      BPF program has both read and write access to the map (hence both
      BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed).
      
      Note that check_map_access must be called from both
      check_helper_mem_access and for the BPF instructions, hence the kptr
      check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and
      reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src
      and reuse it for this purpose.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
      61df10c7
    • Stanislav Fomichev's avatar
      bpf: Use bpf_prog_run_array_cg_flags everywhere · d9d31cf8
      Stanislav Fomichev authored
      Rename bpf_prog_run_array_cg_flags to bpf_prog_run_array_cg and
      use it everywhere. check_return_code already enforces sane
      return ranges for all cgroup types. (Only egress and bind hooks have
      non-canonical return ranges; the rest use [0, 1].)
      
      No functional changes.
      
      v2:
      - 'func_ret & 1' under explicit test (Andrii & Martin)
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220425220448.3669032-1-sdf@google.com
      d9d31cf8
  2. 25 Apr, 2022 3 commits
  3. 22 Apr, 2022 4 commits
  4. 21 Apr, 2022 6 commits