1. 11 Mar, 2023 4 commits
    • Merge branch 'Support stashing local kptrs with bpf_kptr_xchg' · 49b5300f
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      
      Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
      BTF. A BPF program which creates a local kptr has exclusive control of the
      lifetime of the kptr, and, prior to terminating, must:
      
        * free the kptr via bpf_obj_drop
        * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree},
          thereby passing control of the lifetime to the collection
      
      This series adds a third option:
      
        * stash the kptr in a map value using bpf_kptr_xchg
      
      As indicated by the use of "stash" to describe this behavior, the intended use
      of this feature is temporary storage of local kptrs. For example, a sched_ext
      ([0]) scheduler may want to create an rbtree node for each new cgroup on cgroup
      init, but to add that node to the rbtree as part of a separate program which
      runs on enqueue. Stashing the node in a map_value allows its lifetime to outlive
      the execution of the cgroup_init program.
      
      Behavior:
      
      There is no semantic difference between adding a kptr to a graph collection and
      "stashing" it in a map. In both cases exclusive ownership of the kptr's lifetime
      is passed to some containing data structure, which is responsible for
      bpf_obj_drop'ing it when the container goes away.
      
      Since graph collections also expect exclusive ownership of the nodes they
      contain, graph nodes cannot be both stashed in a map_value and contained by
      their corresponding collection.
      
      Implementation:
      
      Two observations simplify the verifier changes for this feature. First, kptrs
      ("referenced kptrs" until a recent renaming) require registration of a
      dtor function as part of their acquire/release semantics, so that a referenced
      kptr which is placed in a map_value is properly released when the map goes away.
      We want this exact behavior for local kptrs, but with bpf_obj_drop as the dtor
      instead of a per-btf_id dtor.
      
      The second observation is that, in terms of identification, "referenced kptr"
      and "local kptr" already don't interfere with one another. Consider the
      following example:
      
        struct node_data {
                long key;
                long data;
                struct bpf_rb_node node;
        };
      
        struct map_value {
                struct node_data __kptr *node;
        };
      
        struct {
                __uint(type, BPF_MAP_TYPE_ARRAY);
                __type(key, int);
                __type(value, struct map_value);
                __uint(max_entries, 1);
        } some_nodes SEC(".maps");
      
        struct map_value *mapval;
        struct node_data *res;
        int key = 0;
      
        res = bpf_obj_new(typeof(*res));
        if (!res) { /* err handling */ }
      
        mapval = bpf_map_lookup_elem(&some_nodes, &key);
        if (!mapval) { /* err handling */ }
      
        res = bpf_kptr_xchg(&mapval->node, res);
        if (res)
                bpf_obj_drop(res);
      
      The __kptr tag identifies map_value's node as a referenced kptr, while the
      PTR_TO_BTF_ID which bpf_obj_new returns - a type in some non-vmlinux,
      non-module BTF - identifies res as a local kptr. The type tag on the
      pointer indicates a referenced kptr, while the type of the pointee
      indicates a local kptr. So, using existing facilities, we can tell the
      verifier about a "referenced kptr" pointer to a "local kptr" pointee.
      
      When kptr_xchg'ing a kptr into a map_value, the verifier can recognize local
      kptr types and treat them like referenced kptrs with a properly-typed
      bpf_obj_drop as a dtor.
      
      Other implementation notes:
        * We don't need to do anything special to enforce "graph nodes cannot be
          both stashed in a map_value and contained by their corresponding collection"
          * bpf_kptr_xchg both returns and takes as input a (possibly-null) owning
            reference. It does not accept non-owning references as input by virtue
            of requiring a ref_obj_id. By definition, if a program has an owning
            ref to a node, the node isn't in a collection, so it's safe to pass
            ownership via bpf_kptr_xchg.
      
      Summary of patches:
      
        * Patch 1 modifies BTF plumbing to support using bpf_obj_drop as a dtor
        * Patch 2 adds verifier plumbing to support MEM_ALLOC-flagged param for
          bpf_kptr_xchg
        * Patch 3 adds selftests exercising the new behavior
      
      Changelog:
      
      v1 -> v2: https://lore.kernel.org/bpf/20230309180111.1618459-1-davemarchevsky@fb.com/
      
      Patch #s used below refer to the patch's position in v1 unless otherwise
      specified.
      
      Patches 1-3 were applied and are not included in v2.
      Rebase onto latest bpf-next: "libbpf: Revert poisoning of strlcpy"
      
      Patch 4: "bpf: Support __kptr to local kptrs"
        * Remove !btf_is_kernel(btf) check, WARN_ON_ONCE instead (Alexei)
      
      Patch 6: "selftests/bpf: Add local kptr stashing test"
        * Add test which stashes 2 nodes and later unstashes one of them using a
          separate BPF program (Alexei)
        * Fix incorrect runner subtest name for original test (was
          "rbtree_add_nodes")
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add local kptr stashing test · 5d8d6634
      Dave Marchevsky authored
      Add a new selftest, local_kptr_stash, which uses bpf_kptr_xchg to stash
      a bpf_obj_new-allocated object in a map. Test the following scenarios:
      
        * Stash two rb_nodes in an arraymap, don't unstash them, rely on map
          free to destruct them
        * Stash two rb_nodes in an arraymap, unstash the second one in a
          separate program, rely on map free to destruct first
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230310230743.2320707-4-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Allow local kptrs to be exchanged via bpf_kptr_xchg · 738c96d5
      Dave Marchevsky authored
      The previous patch added necessary plumbing for verifier and runtime to
      know what to do with non-kernel PTR_TO_BTF_IDs in map values, but didn't
      provide any way to get such local kptrs into a map value. This patch
      modifies verifier handling of bpf_kptr_xchg to allow MEM_ALLOC kptr
      types.
      
      check_reg_type is modified to accept MEM_ALLOC-flagged input to
      bpf_kptr_xchg despite such types not being in btf_ptr_types. This could
      have been done with a MAYBE_MEM_ALLOC equivalent to MAYBE_NULL, but
      bpf_kptr_xchg is the only helper that I can foresee using
      MAYBE_MEM_ALLOC, so keep it special-cased for now.
      
      The verifier tags bpf_kptr_xchg retval MEM_ALLOC if and only if the BTF
      associated with the retval is not kernel BTF.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230310230743.2320707-3-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Support __kptr to local kptrs · c8e18754
      Dave Marchevsky authored
      If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module
      BTF - it must have been allocated by bpf_obj_new and therefore must be
      free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local
      kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new.
      
      This patch adds support for treating __kptr-tagged pointers to "local
      kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr
      acquire / release semantics. Consider the following example:
      
        struct node_data {
                long key;
                long data;
                struct bpf_rb_node node;
        };
      
        struct map_value {
                struct node_data __kptr *node;
        };
      
        struct {
                __uint(type, BPF_MAP_TYPE_ARRAY);
                __type(key, int);
                __type(value, struct map_value);
                __uint(max_entries, 1);
        } some_nodes SEC(".maps");
      
      If struct node_data had a matching definition in kernel BTF, the verifier would
      expect a destructor for the type to be registered. Since struct node_data does
      not match any type in kernel BTF, the verifier knows that there is no kfunc
      that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can
      only come from bpf_obj_new. So instead of searching for a registered dtor,
      a bpf_obj_drop dtor can be assumed.
      
      This allows the runtime to properly destruct such kptrs in
      bpf_obj_free_fields, which enables maps to clean up map_vals w/ such
      kptrs when going away.
      
      Implementation notes:
        * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr.
          Before this patch, the variable would only ever point to vmlinux or
          module BTFs, but now it can point to some program BTF for local kptr
          type. It's later used to populate the (btf, btf_id) pair in kptr btf
          field.
        * It's necessary to btf_get the program BTF when populating btf_field
          for local kptr. btf_record_free later does a btf_put.
        * Behavior for non-local referenced kptrs is not modified, as
          bpf_find_btf_id helper only searches vmlinux and module BTFs for
          matching BTF type. If such a type is found, btf_field_kptr's btf will
          pass btf_is_kernel check, and the associated release function is
          some one-argument dtor. If btf_is_kernel check fails, associated
          release function is two-arg bpf_obj_drop_impl. Before this patch
          only btf_field_kptr's w/ kernel or module BTFs were created.
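
      The dtor-selection rule in the last note can be sketched as a small
      userspace simulation (a schematic only, not the actual kernel code;
      `struct btf`, the dtor names, and the counters below are stand-ins):

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Stand-in for a BTF handle; only tracks whether it is kernel BTF. */
      struct btf { bool is_kernel; };

      static bool btf_is_kernel(const struct btf *btf) { return btf->is_kernel; }

      /* The two dtor shapes described above: kernel types register a
       * one-argument dtor, while local kptrs get the two-argument
       * bpf_obj_drop-style dtor. */
      static int one_arg_calls, two_arg_calls;
      static void kernel_dtor(void *obj) { (void)obj; one_arg_calls++; }
      static void obj_drop_dtor(void *obj, const struct btf *btf)
      { (void)obj; (void)btf; two_arg_calls++; }

      /* Release a kptr field the way bpf_obj_free_fields would pick a dtor. */
      static void release_kptr(void *obj, const struct btf *field_btf)
      {
          if (btf_is_kernel(field_btf))
              kernel_dtor(obj);               /* registered per-btf_id dtor */
          else
              obj_drop_dtor(obj, field_btf);  /* implicit local-kptr dtor */
      }

      int main(void)
      {
          struct btf vmlinux_btf = { .is_kernel = true };
          struct btf prog_btf = { .is_kernel = false };
          int dummy;

          release_kptr(&dummy, &vmlinux_btf);
          release_kptr(&dummy, &prog_btf);
          assert(one_arg_calls == 1 && two_arg_calls == 1);
          return 0;
      }
      ```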
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 10 Mar, 2023 27 commits
  3. 09 Mar, 2023 9 commits
    • selftests/bpf: Use ifname instead of ifindex in XDP compliance test tool · 27a36bc3
      Lorenzo Bianconi authored
      Rely on interface name instead of interface index in error messages or
      logs from XDP compliance test tool.
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/7dc5a8ff56c252b1a7ae29b059d0b2b1543c8b5d.1678382940.git.lorenzo@kernel.org
    • bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h · 5a70f4a6
      Michael Weiß authored
      Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment
      for BPF_F_ANY_ALIGNMENT in bpf.h.
      Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de
    • selftests/bpf: Fix flaky fib_lookup test · a6865576
      Martin KaFai Lau authored
      There is a report that the fib_lookup test is flaky when run in parallel,
      a symptom of slowness or delay. An example:
      
      Testing IPv6 stale neigh
      set_lookup_params:PASS:inet_pton(IPV6_IFACE_ADDR) 0 nsec
      test_fib_lookup:PASS:bpf_prog_test_run_opts 0 nsec
      test_fib_lookup:FAIL:fib_lookup_ret unexpected fib_lookup_ret: actual 0 != expected 7
      test_fib_lookup:FAIL:dmac not match unexpected dmac not match: actual 1 != expected 0
      dmac expected 11:11:11:11:11:11 actual 00:00:00:00:00:00
      
      [ Note that the "fib_lookup_ret unexpected fib_lookup_ret: actual 0 ..."
        message has its expected and actual values reversed. That is also
        fixed in this patch. ]
      
      One possibility is that the stale neigh entry under test was marked dead
      by the gc (in neigh_periodic_work). The default gc_stale_time sysctl is
      60s. This patch increases it to 15 mins.
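
      For reference, the sysctl in question can be raised like this
      (illustrative values; 900 seconds = 15 mins, and the selftest may apply
      it differently, e.g. inside a dedicated netns):

      ```shell
      # Default is 60s; raise it so gc (neigh_periodic_work) does not reap
      # the stale neigh entry mid-test. Requires root.
      sysctl -w net.ipv4.neigh.default.gc_stale_time=900
      sysctl -w net.ipv6.neigh.default.gc_stale_time=900
      ```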
      
      It also:
      
      - fixes the reversed args (actual vs expected) in one of the
        ASSERT_EQ tests
      - removes the nodad command arg when adding the v4 neigh entry, which
        currently triggers a warning.
      
      Fixes: 168de023 ("selftests/bpf: Add bpf_fib_lookup test")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230309060244.3242491-1-martin.lau@linux.dev
    • Merge branch 'BPF open-coded iterators' · 23e403b3
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Add support for open-coded (aka inline) iterators in the BPF world. This is
      the next evolution of gradually allowing more powerful and less restrictive
      looping and iteration capabilities in BPF programs.
      
      We set up a framework for implementing all kinds of iterators (e.g., cgroup,
      task, file, etc, iterators), but this patch set only implements numbers
      iterator, which is used to implement ergonomic bpf_for() for-like construct
      (see patches #4-#5). We also add bpf_for_each(), which is a generic
      foreach-like construct that will work with any kind of open-coded iterator
      implementation, as long as we stick with bpf_iter_<type>_{new,next,destroy}()
      naming pattern (which we now enforce on the kernel side).
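
      The new/next/destroy contract can be illustrated with a plain userspace
      C analogue of the numbers iterator (a sketch only; the real kfuncs live
      in the kernel, and the struct layout and names here are invented):

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Userspace stand-in for `struct bpf_iter_num`: iterator state lives
       * on the caller's stack, mirroring the design in this series. */
      struct iter_num { int cur, end, val; };

      /* bpf_iter_<type>_new(): initialize the iterator state. */
      static int iter_num_new(struct iter_num *it, int start, int end)
      {
          if (start > end)
              return -1;
          it->cur = start;
          it->end = end;
          return 0;
      }

      /* bpf_iter_<type>_next(): return a pointer to the next element,
       * or NULL once the iterator is drained. */
      static int *iter_num_next(struct iter_num *it)
      {
          if (it->cur >= it->end)
              return NULL;
          it->val = it->cur++;
          return &it->val;
      }

      /* bpf_iter_<type>_destroy(): release resources; a no-op here. */
      static void iter_num_destroy(struct iter_num *it) { (void)it; }

      int main(void)
      {
          struct iter_num it;
          int *v, sum = 0;

          assert(iter_num_new(&it, 0, 5) == 0);
          while ((v = iter_num_next(&it)))
              sum += *v;
          iter_num_destroy(&it);
          assert(sum == 0 + 1 + 2 + 3 + 4);
          return 0;
      }
      ```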
      
      Patch #1 is preparatory refactoring for easier way to check for special kfunc
      calls. Patch #2 is adding iterator kfunc registration and validation logic,
      which is mostly independent from the rest of open-coded iterator logic, so is
      separated out for easier reviewing.
      
      The meat of verifier-side logic is in patch #3. Patch #4 implements numbers
      iterator. I kept them separate to have clean reference for how to integrate
      new iterator types (now even simpler to do than in v1 of this patch set).
      Patch #5 adds bpf_for(), bpf_for_each(), and bpf_repeat() macros to
      bpf_misc.h, and also adds yet another pyperf test variant, now with bpf_for()
      loop. Patch #6 is verification tests, based on numbers iterator (as the only
      available right now). Patch #7 actually tests runtime behavior of numbers
      iterator.
      
      Finally, with changes in v2, it's possible and trivial to implement custom
      iterators completely in kernel modules, which we showcase and test by adding
      a simple iterator returning the same number a given number of times to
      bpf_testmod. Patch #8 is where all this happens and is tested.
      
      Most of the relevant details are in corresponding commit messages or code
      comments.
      
      v4->v5:
        - fixing missed inner for() in is_iter_reg_valid_uninit and the
          erroneous return false (kernel test robot);
        - typo fixes and comment/commit description improvements throughout the
          patch set;
      v3->v4:
        - remove unused variable from is_iter_reg_valid_init (kernel test robot);
      v2->v3:
        - remove special kfunc leftovers for bpf_iter_num_{new,next,destroy};
        - add iters/testmod_seq* to DENYLIST.s390x, it doesn't support kfuncs in
          modules yet (CI);
      v1->v2:
        - rebased on latest, dropping previously landed preparatory patches;
        - each iterator type now has its own `struct bpf_iter_<type>`, which
          allows each iterator implementation to use exactly as much stack space
          as necessary, avoiding runtime allocations (Alexei);
        - reworked how iterator kfuncs are defined; no verifier changes are
          required when adding a new iterator type;
        - added bpf_testmod-based iterator implementation;
        - address the rest of feedback, comments, commit message adjustment, etc.
      
      Cc: Tejun Heo <tj@kernel.org>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: implement and test custom testmod_seq iterator · 7e86a8c4
      Andrii Nakryiko authored
      Implement a trivial iterator returning the same specified integer value
      N times as part of the bpf_testmod kernel module. Add selftests to
      validate that everything works end to end.
      
      We also reuse these tests as "verification-only" tests to validate that the
      kernel prints the state of a custom kernel module-defined iterator correctly:
      
        fp-16=iter_testmod_seq(ref_id=1,state=drained,depth=0)
      
      The "testmod_seq" part is the iterator type name, and comes from the
      module's BTF data dynamically at runtime.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-9-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add number iterator tests · f59b1460
      Andrii Nakryiko authored
      Add number iterator (bpf_iter_num_{new,next,destroy}()) tests,
      validating the correct handling of various corner and common cases
      *at runtime*.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-8-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add iterators tests · 57400dcc
      Andrii Nakryiko authored
      Add various tests for open-coded iterators. Some of them exercise
      various possible coding patterns in C; some go down to low-level
      assembly for more control over various conditions, especially invalid
      ones.
      
      We also make use of bpf_for(), bpf_for_each(), bpf_repeat() macros in
      some of these tests.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-7-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros · 8c2b5e90
      Andrii Nakryiko authored
      Add bpf_for_each(), bpf_for(), and bpf_repeat() macros that make writing
      open-coded iterator-based loops much more convenient and natural. These
      macros utilize the cleanup attribute to ensure proper destruction of the
      iterator and, thanks to that, manage to provide ergonomics very close to
      C's for() construct. A typical loop would look like:
      
        int i;
        int arr[N];
      
        bpf_for(i, 0, N) {
            /* verifier will know that i >= 0 && i < N, so could be used to
             * directly access array elements with no extra checks
             */
             arr[i] = i;
        }
      
      bpf_repeat() is very similar, but it doesn't expose the iteration number
      and is meant as a simple "repeat action N times" loop:
      
        bpf_repeat(N) { /* whatever, N times */ }
      
      Note that `break` and `continue` statements inside the {} block work as
      expected.
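
      The cleanup-attribute trick these macros rely on can be shown in plain C
      (a GCC/Clang extension; the names and the simplified loop macro below are
      invented for illustration, not the actual bpf_for() definition):

      ```c
      #include <assert.h>

      static int destroy_calls;

      struct iter { int cur, end; };

      static void iter_destroy(struct iter *it) { (void)it; destroy_calls++; }

      /* __attribute__((cleanup(f))) runs f(&var) whenever the variable goes
       * out of scope -- including via break or early return, which is how
       * bpf_for()/bpf_repeat() guarantee the iterator is always destroyed. */
      #define my_for(i, start, end)                                          \
          for (struct iter __it __attribute__((cleanup(iter_destroy))) =     \
                   { (start), (end) };                                       \
               __it.cur < __it.end && ((i) = __it.cur, 1);                   \
               __it.cur++)

      int main(void)
      {
          int i, sum = 0;

          my_for(i, 0, 10) {
              if (i == 5)
                  break;      /* iter_destroy still runs */
              sum += i;
          }
          assert(sum == 0 + 1 + 2 + 3 + 4);
          assert(destroy_calls == 1);
          return 0;
      }
      ```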
      
      bpf_for_each() is a generalization over any kind of BPF open-coded
      iterator, allowing a for-each-like approach instead of calling the
      low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
      
        struct cgroup *cg;
      
        bpf_for_each(cgroup, cg, some, input, args) {
            /* do something with each cg */
        }
      
      would call (not-yet-implemented) bpf_iter_cgroup_{new,next,destroy}()
      functions to form a loop over cgroups, where `some, input, args` are
      passed verbatim into the constructor as
      
        bpf_iter_cgroup_new(&it, some, input, args).
      
      As a first demonstration, add pyperf variant based on the bpf_for() loop.
      
      Also clean up a few tests that either included the bpf_misc.h header
      unnecessarily from user-space, which is unsupported, or included it
      before any common types were defined (potentially leading to unnecessary
      compilation warnings).
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>