1. 21 Sep, 2022 8 commits
    • Merge branch 'bpf: Add user-space-publisher ring buffer map type' · c12a0376
      Andrii Nakryiko authored
      David Vernet says:
      
      ====================
      This patch set defines a new map type, BPF_MAP_TYPE_USER_RINGBUF, which
      provides single-user-space-producer / single-kernel-consumer semantics over
      a ring buffer.  Along with the new map type, a helper function called
      bpf_user_ringbuf_drain() is added which allows a BPF program to specify a
      callback with the following signature, to which samples are posted by the
      helper:
      
      void (struct bpf_dynptr *dynptr, void *context);
      
      The program can then use the bpf_dynptr_read() or bpf_dynptr_data() helper
      functions to safely read the sample from the dynptr. There are currently no
      helpers available to determine the size of the sample, but one could easily
      be added if required.
      
      On the user-space side, libbpf has been updated to export a new
      'struct ring_buffer_user' type, along with the following symbols:
      
      struct ring_buffer_user *
      ring_buffer_user__new(int map_fd,
                            const struct ring_buffer_user_opts *opts);
      void ring_buffer_user__free(struct ring_buffer_user *rb);
      void *ring_buffer_user__reserve(struct ring_buffer_user *rb,
      				uint32_t size);
      void *ring_buffer_user__poll(struct ring_buffer_user *rb, uint32_t size,
      			     int timeout_ms);
      void ring_buffer_user__discard(struct ring_buffer_user *rb, void *sample);
      void ring_buffer_user__submit(struct ring_buffer_user *rb, void *sample);
      
      These symbols are exported for inclusion in libbpf version 1.0.0.
      Signed-off-by: David Vernet <void@manifault.com>
      ---
      v5 -> v6:
      - Fixed s/BPF_MAP_TYPE_RINGBUF/BPF_MAP_TYPE_USER_RINGBUF typo in the
        libbpf user ringbuf doxygen header comment for ring_buffer_user__new()
        (Andrii).
      - Specify that pointer returned from ring_buffer_user__reserve() and its
        blocking counterpart is 8-byte aligned (Andrii).
      - Renamed user_ringbuf__commit() to user_ringbuf_commit(), as it's static
        (Andrii).
      - Another slight reworking of user_ring_buffer__reserve_blocking() to
        remove some extraneous nanosecond variables + checking (Andrii).
      - Add a final check of user_ring_buffer__reserve() in
        user_ring_buffer__reserve_blocking().
      - Moved busy bit lock / unlock logic from __bpf_user_ringbuf_peek() to
        bpf_user_ringbuf_drain() (Andrii).
      - -ENOSPC -> -ENODATA for an empty ring buffer in
        __bpf_user_ringbuf_peek() (Andrii).
      - Updated BPF_RB_FORCE_WAKEUP to force a wakeup notification to be sent
        even if no sample was drained.
      - Changed a bit of the wording in the UAPI header for
        bpf_user_ringbuf_drain() to mention the BPF_RB_FORCE_WAKEUP behavior.
      - Remove extra space after return in ringbuf_map_poll_user() (Andrii).
      - Removed now-extraneous paragraph from the commit summary of patch 2/4
        (Andrii).
      v4 -> v5:
      - DENYLISTed the user-ringbuf test suite on s390x. We have a number of
        functions in the progs/user_ringbuf_success.c prog that user-space
        fires by invoking a syscall. Not all of these syscalls are available
        on s390x. If and when we add the ability to kick the kernel from
        user-space, or if we end up using iterators for that per Hao's
        suggestion, we could re-enable this test suite on s390x.
      - Fixed a few more places that needed ringbuffer -> ring buffer.
      v3 -> v4:
      - Update BPF_MAX_USER_RINGBUF_SAMPLES to not specify a bit, and instead
        just specify a number of samples. (Andrii)
      - Update "ringbuffer" in comments and commit summaries to say "ring
        buffer". (Andrii)
      - Return -E2BIG from bpf_user_ringbuf_drain() both when a sample can't
        fit into the ring buffer, and when it can't fit into a dynptr. (Andrii)
      - Don't loop over samples in __bpf_user_ringbuf_peek() if a sample was
        discarded. Instead, return -EAGAIN so the caller can deal with it. Also
        updated the caller to detect -EAGAIN and skip over it when iterating.
        (Andrii)
      - Removed the heuristic for notifying user-space when a sample is drained,
        causing the ring buffer to no longer be full. This may be useful in the
        future, but is being removed now because it's strictly a heuristic.
      - Re-add BPF_RB_FORCE_WAKEUP flag to bpf_user_ringbuf_drain(). (Andrii)
      - Remove helper_allocated_dynptr tracker from verifier. (Andrii)
      - Add libbpf function header comments to tools/lib/bpf/libbpf.h, so that
        they will be included in rendered libbpf docs. (Andrii)
      - Add symbols to a new LIBBPF_1.1.0 section in linker version script,
        rather than including them in LIBBPF_1.0.0. (Andrii)
      - Remove libbpf_err() calls from static libbpf functions. (Andrii)
      - Check user_ring_buffer_opts instead of ring_buffer_opts in
        user_ring_buffer__new(). (Andrii)
      - Avoid an extra if in the hot path in user_ringbuf__commit(). (Andrii)
      - Use ENOSPC rather than ENODATA if no space is available in the ring
        buffer. (Andrii)
      - Don't round sample size in header to 8, but still round size that is
        reserved and written to 8, and validate positions are multiples of 8
        (Andrii).
      - Use nanoseconds for most calculations in
        user_ring_buffer__reserve_blocking(). (Andrii)
      - Don't use CHECK() in testcases, instead use ASSERT_*. (Andrii)
      - Use SEC("?raw_tp") instead of SEC("?raw_tp/sys_nanosleep") in negative
        test. (Andrii)
      - Move test_user_ringbuf.h header to live next to BPF program instead of
        a directory up from both it and the user-space test program. (Andrii)
      - Update bpftool help message / docs to also include user_ringbuf.
      v2 -> v3:
      - Lots of formatting fixes, such as keeping things on one line if they fit
        within 100 characters, and removing some extraneous newlines. Applies
        to all diffs in the patch-set. (Andrii)
      - Renamed ring_buffer_user__* symbols to user_ring_buffer__*. (Andrii)
      - Added a missing smp_mb__before_atomic() in
        __bpf_user_ringbuf_sample_release(). (Hao)
      - Restructure how and when notification events are sent from the kernel to
        the user-space producers via the .map_poll() callback for the
        BPF_MAP_TYPE_USER_RINGBUF map. Before, we only sent a notification when
        the ringbuffer was fully drained. Now, we guarantee user-space that
        we'll send an event at least once per bpf_user_ringbuf_drain(), as long
        as at least one sample was drained, and BPF_RB_NO_WAKEUP was not passed.
        As a heuristic, we also send a notification event any time a sample being
        drained causes the ringbuffer to no longer be full. (Andrii)
      - Continuing on the above point, updated
        user_ring_buffer__reserve_blocking() to loop around epoll_wait() until a
        sufficiently large sample is found. (Andrii)
      - Communicate BPF_RINGBUF_BUSY_BIT and BPF_RINGBUF_DISCARD_BIT in sample
        headers. The ringbuffer implementation still only supports
        single-producer semantics, but we can now add synchronization support in
        user_ring_buffer__reserve(), and will automatically get multi-producer
        semantics. (Andrii)
      - Updated some commit summaries, specifically adding more details where
        warranted. (Andrii)
      - Improved function documentation for bpf_user_ringbuf_drain(), more
        clearly explaining all function arguments and return types, as well as
        the semantics for waking up user-space producers.
      - Add function header comments for user_ring_buffer__reserve{_blocking}().
        (Andrii)
      - Rounding-up all samples to 8-bytes in the user-space producer, and
        enforcing that all samples are properly aligned in the kernel. (Andrii)
      - Added testcases that verify that bpf_user_ringbuf_drain() properly
        validates samples, and returns error conditions if any invalid samples
        are encountered. (Andrii)
      - Move atomic_t busy field out of the consumer page, and into the
        struct bpf_ringbuf. (Andrii)
      - Split ringbuf_map_{mmap, poll}_{kern, user}() into separate
        implementations. (Andrii)
      - Don't silently consume errors in bpf_user_ringbuf_drain(). (Andrii)
      - Remove magic number of samples (4096) from bpf_user_ringbuf_drain(),
        and instead use BPF_MAX_USER_RINGBUF_SAMPLES macro, which allows
        128k samples. (Andrii)
      - Remove MEM_ALLOC modifier from PTR_TO_DYNPTR register in verifier, and
        instead rely solely on the register being PTR_TO_DYNPTR. (Andrii)
      - Move freeing of atomic_t busy bit to before we invoke irq_work_queue() in
        __bpf_user_ringbuf_sample_release(). (Andrii)
      - Only check for BPF_RB_NO_WAKEUP flag in bpf_user_ringbuf_drain().
      - Remove libbpf function names from kernel smp_{load, store}* comments in
        the kernel. (Andrii)
      - Don't use double-underscore naming convention in libbpf functions.
        (Andrii)
      - Use proper __u32 and __u64 for types where we need to guarantee their
        size. (Andrii)
      
      v1 -> v2:
      - Following Joanne landing 88374342 ("bpf: Fix ref_obj_id for dynptr
        data slices in verifier") [0], removed [PATCH 1/5] bpf: Clear callee
        saved regs after updating REG0 [1]. (Joanne)
      - Following the above adjustment, updated check_helper_call() to not store
        a reference for bpf_dynptr_data() if the register containing the dynptr
        is of type MEM_ALLOC. (Joanne)
      - Fixed casting issue pointed out by kernel test robot by adding a missing
        (uintptr_t) cast. (lkp)
      
      [0] https://lore.kernel.org/all/20220809214055.4050604-1-joannelkoong@gmail.com/
      [1] https://lore.kernel.org/all/20220808155341.2479054-1-void@manifault.com/
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Add selftests validating the user ringbuf · e5a9df51
      David Vernet authored
      This change includes selftests that validate the expected behavior and
      APIs of the new BPF_MAP_TYPE_USER_RINGBUF map type.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-5-void@manifault.com
    • bpf: Add libbpf logic for user-space ring buffer · b66ccae0
      David Vernet authored
      Now that all of the logic is in place in the kernel to support user-space
      produced ring buffers, we can add the user-space logic to libbpf. This
      patch therefore adds the following public symbols to libbpf:
      
      struct user_ring_buffer *
      user_ring_buffer__new(int map_fd,
      		      const struct user_ring_buffer_opts *opts);
      void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
      void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                               __u32 size, int timeout_ms);
      void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
      void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
      void user_ring_buffer__free(struct user_ring_buffer *rb);
      
      A user-space producer must first create a struct user_ring_buffer * object
      with user_ring_buffer__new(), and can then reserve samples in the
      ring buffer using one of the following two symbols:
      
      void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
      void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                               __u32 size, int timeout_ms);
      
      With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
      buffer will be returned if sufficient space is available in the buffer.
      user_ring_buffer__reserve_blocking() provides similar semantics, but will
      block for up to 'timeout_ms' in epoll_wait if there is insufficient space
      in the buffer. This function has the guarantee from the kernel that it will
      receive at least one event notification per invocation of
      bpf_user_ringbuf_drain(), provided that at least one sample is drained, and
      the BPF program did not pass the BPF_RB_NO_WAKEUP flag to
      bpf_user_ringbuf_drain().
      
      Once a sample is reserved, it must either be committed to the ring buffer
      with user_ring_buffer__submit(), or discarded with
      user_ring_buffer__discard().
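
      As a rough illustration of the reserve semantics described above, the
      space accounting a single producer might perform can be sketched in
      plain C. This is a toy model, not the libbpf implementation: every
      toy_* name, the 64-byte capacity, and the 8-byte header size are
      assumptions made for the example. It only mirrors the idea that a
      reservation succeeds when enough space is free, with reserved sizes
      rounded up to a multiple of 8.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define TOY_RB_SIZE  64u /* hypothetical capacity in bytes */
      #define TOY_HDR_SIZE 8u  /* hypothetical per-sample header size */

      /* Hypothetical positions. Both only ever grow, so their difference
       * is the number of bytes currently in flight. */
      struct toy_ringbuf {
      	uint64_t producer_pos; /* advanced by the user-space producer */
      	uint64_t consumer_pos; /* advanced by the consumer */
      };

      /* Round a byte count up to the next multiple of 8. */
      static uint32_t toy_round_up8(uint32_t n)
      {
      	return (n + 7u) & ~7u;
      }

      /* Try to reserve room for a 'size'-byte sample. On success the
       * producer position is advanced and true is returned; otherwise
       * nothing changes and false is returned. */
      static bool toy_reserve(struct toy_ringbuf *rb, uint32_t size)
      {
      	uint64_t in_flight = rb->producer_pos - rb->consumer_pos;
      	uint32_t total;

      	assert(in_flight <= TOY_RB_SIZE);
      	if (size > TOY_RB_SIZE - TOY_HDR_SIZE)
      		return false; /* can never fit, even in an empty buffer */

      	total = toy_round_up8(size + TOY_HDR_SIZE);
      	if (total > TOY_RB_SIZE - in_flight)
      		return false; /* not enough free space right now */

      	rb->producer_pos += total;
      	return true;
      }
      ```

      In this model a blocking variant would simply retry toy_reserve()
      after waiting for the consumer to advance consumer_pos.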
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
    • bpf: Add bpf_user_ringbuf_drain() helper · 20571567
      David Vernet authored
      In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
      will allow user-space applications to publish messages to a ring buffer
      that is consumed by a BPF program in kernel-space. In order for this
      map-type to be useful, it will require a BPF helper function that BPF
      programs can invoke to drain samples from the ring buffer, and invoke
      callbacks on those samples. This change adds that capability via a new BPF
      helper function:
      
      bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                             u64 flags)
      
      BPF programs may invoke this function to run callback_fn() on a series of
      samples in the ring buffer. callback_fn() has the following signature:
      
      long callback_fn(struct bpf_dynptr *dynptr, void *context);
      
      Samples are provided to the callback in the form of struct bpf_dynptr *'s,
      which the program can read using BPF helper functions for querying
      struct bpf_dynptr's.
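
      For illustration only, the callback contract can be modelled in plain
      C outside the kernel. Everything prefixed toy_ below is hypothetical,
      samples are plain integers rather than dynptrs, and the convention
      that a non-zero return value stops the drain is an assumption of this
      sketch rather than something stated in this message.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical callback type, loosely shaped like callback_fn above. */
      typedef long (*toy_sample_cb)(const uint32_t *sample, void *ctx);

      /* Hypothetical context: a running sum and a limit that stops draining. */
      struct toy_sum_ctx {
      	uint64_t sum;
      	uint64_t limit;
      };

      /* Example callback: add the sample, return 1 once the limit is reached. */
      static long toy_sum_cb(const uint32_t *sample, void *ctx)
      {
      	struct toy_sum_ctx *c = ctx;

      	c->sum += *sample;
      	return c->sum >= c->limit ? 1 : 0;
      }

      /* Invoke cb on each of the n samples in order and stop early when it
       * returns non-zero. Returns how many samples were handed to cb. */
      static long toy_drain(const uint32_t *samples, size_t n,
      		      toy_sample_cb cb, void *ctx)
      {
      	long drained = 0;

      	assert(cb != NULL);
      	for (size_t i = 0; i < n; i++) {
      		drained++;
      		if (cb(&samples[i], ctx) != 0)
      			break;
      	}
      	return drained;
      }
      ```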
      
      In order to support bpf_user_ringbuf_drain(), a new PTR_TO_DYNPTR register
      type is added to the verifier to reflect a dynptr that was allocated by
      a helper function and passed to a BPF program. Unlike PTR_TO_STACK
      dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
      dynptrs need not use reference tracking, as the BPF helper is trusted to
      properly free the dynptr before returning. The verifier currently only
      supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.
      
      Note that while the corresponding user-space libbpf logic will be added
      in a subsequent patch, this patch does contain an implementation of the
      .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
      .map_poll() callback guarantees that an epoll-waiting user-space
      producer will receive at least one event notification whenever at least
      one sample is drained in an invocation of bpf_user_ringbuf_drain(),
      provided that the function is not invoked with the BPF_RB_NO_WAKEUP
      flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
      notification is sent even if no sample was drained.
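
      The wakeup rules above can be summarised as a small decision
      function. This is a sketch under stated assumptions, not the kernel
      code: the TOY_* flag names and values are stand-ins defined only for
      this example, and letting the force flag win when both flags are set
      is a choice made here for illustration.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical stand-ins for the two wakeup flags named above. */
      #define TOY_RB_NO_WAKEUP    (1ULL << 0)
      #define TOY_RB_FORCE_WAKEUP (1ULL << 1)

      /* Decide whether a drain call should notify a waiting producer. */
      static bool toy_should_notify(uint64_t flags, uint32_t samples_drained)
      {
      	/* Toy precondition: only the two flags above are understood. */
      	assert((flags & ~(TOY_RB_NO_WAKEUP | TOY_RB_FORCE_WAKEUP)) == 0);

      	if (flags & TOY_RB_FORCE_WAKEUP)
      		return true;  /* notify even if nothing was drained */
      	if (flags & TOY_RB_NO_WAKEUP)
      		return false; /* caller asked for no notification */
      	return samples_drained > 0; /* default: notify if work was done */
      }
      ```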
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
    • bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type · 583c1f42
      David Vernet authored
      We want to support a ringbuf map type where samples are published from
      user-space, to be consumed by BPF programs. BPF currently supports a
      kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
      map type.  We'll need to define a new map type for user-space -> kernel,
      as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
      to a user-space producer ring buffer, and we'll want to add one or
      more helper functions that would not apply for a kernel-producer
      ring buffer.
      
      This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
      definition. The map type is useless in its current form, as there is no
      way to access or use it for anything until we add one or more BPF helpers. A
      follow-on patch will therefore add a new helper function that allows BPF
      programs to run callbacks on samples that are published to the ring
      buffer.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
    • bpf: simplify code in btf_parse_hdr · 3a74904c
      William Dean authored
      Return the result of btf_check_sec_info() directly to simplify the code.
      Signed-off-by: William Dean <williamsukatube@163.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220917084248.3649-1-williamsukatube@163.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data · 7620bffb
      Xin Liu authored
      We found that btf_dump__dump_type_data() can be called by the user as
      an API, but in this function the `opts` parameter may be a NULL
      pointer. Dereferencing `opts->indent_str` then triggers a NULL pointer
      exception.
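
      The general shape of such a guard can be sketched as follows. This is
      a minimal, self-contained illustration and not the actual libbpf
      change: struct toy_dump_opts, toy_indent_str(), and the tab default
      are all hypothetical names and values chosen for the example.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical optional-settings struct with one string field. */
      struct toy_dump_opts {
      	const char *indent_str;
      };

      /* Return the indent string to use, tolerating both a NULL opts
       * pointer and an unset field by falling back to a default. */
      static const char *toy_indent_str(const struct toy_dump_opts *opts)
      {
      	if (opts == NULL || opts->indent_str == NULL)
      		return "\t"; /* assumed default for this sketch */
      	return opts->indent_str;
      }
      ```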
      
      Fixes: 2ce8450e ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
      Signed-off-by: Xin Liu <liuxin350@huawei.com>
      Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
    • samples/bpf: Replace blk_account_io_done() with __blk_account_io_done() · bc069da6
      Rong Tao authored
      Since commit be6bfe36 ("block: inline hot paths of blk_account_io_*()"),
      the blk_account_io_*() functions have been inline functions.
      Signed-off-by: Rong Tao <rtoax@foxmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/tencent_1CC476835C219FACD84B6715F0D785517E07@qq.com
  2. 20 Sep, 2022 5 commits
  3. 19 Sep, 2022 1 commit
  4. 17 Sep, 2022 1 commit
  5. 16 Sep, 2022 8 commits
  6. 15 Sep, 2022 1 commit
    • bpf: Add verifier check for BPF_PTR_POISON retval and arg · 47e34cb7
      Dave Marchevsky authored
      BPF_PTR_POISON was added in commit c0a5a21c ("bpf: Allow storing
      referenced kptr in map") to denote a bpf_func_proto btf_id which the
      verifier will replace with a dynamically-determined btf_id at verification
      time.
      
      This patch adds verifier 'poison' functionality to BPF_PTR_POISON in
      order to prepare for expanded use of the value to poison ret- and
      arg-btf_id in ongoing work, namely rbtree and linked list patchsets
      [0, 1]. Specifically, when the verifier checks helper calls, it assumes
      that BPF_PTR_POISON'ed ret type will be replaced with a valid type before
      - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id.
      
      If poisoned btf_id reaches default handling block for either, consider
      this a verifier internal error and fail verification. Otherwise a helper
      w/ poisoned btf_id but no verifier logic replacing the type will cause a
      crash as the invalid pointer is dereferenced.
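
      The idea of failing cleanly instead of dereferencing a poisoned
      pointer can be sketched in isolation. This is a toy model, not the
      verifier code: TOY_PTR_POISON, its sentinel value, the function name,
      and the choice of -EINVAL are all assumptions made for the example.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical sentinel marking "to be replaced later"; it must never
       * be dereferenced. */
      #define TOY_PTR_POISON ((const uint32_t *)(uintptr_t)0xeB9F)

      /* Default handling: read the id through the pointer, but treat a
       * poison value that nobody replaced as an internal error. */
      static int toy_resolve_btf_id(const uint32_t *btf_id, uint32_t *out)
      {
      	assert(out != NULL);
      	if (btf_id == TOY_PTR_POISON)
      		return -EINVAL; /* poison reached the default path */
      	if (btf_id == NULL)
      		return -EINVAL; /* nothing to resolve */

      	*out = *btf_id;
      	return 0;
      }
      ```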
      
      Also move BPF_PTR_POISON to existing include/linux/poison.h header and
      remove unnecessary shift.
      
        [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
        [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  7. 11 Sep, 2022 9 commits
  8. 10 Sep, 2022 2 commits
  9. 09 Sep, 2022 5 commits