1. 17 Aug, 2021 6 commits
  2. 16 Aug, 2021 18 commits
    • Merge branch 'bpf-perf-link' · 3a4ce01b
      Daniel Borkmann authored
      Andrii Nakryiko says:
      
      ====================
      This patch set implements an ability for users to specify custom black box u64
      value for each BPF program attachment, bpf_cookie, which is available to BPF
      program at runtime. This is a feature that's critically missing for cases when
      some sort of generic processing needs to be done by the common BPF program
      logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
      many uniformly handled kprobes) and it's important to be able to distinguish
      between each BPF hook at runtime (e.g., for additional configuration lookup).
      
      The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
      design decision. Making this configurable by users adds unnecessary complexity
      (extra memory allocations, extra complications on the verifier side to validate
      accesses to variable-sized data area) while not really opening up new
      possibilities. If user's use case requires storing more data per attachment,
      it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where
      bpf_cookie would be used as an index into respective storage, populated by
      user-space code before creating BPF link. This gives user all the flexibility
      and control while keeping BPF verifier and BPF helper API simple.
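
      The indexing scheme described above can be sketched in plain C. This is
      a user-space model only, not code from the patch set: struct attach_cfg,
      cfg_table, register_attachment() and lookup_cfg() are hypothetical names,
      and cfg_table stands in for a global array or ARRAY map that user space
      populates before creating each BPF link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-attachment configuration; not part of the patch set. */
struct attach_cfg {
	const char *label;
	uint32_t sample_rate;
};

#define MAX_ATTACH 4

/* Stands in for a global array or ARRAY map filled in by user space. */
static struct attach_cfg cfg_table[MAX_ATTACH];

/* User-space side: store the config, return the slot to pass as the cookie. */
static uint64_t register_attachment(size_t slot, const char *label, uint32_t rate)
{
	assert(slot < MAX_ATTACH);
	cfg_table[slot].label = label;
	cfg_table[slot].sample_rate = rate;
	return (uint64_t)slot;
}

/* Program side: the only per-attachment input is the 8-byte cookie. */
static const struct attach_cfg *lookup_cfg(uint64_t cookie)
{
	if (cookie >= MAX_ATTACH)
		return NULL; /* explicit bounds check before indexing */
	return &cfg_table[cookie];
}
```

      Because the cookie is only an index, the per-attachment data can grow to
      any size without changing how the cookie itself is passed.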
      
      Currently, similar functionality can only be achieved through:
      
        - code-generation and BPF program cloning, which is very complicated and
          unmaintainable;
        - on-the-fly C code generation and further runtime compilation, which is
          what BCC uses and allows to do pretty simply. The big downside is a very
          heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
          many BPF program clones and the compilation process itself);
        - in some cases (kprobes and sometimes uprobes) it's possible to do function
          IP lookup to get function-specific configuration. This doesn't work for
          all the cases (e.g., when attaching uprobes to shared libraries) and has
          higher runtime overhead and additional programming complexity due to
          BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
          BPF helper was added, it was also very complicated and unstable (API-wise)
          to get traced function's IP from fentry/fexit and kretprobe.
      
      With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
      to build generic tracing tooling simply and efficiently, ability to provide
      additional bpf_cookie value for each *attachment* (as opposed to each BPF
      program) is extremely important. Two immediate users of this functionality are
      going to be libbpf-based USDT library (currently in development) and retsnoop
      ([0]), but I'm sure more applications will come once users get this feature in
      their kernels.
      
      To achieve the above, all perf_event-based BPF hooks are made available
      through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows to use common
      LINK_CREATE command for program attachments and generally brings
      perf_event-based attachments into a common BPF link infrastructure.
      
      With that, LINK_CREATE gets ability to pass through bpf_cookie value during
      link creation (BPF program attachment) time. bpf_get_attach_cookie() BPF
      helper is added to allow fetching this value at runtime from BPF program side.
      BPF cookie is stored either on struct perf_event itself and fetched from the
      BPF program context, or is passed through ambient BPF run context, added in
      c7603cfa ("bpf: Add ambient BPF runtime context stored in current").
      
      On the libbpf side of things, BPF perf link is utilized whenever it is supported
      by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD.
      All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
      through corresponding opts structs.
      
      Last part of the patch set adds a few self-tests utilizing new APIs.
      
      There are also a few refactorings along the way to make things cleaner and
      easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
      throughout libbpf and selftests.
      
      Follow-up patches will extend bpf_cookie to fentry/fexit programs.
      
      While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
      USDT semaphore (reference counter) offset. Update attach_probe selftests to
      validate its functionality. This is another feature (along with bpf_cookie)
      required for implementing libbpf-based USDT solution.
      
        [0] https://github.com/anakryiko/retsnoop
      
      v4->v5:
        - rebase on latest bpf-next to resolve merge conflict;
        - add ref_ctr_offset to uprobe_opts and corresponding selftest;
      v3->v4:
        - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
        - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
      v2->v3:
        - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
        - fix BPF_LINK_TYPE_PERF_EVENT value (Jiri);
        - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
      v1->v2:
        - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: Add ref_ctr_offset selftests · 4bd11e08
      Andrii Nakryiko authored
      Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe
      and validate that its value is incremented from zero.
      
      Turns out that once uprobe is attached with ref_ctr_offset, uretprobe for the
      same location/function *has* to use ref_ctr_offset as well, otherwise
      perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for
      both uprobe and uretprobe, even though for the purpose of test uprobe would be
      enough.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org
    • libbpf: Add uprobe ref counter offset support for USDT semaphores · 5e3b8356
      Andrii Nakryiko authored
      When attaching to uprobes through perf subsystem, it's possible to specify
      offset of a so-called USDT semaphore, which is just a reference counted u16,
      used by kernel to keep track of how many tracers are attached to a given
      location. Support for this feature was added in [0], so just wire this through
      uprobe_opts. This is important to enable implementing USDT attachment and
      tracing through libbpf's bpf_program__attach_uprobe_opts() API.
      
        [0] a6ca88b2 ("trace_uprobe: support reference counter in fd-based uprobe")
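
      The reference-counting behaviour of a USDT semaphore can be modelled in
      a few lines of C. This is a simplified user-space sketch, not kernel or
      libbpf code: usdt_semaphore, tracer_attach(), tracer_detach() and
      probe_enabled() are hypothetical names, and the variable stands in for
      the u16 found at ref_ctr_offset in the traced binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the u16 located at ref_ctr_offset inside the traced binary. */
static uint16_t usdt_semaphore;

/* One more tracer is attached to the probe location. */
static void tracer_attach(uint16_t *sem)
{
	assert(*sem < UINT16_MAX);
	(*sem)++;
}

/* One tracer went away. */
static void tracer_detach(uint16_t *sem)
{
	assert(*sem > 0);
	(*sem)--;
}

/* The traced program can test the counter to see if anyone is listening. */
static bool probe_enabled(const uint16_t *sem)
{
	return *sem != 0;
}
```

      The counter starts at zero, which is why the selftest can validate that
      attaching with ref_ctr_offset makes the value non-zero.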
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
    • selftests/bpf: Add bpf_cookie selftests for high-level APIs · 0a80cf67
      Andrii Nakryiko authored
      Add selftest with a few subtests testing proper bpf_cookie usage.
      
      Kprobe and uprobe subtests are pretty straightforward and just validate that
      the same BPF program attached with different bpf_cookie will be triggered with
      those different bpf_cookie values.
      
      Tracepoint subtest is a bit more interesting, as it is the only
      perf_event-based BPF hook that shares bpf_prog_array between multiple
      perf_events internally. This means that the same BPF program can't be attached
      to the same tracepoint multiple times. So we have 3 identical copies. This
      arrangement allows to test bpf_prog_array_copy()'s handling of bpf_prog_array
      list manipulation logic when programs are attached and detached.  The test
      validates that bpf_cookie isn't mixed up and isn't lost during such list
      manipulations.
      
      Perf_event subtest validates that two BPF links can be created against the
      same perf_event (but not at the same time, only one BPF program can be
      attached to perf_event itself), and that for each we can specify different
      bpf_cookie value.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org
    • selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h} · a549aaa6
      Andrii Nakryiko authored
      Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
      be re-used between multiple uprobe-using selftests. Also rename get_offset()
      into more appropriate get_uprobe_offset().
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org
    • selftests/bpf: Test low-level perf BPF link API · f36d3557
      Andrii Nakryiko authored
      Add tests utilizing low-level bpf_link_create() API to create perf BPF link.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org
    • libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs · 47faff37
      Andrii Nakryiko authored
      Wire through bpf_cookie for all attach APIs that use perf_event_open under the
      hood:
        - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
        - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
          pass bpf_cookie through opts.
      
      For kernels that don't support BPF_LINK_CREATE for perf_events, and thus
      don't support bpf_cookie either, return an error and log a warning for the user.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
    • libbpf: Add bpf_cookie support to bpf_link_create() API · 3ec84f4b
      Andrii Nakryiko authored
      Add ability to specify bpf_cookie value when creating BPF perf link with
      bpf_link_create() low-level API.
      
      Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
      specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
      and corresponding OPTS struct to accommodate such changes. Add extra checks to
      prevent using incompatible/unexpected combinations of fields.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
    • libbpf: Use BPF perf link when supported by kernel · 668ace0e
      Andrii Nakryiko authored
      Detect kernel support for BPF perf link and prefer it when attaching to
      perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
      open until BPF link is destroyed, at which point both perf_event FD and BPF
      link FD will be closed.
      
      This preserves current behavior in which perf_event FD is open for the
      duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
      underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
      doesn't close underlying perf_event FD. When BPF perf link is used, disconnect
      will keep both perf_event and bpf_link FDs open, so it will be up to
      (advanced) user to close them. This approach is demonstrated in bpf_cookie.c
      selftests, added in this patch set.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
    • libbpf: Remove unused bpf_link's destroy operation, but add dealloc · d88b71d4
      Andrii Nakryiko authored
      bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
      to override deallocation procedure, with default doing plain free(link). This
      is necessary for cases when we want to "subclass" struct bpf_link to keep
      extra information, as is the case in the next patch adding struct
      bpf_link_perf.
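
      The "subclassing" pattern with an overridable deallocation callback can
      be sketched as follows. This is a simplified model, not libbpf's actual
      definitions: struct link, struct perf_link, perf_link_new() and
      link_destroy() are hypothetical names, and the real code also deals with
      closing FDs, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified base type; a NULL dealloc means plain free(link). */
struct link {
	int fd;
	void (*dealloc)(struct link *link);
};

/* "Subclass": embeds the base and carries extra per-link state. */
struct perf_link {
	struct link link;
	int perf_event_fd;
};

static int dealloc_calls; /* bookkeeping for the example only */

static void perf_link_dealloc(struct link *link)
{
	struct perf_link *pl;

	assert(link && link->dealloc == perf_link_dealloc);
	/* Recover the containing struct from the embedded base pointer. */
	pl = (struct perf_link *)((char *)link - offsetof(struct perf_link, link));
	dealloc_calls++;
	free(pl);
}

static struct link *perf_link_new(int link_fd, int perf_event_fd)
{
	struct perf_link *pl = calloc(1, sizeof(*pl));

	if (!pl)
		return NULL;
	pl->link.fd = link_fd;
	pl->link.dealloc = perf_link_dealloc;
	pl->perf_event_fd = perf_event_fd;
	return &pl->link;
}

/* Generic destroy path: use the override if present, else free(). */
static void link_destroy(struct link *link)
{
	if (!link)
		return;
	if (link->dealloc)
		link->dealloc(link);
	else
		free(link);
}
```

      Generic code only ever sees struct link, so a caller cannot tell whether
      extra state is attached; the callback is what keeps the free correct.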
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org
    • libbpf: Re-build libbpf.so when libbpf.map changes · 61c7aa50
      Andrii Nakryiko authored
      Ensure libbpf.so is re-built whenever libbpf.map is modified.  Without this,
      changes to libbpf.map are not detected and versioned symbols mismatch error
      will be reported until `make clean && make` is used, which is a suboptimal
      developer experience.
      
      Fixes: 306b267c ("libbpf: Verify versioned symbols")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org
    • bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value · 7adfc6c9
      Andrii Nakryiko authored
      Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
      to get access to a user-provided bpf_cookie value, specified during BPF
      program attachment (BPF link creation) time.
      
      Naming is hard, though. With the concept being named "BPF cookie", I've
      considered calling the helper:
        - bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
          cookie;
        - bpf_get_bpf_cookie() -- too much tautology;
        - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
          attach BPF program to BPF hook, it's still an "attachment" and the
          bpf_cookie is associated with BPF program attachment to a hook, not a BPF
          link itself. Technically, we could support bpf_cookie with old-style
          cgroup programs. So I ultimately rejected it in favor of
          bpf_get_attach_cookie().
      
      Currently all perf_event-backed BPF program types support
      bpf_get_attach_cookie() helper. Follow-up patches will add support for
      fentry/fexit programs as well.
      
      While at it, mark bpf_tracing_func_proto() as static to make it obvious that
      it's only used from within the kernel/trace/bpf_trace.c.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org
    • bpf: Allow to specify user-provided bpf_cookie for BPF perf links · 82e6b1ee
      Andrii Nakryiko authored
      Add ability for users to specify custom u64 value (bpf_cookie) when creating
      BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
      tracepoints).
      
      This is useful for cases when the same BPF program is used for attaching and
      processing invocation of different tracepoints/kprobes/uprobes in a generic
      fashion, but such that each invocation is distinguished from each other (e.g.,
      BPF program can look up additional information associated with a specific
      kernel function without having to rely on function IP lookups). This enables
      new use cases to be implemented simply and efficiently that previously were
      possible only through code generation (and thus multiple instances of almost
      identical BPF program) or compilation at runtime (BCC-style) on target hosts
      (even more expensive resource-wise). For uprobes it is not even possible in
      some cases to know function IP beforehand (e.g., when attaching to shared
      library without PID filtering, in which case base load address is not known
      for a library).
      
      This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
      corresponding to each attached and run BPF program. Given cgroup BPF programs
      already use two 8-byte pointers for their needs and cgroup BPF programs don't
      have (yet?) support for bpf_cookie, reuse that space through union of
      cgroup_storage and new bpf_cookie field.
      
      Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
      This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
      program execution code, which luckily is now also split from
      BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
      giving access to this user-provided cookie value from inside a BPF program.
      Generic perf_event BPF programs will access this value from perf_event itself
      through passed in BPF program context.
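
      The storage and run-context mechanism described above can be modelled in
      user-space C. This is a sketch under stated assumptions, not the kernel
      implementation: struct prog_array_item, struct run_ctx, run_array() and
      get_attach_cookie() are simplified, hypothetical stand-ins, and a plain
      global pointer plays the role of the ambient run context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the ambient run context; the kernel stores it in current. */
struct run_ctx {
	uint64_t cookie;
};
static struct run_ctx *current_ctx;

typedef void (*prog_fn)(void *ctx);

/* Each array item carries its own cookie, reusing the cgroup storage space. */
struct prog_array_item {
	prog_fn prog;
	union {
		void *cgroup_storage[2];
		uint64_t cookie;
	};
};

/* Helper-like accessor: returns the cookie of the attachment being run. */
static uint64_t get_attach_cookie(void)
{
	return current_ctx ? current_ctx->cookie : 0;
}

/* Run every program, exposing its item's cookie for the duration of the call. */
static void run_array(const struct prog_array_item *items, size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		struct run_ctx rc = { .cookie = items[i].cookie };
		struct run_ctx *old = current_ctx;

		current_ctx = &rc;
		items[i].prog(ctx);
		current_ctx = old; /* restore the previous context */
	}
}

/* Example "program": records the cookie it was invoked with. */
struct trace {
	uint64_t seen[4];
	size_t n;
};

static void record_cookie(void *ctx)
{
	struct trace *t = ctx;

	assert(t->n < 4);
	t->seen[t->n++] = get_attach_cookie();
}
```

      The same function can appear in several items with different cookies,
      which is the property the patch relies on for uniformly handled hooks.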
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
    • bpf: Implement minimal BPF perf link · b89fbfbb
      Andrii Nakryiko authored
      Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
      BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
      the common BPF link infrastructure, allowing to list all active perf_event
      based attachments, auto-detaching BPF program from perf_event when link's FD
      is closed, get generic BPF link fdinfo/get_info functionality.
      
      BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
      are currently supported.
      
      Force-detaching and atomic BPF program updates are not yet implemented, but
      with perf_event-based BPF links we now have common framework for this without
      the need to extend ioctl()-based perf_event interface.
      
      One interesting consideration is a new value for bpf_attach_type, which
      BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
      bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
      bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
      BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
      program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
      mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
      define a single BPF_PERF_EVENT attach type for all of them and adjust
      link_create()'s logic for checking correspondence between attach type and
      program type.
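
      The many-program-types-to-one-attach-type check can be illustrated with
      a small model. The enum names and attach_type_matches() below are
      hypothetical stand-ins, not the kernel's enums or link_create() logic;
      PROG_XDP is included only to show an ordinary 1-to-1 pairing next to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum bpf_prog_type and enum bpf_attach_type. */
enum prog_type {
	PROG_KPROBE,
	PROG_TRACEPOINT,
	PROG_PERF_EVENT,
	PROG_XDP,
	PROG_TYPE_COUNT
};

enum attach_type {
	ATTACH_PERF_EVENT,
	ATTACH_XDP
};

static_assert(PROG_TYPE_COUNT == 4,
	      "extend attach_type_matches() when adding program types");

/* Three program types share one attach type; the last one maps 1-to-1. */
static bool attach_type_matches(enum prog_type pt, enum attach_type at)
{
	switch (pt) {
	case PROG_KPROBE:
	case PROG_TRACEPOINT:
	case PROG_PERF_EVENT:
		return at == ATTACH_PERF_EVENT;
	case PROG_XDP:
		return at == ATTACH_XDP;
	default:
		return false;
	}
}
```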
      
      The alternative would be to define three new attach types (e.g., BPF_KPROBE,
      BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
      and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
      libbpf. I chose to not do this to avoid unnecessary proliferation of
      bpf_attach_type enum values and not have to deal with naming conflicts.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
    • bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input · 652c1b17
      Andrii Nakryiko authored
      Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
      input argument, which makes it easier to re-use for other internal uses
      (coming up for BPF link in the next patch). BPF program FD is not as
      convenient and in some cases it's not available. So switch to struct bpf_prog,
      move out refcounting outside and let caller do bpf_prog_put() in case of an
      error. This follows the approach of most of the other BPF internal functions.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-4-andrii@kernel.org
    • bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions · 7d08c2c9
      Andrii Nakryiko authored
      Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
      with all the same readability and maintainability benefits. Making them into
      functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
      functions. Also, explicitly specifying the type of the BPF prog run callback
      required adjusting __bpf_prog_run_save_cb() to accept const void *, cast
      internally to const struct sk_buff.
      
      Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
      BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
      the differences in bpf_run_ctx used for those two different use cases.
      
      I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
      struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
      cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
      required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
      everyone.
      
      The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
      pass-through user-provided context value in the next patch.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-3-andrii@kernel.org
    • bpf: Refactor BPF_PROG_RUN into a function · fb7dd8bc
      Andrii Nakryiko authored
      Turn BPF_PROG_RUN into a proper always inlined function. No functional and
      performance changes are intended, but it makes it much easier to understand
      what's going on with how BPF programs actually get executed. It's more
      obvious what types and callbacks are expected. Also extra () around input
      parameters can be dropped, as well as `__` variable prefixes intended to avoid
      naming collisions, which makes the code simpler to read and write.
      
      This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
      a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
      BPF_PROG_RUN into a function causes naming conflict compilation error. So
      rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
      bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
      BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.
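
      The readability argument can be seen in a reduced example that puts a
      macro next to an equivalent inline function. This is an illustration
      only, not the kernel's definitions: struct prog, PROG_RUN_MACRO() and
      prog_run() are hypothetical, and the kernel version is always inlined
      and does more work than this model.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for a program object with a run callback. */
struct prog {
	uint32_t (*func)(const void *ctx, const struct prog *prog);
	uint32_t bias;
};

/* Macro style: every argument needs extra (), and nothing is type-checked
 * until expansion. Note that the prog argument is evaluated twice. */
#define PROG_RUN_MACRO(prog, ctx) ((prog)->func((ctx), (prog)))

/* Function style: explicit types, single evaluation, no name tricks. */
static inline uint32_t prog_run(const struct prog *prog, const void *ctx)
{
	assert(prog && prog->func);
	return prog->func(ctx, prog);
}

/* Example callback: adds the program's bias to a u32 input. */
static uint32_t add_bias(const void *ctx, const struct prog *prog)
{
	return *(const uint32_t *)ctx + prog->bias;
}
```

      Both forms produce the same result for simple arguments; the function
      form states the expected types and callback signature up front.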
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
    • Colin Ian King's avatar
      1bda52f8
  3. 15 Aug, 2021 6 commits
  4. 14 Aug, 2021 3 commits
  5. 13 Aug, 2021 7 commits