1. 08 Apr, 2022 12 commits
  2. 07 Apr, 2022 6 commits
  3. 06 Apr, 2022 9 commits
  4. 05 Apr, 2022 10 commits
    • selftests/bpf: Fix file descriptor leak in load_kallsyms() · 2d0df019
      Yuntao Wang authored
      Currently, if sym_cnt > 0, load_kallsyms() returns early without closing the file. Fix it.
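
      A minimal sketch of this kind of leak and one way to avoid it, assuming the early return sat after the fopen() call. The function name load_symbols(), the path argument, and the line-counting body are hypothetical stand-ins, not the selftest's actual code:

```c
#include <stdio.h>

/* Hypothetical cache counter playing the role of sym_cnt in the commit. */
static int sym_cnt;

/* Returns 0 on success, -1 if the file cannot be opened. */
static int load_symbols(const char *path)
{
	char line[256];
	FILE *f;

	/*
	 * Check the cache before opening anything: an early return placed
	 * after fopen() would leave the FILE (and its descriptor) open.
	 */
	if (sym_cnt)
		return 0;

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Stand-in for real parsing: count the lines that were read. */
	while (fgets(line, sizeof(line), f))
		sym_cnt++;

	fclose(f);
	return 0;
}
```

      An equivalent fix is to keep the check where it is and call fclose() before returning; which of the two the patch chose is not stated in the message above.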
      Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220405145711.49543-1-ytcoode@gmail.com
    • bpf, arm64: Sign return address for JITed code · 042152c2
      Xu Kuohai authored
      Sign return address for JITed code when the kernel is built with pointer
      authentication enabled:
      
      1. Sign LR with paciasp instruction before LR is pushed to stack. Since
         paciasp acts like landing pads for function entry, no need to insert
         bti instruction before paciasp.
      
      2. Authenticate LR with autiasp instruction after LR is popped from stack.
      
      For BPF tail call, the stack frame constructed by the caller is reused by
      the callee. That is, the stack frame is constructed by the caller and
      destructed by the callee. Thus LR is signed and pushed to the stack in the
      caller's prologue, and popped from the stack and authenticated in the
      callee's epilogue.
      
      For BPF2BPF call, the caller and callee construct their own stack frames,
      and sign and authenticate their own LRs.
      Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf
      Link: https://lore.kernel.org/bpf/20220402073942.3782529-1-xukuohai@huawei.com
    • Merge branch 'Add libbpf support for USDTs' · 9a7ef9f8
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Add libbpf support for USDT (User Statically-Defined Tracing) probes.
      USDTs are an important part of the tracing, and BPF, ecosystem, widely used in
      mission-critical production applications for observability, performance
      analysis, and debugging.
      
      And while USDTs themselves are a pretty complicated abstraction built on top of
      uprobes, for end-users USDT is as natural a primitive as uprobes themselves.
      And thus it's important for libbpf to provide the best possible user experience
      when it comes to building tracing applications relying on USDTs.
      
      USDTs historically presented a lot of challenges for libbpf's no
      compilation-on-the-fly general approach to BPF tracing. BCC utilizes power of
      on-the-fly source code generation and compilation using its embedded Clang
      toolchain, which was impractical for more lightweight and thus more rigid
      libbpf-based approach. But still, with enough diligence and BPF cookies it's
      possible to implement USDT support that feels as natural as tracing any
      uprobe.
      
      This patch set is the culmination of such effort to add libbpf USDT support
      following the spirit and philosophy of BPF CO-RE (even though it's not
      inherently relying on BPF CO-RE much, see patch #1 for some notes regarding
      this). Each respective patch has enough details and explanations, so I won't
      go into details here.
      
      In the end, I think the overall usability of libbpf's USDT support *exceeds*
      the status quo set by BCC due to the elimination of awkward runtime USDT
      supporting code generation. It also exceeds BCC's capabilities due to the use
      of BPF cookie. This eliminates the need to determine a USDT call site (and
      thus specifics about how exactly to fetch arguments) based on its *absolute IP
      address*, which is impossible with shared libraries if no PID is specified (as
      we then just *can't* know absolute IP at which shared library is loaded,
      because it might be different for each process). With BPF cookie this is not
      a problem as we record "call site ID" directly in a BPF cookie value. This
      makes it possible to do a system-wide tracing of a USDT defined in a shared
      library. Think about tracing some USDT in libc across any process in the
      system, both running at the time of attachment and all the new processes
      started *afterwards*. This is a very powerful capability that allows more
      efficient observability and tracing tooling.
      
      Once this functionality lands, the plan is to extend libbpf-bootstrap ([0])
      with a USDT example. It will also become possible to start converting BCC
      tools that rely on USDTs to their libbpf-based counterparts ([1]).
      
      It's worth noting that a preliminary version of this code is currently used and
      tested in production code running a fleet-wide observability toolkit.
      
      Libbpf functionality is broken down into 5 mostly logically independent parts,
      for ease of reviewing:
        - patch #1 adds BPF-side implementation;
        - patch #2 adds user-space APIs and wires bpf_link for USDTs;
        - patch #3 adds the most mundane pieces: handling ELF, parsing USDT notes,
          dealing with memory segments, relative vs absolute addresses, etc;
        - patch #4 adds internal ID allocation and setting up/tearing down of
          BPF-side state (spec and IP-to-ID mapping);
        - patch #5 implements x86/x86-64-specific logic of parsing USDT argument
          specifications;
        - patch #6 adds testing of various basic aspects of handling of USDT;
        - patch #7 extends the set of tests with more combinations of semaphore,
          executable vs shared library, and PID filter options.
      
        [0] https://github.com/libbpf/libbpf-bootstrap
        [1] https://github.com/iovisor/bcc/tree/master/libbpf-tools
      
      v2->v3:
        - fix typos, leave link to systemtap doc, acks, etc (Dave);
        - include sys/sdt.h to avoid extra system-wide package dependencies;
      v1->v2:
        - huge high-level comment describing how all the moving parts fit together
          (Alan, Alexei);
        - switched from `__hidden __weak` to `static inline __noinline` for now, as
          there is a bug in BPF linker breaking final BPF object file due to invalid
          .BTF.ext data; I want to fix it separately at which point I'll switch back
          to __hidden __weak again. The fix isn't trivial, so I don't want to block
          on that. Same for __weak variable lookup bug that Hengqi reported.
        - various fixes and improvements, addressing other feedback (Alan, Hengqi);
      
      Cc: Alan Maguire <alan.maguire@oracle.com>
      Cc: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: Hengqi Chen <hengqi.chen@gmail.com>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add urandom_read shared lib and USDTs · 00a0fa2d
      Andrii Nakryiko authored
      Extend urandom_read helper binary to include USDTs of 4 combinations:
      semaphore/semaphoreless (refcounted and non-refcounted) and based in
      executable or shared library. We also extend urandom_read with the ability
      to report its own PID to the parent process and wait for the parent process to
      ready itself for tracing urandom_read. We utilize popen() and
      underlying pipe properties for proper signaling.
      
      Once urandom_read is ready, we add a few tests to validate that libbpf's
      USDT attachment handles all the above combinations of semaphore (or lack
      of it) and static or shared library USDTs. Also, we validate that libbpf
      handles shared libraries both with PID filter and without one (i.e., -1
      for PID argument).
      
      Having the shared library case tested with and without PID is important
      because internal logic differs on kernels that don't support BPF
      cookies. On such older kernels, attaching to USDTs in shared libraries
      without specifying concrete PID doesn't work in principle, because it's
      impossible to determine shared library's load address to derive absolute
      IPs for uprobe attachments. Without absolute IPs, it's impossible to
      perform correct look up of USDT spec based on uprobe's absolute IP (the
      only kind available from BPF at runtime). This is not a problem on
      newer kernels with BPF cookie as we don't need IP-to-ID lookup because
      BPF cookie value *is* spec ID.
      
      So having those two situations as separate subtests is good because
      libbpf CI is able to test latest selftests against old kernels (e.g.,
      4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
      for old kernels, but will still leave PID-specific one enabled to validate
      this legacy logic is working correctly.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-8-andrii@kernel.org
    • selftests/bpf: Add basic USDT selftests · 630301b0
      Andrii Nakryiko authored
      Add semaphore-based USDT to test_progs itself and write basic tests to
      validate both auto-attachment and manual attachment logic, as well as
      BPF-side functionality.
      
      Also add subtests to validate that libbpf properly deduplicates USDT
      specs and handles spec overflow situations correctly, as well as proper
      "rollback" of partially-attached multi-spec USDT.
      
      BPF-side of selftest intentionally consists of two files to validate
      that usdt.bpf.h header can be included from multiple source code files
      that are subsequently linked into final BPF object file without causing
      any symbol duplication or other issues. We are validating that __weak
      maps and bpf_usdt_xxx() API functions defined in usdt.bpf.h do work as
      intended.
      
      USDT selftests utilize sys/sdt.h header that on Ubuntu systems comes
      from systemtap-sdt-dev package (systemtap-sdt-devel on Fedora). But to simplify everyone's life,
      including CI but especially casual contributors to bpf/bpf-next that
      are trying to build selftests, I've checked in sys/sdt.h header from [0]
      directly. This way it will work on all architectures and distros without
      having to figure it out for every relevant combination and adding any
      extra implicit package dependencies.
      
        [0] https://sourceware.org/git?p=systemtap.git;a=blob_plain;f=includes/sys/sdt.h;h=ca0162b4dc57520b96638c8ae79ad547eb1dd3a1;hb=HEAD
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-7-andrii@kernel.org
    • libbpf: Add x86-specific USDT arg spec parsing logic · 4c59e584
      Andrii Nakryiko authored
      Add x86/x86_64-specific USDT argument specification parsing. Each
      architecture will require its own logic, as all this is arch-specific
      assembly-based notation. Architectures that libbpf doesn't support for
      USDTs will pr_warn() with specific error and return -ENOTSUP.
      
      We use sscanf() as a very powerful and easy to use string parser. Those
      spaces in sscanf's format string mean "skip any whitespace", which is
      a pretty nifty (and somewhat little-known) feature.
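
      To illustrate the whitespace-skipping behaviour, here is a hedged sketch of parsing the three common x86-64 argument forms (register dereference, register, constant) with sscanf(). The struct arg_spec layout, the enum, and parse_arg() are hypothetical names for illustration; this is not libbpf's actual parser:

```c
#include <stdio.h>

enum arg_kind { ARG_CONST, ARG_REG, ARG_REG_DEREF };

/* Hypothetical, simplified result type for this sketch. */
struct arg_spec {
	enum arg_kind kind;
	int size;	/* bytes; negative means signed, as in the note syntax */
	long value;	/* constant value or dereference offset */
	char reg[16];	/* register name without the leading '%' */
};

/* Returns 0 on success, -1 if the string matches none of the three forms. */
static int parse_arg(const char *s, struct arg_spec *out)
{
	int len = 0;

	/* Register dereference, e.g. "-4@-20(%rbp)". A space in the format
	 * matches zero or more whitespace characters in the input; %n is
	 * only stored once the closing ')' has been matched. */
	if (sscanf(s, " %d @ %ld ( %%%15[^)] ) %n",
		   &out->size, &out->value, out->reg, &len) == 3 && len > 0) {
		out->kind = ARG_REG_DEREF;
		return 0;
	}

	/* Plain register, e.g. "8@%rax". */
	len = 0;
	if (sscanf(s, " %d @ %%%15s %n", &out->size, out->reg, &len) == 2 && len > 0) {
		out->kind = ARG_REG;
		out->value = 0;
		return 0;
	}

	/* Constant, e.g. "4@$42". */
	len = 0;
	if (sscanf(s, " %d @ $%ld %n", &out->size, &out->value, &len) == 2 && len > 0) {
		out->kind = ARG_CONST;
		out->reg[0] = '\0';
		return 0;
	}

	return -1;
}
```

      The forms are tried from most to least specific, so a failed match on an earlier format simply falls through to the next one.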
      
      All this was tested on little-endian architecture, so bit shifts are
      probably off on big-endian, which our CI will hopefully prove.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org
    • libbpf: Wire up spec management and other arch-independent USDT logic · 999783c8
      Andrii Nakryiko authored
      Last part of architecture-agnostic user-space USDT handling logic is to
      set up BPF spec and, optionally, IP-to-ID maps from user-space.
      usdt_manager performs a compact spec ID allocation to utilize
      fixed-sized BPF maps as efficiently as possible. We also use hashmap to
      deduplicate USDT arg spec strings and map identical strings to single
      USDT spec, minimizing the necessary BPF map size. usdt_manager supports
      arbitrary sequences of attachment and detachment, both of the same USDT
      and multiple different USDTs and internally maintains a free list of
      unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
      teardown of this spec ID free list and supporting BPF maps.
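
      The compact ID allocation with a free list can be sketched roughly as follows. All names here (struct spec_id_pool, MAX_SPEC_CNT, spec_id_alloc(), spec_id_free()) are hypothetical, and the tiny capacity is chosen only for illustration; the real usdt_manager additionally deduplicates spec strings via a hashmap:

```c
#include <stddef.h>

#define MAX_SPEC_CNT 4	/* hypothetical tiny capacity for illustration */

struct spec_id_pool {
	int free_ids[MAX_SPEC_CNT];	/* stack of released IDs */
	size_t free_cnt;		/* number of entries in free_ids */
	int next_id;			/* lowest ID that was never handed out */
};

/* Returns an ID in [0, MAX_SPEC_CNT), or -1 when the pool is exhausted. */
static int spec_id_alloc(struct spec_id_pool *p)
{
	/* Prefer recycling a released ID to keep the used range compact. */
	if (p->free_cnt > 0)
		return p->free_ids[--p->free_cnt];
	if (p->next_id < MAX_SPEC_CNT)
		return p->next_id++;
	return -1;
}

/* The caller must only release IDs previously returned by spec_id_alloc(). */
static void spec_id_free(struct spec_id_pool *p, int id)
{
	p->free_ids[p->free_cnt++] = id;
}
```

      Reusing released IDs first means a fixed-size BPF map indexed by spec ID stays densely populated across arbitrary attach/detach sequences.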
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org
    • libbpf: Add USDT notes parsing and resolution logic · 74cc6311
      Andrii Nakryiko authored
      Implement architecture-agnostic parts of USDT parsing logic. The code is
      the documentation in this case; it's futile to try to succinctly
      describe how USDT parsing is done in any sort of concreteness. But
      still, USDTs are recorded in special ELF notes section (.note.stapsdt),
      where each USDT call site is described separately. Along with USDT
      provider and USDT name, each such note contains USDT argument
      specification, which uses assembly-like syntax to describe how to fetch
      value of USDT argument. USDT arg spec could be just a constant, or
      a register, or a register dereference (most common cases in x86_64), but
      technically it can be much more complicated, like an offset relative
      to a global symbol and so on. One of the later patches will
      implement the most common subset of this for x86 and x86-64 architectures,
      which seems to handle a lot of real-world production applications.
      
      USDT arg spec contains a compact encoding allowing usdt.bpf.h from
      previous patch to handle the above 3 cases. Instead of recording which
      register might be needed, we encode register's offset within struct
      pt_regs to simplify BPF-side implementation. USDT argument can be of
      different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
      this, libbpf pre-calculates necessary bit shifts to do proper casting
      and sign-extension in a short sequence of left and right shifts.
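
      The shift-based casting can be sketched as below, assuming the shift amount is 64 minus the argument width in bits. The function name usdt_cast() and its signature are hypothetical; the sketch also relies on arithmetic right shift of negative signed values, which is what GCC and Clang provide:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Narrow a raw 64-bit value to an argument of arg_sz bytes (1, 2, 4 or 8)
 * and widen it back to 64 bits, sign- or zero-extending as requested.
 */
static uint64_t usdt_cast(uint64_t raw, int arg_sz, bool is_signed)
{
	/* Pre-computable per argument: 56, 48, 32 or 0 bits. */
	unsigned int shift = 64 - (unsigned int)arg_sz * 8;

	/* The left shift discards the bits above the argument's width. */
	uint64_t val = raw << shift;

	if (is_signed)
		/* Arithmetic right shift replicates the sign bit. */
		return (uint64_t)((int64_t)val >> shift);

	/* Logical right shift fills the upper bits with zeros. */
	return val >> shift;
}
```

      For example, a 4-byte signed argument whose low 32 bits are 0xFFFFFFFC comes back as -4 regardless of what the upper 32 bits of the source register held.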
      
      The rest is in the code with sometimes extensive comments and references
      to external "documentation" for USDTs.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org
    • libbpf: Wire up USDT API and bpf_link integration · 2e4913e0
      Andrii Nakryiko authored
      Wire up libbpf USDT support APIs without yet implementing all the
      nitty-gritty details of USDT discovery, spec parsing, and BPF map
      initialization.
      
      User-visible user-space API is simple and is conceptually very similar
      to uprobe API.
      
      bpf_program__attach_usdt() API allows programmatically attaching a given
      BPF program to a USDT, specified through binary path (executable or
      shared lib), USDT provider and name. Also, just like in uprobe case, PID
      filter is specified (0 - self, -1 - any process, or specific PID).
      Optionally, USDT cookie value can be specified. Such single API
      invocation will try to discover given USDT in specified binary and will
      use (potentially many) BPF uprobes to attach this program in correct
      locations.
      
      Just like with any bpf_program__attach_xxx() API, a bpf_link is returned that
      represents this attachment. It is a virtual BPF link that doesn't have
      direct kernel object, as it can consist of multiple underlying BPF
      uprobe links. As such, attachment is not atomic operation and there can
      be brief moment when some USDT call sites are attached while others are
      still in the process of attaching. This should be taken into
      consideration by user. But bpf_program__attach_usdt() guarantees that
      on success all USDT call sites are attached, and that on failure
      all the successful attachments are detached as soon as some USDT
      call site fails to attach. So, in theory, there could be cases of
      a failed bpf_program__attach_usdt() call which did trigger a few USDT
      program invocations. This is unavoidable due to multi-uprobe nature of
      USDT and has to be handled by user, if it's important to create an
      illusion of atomicity.
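
      The all-or-nothing behaviour described above can be modelled with a small simulation. Everything in it (struct fake_usdt, attach_site(), detach_site(), attach_all()) is a hypothetical stand-in for per-call-site uprobe attachment and is not libbpf code:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SITES 8

/* Simulated USDT with several call sites; fail_at injects one failure. */
struct fake_usdt {
	bool attached[MAX_SITES];
	size_t site_cnt;	/* must be <= MAX_SITES */
	size_t fail_at;		/* index that refuses to attach; >= site_cnt for none */
};

static bool attach_site(struct fake_usdt *u, size_t i)
{
	if (i == u->fail_at)
		return false;
	u->attached[i] = true;
	return true;
}

static void detach_site(struct fake_usdt *u, size_t i)
{
	u->attached[i] = false;
}

/* Returns 0 if every site attached, -1 after rolling back a partial attach. */
static int attach_all(struct fake_usdt *u)
{
	size_t i;

	for (i = 0; i < u->site_cnt; i++) {
		if (!attach_site(u, i)) {
			/* Undo sites 0..i-1; they were live until this point. */
			while (i > 0)
				detach_site(u, --i);
			return -1;
		}
	}
	return 0;
}
```

      Between the first successful attach_site() and the rollback, the earlier sites are live, which corresponds to the window in which a failed attachment can still have triggered the program.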
      
      USDT BPF programs themselves are marked in BPF source code either as
      SEC("usdt"), in which case they won't be auto-attached through
      skeleton's <skel>__attach() method, or with a full definition,
      which follows the spirit of fully-specified uprobes:
      SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
      attach method will attempt auto-attachment. Similarly, generic
      bpf_program__attach() will have enough information to go off of for
      parameterless attachment.
      
      USDT BPF programs are actually uprobes, and as such for kernel they are
      marked as BPF_PROG_TYPE_KPROBE.
      
      Another part of this patch is USDT-related feature probing:
        - BPF cookie support detection from user-space;
        - detection of kernel support for auto-refcounting of USDT semaphore.
      
      The latter is optional. If kernel doesn't support such feature and USDT
      doesn't rely on USDT semaphores, no error is returned. But if libbpf
      detects that USDT requires setting semaphores and kernel doesn't support
      this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
      support poking process's memory directly to increment semaphore value,
      like BCC does on legacy kernels, due to inherent raciness and danger of
      such process memory manipulation. Libbpf lets the kernel take care of this
      properly or gives up.
      
      Logistically, all the extra USDT-related infrastructure of libbpf is put
      into a separate usdt.c file and abstracted behind struct usdt_manager.
      Each bpf_object has lazily-initialized usdt_manager pointer, which is
      only instantiated if USDT programs are attempted to be attached. Closing
      BPF object frees up usdt_manager resources. usdt_manager keeps track of
      USDT spec ID assignment and few other small things.
      
      Subsequent patches will fill out remaining missing pieces of USDT
      initialization and setup logic.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
    • libbpf: Add BPF-side of USDT support · d72e2968
      Andrii Nakryiko authored
      Add BPF-side implementation of libbpf-provided USDT support. This
      consists of a single header library, usdt.bpf.h, which is meant to be used
      from user's BPF-side source code. This header is added to the list of
      installed libbpf headers, alongside bpf_helpers.h and others.
      
      BPF-side implementation consists of two BPF maps:
        - spec map, which contains "a USDT spec" which encodes information
          necessary to be able to fetch USDT arguments and other information
          (argument count, user-provided cookie value, etc) at runtime;
        - IP-to-spec-ID map, which is only used on kernels that don't support
          BPF cookie feature. It allows looking up spec ID based on the place
          in user application that triggers USDT program.
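
      The difference between the two lookup paths can be illustrated with a plain C simulation. The names struct ip_entry and find_spec_id() are hypothetical, and a linear array stands in for the IP-to-spec-ID BPF map:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the simulated IP-to-spec-ID map. */
struct ip_entry {
	uint64_t ip;
	int spec_id;
};

/*
 * Resolve the spec ID for a triggered call site. With BPF cookie support
 * the cookie value is the spec ID itself; without it, the absolute IP of
 * the uprobe has to be looked up. Returns -1 if the IP is unknown.
 */
static int find_spec_id(bool has_cookie, uint64_t cookie, uint64_t ip,
			const struct ip_entry *tbl, size_t cnt)
{
	size_t i;

	if (has_cookie)
		return (int)cookie;

	for (i = 0; i < cnt; i++) {
		if (tbl[i].ip == ip)
			return tbl[i].spec_id;
	}
	return -1;
}
```

      The IP-based path needs absolute addresses to be known up front, whereas the cookie-based path does not depend on where the binary was loaded.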
      
      These maps have default sizes, 256 and 1024, which are chosen
      conservatively to not waste a lot of space, while still handling a lot of common
      cases. But there could be cases when user application needs to either
      trace a lot of different USDTs, or USDTs are heavily inlined and their
      arguments are located in a lot of differing locations. For such cases it
      might be necessary to size those maps up, which libbpf allows to do by
      overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
      
      There is an important aspect to keep in mind. Single USDT (user-space
      equivalent of kernel tracepoint) can have multiple USDT "call sites".
      That is, single logical USDT is triggered from multiple places in user
      application. This can happen due to function inlining. Each such inlined
      instance of USDT invocation can have its own unique USDT argument
      specification (instructions about the location of the value of each of
      USDT arguments). So while USDT looks very similar to usual uprobe or
      kernel tracepoint, under the hood it's actually a collection of uprobes,
      each potentially needing different spec to know how to fetch arguments.
      
      User-visible API consists of three helper functions:
        - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
        - bpf_usdt_arg(), which reads value of specified USDT argument (by
          its zero-indexed position) and returns it as 64-bit value;
        - bpf_usdt_cookie(), which functions like BPF cookie for USDT
          programs; this is necessary as libbpf doesn't allow specifying actual
          BPF cookie and utilizes it internally for USDT support implementation.
      
      Each bpf_usdt_xxx() API expects struct pt_regs * context, passed into
      BPF program. On kernels that don't support BPF cookie it is used to
      fetch absolute IP address of the underlying uprobe.
      
      usdt.bpf.h also provides BPF_USDT() macro, which functions like
      BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
      get access to USDT arguments, if USDT definition is static and known to
      the user. It is expected that majority of use cases won't have to use
      bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
      all their needs.
      
      Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
      detect kernel support for BPF cookie. If BPF CO-RE dependency is
      undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
      either a boolean constant (or equivalently zero and non-zero), or even
      point it to its own .rodata variable that can be specified from user's
      application user-space code. It is important that
      BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
      .rodata and not just .data), as otherwise BPF code will still contain
      bpf_get_attach_cookie() BPF helper call and will fail validation at
      runtime, if not dead-code eliminated.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org
  5. 04 Apr, 2022 3 commits