1. 30 Sep, 2020 8 commits
  2. 29 Sep, 2020 32 commits
    • Merge branch 'bpf, x64: optimize JIT's pro/epilogue' · 67e4ca74
      Alexei Starovoitov authored
      Maciej Fijalkowski says:
      
      ====================
      Hi!
      
      This small set can be considered as a followup after recent addition of
      support for tailcalls in bpf subprograms and is focused on optimizing
      x64 JIT prologue and epilogue sections.
      
      It turns out that popping the tail call counter is no longer needed, and
      %rsp handling can be skipped when the stack depth is 0.
      
      For longer explanations, please see commit messages.
      
      Thank you,
      Maciej
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      67e4ca74
    • bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth · 4d0b8c0b
      Maciej Fijalkowski authored
      There is no particular reason for keeping the "sub 0, %rsp" insn within
      the BPF's x64 JIT prologue.
      
      When tail call code was skipping the whole prologue section these 7
      bytes that represent the rsp subtraction could not be simply discarded
      as the jump target address would be broken. An option to address that
      would be to substitute it with nop7.
      
      Right now tail call is skipping only first 11 bytes of target program's
      prologue and "sub X, %rsp" is the first insn that is processed, so if
      stack depth is zero then this insn could be omitted without the need for
      nop7 swap.
      
      Therefore, do not emit the "sub 0, %rsp" in prologue when program is not
      making use of R10 register. Also, make the emission of "add X, %rsp"
      conditional in tail call code logic and take into account the presence
      of mentioned insn when calculating the jump offsets.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929204653.4325-3-maciej.fijalkowski@intel.com
      4d0b8c0b
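      The conditional emission described in this commit can be sketched in plain
      C. This is an illustration only, not the kernel's JIT code: the helper
      names emit_sub_rsp_imm32() and emit_stack_reserve() are hypothetical, and
      rounding the stack depth up to 8 bytes is an assumption of the sketch.
      The 7-byte encoding (48 81 EC imm32) is the standard x86-64 form of
      "sub $imm32, %rsp".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of "sub $imm32, %rsp": REX.W + opcode + ModRM + 4-byte immediate. */
#define SUB_RSP_IMM32_LEN 7

/* Hypothetical helper (not the kernel's emitter): writes 48 81 EC <imm32>. */
static size_t emit_sub_rsp_imm32(uint8_t *buf, uint32_t imm)
{
	buf[0] = 0x48;	/* REX.W prefix */
	buf[1] = 0x81;	/* group-1 opcode taking an imm32 */
	buf[2] = 0xEC;	/* ModRM: /5 selects sub, register operand is %rsp */
	for (int i = 0; i < 4; i++)
		buf[3 + i] = (uint8_t)(imm >> (8 * i));	/* little-endian immediate */
	return SUB_RSP_IMM32_LEN;
}

/* Returns the number of bytes emitted; 0 when the program needs no stack. */
size_t emit_stack_reserve(uint8_t *buf, uint32_t stack_depth)
{
	uint32_t rounded = (stack_depth + 7u) & ~7u;	/* assumed 8-byte rounding */

	assert(buf != NULL);
	if (rounded == 0)
		return 0;	/* skip "sub 0, %rsp" entirely */
	return emit_sub_rsp_imm32(buf, rounded);
}
```

      Because the returned length is 0 or 7, any code that computes jump
      offsets across this instruction has to use the returned value rather
      than a constant, which is the offset adjustment the commit message
      mentions.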
    • bpf, x64: Drop "pop %rcx" instruction on BPF JIT epilogue · d207929d
      Maciej Fijalkowski authored
      Back when all of the callee-saved registers were always pushed to stack
      in x64 JIT prologue, tail call counter was placed at the bottom of the
      BPF program's stack frame that had a following layout:
      
      +-------------+
      |  ret addr   |
      +-------------+
      |     rbp     | <- rbp
      +-------------+
      |             |
      | free space  |
      | from:       |
      | sub $x,%rsp |
      |             |
      +-------------+
      |     rbx     |
      +-------------+
      |     r13     |
      +-------------+
      |     r14     |
      +-------------+
      |     r15     |
      +-------------+
      |  tail call  | <- rsp
      |   counter   |
      +-------------+
      
      In order to restore the callee saved registers, epilogue needed to
      explicitly toss away the tail call counter via "pop %rbx" insn, so that
      %rsp would be back at the place where %r15 was stored.
      
      Currently, the tail call counter is placed on stack *before* the callee
      saved registers (brackets on rbx through r15 mean that they are now
      pushed to stack only if they are used):
      
      +-------------+
      |  ret addr   |
      +-------------+
      |     rbp     | <- rbp
      +-------------+
      |             |
      | free space  |
      | from:       |
      | sub $x,%rsp |
      |             |
      +-------------+
      |  tail call  |
      |   counter   |
      +-------------+
      (     rbx     )
      +-------------+
      (     r13     )
      +-------------+
      (     r14     )
      +-------------+
      (     r15     ) <- rsp
      +-------------+
      
      For the record, the epilogue insns consist of (assuming all of the
      callee saved registers are used by program):
      pop    %r15
      pop    %r14
      pop    %r13
      pop    %rbx
      pop    %rcx
      leaveq
      retq
      
      "pop %rbx" for getting rid of tail call counter was not an option
      anymore as it would overwrite the restored value of %rbx register, so it
      was changed to use the %rcx register.
      
      Since epilogue can start popping the callee saved registers right away
      without any additional work, the "pop %rcx" could be dropped altogether
      as "leave" insn will simply move the %rbp to %rsp. IOW, tail call
      counter does not need the explicit handling.
      
      Having in mind the explanation above and the actual reason for that,
      let's piggy back on "leave" insn for discarding the tail call counter
      from stack and remove the "pop %rcx" from epilogue.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929204653.4325-2-maciej.fijalkowski@intel.com
      d207929d
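      To see why "leave" alone discards the tail call counter, the second
      stack layout above can be modelled with a toy machine. Everything here
      (struct toy_cpu, the word-indexed stack, the function names) is made up
      for illustration; it only mirrors the push/pop/leave semantics and is
      not JIT code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine: a word-indexed stack that grows toward index 0. */
struct toy_cpu {
	uint64_t stack[STACK_WORDS];
	uint64_t rsp;
	uint64_t rbp;
};

static void push(struct toy_cpu *c, uint64_t v)
{
	assert(c->rsp > 0);
	c->stack[--c->rsp] = v;
}

static uint64_t pop(struct toy_cpu *c)
{
	assert(c->rsp < STACK_WORDS);
	return c->stack[c->rsp++];
}

/* "leave" is equivalent to: mov %rbp, %rsp ; pop %rbp */
static void leave(struct toy_cpu *c)
{
	c->rsp = c->rbp;
	c->rbp = pop(c);
}

/* Runs prologue + epilogue and reports whether the caller's frame is back. */
bool toy_epilogue_restores_frame(unsigned int stack_words)
{
	struct toy_cpu c = { .rsp = STACK_WORDS, .rbp = 0x1234 };
	uint64_t rsp_at_ret;

	assert(stack_words <= STACK_WORDS - 7);
	push(&c, 0xdeadbeef);		/* return address */
	rsp_at_ret = c.rsp;

	/* Prologue: frame setup, stack space, counter, callee-saved regs. */
	push(&c, c.rbp);
	c.rbp = c.rsp;
	c.rsp -= stack_words;		/* sub $x, %rsp */
	push(&c, 0);			/* tail call counter */
	for (uint64_t reg = 1; reg <= 4; reg++)
		push(&c, reg);		/* rbx, r13, r14, r15 */

	/* Epilogue without "pop %rcx": restore registers, then leave. */
	for (int i = 0; i < 4; i++)
		(void)pop(&c);		/* r15, r14, r13, rbx */
	leave(&c);			/* skips counter and free space at once */

	return c.rsp == rsp_at_ret && c.rbp == 0x1234;
}
```

      In this model the counter slot is never popped explicitly; resetting
      %rsp from %rbp steps over it together with the free space.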
    • selftests/bpf: Fix endianness issues in sk_lookup/ctx_narrow_access · 6458bde3
      Ilya Leoshkevich authored
      This test makes a lot of narrow load checks while assuming little
      endian architecture, and therefore fails on s390.
      
      Fix by introducing LSB and LSW macros and using them to perform narrow
      loads.
      
      Fixes: 0ab5539f ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929201814.44360-1-iii@linux.ibm.com
      6458bde3
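      One way such endian-neutral narrow-load macros could look is sketched
      below. The definitions are an assumption for illustration and are not
      copied from the selftest; the helper names lsb_of() and lsw_of() are
      hypothetical, and __BYTE_ORDER__ is the GCC/Clang predefined macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a "least significant element" index to a memory index. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define LSE_INDEX(index, count) (index)
#else
#define LSE_INDEX(index, count) ((count) - (index) - 1)
#endif

/* Byte number `index` of the object, counted from the least significant. */
static inline uint8_t lsb_of(const void *p, size_t size, size_t index)
{
	const uint8_t *bytes = p;

	assert(index < size);
	return bytes[LSE_INDEX(index, size)];
}

/* 16-bit word number `index`, counted from the least significant. */
static inline uint16_t lsw_of(const void *p, size_t size, size_t index)
{
	uint16_t word;

	assert(index < size / 2);
	memcpy(&word, (const uint8_t *)p + 2 * LSE_INDEX(index, size / 2),
	       sizeof(word));
	return word;
}

#define LSB(value, index) lsb_of(&(value), sizeof(value), (index))
#define LSW(value, index) lsw_of(&(value), sizeof(value), (index))
```

      With these, LSB(v, 0) names the lowest-order byte on both little- and
      big-endian hosts, so a test no longer hard-codes offset 0 as "the low
      byte".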
    • bpf, selftests: Fix warning in snprintf_btf where system() call unchecked · c810b31e
      John Fastabend authored
      On my systems system() calls are marked with warn_unused_result
      apparently. So without error checking we get this warning,
      
      ./prog_tests/snprintf_btf.c:30:9: warning: ignoring return value
         of ‘system’, declared with attribute warn_unused_result[-Wunused-result]
      
      Also, it seems like a good idea to check the return value anyway
      to ensure ping exists, even if a missing ping seems unlikely.
      
      Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/160141006897.25201.12095049414156293265.stgit@john-Precision-5820-Tower
      c810b31e
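      A minimal sketch of checking the system() result, assuming a POSIX
      environment. The wrapper name run_cmd_checked() is hypothetical and is
      not taken from the selftest.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Runs `cmd` through the shell. Returns the command's exit status, or -1
 * if the command could not be run or did not exit normally.
 */
int run_cmd_checked(const char *cmd)
{
	int status;

	assert(cmd != NULL);
	status = system(cmd);	/* the result is consumed, so no warning */
	if (status == -1)
		return -1;	/* could not create the child process */
	if (!WIFEXITED(status))
		return -1;	/* killed by a signal, for example */
	return WEXITSTATUS(status);
}
```

      A test could then skip or fail explicitly when
      run_cmd_checked("ping -c 1 127.0.0.1 > /dev/null") returns non-zero.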
    • Merge branch 'bpf: Support multi-attach for freplace' · 93b8713d
      Alexei Starovoitov authored
      Toke Høiland-Jørgensen says:
      
      ====================
      This series adds support for attaching freplace BPF programs to multiple targets.
      This is needed to support incremental attachment of multiple XDP programs using
      the libxdp dispatcher model.
      
      Patch 1 moves prog_aux->linked_prog and the trampoline to be embedded in
      bpf_tracing_link on attach, and freed by the link release logic, and introduces
      a mutex to protect the writing of the pointers in prog->aux.
      
      Based on this refactoring (and previously applied patches), it becomes pretty
      straight-forward to support multiple-attach for freplace programs (patch 2).
      This is simply a matter of creating a second bpf_tracing_link if a target is
      supplied. However, for API consistency with other types of link attach, this
      option is added to the BPF_LINK_CREATE API instead of extending
      bpf_raw_tracepoint_open().
      
      Patch 3 is a port of Jiri Olsa's patch to support fentry/fexit on freplace
      programs. His approach of getting the target type from the target program
      reference no longer works after we've gotten rid of linked_prog (because the
      bpf_tracing_link reference disappears on attach). Instead, we use the saved
      reference to the target prog type that is also used to verify compatibility on
      secondary freplace attachment.
      
      Patch 4 is the accompanying libbpf update, and patches 5-7 are selftests: patch
      5 tests for the multi-freplace functionality itself; patch 6 is Jiri's previous
      selftest for the fentry-to-freplace fix; patch 7 is a test for the change
      introduced in the previously-applied patches, blocking MODIFY_RETURN functions
      from attaching to other BPF programs.
      
      With this series, libxdp and xdp-tools can successfully attach multiple programs
      one at a time. To play with this, use the 'freplace-multi-attach' branch of
      xdp-tools:
      
      $ git clone --recurse-submodules --branch freplace-multi-attach https://github.com/xdp-project/xdp-tools
      $ cd xdp-tools/xdp-loader
      $ make
      $ sudo ./xdp-loader load veth0 ../lib/testing/xdp_drop.o
      $ sudo ./xdp-loader load veth0 ../lib/testing/xdp_pass.o
      $ sudo ./xdp-loader status
      
      The series is also available here:
      https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=bpf-freplace-multi-attach-alt-10
      
      Changelog:
      
      v10:
        - Dial back the s/tgt_/dst_/ replacement a bit
        - Fix smatch warning (from ktest robot)
        - Rebase to bpf-next, drop already-applied patches
      
      v9:
        - Clarify commit message of patch 3
        - Add new struct bpf_attach_target_info for returning from
          bpf_check_attach_target() and passing to bpf_trampoline_get()
        - Move trampoline key computation into a helper
        - Make sure we don't break bpffs debug umh
        - Add some comment blocks explaining the logic flow in
          bpf_tracing_prog_attach()
        - s/tgt_/dst_/ in prog->aux, and for local variables using those members
        - Always drop dst_trampoline and dst_prog from prog->aux on first attach
        - Don't remove syscall fmod_ret test from selftest benchmarks
        - Add saved_ prefix to dst_{prog,attach}_type members in prog_aux
        - Drop prog argument from check_attach_modify_return()
        - Add comment about possible NULL of tr_link->tgt_prog on link_release()
      
      v8:
        - Add a separate error message when trying to attach FMOD_REPLACE to tgt_prog
        - Better error messages in bpf_program__attach_freplace()
        - Don't lock mutex when setting tgt_* pointers in prog create and verifier
        - Remove fmod_ret programs from benchmarks in selftests (new patch 11)
        - Fix a few other nits in selftests
      
      v7:
        - Add back missing ptype == prog->type check in link_create()
        - Use tracing_bpf_link_attach() instead of separate freplace_bpf_link_attach()
        - Don't break attachment of bpf_iters in libbpf (by clobbering link_create.iter_info)
      
      v6:
        - Rebase to latest bpf-next
        - Simplify logic in bpf_tracing_prog_attach()
        - Don't create a new attach_type for link_create(), disambiguate on prog->type
          instead
        - Use raw_tracepoint_open() in libbpf bpf_program__attach_ftrace() if called
          with NULL target
        - Switch bpf_program__attach_ftrace() to take function name as parameter
          instead of btf_id
        - Add a patch disallowing MODIFY_RETURN programs from attaching to other BPF
          programs, and an accompanying selftest (patches 1 and 10)
      
      v5:
        - Fix typo in inline function definition of bpf_trampoline_get()
        - Don't put bpf_tracing_link in prog->aux, use a mutex to protect tgt_prog and
          trampoline instead, and move them to the link on attach.
        - Restore Jiri as author of the last selftest patch
      
      v4:
        - Cleanup the refactored check_attach_btf_id() to make the logic easier to follow
        - Fix cleanup paths for bpf_tracing_link
        - Use xchg() for removing the bpf_tracing_link from prog->aux and restore on (some) failures
        - Use BPF_LINK_CREATE operation to create link with target instead of extending raw_tracepoint_open
        - Fold update of tools/ UAPI header into main patch
        - Update arg dereference patch to use skeletons and set_attach_target()
      
      v3:
        - Get rid of prog_aux->linked_prog entirely in favour of a bpf_tracing_link
        - Incorporate Jiri's fix for attaching fentry to freplace programs
      
      v2:
        - Drop the log arguments from bpf_raw_tracepoint_open
        - Fix kbot errors
        - Rebase to latest bpf-next
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      93b8713d
    • selftests: Add selftest for disallowing modify_return attachment to freplace · bee4b7e6
      Toke Høiland-Jørgensen authored
      This adds a selftest that ensures that modify_return tracing programs
      cannot be attached to freplace programs. The security_ prefix is added to
      the freplace program because that would otherwise let it pass the check for
      modify_return.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355713.48470.3811074984255709369.stgit@toke.dk
      bee4b7e6
    • selftests/bpf: Adding test for arg dereference in extension trace · 17d3f386
      Jiri Olsa authored
      Adding a test that sets up the following program:
      
        SEC("classifier/test_pkt_md_access")
        int test_pkt_md_access(struct __sk_buff *skb)
      
      with its extension:
      
        SEC("freplace/test_pkt_md_access")
        int test_pkt_md_access_new(struct __sk_buff *skb)
      
      and tracing that extension with:
      
        SEC("fentry/test_pkt_md_access_new")
        int BPF_PROG(fentry, struct sk_buff *skb)
      
      The test verifies that the tracing program can
      dereference skb argument properly.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355603.48470.9072073357530773228.stgit@toke.dk
      17d3f386
    • selftests: Add test for multiple attachments of freplace program · f6429476
      Toke Høiland-Jørgensen authored
      This adds a selftest for attaching an freplace program to multiple targets
      simultaneously.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355497.48470.17568077161540217107.stgit@toke.dk
      f6429476
    • libbpf: Add support for freplace attachment in bpf_link_create · a5359091
      Toke Høiland-Jørgensen authored
      This adds support for supplying a target btf ID for the bpf_link_create()
      operation, and adds a new bpf_program__attach_freplace() high-level API for
      attaching freplace functions with a target.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk
      a5359091
    • bpf: Fix context type resolving for extension programs · 43bc2874
      Toke Høiland-Jørgensen authored
      Eelco reported we can't properly access arguments if the tracing
      program is attached to extension program.
      
      Having following program:
      
        SEC("classifier/test_pkt_md_access")
        int test_pkt_md_access(struct __sk_buff *skb)
      
      with its extension:
      
        SEC("freplace/test_pkt_md_access")
        int test_pkt_md_access_new(struct __sk_buff *skb)
      
      and tracing that extension with:
      
        SEC("fentry/test_pkt_md_access_new")
        int BPF_PROG(fentry, struct sk_buff *skb)
      
      It's not possible to access skb argument in the fentry program,
      with following error from verifier:
      
        ; int BPF_PROG(fentry, struct sk_buff *skb)
        0: (79) r1 = *(u64 *)(r1 +0)
        invalid bpf_context access off=0 size=8
      
      The problem is that btf_ctx_access gets the context type for the
      traced program, which is in this case the extension.
      
      But when we trace extension program, we want to get the context
      type of the program that the extension is attached to, so we can
      access the argument properly in the trace program.
      
      This version of the patch is tweaked slightly from Jiri's original one,
      since the refactoring in the previous patches means we have to get the
      target prog type from the new variable in prog->aux instead of directly
      from the target prog.
      Reported-by: Eelco Chaudron <echaudro@redhat.com>
      Suggested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355278.48470.17057040257274725638.stgit@toke.dk
      43bc2874
    • bpf: Support attaching freplace programs to multiple attach points · 4a1e7c0c
      Toke Høiland-Jørgensen authored
      This enables support for attaching freplace programs to multiple attach
      points. It does this by amending the UAPI for bpf_link_create with a target
      btf ID that can be used to supply the new attachment point along with the
      target program fd. The target must be compatible with the target that was
      supplied at program load time.
      
      The implementation reuses the checks that were factored out of
      check_attach_btf_id() to ensure compatibility between the BTF types of the
      old and new attachment. If these match, a new bpf_tracing_link will be
      created for the new attach target, allowing multiple attachments to
      co-exist simultaneously.
      
      The code could theoretically support multiple-attach of other types of
      tracing programs as well, but since I don't have a use case for any of
      those, there is no API support for doing so.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
      4a1e7c0c
    • bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach · 3aac1ead
      Toke Høiland-Jørgensen authored
      In preparation for allowing multiple attachments of freplace programs, move
      the references to the target program and trampoline into the
      bpf_tracing_link structure when that is created. To do this atomically,
      introduce a new mutex in prog->aux to protect writing to the two pointers
      to target prog and trampoline, and rename the members to make it clear that
      they are related.
      
      With this change, it is no longer possible to attach the same tracing
      program multiple times (detaching in-between), since the reference from the
      tracing program to the target disappears on the first attach. However,
      since the next patch will let the caller supply an attach target, that will
      also make it possible to attach to the same place multiple times.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
      3aac1ead
    • Merge branch 'libbpf: support loading/storing any BTF' · 85e3f318
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      Add support for loading and storing BTF in either little- or big-endian
      integer encodings, regardless of host endianness. This allows users of libbpf
      to not care about endianness when they don't want to and transparently
      open/load BTF of any endianness. libbpf will preserve the original endianness
      and, when emitting raw data, convert it back to that endianness as needed.
      This lets tools like pahole remain ignorant of such issues during
      cross-compilation.
      
      While working with BTF data in memory, the endianness is always native to the
      host. Conversion can happen only during btf__get_raw_data() call, and only in
      a raw data copy.
      
      Additionally, it's possible to force output BTF endianness through new
      btf__set_endianness() API. This makes it possible to create flexible tools doing
      arbitrary conversions of BTF endianness, just by relying on libbpf.
      
      Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
      Cc: Tony Ambardar <tony.ambardar@gmail.com>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Luka Perkov <luka.perkov@sartura.hr>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      85e3f318
    • selftests/bpf: Test BTF's handling of endianness · ed9cf248
      Andrii Nakryiko authored
      Add selftests juggling endianness back and forth to validate BTF's handling of
      endianness conversions internally.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929043046.1324350-4-andriin@fb.com
      ed9cf248
    • libbpf: Support BTF loading and raw data output in both endianness · 3289959b
      Andrii Nakryiko authored
      Teach BTF to recognize wrong endianness and transparently convert it
      internally to host endianness. Original endianness of BTF will be preserved
      and used during btf__get_raw_data() to convert resulting raw data to the same
      endianness as the source raw_data. This means that little-endian host can parse
      big-endian BTF with no issues, all the type data will be presented to the
      client application in native endianness, but when it's time for emitting BTF
      to persist it in a file (e.g., after BTF deduplication), original non-native
      endianness will be preserved and stored.
      
      It's possible to query original endianness of BTF data with new
      btf__endianness() API. It's also possible to override desired output
      endianness with btf__set_endianness(), so that if application needs to load,
      say, big-endian BTF and store it as little-endian BTF, it's possible to
      manually override this. If btf__set_endianness() was used to change
      endianness, btf__endianness() will reflect overridden endianness.
      
      Given there are no known use cases for supporting cross-endianness for
      .BTF.ext, loading .BTF.ext in non-native endianness is not supported.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com
      3289959b
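      Recognizing "wrong" endianness can be done from the BTF magic alone: a
      header whose 16-bit magic reads as the byte-swapped value of 0xeB9F was
      written on a host of the opposite endianness. The sketch below shows
      that idea only; the enum and function names are hypothetical and this
      is not libbpf's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_MAGIC 0xeB9F	/* magic value at the start of a BTF header */

/* Hypothetical classification of a BTF blob relative to the host. */
enum btf_byte_order {
	BTF_ORDER_NATIVE,	/* magic matches as read */
	BTF_ORDER_SWAPPED,	/* magic matches only after byte swapping */
	BTF_ORDER_INVALID,	/* not BTF at all */
};

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* `magic` is the first 16-bit field of the header, read in host order. */
enum btf_byte_order classify_btf_magic(uint16_t magic)
{
	if (magic == BTF_MAGIC)
		return BTF_ORDER_NATIVE;
	if (magic == swap16(BTF_MAGIC))
		return BTF_ORDER_SWAPPED;
	return BTF_ORDER_INVALID;
}

/* Converts one 32-bit header or type field to host order. */
uint32_t btf_field_to_host(uint32_t raw, enum btf_byte_order order)
{
	assert(order != BTF_ORDER_INVALID);
	return order == BTF_ORDER_SWAPPED ? swap32(raw) : raw;
}
```

      Remembering the classification result is what would let a writer apply
      the same swap in reverse when emitting raw data, matching the
      "preserve original endianness" behaviour described above.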
    • selftests/bpf: Move and extend ASSERT_xxx() testing macros · 22ba3635
      Andrii Nakryiko authored
      Move existing ASSERT_xxx() macros out of btf_write selftest into test_progs.h
      to use across all selftests. Also expand a set of macros for typical cases.
      
      Now there are the following macros:
        - ASSERT_EQ() -- check for equality of two integers;
        - ASSERT_STREQ() -- check for equality of two C strings;
        - ASSERT_OK() -- check for successful (zero) return result;
        - ASSERT_ERR() -- check for unsuccessful (non-zero) return result;
        - ASSERT_NULL() -- check for NULL pointer;
        - ASSERT_OK_PTR() -- check for a valid pointer;
        - ASSERT_ERR_PTR() -- check for NULL or negative error encoded in a pointer.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929043046.1324350-2-andriin@fb.com
      22ba3635
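      For readers unfamiliar with this style of macro, here is a simplified
      stand-in showing the general shape: each check evaluates to a boolean
      and reports a mismatch by name. The SKETCH_ prefix marks these as
      hypothetical; the real definitions live in test_progs.h and differ in
      detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Compares two integers; prints a diagnostic and returns false on mismatch. */
static inline bool check_eq(long long actual, long long expected,
			    const char *name)
{
	bool ok = actual == expected;

	if (!ok)
		fprintf(stderr, "%s: actual %lld != expected %lld\n",
			name, actual, expected);
	return ok;
}

/* Compares two C strings; prints a diagnostic and returns false on mismatch. */
static inline bool check_streq(const char *actual, const char *expected,
			       const char *name)
{
	bool ok;

	assert(actual != NULL && expected != NULL);
	ok = strcmp(actual, expected) == 0;
	if (!ok)
		fprintf(stderr, "%s: actual '%s' != expected '%s'\n",
			name, actual, expected);
	return ok;
}

#define SKETCH_ASSERT_EQ(actual, expected, name) \
	check_eq((long long)(actual), (long long)(expected), (name))
#define SKETCH_ASSERT_STREQ(actual, expected, name) \
	check_streq((actual), (expected), (name))
#define SKETCH_ASSERT_OK(res, name) \
	check_eq((long long)(res), 0, (name))
```

      Returning the result lets a test write
      "if (!SKETCH_ASSERT_OK(err, "load")) goto cleanup;" instead of repeating
      the comparison and the error message by hand.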
    • selftests: Make sure all 'skel' variables are declared static · f970cbcd
      Toke Høiland-Jørgensen authored
      If programs in prog_tests using skeletons declare the 'skel' variable as
      global but not static, that will lead to linker errors on the final link of
      the prog_tests binary due to duplicate symbols. Fix a few instances of this.
      
      Fixes: b18c1f0a ("bpf: selftest: Adapt sock_fields test to use skel and global variables")
      Fixes: 9a856cae ("bpf: selftest: Add test_btf_skc_cls_ingress")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200929123026.46751-1-toke@redhat.com
      f970cbcd
    • xsk: Fix a documentation mistake in xsk_queue.h · f1fc8ece
      Ciara Loftus authored
      After 'peeking' the ring, the consumer, not the producer, reads the data.
      Fix this mistake in the comments.
      
      Fixes: 15d8c916 ("xsk: Add function naming comments and reorder functions")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20200928082344.17110-1-ciara.loftus@intel.com
      f1fc8ece
    • selftests/bpf_iter: Don't fail test due to missing __builtin_btf_type_id · d2197c7f
      Toke Høiland-Jørgensen authored
      The new test for task iteration in bpf_iter checks (in do_btf_read()) if it
      should be skipped due to missing __builtin_btf_type_id. However, this
      'skip' verdict is not propagated to the caller, so the parent test will
      still fail. Fix this by also skipping the rest of the parent test if the
      skip condition was reached.
      
      Fixes: b72091bd ("selftests/bpf: Add test for bpf_seq_printf_btf helper")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20200929123004.46694-1-toke@redhat.com
      d2197c7f
    • bpf/preload: Make sure Makefile cleans up after itself, and add .gitignore · 9d9aae53
      Toke Høiland-Jørgensen authored
      The Makefile in bpf/preload builds a local copy of libbpf, but does not
      properly clean up after itself. This can lead to subsequent compilation
      failures, since the stale feature detection cache is kept around and can
      cause subsequent detection to fail.
      
      Fix this by properly setting clean-files, and while we're at it, also add a
      .gitignore for the directory to ignore the build artifacts.
      
      Fixes: d71fa5c9 ("bpf: Add kernel module with user mode driver that populates bpffs.")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200927193005.8459-1-toke@redhat.com
      9d9aae53
    • Merge branch 'selftests/bpf: BTF-based kernel data display' · 3aae4a38
      Alexei Starovoitov authored
      Alan Maguire says:
      
      ====================
      Resolve issues in bpf selftests introduced with BTF-based kernel data
      display selftests; these are
      
      - a warning introduced in snprintf_btf.c; and
      - compilation failures with old kernels' vmlinux.h
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      3aae4a38
    • selftests/bpf: Ensure snprintf_btf/bpf_iter tests compatibility with old vmlinux.h · cfe77683
      Alan Maguire authored
      Andrii reports that bpf selftests relying on "struct btf_ptr" and BTF_F_*
      values will not build as vmlinux.h for older kernels will not include
      "struct btf_ptr" or the BTF_F_* enum values.  Undefine and redefine
      them to work around this.
      
      Fixes: b72091bd ("selftests/bpf: Add test for bpf_seq_printf_btf helper")
      Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
      Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/1601379151-21449-3-git-send-email-alan.maguire@oracle.com
      cfe77683
    • selftests/bpf: Fix unused-result warning in snprintf_btf.c · 96c48058
      Alan Maguire authored
      Daniel reports:
      
      +    system("ping -c 1 127.0.0.1 > /dev/null");
      
      This generates the following new warning when compiling BPF selftests:
      
        [...]
        EXT-OBJ  [test_progs] cgroup_helpers.o
        EXT-OBJ  [test_progs] trace_helpers.o
        EXT-OBJ  [test_progs] network_helpers.o
        EXT-OBJ  [test_progs] testing_helpers.o
        TEST-OBJ [test_progs] snprintf_btf.test.o
      /root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c: In function ‘test_snprintf_btf’:
      /root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c:30:2: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
        system("ping -c 1 127.0.0.1 > /dev/null");
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        [...]
      
      Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601379151-21449-2-git-send-email-alan.maguire@oracle.com
      96c48058
    • bpf, selftests: Fix cast to smaller integer type 'int' warning in raw_tp · 00e8c44a
      John Fastabend authored
      Fix warning in bpf selftests,
      
      progs/test_raw_tp_test_run.c:18:10: warning: cast to smaller integer type 'int' from 'struct task_struct *' [-Wpointer-to-int-cast]
      
      Change int type cast to long to fix. Discovered with gcc-9 and llvm-11+
      where llvm was recent main branch.
      
      Fixes: 09d8ad16 ("selftests/bpf: Add raw_tp_test_run")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/160134424745.11199.13841922833336698133.stgit@john-Precision-5820-Tower
      00e8c44a
    • Alexei Starovoitov's avatar
      Merge branch 'libbpf: BTF writer APIs' · bc600908
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set introduces a new set of BTF APIs to libbpf that make it
      convenient to produce BTF types and strings. These APIs will allow libbpf to do
      more intrusive modifications of program's BTF (by rewriting it, at least as of
      right now), which is necessary for the upcoming libbpf static linking. But
      they are complete and generic, so can be adopted by anyone who has a need to
      produce BTF type information.
      
      One such example outside of libbpf is pahole, which was completely converted
      to these APIs (locally, pending landing of these changes in libbpf). The
      conversion reduces the amount of custom pahole code needed and brings nice
      savings in memory usage (about 370MB reduction at peak for my kernel
      configuration) and even in BTF deduplication times (one second reduction,
      23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of
      "uncompressed" raw BTF data. Time reduction comes from faster string
      search and deduplication by relying on hashmap instead of BST used by pahole's
      own code. Consequently, these APIs are already tested on real-world
      complicated kernel BTF, but there is also pretty extensive selftest doing
      extra validations.
      
      Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
      that are useful for writing shorter and less repetitive selftests. I decided
      to keep them local to that selftest for now, but if they prove to be useful in
      more contexts we should move them to test_progs.h. A few more macros (e.g.,
      inequality tests) are probably necessary to have a more complete set.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      
      v2->v3:
        - resending original patches #7-9 as patches #1-3 due to merge conflict;
      
      v1->v2:
        - fixed comments (John);
        - renamed btf__append_xxx() into btf__add_xxx() (Alexei);
        - added btf__find_str() in addition to btf__add_str();
        - btf__new_empty() now sets kernel FD to -1 initially.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      bc600908
    • Andrii Nakryiko's avatar
      libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offset · f86ed050
      Andrii Nakryiko authored
      BTF strings are used not just for names, they can be arbitrary strings used
      for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
      is too specific and might be misleading. Instead, introduce
      btf__str_by_offset() API which uses generic string terminology.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com
      f86ed050
    • Andrii Nakryiko's avatar
      libbpf: Add BTF writing APIs · 4a3b33f8
      Andrii Nakryiko authored
      Add APIs for appending new BTF types to the end of a BTF object.
      
      Each BTF kind has one API of the form btf__add_<kind>(). For types that
      have a variable number of additional items (struct/union, enum, func_proto,
      datasec), an additional API is provided to emit each such item. E.g., for
      emitting a struct, one would use the following sequence of API calls:
      
      btf__add_struct(...);
      btf__add_field(...);
      ...
      btf__add_field(...);
      
      Each btf__add_field() will ensure that the last BTF type is of STRUCT or
      UNION kind and will automatically increment that type's vlen field.
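
The "last type must be STRUCT or UNION, then bump vlen" rule can be sketched with a toy model. This is not libbpf code: every toy_* name, the fixed-size array, and the -1 error value are assumptions made for illustration. Only the described behaviour (a new type ID returned for a type, zero returned for an item) is mirrored.

```c
/* Toy model only; the toy_* names are hypothetical, not libbpf APIs. */
enum toy_kind { TOY_INT, TOY_STRUCT, TOY_UNION };

struct toy_type {
	enum toy_kind kind;
	unsigned int vlen;	/* number of members emitted so far */
};

struct toy_btf {
	struct toy_type types[16];
	int nr_types;
};

/* Append a type; returns its ID (IDs start at 1, ID 0 being reserved
 * for void in BTF) or -1 when the toy table is full. */
static int toy_add_type(struct toy_btf *btf, enum toy_kind kind)
{
	if (btf->nr_types >= 16)
		return -1;
	btf->types[btf->nr_types].kind = kind;
	btf->types[btf->nr_types].vlen = 0;
	return ++btf->nr_types;
}

/* Append a member to the most recently added type; returns 0 on success
 * or -1 if that type is not a struct or union. */
static int toy_add_field(struct toy_btf *btf)
{
	struct toy_type *last;

	if (btf->nr_types == 0)
		return -1;
	last = &btf->types[btf->nr_types - 1];
	if (last->kind != TOY_STRUCT && last->kind != TOY_UNION)
		return -1;
	last->vlen++;
	return 0;
}
```

In this model, calling toy_add_field() right after adding an int fails, while two calls after adding a struct leave that struct with vlen equal to 2.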
      
      All the strings are provided as C strings (const char *), not a string offset.
      This significantly improves usability of BTF writer APIs. All such strings
      will be automatically appended to string section or existing string will be
      re-used, if such string was already added previously.
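
The append-or-reuse behaviour for strings can be sketched as follows. This is a toy model, not the libbpf implementation: the toy_* names and the fixed 256-byte buffer are assumptions, and a linear scan is used here purely for brevity.

```c
#include <string.h>

/* Toy string section; all toy_* names are hypothetical. */
struct toy_strs {
	char data[256];
	size_t len;
};

static void toy_strs_init(struct toy_strs *s)
{
	s->data[0] = '\0';	/* offset 0 holds the empty string */
	s->len = 1;
}

/* Return the offset of str if it is already stored, else -1. */
static int toy_find_str(const struct toy_strs *s, const char *str)
{
	size_t off = 0;

	while (off < s->len) {
		if (strcmp(s->data + off, str) == 0)
			return (int)off;
		off += strlen(s->data + off) + 1;	/* skip to next string */
	}
	return -1;
}

/* Return the offset of str, appending it only if it is not present yet;
 * returns -1 when the toy buffer is full. */
static int toy_add_str(struct toy_strs *s, const char *str)
{
	size_t n = strlen(str) + 1;
	int off = toy_find_str(s, str);

	if (off >= 0)
		return off;
	if (n > sizeof(s->data) - s->len)
		return -1;
	memcpy(s->data + s->len, str, n);
	off = (int)s->len;
	s->len += n;
	return off;
}
```

Adding "int" twice yields the same offset both times, so callers can pass plain C strings without tracking offsets themselves.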
      
      Each API attempts to do all the reasonable validations, like enforcing
      non-empty names for entities with required names, proper value bounds, various
      bit offset restrictions, etc.
      
      Type ID validation is minimal because it's possible to emit a type that refers
      to type that will be emitted later, so libbpf has no way to enforce such
      cases. User must be careful to properly emit all the necessary types and
      specify type IDs that will be valid in the finally generated BTF.
      
      Each of the btf__add_<kind>() APIs returns a new type ID on success or a
      negative value on error. APIs like btf__add_field() that emit additional
      items return zero on success and a negative value on error.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com
      4a3b33f8
    • Alexei Starovoitov's avatar
      Merge branch 'bpf: add helpers to support BTF-based kernel' · 98b972d2
      Alexei Starovoitov authored
      Alan Maguire says:
      
      ====================
      This series attempts to provide a simple way for BPF programs (and in
      future other consumers) to utilize BPF Type Format (BTF) information
      to display kernel data structures in-kernel.  The use case this
      functionality is applied to here is to support a snprintf()-like
      helper to copy a BTF representation of kernel data to a string,
      and a BPF seq file helper to display BTF data for an iterator.
      
      There is already support in kernel/bpf/btf.c for "show" functionality;
      the changes here generalize that support from seq-file specific
      verifier display to the more generic case and add another specific
      use case; rather than seq_printf()ing the show data, it is copied
      to a supplied string using a snprintf()-like function.  Other future
      consumers of the show functionality could include a bpf_printk_btf()
      function which printk()ed the data instead.  Oops messaging in
      particular would be an interesting application for such functionality.
      
      The above potential use case hints at a potential reply to
      a reasonable objection that such typed display should be
      solved by tracing programs, where the in-kernel tracing records
      data and the userspace program prints it out.  While this
      is certainly the recommended approach for most cases, I
      believe having an in-kernel mechanism would be valuable
      also.  Critically, in BPF programs it reduces debugging
      and tracing of such data to invoking a simple
      helper.
      
      One challenge raised in an earlier iteration of this work -
      where the BTF printing was implemented as a printk() format
      specifier - was that the amount of data printed per
      printk() was large, and other format specifiers were far
      simpler.  Here we sidestep that concern by printing
      components of the BTF representation as we go for the
      seq file case, and in the string case the snprintf()-like
      operation is intended to be a basis for perf event or
      ringbuf output.  The reasons for avoiding bpf_trace_printk
      are that
      
      1. bpf_trace_printk() strings are restricted in size and
      cannot display anything beyond trivial data structures; and
      2. bpf_trace_printk() is for debugging purposes only.
      
      As Alexei suggested, a bpf_trace_puts() helper could solve
      this in the future but it still would be limited by the
      1000 byte limit for traced strings.
      
      Default output for an sk_buff looks like this (zeroed fields
      are omitted):
      
      (struct sk_buff){
       .transport_header = (__u16)65535,
       .mac_header = (__u16)65535,
       .end = (sk_buff_data_t)192,
       .head = (unsigned char *)0x000000007524fd8b,
       .data = (unsigned char *)0x000000007524fd8b,
       .truesize = (unsigned int)768,
       .users = (refcount_t){
        .refs = (atomic_t){
         .counter = (int)1,
        },
       },
      }
      
      Flags can modify aspects of output format; see patch 3
      for more details.
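
The "zeroed fields are omitted" default can be illustrated with a userspace sketch that formats a small struct in the same style. This is not the kernel's BTF show code: struct demo, put_text(), put_field() and show_demo() are hypothetical names, and the type names are passed in by hand rather than read from BTF.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature struct; not a kernel type. */
struct demo {
	unsigned short mac_header;
	unsigned int len;
	unsigned int truesize;
};

/* Append literal text at offset len; returns the new length. */
static size_t put_text(char *buf, size_t size, size_t len, const char *text)
{
	int n;

	if (len >= size)
		return len;
	n = snprintf(buf + len, size - len, "%s", text);
	return n > 0 ? len + (size_t)n : len;
}

/* Append one ".name = (type)value," line, skipping zeroed fields. */
static size_t put_field(char *buf, size_t size, size_t len,
			const char *name, const char *type, unsigned long val)
{
	int n;

	if (val == 0 || len >= size)
		return len;
	n = snprintf(buf + len, size - len, " .%s = (%s)%lu,\n", name, type, val);
	return n > 0 ? len + (size_t)n : len;
}

static size_t show_demo(char *buf, size_t size, const struct demo *d)
{
	size_t len = 0;

	len = put_text(buf, size, len, "(struct demo){\n");
	len = put_field(buf, size, len, "mac_header", "__u16", d->mac_header);
	len = put_field(buf, size, len, "len", "unsigned int", d->len);
	len = put_field(buf, size, len, "truesize", "unsigned int", d->truesize);
	len = put_text(buf, size, len, "}");
	return len;
}
```

With mac_header and truesize set and len left at zero, the rendered text contains only the two non-zero members between the opening and closing lines.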
      
      Changes since v6:
      
      - Updated safe data size to 32, object name size to 80.
        This increases the number of safe copies done, but performance is
        not a key goal here. WRT name size the largest type name length
        in bpf-next according to "pahole -s" is 64 bytes, so that still gives
        room for additional type qualifiers, parens etc within the name limit
        (Alexei, patch 2)
      - Removed inlines and converted as many #defines to functions as was
        possible.  In a few cases - btf_show_type_value[s]() specifically -
        I left these as macros, as btf_show_type_value[s]() prepends and
        appends format strings to the format specifier (in order to include
        indentation, delimiters etc), so a macro makes that simpler (Alexei,
        patch 2)
      - Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
      - Removed clang loop unroll in BTF snprintf test (Alexei)
      - switched to using bpf_core_type_id_kernel(type) as suggested by Andrii,
        and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
      - Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
      - Bumped limits on bpf iters to support printing larger structures (Alexei,
        patch 5)
      - Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
      - Updated seq helper to use type id only (Alexei, patch 7)
      - Updated BTF task iter test to use task struct instead of struct fs_struct
        since new limits allow a task_struct to be displayed (patch 8)
      - Fixed E2BIG handling in iter task (Alexei, patch 8)
      
      Changes since v5:
      
      - Moved btf print prepare into patch 3, type show seq
        with flags into patch 2 (Alexei, patches 2,3)
      - Fixed build bot warnings around static declarations
        and printf attributes
      - Renamed functions to snprintf_btf/seq_printf_btf
        (Alexei, patches 3-6)
      
      Changes since v4:
      
      - Changed approach from a BPF trace event-centric design to one
        utilizing a snprintf()-like helper and an iter helper (Alexei,
        patches 3,5)
      - Added tests to verify BTF output (patch 4)
      - Added support to tests for verifying BTF type_id-based display
        as well as type name via __builtin_btf_type_id (Andrii, patch 4).
      - Augmented task iter tests to cover the BTF-based seq helper.
        Because a task_struct's BTF-based representation would overflow
        the PAGE_SIZE limit on iterator data, the "struct fs_struct"
        (task->fs) is displayed for each task instead (Alexei, patch 6).
      
      Changes since v3:
      
      - Moved to RFC since the approach is different (and bpf-next is
        closed)
      - Rather than using a printk() format specifier as the means
        of invoking BTF-enabled display, a dedicated BPF helper is
        used.  This solves the issue of printk() having to output
        large amounts of data using a complex mechanism such as
        BTF traversal, but still provides a way for the display of
        such data to be achieved via BPF programs.  Future work could
        include a bpf_printk_btf() function to invoke display via
        printk() where the elements of a data structure are printk()ed
        one at a time.  Thanks to Petr Mladek, Andy Shevchenko and
        Rasmus Villemoes who took time to look at the earlier printk()
        format-specifier-focused version of this and provided feedback
        clarifying the problems with that approach.
      - Added trace id to the bpf_trace_printk events as a means of
        separating output from standard bpf_trace_printk() events,
        ensuring it can be easily parsed by the reader.
      - Added bpf_trace_btf() helper tests which do simple verification
        of the various display options.
      
      Changes since v2:
      
      - Alexei and Yonghong suggested it would be good to use
        probe_kernel_read() on to-be-shown data to ensure safety
        during operation.  Safe copy via probe_kernel_read() to a
        buffer object in "struct btf_show" is used to support
        this.  A few different approaches were explored
        including dynamic allocation and per-cpu buffers. The
        downside of dynamic allocation is that it would be done
        during BPF program execution for bpf_trace_printk()s using
        %pT format specifiers. The problem with per-cpu buffers
        is we'd have to manage preemption and since the display
        of an object occurs over an extended period and in printk
        context where we'd rather not change preemption status,
        it seemed tricky to manage buffer safety while considering
        preemption.  The approach of utilizing stack buffer space
        via the "struct btf_show" seemed like the simplest approach.
        The stack size of the associated functions which have a
        "struct btf_show" on their stack to support show operation
        (btf_type_snprintf_show() and btf_type_seq_show()) stays
        under 500 bytes. The compromise here is the safe buffer we
        use is small - 256 bytes - and as a result multiple
        probe_kernel_read()s are needed for larger objects. Most
        objects of interest are smaller than this (e.g.
        "struct sk_buff" is 224 bytes), and while task_struct is a
        notable exception at ~8K, performance is not the priority for
        BTF-based display. (Alexei and Yonghong, patch 2).
      - safe buffer use is the default behaviour (and is mandatory
        for BPF) but unsafe display - meaning no safe copy is done
        and we operate on the object itself - is supported via a
        'u' option.
      - pointers are prefixed with 0x for clarity (Alexei, patch 2)
      - added additional comments and explanations around BTF show
        code, especially around determining whether objects are
        zeroed. Also tried to comment the safe object scheme used. (Yonghong,
        patch 2)
      - added late_initcall() to initialize vmlinux BTF so that it would
        not have to be initialized during printk operation (Alexei,
        patch 5)
      - removed CONFIG_BTF_PRINTF config option as it is not needed;
        CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
        determining behaviour of type-based printk can be done via
        retrieval of BTF data; if it's not there BTF was unavailable
        or broken (Alexei, patches 4,6)
      - fix bpf_trace_printk test to use vmlinux.h and globals via
        skeleton infrastructure, removing need for perf events
        (Andrii, patch 8)
      
      Changes since v1:
      
      - changed format to be more drgn-like, rendering indented type info
        along with type names by default (Alexei)
      - zeroed values are omitted (Arnaldo) by default unless the '0'
        modifier is specified (Alexei)
      - added an option to print pointer values without obfuscation.
        The reason to do this is the sysctls controlling pointer display
        are likely to be irrelevant in many if not most tracing contexts.
        Some questions on this in the outstanding questions section below...
      - reworked printk format specifier so that we no longer rely on format
        %pT<type> but instead use a struct * which contains type information
        (Rasmus). This simplifies the printk parsing, makes use more dynamic
        and also allows specification by BTF id as well as name.
      - removed incorrect patch which tried to fix dereferencing of resolved
        BTF info for vmlinux; instead we skip modifiers for the relevant
        case (array element type determination) (Alexei).
      - fixed issues with negative snprintf format length (Rasmus)
      - added test cases for various data structure formats; base types,
        typedefs, structs, etc.
      - tests now iterate through all typedefs, enums, structs and unions
        defined for vmlinux BTF and render a version of the target dummy
        value which is either all zeros or all 0xff values; the idea is this
        exercises the "skip if zero" and "print everything" cases.
      - added support in BPF for using the %pT format specifier in
        bpf_trace_printk()
      - added BPF tests which ensure %pT format specifier use works (Alexei).
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      98b972d2
    • Alan Maguire's avatar
      selftests/bpf: Add test for bpf_seq_printf_btf helper · b72091bd
      Alan Maguire authored
      Add a test verifying that iterating over tasks and displaying the BTF
      representation of task_struct succeeds.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-9-git-send-email-alan.maguire@oracle.com
      b72091bd
    • Alan Maguire's avatar
      bpf: Add bpf_seq_printf_btf helper · eb411377
      Alan Maguire authored
      A helper is added to allow seq file writing of kernel data
      structures using vmlinux BTF.  Its signature is
      
      long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                              u32 btf_ptr_size, u64 flags);
      
      Flags and struct btf_ptr definitions/use are identical to the
      bpf_snprintf_btf helper, and the helper returns 0 on success
      or a negative error value.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
      eb411377