Commits · c240ba28789063690077d282f8f89e03f31037d0 · Kirill Smelkov / linux

15 Sep, 2021 12 commits

selftests/bpf: Add a test with a bpf program with btf_tag attributes · c240ba28

Yonghong Song authored Sep 14, 2021

Add a bpf program with btf_tag attributes. The program is
loaded successfully with the kernel. With the command
  bpftool btf dump file ./tag.o
the following dump shows that tags are properly encoded:
  [8] STRUCT 'key_t' size=12 vlen=3
          'a' type_id=2 bits_offset=0
          'b' type_id=2 bits_offset=32
          'c' type_id=2 bits_offset=64
  [9] TAG 'tag1' type_id=8 component_id=-1
  [10] TAG 'tag2' type_id=8 component_id=-1
  [11] TAG 'tag1' type_id=8 component_id=1
  [12] TAG 'tag2' type_id=8 component_id=1
  ...
  [21] FUNC_PROTO '(anon)' ret_type_id=2 vlen=1
          'x' type_id=2
  [22] FUNC 'foo' type_id=21 linkage=static
  [23] TAG 'tag1' type_id=22 component_id=0
  [24] TAG 'tag2' type_id=22 component_id=0
  [25] TAG 'tag1' type_id=22 component_id=-1
  [26] TAG 'tag2' type_id=22 component_id=-1
  ...
  [29] VAR 'total' type_id=27, linkage=global
  [30] TAG 'tag1' type_id=29 component_id=-1
  [31] TAG 'tag2' type_id=29 component_id=-1

If an old clang compiler, which does not support btf_tag attribute,
is used, these btf_tag attributes will be silently ignored.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223058.248949-1-yhs@fb.com

c240ba28

selftests/bpf: Test BTF_KIND_TAG for deduplication · ad526474

Yonghong Song authored Sep 14, 2021

Add unit tests for BTF_KIND_TAG deduplication for
  - struct and struct member
  - variable
  - func and func argument
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223052.248535-1-yhs@fb.com

ad526474

selftests/bpf: Add BTF_KIND_TAG unit tests · 35baba7a

Yonghong Song authored Sep 14, 2021

Test good and bad variants of BTF_KIND_TAG encoding.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223047.248223-1-yhs@fb.com

35baba7a

selftests/bpf: Change NAME_NTH/IS_NAME_NTH for BTF_KIND_TAG format · 3df3bd68

Yonghong Song authored Sep 14, 2021

BTF_KIND_TAG ELF format has a component_idx which might have value -1.
test_btf may confuse it with common_type.name as NAME_NTH checkes
high 16bit to be 0xffff. Change NAME_NTH high 16bit check to be
0xfffe so it won't confuse with component_idx.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223041.248009-1-yhs@fb.com

3df3bd68

selftests/bpf: Test libbpf API function btf__add_tag() · 71d29c2d

Yonghong Song authored Sep 14, 2021

Add btf_write tests with btf__add_tag() function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223036.247560-1-yhs@fb.com

71d29c2d

bpftool: Add support for BTF_KIND_TAG · 5c07f2fe

Yonghong Song authored Sep 14, 2021

Added bpftool support to dump BTF_KIND_TAG information.
The new bpftool will be used in later patches to dump
btf in the test bpf program object file.

Currently, the tags are not emitted with
  bpftool btf dump file <path> format c
and they are silently ignored.  The tag information is
mostly used in the kernel for verification purpose and the kernel
uses its own btf to check. With adding these tags
to vmlinux.h, tags will be encoded in program's btf but
they will not be used by the kernel, at least for now.
So let us delay adding these tags to format C header files
until there is a real need.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223031.246951-1-yhs@fb.com

5c07f2fe

libbpf: Add support for BTF_KIND_TAG · 5b84bd10

Yonghong Song authored Sep 14, 2021

Add BTF_KIND_TAG support for parsing and dedup.
Also added sanitization for BTF_KIND_TAG. If BTF_KIND_TAG is not
supported in the kernel, sanitize it to INTs.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223025.246687-1-yhs@fb.com

5b84bd10

libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag · 30025e8b

Yonghong Song authored Sep 14, 2021

This patch renames functions btf_{hash,equal}_int() to
btf_{hash,equal}_int_tag() so they can be reused for
BTF_KIND_TAG support. There is no functionality change for
this patch.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223020.245829-1-yhs@fb.com

30025e8b

bpf: Support for new btf kind BTF_KIND_TAG · b5ea834d

Yonghong Song authored Sep 14, 2021

LLVM14 added support for a new C attribute ([1])
  __attribute__((btf_tag("arbitrary_str")))
This attribute will be emitted to dwarf ([2]) and pahole
will convert it to BTF. Or for bpf target, this
attribute will be emitted to BTF directly ([3], [4]).
The attribute is intended to provide additional
information for
  - struct/union type or struct/union member
  - static/global variables
  - static/global function or function parameter.

For linux kernel, the btf_tag can be applied
in various places to specify user pointer,
function pre- or post- condition, function
allow/deny in certain context, etc. Such information
will be encoded in vmlinux BTF and can be used
by verifier.

The btf_tag can also be applied to bpf programs
to help global verifiable functions, e.g.,
specifying preconditions, etc.

This patch added basic parsing and checking support
in kernel for new BTF_KIND_TAG kind.

 [1] https://reviews.llvm.org/D106614
 [2] https://reviews.llvm.org/D106621
 [3] https://reviews.llvm.org/D106622
 [4] https://reviews.llvm.org/D109560Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223015.245546-1-yhs@fb.com

b5ea834d

btf: Change BTF_KIND_* macros to enums · 41ced4cd

Yonghong Song authored Sep 14, 2021

Change BTF_KIND_* macros to enums so they are encoded in dwarf and
appear in vmlinux.h. This will make it easier for bpf programs
to use these constants without macro definitions.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223009.245307-1-yhs@fb.com

41ced4cd

selftests/bpf: Fix .gitignore to not ignore test_progs.c · 8987ede3

Andrii Nakryiko authored Sep 14, 2021

List all possible test_progs flavors explicitly to avoid accidentally
ignoring valid source code files. In this case, test_progs.c was still
ignored after recent 809ed84d ("selftests/bpf: Whitelist test_progs.h
from .gitignore") fix that added exception only for test_progs.h.

Fixes: 74b5a596 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210914162228.3995740-1-andrii@kernel.org

8987ede3

bpf,x64 Emit IMUL instead of MUL for x86-64 · c0354077

Jie Meng authored Sep 13, 2021

IMUL allows for multiple operands and saving and storing rax/rdx is no
longer needed. Signedness of the operands doesn't matter here because
the we only keep the lower 32/64 bit of the product for 32/64 bit
multiplications.
Signed-off-by: Jie Meng <jmeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913211337.1564014-1-jmeng@fb.com

c0354077

14 Sep, 2021 6 commits

Merge branch 'libbpf: Streamline internal BPF program sections handling' · 67dfac47

Alexei Starovoitov authored Sep 14, 2021

Andrii Nakryiko says:

====================

This small patch set performs internal refactorings around libbpf BPF program
ELF section definitions' handling. This is preparatory changes for further
changes around making libbpf BPF program section handling more strict but also
pluggable and customizable, as part of the libbpf 1.0 effort. See individual
patches for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

67dfac47

libbpf: Minimize explicit iterator of section definition array · b6291a6f

Andrii Nakryiko authored Sep 13, 2021

Remove almost all the code that explicitly iterated BPF program section
definitions in favor of using find_sec_def(). The only remaining user of
section_defs is libbpf_get_type_names that has to iterate all of them to
construct its result.

Having one internal API entry point for section definitions will
simplify further refactorings around libbpf's program section
definitions parsing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-5-andrii@kernel.org

b6291a6f

libbpf: Simplify BPF program auto-attach code · 5532dfd4

Andrii Nakryiko authored Sep 13, 2021

Remove the need to explicitly pass bpf_sec_def for auto-attachable BPF
programs, as it is already recorded at bpf_object__open() time for all
recognized type of BPF programs. This further reduces number of explicit
calls to find_sec_def(), simplifying further refactorings.

No functional changes are done by this patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-4-andrii@kernel.org

5532dfd4

libbpf: Ensure BPF prog types are set before relocations · 91b4d1d1

Andrii Nakryiko authored Sep 13, 2021

Refactor bpf_object__open() sequencing to perform BPF program type
detection based on SEC() definitions before we get to relocations
collection. This allows to have more information about BPF program by
the time we get to, say, struct_ops relocation gathering. This,
subsequently, simplifies struct_ops logic and removes the need to
perform extra find_sec_def() resolution.

With this patch libbpf will require all struct_ops BPF programs to be
marked with SEC("struct_ops") or SEC("struct_ops/xxx") annotations.
Real-world applications are already doing that through something like
selftests's BPF_STRUCT_OPS() macro. This change streamlines libbpf's
internal handling of SEC() definitions and is in the sprit of
upcoming libbpf-1.0 section strictness changes ([0]).

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handlingSigned-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-3-andrii@kernel.org

91b4d1d1

selftests/bpf: Update selftests to always provide "struct_ops" SEC · 53df63cc

Andrii Nakryiko authored Sep 13, 2021

Update struct_ops selftests to always specify "struct_ops" section
prefix. Libbpf will require a proper BPF program type set in the next
patch, so this prevents tests breaking.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-2-andrii@kernel.org

53df63cc

libbpf: Introduce legacy kprobe events support · ca304b40

Rafael David Tinoco authored Sep 12, 2021

Allow kprobe tracepoint events creation through legacy interface, as the
kprobe dynamic PMUs support, used by default, was only created in v4.17.

Store legacy kprobe name in struct bpf_perf_link, instead of creating
a new "subclass" off of bpf_perf_link. This is ok as it's just two new
fields, which are also going to be reused for legacy uprobe support in
follow up patches.
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210912064844.3181742-1-rafaeldtinoco@gmail.com

ca304b40

13 Sep, 2021 6 commits

libbpf: Make libbpf_version.h non-auto-generated · 2f383041

Andrii Nakryiko authored Sep 13, 2021

Turn previously auto-generated libbpf_version.h header into a normal
header file. This prevents various tricky Makefile integration issues,
simplifies the overall build process, but also allows to further extend
it with some more versioning-related APIs in the future.

To prevent accidental out-of-sync versions as defined by libbpf.map and
libbpf_version.h, Makefile checks their consistency at build time.

Simultaneously with this change bump libbpf.map to v0.6.

Also undo adding libbpf's output directory into include path for
kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
because libbpf_version.h is just a normal header like any other.

Fixes: 0b46b755 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org

2f383041

bpf, selftests: Replicate tailcall limit test for indirect call case · dbd7eb14

Daniel Borkmann authored Sep 10, 2021

The tailcall_3 test program uses bpf_tail_call_static() where the JIT
would patch a direct jump. Add a new tailcall_6 test program replicating
exactly the same test just ensuring that bpf_tail_call() uses a map
index where the verifier cannot make assumptions this time.

In other words, this will now cover both on x86-64 JIT, meaning, JIT
images with emit_bpf_tail_call_direct() emission as well as JIT images
with emit_bpf_tail_call_indirect() emission.

  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # ./test_progs -t tailcalls
  #136/1 tailcalls/tailcall_1:OK
  #136/2 tailcalls/tailcall_2:OK
  #136/3 tailcalls/tailcall_3:OK
  #136/4 tailcalls/tailcall_4:OK
  #136/5 tailcalls/tailcall_5:OK
  #136/6 tailcalls/tailcall_6:OK
  #136/7 tailcalls/tailcall_bpf2bpf_1:OK
  #136/8 tailcalls/tailcall_bpf2bpf_2:OK
  #136/9 tailcalls/tailcall_bpf2bpf_3:OK
  #136/10 tailcalls/tailcall_bpf2bpf_4:OK
  #136/11 tailcalls/tailcall_bpf2bpf_5:OK
  #136 tailcalls:OK
  Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED

  # echo 0 > /proc/sys/net/core/bpf_jit_enable
  # ./test_progs -t tailcalls
  #136/1 tailcalls/tailcall_1:OK
  #136/2 tailcalls/tailcall_2:OK
  #136/3 tailcalls/tailcall_3:OK
  #136/4 tailcalls/tailcall_4:OK
  #136/5 tailcalls/tailcall_5:OK
  #136/6 tailcalls/tailcall_6:OK
  [...]

For interpreter, the tailcall_1-6 tests are passing as well. The later
tailcall_bpf2bpf_* are failing due lack of bpf2bpf + tailcall support
in interpreter, so this is expected.

Also, manual inspection shows that both loaded programs from tailcall_3
and tailcall_6 test case emit the expected opcodes:

* tailcall_3 disasm, emit_bpf_tail_call_direct():

  [...]
   b:   push   %rax
   c:   push   %rbx
   d:   push   %r13
   f:   mov    %rdi,%rbx
  12:   movabs $0xffff8d3f5afb0200,%r13
  1c:   mov    %rbx,%rdi
  1f:   mov    %r13,%rsi
  22:   xor    %edx,%edx                 _
  24:   mov    -0x4(%rbp),%eax          |  limit check
  2a:   cmp    $0x20,%eax               |
  2d:   ja     0x0000000000000046       |
  2f:   add    $0x1,%eax                |
  32:   mov    %eax,-0x4(%rbp)          |_
  38:   nopl   0x0(%rax,%rax,1)
  3d:   pop    %r13
  3f:   pop    %rbx
  40:   pop    %rax
  41:   jmpq   0xffffffffffffe377
  [...]

* tailcall_6 disasm, emit_bpf_tail_call_indirect():

  [...]
  47:   movabs $0xffff8d3f59143a00,%rsi
  51:   mov    %edx,%edx
  53:   cmp    %edx,0x24(%rsi)
  56:   jbe    0x0000000000000093        _
  58:   mov    -0x4(%rbp),%eax          |  limit check
  5e:   cmp    $0x20,%eax               |
  61:   ja     0x0000000000000093       |
  63:   add    $0x1,%eax                |
  66:   mov    %eax,-0x4(%rbp)          |_
  6c:   mov    0x110(%rsi,%rdx,8),%rcx
  74:   test   %rcx,%rcx
  77:   je     0x0000000000000093
  79:   pop    %rax
  7a:   mov    0x30(%rcx),%rcx
  7e:   add    $0xb,%rcx
  82:   callq  0x000000000000008e
  87:   pause
  89:   lfence
  8c:   jmp    0x0000000000000087
  8e:   mov    %rcx,(%rsp)
  92:   retq
  [...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Paul Chaignon <paul@cilium.io>
Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net

dbd7eb14

Merge branch 'bpf: introduce bpf_get_branch_snapshot' · 14bef1ab

Alexei Starovoitov authored Sep 13, 2021

Song Liu says:

====================

Changes v6 => v7:
1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).

Changes v5 => v6:
1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
2. Remove buf and size check in bpf_get_branch_snapshot, move flags check
   to later fo the function. (Peter, Andrii)
3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)

Changes v4 => v5:
1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
2. Minor fixes in selftests. (Andrii)

Changes v3 => v4:
1. Do not reshuffle intel_pmu_disable_all(). Use some inline to save LBR
   entries. (Peter)
2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
4. Make MAX_BRANCH_SNAPSHOT an enum (Andrii). And rename it as
   PERF_MAX_BRANCH_SNAPSHOT
5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
   (Andrii)
6. Move the test target function to bpf_testmod. Updated kallsyms_find_next
   to work properly with modules. (Andrii)

Changes v2 => v3:
1. Fix the use of static_call. (Peter)
2. Limit the use to perfmon version >= 2. (Peter)
3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
   and intel_pmu_enable_all().

Changes v1 => v2:
1. Rename the helper as bpf_get_branch_snapshot;
2. Fix/simplify the use of static_call;
3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
   branch records to an output argument of type perf_branch_snapshot.

Branch stack can be very useful in understanding software events. For
example, when a long function, e.g. sys_perf_event_open, returns an errno,
it is not obvious why the function failed. Branch stack could provide very
helpful information in this type of scenarios.

This set adds support to read branch stack with a new BPF helper
bpf_get_branch_trace(). Currently, this is only supported in Intel systems.
It is also possible to support the same feaure for PowerPC.

The hardware that records the branch stace is not stopped automatically on
software events. Therefore, it is necessary to stop it in software soon.
Otherwise, the hardware buffers/registers will be flushed. One of the key
design consideration in this set is to minimize the number of branch record
entries between the event triggers and the hardware recorder is stopped.
Based on this goal, current design is different from the discussions in
original RFC [1]:
 1) Static call is used when supported, to save function pointer
    dereference;
 2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
    because the latter uses about 10 entries before stopping LBR.

With current code, on Intel CPU, LBR is stopped after 7 branch entries
after fexit triggers:

ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
ID: 2 from __brk_limit+477192263 to __brk_limit+477143880  # trampoline
ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
...

[1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

14bef1ab

selftests/bpf: Add test for bpf_get_branch_snapshot · 025bd7c7

Song Liu authored Sep 10, 2021

This test uses bpf_get_branch_snapshot from a fexit program. The test uses
a target function (bpf_testmod_loop_test) and compares the record against
kallsyms. If there isn't enough record matching kallsyms, the test fails.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com

025bd7c7

bpf: Introduce helper bpf_get_branch_snapshot · 856c02db

Song Liu authored Sep 10, 2021

Introduce bpf_get_branch_snapshot(), which allows tracing pogram to get
branch trace from hardware (e.g. Intel LBR). To use the feature, the
user need to create perf_event with proper branch_record filtering
on each cpu, and then calls bpf_get_branch_snapshot in the bpf function.
On Intel CPUs, VLBR event (raw event 0x1b00) can be use for this.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com

856c02db

perf: Enable branch record for software events · c22ac2a3

Song Liu authored Sep 10, 2021

The typical way to access branch record (e.g. Intel LBR) is via hardware
perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, PMI could capture
reliable LBR. On the other hand, LBR could also be useful in non-PMI
scenario. For example, in kretprobe or bpf fexit program, LBR could
provide a lot of information on what happened with the function. Add API
to use branch record for software use.

Note that, when the software event triggers, it is necessary to stop the
branch record hardware asap. Therefore, static_call is used to remove some
branch instructions in this process.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com

c22ac2a3

10 Sep, 2021 16 commits

selftests/bpf: Test new __sk_buff field hwtstamp · 3384c7c7

Vadim Fedorenko authored Sep 10, 2021

Analogous to the gso_segs selftests introduced in commit d9ff286a
("bpf: allow BPF programs access skb_shared_info->gso_segs field").
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210909220409.8804-3-vfedorenko@novek.ru

3384c7c7

bpf: Add hardware timestamp field to __sk_buff · f64c4ace

Vadim Fedorenko authored Sep 10, 2021

BPF programs may want to know hardware timestamps if NIC supports
such timestamping.

Expose this data as hwtstamp field of __sk_buff the same way as
gso_segs/gso_size. This field could be accessed from the same
programs as tstamp field, but it's read-only field. Explicit test
to deny access to padding data is added to bpf_skb_is_valid_access.

Also update BPF_PROG_TEST_RUN tests of the feature.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210909220409.8804-2-vfedorenko@novek.ru

f64c4ace

Merge branch 'bpf-xsk-selftests' · e876a036

Daniel Borkmann authored Sep 10, 2021

Magnus Karlsson says:

====================
This patch set facilitates adding new tests as well as describing
existing ones in the xsk selftests suite and adds 3 new test suites at
the end. The idea is to isolate the run-time that executes the test
from the actual implementation of the test. Today, implementing a test
amounts to adding test specific if-statements all around the run-time,
which is not scalable or amenable for reuse. This patch set instead
introduces a test specification that is the only thing that a test
fills in. The run-time then gets this specification and acts upon it
completely unaware of what test it is executing. This way, we can get
rid of all test specific if-statements from the run-time and the
implementation of the test can be contained in a single function. This
hopefully makes it easier to add tests and for users to understand
what the test accomplishes.

As a recap of what the run-time does: each test is based on the
run-time launching two threads and connecting a veth link between the
two threads. Each thread opens an AF_XDP socket on that veth interface
and one of them sends traffic that the other one receives and
validates. Each thread has its own umem. Note that this behavior is
not changed by this patch set.

A test specification consists of several items. Most importantly:

* Two packet streams. One for Tx thread that specifies what traffic to
  send and one for the Rx thread that specifies what that thread
  should receive. If it receives exactly what is specified, the test
  passes, otherwise it fails. A packet stream can also specify what
  buffers in the umem that should be used by the Rx and Tx threads.

* What kind of AF_XDP sockets it should create and bind to what
  interfaces

* How many times it should repeat the socket creation and destruction

* The name of the test

The interface for the test spec is the following:

void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
                    struct ifobject *ifobj_rx, enum test_mode mode);

/* Reset everything but the interface specifications and the mode */
void test_spec_reset(struct test_spec *test);

void test_spec_set_name(struct test_spec *test, const char *name);

Packet streams have the following interfaces:

struct pkt *pkt_stream_get_pkt(struct pkt_stream *pkt_stream, u32 pkt_nb)

struct pkt *pkt_stream_get_next_rx_pkt(struct pkt_stream *pkt_stream)

struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem,
                                       u32 nb_pkts, u32 pkt_len);

void pkt_stream_delete(struct pkt_stream *pkt_stream);

struct pkt_stream *pkt_stream_clone(struct xsk_umem_info *umem,
                                    struct pkt_stream *pkt_stream);

/* Replaces all packets in the stream*/
void pkt_stream_replace(struct test_spec *test, u32 nb_pkts, u32 pkt_len);

/* Replaces every other packet in the stream */
void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, u32 offset);

/* For creating custom made packet streams */
void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts,
                                u32 nb_pkts);

/* Restores the default packet stream */
void pkt_stream_restore_default(struct test_spec *test);

A test can then then in the most basic case described like this
(provided the test specification has been created before calling the
function):

static bool testapp_aligned(struct test_spec *test)
{
        test_spec_set_name(test, "RUN_TO_COMPLETION");
        testapp_validate_traffic(test);
}

Running the same test in unaligned mode would then look like this:

static bool testapp_unaligned(struct test_spec *test)
{
        if (!hugepages_present(test->ifobj_tx)) {
                ksft_test_result_skip("No 2M huge pages present.\n");
                return false;
        }

        test_spec_set_name(test, "UNALIGNED_MODE");
        test->ifobj_tx->umem->unaligned_mode = true;
        test->ifobj_rx->umem->unaligned_mode = true;
        /* Let half of the packets straddle a buffer boundrary */
        pkt_stream_replace_half(test, PKT_SIZE,
                                XSK_UMEM__DEFAULT_FRAME_SIZE - 32);
	/* Populate fill ring with addresses in the packet stream */
        test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
        testapp_validate_traffic(test);

        pkt_stream_restore_default(test);
	return true;
}

3 of the last 4 patches in the set add 3 new test suites, one for
unaligned mode, one for testing the rejection of tricky invalid
descriptors plus the acceptance of some valid ones in the Tx ring, and
one for testing 2K frame sizes (the default is 4K).

What is left to do for follow-up patches:

* Convert the statistics tests to the new framework.

* Implement a way of registering new tests without having the enum
  test_type. Once this has been done (together with the previous
  bullet), all the test types can be dropped from the header
  file. This means that we should be able to add tests by just writing
  a single function with a new test specification, which is one of the
  goals.

* Introduce functions for manipulating parts of the test or interface
  spec instead of direct manipulations such as
  test->ifobj_rx->pkt_stream->use_addr_for_fill = true; which is kind
  of awkward.

* Move the run-time and its interface to its own .c and .h files. Then
  we can have all the tests in a separate file.

* Better error reporting if a test fails. Today it does not state what
  test fails and might not continue execute the rest of the tests due
  to this failure. Failures are not propagated upwards through the
  functions so a failed test will also be a passed test, which messes
  up the stats counting. This needs to be changed.

* Add option to run specific test instead of all of them

* Introduce pacing of sent packets so that they are never dropped
  by the receiver even if it is stalled for some reason. If you run
  the current tests on a heavily loaded system, they might fail in SKB
  mode due to packets being dropped by the driver on Tx. Though I have
  never seen it, it might happen.

v1 -> v2:

* Fixed a number of spelling errors [Maciej]
* Fixed use after free bug in pkt_stream_replace() [Maciej]
* pkt_stream_set -> pkt_stream_generate_custom [Maciej]
* Fixed formatting problem in testapp_invalid_desc() [Maciej]
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e876a036

selftests: xsk: Add tests for 2K frame size · 909f0e28

Magnus Karlsson authored Sep 07, 2021

Add tests for 2K frame size. Both a standard send and receive test and
one testing for invalid descriptors when the frame size is 2K.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-21-magnus.karlsson@gmail.com

909f0e28

selftests: xsk: Add tests for invalid xsk descriptors · 0d1b7f3a

Magnus Karlsson authored Sep 07, 2021

Add tests for invalid xsk descriptors in the Tx ring. A number of
handcrafted nasty invalid descriptors are created and submitted to the
tx ring to check that they are validated correctly. Corner case valid
ones are also sent. The tests are run for both aligned and unaligned
mode.

pkt_stream_set() is introduced to be able to create a hand-crafted
packet stream where every single packet is specified in detail.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-20-magnus.karlsson@gmail.com

0d1b7f3a

selftests: xsk: Eliminate test specific if-statement in test runner · 6ce67b51

Magnus Karlsson authored Sep 07, 2021

Eliminate a test specific if-statement for the RX_FILL_EMTPY stats
test that is present in the test runner. We can do this as we now have
the use_addr_for_fill option. Just create and empty Rx packet stream
and indicated that the test runner should use the addresses in that to
populate the fill ring. As there are no packets in the stream, the
fill ring will be empty and we will get the error stats that we want
to test.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-19-magnus.karlsson@gmail.com

6ce67b51

selftests: xsk: Add test for unaligned mode · a4ba98dd

Magnus Karlsson authored Sep 07, 2021

Add a test for unaligned mode in which packet buffers can be placed
anywhere within the umem. Some packets are made to straddle page
boundaries in order to check for correctness. On the Tx side, buffers
are now allocated according to the addresses found in the packet
stream. Thus, the placement of buffers can be controlled with the
boolean use_addr_for_fill in the packet stream.

One new pkt_stream interface is introduced: pkt_stream_replace_half()
that replaces every other packet in the default packet stream with the
specified new packet. The constant DEFAULT_OFFSET is also
introduced. It specifies at what offset from the start of a chunk a Tx
packet is placed by the sending thread. This is just to be able to
test that it is possible to send packets at an offset not equal to
zero.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-18-magnus.karlsson@gmail.com

a4ba98dd

selftests: xsk: Introduce replacing the default packet stream · 605091c5

Magnus Karlsson authored Sep 07, 2021

Introduce the concept of a default packet stream that is the set of
packets sent by most tests. Then add the ability to replace it for a
test that would like to send or receive something else through the use
of the function pkt_stream_replace() and then restored with
pkt_stream_restore_default(). These are then used to convert the
STAT_TEST_TX_INVALID to use these new APIs.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-17-magnus.karlsson@gmail.com

605091c5

selftests: xsk: Allow for invalid packets · 8abf6f72

Magnus Karlsson authored Sep 07, 2021

Allow for invalid packets to be sent. These are verified by the Rx
thread not to be received. Or put in another way, if they are
received, the test will fail. This feature will be used to eliminate
an if statement for a stats test and will also be used by other tests
in later patches. The previous code could only deal with valid
packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-16-magnus.karlsson@gmail.com

8abf6f72

selftests: xsk: Eliminate MAX_SOCKS define · 8ce7192b

Magnus Karlsson authored Sep 07, 2021

Remove the MAX_SOCKS define as it always will be one for the forseable
future and the code does not work for any other case anyway.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-15-magnus.karlsson@gmail.com

8ce7192b

selftests: xsx: Make pthreads local scope · e2d850d5

Magnus Karlsson authored Sep 07, 2021

Make the pthread_t variables local scope instead of global. No reason
for them to be global.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-14-magnus.karlsson@gmail.com

e2d850d5

selftests: xsk: Make xdp_flags and bind_flags local · af6731d1

Magnus Karlsson authored Sep 07, 2021

Make xdp_flags and bind_flags local instead of global by moving them
into the interface object. These flags decide if the socket should be
created in SKB mode or in DRV mode and therefore they are sticky and
will survive a test_spec_reset. Since every test is first run in SKB
mode then in DRV mode, this change only happens once. With this
change, the configured_mode global variable can also be
erradicated. The first test_spec_init() also becomes superfluous and
can be eliminated.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-13-magnus.karlsson@gmail.com

af6731d1

selftests: xsk: Specify number of sockets to create · 85c6c957

Magnus Karlsson authored Sep 07, 2021

Add the ability in the test specification to specify numbers of
sockets to create. The default is one socket. This is then used to
remove test specific if-statements around the bpf_res tests.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-12-magnus.karlsson@gmail.com

85c6c957

selftests: xsk: Replace second_step global variable · 55be575d

Magnus Karlsson authored Sep 07, 2021

Replace the second_step global variable with a test specification
variable called total_steps that a test can be set to indicate how
many times the packet stream should be sent without reinitializing any
sockets. This eliminates test specific code in the test runner around
the bidirectional test.

The total_steps variable is 1 by default as most tests only need a
single round of packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-11-magnus.karlsson@gmail.com

55be575d

selftests: xsk: Introduce rx_on and tx_on in ifobject · 1856c24d

Magnus Karlsson authored Sep 07, 2021

Introduce rx_on and tx_on in the ifobject so that we can describe if
the thread should create a socket with only tx, rx, or both. This
eliminates some test specific if statements from the code. We can also
eliminate the flow vector structure now as this is fully specified
by the tx_on and rx_on variables.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-10-magnus.karlsson@gmail.com

1856c24d

selftests: xsk: Add use_poll to ifobject · 119d4b02

Magnus Karlsson authored Sep 07, 2021

Add a use_poll option to the ifobject so that we do not need to use a
test specific if-statement in the test runner.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-9-magnus.karlsson@gmail.com

119d4b02