Commits · d95d56519026322b25ffa964bb92182620176972 · Kirill Smelkov / linux

05 Sep, 2024 19 commits

selftests/bpf: Enable cross platform testing for vmtest · d95d5651

Pu Lehui authored Sep 05, 2024

Add support for cross-platform testing to vmtest. The variable $ARCH in the
current script has platform semantics, not kernel semantics. Rename it to
$PLATFORM so that $ARCH can easily be used for cross-compilation. Also drop
the `set -u` unbound-variable check, since we will use the CROSS_COMPILE
environment variable. For now, the PLATFORM= and CROSS_COMPILE= options
enable cross-platform testing:

  PLATFORM=<platform> CROSS_COMPILE=<toolchain> vmtest.sh -- ./test_progs
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-7-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Support local rootfs image for vmtest · 2294073d

Pu Lehui authored Sep 05, 2024

Allow vmtest to use a local rootfs image generated by [0], which is
consistent with BPF CI. The local rootfs image can now be specified
through the `-l` parameter as follows:

vmtest.sh -l ./libbpf-vmtest-rootfs-2024.08.22-noble-amd64.tar.zst -- ./test_progs

Meanwhile, some descriptions have been updated.

Link: https://github.com/libbpf/ci/blob/main/rootfs/mkrootfs_debian.sh [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-6-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Limit URLS parsing logic to actual scope in vmtest · 0c3fc330

Pu Lehui authored Sep 05, 2024

The URLS array is only valid in the download_rootfs function and does
not need to be parsed globally in advance. At the same time, the logic
of loading rootfs is refactored to prepare vmtest for supporting local
rootfs.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-5-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Prefer static linking for LLVM libraries · 67ab80a0

Eduard Zingerman authored Sep 05, 2024

It is not always convenient to have LLVM libraries installed inside CI
rootfs images, thus request static libraries from llvm-config.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240905081401.1894789-4-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Rename fallback in bpf_dctcp to avoid naming conflict · a48a4388

Pu Lehui authored Sep 05, 2024

Recently, when compiling bpf selftests on RV64, the following
compilation failure occurred:

progs/bpf_dctcp.c:29:21: error: redefinition of 'fallback' as different kind of symbol
   29 | volatile const char fallback[TCP_CA_NAME_MAX];
      |                     ^
/workspace/tools/testing/selftests/bpf/tools/include/vmlinux.h:86812:15: note: previous definition is here
 86812 | typedef u32 (*fallback)(u32, const unsigned char *, size_t);

The reason is that the `fallback` symbol is defined in
arch/riscv/lib/crc32.c, which causes a symbol conflict when vmlinux.h
is included in bpf_dctcp. Let's rename the `fallback` string to
`fallback_cc` in bpf_dctcp to fix this compilation failure.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-3-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Adapt OUTPUT appending logic to lower versions of Make · dc3a8804

Pu Lehui authored Sep 05, 2024

The $(let ...) function is only supported by GNU Make version 4.4 [0]
and above. With older versions, the following unexpected file and
directory are generated:

	tools/testing/selftests/bpfFEATURE-DUMP.selftests
	tools/testing/selftests/bpffeature/

Considering that the GNU Make version of most Linux distributions is
lower than 4.4, let us adapt the corresponding logic to it.

Link: https://lists.gnu.org/archive/html/info-gnu/2022-10/msg00008.html [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-2-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: fix some typos in libbpf · bd4d67f8

Lin Yikai authored Sep 05, 2024

Fix some spelling errors in libbpf. The details are as follows:

-in the code comments:
	termintaing->terminating
	architecutre->architecture
	requring->requiring
	recored->recoded
	sanitise->sanities
	allowd->allowed
	abover->above
	see bpf_udst_arg()->see bpf_usdt_arg()
Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-3-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpftool: fix some typos in bpftool · a86857d2

Lin Yikai authored Sep 05, 2024

Fix some spelling errors in bpftool. The details are as follows:

-in file "bpftool-gen.rst"
	libppf->libbpf
-in the code comments:
	ouptut->output
Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-2-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: fix some typos in selftests · 5db0ba67

Lin Yikai authored Sep 05, 2024

Fix some spelling errors in selftests. The details are as follows:

-in the codes:
	test_bpf_sk_stoarge_map_iter_fd(void)
		->test_bpf_sk_storage_map_iter_fd(void)
	load BTF from btf_data.o->load BTF from btf_data.bpf.o

-in the code comments:
	preample->preamble
	multi-contollers->multi-controllers
	errono->errno
	unsighed/unsinged->unsigned
	egree->egress
	shoud->should
	regsiter->register
	assummed->assumed
	conditiona->conditional
	rougly->roughly
	timetamp->timestamp
	ingores->ignores
	null-termainted->null-terminated
	slepable->sleepable
	implemenation->implementation
	veriables->variables
	timetamps->timestamps
	substitue a costant->substitute a constant
	secton->section
	unreferened->unreferenced
	verifer->verifier
	libppf->libbpf
...
Signed-off-by: Lin Yikai <yikai.lin@vivo.com>
Link: https://lore.kernel.org/r/20240905110354.3274546-1-yikai.lin@vivo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'selftests-bpf-add-uprobe-multi-pid-filter-test' · 552895af

Andrii Nakryiko authored Sep 05, 2024

Jiri Olsa says:

====================
selftests/bpf: Add uprobe multi pid filter test

hi,
sending fix for uprobe multi pid filtering together with tests. The first
version included tests for standard uprobes, but as we still do not have
fix for that, sending just uprobe multi changes.

thanks,
jirka

v2 changes:
  - focused on uprobe multi only, removed perf event uprobe specific parts
  - added fix and test for CLONE_VM process filter
---
====================

Link: https://lore.kernel.org/r/20240905115124.1503998-1-jolsa@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add uprobe multi pid filter test for clone-ed processes · d2520bdb

Jiri Olsa authored Sep 05, 2024

The idea is to run the same test as for test_pid_filter_process, but instead
of a standard fork-ed process we create the process with clone(CLONE_VM..)
to make sure the thread leader process filter works properly in this case.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-5-jolsa@kernel.org


selftests/bpf: Add uprobe multi pid filter test for fork-ed processes · 8df43e85

Jiri Olsa authored Sep 05, 2024

The idea is to create and monitor 3 uprobes, each triggered in a separate
process, and make sure the bpf program gets executed just for the proper
PID specified via the pid filter.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-4-jolsa@kernel.org


selftests/bpf: Add child argument to spawn_child function · 0b0bb453

Jiri Olsa authored Sep 05, 2024

Add a child argument to the spawn_child function so that
multiple children can be created in the following change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-3-jolsa@kernel.org


bpf: Fix uprobe multi pid filter check · 900f362e

Jiri Olsa authored Sep 05, 2024

Uprobe multi link does its own process (thread leader) filtering before
running the bpf program by comparing the tasks' mm pointers.

But as Oleg pointed out, there can be processes sharing the vm (CLONE_VM),
so we can't just compare task->mm pointers; instead we need to use the
same_thread_group call.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-2-jolsa@kernel.org


Merge branch 'fix-accessing-first-syscall-argument-on-rv64' · aa01d13e

Andrii Nakryiko authored Sep 04, 2024

Pu Lehui says:

====================
Fix accessing first syscall argument on RV64

On RV64, as Ilya mentioned before [0], the first syscall parameter should be
accessed through orig_a0 (see arch/riscv/include/asm/syscall.h),
otherwise it will cause selftests like bpf_syscall_macro, vmlinux,
test_lsm, etc. to fail on RV64.

Link: https://lore.kernel.org/bpf/20220209021745.2215452-1-iii@linux.ibm.com [0]

v3:
- Fix test case error.

v2: https://lore.kernel.org/all/20240831023646.1558629-1-pulehui@huaweicloud.com/
- Access first syscall argument with CO-RE direct read. (Andrii)

v1: https://lore.kernel.org/all/20240829133453.882259-1-pulehui@huaweicloud.com/
====================

Link: https://lore.kernel.org/r/20240831041934.1629216-1-pulehui@huaweicloud.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

libbpf: Fix accessing first syscall argument on RV64 · 99857422

Pu Lehui authored Aug 31, 2024

On RV64, as Ilya mentioned before [0], the first syscall parameter should be
accessed through orig_a0 (see arch/riscv/include/asm/syscall.h),
otherwise it will cause selftests like bpf_syscall_macro, vmlinux,
test_lsm, etc. to fail on RV64. Let's fix it by using the struct pt_regs
style CO-RE direct access.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-1-iii@linux.ibm.com [0]
Link: https://lore.kernel.org/bpf/20240831041934.1629216-5-pulehui@huaweicloud.com


selftests/bpf: Enable test_bpf_syscall_macro:syscall_arg1 on s390 and arm64 · 4a4c4c0d

Pu Lehui authored Aug 31, 2024

Considering that CO-RE direct read access to the first system call
argument is already available on s390 and arm64, let's enable
test_bpf_syscall_macro:syscall_arg1 on these architectures.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-4-pulehui@huaweicloud.com


libbpf: Access first syscall argument with CO-RE direct read on arm64 · 9ab94078

Pu Lehui authored Aug 31, 2024

Currently PT_REGS_PARM1_SYSCALL(x) is defined the same as
PT_REGS_PARM1_CORE_SYSCALL(x), which introduces the overhead of
BPF_CORE_READ(). Since the pt_regs being read comes directly from the
context, let's use CO-RE direct read to access the first system call argument.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-3-pulehui@huaweicloud.com


libbpf: Access first syscall argument with CO-RE direct read on s390 · e4db2a82

Pu Lehui authored Aug 31, 2024

Currently PT_REGS_PARM1_SYSCALL(x) is defined the same as
PT_REGS_PARM1_CORE_SYSCALL(x), which introduces the overhead of
BPF_CORE_READ(). Since the pt_regs being read comes directly from the
context, let's use CO-RE direct read to access the first system call argument.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-2-pulehui@huaweicloud.com


04 Sep, 2024 9 commits

selftests/bpf: Add a selftest for x86 jit convergence issues · eff5b5ff

Yonghong Song authored Sep 04, 2024

The core part of the selftest, i.e., the je <-> jmp cycle, mimics the
original sched-ext bpf program. The test will fail without the
previous patch.

I tried to create some cases for other potential cycles
(je <-> je, jmp <-> je and jmp <-> jmp) with similar pattern
to the test in this patch, but failed. So this patch
only contains one test for je <-> jmp cycle.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221256.37389-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, x64: Fix a jit convergence issue · c8831bdb

Yonghong Song authored Sep 04, 2024

Daniel Hodges reported a jit error when playing with a sched-ext program.
The error message is:
  unexpected jmp_cond padding: -4 bytes

But further investigation shows the error is actually due to failed
convergence. Some analysis follows:

  ...
  pass4, final_proglen=4391:
    ...
    20e:    48 85 ff                test   rdi,rdi
    211:    74 7d                   je     0x290
    213:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
    ...
    289:    48 85 ff                test   rdi,rdi
    28c:    74 17                   je     0x2a5
    28e:    e9 7f ff ff ff          jmp    0x212
    293:    bf 03 00 00 00          mov    edi,0x3

Note that the insn at 0x211 is a 2-byte cond jump insn with offset 0x7d (125)
and the insn at 0x28e is a 5-byte jmp insn with offset -129.

  pass5, final_proglen=4392:
    ...
    20e:    48 85 ff                test   rdi,rdi
    211:    0f 84 80 00 00 00       je     0x297
    217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
    ...
    28d:    48 85 ff                test   rdi,rdi
    290:    74 1a                   je     0x2ac
    292:    eb 84                   jmp    0x218
    294:    bf 03 00 00 00          mov    edi,0x3

Note that insn at 0x211 is 6-byte cond jump insn now since its offset
becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). At the same
time, insn at 0x292 is a 2-byte insn since its offset is -124.

pass6 will repeat the same code as in pass4. pass7 will repeat the same
code as in pass5, and so on. This will prevent eventual convergence.

Passes 1-14 run with padding = 0. At pass15, padding is 1 and the related
insns look like:

    211:    0f 84 80 00 00 00       je     0x297
    217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
    ...
    24d:    48 85 d2                test   rdx,rdx

The similar code in pass14:
    211:    74 7d                   je     0x290
    213:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
    ...
    249:    48 85 d2                test   rdx,rdx
    24c:    74 21                   je     0x26f
    24e:    48 01 f7                add    rdi,rsi
    ...

Before generating the following insn,
  250:    74 21                   je     0x273
"padding = 1" enables some checking to ensure nops is either 0 or 4
where
  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
  nops = INSN_SZ_DIFF - 2

In this specific case,
  addrs[i] = 0x24e // from pass14
  addrs[i-1] = 0x24d // from pass15
  prog - temp = 3 // from 'test rdx,rdx' in pass15
so
  nops = -4
and this triggers the failure.

To fix the issue, we need to break cycles of je <-> jmp. For example,
in the above case, we have
  211:    74 7d                   je     0x290
where the offset is 0x7d. If a 2-byte je insn is generated only when
the offset is less than 0x7d (<= 0x7c), the cycle can be
broken and convergence can be achieved.

I studied other cases that may cause cycles, such as je <-> je,
jmp <-> je and jmp <-> jmp. Those cases do not come from actual
reproducers, since it is pretty hard to construct a test case
for them. The results show that offset <= 0x7b (0x7b = 123) should
be enough to cover all cases. This patch adds a new helper that generates 8-bit
cond/uncond jmp insns only if the offset is in the range [-128, 123].
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221251.37109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests: bpf: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE · 23457b37

Feng Yang authored Sep 03, 2024

The ARRAY_SIZE macro is more compact and is the conventional form in the Linux source.
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240903072559.292607-1-yangfeng59949@163.com


Merge branch 'bpf-follow-up-on-gen_epilogue' · 6fee7a7e

Alexei Starovoitov authored Sep 04, 2024

Martin KaFai Lau says:

====================
bpf: Follow up on gen_epilogue

From: Martin KaFai Lau <martin.lau@kernel.org>

The set addresses some follow ups on the earlier gen_epilogue
patch set.
====================
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240904180847.56947-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix indentation issue in epilogue_idx · 00750788

Martin KaFai Lau authored Sep 04, 2024

A new indentation issue was reported in epilogue_idx.
This patch fixes it.

Fixes: 169c3176 ("bpf: Add gen_epilogue to bpf_verifier_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408311622.4GzlzN33-lkp@intel.com/
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240904180847.56947-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove the insn_buf array stack usage from the inline_bpf_loop() · 940ce73b

Martin KaFai Lau authored Sep 04, 2024

This patch removes the insn_buf array stack usage from the
inline_bpf_loop(). Instead, the env->insn_buf is used. The
usage in inline_bpf_loop() needs more than 16 insn, so the
INSN_BUF_SIZE needs to be increased from 16 to 32.
The compiler stack size warning on the verifier is gone
after this change.

Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240904180847.56947-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

samples/bpf: Remove sample tracex2 · 46f4ea04

Rong Tao authored Aug 31, 2024

Since commit ba8de796 ("net: introduce sk_skb_reason_drop function"),
kfree_skb_reason() is an inline function and cannot be traced.

samples/bpf is abandonware by now, and we should slowly but surely
convert whatever makes sense into BPF selftests under
tools/testing/selftests/bpf and just get rid of the rest.

Link: https://github.com/torvalds/linux/commit/ba8de796baf4bdc03530774fb284fe3c97875566
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_30ADAC88CB2915CA57E9512D4460035BA107@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix procmap_query()'s params mismatch and compilation warning · 02baa0a2

Yuan Chen authored Sep 03, 2024

When PROCMAP_QUERY is not defined, a compilation error occurs because the
parameters of procmap_query() do not match. procmap_query() is only called in
the file where it is defined, so modify the parameters so that they match.

We also get a warning when building samples/bpf:
    trace_helpers.c:252:5: warning: no previous prototype for ‘procmap_query’ [-Wmissing-prototypes]
      252 | int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
          |     ^~~~~~~~~~~~~
As this function is only used in the file, mark it as 'static'.

Fixes: 4e9e0760 ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Link: https://lore.kernel.org/r/20240903012839.3178-1-chenyuan_fl@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, arm64: Jit BPF_CALL to direct call when possible · ddbe9ec5

Xu Kuohai authored Sep 03, 2024

Currently, BPF_CALL is always jited to an indirect call. When the target is
within the range of a direct call, BPF_CALL can be jited to a direct call.

For example, the following BPF_CALL

call __htab_map_lookup_elem

is always jited to indirect call:

mov x10, #0xffffffffffff18f4
movk x10, #0x821, lsl #16
movk x10, #0x8000, lsl #32
blr x10

When the address of target __htab_map_lookup_elem is within the range of
direct call, the BPF_CALL can be jited to:

bl 0xfffffffffd33bc98

This patch does such jit optimization by emitting arm64 direct calls for
BPF_CALL when possible, indirect calls otherwise.

Without this patch, the jit works as follows.

1. First pass
A. Determine jited position and size for each bpf instruction.
B. Compute the jited image size.

2. Allocate jited image with size computed in step 1.

3. Second pass
A. Adjust jump offset for jump instructions
B. Write the final image.

This works because, for a given bpf prog, regardless of where the jited
image is allocated, the jited result for each instruction is fixed. The
second pass differs from the first only in adjusting the jump offsets,
like changing "jmp imm1" to "jmp imm2", while the position and size of
the "jmp" instruction remain unchanged.

Now consider whether to jit BPF_CALL to an arm64 direct or indirect call
instruction. The choice depends solely on the jump offset: a direct call
if the jump offset is within 128MB, an indirect call otherwise.

For a given BPF_CALL, the target address is known, so the jump offset is
decided by the jited address of the BPF_CALL instruction. In other words,
for a given bpf prog, the jited result for each BPF_CALL is determined
by its jited address.

The jited address for a BPF_CALL is the jited image address plus the
total jited size of all preceding instructions. For a given bpf prog,
there are clearly no BPF_CALL instructions before the first BPF_CALL
instruction. Since the jited results for all instructions other
than BPF_CALL are fixed, the total jited size preceding the first
BPF_CALL is also fixed. Therefore, once the jited image is allocated,
the jited address for the first BPF_CALL is fixed.

Now that the jited result for the first BPF_CALL is fixed, the jited
results for all instructions preceding the second BPF_CALL are fixed.
So the jited address and result for the second BPF_CALL are also fixed.

Similarly, we can conclude that the jited addresses and results for all
subsequent BPF_CALL instructions are fixed.

This means that, for a given bpf prog, once the jited image is allocated,
the jited address and result for all instructions, including all BPF_CALL
instructions, are fixed.

Based on the observation, with this patch, the jit works as follows.

1. First pass
Estimate the maximum jited image size. In this pass, all BPF_CALLs
are jited to arm64 indirect calls since the jump offsets are unknown
because the jited image is not allocated.

2. Allocate jited image with size estimated in step 1.

3. Second pass
A. Determine the jited result for each BPF_CALL.
B. Determine jited address and size for each bpf instruction.

4. Third pass
A. Adjust jump offset for jump instructions.
B. Write the final image.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240903094407.601107-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

02 Sep, 2024 2 commits

bpftool: Fix handling enum64 in btf dump sorting · b0222d1d

Mykyta Yatsenko authored Sep 02, 2024

The wrong function is used to access the first enum64 element. Replace btf_enum(t)
with btf_enum64(t) for BTF_KIND_ENUM64.

Fixes: 94133cf2 ("bpftool: Introduce btf c dump sorting")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240902171721.105253-1-mykyta.yatsenko5@gmail.com


bpftool: Add missing blank lines in bpftool-net doc example · 5d2784e2

Quentin Monnet authored Sep 01, 2024

In bpftool-net documentation, two blank lines are missing in a
recently added example, causing docutils to complain:

    $ cd tools/bpf/bpftool
    $ make doc
      DESCEND Documentation
      GEN     bpftool-btf.8
      GEN     bpftool-cgroup.8
      GEN     bpftool-feature.8
      GEN     bpftool-gen.8
      GEN     bpftool-iter.8
      GEN     bpftool-link.8
      GEN     bpftool-map.8
      GEN     bpftool-net.8
    <stdin>:189: (INFO/1) Possible incomplete section title.
    Treating the overline as ordinary text because it's so short.
    <stdin>:192: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
    <stdin>:199: (INFO/1) Possible incomplete section title.
    Treating the overline as ordinary text because it's so short.
    <stdin>:201: (INFO/1) Blank line missing before literal block (after the "::")? Interpreted as a definition list item.
      GEN     bpftool-perf.8
      GEN     bpftool-prog.8
      GEN     bpftool.8
      GEN     bpftool-struct_ops.8

Add the missing blank lines.

Fixes: 0d7c0612 ("bpftool: Add document for net attach/detach on tcx subcommand")
Signed-off-by: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240901210742.25758-1-qmo@kernel.org


30 Aug, 2024 10 commits

selftests/bpf: Do not update vmlinux.h unnecessarily · 2ad6d23f

Ihor Solodrai authored Aug 28, 2024

%.bpf.o objects depend on vmlinux.h, which makes them transitively
dependent on unnecessary libbpf headers. However, vmlinux.h doesn't
actually change as often as those headers do.

When generating vmlinux.h, compare it to a previous version and update
it only if there are changes.

Example of build time improvement (after first clean build):
  $ touch ../../../lib/bpf/bpf.h
  $ time make -j8
Before: real  1m37.592s
After:  real  0m27.310s

Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed.
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20240828174608.377204-2-ihor.solodrai@pm.me


selftests/bpf: Specify libbpf headers required for %.bpf.o progs · 38960ac8

Ihor Solodrai authored Aug 28, 2024

Test %.bpf.o objects actually depend only on some libbpf headers.
Define a list of required headers and use it as TRUNNER_BPF_OBJS
dependency.

bpf_*.h list was determined by:

    $ grep -rh 'include <bpf/bpf_' progs | sort -u
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzYQ-j2i_xjs94Nn=8+FVfkWt51mLZyiYKiz9oA4Z=pCeA@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20240828174608.377204-1-ihor.solodrai@pm.me

selftests/bpf: Check if distilled base inherits source endianness · 181b0d1a

Eduard Zingerman authored Aug 30, 2024

Create a BTF with endianness different from host, make a distilled
base/split BTF pair from it, dump as raw bytes, import again and
verify that endianness is preserved.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com


libbpf: Ensure new BTF objects inherit input endianness · da18bfa5

Tony Ambardar authored Aug 30, 2024

New split BTF needs to preserve base's endianness. Similarly, when
creating a distilled BTF, we need to preserve original endianness.

Fix by updating libbpf's btf__distill_base() and btf_new_empty() to retain
the byte order of any source BTF objects when creating new ones.

Fixes: ba451366 ("libbpf: Implement basic split BTF support")
Fixes: 58e185a0 ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF")
Reported-by: Song Liu <song@kernel.org>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/6358db36c5f68b07873a0a5be2d062b1af5ea5f8.camel@gmail.com/
Link: https://lore.kernel.org/bpf/20240830095150.278881-1-tony.ambardar@gmail.com


bpf: Use sockfd_put() helper · 65ef66d9

Jinjie Ruan authored Aug 30, 2024

Replace fput() with sockfd_put() in bpf_fd_reuseport_array_update_elem().
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20240830020756.607877-1-ruanjinjie@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove custom build rule · 1dd7622e

Alexey Gladkov authored Aug 30, 2024

According to the documentation, when building a kernel with the C=2
parameter, all source files should be checked. But this does not happen
for the kernel/bpf/ directory.

$ touch kernel/bpf/core.o
$ make C=2 CHECK=true kernel/bpf/core.o

Outputs:

  CHECK   scripts/mod/empty.c
  CALL    scripts/checksyscalls.sh
  DESCEND objtool
  INSTALL libsubcmd_headers
  CC      kernel/bpf/core.o

As can be seen the compilation is done, but CHECK is not executed. This
happens because kernel/bpf/Makefile has defined its own rule for
compilation and forgotten the macro that does the check.

There is no need to duplicate the build code, and this rule can be
removed to use generic rules.
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/20240830074350.211308-1-legion@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests for iter next method returning valid pointer · 7c5f7b16

Juntong Deng authored Aug 29, 2024

This patch adds test cases for iter next method returning valid
pointer, which can also be used as usage examples.

Currently iter next method should return valid pointer.

iter_next_trusted is the correct usage and tests whether the iter next
method returns a valid pointer. bpf_iter_task_vma_next has the KF_RET_NULL
flag, so the returned pointer may be NULL. We need to check whether the
pointer is NULL before using it.

iter_next_trusted_or_null is the incorrect usage. There is no checking
before using the pointer, so it will be rejected by the verifier.

iter_next_rcu and iter_next_rcu_or_null are similar test cases for
KF_RCU_PROTECTED iterators.

iter_next_rcu_not_trusted is used to test that the pointer returned by
iter next method of KF_RCU_PROTECTED iterator cannot be passed in
KF_TRUSTED_ARGS kfuncs.

iter_next_ptr_mem_not_trusted is used to test that base type
PTR_TO_MEM should not be combined with type flag PTR_TRUSTED.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB5848709758F6922F02AF9F1F99962@AM6PR03MB5848.eurprd03.prod.outlook.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Make the pointer returned by iter next method valid · 4cc8c50c

Juntong Deng authored Aug 29, 2024

Currently we cannot pass the pointer returned by an iter next method as
an argument to KF_TRUSTED_ARGS or KF_RCU kfuncs, because the pointer
returned by the iter next method is not "valid".

This patch marks the pointer returned by an iter next method as valid.

This is based on the fact that if the iterator is implemented correctly,
then the pointer returned from the iter next method should be valid.

This does not make a NULL pointer valid. If the iter next method has
the KF_RET_NULL flag, the verifier still requires the eBPF program to
check the pointer for NULL.

A KF_RCU_PROTECTED iterator is a special case: the pointer returned by
its iter next method should only be valid within an RCU critical
section, so it should be marked with MEM_RCU, not PTR_TRUSTED.

Another special case is bpf_iter_num_next, which returns a pointer with
base type PTR_TO_MEM. PTR_TO_MEM should not be combined with type flag
PTR_TRUSTED (PTR_TO_MEM already means the pointer is valid).

The pointer returned by the iter next method of all other types of
iterators is marked with PTR_TRUSTED.

In addition, this patch adds get_iter_from_state to help us get the
current iterator from the current state.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB584869F8B448EA1C87B7CDA399962@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
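
The three cases described in this commit message can be summarized as a
small decision function. This is a toy model only, not the verifier
code: the toy_* enums, their values, and the order of the checks are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for verifier base types (not the kernel's enums). */
enum toy_base_type {
	TOY_PTR_TO_BTF_ID,
	TOY_PTR_TO_MEM,
};

/* Hypothetical stand-ins for verifier type flags (values are arbitrary). */
enum toy_type_flag {
	TOY_FLAG_NONE   = 0,
	TOY_PTR_TRUSTED = 1 << 0,
	TOY_MEM_RCU     = 1 << 1,
};

/*
 * Pick the flag for a pointer returned by an iter next method:
 *  - PTR_TO_MEM (e.g. bpf_iter_num_next): no extra flag
 *  - KF_RCU_PROTECTED iterator: MEM_RCU
 *  - any other iterator: PTR_TRUSTED
 */
static enum toy_type_flag toy_iter_next_ret_flag(enum toy_base_type base,
						 bool iter_is_rcu_protected)
{
	assert(base == TOY_PTR_TO_BTF_ID || base == TOY_PTR_TO_MEM);

	if (base == TOY_PTR_TO_MEM)
		return TOY_FLAG_NONE;
	if (iter_is_rcu_protected)
		return TOY_MEM_RCU;
	return TOY_PTR_TRUSTED;
}
```

The model does not cover KF_RET_NULL: a possibly-NULL return is an
orthogonal property that the program still has to check.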

Merge branch 'bpf-add-gen_epilogue-to-bpf_verifier_ops' · f6284563

Alexei Starovoitov authored Aug 29, 2024

Martin KaFai Lau says:

====================
bpf: Add gen_epilogue to bpf_verifier_ops

From: Martin KaFai Lau <martin.lau@kernel.org>

This set allows a subsystem to patch code before BPF_EXIT.
A new verifier op, .gen_epilogue, is added for this purpose.
One use case is the bpf qdisc: the bpf qdisc subsystem can ensure
that skb->dev holds the correct value. It can either fix the value
inline in the epilogue or call another kernel function to handle
it (e.g. drop) in the epilogue. Another use case could be in
bpf_tcp_ca.c, to enforce that snd_cwnd has a valid value (e.g. a
positive value).

v5:
 * Removed the skip_cnt argument from adjust_jmp_off() in patch 2.
   Instead, reuse the delta argument and skip
   the [tgt_idx, tgt_idx + delta) instructions.
 * Added a BPF_JMP32_A macro in patch 3.
 * Removed pro_epilogue_subprog.c in patch 6.
   The pro_epilogue_kfunc.c has covered the subprog case.
   Renamed the file pro_epilogue_kfunc.c to pro_epilogue.c.
   Some of the SEC names and function names are changed
   accordingly (mainly shorten them by removing the _kfunc suffix).
 * Added comments to explain the tail_call result in patch 7.
 * Fixed the following bpf CI breakages. I ran it in CI
   manually to confirm:
   https://github.com/kernel-patches/bpf/actions/runs/10590714532
 * s390 zext added "w3 = w3". Adjusted the test to
   use all ALU64 and BPF_DW to avoid zext.
   Also changed the "int a" in the "struct st_ops_args" to "u64 a".
 * llvm17 does not take:
       *(u64 *)(r1 +0) = 0;
   so it is changed to:
       r3 = 0;
       *(u64 *)(r1 +0) = r3;

v4:
 * Fixed a bug in the memcpy in patch 3
   The size in the memcpy should be
   epilogue_cnt * sizeof(*epilogue_buf)

v3:
 * Moved epilogue_buf[16] to env.
   Patch 1 is added to move the existing insn_buf[16] to env.
 * Fixed a case where the bpf prog has a BPF_JMP that goes back
   to the first instruction of the main prog.
   The jump back to 1st insn case also applies to the prologue.
   Patch 2 is added to handle it.
 * If the bpf main prog has multiple BPF_EXIT, use a BPF_JA
   to jump to the earlier patched epilogue.
   Note that there are (BPF_JMP32 | BPF_JA) vs (BPF_JMP | BPF_JA)
   details in the patch 3 commit message.
 * There are subtle changes in patch 3, so I reset the Reviewed-by.
 * Added patch 8 and patch 9 to cover the changes in patch 2 and patch 3.
 * Dropped the kfunc call from pro/epilogue and its selftests.

v2:
 * Remove the RFC tag. Keep the ordering at where .gen_epilogue is
   called in the verifier relative to the check_max_stack_depth().
   This will be consistent with the other extra stack_depth
   usage like optimize_bpf_loop().
 * Use __xlated check provided by the test_loader to
   check the patched instructions after gen_pro/epilogue (Eduard).
 * Added Patch 3 by Eduard (Thanks!).
====================

Link: https://lore.kernel.org/r/20240829210833.388152-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Test epilogue patching when the main prog has multiple BPF_EXIT · cada0bdc

Martin KaFai Lau authored Aug 29, 2024

This patch tests the epilogue patching when the main prog has
multiple BPF_EXIT. The verifier should have patched the 2nd (and
later) BPF_EXIT with a BPF_JA that goes back to the earlier
patched epilogue instructions.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240829210833.388152-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
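
The behaviour under test can be pictured with a toy instruction stream.
The sketch below is not the verifier implementation: the toy_insn type,
the toy_* names, and the assumption that the epilogue is placed at the
first BPF_EXIT and followed by a single exit are all illustrative. It
also assumes the input has no jumps of its own, so no existing offsets
need adjusting. Jump offsets follow the BPF convention that the target
is index + 1 + off.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_NOP, TOY_EPILOGUE, TOY_JA, TOY_EXIT };

struct toy_insn {
	enum toy_op op;
	int off;	/* only meaningful for TOY_JA: target = index + 1 + off */
};

/*
 * Copy `in` to `out`, expanding the first TOY_EXIT into `epi_cnt`
 * epilogue instructions plus an exit, and replacing every later
 * TOY_EXIT with a jump back to the start of that epilogue.
 * `out` must have room for n + epi_cnt instructions.
 * Returns the number of instructions written.
 */
static size_t toy_patch_epilogue(const struct toy_insn *in, size_t n,
				 size_t epi_cnt, struct toy_insn *out)
{
	size_t o = 0, epi_start = 0;
	int patched = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].op != TOY_EXIT) {
			/* Toy restriction: input contains no jumps. */
			assert(in[i].op != TOY_JA);
			out[o++] = in[i];
			continue;
		}
		if (!patched) {
			/* First exit: emit the epilogue, then the exit. */
			epi_start = o;
			for (size_t k = 0; k < epi_cnt; k++)
				out[o++] = (struct toy_insn){ TOY_EPILOGUE, 0 };
			out[o++] = (struct toy_insn){ TOY_EXIT, 0 };
			patched = 1;
		} else {
			/* Later exit: jump back to the patched epilogue. */
			out[o] = (struct toy_insn){
				TOY_JA, (int)epi_start - (int)(o + 1)
			};
			o++;
		}
	}
	return o;
}
```

For a four-instruction input with exits at indices 1 and 3 and a
two-instruction epilogue, the second exit becomes a backward jump whose
target is the first epilogue instruction.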
