- 13 Feb, 2024 1 commit
-
-
Daniel Xu authored
Since 20d59ee5 ("libbpf: add bpf_core_cast() macro"), libbpf is now exporting a const arg version of bpf_rdonly_cast(). This causes the following conflicting type error when generating kfunc prototypes from BTF: In file included from skeleton/pid_iter.bpf.c:5: /home/dxu/dev/linux/tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:297:14: error: conflicting types for 'bpf_rdonly_cast' extern void *bpf_rdonly_cast(const void *obj__ign, __u32 btf_id__k) __ksym __weak; ^ ./vmlinux.h:135625:14: note: previous declaration is here extern void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) __weak __ksym; This is b/c the kernel defines bpf_rdonly_cast() with non-const arg. Since const arg is more permissive and thus backwards compatible, we change the kernel definition as well to avoid conflicting type errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/dfd3823f11ffd2d4c838e961d61ec9ae8a646773.1707080349.git.dxu@dxuuu.xyz
-
- 11 Feb, 2024 1 commit
-
-
Marco Elver authored
In various performance profiles of kernels with BPF programs attached, bpf_local_storage_lookup() appears as a significant portion of CPU cycles spent. To enable the compiler to generate more optimal code, turn bpf_local_storage_lookup() into a static inline function, where only the cache insertion code path is outlined.

Notably, outlining cache insertion helps avoid bloating callers by duplicating the setup of calls to raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() (on architectures which do not inline spin_lock/unlock, such as x86), which would cause the compiler to produce worse code by deciding to outline otherwise inlinable functions. The call overhead is neutral, because we make 2 calls either way: either calling raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), or calling __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave() followed by a tail-call to raw_spin_unlock_irqrestore(), where the compiler can perform TCO and (in optimized uninstrumented builds) turn it into a plain jump. The call to __bpf_local_storage_insert_cache() can be elided entirely if cacheit_lockit is a false constant expression.

Based on results from './benchs/run_bench_local_storage.sh' (21 trials, reboot between each trial; x86 defconfig + BPF, clang 16), this produces improvements in throughput and latency in the majority of cases, with an average (geomean) improvement of 8%:

    +---- Hashmap Control --------------------
    |
    | hashmap (control) sequential get        | <before>        | <after>
    |
    | + num keys: 10
    |   +- hits throughput                    | 14.789 M ops/s  | 14.745 M ops/s ( ~ )
    |   +- hits latency                       | 67.679 ns/op    | 67.879 ns/op   ( ~ )
    |   +- important_hits throughput          | 14.789 M ops/s  | 14.745 M ops/s ( ~ )
    | + num keys: 1000
    |   +- hits throughput                    | 12.233 M ops/s  | 12.170 M ops/s ( ~ )
    |   +- hits latency                       | 81.754 ns/op    | 82.185 ns/op   ( ~ )
    |   +- important_hits throughput          | 12.233 M ops/s  | 12.170 M ops/s ( ~ )
    | + num keys: 10000
    |   +- hits throughput                    | 7.220 M ops/s   | 7.204 M ops/s  ( ~ )
    |   +- hits latency                       | 138.522 ns/op   | 138.842 ns/op  ( ~ )
    |   +- important_hits throughput          | 7.220 M ops/s   | 7.204 M ops/s  ( ~ )
    | + num keys: 100000
    |   +- hits throughput                    | 5.061 M ops/s   | 5.165 M ops/s  (+2.1%)
    |   +- hits latency                       | 198.483 ns/op   | 194.270 ns/op  (-2.1%)
    |   +- important_hits throughput          | 5.061 M ops/s   | 5.165 M ops/s  (+2.1%)
    | + num keys: 4194304
    |   +- hits throughput                    | 2.864 M ops/s   | 2.882 M ops/s  ( ~ )
    |   +- hits latency                       | 365.220 ns/op   | 361.418 ns/op  (-1.0%)
    |   +- important_hits throughput          | 2.864 M ops/s   | 2.882 M ops/s  ( ~ )

    +---- Local Storage ----------------------
    |
    | local_storage cache get                 | <before>        | <after>
    |
    | + num_maps: 1
    |   sequential get:
    |   +- hits throughput                    | 33.005 M ops/s  | 39.068 M ops/s (+18.4%)
    |   +- hits latency                       | 30.300 ns/op    | 25.598 ns/op   (-15.5%)
    |   +- important_hits throughput          | 33.005 M ops/s  | 39.068 M ops/s (+18.4%)
    |   interleaved get:
    |   +- hits throughput                    | 37.151 M ops/s  | 44.926 M ops/s (+20.9%)
    |   +- hits latency                       | 26.919 ns/op    | 22.259 ns/op   (-17.3%)
    |   +- important_hits throughput          | 37.151 M ops/s  | 44.926 M ops/s (+20.9%)
    | + num_maps: 10
    |   sequential get:
    |   +- hits throughput                    | 32.288 M ops/s  | 38.099 M ops/s (+18.0%)
    |   +- hits latency                       | 30.972 ns/op    | 26.248 ns/op   (-15.3%)
    |   +- important_hits throughput          | 3.229 M ops/s   | 3.810 M ops/s  (+18.0%)
    |   interleaved get:
    |   +- hits throughput                    | 34.473 M ops/s  | 41.145 M ops/s (+19.4%)
    |   +- hits latency                       | 29.010 ns/op    | 24.307 ns/op   (-16.2%)
    |   +- important_hits throughput          | 12.312 M ops/s  | 14.695 M ops/s (+19.4%)
    | + num_maps: 16
    |   sequential get:
    |   +- hits throughput                    | 32.524 M ops/s  | 38.341 M ops/s (+17.9%)
    |   +- hits latency                       | 30.748 ns/op    | 26.083 ns/op   (-15.2%)
    |   +- important_hits throughput          | 2.033 M ops/s   | 2.396 M ops/s  (+17.9%)
    |   interleaved get:
    |   +- hits throughput                    | 34.575 M ops/s  | 41.338 M ops/s (+19.6%)
    |   +- hits latency                       | 28.925 ns/op    | 24.193 ns/op   (-16.4%)
    |   +- important_hits throughput          | 11.001 M ops/s  | 13.153 M ops/s (+19.6%)
    | + num_maps: 17
    |   sequential get:
    |   +- hits throughput                    | 28.861 M ops/s  | 32.756 M ops/s (+13.5%)
    |   +- hits latency                       | 34.649 ns/op    | 30.530 ns/op   (-11.9%)
    |   +- important_hits throughput          | 1.700 M ops/s   | 1.929 M ops/s  (+13.5%)
    |   interleaved get:
    |   +- hits throughput                    | 31.529 M ops/s  | 36.110 M ops/s (+14.5%)
    |   +- hits latency                       | 31.719 ns/op    | 27.697 ns/op   (-12.7%)
    |   +- important_hits throughput          | 9.598 M ops/s   | 10.993 M ops/s (+14.5%)
    | + num_maps: 24
    |   sequential get:
    |   +- hits throughput                    | 18.602 M ops/s  | 19.937 M ops/s (+7.2%)
    |   +- hits latency                       | 53.767 ns/op    | 50.166 ns/op   (-6.7%)
    |   +- important_hits throughput          | 0.776 M ops/s   | 0.831 M ops/s  (+7.2%)
    |   interleaved get:
    |   +- hits throughput                    | 21.718 M ops/s  | 23.332 M ops/s (+7.4%)
    |   +- hits latency                       | 46.047 ns/op    | 42.865 ns/op   (-6.9%)
    |   +- important_hits throughput          | 6.110 M ops/s   | 6.564 M ops/s  (+7.4%)
    | + num_maps: 32
    |   sequential get:
    |   +- hits throughput                    | 14.118 M ops/s  | 14.626 M ops/s (+3.6%)
    |   +- hits latency                       | 70.856 ns/op    | 68.381 ns/op   (-3.5%)
    |   +- important_hits throughput          | 0.442 M ops/s   | 0.458 M ops/s  (+3.6%)
    |   interleaved get:
    |   +- hits throughput                    | 17.111 M ops/s  | 17.906 M ops/s (+4.6%)
    |   +- hits latency                       | 58.451 ns/op    | 55.865 ns/op   (-4.4%)
    |   +- important_hits throughput          | 4.776 M ops/s   | 4.998 M ops/s  (+4.6%)
    | + num_maps: 100
    |   sequential get:
    |   +- hits throughput                    | 5.281 M ops/s   | 5.528 M ops/s  (+4.7%)
    |   +- hits latency                       | 192.398 ns/op   | 183.059 ns/op  (-4.9%)
    |   +- important_hits throughput          | 0.053 M ops/s   | 0.055 M ops/s  (+4.9%)
    |   interleaved get:
    |   +- hits throughput                    | 6.265 M ops/s   | 6.498 M ops/s  (+3.7%)
    |   +- hits latency                       | 161.436 ns/op   | 152.877 ns/op  (-5.3%)
    |   +- important_hits throughput          | 1.636 M ops/s   | 1.697 M ops/s  (+3.7%)
    | + num_maps: 1000
    |   sequential get:
    |   +- hits throughput                    | 0.355 M ops/s   | 0.354 M ops/s  ( ~ )
    |   +- hits latency                       | 2826.538 ns/op  | 2827.139 ns/op ( ~ )
    |   +- important_hits throughput          | 0.000 M ops/s   | 0.000 M ops/s  ( ~ )
    |   interleaved get:
    |   +- hits throughput                    | 0.404 M ops/s   | 0.403 M ops/s  ( ~ )
    |   +- hits latency                       | 2481.190 ns/op  | 2487.555 ns/op ( ~ )
    |   +- important_hits throughput          | 0.102 M ops/s   | 0.101 M ops/s  ( ~ )

The on_lookup test in {cgrp,task}_ls_recursion.c is removed because bpf_local_storage_lookup() is no longer traceable, and adding a tracepoint would make the compiler generate worse code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/

Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
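Editorial note: a simplified sketch of the split described above, to make the shape of the change concrete. It compresses the real bpf_local_storage implementation; RCU annotations and some field details are approximations.

    /* Outlined slow path: compiled out-of-line so callers of the lookup do not
     * have to materialize the raw_spin_lock_irqsave()/irqrestore() call setup.
     */
    void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
                                          struct bpf_local_storage_map *smap,
                                          struct bpf_local_storage_elem *selem);

    /* Hot path stays inline in every caller. */
    static inline struct bpf_local_storage_data *
    bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
                             struct bpf_local_storage_map *smap,
                             bool cacheit_lockit)
    {
        struct bpf_local_storage_data *sdata;
        struct bpf_local_storage_elem *selem;

        /* Fast path: per-map cache hit needs no locking. */
        sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
        if (sdata && rcu_access_pointer(sdata->smap) == smap)
            return sdata;

        /* Slow path: walk the elems owned by this local_storage. */
        hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
            if (rcu_access_pointer(SDATA(selem)->smap) == smap)
                break;
        if (!selem)
            return NULL;

        if (cacheit_lockit)     /* call elided when this is a false constant */
            __bpf_local_storage_insert_cache(local_storage, smap, selem);
        return SDATA(selem);
    }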
-
- 08 Feb, 2024 7 commits
-
-
Martin KaFai Lau authored
Geliang Tang says:

====================
bpf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops

This patch set avoids a module loading failure when a module tries to register a struct_ops but has its btf section stripped. This then works similarly to module kfunc registration in commit 3de4d22c ("bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()").

v5:
 - drop CONFIG_MODULE_ALLOW_BTF_MISMATCH check as Martin suggested.

v4:
 - add a new patch to fix error checks for btf_get_module_btf.
 - rename the helper to check_btf_kconfigs.

v3:
 - fix this build error:
   kernel/bpf/btf.c:7750:11: error: incomplete definition of type 'struct module'
   Reported-by: kernel test robot <lkp@intel.com>
   Closes: https://lore.kernel.org/oe-kbuild-all/202402040934.Fph0XeEo-lkp@intel.com/

v2:
 - add register_check_missing_btf helper as Jiri suggested.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
Similar to the handling in __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs(), this patch uses the newly added helper check_btf_kconfigs() to handle modules whose btf section has been stripped. While at it, it also adds the missing IS_ERR() check for commit f6be98d1 ("bpf, net: switch to dynamic registration").

Fixes: f6be98d1 ("bpf, net: switch to dynamic registration")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
This patch extracts the duplicated error-path code for the case where btf_get_module_btf() returns NULL from __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs() into a new helper named check_btf_kconfigs(), which checks CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/fa5537fc55f1e4d0bfd686598c81b7ab9dbd82b7.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
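Editorial note: a rough illustration of what such a consolidated helper looks like, assuming it mirrors the pre-existing error paths; the exact messages and return codes in the kernel may differ.

    /* Sketch: put the "module has no BTF" policy in one place. Returning
     * -ENOENT only when vmlinux BTF itself is missing lets modules with a
     * stripped .BTF section still load (their registration is skipped).
     */
    static int check_btf_kconfigs(const struct module *module, const char *feature)
    {
        if (!module && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
            pr_err("missing vmlinux BTF, cannot register %s\n", feature);
            return -ENOENT;
        }
        if (module && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
            pr_warn("missing module BTF, cannot register %s\n", feature);
        return 0;
    }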
-
Geliang Tang authored
In the same way as __register_btf_kfunc_id_set(), and in order to let modules with a stripped btf section still load, this patch changes the return value of register_btf_id_dtor_kfuncs() from -ENOENT to 0 when btf is NULL.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/eab65586d7fb0e72f2707d3747c7d4a5d60c823f.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Masahiro Yamada authored
'config BPF' exists in both init/Kconfig and kernel/bpf/Kconfig. Commit b24abcff ("bpf, kconfig: Add consolidated menu entry for bpf with core options") added the second one to kernel/bpf/Kconfig instead of moving the existing one. Merge them together. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240204075634.32969-1-masahiroy@kernel.org
-
Yafang Shao authored
After the series "Annotate kfuncs in .BTF_ids section" [0], kfunc prototypes can be generated by bpftool. Let's mark the existing cpumask kfunc declarations __weak so they don't conflict with the definitions that will eventually come from vmlinux.h.

[0] https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20240206081416.26242-5-laoar.shao@gmail.com
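Editorial note: a sketch of what such __weak kfunc declarations look like, modeled on the cpumask declarations in the selftests' cpumask_common.h; treat the exact signatures as assumptions.

    /* Weak extern declarations: if vmlinux.h later ships its own prototypes
     * for these kfuncs, the weak ones quietly give way instead of triggering
     * "conflicting types" or duplicate-symbol errors.
     */
    struct bpf_cpumask *bpf_cpumask_create(void) __ksym __weak;
    void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym __weak;
    u32 bpf_cpumask_first(const struct cpumask *cpumask) __ksym __weak;
    bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) __ksym __weak;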
-
Yafang Shao authored
We should verify the return value of cpumask_success__load(). Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240206081416.26242-4-laoar.shao@gmail.com
-
- 07 Feb, 2024 3 commits
-
-
Andrii Nakryiko authored
Viktor Malik says:

====================
tools/resolve_btfids: fix cross-compilation to non-host endianness

The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda ("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly.

The first patch of the series refactors resolve_btfids by using types from btf_ids.h instead of accessing the BTF ID data using magic offsets.

Acked-by: Jiri Olsa <jolsa@kernel.org>
---
Changes in v4:
 - remove unnecessary vars and pointer casts (suggested by Daniel Xu)

Changes in v3:
 - add byte swap of global 'flags' field in btf_id_set8 (suggested by Jiri Olsa)
 - cleaner refactoring of sets_patch (suggested by Jiri Olsa)
 - add compile-time assertion that IDs are at the beginning of pairs struct in btf_id_set8 (suggested by Daniel Borkmann)

Changes in v2:
 - use type defs from btf_ids.h (suggested by Andrii Nakryiko)
====================

Link: https://lore.kernel.org/r/cover.1707223196.git.vmalik@redhat.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Viktor Malik authored
The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda ("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly.

Fixes: ef2c6f37 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
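Editorial note: a rough, conceptual sketch of the kind of byte swap described above. Function and variable names are assumptions; the real resolve_btfids code differs in detail.

    #include <byteswap.h>

    /* Sketch: when the target ELF's endianness differs from the host's,
     * pre-swap the already-correct flags words so that the blanket
     * translation libelf applies on write leaves them correct in the output.
     */
    static void fixup_set8_flags(struct btf_id_set8 *set, bool foreign_endian)
    {
        __u32 i;

        if (!foreign_endian)
            return;

        set->flags = bswap_32(set->flags);              /* global flags */
        for (i = 0; i < set->cnt; i++)
            set->pairs[i].flags = bswap_32(set->pairs[i].flags);  /* entry flags */
    }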
-
Viktor Malik authored
Instead of using magic offsets to access BTF ID set data, leverage the types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes.

This requires syncing the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, because that would also require syncing multiple dependent headers and we don't need any other defs from btf_ids.h.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
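Editorial note: for reference, the 8-byte set layout in question, as defined in include/linux/btf_ids.h around this series (shown for orientation; consult the tree for the authoritative definition). The per-entry flags sitting next to each id are exactly the words the previous patch has to byte-swap.

    struct btf_id_set8 {
        u32 cnt;
        u32 flags;
        struct {
            u32 id;
            u32 flags;
        } pairs[];
    };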
-
- 06 Feb, 2024 14 commits
-
-
Toke Høiland-Jørgensen authored
When the feature_flags and xdp_zc_max_segs fields were added to libbpf's bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro. This causes libbpf to write to those fields unconditionally, which means that programs compiled against an older version of libbpf (with a smaller size of the bpf_xdp_query_opts struct) will have their stack corrupted by libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the feature_flags field is not part of the opts struct (via the OPTS_HAS() macro), but the patch adding xdp_zc_max_segs does not. For consistency, this fix just changes the assignments to both fields to use the OPTS_SET() macro.

Fixes: 13ce2daa ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
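Editorial note: the pattern described here is libbpf's internal opts-compatibility macro: OPTS_SET() stores a value only if the caller's struct (as declared by opts->sz) is large enough to contain the field. A sketch of what the fixed assignments look like; the local variable name is an assumption.

    /* bpf_xdp_query() fills the caller-provided opts after querying the kernel.
     * OPTS_SET() checks the caller's struct size before writing, so a caller
     * built against an older libbpf is never written out of bounds.
     */
    OPTS_SET(opts, feature_flags, md.flags);
    OPTS_SET(opts, xdp_zc_max_segs, md.xdp_zc_max_segs);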
-
Jose E. Marchesi authored
[Differences from V2:
 - Remove conditionals in the source files pragmas, as the pragma is supported by both GCC and clang.]

Both GCC and clang implement the -Waddress-of-packed-member warning (enabled by -Wall), which warns about taking the address of a packed struct field when it can lead to an "unaligned" address. This triggers the following errors (-Werror) when building three particular BPF selftests with GCC:

    progs/test_cls_redirect.c
      986 |  if (ipv4_is_fragment((void *)&encap->ip)) {
    progs/test_cls_redirect_dynptr.c
      410 |  pkt_ipv4_checksum((void *)&encap_gre->ip);
    progs/test_cls_redirect.c
      521 |  pkt_ipv4_checksum((void *)&encap_gre->ip);
    progs/test_tc_tunnel.c
      232 |  set_ipv4_csum((void *)&h_outer.ip);

These warnings do not signal any real problem in the tests as far as I can see. This patch adds pragmas to these test files that inhibit the -Waddress-of-packed-member warning.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com
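Editorial note: the pragma in question is the standard GCC/clang diagnostic pragma; a hedged sketch of such a file-level suppression (placement and comment are illustrative):

    /* Near the top of the affected selftest source file. Both GCC and clang
     * honor the GCC-spelled pragma, so no compiler conditional is needed.
     */
    #pragma GCC diagnostic ignored "-Waddress-of-packed-member"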
-
Dave Thaler authored
* "imm32" should just be "imm"
* Add blank line to fix formatting error reported by Stephen Rothwell [0]

[0]: https://lore.kernel.org/bpf/20240206153301.4ead0bad@canb.auug.org.au/T/#u

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240206045146.4965-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Mark dynptr kfuncs as __weak to allow the verifier_global_subprogs/arg_ctx_{perf,kprobe,raw_tp} subtests to be loadable on old kernels.

Because the bpf_dynptr_from_xdp() kfunc is used from the arg_tag_dynptr BPF program in progs/verifier_global_subprogs.c *and* is not marked as __weak, loading any subtest from verifier_global_subprogs fails on old kernels that don't have the bpf_dynptr_from_xdp() kfunc defined. Even if the arg_tag_dynptr program itself is not loaded, libbpf bails out on the non-weak reference to bpf_dynptr_from_xdp (which can't be resolved), and that reference is shared across all programs in progs/verifier_global_subprogs.c.

So mark all dynptr-related kfuncs as __weak to unblock libbpf CI ([0]). In the upcoming "kfunc in vmlinux.h" work we should make sure that kfuncs are always declared __weak as well.

[0] https://github.com/libbpf/libbpf/actions/runs/7792673215/job/21251250831?pr=776#step:4:7961

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206004008.1541513-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
If a PERF_EVENT program has an __arg_ctx argument with a matching architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer type, libbpf should still perform the type rewrite for old kernels, but not emit the warning. Fix a copy/paste from kernel code where 0 is meant to signify the "no error" condition. For libbpf we need to return "true" to proceed with the type rewrite (which for a PERF_EVENT program will be the canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8faf ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Magnus Karlsson says:

====================
xsk: support redirect to any socket bound to the same umem

This patch set adds support for directing a packet to any socket bound to the same umem. This makes it possible to use the XDP program to select what socket the packet should be received on. The user can populate the XSKMAP with various sockets and as long as they share the same umem, the XDP program can pick any one of them.

The implementation is straight-forward. Instead of testing that the incoming packet is targeting the same device and queue id as the socket is bound to, just check that the umem the packet was received on is the same as the umem of the socket we want it to be received on. This guarantees that the redirect is legal as it is already in the correct umem.

Patch #1 implements the feature and patch #2 adds documentation.

Thanks: Magnus
====================

Link: https://lore.kernel.org/r/20240205123553.22180-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Magnus Karlsson authored
Document the ability to redirect to any socket bound to the same umem.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20240205123553.22180-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Magnus Karlsson authored
Add support for directing a packet to any socket bound to the same umem. This makes it possible to use the XDP program to select what socket the packet should be received on. The user can populate the XSKMAP with various sockets and as long as they share the same umem, the XDP program can pick any one of them.

Suggested-by: Yuval El-Hanany <yuvale@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240205123553.22180-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
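Editorial note: from the user side, the pattern this enables looks roughly like the sketch below: several AF_XDP sockets that share one umem sit in an XSKMAP, and the XDP program is free to pick any index. The map name and selection policy are made up for illustration.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, 8);          /* all 8 sockets must share one umem */
        __type(key, __u32);
        __type(value, __u32);
    } xsks_map SEC(".maps");

    SEC("xdp")
    int pick_any_xsk(struct xdp_md *ctx)
    {
        /* Any selection policy works; here packets fan out by rx queue index.
         * With this series, the chosen socket no longer has to be bound to
         * this exact device/queue, only to the same umem.
         */
        __u32 idx = ctx->rx_queue_index % 8;

        return bpf_redirect_map(&xsks_map, idx, XDP_PASS);
    }

    char _license[] SEC("license") = "GPL";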
-
Alexei Starovoitov authored
Kumar Kartikeya Dwivedi says:

====================
Transfer RCU lock state across subprog calls

David suggested during the discussion in [0] that we should handle RCU locks in a similar fashion to spin locks, where the verifier understands when a lock held in a caller is released in a callee, or a lock taken in a callee is released in a caller, or the callee is called within a lock critical section. This set extends the same semantics to RCU read locks and adds a few selftests to verify correct behavior. This issue has also come up for sched-ext programs.

This would now allow static subprog calls to be made without errors within RCU read sections, for subprogs to release RCU locks of callers and return to them, or for subprogs to take an RCU lock which is later released in the caller.

[0]: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20240204230231.1013964-1-memxor@gmail.com
 * Add tests for global subprog behaviour (Yafang)
 * Add Acks, Tested-by (Yonghong, Yafang)
====================

Link: https://lore.kernel.org/r/20240205055646.1112186-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Add selftests covering the following cases:
 - A static or global subprog called from within an RCU read section works.
 - A static subprog taking an RCU read lock which is released in the caller works.
 - A static subprog releasing the caller's RCU read lock works.

Global subprogs that leave the lock in an imbalanced state will not work, as they are verified separately, so ensure those cases fail as well.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
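Editorial note: a hedged sketch of what the first case above looks like in a selftest-style BPF program. The attach point, names and helper use are illustrative and not copied from the actual tests.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    void bpf_rcu_read_lock(void) __ksym;
    void bpf_rcu_read_unlock(void) __ksym;

    int seen_pid;

    /* Static subprog: the verifier walks its body, so calling it while the
     * caller holds the RCU read lock is now accepted.
     */
    static __noinline int read_pid(struct task_struct *task)
    {
        return task ? task->pid : 0;
    }

    SEC("tp_btf/task_newtask")
    int BPF_PROG(rcu_subprog_call, struct task_struct *task, u64 clone_flags)
    {
        bpf_rcu_read_lock();
        seen_pid = read_pid(task);   /* subprog call inside the RCU read section */
        bpf_rcu_read_unlock();
        return 0;
    }

    char _license[] SEC("license") = "GPL";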
-
Kumar Kartikeya Dwivedi authored
Allow transferring an imbalanced RCU lock state between subprog calls during verification. This allows patterns where a subprog call returns with an RCU lock held, or a subprog call releases an RCU lock held by the caller.

Currently, the verifier would end up complaining if the RCU lock is not released when processing an exit from a subprog, which is non-ideal if its execution is supposed to be enclosed in an RCU read section of the caller. Instead, simply only check whether we are processing exit for frame#0 and do not complain on an active RCU lock otherwise. We only need to update the check when processing the BPF_EXIT insn, as copy_verifier_state is already set up to do the right thing.

Suggested-by: David Vernet <void@manifault.com>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Kumar Kartikeya Dwivedi says:

====================
Enable static subprog calls in spin lock critical sections

This set allows a BPF program to make a call to a static subprog within a bpf_spin_lock critical section. This problem has been hit in sched-ext and ghOSt [0] as well, and is mostly an annoyance which is worked around by inlining the static subprog into the critical section. In case of sched-ext, there are a lot of other helper/kfunc calls that need to be allow listed for the support to be complete, but a separate follow up will deal with that.

Unlike static subprogs, global subprogs cannot be allowed yet as the verifier will not explore their body when encountering a call instruction for them. Therefore, we would need an alternative approach (some sort of function summarization to ensure a lock is never taken from a global subprog and all its callees).

[0]: https://lore.kernel.org/bpf/bd173bf2-dea6-3e0e-4176-4a9256a9a056@google.com

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com
 * Indicate global function call in verifier error string (Yonghong, David)
 * Add Acks from Yonghong, David
====================

Link: https://lore.kernel.org/r/20240204222349.938118-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Add selftests for static subprog calls within bpf_spin_lock critical section, and ensure we still reject global subprog calls. Also test the case where a subprog call will unlock the caller's held lock, or the caller will unlock a lock taken by a subprog call, ensuring correct transfer of lock state across frames on exit.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Currently, calling any helpers, kfuncs, or subprogs except the graph data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock is not allowed. One of the original motivations of this decision was to force the BPF programmer's hand into keeping the bpf_spin_lock critical section small, and to ensure the execution time of the program does not increase due to lock waiting times. In addition to this, some of the helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock.

However, when it comes to subprog calls, at least for static subprogs, the verifier is able to explore their instructions during verification. Therefore, it is similar in effect to having the same code inlined into the critical section. Hence, not allowing static subprog calls in the bpf_spin_lock critical section is mostly an annoyance that needs to be worked around, without providing any tangible benefit.

Unlike static subprog calls, global subprog calls are not safe to permit within the critical section, as the verifier does not explore them during verification; therefore, whether the same lock will be taken again, or unlocked, cannot be ascertained.

Therefore, allow calling static subprogs within a bpf_spin_lock critical section, and only reject it in case the subprog linkage is global.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
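Editorial note: a hedged sketch of the pattern this change permits. Map layout and names are illustrative only.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct elem {
        struct bpf_spin_lock lock;
        int counter;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct elem);
    } counters SEC(".maps");

    /* Static subprog: its body is explored by the verifier, so calling it with
     * the spin lock held is effectively the same as inlining it.
     */
    static __noinline void bump(struct elem *e)
    {
        e->counter++;
    }

    SEC("tc")
    int update_counter(struct __sk_buff *skb)
    {
        int key = 0;
        struct elem *e = bpf_map_lookup_elem(&counters, &key);

        if (!e)
            return 0;

        bpf_spin_lock(&e->lock);
        bump(e);                 /* static call inside the critical section */
        bpf_spin_unlock(&e->lock);
        return 0;
    }

    char _license[] SEC("license") = "GPL";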
-
- 05 Feb, 2024 5 commits
-
-
Dave Thaler authored
This patch attempts to update the ISA specification according to the latest mailing list discussion about conformance groups, in a way that is intended to be consistent with IANA registry processes and IETF 118 WG meeting discussion.

It does the following:
* Split basic into base32 and base64 for 32-bit vs 64-bit base instructions
* Split division/multiplication/modulo instructions out of base groups
* Split atomic instructions out of base groups

There may be additional changes as discussion continues, but there seems to be consensus on the principles above.

v1->v2: fixed typo pointed out by David Vernet
v2->v3: Moved multiplication to same groups as division/modulo

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240202221110.3872-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Recently, when running './test_progs -j', I occasionally hit the following errors:

    test_lwt_redirect:PASS:pthread_create 0 nsec
    test_lwt_redirect_run:FAIL:netns_create unexpected error: 256 (errno 0)
    #142/2 lwt_redirect/lwt_redirect_normal_nomac:FAIL
    #142 lwt_redirect:FAIL
    test_lwt_reroute:PASS:pthread_create 0 nsec
    test_lwt_reroute_run:FAIL:netns_create unexpected error: 256 (errno 0)
    test_lwt_reroute:PASS:pthread_join 0 nsec
    #143/2 lwt_reroute/lwt_reroute_qdisc_dropped:FAIL
    #143 lwt_reroute:FAIL

The netns_create() definition looks like below:

    #define NETNS "ns_lwt"

    static inline int netns_create(void)
    {
        return system("ip netns add " NETNS);
    }

One possibility is that both lwt_redirect and lwt_reroute create a netns with the same name "ns_lwt", which may cause a conflict. I tried the following example:

    $ sudo ip netns add abc
    $ echo $?
    0
    $ sudo ip netns add abc
    Cannot create namespace file "/var/run/netns/abc": File exists
    $ echo $?
    1
    $

The return code of the above netns_create() is 256. An internet search suggests that the return value of 'ip netns add ns_lwt' is 1, which matches the above 'sudo ip netns add abc' example. This patch uses different netns names for the two tests to avoid the 'ip netns add <name>' failure. I ran './test_progs -j' 10 times and all runs succeeded with the lwt_redirect/lwt_reroute tests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240205052914.1742687-1-yonghong.song@linux.dev
-
Kui-Feng Lee authored
"r" is used to receive the return value of test_2 in bpf_testmod.c, but it is not actually used. So, we remove "r" and change the return type to "void". Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Yonghong Song authored
Somehow recently I frequently hit the following test failure with either ./test_progs or ./test_progs-cpuv4:

    serial_test_ptr_untrusted:PASS:skel_open 0 nsec
    serial_test_ptr_untrusted:PASS:lsm_attach 0 nsec
    serial_test_ptr_untrusted:PASS:raw_tp_attach 0 nsec
    serial_test_ptr_untrusted:FAIL:cmp_tp_name unexpected cmp_tp_name: actual -115 != expected 0
    #182 ptr_untrusted:FAIL

Further investigation found the failure is due to bpf_probe_read_user_str(), where reading the user-level string attr->raw_tracepoint.name does not succeed, most likely because the string itself is still on disk and not populated into memory yet.

One solution is to do a printf() call of the string before doing the bpf syscall, which will force raw_tracepoint.name into memory. But I think a more robust solution is to use bpf_copy_from_user(), which is used in sleepable programs and can tolerate page faults, and the fix here uses the latter approach.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240204194452.2785936-1-yonghong.song@linux.dev
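Editorial note: a hedged sketch of the bpf_copy_from_user() pattern referred to here, in a sleepable LSM program. The hook, program name, and field access are modeled on the test's description, not copied from it.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char tp_name[64];

    SEC("lsm.s/bpf")    /* sleepable hook: page faults on the user string are OK */
    int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
    {
        if (cmd != BPF_RAW_TRACEPOINT_OPEN)
            return 0;

        /* bpf_copy_from_user() may sleep to fault the page in, so the string
         * does not have to already be resident in memory.
         */
        bpf_copy_from_user(tp_name, sizeof(tp_name),
                           (const void *)(unsigned long)attr->raw_tracepoint.name);
        return 0;
    }

    char _license[] SEC("license") = "GPL";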
-
Kui-Feng Lee authored
The "i" here is always equal to "btf_type_vlen(t)", since the "for_each_member()" loop never breaks.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 03 Feb, 2024 4 commits
-
-
Alexei Starovoitov authored
Andrii Nakryiko says:

====================
Two small fixes for global subprog tagging

Fix a bug with passing a trusted PTR_TO_BTF_ID_OR_NULL register into a global subprog that expects `__arg_trusted __arg_nullable` arguments, which was discovered when adopting a production BPF application.

Also fix annoying warnings that are irrelevant for static subprogs, which are just an artifact of using btf_prepare_func_args() for both static and global subprogs.
====================

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240202190529.2374377-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
When btf_prepare_func_args() was generalized to handle both static and global subprogs, a few warnings/errors that are meant only for global subprog cases started to be emitted for static subprogs, where they are sort of expected and irrelevant. Stop polluting verifier logs with irrelevant scary-looking messages.

Fixes: e26080d0 ("bpf: prepare btf_prepare_func_args() for handling static subprogs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add an extra layer of global functions to ensure that passing around (trusted) PTR_TO_BTF_ID_OR_NULL registers works as expected. We also extend the trusted_task_arg_nullable subtest to check three possible valid arguments: the known NULL, known non-NULL, and maybe NULL cases.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add the PTR_TRUSTED | PTR_MAYBE_NULL modifiers for PTR_TO_BTF_ID to check_reg_type() to support passing trusted nullable PTR_TO_BTF_ID registers into global functions accepting `__arg_trusted __arg_nullable` arguments.

This hasn't been caught earlier because tests were either passing known non-NULL PTR_TO_BTF_ID registers or known NULL (SCALAR) registers. When utilizing this functionality in a complicated real-world BPF application that passes around PTR_TO_BTF_ID_OR_NULL, it became apparent that the verifier rejects a valid case because check_reg_type() doesn't handle it explicitly. The existing check_reg_type() logic already anticipates this combination, so we just need to explicitly list this combo in the switch statement.

Fixes: e2b3c4ff ("bpf: add __arg_trusted global func arg tag")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
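Editorial note: a hedged sketch of the BPF-side pattern being unblocked - a global subprog whose argument is tagged trusted and nullable. The macro names are the ones the commit quotes; the decl-tag spellings below are an assumption of the selftests' bpf_experimental.h definitions, and the function name is illustrative.

    /* Tag macros as assumed from the BPF selftests' bpf_experimental.h. */
    #define __arg_trusted  __attribute__((btf_decl_tag("arg:trusted")))
    #define __arg_nullable __attribute__((btf_decl_tag("arg:nullable")))

    /* Global (non-static) subprog: verified on its own, so the argument
     * contract is carried entirely by the decl tags. With the fix above, a
     * caller may pass a trusted-but-maybe-NULL task pointer
     * (PTR_TO_BTF_ID_OR_NULL) here.
     */
    __weak int task_pid_or_zero(struct task_struct *task __arg_trusted __arg_nullable)
    {
        if (!task)
            return 0;
        return task->pid;
    }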
-
- 02 Feb, 2024 5 commits
-
-
Shung-Hsi Yu authored
After commit c698eaeb ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache"), trace_helpers.c now includes libbpf_internal.h and thus can no longer use the u32 type (among others), since these are poisoned in libbpf_internal.h. Replace u32 with __u32 to fix the following error when building trace_helpers.c on powerpc:

    error: attempt to use poisoned "u32"

Fixes: c698eaeb ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Andrii Nakryiko authored
Maxim Mikityanskiy says:

====================
Improvements for tracking scalars in the BPF verifier

From: Maxim Mikityanskiy <maxim@isovalent.com>

The goal of this series is to extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing. It also contains a fix by Eduard for infinite loop detection and a state pruning optimization by Eduard that compensates for a verification complexity regression introduced by tracking unbounded scalars. These improvements reduce the surface of false rejections that I saw while working on the Cilium codebase.

Patches 1-9 of the original series were previously applied in v2.

Patches 1-2 (Maxim): Support the case when boundary checks are first performed after the register was spilled to the stack.

Patches 3-4 (Maxim): Support narrowing fills.

Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to mitigate the verification complexity regression.

veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...

* Without patch 5:

    File                    Program       States (A)  States (B)  States    (DIFF)
    ----------------------- ------------- ----------  ----------  ----------------
    pyperf100.bpf.o         on_event            4878        6528   +1650 (+33.83%)
    pyperf180.bpf.o         on_event            6936       11032   +4096 (+59.05%)
    pyperf600.bpf.o         on_event           22271       39455  +17184 (+77.16%)
    pyperf600_iter.bpf.o    on_event             400         490     +90 (+22.50%)
    strobemeta.bpf.o        on_event            4895       14028  +9133 (+186.58%)

* With patch 5:

    File                    Program       States (A)  States (B)  States    (DIFF)
    ----------------------- ------------- ----------  ----------  ----------------
    bpf_xdp.o               tail_lb_ipv4        2770        2224    -546 (-19.71%)
    pyperf100.bpf.o         on_event            4878        5848    +970 (+19.89%)
    pyperf180.bpf.o         on_event            6936        8868   +1932 (+27.85%)
    pyperf600.bpf.o         on_event           22271       29656   +7385 (+33.16%)
    pyperf600_iter.bpf.o    on_event             400         450     +50 (+12.50%)
    xdp_synproxy_kern.bpf.o syncookie_tc         280         226     -54 (-19.29%)
    xdp_synproxy_kern.bpf.o syncookie_xdp        302         228     -74 (-24.50%)

v2 changes: Fixed comments in patch 1, moved endianness checks to header files in patch 12 where possible, added Eduard's ACKs.

v3 changes:

Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's comments.

Eduard: Patch #5 (#14 in v2) changed significantly:
- Logical changes:
  - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect: a mix of MISC and ZERO in old state is not equivalent to e.g. just MISC in current state, because the verifier could have deduced zero scalars from ZERO slots in old state for some loads.
  - There is no reason to limit the change only to cases when old or current stack is a spill of an unbounded scalar; it is valid to compare any 64-bit scalar spill with a fake register impersonating MISC.
  - STACK_ZERO vs spilled zero case was dropped; after recent changes for zero handling by Andrii and Yonghong it is hard (impossible?) to conjure all ZERO slots for an spi. => the case does not make any difference in veristat results.
- Use global static variable for unbound_reg (Andrii)
- Code shuffling to remove duplication in stacksafe() (Andrii)
====================

Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Eduard Zingerman authored
Check that stacksafe() compares spilled scalars with STACK_MISC. The following combinations are explored:
- old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID} (plus error in unpriv mode);
- old spill of precise scalar is not equivalent to cur STACK_MISC;
- old STACK_MISC is equivalent to cur scalar;
- old STACK_MISC is not equivalent to cur non-scalar.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
-
Eduard Zingerman authored
When check_stack_read_fixed_off() reads a value from an spi all stack slots of which are set to STACK_{MISC,INVALID}, the destination register is set to unbound SCALAR_VALUE.

Exploit this fact by allowing stacksafe() to use a fake unbound scalar register to compare 'mmmm mmmm' stack value in old state vs spilled 64-bit scalar in current state and vice versa.

Veristat results after this patch show some gains:

    ./veristat -C -e file,prog,states -f 'states_pct>10' not-opt after
    File                    Program                 States   (DIFF)
    ----------------------- ----------------------  ---------------
    bpf_overlay.o           tail_rev_nodeport_lb4     -45 (-15.85%)
    bpf_xdp.o               tail_lb_ipv4             -541 (-19.57%)
    pyperf100.bpf.o         on_event                 -680 (-10.42%)
    pyperf180.bpf.o         on_event                -2164 (-19.62%)
    pyperf600.bpf.o         on_event                -9799 (-24.84%)
    strobemeta.bpf.o        on_event                -9157 (-65.28%)
    xdp_synproxy_kern.bpf.o syncookie_tc              -54 (-19.29%)
    xdp_synproxy_kern.bpf.o syncookie_xdp             -74 (-24.50%)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com
-
Maxim Mikityanskiy authored
The previous commit made it possible to preserve boundaries and track IDs of scalars on narrowing fills. Add test cases for that pattern.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
-