Commits · cf060c2c399fa457569123bb9806b455ff53e64c · Kirill Smelkov / linux

16 Sep, 2022 5 commits

selftests/bpf: Fix test_verif_scale{1,3} SEC() annotations · cf060c2c

Andrii Nakryiko authored Sep 09, 2022

Use proper SEC("tc") for test_verif_scale{1,3} programs. It's not
a problem for selftests right now because we manually set type
programmatically, but not having correct SEC() definitions makes it
harder to generically load BPF object files.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-2-andrii@kernel.org

cf060c2c

bpf: Move bpf_dispatcher function out of ftrace locations · ceea991a

Jiri Olsa authored Sep 03, 2022

The dispatcher function is attached/detached to trampoline by
dispatcher update function. At the same time it's available as
ftrace attachable function.

After discussion [1] the proposed solution is to use compiler
attributes to alter bpf_dispatcher_##name##_func function:

  - remove it from being instrumented with __no_instrument_function__
    attribute, so ftrace has no track of it

  - but still generate 5 nop instructions with patchable_function_entry(5)
    attribute, which are expected by bpf_arch_text_poke used by
    dispatcher update function

Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so
__patchable_function_entries functions are not part of ftrace/mcount
locations.

Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's
kept out of ftrace locations and has 5 byte nop generated at entry.

These attributes need to be arch specific as pointed out by Ilya
Leoshkevich in [2].

The dispatcher image is generated only for x86_64 arch, so the
code can stay as is for other archs.

  [1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/
  [2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org

ceea991a
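
The two compiler attributes named in the message above can be sketched on an ordinary function. This is only an illustration, assuming GCC 8+ or a recent Clang on x86_64; `demo_dispatcher_func` is a hypothetical stand-in and not the kernel's dispatcher macro.

```c
/*
 * Hypothetical stand-in for bpf_dispatcher_##name##_func; not kernel code.
 * no_instrument_function keeps it out of -pg / -finstrument-functions
 * instrumentation, while patchable_function_entry(5) asks the compiler to
 * emit 5 NOPs at the function entry for later patching.
 */
__attribute__((no_instrument_function, patchable_function_entry(5)))
int demo_dispatcher_func(int x)
{
	return x + 1;
}
```

The function behaves normally when called; the padding only matters to code that later rewrites the entry bytes.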

ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE · 9440155c

Peter Zijlstra (Intel) authored Sep 03, 2022

x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace, so make sure the __patchable_function_entries section
isn't merged into the mcount_loc section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org

9440155c

bpf: Use bpf_capable() instead of CAP_SYS_ADMIN for blinding decision · bfeb7e39

Yauheni Kaliuta authored Sep 05, 2022

The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays.
These days, given that unprivileged BPF is disabled by default, the main
users of constant blinding are unprivileged ones, in particular via
cBPF -> eBPF migration (e.g. old-style socket filters).
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com
Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com

bfeb7e39

bpf: use kvmemdup_bpfptr helper · a02c118e

Wang Yufen authored Sep 13, 2022

Use kvmemdup_bpfptr helper instead of open-coding to
simplify the code.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1663058433-14089-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

a02c118e

15 Sep, 2022 1 commit

bpf: Add verifier check for BPF_PTR_POISON retval and arg · 47e34cb7

Dave Marchevsky authored Sep 12, 2022

BPF_PTR_POISON was added in commit c0a5a21c ("bpf: Allow storing
referenced kptr in map") to denote a bpf_func_proto btf_id which the
verifier will replace with a dynamically-determined btf_id at verification
time.

This patch adds verifier 'poison' functionality to BPF_PTR_POISON in
order to prepare for expanded use of the value to poison ret- and
arg-btf_id in ongoing work, namely rbtree and linked list patchsets
[0, 1]. Specifically, when the verifier checks helper calls, it assumes
that BPF_PTR_POISON'ed ret type will be replaced with a valid type before
- or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id.

If poisoned btf_id reaches default handling block for either, consider
this a verifier internal error and fail verification. Otherwise a helper
w/ poisoned btf_id but no verifier logic replacing the type will cause a
crash as the invalid pointer is dereferenced.

Also move BPF_PTR_POISON to existing include/linux/poison.h header and
remove unnecessary shift.

  [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

47e34cb7

11 Sep, 2022 9 commits

bpf: Add verifier support for custom callback return range · 1bfe26fb

Dave Marchevsky authored Sep 08, 2022

Verifier logic to confirm that a callback function returns 0 or 1 was
added in commit 69c087ba ("bpf: Add bpf_for_each_map_elem() helper").
At the time, callback return value was only used to continue or stop
iteration.

In order to support callbacks with a broader return value range, such as
those added in rbtree series[0] and others, add a callback_ret_range to
bpf_func_state. Verifier's helpers which set in_callback_fn will also
set the new field, which the verifier will later use to check return
value bounds.

Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel
value as the latter would prevent the valid range (0, U64_MAX) being
used. Previous global default tnum_range(0, 1) is explicitly set for
extant callback helpers. The change to the global default was made after
discussion around this patch in the rbtree series [1]; the goal is to make
it more obvious that callback_ret_range should be explicitly set.

  [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com/
  [1]: lore.kernel.org/bpf/20220830172759.4069786-2-davemarchevsky@fb.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220908230716.2751723-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1bfe26fb
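
A minimal sketch of the per-callback return range idea, assuming a plain closed interval in place of the verifier's tnum representation. `struct ret_range`, `RET_RANGE_DEFAULT` and `ret_in_range()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification: a closed interval instead of a tnum. */
struct ret_range {
	uint64_t min;
	uint64_t max;
};

/* Mirrors the message: the default is (0, 0) until a helper sets a range. */
static const struct ret_range RET_RANGE_DEFAULT = { 0, 0 };

/* Return true when a callback's return value lies inside the range. */
static bool ret_in_range(struct ret_range r, uint64_t ret)
{
	return ret >= r.min && ret <= r.max;
}
```

With a (0, 0) default, a helper that forgets to set its range rejects any non-zero return, which makes the omission visible instead of silently accepting 0 or 1.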

selftests/bpf: fix ct status check in bpf_nf selftests · f7c946f2

Lorenzo Bianconi authored Sep 08, 2022

Properly check the connection tracking entry status configured by running
the bpf_ct_change_status kfunc.
Remove unnecessary IPS_CONFIRMED status configuration since it is
already done during entry allocation.

Fixes: 6eb7fba0 ("selftests/bpf: Add tests for new nf_conntrack kfuncs")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/813a5161a71911378dfac8770ec890428e4998aa.1662623574.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f7c946f2

Merge branch 'Support direct writes to nf_conn:mark' · b8c62fe2

Alexei Starovoitov authored Sep 10, 2022

Daniel Xu says:

====================

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.

Past discussion:
- v4: https://lore.kernel.org/bpf/cover.1661192455.git.dxu@dxuuu.xyz/
- v3: https://lore.kernel.org/bpf/cover.1660951028.git.dxu@dxuuu.xyz/
- v2: https://lore.kernel.org/bpf/CAP01T74Sgn354dXGiFWFryu4vg+o8b9s9La1d9zEbC4LGvH4qg@mail.gmail.com/T/
- v1: https://lore.kernel.org/bpf/cover.1660592020.git.dxu@dxuuu.xyz/

Changes since v4:
- Use exported function pointer + mutex to handle CONFIG_NF_CONNTRACK=m
  case

Changes since v3:
- Use a mutex to protect module load/unload critical section

Changes since v2:
- Remove use of NOT_INIT for btf_struct_access write path
- Disallow nf_conn writing when nf_conntrack module not loaded
- Support writing to nf_conn___init:mark

Changes since v1:
- Add unimplemented stub for when !CONFIG_BPF_SYSCALL
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b8c62fe2

selftests/bpf: Add tests for writing to nf_conn:mark · e2d75e95

Daniel Xu authored Sep 07, 2022

Add a simple extension to the existing selftest to write to
nf_conn:mark. Also add a failure test for writing to unsupported field.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/f78966b81b9349d2b8ebb4cee2caf15cb6b38ee2.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e2d75e95

bpf: Add support for writing to nf_conn:mark · 864b656f

Daniel Xu authored Sep 07, 2022

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

864b656f

bpf: Export btf_type_by_id() and bpf_log() · 84c6ac41

Daniel Xu authored Sep 07, 2022

These symbols will be used in nf_conntrack.ko to support direct writes
to `nf_conn`.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/3c98c19dc50d3b18ea5eca135b4fc3a5db036060.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

84c6ac41

bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes · 896f07c0

Daniel Xu authored Sep 07, 2022

Returning a bpf_reg_type only makes sense in the context of a BPF_READ.
For writes, prefer to explicitly return 0 for clarity.

Note that this is a non-functional change, as it just so happens that
NOT_INIT == 0.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/01772bc1455ae16600796ac78c6cc9fff34f95ff.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

896f07c0

bpf: Add stub for btf_struct_access() · d4f7bdb2

Daniel Xu authored Sep 07, 2022

Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/4021398e884433b1fef57a4d28361bb9fcf1bd05.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d4f7bdb2

bpf: Remove duplicate PTR_TO_BTF_ID RO check · 65269888

Daniel Xu authored Sep 07, 2022

Since commit 27ae7997 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS")
there has existed bpf_verifier_ops:btf_struct_access. When
btf_struct_access is _unset_ for a prog type, the verifier runs the
default implementation, which is to enforce read only:

        if (env->ops->btf_struct_access) {
                [...]
        } else {
                if (atype != BPF_READ) {
                        verbose(env, "only read is supported\n");
                        return -EACCES;
                }

                [...]
        }

When btf_struct_access is _set_, the expectation is that
btf_struct_access has full control over accesses, including if writes
are allowed.

Rather than carve out an exception for each prog type that may write to
BTF ptrs, delete the redundant check and give full control to
btf_struct_access.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/962da2bff1238746589e332ff1aecc49403cd7ce.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

65269888
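
The resulting control flow can be sketched in userspace roughly as follows. All names here (`struct demo_ops`, `check_struct_access()`, `allow_all()`) are hypothetical; the real verifier passes far more context to the callback.

```c
#include <errno.h>
#include <stddef.h>

enum access_type { ACC_READ, ACC_WRITE };

/* Hypothetical miniature of bpf_verifier_ops with just the one callback. */
struct demo_ops {
	int (*struct_access)(enum access_type atype);
};

/* A prog type that permits writes, deciding entirely on its own. */
static int allow_all(enum access_type atype)
{
	(void)atype;
	return 0;
}

static int check_struct_access(const struct demo_ops *ops,
			       enum access_type atype)
{
	/* When the callback is set, it has full control over the access. */
	if (ops->struct_access)
		return ops->struct_access(atype);

	/* Default implementation when unset: enforce read only. */
	if (atype != ACC_READ)
		return -EACCES;
	return 0;
}
```

The read-only test lives only in the default branch, so a prog type that wants writes needs no extra exception outside its own callback.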

10 Sep, 2022 2 commits

bpf: Simplify code by using for_each_cpu_wrap() · 57c92f11

Punit Agrawal authored Sep 07, 2022

In the percpu freelist code, it is a common pattern to iterate over
the possible CPUs mask starting with the current CPU. The pattern is
implemented using a hand rolled while loop with the loop variable
increment being open-coded.

Simplify the code by using for_each_cpu_wrap() helper to iterate over
the possible cpus starting with the current CPU. As a result, some of
the special-casing in the loop also gets simplified.

No functional change intended.
Signed-off-by: Punit Agrawal <punit.agrawal@bytedance.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220907155746.1750329-1-punit.agrawal@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

57c92f11
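
The iteration order that the commit relies on can be sketched in plain C. This is a userspace illustration assuming a dense id range 0..n-1; the kernel helper walks a cpumask instead, and `visit_wrapped()` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Visit ids start, start+1, ..., n-1, 0, ..., start-1 and record the order.
 * Returns the number of ids visited (always n for n > 0).
 */
static size_t visit_wrapped(int start, int n, int *order)
{
	size_t count = 0;

	for (int i = 0; i < n; i++)
		order[count++] = (start + i) % n;
	return count;
}
```

Starting at the current CPU and wrapping around removes the need for a hand-rolled "next CPU, or go back to the first one" step inside the loop body.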

bpf: add missing percpu_counter_destroy() in htab_map_alloc() · cf7de6a5

Tetsuo Handa authored Sep 11, 2022

syzbot is reporting ODEBUG bug in htab_map_alloc() [1], for
commit 86fe28f7 ("bpf: Optimize element count in non-preallocated
hash map.") added percpu_counter_init() to htab_map_alloc() but forgot to
add percpu_counter_destroy() to the error path.

Link: https://syzkaller.appspot.com/bug?extid=5d1da78b375c3b5e6c2b [1]
Reported-by: syzbot <syzbot+5d1da78b375c3b5e6c2b@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 86fe28f7 ("bpf: Optimize element count in non-preallocated hash map.")
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/e2e4cc0e-9d36-4ca1-9bfa-ce23e6f8310b@I-love.SAKURA.ne.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cf7de6a5
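
The class of bug fixed here, an init call without a matching destroy on the error path, can be sketched with the usual goto-based unwinding. Everything below is hypothetical userspace code: `counter_init()` and `counter_destroy()` merely stand in for the percpu_counter calls, and `fail_late` simulates a failure after initialization.

```c
#include <stdlib.h>

static int live_counters;	/* tracks init/destroy balance for the demo */

static int counter_init(void)
{
	live_counters++;
	return 0;
}

static void counter_destroy(void)
{
	live_counters--;
}

/* Hypothetical allocator; fail_late forces an error after counter_init(). */
static void *demo_map_alloc(int fail_late)
{
	void *map = malloc(64);

	if (!map)
		return NULL;
	if (counter_init())
		goto free_map;
	if (fail_late)
		goto destroy_counter;	/* error after init: must undo it */
	return map;

destroy_counter:
	counter_destroy();
free_map:
	free(map);
	return NULL;
}
```

Each label undoes exactly one earlier step, so a failure at any point releases everything set up before it.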

09 Sep, 2022 5 commits

Merge branch 'cgroup/connect{4,6} programs for unprivileged ICMP ping' · 2fae6771

Martin KaFai Lau authored Sep 09, 2022

YiFei Zhu says:

====================

Usually when a TCP/UDP connection is initiated, we can bind the socket
to a specific IP attached to an interface in a cgroup/connect hook.
But for pings, this is impossible, as the hook is not being called.

This series adds the invocation for cgroup/connect{4,6} programs to
unprivileged ICMP ping (i.e. ping sockets created with SOCK_DGRAM
IPPROTO_ICMP(V6) as opposed to SOCK_RAW). This also adds a test to
verify that the hooks are being called and invoking bpf_bind() from
within the hook actually binds the socket.

Patch 1 adds the invocation of the hook.
Patch 2 deduplicates write_sysctl in BPF test_progs.
Patch 3 adds the tests for this hook.

v1 -> v2:
* Added static to bindaddr_v6 in prog_tests/connect_ping.c
* Deduplicated much of the test logic in prog_tests/connect_ping.c
* Deduplicated write_sysctl() to test_progs.c

v2 -> v3:
* Renamed variable "obj" to "skel" for the BPF skeleton object in
  prog_tests/connect_ping.c

v3 -> v4:
* Fixed error path to destroy skel in prog_tests/connect_ping.c
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

2fae6771

selftests/bpf: Ensure cgroup/connect{4,6} programs can bind unpriv ICMP ping · 58c449a9

YiFei Zhu authored Sep 09, 2022

This tests that when an unprivileged ICMP ping socket connects,
the hooks are actually invoked. We also ensure that if the hook does
not call bpf_bind(), the bound address is unmodified, and if the
hook calls bpf_bind(), the bound address is exactly what we provided
to the helper.

A new netns is used to enable ping_group_range in the test without
affecting anything outside of the test, because by default, not even root is
permitted to use unprivileged ICMP ping...
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/086b227c1b97f4e94193e58aae7576d0261b68a4.1662682323.git.zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

58c449a9

selftests/bpf: Deduplicate write_sysctl() to test_progs.c · e42921c3

YiFei Zhu authored Sep 09, 2022

This helper is needed in multiple tests. Instead of copying it over
and over, better to deduplicate this helper to test_progs.c.

test_progs.c is chosen over testing_helpers.c because of this helper's
use of CHECK / ASSERT_*, and the CHECK was modified to use ASSERT_*
so it does not rely on a duration variable.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/9b4fc9a27bd52f771b657b4c4090fc8d61f3a6b5.1662682323.git.zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

e42921c3

bpf: Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping · 0ffe2412

YiFei Zhu authored Sep 09, 2022

Usually when a TCP/UDP connection is initiated, we can bind the socket
to a specific IP attached to an interface in a cgroup/connect hook.
But for pings, this is impossible, as the hook is not being called.

This adds the hook invocation to unprivileged ICMP ping (i.e. ping
sockets created with SOCK_DGRAM IPPROTO_ICMP(V6) as opposed to
SOCK_RAW). Logic is mirrored from UDP sockets where the hook is invoked
during pre_connect, after a check for sufficiently sized addr_len.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/5764914c252fad4cd134fb6664c6ede95f409412.1662682323.git.zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

0ffe2412

libbpf: Remove gcc support for bpf_tail_call_static for now · 665f5d35

Daniel Borkmann authored Sep 09, 2022

This reverts commit 14e5ce79 ("libbpf: Add GCC support for
bpf_tail_call_static"). The reason is that gcc invented its own BPF asm
syntax, which does not conform to the LLVM one, and going forward this
would be more painful to maintain here and in other areas of the library.
Thus remove it; the ask to the gcc folks is to align with LLVM and use the
exact same syntax.

Fixes: 14e5ce79 ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>

665f5d35

07 Sep, 2022 18 commits

bpf: Add helper macro bpf_for_each_reg_in_vstate · b239da34

Kumar Kartikeya Dwivedi authored Sep 04, 2022

For a lot of use cases in future patches, we will want to modify the
state of registers that are part of the same 'group' (e.g. same ref_obj_id). It
won't just be limited to releasing reference state, but setting a type
flag dynamically based on certain actions, etc.

Hence, we need a way to easily pass a callback to the function that
iterates over all registers in current bpf_verifier_state in all frames
up to (and including) the curframe.

While in C++ we would be able to easily use a lambda to pass state and
the callback together, sadly we aren't using C++ in the kernel. The next
best thing to avoid defining a function for each case seems like
statement expressions in GNU C. The kernel already uses them heavily,
hence they can be passed to the macro in the style of a lambda. The
statement expression will then be substituted in the for loop bodies.

Variables __state and __reg are set to current bpf_func_state and reg
for each invocation of the expression inside the passed in verifier
state.

Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers,
release_reference, find_good_pkt_pointers, find_equal_scalars to
use bpf_for_each_reg_in_vstate.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b239da34
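
The lambda-style use of GNU C statement expressions can be sketched as follows. This is a simplified illustration, not the kernel macro: `demo_for_each_reg`, `struct demo_reg`, `struct demo_vstate` and `release_by_id()` are hypothetical, and the sketch binds only a register pointer rather than both __state and __reg. It needs GCC or Clang, since statement expressions are a GNU extension.

```c
#define NFRAMES 2
#define NREGS   3

struct demo_reg {
	int ref_obj_id;
	int released;
};

struct demo_vstate {
	struct demo_reg frame[NFRAMES][NREGS];
	int curframe;
};

/*
 * Hypothetical miniature of the macro: reg is bound to each register in
 * frames 0..curframe, then the caller's expression runs as the loop body.
 */
#define demo_for_each_reg(vst, reg, expr)				\
	({								\
		for (int i_ = 0; i_ <= (vst)->curframe; i_++)		\
			for (int j_ = 0; j_ < NREGS; j_++) {		\
				reg = &(vst)->frame[i_][j_];		\
				(void)(expr);				\
			}						\
	})

/* Mark every register sharing the given id as released; return the count. */
static int release_by_id(struct demo_vstate *vst, int id)
{
	struct demo_reg *reg;
	int n = 0;

	demo_for_each_reg(vst, reg, ({
		if (reg->ref_obj_id == id) {
			reg->released = 1;
			n++;
		}
	}));
	return n;
}
```

Because the statement expression is substituted textually into the loop body, it can read and modify the caller's locals (here `id` and `n`) without a separate callback function or context struct.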

bpf: Add zero_map_value to zero map value with special fields · cc487558

Kumar Kartikeya Dwivedi authored Sep 04, 2022

We need this helper to skip over special fields (bpf_spin_lock,
bpf_timer, kptrs) while zeroing a map value. Use the same logic as
copy_map_value but memset instead of memcpy.

Currently, the code zeroing map value memory does not have to deal with
special fields, hence this is a prerequisite for introducing such
support.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cc487558
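
Zeroing a value while skipping special fields amounts to a memset over the gaps between them. The sketch below assumes the special fields are described by a sorted, non-overlapping offset/size list; `struct special_field` and `demo_zero_value()` are hypothetical names, not the kernel's data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of one special field inside a map value. */
struct special_field {
	size_t off;
	size_t size;
};

/*
 * Zero dst[0..len) except the listed special fields, which must be sorted
 * by offset and non-overlapping. A copy would do the same walk with memcpy.
 */
static void demo_zero_value(unsigned char *dst, size_t len,
			    const struct special_field *f, size_t nf)
{
	size_t cur = 0;

	for (size_t i = 0; i < nf; i++) {
		memset(dst + cur, 0, f[i].off - cur);	/* gap before field */
		cur = f[i].off + f[i].size;		/* skip the field */
	}
	memset(dst + cur, 0, len - cur);		/* tail after last field */
}
```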

bpf: Support kptrs in percpu arraymap · 6df4ea1f

Kumar Kartikeya Dwivedi authored Sep 04, 2022

Enable support for kptrs in percpu BPF arraymap by wiring up the freeing
of these kptrs from percpu map elements.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6df4ea1f

bpf: Add copy_map_value_long to copy to remote percpu memory · 44832519

Kumar Kartikeya Dwivedi authored Sep 04, 2022

bpf_long_memcpy is used while copying to remote percpu regions from BPF
syscall and helpers, so that the copy is atomic at word size
granularity.

This might not be possible when you copy from map value hosting kptrs
from or to percpu maps, as the alignment or size in disjoint regions may
not be multiple of word size.

Hence, to avoid complicating the copy loop, we only use bpf_long_memcpy
when special fields are not present, otherwise use normal memcpy to copy
the disjoint regions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

44832519

bpf: Fix resetting logic for unreferenced kptrs · 9fad7fe5

Jules Irenge authored Sep 07, 2022

Sparse reported a warning at bpf_map_free_kptrs()
"warning: Using plain integer as NULL pointer"
During the process of fixing this warning, it was discovered that the current
code erroneously writes to the pointer variable instead of dereferencing and
writing to the actual kptr. Hence, Sparse tool accidentally helped to uncover
this problem. Fix this by doing WRITE_ONCE(*p, 0) instead of WRITE_ONCE(p, 0).

Note that the effect of this bug is that unreferenced kptrs will not be cleared
during check_and_free_fields. It is not a problem if the clearing is not done
during map_free stage, as there is nothing to free for them.

Fixes: 14a324f6 ("bpf: Wire up freeing of referenced kptr")
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Link: https://lore.kernel.org/r/Yxi3pJaK6UDjVJSy@playground
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9fad7fe5
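
The difference between the two stores can be reproduced in userspace. In this sketch a plain assignment and a volatile store stand in for WRITE_ONCE(), which is an assumption made for illustration; the function names are hypothetical.

```c
#include <stddef.h>

/* Buggy shape: overwrites the local pointer, leaving the slot untouched. */
static void clear_slot_buggy(unsigned long *p)
{
	p = NULL;
	(void)p;
}

/* Fixed shape: stores through the pointer, clearing the slot itself. */
static void clear_slot_fixed(unsigned long *p)
{
	*(volatile unsigned long *)p = 0;
}
```

The buggy variant compiles cleanly in C because 0 is a valid null pointer constant, which is why it took a Sparse warning about a plain integer used as NULL to draw attention to it.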

Merge branch 'bpf-core changes for preparation of HID-bpf' · 53e80a39

Alexei Starovoitov authored Sep 07, 2022

Benjamin Tissoires says:

====================

Hi,

well, given that the HID changes haven't moved a lot in the past
revisions and that I am cc-ing a bunch of people, I have dropped them
while we focus on the last 2 requirements in bpf-core changes.

I'll submit a HID targeted series when we get these in tree, which
would make things a lot more independent.

For reference, the whole reasons for these 2 main changes are at
https://lore.kernel.org/bpf/20220902132938.2409206-1-benjamin.tissoires@redhat.com/

Compared to v10 (in addition to dropping the HID changes), I have
changed the selftests so we can test both light skeletons and libbpf
calls.

Cheers,
Benjamin
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

53e80a39

selftests/bpf: Add tests for kfunc returning a memory pointer · 22ed8d5a

Benjamin Tissoires authored Sep 06, 2022

We add 2 new kfuncs that are following the RET_PTR_TO_MEM
capability from the previous commit.
Then we test them in selftests:
the first tests exercise valid cases and do not fail,
while the later ones prevent the program from being loaded
because they are invalid.

To work around that, we mark the failing ones as not autoloaded
(with SEC("?tc")), and we manually enable them one by one, ensuring
the verifier rejects them.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

22ed8d5a

bpf/verifier: allow kfunc to return an allocated mem · eb1f7f71

Benjamin Tissoires authored Sep 06, 2022

For drivers (outside of network), the incoming data is not statically
defined in a struct. Most of the time the data buffer is kzalloc-ed
and thus we can not rely on eBPF and BTF to explore the data.

This commit allows a kfunc to return arbitrary memory previously
allocated by the driver.
An interesting extra point is that the kfunc can mark the exported
memory region as read only or read/write.

So, when a kfunc is not returning a pointer to a struct but to a plain
type, we can consider it is a valid allocated memory assuming that:
- one of the arguments is either called rdonly_buf_size or
  rdwr_buf_size
- and this argument is a const from the caller point of view

We can then use this parameter as the size of the allocated memory.

The memory is either read-only or read-write based on the name
of the size parameter.
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-7-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

eb1f7f71
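
The naming convention described above can be sketched as a small classifier. This assumes an exact match on the parameter name, which may differ from how the verifier actually compares names; `enum demo_mem_kind` and `classify_size_param()` are hypothetical.

```c
#include <string.h>

enum demo_mem_kind { DEMO_MEM_NONE, DEMO_MEM_RDONLY, DEMO_MEM_RDWR };

/* Hypothetical classifier keyed on the size parameter's name. */
static enum demo_mem_kind classify_size_param(const char *name)
{
	if (strcmp(name, "rdonly_buf_size") == 0)
		return DEMO_MEM_RDONLY;
	if (strcmp(name, "rdwr_buf_size") == 0)
		return DEMO_MEM_RDWR;
	return DEMO_MEM_NONE;	/* not a recognized size parameter */
}
```

Encoding the access mode in the parameter name keeps the kfunc signature itself as the single source of truth for both the size and the writability of the returned region.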

bpf/btf: bump BTF_KFUNC_SET_MAX_CNT · f9b34818

Benjamin Tissoires authored Sep 06, 2022

net/bpf/test_run.c is already presenting 20 kfuncs.
net/netfilter/nf_conntrack_bpf.c is also presenting an extra 10 kfuncs.

Given that all the kfuncs are regrouped into one unique set, having
only 2 slots left prevents us from adding more selftests.

Bump it to 256.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-6-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f9b34818

selftests/bpf: add test for accessing ctx from syscall program type · fb66223a

Benjamin Tissoires authored Sep 06, 2022

We need to also export the kfunc set to the syscall program type,
and then add a couple of eBPF programs that are testing those calls.

The first one checks for valid access, and the second one is OK
from a static analysis point of view but fails at run time because
we are trying to access outside of the allocated memory.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-5-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fb66223a

bpf/verifier: allow all functions to read user provided context · 15baa55f

Benjamin Tissoires authored Sep 06, 2022

When a function was trying to access data from context in a syscall eBPF
program, the verifier was rejecting the call unless it was accessing the
first element.
This is because the syscall context is not known at compile time, and
so we need to check this when actually accessing it.

Check for the valid memory access if there is no convert_ctx callback,
and allow such situation to happen.
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-4-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

15baa55f

bpf: split btf_check_subprog_arg_match in two · 95f2f26f

Benjamin Tissoires authored Sep 06, 2022

btf_check_subprog_arg_match() was used twice in verifier.c:
- when checking for the type mismatches between a (sub)prog declaration
  and BTF
- when checking the call of a subprog to see if the provided arguments
  are correct and valid

This is problematic when we check if the first argument of a program
(pointer to ctx) is correctly accessed:
To be able to ensure we access a valid memory in the ctx, the verifier
assumes the pointer to context is not null.
This has the side effect of marking the program accessing the entire
context, even if the context is never dereferenced.

For example, by checking the context access with the current code, the
following eBPF program would fail with -EINVAL if the ctx is set to null
from the userspace:

```
SEC("syscall")
int prog(struct my_ctx *args) {
  return 0;
}
```

In that particular case, we do not want to actually check that the memory
is correct while checking for the BTF validity, but we just want to
ensure that the (sub)prog definition matches the BTF we have.

So split btf_check_subprog_arg_match() in two so we can actually check
for the memory used when in a call, and ignore that part when not.

Note that a further patch is in preparation to disentangle
btf_check_func_arg_match() from these two purposes, and so right now we
just add a new hack around that by adding a boolean to this function.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-3-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

95f2f26f

selftests/bpf: regroup and declare similar kfuncs selftests in an array · 012ba115

Benjamin Tissoires authored Sep 06, 2022

Similar to tools/testing/selftests/bpf/prog_tests/dynptr.c:
we declare an array of tests that we run one by one in a for loop.

Followup patches will add more similar-ish tests, so avoid a lot of copy
paste by grouping the declaration in an array.

For light skeletons, we have to rely on the offsetof() macro so we can
statically declare which program we are using.
In the libbpf case, we can rely on bpf_object__find_program_by_name().
So also change the Makefile to generate both light skeletons and normal
ones.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220906151303.2780789-2-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

012ba115

Merge branch 'bpf: Support struct argument for trampoline base progs' · 028a9642

Alexei Starovoitov authored Sep 06, 2022

Yonghong Song says:

====================
Currently struct arguments are not supported for trampoline based progs.
One major reason is that a struct argument may be passed by value, which
may use more than one register. This breaks trampoline progs, where
each argument is assumed to take one register. The bcc community reported
the issue ([1]) where a struct argument is not supported for an fentry program.
  typedef struct {
        uid_t val;
  } kuid_t;
  typedef struct {
        gid_t val;
  } kgid_t;
  int security_path_chown(struct path *path, kuid_t uid, kgid_t gid);
Inside Meta, we also have a use case to attach to tcp_setsockopt()
  typedef struct {
        union {
                void            *kernel;
                void __user     *user;
        };
        bool            is_kernel : 1;
  } sockptr_t;
  int tcp_setsockopt(struct sock *sk, int level, int optname,
                     sockptr_t optval, unsigned int optlen);

This patch adds struct value support for bpf tracing programs which
use trampolines. Only struct sizes <= 16 bytes are supported for now,
which covers the use cases above. For x86/arm64/bpf, a struct value of
<= 16 bytes will be passed in registers instead of by reference.
Only x86_64 is supported in this patch. arm64 support can be
added later.

 [1] https://github.com/iovisor/bcc/issues/3657

Changelog:
  v3 -> v4:
   - fix a test failure where no casting for
     bpf_get_func_arg() value as the value type is 'int'.
   - add tracing_struct test in DENYLIST.s390x
   - simplify macro BPF_REG_CNT for BPF_PROG2.
  v2 -> v3:
   - previously struct arguments (<= 16 bytes) are passed
     by reference for bpf programs. Suggested by Alexei,
     it is passed by value now.
   - in order to support passing <= 16 struct value, a
     new macro BPF_PROG2 is invented.
  rfc v1 -> v2:
   - changed bpf_func_model struct info fields to
     arg_flags[] to make it easy to iterate arguments
     in arch specific {save|restore}_regs() functions.
   - added fexit tests to test return values with
     struct arguments.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

028a9642

selftests/bpf: Add tracing_struct test in DENYLIST.s390x · ae63c10f

Yonghong Song authored Aug 31, 2022

Add tracing_struct test in DENYLIST.s390x since s390x does not
support trampoline now.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152723.2081551-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ae63c10f

selftests/bpf: Use BPF_PROG2 for some fentry programs without struct arguments · a7c2ca3a

Yonghong Song authored Aug 31, 2022

Use BPF_PROG2 instead of BPF_PROG for programs in progs/timer.c
to test BPF_PROG2 for cases without struct arguments.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152718.2081091-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a7c2ca3a

selftests/bpf: Add struct argument tests with fentry/fexit programs. · 1642a394

Yonghong Song authored Aug 31, 2022

Add various struct argument tests with fentry/fexit programs.
Also add one test with a kernel func which does not have any
argument to test BPF_PROG2 macro in such situation.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152713.2080039-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1642a394

libbpf: Add new BPF_PROG2 macro · 34586d29

Yonghong Song authored Aug 31, 2022

To support struct arguments in trampoline based programs,
existing BPF_PROG doesn't work any more since
the type size is needed to find whether a parameter
takes one or two registers. So this patch added a new
BPF_PROG2 macro to support such trampoline programs.

The idea is suggested by Andrii. For example, if the
to-be-traced function has signature like
  typedef struct {
       void *x;
       int t;
  } sockptr;
  int blah(sockptr x, char y);

In the new BPF_PROG2 macro, the argument can be
represented as
  __bpf_prog_call(
     ({ union {
          struct { __u64 x, y; } ___z;
          sockptr x;
        } ___tmp = { .___z = { ctx[0], ctx[1] }};
        ___tmp.x;
     }),
     ({ union {
          struct { __u8 x; } ___z;
          char y;
        } ___tmp = { .___z = { ctx[2] }};
        ___tmp.y;
     }));
In the above, the values stored on the stack are properly
assigned to the actual argument type value by using 'union'
magic. Note that the macro also works even if no arguments
are with struct types.

Note that new BPF_PROG2 works for both llvm16 and pre-llvm16
compilers where llvm16 supports bpf target passing value
with struct up to 16 byte size and pre-llvm16 will pass
by reference by storing values on the stack. With static functions
with struct argument as always inline, the compiler is able
to optimize and remove additional stack saving of struct values.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152707.2079473-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

34586d29
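
The 'union' reconstruction from the message can be tried out in userspace. The sketch assumes a 64-bit little-endian target, where the struct from the example occupies 16 bytes; `sockptr_from_ctx()` is a hypothetical helper and not part of the BPF_PROG2 macro.

```c
#include <stdint.h>

typedef struct {
	void *x;
	int t;
} sockptr;

/*
 * Rebuild a by-value struct from two 64-bit register slots, following the
 * shape shown in the message. Assumes 64-bit little-endian layout.
 */
static sockptr sockptr_from_ctx(const uint64_t *ctx)
{
	union {
		struct { uint64_t a, b; } raw;
		sockptr val;
	} tmp = { .raw = { ctx[0], ctx[1] } };

	return tmp.val;
}
```

Reading a different union member than the one last written is permitted in C, which lets the raw slots be reinterpreted as the argument type without pointer casts.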