Commits · ac7ac432a67eb5410be32a3bef0fb393058af537 · Kirill Smelkov / linux

22 Jul, 2022 14 commits

Merge branch 'New nf_conntrack kfuncs for insertion, changing timeout, status' · ac7ac432

Alexei Starovoitov authored Jul 21, 2022

Kumar Kartikeya Dwivedi says:

====================

Introduce the following new kfuncs:
 - bpf_{xdp,skb}_ct_alloc
 - bpf_ct_insert_entry
 - bpf_ct_{set,change}_timeout
 - bpf_ct_{set,change}_status

The setting of timeout and status on allocated or inserted/looked up CT
is same as the ctnetlink interface, hence code is refactored and shared
with the kfuncs. It is ensured allocated CT cannot be passed to kfuncs
that expected inserted CT, and vice versa. Please see individual patches
for details.

Changelog:
----------
v6 -> v7:
v6: https://lore.kernel.org/bpf/20220719132430.19993-1-memxor@gmail.com

 * Use .long to encode flags (Alexei)
 * Fix description of KF_RET_NULL in documentation (Toke)

v5 -> v6:
v5: https://lore.kernel.org/bpf/20220623192637.3866852-1-memxor@gmail.com

 * Introduce kfunc flags, rework verifier to work with them
 * Add documentation for kfuncs
 * Add comment explaining TRUSTED_ARGS kfunc flag (Alexei)
 * Fix missing offset check for trusted arguments (Alexei)
 * Change nf_conntrack test minimum delta value to 8

v4 -> v5:
v4: https://lore.kernel.org/bpf/cover.1653600577.git.lorenzo@kernel.org

 * Drop read-only PTR_TO_BTF_ID approach, use struct nf_conn___init (Alexei)
 * Drop acquire release pair code that is no longer required (Alexei)
 * Disable writes into nf_conn, use dedicated helpers (Florian, Alexei)
 * Refactor and share ctnetlink code for setting timeout and status
 * Do strict type matching on finding __ref suffix on argument to
   prevent passing nf_conn___init as nf_conn (offset = 0, match on walk)
 * Remove bpf_ct_opts parameter from bpf_ct_insert_entry
 * Update selftests for new additions, add more negative tests

v3 -> v4:
v3: https://lore.kernel.org/bpf/cover.1652870182.git.lorenzo@kernel.org

 * split bpf_xdp_ct_add in bpf_xdp_ct_alloc/bpf_skb_ct_alloc and
   bpf_ct_insert_entry
 * add verifier code to properly populate/configure ct entry
 * improve selftests

v2 -> v3:
v2: https://lore.kernel.org/bpf/cover.1652372970.git.lorenzo@kernel.org

 * add bpf_xdp_ct_add and bpf_ct_refresh_timeout kfunc helpers
 * remove conntrack dependency from selftests
 * add support for forcing kfunc args to be referenced and related selftests

v1 -> v2:
v1: https://lore.kernel.org/bpf/1327f8f5696ff2bc60400e8f3b79047914ccc837.1651595019.git.lorenzo@kernel.org

 * add bpf_ct_refresh_timeout kfunc selftest

Kumar Kartikeya Dwivedi (10):
  bpf: Introduce 8-byte BTF set
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Switch to new kfunc flags infrastructure
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Add documentation for kfuncs
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  net: netfilter: Add kfuncs to set and change CT timeout
  selftests/bpf: Add verifier tests for trusted kfunc args
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ac7ac432

selftests/bpf: Fix test_verifier failed test in unprivileged mode · e3fa4735

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Loading the BTF won't be permitted without privileges, hence only test
for privileged mode by setting the prog type. This makes the
test_verifier show 0 failures when unprivileged BPF is enabled.

Fixes: 41188e9e ("selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-14-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e3fa4735

selftests/bpf: Add negative tests for new nf_conntrack kfuncs · c6f420ac

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Test cases we care about and ensure improper usage is caught and
rejected by the verifier.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-13-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c6f420ac

selftests/bpf: Add tests for new nf_conntrack kfuncs · 6eb7fba0

Lorenzo Bianconi authored Jul 21, 2022

Introduce selftests for the following kfunc helpers:
- bpf_xdp_ct_alloc
- bpf_skb_ct_alloc
- bpf_ct_insert_entry
- bpf_ct_set_timeout
- bpf_ct_change_timeout
- bpf_ct_set_status
- bpf_ct_change_status
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-12-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6eb7fba0

selftests/bpf: Add verifier tests for trusted kfunc args · 8dd5e756

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Make sure verifier rejects the bad cases and ensure the good case keeps
working. The selftests make use of the bpf_kfunc_call_test_ref kfunc
added in the previous patch only for verification.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-11-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8dd5e756

net: netfilter: Add kfuncs to set and change CT status · ef69aa3a

Lorenzo Bianconi authored Jul 21, 2022

Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ef69aa3a

net: netfilter: Add kfuncs to set and change CT timeout · 0b389236

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.

Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0b389236

net: netfilter: Add kfuncs to allocate and insert CT · d7e79c97

Lorenzo Bianconi authored Jul 21, 2022

Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.

We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.

The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d7e79c97

net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup · aed8ee7f

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Move common checks inside the common function, and maintain the only
difference the two being how to obtain the struct net * from ctx.
No functional change intended.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

aed8ee7f

bpf: Add documentation for kfuncs · 63e564eb

Kumar Kartikeya Dwivedi authored Jul 21, 2022

As the usage of kfuncs grows, we are starting to form consensus on the
kinds of attributes and annotations that kfuncs can have. To better help
developers make sense of the various options available at their disposal
to present an unstable API to the BPF users, document the various kfunc
flags and annotations, their expected usage, and explain the process of
defining and registering a kfunc set.

Cc: KP Singh <kpsingh@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-6-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

63e564eb

bpf: Add support for forcing kfunc args to be trusted · 56e948ff

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which
means each pointer argument must be trusted, which we define as a
pointer that is referenced (has non-zero ref_obj_id) and also needs to
have its offset unchanged, similar to how release functions expect their
argument. This allows a kfunc to receive pointer arguments unchanged
from the result of the acquire kfunc.

This is required to ensure that kfunc that operate on some object only
work on acquired pointers and not normal PTR_TO_BTF_ID with same type
which can be obtained by pointer walking. The restrictions applied to
release arguments also apply to trusted arguments. This implies that
strict type matching (not deducing type by recursively following members
at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are
used for trusted pointer arguments.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

56e948ff

bpf: Switch to new kfunc flags infrastructure · a4703e31

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Instead of populating multiple sets to indicate some attribute and then
researching the same BTF ID in them, prepare a single unified BTF set
which indicates whether a kfunc is allowed to be called, and also its
attributes if any at the same time. Now, only one call is needed to
perform the lookup for both kfunc availability and its attributes.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a4703e31

tools/resolve_btfids: Add support for 8-byte BTF sets · ef2c6f37

Kumar Kartikeya Dwivedi authored Jul 21, 2022

A flag is a 4-byte symbol that may follow a BTF ID in a set8. This is
used in the kernel to tag kfuncs in BTF sets with certain flags. Add
support to adjust the sorting code so that it passes size as 8 bytes
for 8-byte BTF sets.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220721134245.2450-3-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ef2c6f37

bpf: Introduce 8-byte BTF set · ab21d606

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Introduce support for defining flags for kfuncs using a new set of
macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains
8 byte elements (each of which consists of a pair of BTF ID and flags),
using a new BTF_ID_FLAGS macro.

This will be used to tag kfuncs registered for a certain program type
as acquire, release, sleepable, ret_null, etc. without having to create
more and more sets which was proving to be an unscalable solution.

Now, when looking up whether a kfunc is allowed for a certain program,
we can also obtain its kfunc flags in the same call and avoid further
lookups.

The resolve_btfids change is split into a separate patch.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-2-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ab21d606

21 Jul, 2022 5 commits

bpf, docs: Use SPDX license identifier in bpf_doc.py · 5cb62b75

Alejandro Colomar authored Jul 21, 2022

The Linux man-pages project now uses SPDX tags, instead of the full
license text.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://www.kernel.org/doc/man-pages/licenses.html
Link: https://spdx.org/licenses/Linux-man-pages-copyleft.html
Link: https://lore.kernel.org/bpf/20220721110821.8240-1-alx.manpages@gmail.com

5cb62b75

bpf, arm64: Fix compile error in dummy_tramp() · 339ed900

Xu Kuohai authored Jul 21, 2022

dummy_tramp() uses "lr" to refer to the x30 register, but some assembler
does not recognize "lr" and reports a build failure:

/tmp/cc52xO0c.s: Assembler messages:
/tmp/cc52xO0c.s:8: Error: operand 1 should be an integer register -- `mov lr,x9'
/tmp/cc52xO0c.s:7: Error: undefined symbol lr used as an immediate value
make[2]: *** [scripts/Makefile.build:250: arch/arm64/net/bpf_jit_comp.o] Error 1
make[1]: *** [scripts/Makefile.build:525: arch/arm64/net] Error 2

So replace "lr" with "x30" to fix it.

Fixes: b2ad54e1 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20220721121319.2999259-1-xukuohai@huaweicloud.com

339ed900

bpf: Check attach_func_proto more carefully in check_helper_call · aef9d4a3

Stanislav Fomichev authored Jul 20, 2022

Syzkaller found a problem similar to d1a6edec ("bpf: Check
attach_func_proto more carefully in check_return_code") where
attach_func_proto might be NULL:

RIP: 0010:check_helper_call+0x3dcb/0x8d50 kernel/bpf/verifier.c:7330
do_check kernel/bpf/verifier.c:12302 [inline]
do_check_common+0x6e1e/0xb980 kernel/bpf/verifier.c:14610
do_check_main kernel/bpf/verifier.c:14673 [inline]
bpf_check+0x661e/0xc520 kernel/bpf/verifier.c:15243
bpf_prog_load+0x11ae/0x1f80 kernel/bpf/syscall.c:2620

With the following reproducer:

bpf$BPF_PROG_RAW_TRACEPOINT_LOAD(0x5, &(0x7f0000000780)={0xf, 0x4, &(0x7f0000000040)=@framed={{}, [@call={0x85, 0x0, 0x0, 0xbb}]}, &(0x7f0000000000)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80)

Let's do the same here, only check attach_func_proto for the prog types
where we are certain that attach_func_proto is defined.

Fixes: 69fd337a ("bpf: per-cgroup lsm flavor")
Reported-by: syzbot+0f8d989b1fba1addc5e0@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720164729.147544-1-sdf@google.com

aef9d4a3

libbpf: Fix str_has_sfx()'s return value · 14229b81

Dan Carpenter authored Jul 19, 2022

The return from strcmp() is inverted so it wrongly returns true instead
of false and vice versa.

Fixes: a1c9d61b ("libbpf: Improve library identification for uprobe binary path resolution")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili

14229b81

libbpf: Fix sign expansion bug in btf_dump_get_enum_value() · c6018fc6

Dan Carpenter authored Jul 19, 2022

The code here is supposed to take a signed int and store it in a signed
long long. Unfortunately, the way that the type promotion works with
this conditional statement is that it takes a signed int, type promotes
it to a __u32, and then stores that as a signed long long. The result is
never negative.

This is from static analysis, but I made a little test program just to
test it before I sent the patch:

  #include <stdio.h>

  int main(void)
  {
        unsigned long long src = -1ULL;
        signed long long dst1, dst2;
        int is_signed = 1;

        dst1 = is_signed ? *(int *)&src : *(unsigned int *)0;
        dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0;

        printf("%lld\n", dst1);
        printf("%lld\n", dst2);

        return 0;
  }

Fixes: d90ec262 ("libbpf: Add enum64 support for btf_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili

c6018fc6

20 Jul, 2022 1 commit

bpf: Fix bpf_trampoline_{,un}link_cgroup_shim ifdef guards · 9cb61fda

Stanislav Fomichev authored Jul 20, 2022

They were updated in kernel/bpf/trampoline.c to fix another build
issue. We should to do the same for include/linux/bpf.h header.

Fixes: 3908fcdd ("bpf: fix lsm_cgroup build errors on esoteric configs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com

9cb61fda

19 Jul, 2022 20 commits

libbpf: fix an snprintf() overflow check · b77ffb30

Dan Carpenter authored Jul 19, 2022

The snprintf() function returns the number of bytes it *would* have
copied if there were enough space.  So it can return > the
sizeof(gen->attach_target).

Fixes: 67234743 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/YtZ+oAySqIhFl6/J@kiliSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b77ffb30

selftests/bpf: fix a test for snprintf() overflow · c5d22f4c

Dan Carpenter authored Jul 19, 2022

The snprintf() function returns the number of bytes which *would*
have been copied if there were space.  In other words, it can be
> sizeof(pin_path).

Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/YtZ+aD/tZMkgOUw+@kiliSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c5d22f4c

bpf, docs: document BPF_MAP_TYPE_HASH and variants · 979855d3

Donald Hunter authored Jul 18, 2022

Add documentation for BPF_MAP_TYPE_HASH including kernel version
introduced, usage and examples. Document BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_LRU_PERCPU_HASH variations.

Note that this file is included in the BPF documentation by the glob in
Documentation/bpf/maps.rst

v3:
Fix typos reported by Stanislav Fomichev and Yonghong Song.
Add note about iteration and deletion as requested by Yonghong Song.

v2:
Describe memory allocation semantics as suggested by Stanislav Fomichev.
Fix u64 typo reported by Stanislav Fomichev.
Cut down usage examples to only show usage in context.
Updated patch description to follow style recommendation, reported by
Bagas Sanjaya.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220718125847.1390-1-donald.hunter@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

979855d3

selftests/bpf: test eager BPF ringbuf size adjustment logic · e1346019

Andrii Nakryiko authored Jul 15, 2022

Add test validating that libbpf adjusts (and reflects adjusted) ringbuf
size early, before bpf_object is loaded. Also make sure we can't
successfully resize ringbuf map after bpf_object is loaded.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e1346019

libbpf: make RINGBUF map size adjustments more eagerly · 597fbc46

Andrii Nakryiko authored Jul 15, 2022

Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2
of page_size) more eagerly: during open phase when initializing the map
and on explicit calls to bpf_map__set_max_entries().

Such approach allows user to check actual size of BPF ringbuf even
before it's created in the kernel, but also it prevents various edge
case scenarios where BPF ringbuf size can get out of sync with what it
would be in kernel. One of them (reported in [0]) is during an attempt
to pin/reuse BPF ringbuf.

Move adjust_ringbuf_sz() helper closer to its first actual use. The
implementation of the helper is unchanged.

Also make detection of whether bpf_object is already loaded more robust
by checking obj->loaded explicitly, given that map->fd can be < 0 even
if bpf_object is already loaded due to ability to disable map creation
with bpf_map__set_autocreate(map, false).

  [0] Closes: https://github.com/libbpf/libbpf/pull/530

Fixes: 0087a681 ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

597fbc46

bpf: fix bpf_skb_pull_data documentation · bdb2bc75

Joanne Koong authored Jul 15, 2022

Fix documentation for bpf_skb_pull_data() helper for
when len == 0.

Fixes: fa15601a ("bpf: add documentation for eBPF helpers (33-41)")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220715193800.3940070-1-joannelkoong@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

bdb2bc75

libbpf: fallback to tracefs mount point if debugfs is not mounted · a1ac9fd6

Andrii Nakryiko authored Jul 15, 2022

Teach libbpf to fallback to tracefs mount point (/sys/kernel/tracing) if
debugfs (/sys/kernel/debug/tracing) isn't mounted.
Acked-by: Yonghong Song <yhs@fb.com>
Suggested-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a1ac9fd6

bpf: Don't redirect packets with invalid pkt_len · fd189422

Zhengchao Shao authored Jul 15, 2022

Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.

LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fd189422

Merge branch 'BPF array map fixes and improvements' · 92f61973

Alexei Starovoitov authored Jul 19, 2022

Andrii Nakryiko says:

====================

Fix 32-bit overflow in value pointer calculations in BPF array map. And then
raise obsolete limit on array map value size. Add selftest making sure this is
working as intended.

v1->v2:
  - fix broken patch #1 (no mask_index use in helper, as stated in commit
    message; and add missing semicolon).
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

92f61973

selftests/bpf: validate .bss section bigger than 8MB is possible now · 24316461

Andrii Nakryiko authored Jul 14, 2022

Add a simple big 16MB array and validate access to the very last byte of
it to make sure that kernel supports > KMALLOC_MAX_SIZE value_size for
BPF array maps (which are backing .bss in this case).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

24316461

bpf: remove obsolete KMALLOC_MAX_SIZE restriction on array map value size · 63b8ce77

Andrii Nakryiko authored Jul 14, 2022

Syscall-side map_lookup_elem() and map_update_elem() used to use
kmalloc() to allocate temporary buffers of value_size, so
KMALLOC_MAX_SIZE limit on value_size made sense to prevent creation of
array map that won't be accessible through syscall interface.

But this limitation since has been lifted by relying on kvmalloc() in
syscall handling code. So remove KMALLOC_MAX_SIZE, which among other
things means that it's possible to have BPF global variable sections
(.bss, .data, .rodata) bigger than 8MB now. Keep the sanity check to
prevent trivial overflows like round_up(map->value_size, 8) and restrict
value size to <= INT_MAX (2GB).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

63b8ce77

bpf: make uniform use of array->elem_size everywhere in arraymap.c · d937bc34

Andrii Nakryiko authored Jul 14, 2022

BPF_MAP_TYPE_ARRAY is rounding value_size to closest multiple of 8 and
stores that as array->elem_size for various memory allocations and
accesses.

But the code tends to re-calculate round_up(map->value_size, 8) in
multiple places instead of using array->elem_size. Cleaning this up and
making sure we always use array->size to avoid duplication of this
(admittedly simple) logic for consistency.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d937bc34

bpf: fix potential 32-bit overflow when accessing ARRAY map element · 87ac0d60

Andrii Nakryiko authored Jul 14, 2022

If BPF array map is bigger than 4GB, element pointer calculation can
overflow because both index and elem_size are u32. Fix this everywhere
by forcing 64-bit multiplication. Extract this formula into separate
small helper and use it consistently in various places.

Speculative-preventing formula utilizing index_mask trick is left as is,
but explicit u64 casts are added in both places.

Fixes: c85d6913 ("bpf: move memory size checks to bpf_map_charge_init()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

87ac0d60

docs/bpf: Update documentation for BTF_KIND_FUNC · e5e23424

Indu Bhagat authored Jul 14, 2022

The vlen bits in the BTF type of kind BTF_KIND_FUNC are used to convey the
linkage information for functions. The Linux kernel only supports
linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL at this time.
Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220714223310.1140097-1-indu.bhagat@oracle.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e5e23424

bpf: fix lsm_cgroup build errors on esoteric configs · 3908fcdd

Stanislav Fomichev authored Jul 14, 2022

This particular ones is about having the following:
 CONFIG_BPF_LSM=y
 # CONFIG_CGROUP_BPF is not set

Also, add __maybe_unused to the args for the !CONFIG_NET cases.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220714185404.3647772-1-sdf@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

3908fcdd

Merge branch 'Add SEC("ksyscall") support' · ab850abb

Alexei Starovoitov authored Jul 19, 2022

Andrii Nakryiko says:

====================

Add SEC("ksyscall")/SEC("kretsyscall") sections and corresponding
bpf_program__attach_ksyscall() API that simplifies tracing kernel syscalls
through kprobe mechanism. Kprobing syscalls isn't trivial due to varying
syscall handler names in the kernel and various ways syscall argument are
passed, depending on kernel architecture and configuration. SEC("ksyscall")
allows user to not care about such details and just get access to syscall
input arguments, while libbpf takes care of necessary feature detection logic.

There are still more quirks that are not straightforward to hide completely
(see comments about mmap(), clone() and compat syscalls), so in such more
advanced scenarios user might need to fall back to plain SEC("kprobe")
approach, but for absolute majority of users SEC("ksyscall") is a big
improvement.

As part of this patch set libbpf adds two more virtual __kconfig externs, in
addition to existing LINUX_KERNEL_VERSION: LINUX_HAS_BPF_COOKIE and
LINUX_HAS_SYSCALL_WRAPPER, which let's libbpf-provided BPF-side code minimize
external dependencies and assumptions and let's user-space part of libbpf to
perform all the feature detection logic. This benefits USDT support code,
which now doesn't depend on BPF CO-RE for its functionality.

v1->v2:
  - normalize extern variable-related warn and debug message formats (Alan);
rfc->v1:
  - drop dependency on kallsyms and speed up SYSCALL_WRAPPER detection (Alexei);
  - drop dependency on /proc/config.gz in bpf_tracing.h (Yaniv);
  - add doc comment and ephasize mmap(), clone() and compat quirks that are
    not supported (Ilya);
  - use mechanism similar to LINUX_KERNEL_VERSION to also improve USDT code.
====================
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ab850abb

selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests · d814ed62

Andrii Nakryiko authored Jul 14, 2022

Convert few selftest that used plain SEC("kprobe") with arch-specific
syscall wrapper prefix to ksyscall/kretsyscall and corresponding
BPF_KSYSCALL macro. test_probe_user.c is especially benefiting from this
simplification.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-6-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d814ed62

libbpf: add ksyscall/kretsyscall sections support for syscall kprobes · 708ac5be

Andrii Nakryiko authored Jul 14, 2022

Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
kretsyscall variants (for return kprobes) to allow users to kprobe
syscall functions in kernel. These special sections allow to ignore
complexities and differences between kernel versions and host
architectures when it comes to syscall wrapper and corresponding
__<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf
itself doesn't rely on /proc/config.gz for detecting this, see
BPF_KSYSCALL patch for how it's done internally).

Combined with the use of BPF_KSYSCALL() macro, this allows to just
specify intended syscall name and expected input arguments and leave
dealing with all the variations to libbpf.

In addition to SEC("ksyscall+") and SEC("kretsyscall+") add
bpf_program__attach_ksyscall() API which allows to specify syscall name
at runtime and provide associated BPF cookie value.

At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not
handle all the calling convention quirks for mmap(), clone() and compat
syscalls. It also only attaches to "native" syscall interfaces. If host
system supports compat syscalls or defines 32-bit syscalls in 64-bit
kernel, such syscall interfaces won't be attached to by libbpf.

These limitations may or may not change in the future. Therefore it is
recommended to use SEC("kprobe") for these syscalls or if working with
compat and 32-bit interfaces is required.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

708ac5be

libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL · 6f5d467d

Andrii Nakryiko authored Jul 14, 2022

Improve BPF_KPROBE_SYSCALL (and rename it to shorter BPF_KSYSCALL to
match libbpf's SEC("ksyscall") section name, added in next patch) to use
__kconfig variable to determine how to properly fetch syscall arguments.

Instead of relying on hard-coded knowledge of whether kernel's
architecture uses syscall wrapper or not (which only reflects the latest
kernel versions, but is not necessarily true for older kernels and won't
necessarily hold for later kernel versions on some particular host
architecture), determine this at runtime by attempting to create
perf_event (with fallback to kprobe event creation through tracefs on
legacy kernels, just like kprobe attachment code is doing) for kernel
function that would correspond to bpf() syscall on a system that has
CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would try
'__x64_sys_bpf').

If host kernel uses syscall wrapper, syscall kernel function's first
argument is a pointer to struct pt_regs that then contains syscall
arguments. In such case we need to use bpf_probe_read_kernel() to fetch
actual arguments (which we do through BPF_CORE_READ() macro) from inner
pt_regs.

But if the kernel doesn't use syscall wrapper approach, input
arguments can be read from struct pt_regs directly with no probe reading.

All this feature detection is done without requiring /proc/config.gz
existence and parsing, and BPF-side helper code uses newly added
LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to keep in sync with
user-side feature detection of libbpf.

BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs that
define syscall function explicitly (e.g., SEC("kprobe/__x64_sys_bpf"))
and SEC("ksyscall") program added in the next patch (which are the same
kprobe program with added benefit of libbpf determining correct kernel
function name automatically).

Kretprobe and kretsyscall (added in next patch) programs don't need
BPF_KSYSCALL as they don't provide access to input arguments. Normal
BPF_KRETPROBE is completely sufficient and is recommended.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6f5d467d

selftests/bpf: add test of __weak unknown virtual __kconfig extern · ce6dc74a

Andrii Nakryiko authored Jul 14, 2022

Exercise libbpf's logic for unknown __weak virtual __kconfig externs.
USDT selftests are already excercising non-weak known virtual extern
already (LINUX_HAS_BPF_COOKIE), so no need to add explicit tests for it.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ce6dc74a