Commits · af66bfd3c8538ed21cf72af18426fc4a408665cf · Kirill Smelkov / linux

An error occurred fetching the project authors.

05 Dec, 2023 5 commits

bpf: Optimize the free of inner map · af66bfd3

Hou Tao authored 1 year ago

When removing the inner map from the outer map, the inner map will be
freed after one RCU grace period and one RCU tasks trace grace
period, so it is certain that the bpf program, which may access the
inner map, has exited before the inner map is freed.

However there is no need to wait for one RCU tasks trace grace period if
the outer map is only accessed by non-sleepable program. So adding
sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding
the outer map into env->used_maps for sleepable program. Although the
max number of bpf program is INT_MAX - 1, the number of bpf programs
which are being loaded may be greater than INT_MAX, so using atomic64_t
instead of atomic_t for sleepable_refcnt. When removing the inner map
from the outer map, using sleepable_refcnt to decide whether or not a
RCU tasks trace grace period is needed before freeing the inner map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

af66bfd3

bpf: Defer the free of inner map when necessary · 87667336

Hou Tao authored 1 year ago

When updating or deleting an inner map in map array or map htab, the map
may still be accessed by non-sleepable program or sleepable program.
However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
directly through bpf_map_put(), if the ref-counter is the last one
(which is true for most cases), the inner map will be freed by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so after the invocation of ops->map_free completes,
the bpf program which is accessing the inner map may incur
use-after-free problem.

Fix the free of inner map by invoking bpf_map_free_deferred() after both
one RCU grace period and one tasks trace RCU grace period if the inner
map has been removed from the outer map before. The deferment is
accomplished by using call_rcu() or call_rcu_tasks_trace() when
releasing the last ref-counter of bpf map. The newly-added rcu_head
field in bpf_map shares the same storage space with work field to
reduce the size of bpf_map.

Fixes: bba1dc0b ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b82 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

87667336

bpf: Set need_defer as false when clearing fd array during map free · 79d93b3c

Hou Tao authored 1 year ago

Both map deletion operation, map release and map free operation use
fd_array_map_delete_elem() to remove the element from fd array and
need_defer is always true in fd_array_map_delete_elem(). For the map
deletion operation and map release operation, need_defer=true is
necessary, because the bpf program, which accesses the element in fd
array, may still alive. However for map free operation, it is certain
that the bpf program which owns the fd array has already been exited, so
setting need_defer as false is appropriate for map free operation.

So fix it by adding need_defer parameter to bpf_fd_array_map_clear() and
adding a new helper __fd_array_map_delete_elem() to handle the map
deletion, map release and map free operations correspondingly.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-4-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

79d93b3c

bpf: Add map and need_defer parameters to .map_fd_put_ptr() · 20c20bd1

Hou Tao authored 1 year ago

map is the pointer of outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.

The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:

1) release the reference of the old element in the map during map update
or map deletion. The release must be deferred, otherwise the bpf
program may incur use-after-free problem, so need_defer needs to be
true.
2) release the reference of the to-be-added element in the error path of
map update. The to-be-added element is not visible to any bpf
program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
Any bpf program which has access to the map must have been exited and
released, so need_defer=false will be OK.

These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

20c20bd1

bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers · 169410eb

Hou Tao authored 1 year ago

These three bpf_map_{lookup,update,delete}_elem() helpers are also
available for sleepable bpf program, so add the corresponding lock
assertion for sleepable bpf program, otherwise the following warning
will be reported when a sleepable bpf program manipulates bpf map under
interpreter mode (aka bpf_jit_enable=0):

  WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ......
  CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:bpf_map_lookup_elem+0x54/0x60
  ......
  Call Trace:
   <TASK>
   ? __warn+0xa5/0x240
   ? bpf_map_lookup_elem+0x54/0x60
   ? report_bug+0x1ba/0x1f0
   ? handle_bug+0x40/0x80
   ? exc_invalid_op+0x18/0x50
   ? asm_exc_invalid_op+0x1b/0x20
   ? __pfx_bpf_map_lookup_elem+0x10/0x10
   ? rcu_lockdep_current_cpu_online+0x65/0xb0
   ? rcu_is_watching+0x23/0x50
   ? bpf_map_lookup_elem+0x54/0x60
   ? __pfx_bpf_map_lookup_elem+0x10/0x10
   ___bpf_prog_run+0x513/0x3b70
   __bpf_prog_run32+0x9d/0xd0
   ? __bpf_prog_enter_sleepable_recur+0xad/0x120
   ? __bpf_prog_enter_sleepable_recur+0x3e/0x120
   bpf_trampoline_6442580665+0x4d/0x1000
   __x64_sys_getpgid+0x5/0x30
   ? do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-2-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

169410eb

04 Dec, 2023 2 commits

selftests/bpf: Fix spelling mistake "get_signaure_size" -> "get_signature_size" · 153de60e

Colin Ian King authored 1 year ago

There is a spelling mistake in an ASSERT_GT message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231204093940.2611954-1-colin.i.king@gmail.com

153de60e

bpf: Minor logging improvement · 5bd90cdc

Andrei Matei authored 1 year ago

One place where we were logging a register was only logging the variable
part, not also the fixed part.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231204011248.2040084-1-andreimatei1@gmail.com

5bd90cdc

02 Dec, 2023 19 commits

Merge branch 'bpf-verifier-retval-logic-fixes' · 90679706

Alexei Starovoitov authored 1 year ago

Andrii Nakryiko says:

====================
BPF verifier retval logic fixes

This patch set fixes BPF verifier logic around validating and enforcing return
values for BPF programs that have specific range of expected return values.
Both sync and async callbacks have similar logic and are fixes as well.
A few tests are added that would fail without the fixes in this patch set.

Also, while at it, we update retval checking logic to use smin/smax range
instead of tnum, avoiding future potential issues if expected range cannot be
represented precisely by tnum (e.g., [0, 2] is not representable by tnum and
is treated as [0, 3]).

There is a little bit of refactoring to unify async callback and program exit
logic to avoid duplication of checks as much as possible.

v4->v5:
  - fix timer_bad_ret test on no-alu32 flavor (CI);
v3->v4:
  - add back bpf_func_state rearrangement patch;
  - simplified patch #4 as suggested (Shung-Hsi);
v2->v3:
  - more carefullly switch from umin/umax to smin/smax;
v1->v2:
  - drop tnum from retval checks (Eduard);
  - use smin/smax instead of umin/umax (Alexei).
====================

Link: https://lore.kernel.org/r/20231202175705.885270-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

90679706

bpf: simplify tnum output if a fully known constant · 81eff2e3

Andrii Nakryiko authored 1 year ago

Emit tnum representation as just a constant if all bits are known.
Use decimal-vs-hex logic to determine exact format of emitted
constant value, just like it's done for register range values.
For that move tnum_strn() to kernel/bpf/log.c to reuse decimal-vs-hex
determination logic and constants.
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-12-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

81eff2e3

selftests/bpf: adjust global_func15 test to validate prog exit precision · 5c19e1d0

Andrii Nakryiko authored 1 year ago

Add one more subtest to global_func15 selftest to validate that
verifier properly marks r0 as precise and avoids erroneous state pruning
of the branch that has return value outside of expected [0, 1] value.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-11-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5c19e1d0

selftests/bpf: validate async callback return value check correctness · e02dea15

Andrii Nakryiko authored 1 year ago

Adjust timer/timer_ret_1 test to validate more carefully verifier logic
of enforcing async callback return value. This test will pass only if
return result is marked precise and read.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-10-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e02dea15

bpf: enforce precision of R0 on program/async callback return · eabe518d

Andrii Nakryiko authored 1 year ago

Given we enforce a valid range for program and async callback return
value, we must mark R0 as precise to avoid incorrect state pruning.

Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-9-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

eabe518d

bpf: unify async callback and program retval checks · 0ef24c8d

Andrii Nakryiko authored 1 year ago

Use common logic to verify program return values and async callback
return values. This allows to avoid duplication of any extra steps
necessary, like precision marking, which will be added in the next
patch.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-8-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0ef24c8d

bpf: enforce precise retval range on program exit · c871d0e0

Andrii Nakryiko authored 1 year ago

Similarly to subprog/callback logic, enforce return value of BPF program
using more precise smin/smax range.

We need to adjust a bunch of tests due to a changed format of an error
message.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-7-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c871d0e0

selftests/bpf: add selftest validating callback result is enforced · 60a6b2c7

Andrii Nakryiko authored 1 year ago

BPF verifier expects callback subprogs to return values from specified
range (typically [0, 1]). This requires that r0 at exit is both precise
(because we rely on specific value range) and is marked as read
(otherwise state comparison will ignore such register as unimportant).

Add a simple test that validates that all these conditions are enforced.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-6-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

60a6b2c7

bpf: enforce exact retval range on subprog/callback exit · 8fa4ecd4

Andrii Nakryiko authored 1 year ago

Instead of relying on potentially imprecise tnum representation of
expected return value range for callbacks and subprogs, validate that
smin/smax range satisfy exact expected range of return values.

E.g., if callback would need to return [0, 2] range, tnum can't
represent this precisely and instead will allow [0, 3] range. By
checking smin/smax range, we can make sure that subprog/callback indeed
returns only valid [0, 2] range.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8fa4ecd4

bpf: enforce precision of R0 on callback return · 0acd03a5

Andrii Nakryiko authored 1 year ago

Given verifier checks actual value, r0 has to be precise, so we need to
propagate precision properly. r0 also has to be marked as read,
otherwise subsequent state comparisons will ignore such register as
unimportant and precision won't really help here.

Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0acd03a5

bpf: provide correct register name for exception callback retval check · 5fad52be

Andrii Nakryiko authored 1 year ago

bpf_throw() is checking R1, so let's report R1 in the log.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5fad52be

bpf: rearrange bpf_func_state fields to save a bit of memory · 45b5623f

Andrii Nakryiko authored 1 year ago

It's a trivial rearrangement saving 8 bytes. We have 4 bytes of padding
at the end which can be filled with another field without increasing
struct bpf_func_state.

copy_func_state() logic remains correct without any further changes.

BEFORE
======
struct bpf_func_state {
        struct bpf_reg_state       regs[11];             /*     0  1320 */
        /* --- cacheline 20 boundary (1280 bytes) was 40 bytes ago --- */
        int                        callsite;             /*  1320     4 */
        u32                        frameno;              /*  1324     4 */
        u32                        subprogno;            /*  1328     4 */
        u32                        async_entry_cnt;      /*  1332     4 */
        bool                       in_callback_fn;       /*  1336     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 21 boundary (1344 bytes) --- */
        struct tnum                callback_ret_range;   /*  1344    16 */
        bool                       in_async_callback_fn; /*  1360     1 */
        bool                       in_exception_callback_fn; /*  1361     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        acquired_refs;        /*  1364     4 */
        struct bpf_reference_state * refs;               /*  1368     8 */
        int                        allocated_stack;      /*  1376     4 */

        /* XXX 4 bytes hole, try to pack */

        struct bpf_stack_state *   stack;                /*  1384     8 */

        /* size: 1392, cachelines: 22, members: 13 */
        /* sum members: 1379, holes: 3, sum holes: 13 */
        /* last cacheline: 48 bytes */
};

AFTER
=====
struct bpf_func_state {
        struct bpf_reg_state       regs[11];             /*     0  1320 */
        /* --- cacheline 20 boundary (1280 bytes) was 40 bytes ago --- */
        int                        callsite;             /*  1320     4 */
        u32                        frameno;              /*  1324     4 */
        u32                        subprogno;            /*  1328     4 */
        u32                        async_entry_cnt;      /*  1332     4 */
        struct tnum                callback_ret_range;   /*  1336    16 */
        /* --- cacheline 21 boundary (1344 bytes) was 8 bytes ago --- */
        bool                       in_callback_fn;       /*  1352     1 */
        bool                       in_async_callback_fn; /*  1353     1 */
        bool                       in_exception_callback_fn; /*  1354     1 */

        /* XXX 1 byte hole, try to pack */

        int                        acquired_refs;        /*  1356     4 */
        struct bpf_reference_state * refs;               /*  1360     8 */
        struct bpf_stack_state *   stack;                /*  1368     8 */
        int                        allocated_stack;      /*  1376     4 */

        /* size: 1384, cachelines: 22, members: 13 */
        /* sum members: 1379, holes: 1, sum holes: 1 */
        /* padding: 4 */
        /* last cacheline: 40 bytes */
};
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

45b5623f

Merge branch 'bpf-file-verification-with-lsm-and-fsverity' · 6685aadc

Alexei Starovoitov authored 1 year ago

Song Liu says:

====================
bpf: File verification with LSM and fsverity

Changes v14 => v15:
1. Fix selftest build without CONFIG_FS_VERITY. (Alexei)
2. Add Acked-by from KP.

Changes v13 => v14:
1. Add "static" for bpf_fs_kfunc_set.
2. Add Acked-by from Christian Brauner.

Changes v12 => v13:
1. Only keep 4/9 through 9/9 of v12, as the first 3 patches already
   applied;
2. Use new macro __bpf_kfunc_[start|end]_defs().

Changes v11 => v12:
1. Fix typo (data_ptr => sig_ptr) in bpf_get_file_xattr().

Changes v10 => v11:
1. Let __bpf_dynptr_data() return const void *. (Andrii)
2. Optimize code to reuse output from __bpf_dynptr_size(). (Andrii)
3. Add __diag_ignore_all("-Wmissing-declarations") for kfunc definition.
4. Fix an off indentation. (Andrii)

Changes v9 => v10:
1. Remove WARN_ON_ONCE() from check_reg_const_str. (Alexei)

Changes v8 => v9:
1. Fix test_progs kfunc_dynptr_param/dynptr_data_null.

Changes v7 => v8:
1. Do not use bpf_dynptr_slice* in the kernel. Add __bpf_dynptr_data* and
   use them in ther kernel. (Andrii)

Changes v6 => v7:
1. Change "__const_str" annotation to "__str". (Alexei, Andrii)
2. Add KF_TRUSTED_ARGS flag for both new kfuncs. (KP)
3. Only allow bpf_get_file_xattr() to read xattr with "user." prefix.
4. Add Acked-by from Eric Biggers.

Changes v5 => v6:
1. Let fsverity_init_bpf() return void. (Eric Biggers)
2. Sort things in alphabetic orders. (Eric Biggers)

Changes v4 => v5:
1. Revise commit logs. (Alexei)

Changes v3 => v4:
1. Fix error reported by CI.
2. Update comments of bpf_dynptr_slice* that they may return error pointer.

Changes v2 => v3:
1. Rebase and resolve conflicts.

Changes v1 => v2:
1. Let bpf_get_file_xattr() use const string for arg "name". (Alexei)
2. Add recursion prevention with allowlist. (Alexei)
3. Let bpf_get_file_xattr() use __vfs_getxattr() to avoid recursion,
   as vfs_getxattr() calls into other LSM hooks.
4. Do not use dynptr->data directly, use helper insteadd. (Andrii)
5. Fixes with bpf_get_fsverity_digest. (Eric Biggers)
6. Add documentation. (Eric Biggers)
7. Fix some compile warnings. (kernel test robot)

This set enables file verification with BPF LSM and fsverity.

In this solution, fsverity is used to provide reliable and efficient hash
of files; and BPF LSM is used to implement signature verification (against
asymmetric keys), and to enforce access control.

This solution can be used to implement access control in complicated cases.
For example: only signed python binary and signed python script and access
special files/devices/ports.

Thanks,
Song
====================

Link: https://lore.kernel.org/r/20231129234417.856536-1-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6685aadc

selftests/bpf: Add test that uses fsverity and xattr to sign a file · 1030e915

Song Liu authored 1 year ago

This selftests shows a proof of concept method to use BPF LSM to enforce
file signature. This test is added to verify_pkcs7_sig, so that some
existing logic can be reused.

This file signature method uses fsverity, which provides reliable and
efficient hash (known as digest) of the file. The file digest is signed
with asymmetic key, and the signature is stored in xattr. At the run time,
BPF LSM reads file digest and the signature, and then checks them against
the public key.

Note that this solution does NOT require FS_VERITY_BUILTIN_SIGNATURES.
fsverity is only used to provide file digest. The signature verification
and access control is all implemented in BPF LSM.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-7-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

1030e915

selftests/bpf: Add tests for filesystem kfuncs · 341f06fd

Song Liu authored 1 year ago

Add selftests for two new filesystem kfuncs:
  1. bpf_get_file_xattr
  2. bpf_get_fsverity_digest

These tests simply make sure the two kfuncs work. Another selftest will be
added to demonstrate how to use these kfuncs to verify file signature.

CONFIG_FS_VERITY is added to selftests config. However, this is not
sufficient to guarantee bpf_get_fsverity_digest works. This is because
fsverity need to be enabled at file system level (for example, with tune2fs
on ext4). If local file system doesn't have this feature enabled, just skip
the test.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-6-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

341f06fd

selftests/bpf: Sort config in alphabetic order · 6b0ae456

Song Liu authored 1 year ago

Move CONFIG_VSOCKETS up, so the CONFIGs are in alphabetic order.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-5-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6b0ae456

Documentation/bpf: Add documentation for filesystem kfuncs · 0de267d9

Song Liu authored 1 year ago

Add a brief introduction for file system kfuncs:

  bpf_get_file_xattr()
  bpf_get_fsverity_digest()

The documentation highlights the strategy to avoid recursions of these
kfuncs.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-4-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0de267d9

bpf, fsverity: Add kfunc bpf_get_fsverity_digest · 67814c00

Song Liu authored 1 year ago

fsverity provides fast and reliable hash of files, namely fsverity_digest.
The digest can be used by security solutions to verify file contents.

Add new kfunc bpf_get_fsverity_digest() so that we can access fsverity from
BPF LSM programs. This kfunc is added to fs/verity/measure.c because some
data structure used in the function is private to fsverity
(fs/verity/fsverity_private.h).

To avoid recursion, bpf_get_fsverity_digest is only allowed in BPF LSM
programs.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20231129234417.856536-3-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

67814c00

bpf: Add kfunc bpf_get_file_xattr · ac9c05e0

Song Liu authored 1 year ago

It is common practice for security solutions to store tags/labels in
xattrs. To implement similar functionalities in BPF LSM, add new kfunc
bpf_get_file_xattr().

The first use case of bpf_get_file_xattr() is to implement file
verifications with asymmetric keys. Specificially, security applications
could use fsverity for file hashes and use xattr to store file signatures.
(kfunc for fsverity hash will be added in a separate commit.)

Currently, only xattrs with "user." prefix can be read with kfunc
bpf_get_file_xattr(). As use cases evolve, we may add a dedicated prefix
for bpf_get_file_xattr().

To avoid recursion, bpf_get_file_xattr can be only called from LSM hooks.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-2-song@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ac9c05e0

01 Dec, 2023 14 commits

selftests/bpf: Fix erroneous bitmask operation · b6a3451e

Jeroen van Ingen Schenau authored 1 year ago

xdp_synproxy_kern.c is a BPF program that generates SYN cookies on
allowed TCP ports and sends SYNACKs to clients, accelerating synproxy
iptables module.

Fix the bitmask operation when checking the status of an existing
conntrack entry within tcp_lookup() function. Do not AND with the bit
position number, but with the bitmask value to check whether the entry
found has the IPS_CONFIRMED flag set.

Fixes: fb5cd0ce ("selftests/bpf: Add selftests for raw syncookie helpers")
Signed-off-by: Jeroen van Ingen Schenau <jeroen.vaningenschenau@novoserve.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Minh Le Hoang <minh.lehoang@novoserve.com>
Link: https://lore.kernel.org/xdp-newbies/CAAi1gX7owA+Tcxq-titC-h-KPM7Ri-6ZhTNMhrnPq5gmYYwKow@mail.gmail.com/T/#u
Link: https://lore.kernel.org/bpf/20231130120353.3084-1-jeroen.vaningenschenau@novoserve.com

b6a3451e

octeon_ep: set backpressure watermark for RX queues · 15bc8121

Shinas Rasheed authored 1 year ago

Set backpressure watermark for hardware RX queues. Backpressure
gets triggered when the available buffers of a hardware RX queue
falls below the set watermark. This backpressure will propagate
to packet processing pipeline in the OCTEON card, so that the host
receives fewer packets and prevents packet dropping at host.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15bc8121

octeon_ep: Fix error code in probe() · 0cd523ee

Dan Carpenter authored 1 year ago

Set the error code if octep_ctrl_net_get_mtu() fails.  Currently the code
returns success.

Fixes: 0a5f8534 ("octeon_ep: get max rx packet length from firmware")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Sathesh B Edara <sedara@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cd523ee

Merge branch 'selftests-tc-testing-more-tdc-updates' · 86b88965

Jakub Kicinski authored 1 year ago

Pedro Tammela says:

====================
selftests: tc-testing: more tdc updates

Follow-up on a feedback from Jakub and random cleanups from related
net/sched patches
====================

Link: https://lore.kernel.org/r/20231129222424.910148-1-pctammela@mojatatu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

86b88965

selftests: tc-testing: remove filters/tests.json · 0fbb5a54

Pedro Tammela authored 1 year ago

Remove this generic file and move the tests to their appropriate files
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-5-pctammela@mojatatu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0fbb5a54

selftests: tc-testing: rename concurrency.json to flower.json · 7de8b2ef

Pedro Tammela authored 1 year ago

All tests in this file pertain to flower, so name it appropriately
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-4-pctammela@mojatatu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7de8b2ef

selftests: tc-testing: remove spurious './' from Makefile · 74f7e7ee

Pedro Tammela authored 1 year ago

Patchwork CI didn't like the extra './', so remove it.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-3-pctammela@mojatatu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

74f7e7ee

selftests: tc-testing: remove spurious nsPlugin usage · f7580f00

Pedro Tammela authored 1 year ago

Tests using DEV2 should not be run in a dedicated net namespace,
and in parallel, as this device cannot be shared.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-2-pctammela@mojatatu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f7580f00

docs: netlink: link to family documentations from spec info · e8c780a5

Jakub Kicinski authored 1 year ago

To increase the chances of people finding the rendered docs
add a link to specs.rst and index.rst.

Add a label in the generated index.rst and while at it adjust
the title a little bit.
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231129041427.2763074-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e8c780a5

Merge branch 'support-octeon-cn98-devices' · 981239ee

Jakub Kicinski authored 1 year ago

Shinas Rasheed says:

====================
support OCTEON CN98 devices

Implement device unload control net API required for CN98
devices and add support in driver for the same.

V1: https://lore.kernel.org/all/20231127162135.2529363-1-srasheed@marvell.com/
====================

Link: https://lore.kernel.org/r/20231129045348.2538843-1-srasheed@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

981239ee

octeon_ep: support OCTEON CN98 devices · 068b2b64

Shinas Rasheed authored 1 year ago

Add PCI Endpoint NIC support for Octeon CN98 devices.
CN98 devices are part of Octeon 9 family products with
similar PCI NIC characteristics to CN93, already supported
driver.

Add CN98 card to the device id table, as well
as support differences in the register fields and
certain usage scenarios such as unload.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231129045348.2538843-3-srasheed@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

068b2b64

octeon_ep: implement device unload control net API · b77e23f1

Shinas Rasheed authored 1 year ago

Device unload control net function should inform firmware
of driver unload to let it take necessary actions to cleanup.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231129045348.2538843-2-srasheed@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b77e23f1

net/sched: cbs: Use units.h instead of the copy of a definition · 000db9e9

Andy Shevchenko authored 1 year ago

BYTES_PER_KBIT is defined in units.h, use that definition.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231128174813.394462-1-andriy.shevchenko@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

000db9e9

net: phy: mdio_device: Reset device only when necessary · df16c1c5

Andrew Halaney authored 1 year ago

Currently the phy reset sequence is as shown below for a
devicetree described mdio phy on boot:

1. Assert the phy_device's reset as part of registering
2. Deassert the phy_device's reset as part of registering
3. Deassert the phy_device's reset as part of phy_probe
4. Deassert the phy_device's reset as part of phy_hw_init

The extra two deasserts include waiting the deassert delay afterwards,
which is adding unnecessary delay.

This applies to both possible types of resets (reset controller
reference and a reset gpio) that can be used.

Here's some snipped tracing output using the following command line
params "trace_event=gpio:* trace_options=stacktrace" illustrating
the reset handling and where its coming from:

    /* Assert */
       systemd-udevd-283     [002] .....     6.780434: gpio_value: 544 set 0
       systemd-udevd-283     [002] .....     6.783849: <stack trace>
     => gpiod_set_raw_value_commit
     => gpiod_set_value_nocheck
     => gpiod_set_value_cansleep
     => mdio_device_reset
     => mdiobus_register_device
     => phy_device_register
     => fwnode_mdiobus_phy_device_register
     => fwnode_mdiobus_register_phy
     => __of_mdiobus_register
     => stmmac_mdio_register
     => stmmac_dvr_probe
     => stmmac_pltfr_probe
     => devm_stmmac_pltfr_probe
     => qcom_ethqos_probe
     => platform_probe

    /* Deassert */
       systemd-udevd-283     [002] .....     6.802480: gpio_value: 544 set 1
       systemd-udevd-283     [002] .....     6.805886: <stack trace>
     => gpiod_set_raw_value_commit
     => gpiod_set_value_nocheck
     => gpiod_set_value_cansleep
     => mdio_device_reset
     => phy_device_register
     => fwnode_mdiobus_phy_device_register
     => fwnode_mdiobus_register_phy
     => __of_mdiobus_register
     => stmmac_mdio_register
     => stmmac_dvr_probe
     => stmmac_pltfr_probe
     => devm_stmmac_pltfr_probe
     => qcom_ethqos_probe
     => platform_probe

    /* Deassert */
       systemd-udevd-283     [002] .....     6.882601: gpio_value: 544 set 1
       systemd-udevd-283     [002] .....     6.886014: <stack trace>
     => gpiod_set_raw_value_commit
     => gpiod_set_value_nocheck
     => gpiod_set_value_cansleep
     => mdio_device_reset
     => phy_probe
     => really_probe
     => __driver_probe_device
     => driver_probe_device
     => __device_attach_driver
     => bus_for_each_drv
     => __device_attach
     => device_initial_probe
     => bus_probe_device
     => device_add
     => phy_device_register
     => fwnode_mdiobus_phy_device_register
     => fwnode_mdiobus_register_phy
     => __of_mdiobus_register
     => stmmac_mdio_register
     => stmmac_dvr_probe
     => stmmac_pltfr_probe
     => devm_stmmac_pltfr_probe
     => qcom_ethqos_probe
     => platform_probe

    /* Deassert */
      NetworkManager-477     [000] .....     7.023144: gpio_value: 544 set 1
      NetworkManager-477     [000] .....     7.026596: <stack trace>
     => gpiod_set_raw_value_commit
     => gpiod_set_value_nocheck
     => gpiod_set_value_cansleep
     => mdio_device_reset
     => phy_init_hw
     => phy_attach_direct
     => phylink_fwnode_phy_connect
     => __stmmac_open
     => stmmac_open

There's a lot of paths where the device is getting its reset
asserted and deasserted. Let's track the state and only actually
do the assert/deassert when it changes.
Reported-by: Sagar Cheluvegowda <quic_scheluve@quicinc.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231127-net-phy-reset-once-v2-1-448e8658779e@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

df16c1c5