Commits · 083818156d1e98f22b1ac612a3957bc553e7ba57 · Kirill Smelkov / linux

10 Aug, 2022 9 commits

bpf: Remove unneeded memset in queue_stack_map creation · 08381815

Yafang Shao authored Aug 10, 2022

__GFP_ZERO will clear the memory, so we don't need to memset it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20220810151840.16394-2-laoar.shao@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

08381815

libbpf: preserve errno across pr_warn/pr_info/pr_debug · d7c5802f

Andrii Nakryiko authored Aug 10, 2022

As suggested in [0], make sure that libbpf_print saves and restored
errno and as such guaranteed that no matter what actual print callback
user installs, macros like pr_warn/pr_info/pr_debug are completely
transparent as far as errno goes.

While libbpf code is pretty careful about not clobbering important errno
values accidentally with pr_warn(), it's a trivial change to make sure
that pr_warn can be used anywhere without a risk of clobbering errno.

No functional changes, just future proofing.

  [0] https://github.com/libbpf/libbpf/pull/536Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20220810183425.1998735-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d7c5802f

Merge branch 'destructive bpf_kfuncs' · 43caeec9

Alexei Starovoitov authored Aug 10, 2022

Artem Savkov says:

====================

eBPF is often used for kernel debugging, and one of the widely used and
powerful debugging techniques is post-mortem debugging with a full memory dump.
Triggering a panic at exactly the right moment allows the user to get such a
dump and thus a better view at the system's state. Right now the only way to
do this in BPF is to signal userspace to trigger kexec/panic. This is
suboptimal as going through userspace requires context changes and adds
significant delays taking system further away from "the right moment". On a
single-cpu system the situation is even worse because BPF program won't even be
able to block the thread of interest.

This patchset tries to solve this problem by allowing properly marked tracing
bpf programs to call crash_kexec() kernel function. The only requirement for
now to run programs calling crash_kexec() or other destructive kfuncs is
CAP_SYS_BOOT capability. When signature checking for bpf programs is available
it is possible that stricter rules will be applied to programs utilizing
destructive kfuncs.

Changes in v5:
 - documentation numbering fixed
 - no more warning on failed kfunc registration

Changes in v4:
 - added description for KF_DESTRUCTIVE flag to documentation

Changes in v3:
 - moved kfunc set registration to kernel/bpf/helpers.c

Changes in v2:
 - BPF_PROG_LOAD flag dropped as it doesn't fully achieve it's aim of
   preventing accidental execution of destructive bpf programs
 - selftest moved to the end of patchset
 - switched to kfunc destructive flag instead of a separate set

Changes from RFC:
 - sysctl knob dropped
 - using crash_kexec() instead of panic()
 - using kfuncs instead of adding a new helper
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

43caeec9

selftests/bpf: add destructive kfunc test · e3389458

Artem Savkov authored Aug 10, 2022

Add a test checking that programs calling destructive kfuncs can only do
so if they have CAP_SYS_BOOT capabilities.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Link: https://lore.kernel.org/r/20220810065905.475418-4-asavkov@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e3389458

bpf: export crash_kexec() as destructive kfunc · 13379059

Artem Savkov authored Aug 10, 2022

Allow properly marked bpf programs to call crash_kexec().
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Link: https://lore.kernel.org/r/20220810065905.475418-3-asavkov@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

13379059

bpf: add destructive kfunc flag · 4dd48c6f

Artem Savkov authored Aug 10, 2022

Add KF_DESTRUCTIVE flag for destructive functions. Functions with this
flag set will require CAP_SYS_BOOT capabilities.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Link: https://lore.kernel.org/r/20220810065905.475418-2-asavkov@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4dd48c6f

selftests/xsk: Update poll test cases · 3143d10b

Shibin Koikkara Reeny authored Aug 03, 2022

Poll test case was not testing all the functionality of the poll feature
in the test suite. This patch updates the poll test case which contains 2
test cases to test the RX and the TX poll functionality and additional 2
more test cases to check the timeout feature of the poll event.

Poll test suite has 4 test cases:

1. TEST_TYPE_RX_POLL: Check if RX path POLLIN function works as expect.
TX path can use any method to send the traffic.

2. TEST_TYPE_TX_POLL: Check if TX path POLLOUT function works as expect.
RX path can use any method to receive the traffic.

3. TEST_TYPE_POLL_RXQ_EMPTY: Call poll function with parameter POLLIN on
empty RX queue will cause timeout. If timeout then test case passes.

4. TEST_TYPE_POLL_TXQ_FULL: When TX queue is filled and packets are not
cleaned by the kernel then if we invoke the poll function with POLLOUT
it should trigger timeout.
Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20220803144354.98122-1-shibin.koikkara.reeny@intel.com

3143d10b

selftests/bpf: add extra test for using dynptr data slice after release · dc444be8

Joanne Koong authored Aug 09, 2022

Add an additional test, "data_slice_use_after_release2", for ensuring
that data slices are correctly invalidated by the verifier after the
dynptr whose ref obj id they track is released. In particular, this
tests data slice invalidation for dynptrs located at a non-zero offset
from the frame pointer.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220809214055.4050604-2-joannelkoong@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

dc444be8

bpf: Fix ref_obj_id for dynptr data slices in verifier · 88374342

Joanne Koong authored Aug 09, 2022

When a data slice is obtained from a dynptr (through the bpf_dynptr_data API),
the ref obj id of the dynptr must be found and then associated with the data
slice.

The ref obj id of the dynptr must be found *before* the caller saved regs are
reset. Without this fix, the ref obj id tracking is not correct for
dynptrs that are at an offset from the frame pointer.

Please also note that the data slice's ref obj id must be assigned after the
ret types are parsed, since RET_PTR_TO_ALLOC_MEM-type return regs get
zero-marked.

Fixes: 34d4ef57 ("bpf: Add dynptr data slices")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220809214055.4050604-1-joannelkoong@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

88374342

09 Aug, 2022 11 commits

selftests/bpf: Fix vmtest.sh getopts optstring · a7be0ab1

Daniel Xu authored Aug 09, 2022

Before, you could see the following errors:

  $ ./vmtest.sh -j
  ./vmtest.sh: option requires an argument -- j
  ./vmtest.sh: line 357: OPTARG: unbound variable

  $ ./vmtest.sh -z
  ./vmtest.sh: illegal option -- z
  ./vmtest.sh: line 357: OPTARG: unbound variable

Fix by adding ':' as first character of optstring. Reason is that getopts
requires ':' as the first character for OPTARG to be set in the `?` and `:`
error cases.

Note that the ':' as the first character of the optstring switches getopts
to silent mode. The desire to run in this mode seems to have been there all
along, as the script takes care of reporting errors.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/0f93b56198328b6b4da7b4cf4662d05c3edb5fd2.1660064925.git.dxu@dxuuu.xyz

a7be0ab1

selftests/bpf: Fix vmtest.sh -h to not require root · d020b236

Daniel Xu authored Aug 09, 2022

Set the exit trap only after argument parsing is done. This way argument
parse failure or `-h` will not require sudo.

Reasoning is that it's confusing that a help message would require root
access.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/6a802aa37758e5a7e6aa5de294634f5518005e2b.1660064925.git.dxu@dxuuu.xyz

d020b236

bpf: Always return corresponding btf_type in __get_type_size() · a00ed843

Yonghong Song authored Aug 07, 2022

Currently in funciton __get_type_size(), the corresponding
btf_type is returned only in invalid cases. Let us always
return btf_type regardless of valid or invalid cases.
Such a new functionality will be used in subsequent patches.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220807175116.4179242-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a00ed843

Merge branch 'Add BPF-helper for accessing CLOCK_TAI' · 11b91485

Alexei Starovoitov authored Aug 09, 2022

Kurt Kanzenbach says:

====================

Hi,

add a BPF-helper for accessing CLOCK_TAI. Use cases for such a BPF helper
include functionalities such as Tx launch time (e.g. ETF and TAPRIO Qdiscs),
timestamping and policing.

Patch #1 - Introduce BPF helper
Patch #2 - Add test case (skb based)

Changes since v1:

 * Update changelog (Alexei Starovoitov)
 * Add test case (Alexei Starovoitov, Andrii Nakryiko)
 * Add missing function prototype (netdev ci)

Previous versions:

 * v1: https://lore.kernel.org/r/20220606103734.92423-1-kurt@linutronix.de/

Jesper Dangaard Brouer (1):
  bpf: Add BPF-helper for accessing CLOCK_TAI
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

11b91485

selftests/bpf: Add BPF-helper test for CLOCK_TAI access · 64e15820

Kurt Kanzenbach authored Aug 09, 2022

Add BPF-helper test case for CLOCK_TAI access. The added test verifies that:

 * Timestamps are generated
 * Timestamps are moving forward
 * Timestamps are reasonable
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220809060803.5773-3-kurt@linutronix.deSigned-off-by: Alexei Starovoitov <ast@kernel.org>

64e15820

bpf: Add BPF-helper for accessing CLOCK_TAI · c8996c98

Jesper Dangaard Brouer authored Aug 09, 2022

Commit 3dc6ffae ("timekeeping: Introduce fast accessor to clock tai")
introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time
sensitive networks (TSN), where all nodes are synchronized by Precision Time
Protocol (PTP), it's helpful to have the possibility to generate timestamps
based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in
place, it becomes very convenient to correlate activity across different
machines in the network.

Use cases for such a BPF helper include functionalities such as Tx launch
time (e.g. ETF and TAPRIO Qdiscs) and timestamping.

Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
[Kurt: Wrote changelog and renamed helper]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.deSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c8996c98

bpf: Cleanup check_refcount_ok · b2d8ef19

Dave Marchevsky authored Aug 08, 2022

Discussion around a recently-submitted patch provided historical
context for check_refcount_ok [0]. Specifically, the function and its
helpers - may_be_acquire_function and arg_type_may_be_refcounted -
predate the OBJ_RELEASE type flag and the addition of many more helpers
with acquire/release semantics.

The purpose of check_refcount_ok is to ensure:
1) Helper doesn't have multiple uses of return reg's ref_obj_id
2) Helper with release semantics only has one arg needing to be
released, since that's tracked using meta->ref_obj_id

With current verifier, it's safe to remove check_refcount_ok and its
helpers. Since addition of OBJ_RELEASE type flag, case 2) has been
handled by the arg_type_is_release check in check_func_arg. To ensure
case 1) won't result in verifier silently prioritizing one use of
ref_obj_id, this patch adds a helper_multiple_ref_obj_use check which
fails loudly if a helper passes > 1 test for use of ref_obj_id.

[0]: lore.kernel.org/bpf/20220713234529.4154673-1-davemarchevsky@fb.com
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220808171559.3251090-1-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b2d8ef19

net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink · 6e116280

Kumar Kartikeya Dwivedi authored Jul 25, 2022

The current ifdefry for code shared by the BPF and ctnetlink side looks
ugly. As per Pablo's request, simplify this by unconditionally compiling
in the code. This can be revisited when the shared code between the two
grows further.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6e116280

bpf, iter: Fix the condition on p when calling stop. · be3bb83d

Hao Luo authored Aug 05, 2022

In bpf_seq_read, seq->op->next() could return an ERR and jump to
the label stop. However, the existing code in stop does not handle
the case when p (returned from next()) is an ERR. Adds the handling
of ERR of p by converting p into an error and jumping to done.

Because all the current implementations do not have a case that
returns ERR from next(), so this patch doesn't have behavior changes
right now.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220805214821.1058337-4-haoluo@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

be3bb83d

cgroup: enable cgroup_get_from_file() on cgroup1 · f3a2aebd

Yosry Ahmed authored Aug 05, 2022

cgroup_get_from_file() currently fails with -EBADF if called on cgroup
v1. However, the current implementation works on cgroup v1 as well, so
the restriction is unnecessary.

This enabled cgroup_get_from_fd() to work on cgroup v1, which would be
the only thing stopping bpf cgroup_iter from supporting cgroup v1.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220805214821.1058337-3-haoluo@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

f3a2aebd

btf: Add a new kfunc flag which allows to mark a function to be sleepable · fa96b242

Benjamin Tissoires authored Aug 05, 2022

This allows to declare a kfunc as sleepable and prevents its use in
a non sleepable program.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Co-developed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220805214821.1058337-2-haoluo@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fa96b242

08 Aug, 2022 5 commits

bpf: Improve docstring for BPF_F_USER_BUILD_ID flag · ca34ce29

Dave Marchevsky authored Aug 08, 2022

Most tools which use bpf_get_stack or bpf_get_stackid symbolicate the
stack - meaning the stack of addresses in the target process' address
space is transformed into meaningful symbol names. The
BPF_F_USER_BUILD_ID flag eases this process by finding the build_id of
the file-backed vma which the address falls in and translating the
address to an offset within the backing file.

To be more specific, the offset is a "file offset" from the beginning of
the backing file. The symbols in ET_DYN ELF objects have a st_value
which is also described as an "offset" - but an offset in the process
address space, relative to the base address of the object.

It's necessary to translate between the "file offset" and "virtual
address offset" during symbolication before they can be directly
compared. Failure to do so can lead to confusing bugs, so this patch
clarifies language in the documentation in an attempt to keep this from
happening.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220808164723.3107500-1-davemarchevsky@fb.com

ca34ce29

libbpf: Do not require executable permission for shared libraries · 9e32084e

Hengqi Chen authored Aug 06, 2022

Currently, resolve_full_path() requires executable permission for both
programs and shared libraries. This causes failures on distos like Debian
since the shared libraries are not installed executable and Linux is not
requiring shared libraries to have executable permissions. Let's remove
executable permission check for shared libraries.
Reported-by: Goro Fuji <goro@fastly.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220806102021.3867130-1-hengqi.chen@gmail.com

9e32084e

bpf: Verifier cleanups · 0c9a7a7e

Joanne Koong authored Aug 02, 2022

This patch cleans up a few things in the verifier:

  * type_is_pkt_pointer():
    Future work (skb + xdp dynptrs [0]) will be using the reg type
    PTR_TO_PACKET | PTR_MAYBE_NULL. type_is_pkt_pointer() should return
    true for any type whose base type is PTR_TO_PACKET, regardless of
    flags attached to it.

  * reg_type_may_be_refcounted_or_null():
    Get the base type at the start of the function to avoid
    having to recompute it / improve readability

  * check_func_proto(): remove unnecessary 'meta' arg

  * check_helper_call():
    Use switch casing on the base type of return value instead of
    nested ifs on the full type

There are no functional behavior changes.

  [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220802214638.3643235-1-joannelkoong@gmail.com

0c9a7a7e

libbpf: Reject legacy 'maps' ELF section · e19db676

Andrii Nakryiko authored Aug 03, 2022

Add explicit error message if BPF object file is still using legacy BPF
map definitions in SEC("maps"). Before this change, if BPF object file
is still using legacy map definition user will see a bit confusing:

libbpf: elf: skipping unrecognized data section(4) maps
libbpf: prog 'handler': bad map relo against 'server_map' in section 'maps'

Now libbpf will be explicit about rejecting "maps" ELF section:

libbpf: elf: legacy map definitions in 'maps' section are not supported by libbpf v1.0+
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220803214202.23750-1-andrii@kernel.org

e19db676

selftests/bpf: Clean up sys_nanosleep uses · 5653f55e

Joanne Koong authored Aug 05, 2022

This patch cleans up a few things:

  * dynptr_fail.c:
    There is no sys_nanosleep tracepoint. dynptr_fail only tests
    that the prog load fails, so just SEC("?raw_tp") suffices here.

  * test_bpf_cookie:
    There is no sys_nanosleep kprobe. The prog is loaded in
    userspace through bpf_program__attach_kprobe_opts passing in
    SYS_NANOSLEEP_KPROBE_NAME, so just SEC("k{ret}probe") suffices here.

  * test_helper_restricted:
    There is no sys_nanosleep kprobe. test_helper_restricted only tests
    that the prog load fails, so just SEC("?kprobe")( suffices here.

There are no functional changes.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220805171405.2272103-1-joannelkoong@gmail.com

5653f55e

04 Aug, 2022 4 commits

libbpf: Ensure functions with always_inline attribute are inline · d25f40ff

James Hilliard authored Aug 03, 2022

GCC expects the always_inline attribute to only be set on inline
functions, as such we should make all functions with this attribute
use the __always_inline macro which makes the function inline and
sets the attribute.

Fixes errors like:
/home/buildroot/bpf-next/tools/testing/selftests/bpf/tools/include/bpf/bpf_tracing.h:439:1: error: ‘always_inline’ function might not be inlinable [-Werror=attributes]
  439 | ____##name(unsigned long long *ctx, ##args)
      | ^~~~
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220803151403.793024-1-james.hilliard1@gmail.com

d25f40ff

bpftool: Remove BPF_OBJ_NAME_LEN restriction when looking up bpf program by name · d55dfe58

Manu Bretelle authored Aug 01, 2022

bpftool was limiting the length of names to BPF_OBJ_NAME_LEN in prog_parse
fds.

Since commit b662000a ("bpftool: Adding support for BTF program names")
we can get the full program name from BTF.

This patch removes the restriction of name length when running `bpftool
prog show name ${name}`.

Test:
Tested against some internal program names that were longer than
`BPF_OBJ_NAME_LEN`, here a redacted example of what was ran to test.

    # previous behaviour
    $ sudo bpftool prog show name some_long_program_name
    Error: can't parse name
    # with the patch
    $ sudo ./bpftool prog show name some_long_program_name
    123456789: tracing  name some_long_program_name  tag taghexa  gpl ....
    ...
    ...
    ...
    # too long
    sudo ./bpftool prog show name $(python3 -c 'print("A"*128)')
    Error: can't parse name
    # not too long but no match
    $ sudo ./bpftool prog show name $(python3 -c 'print("A"*127)')
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220801132409.4147849-1-chantr4@gmail.com

d55dfe58

libbpf: Initialize err in probe_map_create · 3045f42a

Florian Fainelli authored Jul 31, 2022

GCC-11 warns about the possibly unitialized err variable in
probe_map_create:

libbpf_probes.c: In function 'probe_map_create':
libbpf_probes.c:361:38: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  361 |                 return fd < 0 && err == exp_err ? 1 : 0;
      |                                  ~~~~^~~~~~~~~~

Fixes: 878d8def ("libbpf: Rework feature-probing APIs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220801025109.1206633-1-f.fainelli@gmail.com

3045f42a

libbpf: Skip empty sections in bpf_object__init_global_data_maps · 47ea7417

James Hilliard authored Jul 31, 2022

The GNU assembler generates an empty .bss section. This is a well
established behavior in GAS that happens in all supported targets.

The LLVM assembler doesn't generate an empty .bss section.

bpftool chokes on the empty .bss section.

Additionally in bpf_object__elf_collect the sec_desc->data is not
initialized when a section is not recognized. In this case, this
happens with .comment.

So we must check that sec_desc->data is initialized before checking
if the size is 0.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220731232649.4668-1-james.hilliard1@gmail.com

47ea7417

03 Aug, 2022 11 commits

Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · f86d1fbb

Linus Torvalds authored Aug 03, 2022

Pull networking changes from Paolo Abeni:
"Core:

- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one

- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.

- Network-side support for IO uring zero-copy send.

- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.

- Rename reference tracking helpers to a more consistent netdev_*
schema.

- Adapt u64_stats_t type to address load/store tearing issues.

- Refine debug helper usage to reduce the log noise caused by bots.

BPF:

- Improve socket map performance, avoiding skb cloning on read
operation.

- Add support for 64 bits enum, to match types exposed by kernel.

- Introduce support for sleepable uprobes program.

- Introduce support for enum textual representation in libbpf.

- New helpers to implement synproxy with eBPF/XDP.

- Improve loop performances, inlining indirect calls when possible.

- Removed all the deprecated libbpf APIs.

- Implement new eBPF-based LSM flavor.

- Add type match support, which allow accurate queries to the eBPF
used types.

- A few TCP congetsion control framework usability improvements.

- Add new infrastructure to manipulate CT entries via eBPF programs.

- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.

Protocols:

- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.

- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.

- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.

- Support for changing the initial MTPCP subflow priority/backup
status

- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.

- Extend CAN ethtool support with timestamping capabilities

- Refactor CAN build infrastructure to allow building only the needed
features.

Driver API:

- Remove devlink mutex to allow parallel commands on multiple links.

- Add support for pause stats in distributed switch.

- Implement devlink helpers to query and flash line cards.

- New helper for phy mode to register conversion.

New hardware / drivers:

- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

- Ethernet DSA driver for the Microchip LAN937x switch.

- Ethernet PHY driver for the Aquantia AQR113C EPHY.

- CAN driver for the OBD-II ELM327 interface.

- CAN driver for RZ/N1 SJA1000 CAN controller.

- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

Drivers:

- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload

- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate

- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default

- Microsoft vNIC driver (mana)
- add support for XDP redirect

- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support

- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics

- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY

- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload

- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration

- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support

Old code removal:

- Neterion vxge ethernet driver: this is untouched since more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...

f86d1fbb

Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 526942b8

Linus Torvalds authored Aug 03, 2022

Pull ATA updates from Damien Le Moal:

 - Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
   from Sergei.

 - Several patches to cleanup in libata-core, libata-scsi and libata-eh
   code: fixes arguments and variables types, change some functions
   declaration to static and fix for a typo in a comment. From Sergey
   and Xiang.

 - Fix a compilation warning in the pata_macio driver, from me.

 - A fix for the expected number of resources in the sata_mv driver fix,
   from Andrew.

* tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: sata_mv: Fixes expected number of resources now IRQs are gone
  ata: libata-scsi: fix result type of ata_ioc32()
  ata: pata_macio: Fix compilation warning
  ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
  ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
  ata: make ata_port::fastdrain_cnt *unsigned int*
  ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
  ata: libata-core: make ata_exec_internal_sg() *static*
  ata: make transfer mode masks *unsigned int*
  ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
  ata: libata-core: fix sloppy typing in ata_id_n_sectors()
  ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
  ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
  ata: pata_hpt37x: factor out hpt37x_pci_clock()
  ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
  ata: libata: Fix syntax errors in comments

526942b8

Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · a39b5dbd

Linus Torvalds authored Aug 03, 2022

Pull zonefs update from Damien Le Moal:
 "A single change for this cycle to simplify handling of the memory page
  used as super block buffer during mount (from Fabio)"

* tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Call page_address() on page acquired with GFP_KERNEL flag

a39b5dbd

Merge tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f18d7309

Linus Torvalds authored Aug 03, 2022

Pull iomap updates from Darrick Wong:
 "The most notable change in this first batch is that we no longer
  schedule pages beyond i_size for writeback, preferring instead to let
  truncate deal with those pages.

  Next week, there may be a second pull request to remove
  iomap_writepage from the other two filesystems (gfs2/zonefs) that use
  iomap for buffered IO. This follows in the same vein as the recent
  removal of writepage from XFS, since it hasn't been triggered in a few
  years; it does nothing during direct reclaim; and as far as the people
  who examined the patchset can tell, it's moving the codebase in the
  right direction.

  However, as it was a late addition to for-next, I'm holding off on
  that section for another week of testing to see if anyone can come up
  with a solid reason for holding off in the meantime.

  Summary:

   - Skip writeback for pages that are completely beyond EOF

   - Minor code cleanups"

* tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  dax: set did_zero to true when zeroing successfully
  iomap: set did_zero to true when zeroing successfully
  iomap: skip pages past eof in iomap_do_writepage()

f18d7309

Merge tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 2e4f8c72

Linus Torvalds authored Aug 03, 2022

Pull affs fix from David Sterba:
 "One update to AFFS, switching away from the kmap/kmap_atomic API"

* tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  affs: use memcpy_to_page and remove replace kmap_atomic()

2e4f8c72

Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 353767e4

Linus Torvalds authored Aug 03, 2022

Pull btrfs updates from David Sterba:
 "This brings some long awaited changes, the send protocol bump,
  otherwise lots of small improvements and fixes. The main core part is
  reworking bio handling, cleaning up the submission and endio and
  improving error handling.

  There are some changes outside of btrfs adding helpers or updating
  API, listed at the end of the changelog.

  Features:

   - sysfs:
      - export chunk size, in debug mode add tunable for setting its size
      - show zoned among features (was only in debug mode)
      - show commit stats (number, last/max/total duration)

   - send protocol updated to 2
      - new commands:
         - ability write larger data chunks than 64K
         - send raw compressed extents (uses the encoded data ioctls),
           ie. no decompression on send side, no compression needed on
           receive side if supported
         - send 'otime' (inode creation time) among other timestamps
         - send file attributes (a.k.a file flags and xflags)
      - this is first version bump, backward compatibility on send and
        receive side is provided
      - there are still some known and wanted commands that will be
        implemented in the near future, another version bump will be
        needed, however we want to minimize that to avoid causing
        usability issues

   - print checksum type and implementation at mount time

   - don't print some messages at mount (mentioned as people asked about
     it), we want to print messages namely for new features so let's
     make some space for that
      - big metadata - this has been supported for a long time and is
        not a feature that's worth mentioning
      - skinny metadata - same reason, set by default by mkfs

  Performance improvements:

   - reduced amount of reserved metadata for delayed items
      - when inserted items can be batched into one leaf
      - when deleting batched directory index items
      - when deleting delayed items used for deletion
      - overall improved count of files/sec, decreased subvolume lock
        contention

   - metadata item access bounds checker micro-optimized, with a few
     percent of improved runtime for metadata-heavy operations

   - increase direct io limit for read to 256 sectors, improved
     throughput by 3x on sample workload

  Notable fixes:

   - raid56
      - reduce parity writes, skip sectors of stripe when there are no
        data updates
      - restore reading from on-disk data instead of using stripe cache,
        this reduces chances to damage correct data due to RMW cycle

   - refuse to replay log with unknown incompat read-only feature bit
     set

   - zoned
      - fix page locking when COW fails in the middle of allocation
      - improved tracking of active zones, ZNS drives may limit the
        number and there are ENOSPC errors due to that limit and not
        actual lack of space
      - adjust maximum extent size for zone append so it does not cause
        late ENOSPC due to underreservation

   - mirror reading error messages show the mirror number

   - don't fallback to buffered IO for NOWAIT direct IO writes, we don't
     have the NOWAIT semantics for buffered io yet

   - send, fix sending link commands for existing file paths when there
     are deleted and created hardlinks for same files

   - repair all mirrors for profiles with more than 1 copy (raid1c34)

   - fix repair of compressed extents, unify where error detection and
     repair happen

  Core changes:

   - bio completion cleanups
      - don't double defer compression bios
      - simplify endio workqueues
      - add more data to btrfs_bio to avoid allocation for read requests
      - rework bio error handling so it's same what block layer does,
        the submission works and errors are consumed in endio
      - when asynchronous bio offload fails fall back to synchronous
        checksum calculation to avoid errors under writeback or memory
        pressure

   - new trace points
      - raid56 events
      - ordered extent operations

   - super block log_root_transid deprecated (never used)

   - mixed_backref and big_metadata sysfs feature files removed, they've
     been default for sufficiently long time, there are no known users
     and mixed_backref could be confused with mixed_groups

  Non-btrfs changes, API updates:

   - minor highmem API update to cover const arguments

   - switch all kmap/kmap_atomic to kmap_local

   - remove redundant flush_dcache_page()

   - address_space_operations::writepage callback removed

   - add bdev_max_segments() helper"

* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
  btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
  btrfs: fix repair of compressed extents
  btrfs: remove the start argument to check_data_csum and export
  btrfs: pass a btrfs_bio to btrfs_repair_one_sector
  btrfs: simplify the pending I/O counting in struct compressed_bio
  btrfs: repair all known bad mirrors
  btrfs: merge btrfs_dev_stat_print_on_error with its only caller
  btrfs: join running log transaction when logging new name
  btrfs: simplify error handling in btrfs_lookup_dentry
  btrfs: send: always use the rbtree based inode ref management infrastructure
  btrfs: send: fix sending link commands for existing file paths
  btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
  btrfs: zoned: wait until zone is finished when allocation didn't progress
  btrfs: zoned: write out partially allocated region
  btrfs: zoned: activate necessary block group
  btrfs: zoned: activate metadata block group on flush_space
  btrfs: zoned: disable metadata overcommit for zoned
  btrfs: zoned: introduce space_info->active_total_bytes
  btrfs: zoned: finish least available block group on data bg allocation
  btrfs: let can_allocate_chunk return error
  ...

353767e4

Merge tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab17c0cd

Linus Torvalds authored Aug 03, 2022

Pull efivars sysfs interface removal from Ard Biesheuvel:
 "Remove the obsolete 'efivars' sysfs based interface to the EFI
  variable store, now that all users have moved to the efivarfs pseudo
  file system, which was created ~10 years ago to address some
  fundamental shortcomings in the sysfs based driver.

  Move the 'business logic' related to which EFI variables are important
  and may affect the boot flow from the efivars support layer into the
  efivarfs pseudo file system, so it is no longer exposed to other parts
  of the kernel"

* tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: vars: Move efivar caching layer into efivarfs
  efi: vars: Switch to new wrapper layer
  efi: vars: Remove deprecated 'efivars' sysfs interface

ab17c0cd

Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 97a77ab1

Linus Torvalds authored Aug 03, 2022

Pull EFI updates from Ard Biesheuvel:

 - Enable mirrored memory for arm64

 - Fix up several abuses of the efivar API

 - Refactor the efivar API in preparation for moving the 'business
   logic' part of it into efivarfs

 - Enable ACPI PRM on arm64

* tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  ACPI: Move PRM config option under the main ACPI config
  ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
  ACPI: PRM: Change handler_addr type to void pointer
  efi: Simplify arch_efi_call_virt() macro
  drivers: fix typo in firmware/efi/memmap.c
  efi: vars: Drop __efivar_entry_iter() helper which is no longer used
  efi: vars: Use locking version to iterate over efivars linked lists
  efi: pstore: Omit efivars caching EFI varstore access layer
  efi: vars: Add thin wrapper around EFI get/set variable interface
  efi: vars: Don't drop lock in the middle of efivar_init()
  pstore: Add priv field to pstore_record for backend specific use
  Input: applespi - avoid efivars API and invoke EFI services directly
  selftests/kexec: remove broken EFI_VARS secure boot fallback check
  brcmfmac: Switch to appropriate helper to load EFI variable contents
  iwlwifi: Switch to proper EFI variable store interface
  media: atomisp_gmin_platform: stop abusing efivar API
  efi: efibc: avoid efivar API for setting variables
  efi: avoid efivars layer when loading SSDTs from variables
  efi: Correct comment on efi_memmap_alloc
  memblock: Disable mirror feature if kernelcore is not specified
  ...

97a77ab1

Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ff89dd08

Linus Torvalds authored Aug 03, 2022

Pull 9p iov_iter fix from Al Viro:
 "net/9p abuses iov_iter primitives - it attempts to copy _from_ a
  destination-only iov_iter when it handles Rerror arriving in reply to
  zero-copy request.   Not hard to fix, fortunately.

  This is a prereq for the iov_iter_get_pages() work in the second part
  of iov_iter series, ended up in a separate branch"

* tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: handling Rerror without copy_from_iter_full()

ff89dd08

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d9b58ab7

Linus Torvalds authored Aug 03, 2022

Pull copy_to_iter_mc fix from Al Viro:
 "Backportable fix for copy_to_iter_mc() - the second part of iov_iter
  work will pretty much overwrite this, but would be much harder to
  backport"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix short copy handling in copy_mc_pipe_to_iter()

d9b58ab7

Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5264406c

Linus Torvalds authored Aug 03, 2022

Pull vfs iov_iter updates from Al Viro:
 "Part 1 - isolated cleanups and optimizations.

  One of the goals is to reduce the overhead of using ->read_iter() and
  ->write_iter() instead of ->read()/->write().

  new_sync_{read,write}() has a surprising amount of overhead, in
  particular inside iocb_flags(). That's the explanation for the
  beginning of the series is in this pile; it's not directly
  iov_iter-related, but it's a part of the same work..."

* tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  first_iovec_segment(): just return address
  iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
  iov_iter: first_{iovec,bvec}_segment() - simplify a bit
  iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
  iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
  iov_iter_bvec_advance(): don't bother with bvec_iter
  copy_page_{to,from}_iter(): switch iovec variants to generic
  keep iocb_flags() result cached in struct file
  iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  struct file: use anonymous union member for rcuhead and llist
  btrfs: use IOMAP_DIO_NOSYNC
  teach iomap_dio_rw() to suppress dsync
  No need of likely/unlikely on calls of check_copy_size()

5264406c