Commits · 5a6b1a206d1f399c9ea19173bd9a945a657f1fbf · Kirill Smelkov / linux

02 Aug, 2020 7 commits

Alexei Starovoitov authored Aug 01, 2020

Andrii Nakryiko says:

====================
This patch set adds new BPF link operation, LINK_DETACH, allowing processes
with BPF link FD to force-detach it from respective BPF hook, similarly how
BPF link is auto-detached when such BPF hook (e.g., cgroup, net_device, netns,
etc) is removed. This facility allows admin to forcefully undo BPF link
attachment, while process that created BPF link in the first place is left
intact.

Once force-detached, BPF link stays valid in the kernel as long as there is at
least one FD open against it. It goes into defunct state, just like
auto-detached BPF link.

bpftool also got `link detach` command to allow triggering this in
non-programmatic fashion.

v1->v2:
- improve error reporting in `bpftool link detach` (Song).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

5a6b1a20

tools/bpftool: Add documentation and bash-completion for `link detach` · e85f99aa

Andrii Nakryiko authored Jul 31, 2020

Add info on link detach sub-command to man page. Add detach to bash-completion
as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com.
Link: https://lore.kernel.org/bpf/20200731182830.286260-6-andriin@fb.com

e85f99aa

tools/bpftool: Add `link detach` subcommand · 0e8c7c07

Andrii Nakryiko authored Jul 31, 2020

Add ability to force-detach BPF link. Also add missing error message, if
specified link ID is wrong.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-5-andriin@fb.com

0e8c7c07

selftests/bpf: Add link detach tests for cgroup, netns, and xdp bpf_links · 90806ccc

Andrii Nakryiko authored Jul 31, 2020

Add bpf_link__detach() testing to selftests for cgroup, netns, and xdp
bpf_links.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-4-andriin@fb.com

90806ccc

libbpf: Add bpf_link detach APIs · 2e49527e

Andrii Nakryiko authored Jul 31, 2020

Add low-level bpf_link_detach() API. Also add higher-level bpf_link__detach()
one.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-3-andriin@fb.com

2e49527e

bpf: Add support for forced LINK_DETACH command · 73b11c2a

Andrii Nakryiko authored Jul 31, 2020

Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.

This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com

73b11c2a

bpf, selftests: Use single cgroup helpers for both test_sockmap/progs · 4939b284

John Fastabend authored Jul 31, 2020

Nearly every user of cgroup helpers does the same sequence of API calls. So
push these into a single helper cgroup_setup_and_join. The cases that do
a bit of extra logic are test_progs which currently uses an env variable
to decide if it needs to setup the cgroup environment or can use an
existingi environment. And then tests that are doing cgroup tests
themselves. We skip these cases for now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159623335418.30208.15807461815525100199.stgit@john-XPS-13-9370

4939b284

31 Jul, 2020 3 commits

Documentation/bpf: Use valid and new links in index.rst · ffba964e

Tiezhu Yang authored Jul 31, 2020

There exists an error "404 Not Found" when I click the html link of
"Documentation/networking/filter.rst" in the BPF documentation [1],
fix it.

Additionally, use the new links about "BPF and XDP Reference Guide"
and "bpf(2)" to avoid redirects.

[1] https://www.kernel.org/doc/html/latest/bpf/

Fixes: d9b9170a ("docs: bpf: Rename README.rst to index.rst")
Fixes: cb3f0d56 ("docs: networking: convert filter.txt to ReST")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1596184142-18476-1-git-send-email-yangtiezhu@loongson.cn

ffba964e

libbpf: Fix register in PT_REGS MIPS macros · 1acf8f90

Jerry Crunchtime authored Jul 31, 2020

The o32, n32 and n64 calling conventions require the return
value to be stored in $v0 which maps to $2 register, i.e.,
the register 2.

Fixes: c1932cdb ("bpf: Add MIPS support to samples/bpf.")
Signed-off-by: Jerry Crunchtime <jerry.c.t@web.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/43707d31-0210-e8f0-9226-1af140907641@web.de

1acf8f90

udp, bpf: Ignore connections in reuseport group after BPF sk lookup · c64c9c28

Jakub Sitnicki authored Jul 26, 2020

When BPF sk lookup invokes reuseport handling for the selected socket, it
should ignore the fact that reuseport group can contain connected UDP
sockets. With BPF sk lookup this is not relevant as we are not scoring
sockets to find the best match, which might be a connected UDP socket.

Fix it by unconditionally accepting the socket selected by reuseport.

This fixes the following two failures reported by test_progs.

  # ./test_progs -t sk_lookup
  ...
  #73/14 UDP IPv4 redir and reuseport with conns:FAIL
  ...
  #73/20 UDP IPv6 redir and reuseport with conns:FAIL
  ...

Fixes: a57066b1 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com

c64c9c28

30 Jul, 2020 9 commits

libbpf: Make destructors more robust by handling ERR_PTR(err) cases · 50450fc7

Andrii Nakryiko authored Jul 29, 2020

Most of libbpf "constructors" on failure return ERR_PTR(err) result encoded as
a pointer. It's a common mistake to eventually pass such malformed pointers
into xxx__destroy()/xxx__free() "destructors". So instead of fixing up
clean up code in selftests and user programs, handle such error pointers in
destructors themselves. This works beautifully for NULL pointers passed to
destructors, so might as well just work for error pointers.
Suggested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729232148.896125-1-andriin@fb.com

50450fc7

selftests/bpf: Omit nodad flag when adding addresses to loopback · a6599abd

Jakub Sitnicki authored Jul 30, 2020

Setting IFA_F_NODAD flag for IPv6 addresses to add to loopback is
unnecessary. Duplicate Address Detection does not happen on loopback
device.

Also, passing 'nodad' flag to 'ip address' breaks libbpf CI, which runs in
an environment with BusyBox implementation of 'ip' command, that doesn't
understand this flag.

Fixes: 0ab5539f ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Andrii Nakryiko <andrii@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com

a6599abd

selftests/bpf: Don't destroy failed link · 80546ac4

Andrii Nakryiko authored Jul 28, 2020

Check that link is NULL or proper pointer before invoking bpf_link__destroy().
Not doing this causes crash in test_progs, when cg_storage_multi selftest
fails.

Fixes: 3573f384 ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729045056.3363921-1-andriin@fb.com

80546ac4

selftests/bpf: Add xdpdrv mode for test_xdp_redirect · dfdb0d93

Hangbin Liu authored Jul 29, 2020

This patch add xdpdrv mode for test_xdp_redirect.sh since veth has support
native mode. After update here is the test result:

  # ./test_xdp_redirect.sh
  selftests: test_xdp_redirect xdpgeneric [PASS]
  selftests: test_xdp_redirect xdpdrv [PASS]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/bpf/20200729085658.403794-1-liuhangbin@gmail.com

dfdb0d93

selftests/bpf: Verify socket storage in cgroup/sock_{create, release} · 4fb5f949

Stanislav Fomichev authored Jul 28, 2020

Augment udp_limit test to set and verify socket storage value.
That should be enough to exercise the changes from the previous
patch.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-2-sdf@google.com

4fb5f949

bpf: Expose socket storage to BPF_PROG_TYPE_CGROUP_SOCK · f7c6cb1d

Stanislav Fomichev authored Jul 28, 2020

This lets us use socket storage from the following hooks:

* BPF_CGROUP_INET_SOCK_CREATE
* BPF_CGROUP_INET_SOCK_RELEASE
* BPF_CGROUP_INET4_POST_BIND
* BPF_CGROUP_INET6_POST_BIND

Using existing 'bpf_sk_storage_get_proto' doesn't work because
second argument is ARG_PTR_TO_SOCKET. Even though
BPF_PROG_TYPE_CGROUP_SOCK hooks operate on 'struct bpf_sock',
the verifier still considers it as a PTR_TO_CTX.
That's why I'm adding another 'bpf_sk_storage_get_cg_sock_proto'
definition strictly for BPF_PROG_TYPE_CGROUP_SOCK which accepts
ARG_PTR_TO_CTX which is really 'struct sock' for this program type.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-1-sdf@google.com

f7c6cb1d

selftests/bpf: Test bpf_iter buffer access with negative offset · 12e6196f

Yonghong Song authored Jul 28, 2020

Commit afbf21dc ("bpf: Support readonly/readwrite buffers
in verifier") added readonly/readwrite buffer support which
is currently used by bpf_iter tracing programs. It has
a bug with incorrect parameter ordering which later fixed
by Commit f6dfbe31 ("bpf: Fix swapped arguments in calls
to check_buffer_access").

This patch added a test case with a negative offset access
which will trigger the error path.

Without Commit f6dfbe31, running the test case in the patch,
the error message looks like:
   R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
  ; value_sum += *(__u32 *)(value - 4);
  2: (61) r1 = *(u32 *)(r1 -4)
  R1 invalid (null) buffer access: off=-4, size=4

With the above commit, the error message looks like:
   R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
  ; value_sum += *(__u32 *)(value - 4);
  2: (61) r1 = *(u32 *)(r1 -4)
  R1 invalid rdwr buffer access: off=-4, size=4
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090406-1-yhs@fb.com

12e6196f

bpf: Add missing newline characters in verifier error messages · 4fc00b79

Yonghong Song authored Jul 28, 2020

Newline characters are added in two verifier error messages,
refactored in Commit afbf21dc ("bpf: Support readonly/readwrite
buffers in verifier"). This way, they do not mix with
messages afterwards.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090349-1-yhs@fb.com

4fc00b79

bpf, arm64: Add BPF exception tables · 80083428

Jean-Philippe Brucker authored Jul 28, 2020

When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
this flag it falls back to the interpreter.

Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.

To keep the compact exception table entry format, inspect the pc in
fixup_exception(). A more generic solution would add a "handler" field
to the table entry, like on x86 and s390.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728152122.1292756-2-jean-philippe@linaro.org

80083428

28 Jul, 2020 6 commits

bpf: Fix build without CONFIG_NET when using BPF XDP link · 310ad797

Andrii Nakryiko authored Jul 28, 2020

Entire net/core subsystem is not built without CONFIG_NET. linux/netdevice.h
just assumes that it's always there, so the easiest way to fix this is to
conditionally compile out bpf_xdp_link_attach() use in bpf/syscall.c.

Fixes: aa8d3a71 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728190527.110830-1-andriin@fb.com

310ad797

bpf, selftests: use :: 1 for localhost in tcp_server.py · ca5cd355

John Fastabend authored Jul 28, 2020

Using localhost requires the host to have a /etc/hosts file with that
specific line in it. By default my dev box did not, they used
ip6-localhost, so the test was failing. To fix remove the need for any
/etc/hosts and use ::1.

I could just add the line, but this seems easier.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159594714197.21431.10113693935099326445.stgit@john-Precision-5820-Tower

ca5cd355

xdp: Prevent kernel-infoleak in xsk_getsockopt() · 3c4f850e

Peilin Ye authored Jul 28, 2020

xsk_getsockopt() is copying uninitialized stack memory to userspace when
'extra_stats' is 'false'. Fix it. Doing '= {};' is sufficient since currently
'struct xdp_statistics' is defined as follows:

  struct xdp_statistics {
    __u64 rx_dropped;
    __u64 rx_invalid_descs;
    __u64 tx_invalid_descs;
    __u64 rx_ring_full;
    __u64 rx_fill_ring_empty_descs;
    __u64 tx_ring_empty_descs;
  };

When being copied to the userspace, 'stats' will not contain any uninitialized
'holes' between struct fields.

Fixes: 8aa5a335 ("xsk: Add new statistics")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/bpf/20200728053604.404631-1-yepeilin.cs@gmail.com

3c4f850e

bpf: Fix swapped arguments in calls to check_buffer_access · f6dfbe31

Colin Ian King authored Jul 27, 2020

There are a couple of arguments of the boolean flag zero_size_allowed and
the char pointer buf_info when calling to function check_buffer_access that
are swapped by mistake. Fix these by swapping them to correct the argument
ordering.

Fixes: afbf21dc ("bpf: Support readonly/readwrite buffers in verifier")
Addresses-Coverity: ("Array compared to 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727175411.155179-1-colin.king@canonical.com

f6dfbe31

selftests/bpf: Add new bpf_iter context structs to fix build on old kernels · 363885d7

Andrii Nakryiko authored Jul 27, 2020

Add bpf_iter__bpf_map_elem and bpf_iter__bpf_sk_storage_map to bpf_iter.h.

Fixes: 3b1c420b ("selftests/bpf: Add a test for bpf sk_storage_map iterator")
Fixes: 2a7c2fff ("selftests/bpf: Add test for bpf hash map iterators")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727233345.1686358-1-andriin@fb.com

363885d7

bpf: Fix bpf_ringbuf_output() signature to return long · e1613b57

Andrii Nakryiko authored Jul 27, 2020

Due to bpf tree fix merge, bpf_ringbuf_output() signature ended up with int as
a return type, while all other helpers got converted to returning long. So fix
it in bpf-next now.

Fixes: b0659d8a ("bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200727224715.652037-1-andriin@fb.com

e1613b57

27 Jul, 2020 2 commits

tools, bpftool: Add LSM type to array of prog names · 9a97c9d2

Quentin Monnet authored Jul 24, 2020

Assign "lsm" as a printed name for BPF_PROG_TYPE_LSM in bpftool, so that
it can use it when listing programs loaded on the system or when probing
features.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200724090618.16378-3-quentin@isovalent.com

9a97c9d2

tools, bpftool: Skip type probe if name is not found · 70cfab1d

Quentin Monnet authored Jul 24, 2020

For probing program and map types, bpftool loops on type values and uses
the relevant type name in prog_type_name[] or map_type_name[]. To ensure
the name exists, we exit from the loop if we go over the size of the
array.

However, this is not enough in the case where the arrays have "holes" in
them, program or map types for which they have no name, but not at the
end of the list. This is currently the case for BPF_PROG_TYPE_LSM, not
known to bpftool and which name is a null string. When probing for
features, bpftool attempts to strlen() that name and segfaults.

Let's fix it by skipping probes for "unknown" program and map types,
with an informational message giving the numeral value in that case.

Fixes: 93a3545d ("tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type")
Reported-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200724090618.16378-2-quentin@isovalent.com

70cfab1d

26 Jul, 2020 13 commits

Merge branch 'bpf_link-XDP' · 47960ad6

Alexei Starovoitov authored Jul 25, 2020

Andrii Nakryiko says:

====================
Following cgroup and netns examples, implement bpf_link support for XDP.

The semantics is described in patch #2. Program and link attachments are
mutually exclusive, in the sense that neither link can replace attached
program nor program can replace attached link. Link can't replace attached
link as well, as is the case for any other bpf_link implementation.

Patch #1 refactors existing BPF program-based attachment API and centralizes
high-level query/attach decisions in generic kernel code, while drivers are
kept simple and are instructed with low-level decisions about attaching and
detaching specific bpf_prog. This also makes QUERY command unnecessary, and
patch #8 removes support for it from all kernel drivers. If that's a bad idea,
we can drop that patch altogether.

With refactoring in patch #1, adding bpf_xdp_link is completely transparent to
drivers, they are still functioning at the level of "effective" bpf_prog, that
should be called in XDP data path.

Corresponding libbpf support for BPF XDP link is added in patch #5.

v3->v4:
- fix a compilation warning in one of drivers (Jakub);

v2->v3:
- fix build when CONFIG_BPF_SYSCALL=n (kernel test robot);

v1->v2:
- fix prog refcounting bug (David);
- split dev_change_xdp_fd() changes into 2 patches (David);
- add extack messages to all user-induced errors (David).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

47960ad6

bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands · e8407fde

Andrii Nakryiko authored Jul 21, 2020

Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().

This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com

e8407fde

selftests/bpf: Add BPF XDP link selftests · fe48230c

Andrii Nakryiko authored Jul 21, 2020

Add selftest validating all the attachment logic around BPF XDP link. Test
also link updates and get_obj_info() APIs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-9-andriin@fb.com

fe48230c

libbpf: Add support for BPF XDP link · dc8698ca

Andrii Nakryiko authored Jul 21, 2020

Sync UAPI header and add support for using bpf_link-based XDP attachment.
Make xdp/ prog type set expected attach type. Kernel didn't enforce
attach_type for XDP programs before, so there is no backwards compatiblity
issues there.

Also fix section_names selftest to recognize that xdp prog types now have
expected attach type.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com

dc8698ca

bpf: Implement BPF XDP link-specific introspection APIs · c1931c97

Andrii Nakryiko authored Jul 21, 2020

Implement XDP link-specific show_fdinfo and link_info to emit ifindex.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-7-andriin@fb.com

c1931c97

bpf, xdp: Implement LINK_UPDATE for BPF XDP link · 026a4c28

Andrii Nakryiko authored Jul 21, 2020

Add support for LINK_UPDATE command for BPF XDP link to enable reliable
replacement of underlying BPF program.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-6-andriin@fb.com

026a4c28

bpf, xdp: Add bpf_link-based XDP attachment API · aa8d3a71

Andrii Nakryiko authored Jul 21, 2020

Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
BPF_LINK_CREATE command.

bpf_xdp_link is mutually exclusive with direct BPF program attachment,
previous BPF program should be detached prior to attempting to create a new
bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it
can't be replaced by other BPF program attachment or link attachment. It will
be detached only when the last BPF link FD is closed.

bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to
how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
will become defunct, but won't be destroyed until last FD is closed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com

aa8d3a71

bpf, xdp: Extract common XDP program attachment logic · d4baa936

Andrii Nakryiko authored Jul 21, 2020

Further refactor XDP attachment code. dev_change_xdp_fd() is split into two
parts: getting bpf_progs from FDs and attachment logic, working with
bpf_progs. This makes attachment logic a bit more straightforward and
prepares code for bpf_xdp_link inclusion, which will share the common logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-4-andriin@fb.com

d4baa936

bpf, xdp: Maintain info on attached XDP BPF programs in net_device · 7f0a8382

Andrii Nakryiko authored Jul 21, 2020

Instead of delegating to drivers, maintain information about which BPF
programs are attached in which XDP modes (generic/skb, driver, or hardware)
locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.

Such re-organization simplifies existing code already. But it also allows to
further add bpf_link-based XDP attachments without drivers having to know
about any of this at all, which seems like a good setup.
XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to
install/uninstall active BPF program. All the higher-level concerns about
prog/link interaction will be contained within generic driver-agnostic logic.

All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed.
It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags
when resetting installed programs. That seems unnecessary, plus most drivers
don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
should be enough of an indicator of what is required of driver to correctly
reset active BPF program. dev_xdp_uninstall() is also generalized as an
iteration over all three supported mode.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com

7f0a8382

bpf: Make bpf_link API available indepently of CONFIG_BPF_SYSCALL · 6cc7d1e8

Andrii Nakryiko authored Jul 21, 2020

Similarly to bpf_prog, make bpf_link and related generic API available
unconditionally to make it easier to have bpf_link support in various parts of
the kernel. Stub out init/prime/settle/cleanup and inc/put APIs.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com

6cc7d1e8

bpf: Fix build on architectures with special bpf_user_pt_regs_t · 2b9b305f

Song Liu authored Jul 24, 2020

Architectures like s390, powerpc, arm64, riscv have speical definition of
bpf_user_pt_regs_t. So we need to cast the pointer before passing it to
bpf_get_stack(). This is similar to bpf_get_stack_tp().

Fixes: 03d42fd2d83f ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724200503.3629591-1-songliubraving@fb.com

2b9b305f

bpf/local_storage: Fix build without CONFIG_CGROUP · dfcdf0e9

YiFei Zhu authored Jul 24, 2020

local_storage.o has its compile guard as CONFIG_BPF_SYSCALL, which
does not imply that CONFIG_CGROUP is on. Including cgroup-internal.h
when CONFIG_CGROUP is off cause a compilation failure.

Fixes: f67cfc233706 ("bpf: Make cgroup storages shared between programs on the same cgroup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200724211753.902969-1-zhuyifei1999@gmail.com

dfcdf0e9

Merge branch 'shared-cgroup-storage' · 36f72484

Alexei Starovoitov authored Jul 23, 2020

YiFei Zhu says:

====================
To access the storage in a CGROUP_STORAGE map, one uses
bpf_get_local_storage helper, which is extremely fast due to its
use of per-CPU variables. However, its whole code is built on
the assumption that one map can only be used by one program at any
time, and this prohibits any sharing of data between multiple
programs using these maps, eliminating a lot of use cases, such
as some per-cgroup configuration storage, written to by a
setsockopt program and read by a cg_sock_addr program.

Why not use other map types? The great part of CGROUP_STORAGE map
is that it is isolated by different cgroups its attached to. When
one program uses bpf_get_local_storage, even on the same map, it
gets different storages if it were run as a result of attaching
to different cgroups. The kernel manages the storages, simplifying
BPF program or userspace. In theory, one could probably use other
maps like array or hash to do the same thing, but it would be a
major overhead / complexity. Userspace needs to know when a cgroup
is being freed in order to free up a space in the replacement map.

This patch set introduces a significant change to the semantics of
CGROUP_STORAGE map type. Instead of each storage being tied to one
single attachment, it is shared across different attachments to
the same cgroup, and persists until either the map or the cgroup
attached to is being freed.

User may use u64 as the key to the map, and the result would be
that the attach type become ignored during key comparison, and
programs of different attach types will share the same storage if
the cgroups they are attached to are the same.

How could this break existing users?
* Users that uses detach & reattach / program replacement as a
  shortcut to zeroing the storage. Since we need sharing between
  programs, we cannot zero the storage. Users that expect this
  behavior should either attach a program with a new map, or
  explicitly zero the map with a syscall.
This case is dependent on undocumented implementation details,
so the impact should be very minimal.

Patch 1 introduces a test on the old expected behavior of the map
type.

Patch 2 introduces a test showing how two programs cannot share
one such map.

Patch 3 implements the change of semantics to the map.

Patch 4 amends the new test such that it yields the behavior we
expect from the change.

Patch 5 documents the map type.

Changes since RFC:
* Clarify commit message in patch 3 such that it says the lifetime
  of the storage is ended at the freeing of the cgroup_bpf, rather
  than the cgroup itself.
* Restored an -ENOMEM check in __cgroup_bpf_attach.
* Update selftests for recent change in network_helpers API.

Changes since v1:
* s/CHECK_FAIL/CHECK/
* s/bpf_prog_attach/bpf_program__attach_cgroup/
* Moved test__start_subtest to test_cg_storage_multi.
* Removed some redundant CHECK_FAIL where they are already CHECK-ed.

Changes since v2:
* Lock cgroup_mutex during map_free.
* Publish new storages only if attach is successful, by tracking
  exactly which storages are reused in an array of bools.
* Mention bpftool map dump showing a value of zero for attach_type
  in patch 3 commit message.

Changes since v3:
* Use a much simpler lookup and allocate-if-not-exist from the fact
  that cgroup_mutex is locked during attach.
* Removed an unnecessary spinlock hold.

Changes since v4:
* Changed semantics so that if the key type is struct
  bpf_cgroup_storage_key the map retains isolation between different
  attach types. Sharing between different attach types only occur
  when key type is u64.
* Adapted tests and docs for the above change.

Changes since v5:
* Removed redundant NULL check before bpf_link__destroy.
* Free BPF object explicitly, after asserting that object failed to
  load, in the event that the object did not fail to load.
* Rename variable in bpf_cgroup_storage_key_cmp for clarity.
* Added a lot of information to Documentation, more or less copied
  from what Martin KaFai Lau wrote.
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

36f72484