Commits · 75ccae62cb8d42a619323a85c577107b8b37d797 · nexedi / linux

17 Jan, 2020 2 commits

xdp: Move devmap bulk queue into struct net_device · 75ccae62

Toke Høiland-Jørgensen authored Jan 16, 2020

Commit 96360004 ("xdp: Make devmap flush_list common for all map
instances"), changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.  To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

After this patch, the relevant part of struct net_device looks like this,
according to pahole:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	unsigned int               tx_queue_len;         /*   920     4 */
	spinlock_t                 tx_global_lock;       /*   924     4 */
	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct hlist_head  qdisc_hash[16];               /*   960   128 */
	/* --- cacheline 17 boundary (1088 bytes) --- */
	struct timer_list  watchdog_timer;               /*  1088    40 */

	/* XXX last struct has 4 bytes of padding */

	int                        watchdog_timeo;       /*  1128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head   todo_list;                    /*  1136    16 */
	/* --- cacheline 18 boundary (1152 bytes) --- */
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk

75ccae62

libbpf: Revert bpf_helper_defs.h inclusion regression · 20f21d98

Andrii Nakryiko authored Jan 16, 2020

Revert bpf_helpers.h's change to include auto-generated bpf_helper_defs.h
through <> instead of "", which causes it to be searched in include path. This
can break existing applications that don't have their include path pointing
directly to where libbpf installs its headers.

There is ongoing work to make all (not just bpf_helper_defs.h) includes more
consistent across libbpf and its consumers, but this unbreaks user code as is
right now without any regressions. Selftests still behave sub-optimally
(taking bpf_helper_defs.h from libbpf's source directory, if it's present
there), which will be fixed in subsequent patches.

Fixes: 6910d7d3 ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200117004103.148068-1-andriin@fb.com

20f21d98

16 Jan, 2020 3 commits

selftests/bpf: Fix test_progs send_signal flakiness with nmi mode · 35697c12

Yonghong Song authored Jan 16, 2020

Alexei observed that test_progs send_signal may fail if run
with command line "./test_progs" and the tests will pass
if just run "./test_progs -n 40".

I observed similar issue with nmi subtest failure
and added a delay 100 us in Commit ab8b7f0c
("tools/bpf: Add self tests for bpf_send_signal_thread()")
and the problem is gone for me. But the issue still exists
in Alexei's testing environment.

The current code uses sample_freq = 50 (50 events/second), which
may not be enough. But if the sample_freq value is larger than
sysctl kernel/perf_event_max_sample_rate, the perf_event_open
syscall will fail.

This patch changed nmi perf testing to use sample_period = 1,
which means trying to sampling every event. This seems fixing
the issue.

Fixes: ab8b7f0c ("tools/bpf: Add self tests for bpf_send_signal_thread()")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200116174004.1522812-1-yhs@fb.com

35697c12

libbpf: Fix unneeded extra initialization in bpf_map_batch_common · 858e284f

Brian Vazquez authored Jan 15, 2020

bpf_attr doesn't required to be declared with '= {}' as memset is used
in the code.

Fixes: 2ab3d86e ("libbpf: Add libbpf support to batch ops")
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200116045918.75597-1-brianvv@google.com

858e284f

selftests/bpf: Add whitelist/blacklist of test names to test_progs · b65053cd

Andrii Nakryiko authored Jan 15, 2020

Add ability to specify a list of test name substrings for selecting which
tests to run. So now -t is accepting a comma-separated list of strings,
similarly to how -n accepts a comma-separated list of test numbers.

Additionally, add ability to blacklist tests by name. Blacklist takes
precedence over whitelist. Blacklisting is important for cases where it's
known that some tests can't pass (e.g., due to perf hardware events that are
not available within VM). This is going to be used for libbpf testing in
Travis CI in its Github repo.

Example runs with just whitelist and whitelist + blacklist:

  $ sudo ./test_progs -tattach,core/existence
  #1 attach_probe:OK
  #6 cgroup_attach_autodetach:OK
  #7 cgroup_attach_multi:OK
  #8 cgroup_attach_override:OK
  #9 core_extern:OK
  #10/44 existence:OK
  #10/45 existence___minimal:OK
  #10/46 existence__err_int_sz:OK
  #10/47 existence__err_int_type:OK
  #10/48 existence__err_int_kind:OK
  #10/49 existence__err_arr_kind:OK
  #10/50 existence__err_arr_value_type:OK
  #10/51 existence__err_struct_type:OK
  #10 core_reloc:OK
  #19 flow_dissector_reattach:OK
  #60 tp_attach_query:OK
  Summary: 8/8 PASSED, 0 SKIPPED, 0 FAILED

  $ sudo ./test_progs -tattach,core/existence -bcgroup,flow/arr
  #1 attach_probe:OK
  #9 core_extern:OK
  #10/44 existence:OK
  #10/45 existence___minimal:OK
  #10/46 existence__err_int_sz:OK
  #10/47 existence__err_int_type:OK
  #10/48 existence__err_int_kind:OK
  #10/51 existence__err_struct_type:OK
  #10 core_reloc:OK
  #60 tp_attach_query:OK
  Summary: 4/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20200116005549.3644118-1-andriin@fb.com

b65053cd

15 Jan, 2020 22 commits

Merge branch 'bpftool-improvements' · 7bcfea96

Alexei Starovoitov authored Jan 15, 2020

Martin Lau says:

====================
When a map is storing a kernel's struct, its
map_info->btf_vmlinux_value_type_id is set.  The first map type
supporting it is BPF_MAP_TYPE_STRUCT_OPS.

This series adds support to dump this kind of map with BTF.
The first two patches are bug fixes which are only applicable to
bpf-next.

Please see individual patches for details.

v3:
- Remove unnecessary #include "libbpf_internal.h" from patch 5

v2:
- Expose bpf_find_kernel_btf() as a LIBBPF_API in patch 3 (Andrii)
- Cache btf_vmlinux in bpftool/map.c (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

7bcfea96

bpftool: Support dumping a map with btf_vmlinux_value_type_id · 4e1ea332

Martin KaFai Lau authored Jan 15, 2020

This patch makes bpftool support dumping a map's value properly
when the map's value type is a type of the running kernel's btf.
(i.e. map_info.btf_vmlinux_value_type_id is set instead of
map_info.btf_value_type_id).  The first usecase is for the
BPF_MAP_TYPE_STRUCT_OPS.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230044.1103008-1-kafai@fb.com

4e1ea332

bpftool: Add struct_ops map name · 84c72cee

Martin KaFai Lau authored Jan 15, 2020

This patch adds BPF_MAP_TYPE_STRUCT_OPS to "struct_ops" name mapping
so that "bpftool map show" can print the "struct_ops" map type
properly.

[root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool map show id 8
8: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 7
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230037.1102674-1-kafai@fb.com

84c72cee

libbpf: Expose bpf_find_kernel_btf as a LIBBPF_API · fb2426ad

Martin KaFai Lau authored Jan 15, 2020

This patch exposes bpf_find_kernel_btf() as a LIBBPF_API.
It will be used in 'bpftool map dump' in a following patch
to dump a map with btf_vmlinux_value_type_id set.

bpf_find_kernel_btf() is renamed to libbpf_find_kernel_btf()
and moved to btf.c.  As <linux/kernel.h> is included,
some of the max/min type casting needs to be fixed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230031.1102305-1-kafai@fb.com

fb2426ad

bpftool: Fix missing BTF output for json during map dump · 188a4866

Martin KaFai Lau authored Jan 15, 2020

The btf availability check is only done for plain text output.
It causes the whole BTF output went missing when json_output
is used.

This patch simplifies the logic a little by avoiding passing "int btf" to
map_dump().

For plain text output, the btf_wtr is only created when the map has
BTF (i.e. info->btf_id != 0).  The nullness of "json_writer_t *wtr"
in map_dump() alone can decide if dumping BTF output is needed.
As long as wtr is not NULL, map_dump() will print out the BTF-described
data whenever a map has BTF available (i.e. info->btf_id != 0)
regardless of json or plain-text output.

In do_dump(), the "int btf" is also renamed to "int do_plain_btf".

Fixes: 99f9863a ("bpftool: Match maps by name")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Paul Chaignon <paul.chaignon@orange.com>
Link: https://lore.kernel.org/bpf/20200115230025.1101828-1-kafai@fb.com

188a4866

bpftool: Fix a leak of btf object · d7de7267

Martin KaFai Lau authored Jan 15, 2020

When testing a map has btf or not, maps_have_btf() tests it by actually
getting a btf_fd from sys_bpf(BPF_BTF_GET_FD_BY_ID). However, it
forgot to btf__free() it.

In maps_have_btf() stage, there is no need to test it by really
calling sys_bpf(BPF_BTF_GET_FD_BY_ID). Testing non zero
info.btf_id is good enough.

Also, the err_close case is unnecessary, and also causes double
close() because the calling func do_dump() will close() all fds again.

Fixes: 99f9863a ("bpftool: Match maps by name")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Paul Chaignon <paul.chaignon@orange.com>
Link: https://lore.kernel.org/bpf/20200115230019.1101352-1-kafai@fb.com

d7de7267

Merge branch 'bpf-batch-ops' · 990bca1f

Alexei Starovoitov authored Jan 15, 2020

Brian Vazquez says:

====================
This patch series introduce batch ops that can be added to bpf maps to
lookup/lookup_and_delete/update/delete more than 1 element at the time,
this is specially useful when syscall overhead is a problem and in case
of hmap it will provide a reliable way of traversing them.

The implementation inclues a generic approach that could potentially be
used by any bpf map and adds it to arraymap, it also includes the specific
implementation of hashmaps which are traversed using buckets instead
of keys.

The bpf syscall subcommands introduced are:

  BPF_MAP_LOOKUP_BATCH
  BPF_MAP_LOOKUP_AND_DELETE_BATCH
  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;

in_batch and out_batch are only used for lookup and lookup_and_delete since
those are the only two operations that attempt to traverse the map.

update/delete batch ops should provide the keys/values that user wants
to modify.

Here are the previous discussions on the batch processing:
 - https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
 - https://lore.kernel.org/bpf/20190829064502.2750303-1-yhs@fb.com/
 - https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Changelog sinve v4:
 - Remove unnecessary checks from libbpf API (Andrii Nakryiko)
 - Move DECLARE_LIBBPF_OPTS with all var declarations (Andrii Nakryiko)
 - Change bucket internal buffer size to 5 entries (Yonghong Song)
 - Fix some minor bugs in hashtab batch ops implementation (Yonghong Song)

Changelog sinve v3:
 - Do not use copy_to_user inside atomic region (Yonghong Song)
 - Use _opts approach on libbpf APIs (Andrii Nakryiko)
 - Drop generic_map_lookup_and_delete_batch support
 - Free malloc-ed memory in tests (Yonghong Song)
 - Reverse christmas tree (Yonghong Song)
 - Add acked labels

Changelog sinve v2:
 - Add generic batch support for lpm_trie and test it (Yonghong Song)
 - Use define MAP_LOOKUP_RETRIES for retries (John Fastabend)
 - Return errors directly and remove labels (Yonghong Song)
 - Insert new API functions into libbpf alphabetically (Yonghong Song)
 - Change hlist_nulls_for_each_entry_rcu to
   hlist_nulls_for_each_entry_safe in htab batch ops (Yonghong Song)

Changelog since v1:
 - Fix SOB ordering and remove Co-authored-by tag (Alexei Starovoitov)

Changelog since RFC:
 - Change batch to in_batch and out_batch to support more flexible opaque
   values to iterate the bpf maps.
 - Remove update/delete specific batch ops for htab and use the generic
   implementations instead.
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

990bca1f

selftests/bpf: Add batch ops testing to array bpf map · f0fac2ce

Brian Vazquez authored Jan 15, 2020

Tested bpf_map_lookup_batch() and bpf_map_update_batch()
functionality.

  $ ./test_maps
      ...
        test_array_map_batch_ops:PASS
      ...
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-10-brianvv@google.com

f0fac2ce

selftests/bpf: Add batch ops testing for htab and htab_percpu map · 30ff3c59

Yonghong Song authored Jan 15, 2020

Tested bpf_map_lookup_batch(), bpf_map_lookup_and_delete_batch(),
bpf_map_update_batch(), and bpf_map_delete_batch() functionality.
  $ ./test_maps
    ...
      test_htab_map_batch_ops:PASS
      test_htab_percpu_map_batch_ops:PASS
    ...
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-9-brianvv@google.com

30ff3c59

libbpf: Add libbpf support to batch ops · 2ab3d86e

Yonghong Song authored Jan 15, 2020

Added four libbpf API functions to support map batch operations:
  . int bpf_map_delete_batch( ... )
  . int bpf_map_lookup_batch( ... )
  . int bpf_map_lookup_and_delete_batch( ... )
  . int bpf_map_update_batch( ... )
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-8-brianvv@google.com

2ab3d86e

tools/bpf: Sync uapi header bpf.h · a1e3a3b8

Yonghong Song authored Jan 15, 2020

sync uapi header include/uapi/linux/bpf.h to
tools/include/uapi/linux/bpf.h
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-7-brianvv@google.com

a1e3a3b8

bpf: Add batch ops to all htab bpf map · 05799638

Yonghong Song authored Jan 15, 2020

htab can't use generic batch support due some problematic behaviours
inherent to the data structre, i.e. while iterating the bpf map  a
concurrent program might delete the next entry that batch was about to
use, in that case there's no easy solution to retrieve the next entry,
the issue has been discussed multiple times (see [1] and [2]).

The only way hmap can be traversed without the problem previously
exposed is by making sure that the map is traversing entire buckets.
This commit implements those strict requirements for hmap, the
implementation follows the same interaction that generic support with
some exceptions:

 - If keys/values buffer are not big enough to traverse a bucket,
   ENOSPC will be returned.
 - out_batch contains the value of the next bucket in the iteration, not
   the next key, but this is transparent for the user since the user
   should never use out_batch for other than bpf batch syscalls.

This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new
command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete
batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com

05799638

bpf: Add lookup and update batch ops to arraymap · c60f2d28

Brian Vazquez authored Jan 15, 2020

This adds the generic batch ops functionality to bpf arraymap, note that
since deletion is not a valid operation for arraymap, only batch and
lookup are added.
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200115184308.162644-5-brianvv@google.com

c60f2d28

bpf: Add generic support for update and delete batch ops · aa2e93b8

Brian Vazquez authored Jan 15, 2020

This commit adds generic support for update and delete batch ops that
can be used for almost all the bpf maps. These commands share the same
UAPI attr that lookup and lookup_and_delete batch ops use and the
syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that
for update/delete keys/values must be specified for userspace and
because of that, neither in_batch nor out_batch are used.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com

aa2e93b8

bpf: Add generic support for lookup batch op · cb4d03ab

Brian Vazquez authored Jan 15, 2020

This commit introduces generic support for the bpf_map_lookup_batch.
This implementation can be used by almost all the bpf maps since its core
implementation is relying on the existing map_get_next_key and
map_lookup_elem. The bpf syscall subcommand introduced is:

  BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;

in_batch/out_batch are opaque values use to communicate between
user/kernel space, in_batch/out_batch must be of key_size length.

To start iterating from the beginning in_batch must be null,
count is the # of key/value elements to retrieve. Note that the 'keys'
buffer must be a buffer of key_size * count size and the 'values' buffer
must be value_size * count, where value_size must be aligned to 8 bytes
by userspace if it's dealing with percpu maps. 'count' will contain the
number of keys/values successfully retrieved. Note that 'count' is an
input/output variable and it can contain a lower value after a call.

If there's no more entries to retrieve, ENOENT will be returned. If error
is ENOENT, count might be > 0 in case it copied some values but there were
no more entries to retrieve.

Note that if the return code is an error and not -EFAULT,
count indicates the number of elements successfully processed.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com

cb4d03ab

bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions · 15c14a3d

Brian Vazquez authored Jan 15, 2020

This commit moves reusable code from map_lookup_elem and map_update_elem
to avoid code duplication in kernel/bpf/syscall.c.
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200115184308.162644-2-brianvv@google.com

15c14a3d

selftests/bpf: Add a test for attaching a bpf fentry/fexit trace to an XDP program · 83e4b88b

Eelco Chaudron authored Jan 15, 2020

Add a test that will attach a FENTRY and FEXIT program to the XDP test
program. It will also verify data from the XDP context on FENTRY and
verifies the return code on exit.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157909410480.47481.11202505690938004673.stgit@xdp-tutorial

83e4b88b

libbpf: Support .text sub-calls relocations · 9173cac3

Andrii Nakryiko authored Jan 15, 2020

The LLVM patch https://reviews.llvm.org/D72197 makes LLVM emit function call
relocations within the same section. This includes a default .text section,
which contains any BPF sub-programs. This wasn't the case before and so libbpf
was able to get a way with slightly simpler handling of subprogram call
relocations.

This patch adds support for .text section relocations. It needs to ensure
correct order of relocations, so does two passes:
- first, relocate .text instructions, if there are any relocations in it;
- then process all the other programs and copy over patched .text instructions
for all sub-program calls.

v1->v2:
- break early once .text program is processed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115190856.2391325-1-andriin@fb.com

9173cac3

Merge branch 'bpf_send_signal_thread' · 5640a771

Alexei Starovoitov authored Jan 15, 2020

Yonghong Song says:

====================
Commit 8b401f9e ("bpf: implement bpf_send_signal() helper")
added helper bpf_send_signal() which permits bpf program to
send a signal to the current process. The signal may send to
any thread of the process.

This patch implemented a new helper bpf_send_signal_thread()
to send a signal to the thread corresponding to the kernel current task.
This helper can simplify user space code if the thread context of
bpf sending signal is needed in user space. Please see Patch #1 for
details of use case and kernel implementation.

Patch #2 added some bpf self tests for the new helper.

Changelogs:
  v2 -> v3:
    - More simplification for skeleton codes by removing not-needed
      mmap code and redundantly created tracepoint link.
  v1 -> v2:
    - More description for the difference between bpf_send_signal()
      and bpf_send_signal_thread() in the uapi header bpf.h.
    - Use skeleton and mmap for send_signal test.
====================
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

5640a771

tools/bpf: Add self tests for bpf_send_signal_thread() · ab8b7f0c

Yonghong Song authored Jan 14, 2020

The test_progs send_signal() is amended to test
bpf_send_signal_thread() as well.

   $ ./test_progs -n 40
   #40/1 send_signal_tracepoint:OK
   #40/2 send_signal_perf:OK
   #40/3 send_signal_nmi:OK
   #40/4 send_signal_tracepoint_thread:OK
   #40/5 send_signal_perf_thread:OK
   #40/6 send_signal_nmi_thread:OK
   #40 send_signal:OK
   Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED

Also took this opportunity to rewrite the send_signal test
using skeleton framework and array mmap to make code
simpler and more readable.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035003.602425-1-yhs@fb.com

ab8b7f0c

bpf: Add bpf_send_signal_thread() helper · 8482941f

Yonghong Song authored Jan 14, 2020

Commit 8b401f9e ("bpf: implement bpf_send_signal() helper")
added helper bpf_send_signal() which permits bpf program to
send a signal to the current process. The signal may be
delivered to any threads in the process.

We found a use case where sending the signal to the current
thread is more preferable.
  - A bpf program will collect the stack trace and then
    send signal to the user application.
  - The user application will add some thread specific
    information to the just collected stack trace for
    later analysis.

If bpf_send_signal() is used, user application will need
to check whether the thread receiving the signal matches
the thread collecting the stack by checking thread id.
If not, it will need to send signal to another thread
through pthread_kill().

This patch proposed a new helper bpf_send_signal_thread(),
which sends the signal to the thread corresponding to
the current kernel task. This way, user space is guaranteed that
bpf_program execution context and user space signal handling
context are the same thread.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com

8482941f

xsk: Support allocations of large umems · d3a56931

Magnus Karlsson authored Jan 14, 2020

When registering a umem area that is sufficiently large (>1G on an
x86), kmalloc cannot be used to allocate one of the internal data
structures, as the size requested gets too large. Use kvmalloc instead
that falls back on vmalloc if the allocation is too large for kmalloc.

Also add accounting for this structure as it is triggered by a user
space action (the XDP_UMEM_REG setsockopt) and it is by far the
largest structure of kernel allocated memory in xsk.
Reported-by: Ryan Goodfellow <rgoodfel@isi.edu>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com

d3a56931

14 Jan, 2020 9 commits

bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map · 0a29275b

Li RongQing authored Jan 10, 2020

A negative value should be returned if map->map_type is invalid
although that is impossible now, but if we run into such situation
in future, then xdpbuff could be leaked.

Daniel Borkmann suggested:

-EBADRQC should be returned to stay consistent with generic XDP
for the tracepoint output and not to be confused with -EOPNOTSUPP
from other locations like dev_map_enqueue() when ndo_xdp_xmit is
missing and such.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com

0a29275b

bpf: Fix seq_show for BPF_MAP_TYPE_STRUCT_OPS · 3b413041

Martin KaFai Lau authored Jan 13, 2020

Instead of using bpf_struct_ops_map_lookup_elem() which is
not implemented, bpf_struct_ops_map_seq_show_elem() should
also use bpf_struct_ops_map_sys_lookup_elem() which does
an inplace update to the value.  The change allocates
a value to pass to bpf_struct_ops_map_sys_lookup_elem().

[root@arch-fb-vm1 bpf]# cat /sys/fs/bpf/dctcp
{{{1}},BPF_STRUCT_OPS_STATE_INUSE,{{00000000df93eebc,00000000df93eebc},0,2, ...

Fixes: 85d33df3 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200114072647.3188298-1-kafai@fb.com

3b413041

Merge branch 'runqslower' · 6dd42aa1

Alexei Starovoitov authored Jan 13, 2020

Andrii Nakryiko says:

====================
Based on recent BPF CO-RE, tp_btf, and BPF skeleton changes, re-implement
BCC-based runqslower tool as a portable pre-compiled BPF CO-RE-based tool.
Make sure it's built as part of selftests to ensure it doesn't bit rot.

As part of this patch set, augment `format c` output of `bpftool btf dump`
sub-command with applying `preserve_access_index` attribute to all structs and
unions. This makes all such structs and unions automatically relocatable under
BPF CO-RE, which improves user experience of writing TRACING programs with
direct kernel memory read access.

Also, further clean up selftest/bpf Makefile output and make it conforming to
libbpf and bpftool succinct output format.

v1->v2:
- build in-tree bpftool for runqslower (Yonghong);
- drop `format core` and augment `format c` instead (Alexei);
- move runqslower under tools/bpf (Daniel).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6dd42aa1

selftests/bpf: Build runqslower from selftests · 3a0d3092

Andrii Nakryiko authored Jan 12, 2020

Ensure runqslower tool is built as part of selftests to prevent it from bit
rotting.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-7-andriin@fb.com

3a0d3092

tools/bpf: Add runqslower tool to tools/bpf · 9c01546d

Andrii Nakryiko authored Jan 12, 2020

Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
its BCC-based counterpart 1-to-1, supporting all the same parameters and
functionality.

runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
as well as memory-mapped interface to global (read-only, in this case) data.
Its Makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
necessary for BTF-typed raw tracepoints with direct memory access.

  [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.pySigned-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-6-andriin@fb.com

9c01546d

bpftool: Apply preserve_access_index attribute to all types in BTF dump · 1cf5b239

Andrii Nakryiko authored Jan 12, 2020

This patch makes structs and unions, emitted through BTF dump, automatically
CO-RE-relocatable (unless disabled with `#define BPF_NO_PRESERVE_ACCESS_INDEX`,
specified before including generated header file).

This effectivaly turns usual bpf_probe_read() call into equivalent of
bpf_core_read(), by automatically applying builtin_preserve_access_index to
any field accesses of types in generated C types header.

This is especially useful for tp_btf/fentry/fexit BPF program types. They
allow direct memory access, so BPF C code just uses straightfoward a->b->c
access pattern to read data from kernel. But without kernel structs marked as
CO-RE relocatable through preserve_access_index attribute, one has to enclose
all the data reads into a special __builtin_preserve_access_index code block,
like so:

__builtin_preserve_access_index(({
    x = p->pid; /* where p is struct task_struct *, for example */
}));

This is very inconvenient and obscures the logic quite a bit. By marking all
auto-generated types with preserve_access_index attribute the above code is
reduced to just a clean and natural `x = p->pid;`.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-5-andriin@fb.com

1cf5b239

selftests/bpf: Conform selftests/bpf Makefile output to libbpf and bpftool · 2cc51d34

Andrii Nakryiko authored Jan 12, 2020

Bring selftest/bpf's Makefile output to the same format used by libbpf and
bpftool: 2 spaces of padding on the left + 8-character left-aligned build step
identifier.

Also, hide feature detection output by default. Can be enabled back by setting
V=1.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-4-andriin@fb.com

2cc51d34

libbpf: Clean up bpf_helper_defs.h generation output · 292e1d73

Andrii Nakryiko authored Jan 12, 2020

bpf_helpers_doc.py script, used to generate bpf_helper_defs.h, unconditionally
emits one informational message to stderr. Remove it and preserve stderr to
contain only relevant errors. Also make sure script invocations command is
muted by default in libbpf's Makefile.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-3-andriin@fb.com

292e1d73

tools: Sync uapi/linux/if_link.h · 533420a4

Andrii Nakryiko authored Jan 12, 2020

Sync uapi/linux/if_link.h into tools to avoid out of sync warnings during
libbpf build.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200113073143.1779940-2-andriin@fb.com

533420a4

10 Jan, 2020 4 commits

selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros · ac065870

Andrii Nakryiko authored Jan 10, 2020

Streamline BPF_TRACE_x macro by moving out return type and section attribute
definition out of macro itself. That makes those function look in source code
similar to other BPF programs. Additionally, simplify its usage by determining
number of arguments automatically (so just single BPF_TRACE vs a family of
BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument
syntax without commas inbetween argument type and name.

Given this helper is useful not only for tracing tp_btf/fenty/fexit programs,
but could be used for LSM programs and others following the same pattern,
rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x
usages in selftests are converted to new BPF_PROG macro.

Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for
nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts
same convention used by fexit programs, that last defined argument is probed
function's return result.

v4->v5:
- fix test_overhead test (__set_task_comm is void) (Alexei);

v3->v4:
- rebased and fixed one more BPF_TRACE_x occurence (Alexei);

v2->v3:
- rename to shorter and as generic BPF_PROG (Alexei);

v1->v2:
- verified GCC handles pragmas as expected;
- added descriptions to macros;
- converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected);
- added original context as 'ctx' parameter, for cases where it has to be
  passed into BPF helpers. This might cause an accidental naming collision,
  unfortunately, but at least it's easy to work around. Fortunately, this
  situation produces quite legible compilation error:

progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *'
        int ctx = 123;
            ^
progs/bpf_dctcp.c:42:6: note: previous definition is here
void BPF_HANDLER(dctcp_init, struct sock *sk)
     ^
./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER'
____##name(unsigned long long *ctx, ##args)
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com

ac065870

libbpf: Poison kernel-only integer types · 1d1a3bcf

Andrii Nakryiko authored Jan 10, 2020

It's been a recurring issue with types like u32 slipping into libbpf source
code accidentally. This is not detected during builds inside kernel source
tree, but becomes a compilation error in libbpf's Github repo. Libbpf is
supposed to use only __{s,u}{8,16,32,64} typedefs, so poison {s,u}{8,16,32,64}
explicitly in every .c file. Doing that in a bit more centralized way, e.g.,
inside libbpf_internal.h breaks selftests, which are both using kernel u32 and
libbpf_internal.h.

This patch also fixes a new u32 occurence in libbpf.c, added recently.

Fixes: 590a0088 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200110181916.271446-1-andriin@fb.com

1d1a3bcf

Merge branch 'bpf-global-funcs' · 7a2d070f

Daniel Borkmann authored Jan 10, 2020

Alexei Starovoitov says:

====================
Introduce static vs global functions and function by function verification.
This is another step toward dynamic re-linking (or replacement) of global
functions. See patch 2 for details.

v2->v3:
- cleaned up a check spotted by Song.
- rebased and dropped patch 2 that was trying to improve BTF based on ELF.
- added one more unit test for scalar return value from global func.

v1->v2:
- addressed review comments from Song, Andrii, Yonghong
- fixed memory leak in error path
- added modified ctx check
- added more tests in patch 7
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7a2d070f

selftests/bpf: Add unit tests for global functions · 360301a6

Alexei Starovoitov authored Jan 09, 2020

test_global_func[12] - check 512 stack limit.
test_global_func[34] - check 8 frame call chain limit.
test_global_func5    - check that non-ctx pointer cannot be passed into
                       a function that expects context.
test_global_func6    - check that ctx pointer is unmodified.
test_global_func7    - check that global function returns scalar.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200110064124.1760511-7-ast@kernel.org

360301a6