Commits · 48cca7e44f9f8268fdcd4351e2f19ff2275119d1 · Kirill Smelkov / linux

17 Dec, 2017 6 commits

libbpf: add support for bpf_call · 48cca7e4

Alexei Starovoitov authored Dec 14, 2017

- recognize relocation emitted by llvm
- since all regular function will be kept in .text section and llvm
  takes care of pc-relative offsets in bpf_call instruction
  simply copy all of .text to relevant program section while adjusting
  bpf_call instructions in program section to point to newly copied
  body of instructions from .text
- do so for all programs in the elf file
- set all programs types to the one passed to bpf_prog_load()

Note for elf files with multiple programs that use different
functions in .text section we need to do 'linker' style logic.
This work is still TBD
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

48cca7e4

selftests/bpf: add tests for stack_zero tracking · d98588ce

Alexei Starovoitov authored Dec 14, 2017

adjust two tests, since verifier got smarter
and add new one to test stack_zero logic
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d98588ce

bpf: teach verifier to recognize zero initialized stack · cc2b14d5

Alexei Starovoitov authored Dec 14, 2017

programs with function calls are often passing various
pointers via stack. When all calls are inlined llvm
flattens stack accesses and optimizes away extra branches.
When functions are not inlined it becomes the job of
the verifier to recognize zero initialized stack to avoid
exploring paths that program will not take.
The following program would fail otherwise:

ptr = &buffer_on_stack;
*ptr = 0;
...
func_call(.., ptr, ...) {
  if (..)
    *ptr = bpf_map_lookup();
}
...
if (*ptr != 0) {
  // Access (*ptr)->field is valid.
  // Without stack_zero tracking such (*ptr)->field access
  // will be rejected
}

since stack slots are no longer uniform invalid | spill | misc
add liveness marking to all slots, but do it in 8 byte chunks.
So if nothing was read or written in [fp-16, fp-9] range
it will be marked as LIVE_NONE.
If any byte in that range was read, it will be marked LIVE_READ
and stacksafe() check will perform byte-by-byte verification.
If all bytes in the range were written the slot will be
marked as LIVE_WRITTEN.
This significantly speeds up state equality comparison
and reduces total number of states processed.

                    before   after
bpf_lb-DLB_L3.o       2051    2003
bpf_lb-DLB_L4.o       3287    3164
bpf_lb-DUNKNOWN.o     1080    1080
bpf_lxc-DDROP_ALL.o   24980   12361
bpf_lxc-DUNKNOWN.o    34308   16605
bpf_netdev.o          15404   10962
bpf_overlay.o         7191    6679
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cc2b14d5

selftests/bpf: add verifier tests for bpf_call · a7ff3eca

Alexei Starovoitov authored Dec 14, 2017

Add extensive set of tests for bpf_call verification logic:

calls: basic sanity
calls: using r0 returned by callee
calls: callee is using r1
calls: callee using args1
calls: callee using wrong args2
calls: callee using two args
calls: callee changing pkt pointers
calls: two calls with args
calls: two calls with bad jump
calls: recursive call. test1
calls: recursive call. test2
calls: unreachable code
calls: invalid call
calls: jumping across function bodies. test1
calls: jumping across function bodies. test2
calls: call without exit
calls: call into middle of ld_imm64
calls: call into middle of other call
calls: two calls with bad fallthrough
calls: two calls with stack read
calls: two calls with stack write
calls: spill into caller stack frame
calls: two calls with stack write and void return
calls: ambiguous return value
calls: two calls that return map_value
calls: two calls that return map_value with bool condition
calls: two calls that return map_value with incorrect bool check
calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1
calls: two calls that receive map_value via arg=ptr_stack_of_caller. test2
calls: two jumps that receive map_value via arg=ptr_stack_of_jumper. test3
calls: two calls that receive map_value_ptr_or_null via arg. test1
calls: two calls that receive map_value_ptr_or_null via arg. test2
calls: pkt_ptr spill into caller stack
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a7ff3eca

bpf: introduce function calls (verification) · f4d7e40a

Alexei Starovoitov authored Dec 14, 2017

Allow arbitrary function calls from bpf function to another bpf function.

To recognize such set of bpf functions the verifier does:
1. runs control flow analysis to detect function boundaries
2. proceeds with verification of all functions starting from main(root) function
It recognizes that the stack of the caller can be accessed by the callee
(if the caller passed a pointer to its stack to the callee) and the callee
can store map_value and other pointers into the stack of the caller.
3. keeps track of the stack_depth of each function to make sure that total
stack depth is still less than 512 bytes
4. disallows pointers to the callee stack to be stored into the caller stack,
since they will be invalid as soon as the callee returns
5. to reuse all of the existing state_pruning logic each function call
is considered to be independent call from the verifier point of view.
The verifier pretends to inline all function calls it sees are being called.
It stores the callsite instruction index as part of the state to make sure
that two calls to the same callee from two different places in the caller
will be different from state pruning point of view
6. more safety checks are added to liveness analysis

Implementation details:
. struct bpf_verifier_state is now consists of all stack frames that
  led to this function
. struct bpf_func_state represent one stack frame. It consists of
  registers in the given frame and its stack
. propagate_liveness() logic had a premature optimization where
  mark_reg_read() and mark_stack_slot_read() were manually inlined
  with loop iterating over parents for each register or stack slot.
  Undo this optimization to reuse more complex mark_*_read() logic
. skip_callee() logic is not necessary from safety point of view,
  but without it mark_*_read() markings become too conservative,
  since after returning from the funciton call a read of r6-r9
  will incorrectly propagate the read marks into callee causing
  inefficient pruning later
. mark_*_read() logic is now aware of control flow which makes it
  more complex. In the future the plan is to rewrite liveness
  to be hierarchical. So that liveness can be done within
  basic block only and control flow will be responsible for
  propagation of liveness information along cfg and between calls.
. tail_calls and ld_abs insns are not allowed in the programs with
  bpf-to-bpf calls
. returning stack pointers to the caller or storing them into stack
  frame of the caller is not allowed

Testing:
. no difference in cilium processed_insn numbers
. large number of tests follows in next patches
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f4d7e40a

bpf: introduce function calls (function boundaries) · cc8b0b92

Alexei Starovoitov authored Dec 14, 2017

Allow arbitrary function calls from bpf function to another bpf function.

Since the beginning of bpf all bpf programs were represented as a single function
and program authors were forced to use always_inline for all functions
in their C code. That was causing llvm to unnecessary inflate the code size
and forcing developers to move code to header files with little code reuse.

With a bit of additional complexity teach verifier to recognize
arbitrary function calls from one bpf function to another as long as
all of functions are presented to the verifier as a single bpf program.
New program layout:
r6 = r1    // some code
..
r1 = ..    // arg1
r2 = ..    // arg2
call pc+1  // function call pc-relative
exit
.. = r1    // access arg1
.. = r2    // access arg2
..
call pc+20 // second level of function call
...

It allows for better optimized code and finally allows to introduce
the core bpf libraries that can be reused in different projects,
since programs are no longer limited by single elf file.
With function calls bpf can be compiled into multiple .o files.

This patch is the first step. It detects programs that contain
multiple functions and checks that calls between them are valid.
It splits the sequence of bpf instructions (one program) into a set
of bpf functions that call each other. Calls to only known
functions are allowed. In the future the verifier may allow
calls to unresolved functions and will do dynamic linking.
This logic supports statically linked bpf functions only.

Such function boundary detection could have been done as part of
control flow graph building in check_cfg(), but it's cleaner to
separate function boundary detection vs control flow checks within
a subprogram (function) into logically indepedent steps.
Follow up patches may split check_cfg() further, but not check_subprogs().

Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs.
These restrictions can be relaxed in the future.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cc8b0b92

15 Dec, 2017 7 commits

nfp: bpf: correct printk formats for size_t · 0bce7c9a

Jakub Kicinski authored Dec 15, 2017

Build bot reported warning about invalid printk formats on 32bit
architectures.  Use %zu for size_t and %zd ptr diff.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

0bce7c9a

Merge branch 'bpf-nfp-jit-adjust-head-support' · 1c45597e

Daniel Borkmann authored Dec 15, 2017

Jakub Kicinski says:

====================
This small set adds support for bpf_xdp_adjust_head() to the offload.
Since we have access to unmodified BPF bytecode translating calls is
pretty trivial.  First part of the series adds handling of BPF
capabilities included in the FW in TLV format.  The last two patches
add adjust head support in the nfp verifier and jit, and a small
optimization in case we can guarantee the constant adjustment
will always meet adjustment constaints.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1c45597e

nfp: bpf: optimize the adjust_head calls in trivial cases · 8231f844

Jakub Kicinski authored Dec 14, 2017

If the program is simple and has only one adjust head call
with constant parameters, we can check that the call will
always succeed at translation time.  We need to track the
location of the call and make sure parameters are always
the same.  We also have to check the parameters against
datapath constraints and ETH_HLEN.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8231f844

nfp: bpf: add basic support for adjust head call · 0d49eaf4

Jakub Kicinski authored Dec 14, 2017

Support bpf_xdp_adjust_head().  We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

0d49eaf4

nfp: bpf: prepare for call support · 2cb230bd

Jakub Kicinski authored Dec 14, 2017

Add skeleton of verifier checks and translation handler
for call instructions.  Make sure jump target resolution
will not treat them as jumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2cb230bd

nfp: bpf: prepare for parsing BPF FW capabilities · 77a844ee

Jakub Kicinski authored Dec 14, 2017

BPF FW creates a run time symbol called bpf_capabilities which
contains TLV-formatted capability information.  Allocate app
private structure to store parsed capabilities and add a skeleton
of parsing logic.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

77a844ee

nfp: add nfp_cpp_area_size() accessor · a351ab56

Jakub Kicinski authored Dec 14, 2017

Allow users outside of core reading area sizes. This was not needed
previously because whatever entity created the area would usually know
what size it asked for. The nfp_rtsym_map() helper, however, will
allocate the area based on the size of an RT-symbol with given name.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a351ab56

14 Dec, 2017 5 commits

Merge branch 'bpf-bpftool-cgroup-ops' · 7310c233

Daniel Borkmann authored Dec 14, 2017

Roman Gushchin says:

====================
This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detacg operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks any cgroup-related specific.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
a right tool to extend with such functionality.

v4:
  - ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage
  - ATTACH_FLAG names converted to "multi" and "override"
  - do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul"
  - Local variables sorted ("reverse Christmas tree")
  - unknown attach flags value will be never truncated

v3:
  - SRC replaced with OBJ in prog load docs
  - Output unknown attach type in hex
  - License header in SPDX format
  - Minor style fixes (e.g. variable reordering)

v2:
  - Added prog load operations
  - All cgroup operations are looking like bpftool cgroup <command>
  - All cgroup-related stuff is moved to a separate file
  - Added support for attach flags
  - Added support for attaching/detaching programs by id, pinned name, etc
  - Changed cgroup detach arguments order
  - Added empty json output for succesful programs
  - Style fixed: includes order, strncmp and macroses, error handling
  - Added man pages

v1:
  https://lwn.net/Articles/740366/
====================
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7310c233

bpftool: implement cgroup bpf operations · 5ccda64d

Roman Gushchin authored Dec 13, 2017

This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.

Usage is described in the corresponding man pages,
and examples are provided.

Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5ccda64d

bpftool: implement prog load command · 49a086c2

Roman Gushchin authored Dec 13, 2017

Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.

Usage description and examples are given in the corresponding man
page.

Syntax:
$ bpftool prog load OBJ FILE

FILE is a non-existing file on bpffs.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

49a086c2

libbpf: prefer global symbols as bpf program name source · fe4d44b2

Roman Gushchin authored Dec 13, 2017

Libbpf picks the name of the first symbol in the corresponding
elf section to use as a program name. But without taking symbol's
scope into account it may end's up with some local label
as a program name. E.g.:

$ bpftool prog
1: type 15  name LBB0_10    tag 0390a5136ba23f5c
	loaded_at Dec 07/17:22  uid 0
	xlated 456B  not jited  memlock 4096B

Fix this by preferring global symbols as program name.

For instance:
$ bpftool prog
1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
	loaded_at Dec 07/17:26  uid 0
	xlated 456B  not jited  memlock 4096B
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

fe4d44b2

libbpf: add ability to guess program type based on section name · 583c9009

Roman Gushchin authored Dec 13, 2017

The bpf_prog_load() function will guess program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking a user to specify
the program type. In first order it will be used by bpftool.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

583c9009

13 Dec, 2017 1 commit

bpf/tracing: fix kernel/events/core.c compilation error · f4e2298e

Yonghong Song authored Dec 13, 2017

Commit f371b304 ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
  (1). CONFIG_BPF_SYSCALL is not defined, or
  (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
       nor CONFIG_KPROBE_EVENTS is defined.

Error message:
  kernel/events/core.o: In function `perf_ioctl':
  core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

This patch fixed this error by guarding the real definition under
CONFIG_BPF_EVENTS and provided static inline dummy function
if CONFIG_BPF_EVENTS was not defined.
It renamed the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moved the definition from linux/bpf.h
to linux/trace_events.h so the definition is in proximity to
other prog_array related functions.

Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f4e2298e

12 Dec, 2017 10 commits

Merge branch 'bpf-override-return' · 553a8f2f

Alexei Starovoitov authored Dec 12, 2017

Josef Bacik says:

====================
This is the same as v8, just rebased onto the bpf tree.

v8->v9:
- rebased onto the bpf tree.

v7->v8:
- removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.

v6->v7:
- moved the opt-in macro to bpf.h out of kprobes.h.

v5->v6:
- add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
  feature.  This way only functions that opt-in will be allowed to be
  overridden.
- added a btrfs patch to allow error injection for open_ctree() so that the bpf
  sample actually works.

v4->v5:
- disallow kprobe_override programs from being put in the prog map array so we
  don't tail call into something we didn't check.  This allows us to make the
  normal path still fast without a bunch of percpu operations.

v3->v4:
- fix a build error found by kbuild test bot (I didn't wait long enough
  apparently.)
- Added a warning message as per Daniels suggestion.

v2->v3:
- added a ->kprobe_override flag to bpf_prog.
- added some sanity checks to disallow attaching bpf progs that have
  ->kprobe_override set that aren't for ftrace kprobes.
- added the trace_kprobe_ftrace helper to check if the trace_event_call is a
  ftrace kprobe.
- renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
  value in the kprobe path, and thus only write to it if we're overriding or
  clearing the override.

v1->v2:
- moved things around to make sure that bpf_override_return could really only be
  used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
  it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.

- Original message -

A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproduceable results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test.  This patch gives us the tool to actual do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,

Josef
====================

In patch "bpf: add a bpf_override_function helper" Alexei moved
"ifdef CONFIG_BPF_KPROBE_OVERRIDE" few lines to fail program loading
when kprobe_override is not available instead of failing at run-time.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

553a8f2f

btrfs: allow us to inject errors at io_ctl_init · 023f46c5

Josef Bacik authored Dec 11, 2017

This was instrumental in reproducing a space cache bug.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

023f46c5

samples/bpf: add a test for bpf_override_return · 965de87e

Josef Bacik authored Dec 11, 2017

This adds a basic test for bpf_override_return to verify it works.  We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

965de87e

bpf: add a bpf_override_function helper · 9802d865

Josef Bacik authored Dec 11, 2017

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9802d865

btrfs: make open_ctree error injectable · 8556e509

Josef Bacik authored Dec 11, 2017

This allows us to do error injection with BPF for open_ctree.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8556e509

add infrastructure for tagging functions as error injectable · 92ace999

Josef Bacik authored Dec 11, 2017

Using BPF we can override kprob'ed functions and return arbitrary
values.  Obviously this can be a bit unsafe, so make this feature opt-in
for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
order to give BPF access to that function for error injection purposes.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

92ace999

Merge branch 'bpf-tracing-multiprog-tp-query' · a23967c1

Alexei Starovoitov authored Dec 12, 2017

Yonghong Song says:

====================
Commit e87c6bc3 ("bpf: permit multiple bpf attachments
for a single perf event") added support to attach multiple
bpf programs to a single perf event. Given a perf event
(kprobe, uprobe, or kernel tracepoint), the perf ioctl interface
is used to query bpf programs attached to the same trace event.

There already exists a BPF_PROG_QUERY command for introspection
currently used by cgroup+bpf. We did have an implementation for
querying tracepoint+bpf through the same interface. However, it
looks cleaner to use ioctl() style of api here, since attaching
bpf prog to tracepoint/kuprobe is also done via ioctl.

Patch #1 had the core implementation and patch #2 added
a test case in tools bpf selftests suite.

Changelogs:
v3 -> v4:
  - Fix a compilation error with newer gcc like 6.3.1 while
    old gcc 4.8.5 is okay. I was using &uquery->ids to represent
    the address to the ids array to make it explicit that the
    address is passed, and this syntax is rightly rejected
    by gcc 6.3.1.
v2 -> v3:
  - Change uapi structure perf_event_query_bpf to be more
    clearer based on Peter's suggestion, and adjust
    other codes accordingly.
v1 -> v2:
  - Rebase on top of net-next.
  - Use existing bpf_prog_array_length function instead of
    implementing the same functionality in function
    bpf_prog_array_copy_info.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a23967c1

bpf/tracing: add a bpf test for new ioctl query interface · d279f1f8

Yonghong Song authored Dec 11, 2017

Added a subtest in test_progs. The tracepoint is
sched/sched_switch. Multiple bpf programs are attached to
this tracepoint and the query interface is exercised.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d279f1f8

bpf/tracing: allow user space to query prog array on the same tp · f371b304

Yonghong Song authored Dec 11, 2017

Commit e87c6bc3 ("bpf: permit multiple bpf attachments
for a single perf event") added support to attach multiple
bpf programs to a single perf event.
Although this provides flexibility, users may want to know
what other bpf programs attached to the same tp interface.
Besides getting visibility for the underlying bpf system,
such information may also help consolidate multiple bpf programs,
understand potential performance issues due to a large array,
and debug (e.g., one bpf program which overwrites return code
may impact subsequent program results).

Commit 2541517c ("tracing, perf: Implement BPF programs
attached to kprobes") utilized the existing perf ioctl
interface and added the command PERF_EVENT_IOC_SET_BPF
to attach a bpf program to a tracepoint. This patch adds a new
ioctl command, given a perf event fd, to query the bpf program
array attached to the same perf tracepoint event.

The new uapi ioctl command:
  PERF_EVENT_IOC_QUERY_BPF

The new uapi/linux/perf_event.h structure:
  struct perf_event_query_bpf {
       __u32	ids_len;
       __u32	prog_cnt;
       __u32	ids[0];
  };

User space provides buffer "ids" for kernel to copy to.
When returning from the kernel, the number of available
programs in the array is set in "prog_cnt".

The usage:
  struct perf_event_query_bpf *query =
    malloc(sizeof(*query) + sizeof(u32) * ids_len);
  query.ids_len = ids_len;
  err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);
  if (err == 0) {
    /* query.prog_cnt is the number of available progs,
     * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt
     */
  } else if (errno == ENOSPC) {
    /* query.ids_len number of progs copied,
     * query.prog_cnt is the number of available progs
     */
  } else {
      /* other errors */
  }
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f371b304

selftests: bpf: Adding config fragment CONFIG_CGROUP_BPF=y · 63060c39

Naresh Kamboju authored Dec 12, 2017

CONFIG_CGROUP_BPF=y is required for test_dev_cgroup test case.
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

63060c39

08 Dec, 2017 5 commits

Merge branch 'bpf-bpftool-makefile-cleanups' · 97c50589

Daniel Borkmann authored Dec 08, 2017

Quentin Monnet says:

====================
First patch of this series cleans up the two Makefiles (Makefile and
Documentation/Makefile) and make their contents more consistent.

The second one add "uninstall" and "doc-uninstall" targets, to remove
files previously installed on the system with "install" and "doc-install"
targets.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

97c50589

tools: bpftool: create "uninstall", "doc-uninstall" make targets · d3244248

Quentin Monnet authored Dec 07, 2017

Create two targets to remove executable and documentation that would
have been previously installed with `make install` and `make
doc-install`.

Also create a "QUIET_UNINST" helper in tools/scripts/Makefile.include.

Do not attempt to remove directories /usr/local/sbin and
/usr/share/bash-completions/completions, even if they are empty, as
those specific directories probably already existed on the system before
we installed the program, and we do not wish to break other makefiles
that might assume their existence. Do remvoe /usr/local/share/man/man8
if empty however, as this directory does not seem to exist by default.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d3244248

tools: bpftool: harmonise Makefile and Documentation/Makefile · 658e85aa

Quentin Monnet authored Dec 07, 2017

Several minor fixes and harmonisation items for Makefiles:

* Use the same mechanism for verbose/non-verbose output in two files
  ("$(Q)"), for all commands.
* Use calls to "QUIET_INSTALL" and equivalent in Makefile. In
  particular, use "call(descend, ...)" instead of "make -C" to run
  documentation targets.
* Add a "doc-clean" target, aligned on "doc" and "doc-install".
* Make "install" target in Makefile depend on "bpftool".
* Remove condition on DESTDIR to initialise prefix in doc Makefile.
* Remove modification of VPATH based on OUTPUT, it is unused.
* Formatting: harmonise spaces around equal signs.
* Make install path for man pages /usr/local/man instead of
  /usr/local/share/man (respects the Makefile conventions, and the
  latter is usually a symbolic link to the former anyway).
* Do not erase prefix if set by user in bpftool Makefile.
* Fix install target for bpftool: append DESTDIR to install path.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

658e85aa

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 62cd2770

David S. Miller authored Dec 08, 2017

Alexei Starovoitov says:

====================
pull-request: bpf-next 2017-12-07

The following pull-request contains BPF updates for your net-next tree.

The main changes are:

1) Detailed documentation of BPF development process from Daniel.

2) Addition of is_fullsock, snd_cwnd and srtt_us fields to bpf_sock_ops
   from Lawrence.

3) Minor follow up for bpf_skb_set_tunnel_key() from William.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

62cd2770

tuntap: fix possible deadlock when fail to register netdev · 124da8f6

Jason Wang authored Dec 08, 2017

Private destructor could be called when register_netdev() fail with
rtnl lock held. This will lead deadlock in tun_free_netdev() who tries
to hold rtnl_lock. Fixing this by switching to use spinlock to
synchronize.

Fixes: 96f84061 ("tun: add eBPF based queue selection method")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

124da8f6

07 Dec, 2017 6 commits

Merge branch 'smc-fixes-next' · 66c5c5b5

David S. Miller authored Dec 07, 2017

Ursula Braun says:

====================
smc: fixes 2017-12-07

here are some smc-patches. The initial 4 patches are cleanups.
Patch 5 gets rid of ib_post_sends in tasklet context to avoid peer drops due
to out-of-order receivals.
Patch 6 makes sure, the Linux SMC code understands variable sized CLC proposal
messages built according to RFC7609.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

66c5c5b5

smc: support variable CLC proposal messages · e7b7a64a

Ursula Braun authored Dec 07, 2017

According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure, SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7b7a64a

smc: no consumer update in tasklet context · 6b5771aa

Ursula Braun authored Dec 07, 2017

The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
When receiving a blocked signal from the sender, this update is sent
already in tasklet context. In addition consumer cursor updates are
sent after data receival.
Sending of cursor updates is controlled by sequence numbers.
Assuming receiving stray messages the receiver drops updates with older
sequence numbers than an already received cursor update with a higher
sequence number.
Sending consumer cursor updates in tasklet context may result in
wrong order sends and its corresponding drops at the receiver. Since
it is sufficient to send consumer cursor updates once the data is
received, this patch gets rid of the consumer cursor update in tasklet
context to guarantee in-sequence arrival of cursor updates.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b5771aa

smc: cleanup close checking during data receival · 71c125c3

Ursula Braun authored Dec 07, 2017

When waiting for data to be received it must be checked if the
peer signals shutdown. The SMC code uses two different checks
for this purpose, even though just one check is sufficient.
This patch removes the superfluous test for SOCK_DONE.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71c125c3

smc: no update for unused sk_write_pending · 4bd3e7fb

Ursula Braun authored Dec 07, 2017

The smc code never checks the sk_write_pending sock field.
Thus there is no need to update it.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bd3e7fb

smc: improve smc_clc_send_decline() error handling · 0c9f1515

Ursula Braun authored Dec 07, 2017

Let smc_clc_send_decline() return with an error, if the amount
sent is smaller than the length of an smc decline message.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c9f1515