1. 17 Dec, 2017 12 commits
    • Alexei Starovoitov's avatar
      bpf: arm64: add JIT support for multi-function programs · db496944
      Alexei Starovoitov authored
      similar to x64 add support for bpf-to-bpf calls.
      When program has calls to in-kernel helpers the target call offset
      is known at JIT time and arm64 architecture needs 2 passes.
      With bpf-to-bpf calls the dynamically allocated function start
      is unknown until all functions of the program are JITed.
      Therefore (just like x64) arm64 JIT needs one extra pass over
      the program to emit correct call offsets.
      
      Implementation detail:
      Avoid being too clever in 64-bit immediate moves and
      always use 4 instructions (instead of 3-4 depending on the address)
      to make sure only one extra pass is needed.
      If some future optimization would make it worth while to optimize
      'call 64-bit imm' further, the JIT would need to do 4 passes
      over the program instead of 3 as in this patch.
      For typical bpf program address the mov needs 3 or 4 insns,
      so unconditional 4 insns to save extra pass is a worthy trade off
      at this state of JIT.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      db496944
    • Alexei Starovoitov's avatar
      bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Alexei Starovoitov authored
      Typical JIT does several passes over bpf instructions to
      compute total size and relative offsets of jumps and calls.
      With multitple bpf functions calling each other all relative calls
      will have invalid offsets intially therefore we need to additional
      last pass over the program to emit calls with correct offsets.
      For example in case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() indepedently for main(), foo() and bar()
      x64 JIT typically does 4-5 passes to converge.
      After these initial passes the image for these 3 functions
      will be good except call targets, since start addresses of
      foo() and bar() are unknown when we were JITing main()
      (note that call bpf_map_lookup will be resolved properly
      during initial passes).
      Once start addresses of 3 functions are known we patch
      call_insn->imm to point to right functions and call
      bpf_int_jit_compile() again which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce image that is larger or smaller
      than previous pass.
      
      When constant blinding is on it's applied to all functions
      at the first pass, since doing it once again at the last
      pass can change size of the JITed code.
      
      Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
      x64 jits bpf-to-bpf calls correctly while arm64 falls back to interpreter.
      All other JITs that support normal BPF_CALL will behave the same way
      since bpf-to-bpf call is equivalent to bpf-to-kernel call from
      JITs point of view.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      1c2a088a
    • Alexei Starovoitov's avatar
      bpf: fix net.core.bpf_jit_enable race · 60b58afc
      Alexei Starovoitov authored
      global bpf_jit_enable variable is tested multiple times in JITs,
      blinding and verifier core. The malicious root can try to toggle
      it while loading the programs. This race condition was accounted
      for and there should be no issues, but it's safer to avoid
      this race condition.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      60b58afc
    • Alexei Starovoitov's avatar
      bpf: add support for bpf_call to interpreter · 1ea47e01
      Alexei Starovoitov authored
      though bpf_call is still the same call instruction and
      calling convention 'bpf to bpf' and 'bpf to helper' is the same
      the interpreter has to oparate on 'struct bpf_insn *'.
      To distinguish these two cases add a kernel internal opcode and
      mark call insns with it.
      This opcode is seen by interpreter only. JITs will never see it.
      Also add tiny bit of debug code to aid interpreter debugging.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      1ea47e01
    • Alexei Starovoitov's avatar
      selftests/bpf: add xdp noinline test · b0b04fc4
      Alexei Starovoitov authored
      add large semi-artificial XDP test with 18 functions to stress test
      bpf call verification logic
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      b0b04fc4
    • Alexei Starovoitov's avatar
      selftests/bpf: add bpf_call test · 3bc35c63
      Alexei Starovoitov authored
      strip always_inline from test_l4lb.c and compile it with -fno-inline
      to let verifier go through 11 function with various function arguments
      and return values
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      3bc35c63
    • Alexei Starovoitov's avatar
      libbpf: add support for bpf_call · 48cca7e4
      Alexei Starovoitov authored
      - recognize relocation emitted by llvm
      - since all regular function will be kept in .text section and llvm
        takes care of pc-relative offsets in bpf_call instruction
        simply copy all of .text to relevant program section while adjusting
        bpf_call instructions in program section to point to newly copied
        body of instructions from .text
      - do so for all programs in the elf file
      - set all programs types to the one passed to bpf_prog_load()
      
      Note for elf files with multiple programs that use different
      functions in .text section we need to do 'linker' style logic.
      This work is still TBD
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      48cca7e4
    • Alexei Starovoitov's avatar
      selftests/bpf: add tests for stack_zero tracking · d98588ce
      Alexei Starovoitov authored
      adjust two tests, since verifier got smarter
      and add new one to test stack_zero logic
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d98588ce
    • Alexei Starovoitov's avatar
      bpf: teach verifier to recognize zero initialized stack · cc2b14d5
      Alexei Starovoitov authored
      programs with function calls are often passing various
      pointers via stack. When all calls are inlined llvm
      flattens stack accesses and optimizes away extra branches.
      When functions are not inlined it becomes the job of
      the verifier to recognize zero initialized stack to avoid
      exploring paths that program will not take.
      The following program would fail otherwise:
      
      ptr = &buffer_on_stack;
      *ptr = 0;
      ...
      func_call(.., ptr, ...) {
        if (..)
          *ptr = bpf_map_lookup();
      }
      ...
      if (*ptr != 0) {
        // Access (*ptr)->field is valid.
        // Without stack_zero tracking such (*ptr)->field access
        // will be rejected
      }
      
      since stack slots are no longer uniform invalid | spill | misc
      add liveness marking to all slots, but do it in 8 byte chunks.
      So if nothing was read or written in [fp-16, fp-9] range
      it will be marked as LIVE_NONE.
      If any byte in that range was read, it will be marked LIVE_READ
      and stacksafe() check will perform byte-by-byte verification.
      If all bytes in the range were written the slot will be
      marked as LIVE_WRITTEN.
      This significantly speeds up state equality comparison
      and reduces total number of states processed.
      
                          before   after
      bpf_lb-DLB_L3.o       2051    2003
      bpf_lb-DLB_L4.o       3287    3164
      bpf_lb-DUNKNOWN.o     1080    1080
      bpf_lxc-DDROP_ALL.o   24980   12361
      bpf_lxc-DUNKNOWN.o    34308   16605
      bpf_netdev.o          15404   10962
      bpf_overlay.o         7191    6679
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      cc2b14d5
    • Alexei Starovoitov's avatar
      selftests/bpf: add verifier tests for bpf_call · a7ff3eca
      Alexei Starovoitov authored
      Add extensive set of tests for bpf_call verification logic:
      
      calls: basic sanity
      calls: using r0 returned by callee
      calls: callee is using r1
      calls: callee using args1
      calls: callee using wrong args2
      calls: callee using two args
      calls: callee changing pkt pointers
      calls: two calls with args
      calls: two calls with bad jump
      calls: recursive call. test1
      calls: recursive call. test2
      calls: unreachable code
      calls: invalid call
      calls: jumping across function bodies. test1
      calls: jumping across function bodies. test2
      calls: call without exit
      calls: call into middle of ld_imm64
      calls: call into middle of other call
      calls: two calls with bad fallthrough
      calls: two calls with stack read
      calls: two calls with stack write
      calls: spill into caller stack frame
      calls: two calls with stack write and void return
      calls: ambiguous return value
      calls: two calls that return map_value
      calls: two calls that return map_value with bool condition
      calls: two calls that return map_value with incorrect bool check
      calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1
      calls: two calls that receive map_value via arg=ptr_stack_of_caller. test2
      calls: two jumps that receive map_value via arg=ptr_stack_of_jumper. test3
      calls: two calls that receive map_value_ptr_or_null via arg. test1
      calls: two calls that receive map_value_ptr_or_null via arg. test2
      calls: pkt_ptr spill into caller stack
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a7ff3eca
    • Alexei Starovoitov's avatar
      bpf: introduce function calls (verification) · f4d7e40a
      Alexei Starovoitov authored
      Allow arbitrary function calls from bpf function to another bpf function.
      
      To recognize such set of bpf functions the verifier does:
      1. runs control flow analysis to detect function boundaries
      2. proceeds with verification of all functions starting from main(root) function
      It recognizes that the stack of the caller can be accessed by the callee
      (if the caller passed a pointer to its stack to the callee) and the callee
      can store map_value and other pointers into the stack of the caller.
      3. keeps track of the stack_depth of each function to make sure that total
      stack depth is still less than 512 bytes
      4. disallows pointers to the callee stack to be stored into the caller stack,
      since they will be invalid as soon as the callee returns
      5. to reuse all of the existing state_pruning logic each function call
      is considered to be independent call from the verifier point of view.
      The verifier pretends to inline all function calls it sees are being called.
      It stores the callsite instruction index as part of the state to make sure
      that two calls to the same callee from two different places in the caller
      will be different from state pruning point of view
      6. more safety checks are added to liveness analysis
      
      Implementation details:
      . struct bpf_verifier_state is now consists of all stack frames that
        led to this function
      . struct bpf_func_state represent one stack frame. It consists of
        registers in the given frame and its stack
      . propagate_liveness() logic had a premature optimization where
        mark_reg_read() and mark_stack_slot_read() were manually inlined
        with loop iterating over parents for each register or stack slot.
        Undo this optimization to reuse more complex mark_*_read() logic
      . skip_callee() logic is not necessary from safety point of view,
        but without it mark_*_read() markings become too conservative,
        since after returning from the funciton call a read of r6-r9
        will incorrectly propagate the read marks into callee causing
        inefficient pruning later
      . mark_*_read() logic is now aware of control flow which makes it
        more complex. In the future the plan is to rewrite liveness
        to be hierarchical. So that liveness can be done within
        basic block only and control flow will be responsible for
        propagation of liveness information along cfg and between calls.
      . tail_calls and ld_abs insns are not allowed in the programs with
        bpf-to-bpf calls
      . returning stack pointers to the caller or storing them into stack
        frame of the caller is not allowed
      
      Testing:
      . no difference in cilium processed_insn numbers
      . large number of tests follows in next patches
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f4d7e40a
    • Alexei Starovoitov's avatar
      bpf: introduce function calls (function boundaries) · cc8b0b92
      Alexei Starovoitov authored
      Allow arbitrary function calls from bpf function to another bpf function.
      
      Since the beginning of bpf all bpf programs were represented as a single function
      and program authors were forced to use always_inline for all functions
      in their C code. That was causing llvm to unnecessary inflate the code size
      and forcing developers to move code to header files with little code reuse.
      
      With a bit of additional complexity teach verifier to recognize
      arbitrary function calls from one bpf function to another as long as
      all of functions are presented to the verifier as a single bpf program.
      New program layout:
      r6 = r1    // some code
      ..
      r1 = ..    // arg1
      r2 = ..    // arg2
      call pc+1  // function call pc-relative
      exit
      .. = r1    // access arg1
      .. = r2    // access arg2
      ..
      call pc+20 // second level of function call
      ...
      
      It allows for better optimized code and finally allows to introduce
      the core bpf libraries that can be reused in different projects,
      since programs are no longer limited by single elf file.
      With function calls bpf can be compiled into multiple .o files.
      
      This patch is the first step. It detects programs that contain
      multiple functions and checks that calls between them are valid.
      It splits the sequence of bpf instructions (one program) into a set
      of bpf functions that call each other. Calls to only known
      functions are allowed. In the future the verifier may allow
      calls to unresolved functions and will do dynamic linking.
      This logic supports statically linked bpf functions only.
      
      Such function boundary detection could have been done as part of
      control flow graph building in check_cfg(), but it's cleaner to
      separate function boundary detection vs control flow checks within
      a subprogram (function) into logically indepedent steps.
      Follow up patches may split check_cfg() further, but not check_subprogs().
      
      Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs.
      These restrictions can be relaxed in the future.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      cc8b0b92
  2. 15 Dec, 2017 7 commits
  3. 14 Dec, 2017 5 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-bpftool-cgroup-ops' · 7310c233
      Daniel Borkmann authored
      Roman Gushchin says:
      
      ====================
      This patchset adds basic cgroup bpf operations to bpftool.
      
      Right now there is no convenient way to perform these operations.
      The /samples/bpf/load_sock_ops.c implements attach/detacg operations,
      but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
      bpf introspection, but lacks any cgroup-related specific.
      
      I find having a tool to perform these basic operations in the kernel tree
      very useful, as it can be used in the corresponding bpf documentation
      without creating additional dependencies. And bpftool seems to be
      a right tool to extend with such functionality.
      
      v4:
        - ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage
        - ATTACH_FLAG names converted to "multi" and "override"
        - do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul"
        - Local variables sorted ("reverse Christmas tree")
        - unknown attach flags value will be never truncated
      
      v3:
        - SRC replaced with OBJ in prog load docs
        - Output unknown attach type in hex
        - License header in SPDX format
        - Minor style fixes (e.g. variable reordering)
      
      v2:
        - Added prog load operations
        - All cgroup operations are looking like bpftool cgroup <command>
        - All cgroup-related stuff is moved to a separate file
        - Added support for attach flags
        - Added support for attaching/detaching programs by id, pinned name, etc
        - Changed cgroup detach arguments order
        - Added empty json output for succesful programs
        - Style fixed: includes order, strncmp and macroses, error handling
        - Added man pages
      
      v1:
        https://lwn.net/Articles/740366/
      ====================
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7310c233
    • Roman Gushchin's avatar
      bpftool: implement cgroup bpf operations · 5ccda64d
      Roman Gushchin authored
      This patch adds basic cgroup bpf operations to bpftool:
      cgroup list, attach and detach commands.
      
      Usage is described in the corresponding man pages,
      and examples are provided.
      
      Syntax:
      $ bpftool cgroup list CGROUP
      $ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
      $ bpftool cgroup detach CGROUP ATTACH_TYPE PROG
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      5ccda64d
    • Roman Gushchin's avatar
      bpftool: implement prog load command · 49a086c2
      Roman Gushchin authored
      Add the prog load command to load a bpf program from a specified
      binary file and pin it to bpffs.
      
      Usage description and examples are given in the corresponding man
      page.
      
      Syntax:
      $ bpftool prog load OBJ FILE
      
      FILE is a non-existing file on bpffs.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      49a086c2
    • Roman Gushchin's avatar
      libbpf: prefer global symbols as bpf program name source · fe4d44b2
      Roman Gushchin authored
      Libbpf picks the name of the first symbol in the corresponding
      elf section to use as a program name. But without taking symbol's
      scope into account it may end's up with some local label
      as a program name. E.g.:
      
      $ bpftool prog
      1: type 15  name LBB0_10    tag 0390a5136ba23f5c
      	loaded_at Dec 07/17:22  uid 0
      	xlated 456B  not jited  memlock 4096B
      
      Fix this by preferring global symbols as program name.
      
      For instance:
      $ bpftool prog
      1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
      	loaded_at Dec 07/17:26  uid 0
      	xlated 456B  not jited  memlock 4096B
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      fe4d44b2
    • Roman Gushchin's avatar
      libbpf: add ability to guess program type based on section name · 583c9009
      Roman Gushchin authored
      The bpf_prog_load() function will guess program type if it's not
      specified explicitly. This functionality will be used to implement
      loading of different programs without asking a user to specify
      the program type. In first order it will be used by bpftool.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      583c9009
  4. 13 Dec, 2017 1 commit
    • Yonghong Song's avatar
      bpf/tracing: fix kernel/events/core.c compilation error · f4e2298e
      Yonghong Song authored
      Commit f371b304 ("bpf/tracing: allow user space to
      query prog array on the same tp") introduced a perf
      ioctl command to query prog array attached to the
      same perf tracepoint. The commit introduced a
      compilation error under certain config conditions, e.g.,
        (1). CONFIG_BPF_SYSCALL is not defined, or
        (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
             nor CONFIG_KPROBE_EVENTS is defined.
      
      Error message:
        kernel/events/core.o: In function `perf_ioctl':
        core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'
      
      This patch fixed this error by guarding the real definition under
      CONFIG_BPF_EVENTS and provided static inline dummy function
      if CONFIG_BPF_EVENTS was not defined.
      It renamed the function from bpf_event_query_prog_array to
      perf_event_query_prog_array and moved the definition from linux/bpf.h
      to linux/trace_events.h so the definition is in proximity to
      other prog_array related functions.
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f4e2298e
  5. 12 Dec, 2017 10 commits
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-override-return' · 553a8f2f
      Alexei Starovoitov authored
      Josef Bacik says:
      
      ====================
      This is the same as v8, just rebased onto the bpf tree.
      
      v8->v9:
      - rebased onto the bpf tree.
      
      v7->v8:
      - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
      
      v6->v7:
      - moved the opt-in macro to bpf.h out of kprobes.h.
      
      v5->v6:
      - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
        feature.  This way only functions that opt-in will be allowed to be
        overridden.
      - added a btrfs patch to allow error injection for open_ctree() so that the bpf
        sample actually works.
      
      v4->v5:
      - disallow kprobe_override programs from being put in the prog map array so we
        don't tail call into something we didn't check.  This allows us to make the
        normal path still fast without a bunch of percpu operations.
      
      v3->v4:
      - fix a build error found by kbuild test bot (I didn't wait long enough
        apparently.)
      - Added a warning message as per Daniels suggestion.
      
      v2->v3:
      - added a ->kprobe_override flag to bpf_prog.
      - added some sanity checks to disallow attaching bpf progs that have
        ->kprobe_override set that aren't for ftrace kprobes.
      - added the trace_kprobe_ftrace helper to check if the trace_event_call is a
        ftrace kprobe.
      - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
        value in the kprobe path, and thus only write to it if we're overriding or
        clearing the override.
      
      v1->v2:
      - moved things around to make sure that bpf_override_return could really only be
        used for an ftrace kprobe.
      - killed the special return values from trace_call_bpf.
      - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
        it was being called from an ftrace kprobe context.
      - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
      - updated the test as per Alexei's review.
      
      - Original message -
      
      A lot of our error paths are not well tested because we have no good way of
      injecting errors generically.  Some subystems (block, memory) have ways to
      inject errors, but they are random so it's hard to get reproduceable results.
      
      With BPF we can add determinism to our error injection.  We can use kprobes and
      other things to verify we are injecting errors at the exact case we are trying
      to test.  This patch gives us the tool to actual do the error injection part.
      It is very simple, we just set the return value of the pt_regs we're given to
      whatever we provide, and then override the PC with a dummy function that simply
      returns.
      
      Right now this only works on x86, but it would be simple enough to expand to
      other architectures.  Thanks,
      
      Josef
      ====================
      
      In patch "bpf: add a bpf_override_function helper" Alexei moved
      "ifdef CONFIG_BPF_KPROBE_OVERRIDE" few lines to fail program loading
      when kprobe_override is not available instead of failing at run-time.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      553a8f2f
    • Josef Bacik's avatar
      btrfs: allow us to inject errors at io_ctl_init · 023f46c5
      Josef Bacik authored
      This was instrumental in reproducing a space cache bug.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      023f46c5
    • Josef Bacik's avatar
      samples/bpf: add a test for bpf_override_return · 965de87e
      Josef Bacik authored
      This adds a basic test for bpf_override_return to verify it works.  We
      override the main function for mounting a btrfs fs so it'll return
      -ENOMEM and then make sure that trying to mount a btrfs fs will fail.
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      965de87e
    • Josef Bacik's avatar
      bpf: add a bpf_override_function helper · 9802d865
      Josef Bacik authored
      Error injection is sloppy and very ad-hoc.  BPF could fill this niche
      perfectly with it's kprobe functionality.  We could make sure errors are
      only triggered in specific call chains that we care about with very
      specific situations.  Accomplish this with the bpf_override_funciton
      helper.  This will modify the probe'd callers return value to the
      specified value and set the PC to an override function that simply
      returns, bypassing the originally probed function.  This gives us a nice
      clean way to implement systematic error injection for all of our code
      paths.
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9802d865
    • Josef Bacik's avatar
      btrfs: make open_ctree error injectable · 8556e509
      Josef Bacik authored
      This allows us to do error injection with BPF for open_ctree.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8556e509
    • Josef Bacik's avatar
      add infrastructure for tagging functions as error injectable · 92ace999
      Josef Bacik authored
      Using BPF we can override kprob'ed functions and return arbitrary
      values.  Obviously this can be a bit unsafe, so make this feature opt-in
      for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
      order to give BPF access to that function for error injection purposes.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      92ace999
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-tracing-multiprog-tp-query' · a23967c1
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      Commit e87c6bc3 ("bpf: permit multiple bpf attachments
      for a single perf event") added support to attach multiple
      bpf programs to a single perf event. Given a perf event
      (kprobe, uprobe, or kernel tracepoint), the perf ioctl interface
      is used to query bpf programs attached to the same trace event.
      
      There already exists a BPF_PROG_QUERY command for introspection
      currently used by cgroup+bpf. We did have an implementation for
      querying tracepoint+bpf through the same interface. However, it
      looks cleaner to use ioctl() style of api here, since attaching
      bpf prog to tracepoint/kuprobe is also done via ioctl.
      
      Patch #1 had the core implementation and patch #2 added
      a test case in tools bpf selftests suite.
      
      Changelogs:
      v3 -> v4:
        - Fix a compilation error with newer gcc like 6.3.1 while
          old gcc 4.8.5 is okay. I was using &uquery->ids to represent
          the address to the ids array to make it explicit that the
          address is passed, and this syntax is rightly rejected
          by gcc 6.3.1.
      v2 -> v3:
        - Change uapi structure perf_event_query_bpf to be more
          clearer based on Peter's suggestion, and adjust
          other codes accordingly.
      v1 -> v2:
        - Rebase on top of net-next.
        - Use existing bpf_prog_array_length function instead of
          implementing the same functionality in function
          bpf_prog_array_copy_info.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a23967c1
    • Yonghong Song's avatar
      bpf/tracing: add a bpf test for new ioctl query interface · d279f1f8
      Yonghong Song authored
      Added a subtest in test_progs. The tracepoint is
      sched/sched_switch. Multiple bpf programs are attached to
      this tracepoint and the query interface is exercised.
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d279f1f8
    • Yonghong Song's avatar
      bpf/tracing: allow user space to query prog array on the same tp · f371b304
      Yonghong Song authored
      Commit e87c6bc3 ("bpf: permit multiple bpf attachments
      for a single perf event") added support to attach multiple
      bpf programs to a single perf event.
      Although this provides flexibility, users may want to know
      what other bpf programs attached to the same tp interface.
      Besides getting visibility for the underlying bpf system,
      such information may also help consolidate multiple bpf programs,
      understand potential performance issues due to a large array,
      and debug (e.g., one bpf program which overwrites return code
      may impact subsequent program results).
      
      Commit 2541517c ("tracing, perf: Implement BPF programs
      attached to kprobes") utilized the existing perf ioctl
      interface and added the command PERF_EVENT_IOC_SET_BPF
      to attach a bpf program to a tracepoint. This patch adds a new
      ioctl command, given a perf event fd, to query the bpf program
      array attached to the same perf tracepoint event.
      
      The new uapi ioctl command:
        PERF_EVENT_IOC_QUERY_BPF
      
      The new uapi/linux/perf_event.h structure:
        struct perf_event_query_bpf {
             __u32	ids_len;
             __u32	prog_cnt;
             __u32	ids[0];
        };
      
      User space provides buffer "ids" for kernel to copy to.
      When returning from the kernel, the number of available
      programs in the array is set in "prog_cnt".
      
      The usage:
        struct perf_event_query_bpf *query =
          malloc(sizeof(*query) + sizeof(u32) * ids_len);
        query.ids_len = ids_len;
        err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query);
        if (err == 0) {
          /* query.prog_cnt is the number of available progs,
           * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt
           */
        } else if (errno == ENOSPC) {
          /* query.ids_len number of progs copied,
           * query.prog_cnt is the number of available progs
           */
        } else {
            /* other errors */
        }
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f371b304
    • Naresh Kamboju's avatar
      selftests: bpf: Adding config fragment CONFIG_CGROUP_BPF=y · 63060c39
      Naresh Kamboju authored
      CONFIG_CGROUP_BPF=y is required for test_dev_cgroup test case.
      Signed-off-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      63060c39
  6. 08 Dec, 2017 5 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-bpftool-makefile-cleanups' · 97c50589
      Daniel Borkmann authored
      Quentin Monnet says:
      
      ====================
      First patch of this series cleans up the two Makefiles (Makefile and
      Documentation/Makefile) and make their contents more consistent.
      
      The second one add "uninstall" and "doc-uninstall" targets, to remove
      files previously installed on the system with "install" and "doc-install"
      targets.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      97c50589
    • Quentin Monnet's avatar
      tools: bpftool: create "uninstall", "doc-uninstall" make targets · d3244248
      Quentin Monnet authored
      Create two targets to remove executable and documentation that would
      have been previously installed with `make install` and `make
      doc-install`.
      
      Also create a "QUIET_UNINST" helper in tools/scripts/Makefile.include.
      
      Do not attempt to remove directories /usr/local/sbin and
      /usr/share/bash-completions/completions, even if they are empty, as
      those specific directories probably already existed on the system before
      we installed the program, and we do not wish to break other makefiles
      that might assume their existence. Do remvoe /usr/local/share/man/man8
      if empty however, as this directory does not seem to exist by default.
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d3244248
    • Quentin Monnet's avatar
      tools: bpftool: harmonise Makefile and Documentation/Makefile · 658e85aa
      Quentin Monnet authored
      Several minor fixes and harmonisation items for Makefiles:
      
      * Use the same mechanism for verbose/non-verbose output in two files
        ("$(Q)"), for all commands.
      * Use calls to "QUIET_INSTALL" and equivalent in Makefile. In
        particular, use "call(descend, ...)" instead of "make -C" to run
        documentation targets.
      * Add a "doc-clean" target, aligned on "doc" and "doc-install".
      * Make "install" target in Makefile depend on "bpftool".
      * Remove condition on DESTDIR to initialise prefix in doc Makefile.
      * Remove modification of VPATH based on OUTPUT, it is unused.
      * Formatting: harmonise spaces around equal signs.
      * Make install path for man pages /usr/local/man instead of
        /usr/local/share/man (respects the Makefile conventions, and the
        latter is usually a symbolic link to the former anyway).
      * Do not erase prefix if set by user in bpftool Makefile.
      * Fix install target for bpftool: append DESTDIR to install path.
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      658e85aa
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 62cd2770
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2017-12-07
      
      The following pull-request contains BPF updates for your net-next tree.
      
      The main changes are:
      
      1) Detailed documentation of BPF development process from Daniel.
      
      2) Addition of is_fullsock, snd_cwnd and srtt_us fields to bpf_sock_ops
         from Lawrence.
      
      3) Minor follow up for bpf_skb_set_tunnel_key() from William.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62cd2770
    • Jason Wang's avatar
      tuntap: fix possible deadlock when fail to register netdev · 124da8f6
      Jason Wang authored
      Private destructor could be called when register_netdev() fail with
      rtnl lock held. This will lead deadlock in tun_free_netdev() who tries
      to hold rtnl_lock. Fixing this by switching to use spinlock to
      synchronize.
      
      Fixes: 96f84061 ("tun: add eBPF based queue selection method")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      124da8f6