1. 28 Aug, 2020 11 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-sleepable' · 10496f26
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v2->v3:
      - switched to minimal allowlist approach. Essentially that means that syscall
        entry, few btrfs allow_error_inject functions, should_fail_bio(), and two LSM
        hooks: file_mprotect and bprm_committed_creds are the only hooks that allow
        attaching of sleepable BPF programs. When comprehensive analysis of LSM hooks
        will be done this allowlist will be extended.
      - added patch 1 that fixes prototypes of two mm functions to reliably work with
        error injection. It's also necessary for resolve_btfids tool to recognize
        these two funcs, but that's secondary.
      
      v1->v2:
      - split fmod_ret fix into separate patch
      - added denylist
      
      v1:
      This patch set introduces the minimal viable support for sleepable bpf programs.
      In this patch only fentry/fexit/fmod_ret and lsm progs can be sleepable.
      Only array and pre-allocated hash and lru maps allowed.
      
      Here is 'perf report' difference of sleepable vs non-sleepable:
         3.86%  bench     [k] __srcu_read_unlock
         3.22%  bench     [k] __srcu_read_lock
         0.92%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.50%  bench     [k] bpf_trampoline_10297
         0.26%  bench     [k] __bpf_prog_exit_sleepable
         0.21%  bench     [k] __bpf_prog_enter_sleepable
      vs
         0.88%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
         0.84%  bench     [k] bpf_trampoline_10297
         0.13%  bench     [k] __bpf_prog_enter
         0.12%  bench     [k] __bpf_prog_exit
      vs
         0.79%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.72%  bench     [k] bpf_trampoline_10381
         0.31%  bench     [k] __bpf_prog_exit_sleepable
         0.29%  bench     [k] __bpf_prog_enter_sleepable
      
      Sleepable vs non-sleepable program invocation overhead is only marginally higher
      due to rcu_trace. srcu approach is much slower.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      10496f26
    • Alexei Starovoitov's avatar
      selftests/bpf: Add sleepable tests · e68a1445
      Alexei Starovoitov authored
      Modify few tests to sanity test sleepable bpf functionality.
      
      Running 'bench trig-fentry-sleep' vs 'bench trig-fentry' and 'perf report':
      sleepable with SRCU:
         3.86%  bench     [k] __srcu_read_unlock
         3.22%  bench     [k] __srcu_read_lock
         0.92%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.50%  bench     [k] bpf_trampoline_10297
         0.26%  bench     [k] __bpf_prog_exit_sleepable
         0.21%  bench     [k] __bpf_prog_enter_sleepable
      
      sleepable with RCU_TRACE:
         0.79%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
         0.72%  bench     [k] bpf_trampoline_10381
         0.31%  bench     [k] __bpf_prog_exit_sleepable
         0.29%  bench     [k] __bpf_prog_enter_sleepable
      
      non-sleepable with RCU:
         0.88%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
         0.84%  bench     [k] bpf_trampoline_10297
         0.13%  bench     [k] __bpf_prog_enter
         0.12%  bench     [k] __bpf_prog_exit
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarKP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-6-alexei.starovoitov@gmail.com
      e68a1445
    • Alexei Starovoitov's avatar
      libbpf: Support sleepable progs · 2b288740
      Alexei Starovoitov authored
      Pass request to load program as sleepable via ".s" suffix in the section name.
      If it happens in the future that all map types and helpers are allowed with
      BPF_F_SLEEPABLE flag "fmod_ret/" and "lsm/" can be aliased to "fmod_ret.s/" and
      "lsm.s/" to make all lsm and fmod_ret programs sleepable by default. The fentry
      and fexit programs would always need to have sleepable vs non-sleepable
      distinction, since not all fentry/fexit progs will be attached to sleepable
      kernel functions.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarKP Singh <kpsingh@google.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-5-alexei.starovoitov@gmail.com
      2b288740
    • Alexei Starovoitov's avatar
      bpf: Add bpf_copy_from_user() helper. · 07be4c4a
      Alexei Starovoitov authored
      Sleepable BPF programs can now use copy_from_user() to access user memory.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarKP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
      07be4c4a
    • Alexei Starovoitov's avatar
      bpf: Introduce sleepable BPF programs · 1e6c62a8
      Alexei Starovoitov authored
      Introduce sleepable BPF programs that can request such property for themselves
      via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
      to use helpers like bpf_copy_from_user() that might sleep. At present only
      fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
      when they are attached to kernel functions that are known to allow sleeping.
      
      The non-sleepable programs are relying on implicit rcu_read_lock() and
      migrate_disable() to protect life time of programs, maps that they use and
      per-cpu kernel structures used to pass info between bpf programs and the
      kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
      migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
      should not be enclosed in migrate_disable() as well. Therefore
      rcu_read_lock_trace is used to protect the life time of sleepable progs.
      
      There are many networking and tracing program types. In many cases the
      'struct bpf_prog *' pointer itself is rcu protected within some other kernel
      data structure and the kernel code is using rcu_dereference() to load that
      program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
      Instead sleepable bpf programs are allowed with bpf trampoline only. The
      program pointers are hard-coded into generated assembly of bpf trampoline and
      synchronize_rcu_tasks_trace() is used to protect the life time of the program.
      The same trampoline can hold both sleepable and non-sleepable progs.
      
      When rcu_read_lock_trace is held it means that some sleepable bpf program is
      running from bpf trampoline. Those programs can use bpf arrays and preallocated
      hash/lru maps. These map types are waiting on programs to complete via
      synchronize_rcu_tasks_trace();
      
      Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
      synchronize_rcu_tasks() to wait for sleepable progs to finish and for
      trampoline assembly to finish.
      
      This is the first step of introducing sleepable progs. Eventually dynamically
      allocated hash maps can be allowed and networking program types can become
      sleepable too.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarKP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
      1e6c62a8
    • Alexei Starovoitov's avatar
      mm/error_inject: Fix allow_error_inject function signatures. · 76cd6173
      Alexei Starovoitov authored
      'static' and 'static noinline' function attributes make no guarantees that
      gcc/clang won't optimize them. The compiler may decide to inline 'static'
      function and in such case ALLOW_ERROR_INJECT becomes meaningless. The compiler
      could have inlined __add_to_page_cache_locked() in one callsite and didn't
      inline in another. In such case injecting errors into it would cause
      unpredictable behavior. It's worse with 'static noinline' which won't be
      inlined, but it still can be optimized. Like the compiler may decide to remove
      one argument or constant propagate the value depending on the callsite.
      
      To avoid such issues make sure that these functions are global noinline.
      
      Fixes: af3b8544 ("mm/page_alloc.c: allow error injection")
      Fixes: cfcbfb13 ("mm/filemap.c: enable error injection at add_to_page_cache()")
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-2-alexei.starovoitov@gmail.com
      76cd6173
    • Martin KaFai Lau's avatar
      bpf: selftests: Add test for different inner map size · d557ea39
      Martin KaFai Lau authored
      This patch tests the inner map size can be different
      for reuseport_sockarray but has to be the same for
      arraymap.  A new subtest "diff_size" is added for this.
      
      The existing test is moved to a subtest "lookup_update".
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200828011819.1970825-1-kafai@fb.com
      d557ea39
    • Martin KaFai Lau's avatar
      bpf: Relax max_entries check for most of the inner map types · 134fede4
      Martin KaFai Lau authored
      Most of the maps do not use max_entries during verification time.
      Thus, those map_meta_equal() do not need to enforce max_entries
      when it is inserted as an inner map during runtime.  The max_entries
      check is removed from the default implementation bpf_map_meta_equal().
      
      The prog_array_map and xsk_map are exception.  Its map_gen_lookup
      uses max_entries to generate inline lookup code.  Thus, they will
      implement its own map_meta_equal() to enforce max_entries.
      Since there are only two cases now, the max_entries check
      is not refactored and stays in its own .c file.
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com
      134fede4
    • Martin KaFai Lau's avatar
      bpf: Add map_meta_equal map ops · f4d05259
      Martin KaFai Lau authored
      Some properties of the inner map is used in the verification time.
      When an inner map is inserted to an outer map at runtime,
      bpf_map_meta_equal() is currently used to ensure those properties
      of the inserting inner map stays the same as the verification
      time.
      
      In particular, the current bpf_map_meta_equal() checks max_entries which
      turns out to be too restrictive for most of the maps which do not use
      max_entries during the verification time.  It limits the use case that
      wants to replace a smaller inner map with a larger inner map.  There are
      some maps do use max_entries during verification though.  For example,
      the map_gen_lookup in array_map_ops uses the max_entries to generate
      the inline lookup code.
      
      To accommodate differences between maps, the map_meta_equal is added
      to bpf_map_ops.  Each map-type can decide what to check when its
      map is used as an inner map during runtime.
      
      Also, some map types cannot be used as an inner map and they are
      currently black listed in bpf_map_meta_alloc() in map_in_map.c.
      It is not unusual that the new map types may not aware that such
      blacklist exists.  This patch enforces an explicit opt-in
      and only allows a map to be used as an inner map if it has
      implemented the map_meta_equal ops.  It is based on the
      discussion in [1].
      
      All maps that support inner map has its map_meta_equal points
      to bpf_map_meta_equal in this patch.  A later patch will
      relax the max_entries check for most maps.  bpf_types.h
      counts 28 map types.  This patch adds 23 ".map_meta_equal"
      by using coccinelle.  -5 for
      	BPF_MAP_TYPE_PROG_ARRAY
      	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
      	BPF_MAP_TYPE_STRUCT_OPS
      	BPF_MAP_TYPE_ARRAY_OF_MAPS
      	BPF_MAP_TYPE_HASH_OF_MAPS
      
      The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
      is moved such that the same error is returned.
      
      [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
      f4d05259
    • Yonghong Song's avatar
      bpf: Make bpf_link_info.iter similar to bpf_iter_link_info · b0c9eb37
      Yonghong Song authored
      bpf_link_info.iter is used by link_query to return bpf_iter_link_info
      to user space. Fields may be different, e.g., map_fd vs. map_id, so
      we cannot reuse the exact structure. But make them similar, e.g.,
      
        struct bpf_link_info {
           /* common fields */
           union {
      	struct { ... } raw_tracepoint;
      	struct { ... } tracing;
      	...
      	struct {
      	    /* common fields for iter */
      	    union {
      		struct {
      		    __u32 map_id;
      		} map;
      		/* other structs for other targets */
      	    };
      	};
          };
       };
      
      so the structure is extensible the same way as bpf_iter_link_info.
      
      Fixes: 6b0a249a ("bpf: Implement link_query for bpf iterators")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200828051922.758950-1-yhs@fb.com
      b0c9eb37
    • Jesper Dangaard Brouer's avatar
      tools, bpf/build: Cleanup feature files on make clean · 661b37cd
      Jesper Dangaard Brouer authored
      The system for "Auto-detecting system features" located under
      tools/build/ are (currently) used by perf, libbpf and bpftool. It can
      contain stalled feature detection files, which are not cleaned up by
      libbpf and bpftool on make clean (side-note: perf tool is correct).
      
      Fix this by making the users invoke the make clean target.
      
      Some details about the changes. The libbpf Makefile already had a
      clean-config target (which seems to be copy-pasted from perf), but this
      target was not "connected" (a make dependency) to clean target. Choose
      not to rename target as someone might be using it. Did change the output
      from "CLEAN config" to "CLEAN feature-detect", to make it more clear
      what happens.
      
      This is related to the complaint and troubleshooting in the following
      link: https://lore.kernel.org/lkml/20200818122007.2d1cfe2d@carbon/Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/lkml/20200818122007.2d1cfe2d@carbon/
      Link: https://lore.kernel.org/bpf/159851841661.1072907.13770213104521805592.stgit@firesoul
      661b37cd
  2. 27 Aug, 2020 3 commits
    • Andrii Nakryiko's avatar
      libbpf: Fix compilation warnings for 64-bit printf args · 2e80be60
      Andrii Nakryiko authored
      Fix compilation warnings due to __u64 defined differently as `unsigned long`
      or `unsigned long long` on different architectures (e.g., ppc64le differs from
      x86-64). Also cast one argument to size_t to fix printf warning of similar
      nature.
      
      Fixes: eacaaed7 ("libbpf: Implement enum value-based CO-RE relocations")
      Fixes: 50e09460 ("libbpf: Skip well-known ELF sections when iterating ELF")
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200827041109.3613090-1-andriin@fb.com
      2e80be60
    • Yonghong Song's avatar
      selftests/bpf: Add verifier tests for xor operation · f5493c51
      Yonghong Song authored
      Added some test_verifier bounds check test cases for
      xor operations.
        $ ./test_verifier
        ...
        #78/u bounds check for reg = 0, reg xor 1 OK
        #78/p bounds check for reg = 0, reg xor 1 OK
        #79/u bounds check for reg32 = 0, reg32 xor 1 OK
        #79/p bounds check for reg32 = 0, reg32 xor 1 OK
        #80/u bounds check for reg = 2, reg xor 3 OK
        #80/p bounds check for reg = 2, reg xor 3 OK
        #81/u bounds check for reg = any, reg xor 3 OK
        #81/p bounds check for reg = any, reg xor 3 OK
        #82/u bounds check for reg32 = any, reg32 xor 3 OK
        #82/p bounds check for reg32 = any, reg32 xor 3 OK
        #83/u bounds check for reg > 0, reg xor 3 OK
        #83/p bounds check for reg > 0, reg xor 3 OK
        #84/u bounds check for reg32 > 0, reg32 xor 3 OK
        #84/p bounds check for reg32 > 0, reg32 xor 3 OK
        ...
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200825064609.2018077-1-yhs@fb.com
      f5493c51
    • Yonghong Song's avatar
      bpf: Fix a verifier failure with xor · 2921c90d
      Yonghong Song authored
      bpf selftest test_progs/test_sk_assign failed with llvm 11 and llvm 12.
      Compared to llvm 10, llvm 11 and 12 generates xor instruction which
      is not handled properly in verifier. The following illustrates the
      problem:
      
        16: (b4) w5 = 0
        17: ... R5_w=inv0 ...
        ...
        132: (a4) w5 ^= 1
        133: ... R5_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ...
        ...
        37: (bc) w8 = w5
        38: ... R5=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
                R8_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ...
        ...
        41: (bc) w3 = w8
        42: ... R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ...
        45: (56) if w3 != 0x0 goto pc+1
         ... R3_w=inv0 ...
        46: (b7) r1 = 34
        47: R1_w=inv34 R7=pkt(id=0,off=26,r=38,imm=0)
        47: (0f) r7 += r1
        48: R1_w=invP34 R3_w=inv0 R7_w=pkt(id=0,off=60,r=38,imm=0)
        48: (b4) w9 = 0
        49: R1_w=invP34 R3_w=inv0 R7_w=pkt(id=0,off=60,r=38,imm=0)
        49: (69) r1 = *(u16 *)(r7 +0)
        invalid access to packet, off=60 size=2, R7(id=0,off=60,r=38)
        R7 offset is outside of the packet
      
      At above insn 132, w5 = 0, but after w5 ^= 1, we give a really conservative
      value of w5. At insn 45, in reality the condition should be always false.
      But due to conservative value for w3, the verifier evaluates it could be
      true and this later leads to verifier failure complaining potential
      packet out-of-bound access.
      
      This patch implemented proper XOR support in verifier.
      In the above example, we have:
        132: R5=invP0
        132: (a4) w5 ^= 1
        133: R5_w=invP1
        ...
        37: (bc) w8 = w5
        ...
        41: (bc) w3 = w8
        42: R3_w=invP1
        ...
        45: (56) if w3 != 0x0 goto pc+1
        47: R3_w=invP1
        ...
        processed 353 insns ...
      and the verifier can verify the program successfully.
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200825064608.2017937-1-yhs@fb.com
      2921c90d
  3. 26 Aug, 2020 8 commits
  4. 25 Aug, 2020 18 commits