  1. 16 Mar, 2022 1 commit
  2. 02 Mar, 2022 1 commit
  3. 17 Feb, 2022 1 commit
    • bpf: bpf_prog_pack: Set proper size before freeing ro_header · d24d2a2b
      Song Liu authored
      bpf_prog_pack_free() uses header->size to decide whether the header
      should be freed with module_memfree() or the bpf_prog_pack logic.
      However, in the kvmalloc() failure path of bpf_jit_binary_pack_alloc(),
      header->size is not set yet. As a result, bpf_prog_pack_free() may treat
      a slice of a pack as a standalone kvmalloc'd header and call
      module_memfree() on the whole pack. This in turn causes a use-after-free
      for other users of the pack.
      
      Fix this by setting ro_header->size before freeing ro_header.
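
      For illustration, a simplified sketch of the failure-path ordering the fix
      describes (local names assumed; the real code may write the read-only
      header through a dedicated text-copy helper):

        rw_header = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);
        if (!rw_header) {
                /* ro_header->size must be valid before the free below, so
                 * bpf_prog_pack_free() can tell a pack slice apart from a
                 * standalone kvmalloc'd header.
                 */
                ro_header->size = size;
                bpf_prog_pack_free(ro_header);
                return NULL;
        }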
      
      Fixes: 33c98058 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
      Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
      Reported-by: syzbot+ecb1e7e51c52f68f7481@syzkaller.appspotmail.com
      Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220217183001.1876034-1-song@kernel.org
  4. 11 Feb, 2022 1 commit
  5. 09 Feb, 2022 1 commit
  6. 08 Feb, 2022 6 commits
  7. 27 Jan, 2022 1 commit
  8. 21 Jan, 2022 1 commit
  9. 30 Nov, 2021 2 commits
  10. 16 Nov, 2021 1 commit
    • bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33 · ebf7f6f0
      Tiezhu Yang authored
      In the current code, the actual max tail call count is 33, which is greater
      than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent
      with the meaning of MAX_TAIL_CALL_CNT and is thus confusing at first
      glance. The historical evolution can be seen in commit 04fd61ab ("bpf: allow
      bpf programs to tail-call other bpf programs") and commit f9dabe01
      ("bpf: Undo off-by-one in interpreter tail call count limit"). To avoid
      changing existing behavior, the actual limit stays at 33, which is
      reasonable.
      
      After commit 874be05f ("bpf, tests: Add tail call test suite"), a failing
      test case shows up.
      
      On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
       # echo 0 > /proc/sys/net/core/bpf_jit_enable
       # modprobe test_bpf
       # dmesg | grep -w FAIL
       Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
      
      On some archs:
       # echo 1 > /proc/sys/net/core/bpf_jit_enable
       # modprobe test_bpf
       # dmesg | grep -w FAIL
       Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
      
      Although the above failed testcase has been fixed in commit 18935a72
      ("bpf/tests: Fix error in tail call limit tests"), it would still be good
      to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code
      more readable.
      
      The 32-bit x86 JIT was using a limit of 32, so fix its wrong comments and
      raise its limit to 33 tail calls now that the constant MAX_TAIL_CALL_CNT is
      updated. For the mips64 JIT, use "ori" instead of "addiu" as suggested by
      Johan Almbladh. For the riscv JIT, use RV_REG_TCC directly to save one
      register move as suggested by Björn Töpel. The other implementations need
      no functional changes: the current limit of 33 is unchanged, the new value
      of MAX_TAIL_CALL_CNT now reflects the actual max tail call count, and the
      related tail call test cases in the test_bpf module and selftests keep
      working for both the interpreter and the JITs.
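
      For illustration, a simplified sketch of the interpreter-side bound check
      that the constant governs (surrounding tail call handling omitted; the
      local tail_call_cnt counter is assumed):

        if (unlikely(tail_call_cnt >= MAX_TAIL_CALL_CNT))
                goto out;
        tail_call_cnt++;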
      
      Here are the test results on x86_64:
      
       # uname -m
       x86_64
       # echo 0 > /proc/sys/net/core/bpf_jit_enable
       # modprobe test_bpf test_suite=test_tail_calls
       # dmesg | tail -1
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
       # rmmod test_bpf
       # echo 1 > /proc/sys/net/core/bpf_jit_enable
       # modprobe test_bpf test_suite=test_tail_calls
       # dmesg | tail -1
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
       # rmmod test_bpf
       # ./test_progs -t tailcalls
       #142 tailcalls:OK
       Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
  11. 06 Nov, 2021 1 commit
    • bpf: Stop caching subprog index in the bpf_pseudo_func insn · 3990ed4c
      Martin KaFai Lau authored
      This patch fixes an out-of-bounds access issue when jit-ing the
      bpf_pseudo_func insn (i.e. ld_imm64 with src_reg == BPF_PSEUDO_FUNC).

      jit_subprog() currently reuses the subprog index cached in insn[1].imm.
      This subprog index is an index into a few arrays related to subprogs. For
      example, in jit_subprog(), it is an index into the newly allocated
      'struct bpf_prog **func' array.

      The subprog index was cached in insn[1].imm after add_subprog(). However,
      it can become outdated (and too big in this case) if some subprogs are
      completely removed during dead code elimination (in
      adjust_subprog_starts_after_remove). The cached index in insn[1].imm is
      not updated accordingly, causing an out-of-bounds access later in
      jit_subprog().
      
      Unlike the bpf_pseudo_'func' insn, the current bpf_pseudo_'call' insn
      handles DCE properly by calling find_subprog(insn->imm) to figure out the
      index instead of caching the subprog index. The existing
      bpf_adj_branches() adjusts insn->imm whenever an insn is added or removed.

      Instead of having two ways of handling the subprog index, this patch makes
      bpf_pseudo_func work more like bpf_pseudo_call.
      
      The first change is to stop caching the subprog index result in
      insn[1].imm after add_subprog(). The verification process uses
      find_subprog(insn->imm) to figure out the subprog index.

      The second change is in bpf_adj_branches(): have it adjust insn->imm for
      the bpf_pseudo_func insn as well whenever an insn is added or removed.
      
      The third change is in jit_subprog().  Like the bpf_pseudo_call handling,
      bpf_pseudo_func temporarily stores the find_subprog() result
      in insn->off.  It is fine because the prog's insn has been finalized
      at this point.  insn->off will be reset back to 0 later to avoid
      confusing the userspace prog dump tool.
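
      A simplified sketch of the on-demand lookup this switches to (the argument
      form is assumed to mirror the existing bpf_pseudo_call handling):

        /* resolve the subprog index when needed instead of trusting a cached
         * copy in insn[1].imm that dead code elimination may have invalidated
         */
        subprog = find_subprog(env, i + insn->imm + 1);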
      
      Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211106014014.651018-1-kafai@fb.com
  12. 26 Oct, 2021 1 commit
    • bpf: Fix potential race in tail call compatibility check · 54713c85
      Toke Høiland-Jørgensen authored
      Lorenzo noticed that the code testing for program type compatibility of
      tail call maps is potentially racy in that two threads could encounter a
      map with an unset type simultaneously and both return true even though they
      are inserting incompatible programs.
      
      The race window is quite small, but artificially enlarging it by adding a
      usleep_range() inside the check in bpf_prog_array_compatible() makes it
      trivial to trigger from userspace with a program that does, essentially:
      
              map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
              pid = fork();
              if (pid) {
                      key = 0;
                      value = xdp_fd;
              } else {
                      key = 1;
                      value = tc_fd;
              }
              err = bpf_map_update_elem(map_fd, &key, &value, 0);
      
      While the race window is small, it has potentially serious ramifications in
      that triggering it would allow a BPF program to tail call to a program of a
      different type. So let's get rid of it by protecting the update with a
      spinlock. The commit in the Fixes tag is the last commit that touches the
      code in question.
      
      v2:
      - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
      v3:
      - Put lock and the members it protects into an embedded 'owner' struct (Daniel)
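
      A minimal sketch of the locked check described above (struct and field
      names assumed from the description, not the verbatim patch):

        static bool bpf_prog_array_compatible(struct bpf_array *array,
                                              const struct bpf_prog *fp)
        {
                bool ret;

                spin_lock(&array->aux->owner.lock);
                if (!array->aux->owner.type) {
                        /* the first program to be inserted decides the map's
                         * required type/jited state
                         */
                        array->aux->owner.type  = fp->type;
                        array->aux->owner.jited = fp->jited;
                        ret = true;
                } else {
                        ret = array->aux->owner.type  == fp->type &&
                              array->aux->owner.jited == fp->jited;
                }
                spin_unlock(&array->aux->owner.lock);
                return ret;
        }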
      
      Fixes: 3324b584 ("ebpf: misc core cleanup")
      Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
  13. 23 Oct, 2021 1 commit
  14. 06 Oct, 2021 1 commit
    • bpf: Introduce BPF support for kernel module function calls · 2357672c
      Kumar Kartikeya Dwivedi authored
      This change adds support on the kernel side to allow BPF programs to call
      kernel module functions. Userspace will prepare an array of module BTF fds
      that is passed in during BPF_PROG_LOAD using the fd_array parameter. In
      the kernel, the module BTFs are placed in the auxiliary struct for
      bpf_prog, and loaded as needed.
      
      The verifier then uses insn->off to index into the fd_array. insn->off
      0 is reserved for vmlinux BTF (for backwards compat), so userspace must
      use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is an
      array sorted by offset; each offset corresponds to one descriptor, with a
      limit of up to 256 such module BTFs.
      
      We also change the existing kfunc_tab to distinguish each element based
      on the (imm, off) pair, as each such call is now distinct.
      
      Another change is to the check_kfunc_call callback, which now includes a
      struct module * pointer. This will be used in a later patch so that the
      kfunc_id and module pointer are matched for dynamically registered BTF
      sets from loadable modules, and the same kfunc_id in two modules doesn't
      lead to check_kfunc_call incorrectly succeeding. For the duration of
      check_kfunc_call, the reference to struct module exists, as it returns
      the pointer stored in kfunc_btf_tab.
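
      A simplified sketch of how insn->off selects a BTF during verification
      (helper use and fd handling are condensed; names assumed):

        if (insn->off == 0) {
                /* offset 0 is reserved for vmlinux BTF */
                desc_btf = btf_vmlinux;
        } else {
                /* the fd comes from the user-supplied fd_array; the resulting
                 * module BTF is cached in kfunc_btf_tab, keyed by offset
                 */
                desc_btf = btf_get_by_fd(fd_array[insn->off]);
        }
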
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
  15. 28 Sep, 2021 1 commit
  16. 17 Sep, 2021 1 commit
  17. 19 Aug, 2021 1 commit
  18. 16 Aug, 2021 2 commits
    • bpf: Allow to specify user-provided bpf_cookie for BPF perf links · 82e6b1ee
      Andrii Nakryiko authored
      Add the ability for users to specify a custom u64 value (bpf_cookie) when
      creating a BPF link for perf_event-backed BPF programs (kprobe/uprobe,
      perf_event, tracepoints).
      
      This is useful for cases when the same BPF program is used for attaching
      to and processing invocations of different tracepoints/kprobes/uprobes in
      a generic fashion, but such that each invocation can be distinguished from
      the others (e.g., the BPF program can look up additional information
      associated with a specific kernel function without having to rely on
      function IP lookups). This enables new use cases to be implemented simply
      and efficiently that previously were possible only through code generation
      (and thus multiple instances of an almost identical BPF program) or
      compilation at runtime (BCC-style) on target hosts (even more expensive
      resource-wise). For uprobes it is in some cases not even possible to know
      the function IP beforehand (e.g., when attaching to a shared library
      without PID filtering, in which case the base load address is not known
      for the library).
      
      This is done by storing a u64 bpf_cookie in struct bpf_prog_array_item,
      corresponding to each attached and run BPF program. Given cgroup BPF
      programs already use two 8-byte pointers for their needs, and cgroup BPF
      programs don't (yet?) support bpf_cookie, reuse that space through a union
      of the cgroup_storage and the new bpf_cookie fields.
      
      Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
      This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
      program execution code, which luckily is now also split from
      BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
      giving access to this user-provided cookie value from inside a BPF program.
      Generic perf_event BPF programs will access this value from perf_event itself
      through passed in BPF program context.
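
      A sketch of the reuse described above (layout abridged to the relevant
      members):

        struct bpf_prog_array_item {
                struct bpf_prog *prog;
                union {
                        struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
                        u64 bpf_cookie;
                };
        };
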
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
    • bpf: Refactor BPF_PROG_RUN into a function · fb7dd8bc
      Andrii Nakryiko authored
      Turn BPF_PROG_RUN into a proper always-inlined function. No functional or
      performance changes are intended, but it makes it much easier to
      understand how BPF programs actually get executed. It's more obvious what
      types and callbacks are expected. Also, the extra () around input
      parameters can be dropped, as well as the `__` variable prefixes intended
      to avoid naming collisions, which makes the code simpler to read and
      write.
      
      This refactoring also highlighted one extra issue: BPF_PROG_RUN is both a
      macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
      BPF_PROG_RUN into a function causes a naming-conflict compilation error,
      so rename the macro to the lower-case bpf_prog_run(), similar to
      bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers
      of the BPF_PROG_RUN macro are switched to bpf_prog_run() explicitly.
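
      A simplified sketch of the resulting helper (the dispatcher argument shown
      is the existing no-op dispatcher used by non-XDP callers):

        static __always_inline u32 bpf_prog_run(const struct bpf_prog *prog,
                                                const void *ctx)
        {
                return __bpf_prog_run(prog, ctx, bpf_dispatcher_nop_func);
        }
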
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
  19. 10 Aug, 2021 1 commit
  20. 02 Aug, 2021 1 commit
  21. 28 Jul, 2021 1 commit
    • bpf: Introduce BPF nospec instruction for mitigating Spectre v4 · f5e81d11
      Daniel Borkmann authored
      In case of JITs, each of the JIT backends compiles the BPF nospec instruction
      /either/ to a machine instruction which emits a speculation barrier /or/ to
      /no/ machine instruction in case the underlying architecture is not affected
      by Speculative Store Bypass or has different mitigations in place already.
      
      This covers both x86 and (implicitly) arm64: in case of x86, we use the
      'lfence' instruction for mitigation. In case of arm64, we rely on the
      firmware mitigation as controlled via the ssbd kernel parameter. Whenever
      the mitigation is enabled, it works for all of the kernel code with no
      need to provide any additional instructions here (hence only a comment in
      the arm64 JIT). Other archs can follow as needed. The BPF nospec
      instruction specifically targets Spectre v4 since i) we don't use a
      serialization barrier for the Spectre v1 case, and ii) mitigation
      instructions for v1 and v4 might be different on some archs.
      
      The BPF nospec is required for a future commit, where the BPF verifier does
      annotate intermediate BPF programs with speculation barriers.
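
      For illustration, a simplified sketch of what a JIT backend's handling can
      look like, following the x86 case described above (emit macro and feature
      check as used by the x86 JIT are assumed):

        case BPF_ST | BPF_NOSPEC:
                if (boot_cpu_has(X86_FEATURE_XMM2))
                        EMIT3(0x0F, 0xAE, 0xE8); /* lfence */
                break;
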
      Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
      Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
      Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
  22. 09 Jul, 2021 1 commit
    • bpf: Track subprog poke descriptors correctly and fix use-after-free · f263a814
      John Fastabend authored
      Subprograms are calling map_poke_track(), but on program release there is no
      hook to call map_poke_untrack(). However, on program release, the aux memory
      (and poke descriptor table) is freed even though we still have a reference to
      it in the element list of the map aux data. When we run map_poke_run(), we then
      end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():
      
        [...]
        [  402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
        [  402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
        [  402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G          I       5.12.0+ #399
        [  402.824715] Call Trace:
        [  402.824719]  dump_stack+0x93/0xc2
        [  402.824727]  print_address_description.constprop.0+0x1a/0x140
        [  402.824736]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824740]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824744]  kasan_report.cold+0x7c/0xd8
        [  402.824752]  ? prog_array_map_poke_run+0xc2/0x34e
        [  402.824757]  prog_array_map_poke_run+0xc2/0x34e
        [  402.824765]  bpf_fd_array_map_update_elem+0x124/0x1a0
        [...]
      
      The elements concerned are walked as follows:
      
          for (i = 0; i < elem->aux->size_poke_tab; i++) {
                 poke = &elem->aux->poke_tab[i];
          [...]
      
      The access to size_poke_tab is a 4 byte read, verified by checking offsets
      in the KASAN dump:
      
        [  402.825004] The buggy address belongs to the object at ffff8881905a7800
                       which belongs to the cache kmalloc-1k of size 1024
        [  402.825008] The buggy address is located 320 bytes inside of
                       1024-byte region [ffff8881905a7800, ffff8881905a7c00)
      
      The pahole output of bpf_prog_aux:
      
        struct bpf_prog_aux {
          [...]
          /* --- cacheline 5 boundary (320 bytes) --- */
          u32                        size_poke_tab;        /*   320     4 */
          [...]
      
      In general, subprograms do not necessarily manage their own data
      structures. For example, BTF func_info and linfo are just pointers to the
      main program structure. This allows reference counting and cleanup to be
      done on the latter, which simplifies their management a bit. The
      aux->poke_tab struct, however, did not follow this logic. The initial
      proposed fix for this use-after-free bug further embedded poke data
      tracking into the subprogram with proper reference counting. However,
      Daniel and Alexei questioned why we were treating these objects specially;
      I agree, it's unnecessary. The fix here removes the per-subprogram poke
      table allocation and map tracking and instead simply points the
      aux->poke_tab pointer at the main program's poke table. This way, map
      tracking is simplified to the main program and we do not need to manage
      them per subprogram.
      
      This also means bpf_prog_free_deferred(), which unwinds the program
      reference counting and kfrees objects, needs to ensure that we don't try
      to double-free the poke_tab when freeing the subprog structures. This is
      easily solved by NULL'ing the poke_tab pointer. The second detail is to
      ensure that the per-subprogram JIT logic only does fixups on poke_tab[]
      entries it owns. To do this, we add a pointer in the poke structure that
      points at the subprogram value, so JITs can easily check, while walking
      the poke_tab structure, whether the current entry belongs to the current
      program. The aux pointer is stable and therefore suitable for such a
      comparison. On the jit_subprogs() error path, we omit cleaning up the
      poke->aux field because these are only ever referenced from the JIT side,
      but on error we will never make it to the JIT, so it's fine to leave them
      dangling. Removing these pointers would complicate the error path for no
      reason. However, we do need to untrack all poke descriptors from the main
      program, as otherwise they could race with the freeing of JIT memory from
      the subprograms. Lastly, a748c697 ("bpf: propagate poke descriptors to
      subprograms") had an off-by-one on the subprogram instruction index range
      check, as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
      However, subprog_end is the next subprogram's start instruction.
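
      A minimal sketch of the corrected ownership test (half-open range, since
      subprog_end is the next subprogram's start; the flag name is a
      placeholder):

        if (insn_idx >= subprog_start && insn_idx < subprog_end)
                poke_owned_by_subprog = true;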
      
      Fixes: a748c697 ("bpf: propagate poke descriptors to subprograms")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
  23. 17 Jun, 2021 1 commit
    • bpf: Fix up register-based shifts in interpreter to silence KUBSAN · 28131e9d
      Daniel Borkmann authored
      syzbot reported a shift-out-of-bounds that KUBSAN observed in the
      interpreter:
      
        [...]
        UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
        shift exponent 255 is too large for 64-bit type 'long long unsigned int'
        CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:79 [inline]
         dump_stack+0x141/0x1d7 lib/dump_stack.c:120
         ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
         __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
         ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
         __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
         bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
         bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
         bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
         run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
         packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
         dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
         xmit_one net/core/dev.c:3588 [inline]
         dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
         __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
         __bpf_tx_skb net/core/filter.c:2116 [inline]
         __bpf_redirect_no_mac net/core/filter.c:2141 [inline]
         __bpf_redirect+0x548/0xc80 net/core/filter.c:2164
         ____bpf_clone_redirect net/core/filter.c:2448 [inline]
         bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
         ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
         __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
         bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
         bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
         bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
         bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
         __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
         do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        [...]
      
      Generally speaking, KUBSAN reports from the kernel should be fixed.
      However, in case of BPF, this particular report caused concerns since
      the large shift is not wrong from BPF point of view, just undefined.
      In the verifier, K-based shifts that are >= {64,32} (depending on the
      bitwidth of the instruction) are already rejected. The register-based
      cases were not, given their content might not be known at verification
      time. Ideas such as a verifier instruction rewrite with an additional AND
      instruction for the source register were brought up, but regularly
      rejected due to the additional runtime overhead they incur.
      
      As Edward Cree rightly put it:
      
        Shifts by more than insn bitness are legal in the BPF ISA; they are
        implementation-defined behaviour [of the underlying architecture],
        rather than UB, and have been made legal for performance reasons.
        Each of the JIT backends compiles the BPF shift operations to machine
        instructions which produce implementation-defined results in such a
        case; the resulting contents of the register may be arbitrary but
        program behaviour as a whole remains defined.
      
        Guard checks in the fast path (i.e. affecting JITted code) will thus
        not be accepted.
      
        The case of division by zero is not truly analogous here, as division
        instructions on many of the JIT-targeted architectures will raise a
        machine exception / fault on division by zero, whereas (to the best
        of my knowledge) none will do so on an out-of-bounds shift.
      
      Given the KUBSAN report only affects the BPF interpreter, but not JITs,
      one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run(). That
      makes the shifts defined, and thus shuts up KUBSAN, and the compiler
      optimizes out the AND on any CPU that interprets the shift amounts modulo
      the width anyway (e.g., confirmed from disassembly that on x86-64 and
      arm64 the generated interpreter code is the same before and after this
      fix).
      
      The BPF interpreter is a slow path and is most likely compiled out anyway,
      as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of BPF
      instructions by the interpreter. Given the main argument was to avoid
      sacrificing performance, the fact that the AND is optimized away by the
      compiler for mainstream archs also helps this solution moving forward.
      Also add a comment on the LSH/RSH/ARSH translation for JIT authors to
      provide guidance when they see the ___bpf_prog_run() interpreter code and
      use it as a model for a new JIT backend.
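
      A simplified sketch of the interpreter-side masking (64-bit and 32-bit
      left shift shown; the other shift flavors follow the same pattern):

        ALU64_LSH_X:
                DST = DST << (SRC & 63);
                CONT;
        ALU_LSH_X:
                DST = (u32) DST << (SRC & 31);
                CONT;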
      
      Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
      Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com>
      Signed-off-by: Eric Biggers <ebiggers@kernel.org>
      Co-developed-by: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
      Cc: Edward Cree <ecree.xilinx@gmail.com>
      Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
      Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
  24. 02 Apr, 2021 1 commit
  25. 27 Mar, 2021 2 commits
    • bpf: Support bpf program calling kernel function · e6ac2450
      Martin KaFai Lau authored
      This patch adds support to the BPF verifier to allow bpf programs to call
      kernel functions directly.
      
      The use case included in this set is to allow bpf-tcp-cc to directly
      call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
      functions have already been used by some kernel tcp-cc implementations.
      
      This set will also allow the bpf-tcp-cc program to directly call the
      kernel tcp-cc implementation. For example, a bpf_dctcp may only want to
      implement its own dctcp_cwnd_event() and reuse other dctcp_*() functions
      directly from the kernel's tcp_dctcp.c instead of reimplementing (or
      copy-and-pasting) them.
      
      The tcp-cc kernel functions mentioned above will be whitelisted for the
      struct_ops bpf-tcp-cc programs to use in a later patch. The whitelisted
      functions are not bound to a fixed ABI contract. They have already been
      used by the existing kernel tcp-cc implementations. If any of them
      changes, both in-tree and out-of-tree kernel tcp-cc implementations have
      to be changed. The same goes for the struct_ops bpf-tcp-cc programs, which
      have to be adjusted accordingly.
      
      This patch is to make the required changes in the bpf verifier.
      
      The first change is in btf.c: it adds a case in
      "btf_check_func_arg_match()". When the passed-in "btf->kernel_btf == true",
      it means matching the verifier regs' states with a kernel function. This
      will handle the PTR_TO_BTF_ID reg. It also maps PTR_TO_SOCK_COMMON,
      PTR_TO_SOCKET, and PTR_TO_TCP_SOCK to their kernel btf_ids.
      
      In the later libbpf patch, the insn calling a kernel function will
      look like:
      
      insn->code == (BPF_JMP | BPF_CALL)
      insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
      insn->imm == func_btf_id /* btf_id of the running kernel */
      
      [ For the future calling function-in-kernel-module support, an array
        of module btf_fds can be passed at the load time and insn->off
        can be used to index into this array. ]
      
      At the early stage of verification, the verifier will collect all kernel
      function calls into "struct bpf_kfunc_desc". Those descriptors are stored
      in "prog->aux->kfunc_tab" and will be available to the JIT. Since this
      "add" operation is similar to the current "add_subprog()" and looks for
      the same insn->code, they are done together in the new
      "add_subprog_and_kfunc()".
      
      In the "do_check()" stage, the new "check_kfunc_call()" is added
      to verify the kernel function call instruction:
      1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
         A new bpf_verifier_ops "check_kfunc_call" is added to do that.
         The bpf-tcp-cc struct_ops program will implement this function in
         a later patch.
      2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
         used as the args of a kernel function.
      3. Mark the regs' type, subreg_def, and zext_dst.
      
      At the later do_misc_fixups() stage, the new fixup_kfunc_call()
      will replace the insn->imm with the function address (relative
      to __bpf_call_base).  If needed, the jit can find the btf_func_model
      by calling the new bpf_jit_find_kfunc_model(prog, insn).
      With the imm set to the function address, "bpftool prog dump xlated"
      will be able to display the kernel function calls the same way as
      it displays other bpf helper calls.
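
      A simplified sketch of that fixup (the resolved kernel address name is
      assumed):

        /* patch the call so the JIT can emit a direct call: imm becomes the
         * kernel function's address relative to __bpf_call_base
         */
        insn->imm = (long)kfunc_addr - (long)__bpf_call_base;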
      
      A gpl_compatible program is required to call kernel functions.

      This feature currently requires a JIT.
      
      The verifier selftests are adjusted because of the changes in
      the verbose log in add_subprog_and_kfunc().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
    • bpf: Simplify freeing logic in linfo and jited_linfo · e16301fb
      Martin KaFai Lau authored
      This patch simplifies the linfo freeing logic by combining
      "bpf_prog_free_jited_linfo()" and "bpf_prog_free_unused_jited_linfo()"
      into the new "bpf_prog_jit_attempt_done()".
      It is prep work for the kernel function call support.  In a later patch,
      freeing the kernel function call descriptors will also be done in the
      "bpf_prog_jit_attempt_done()".
      
      "bpf_prog_free_linfo()" is removed since it is only called by
      "__bpf_prog_put_noref()".  The kvfree() are directly called
      instead.
      
      It also takes this chance to s/kcalloc/kvcalloc/ for the jited_linfo
      allocation.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015130.1544323-1-kafai@fb.com
  26. 17 Mar, 2021 1 commit
    • bpf: Fix fexit trampoline. · e21aa341
      Alexei Starovoitov authored
      The fexit/fmod_ret programs can be attached to kernel functions that can
      sleep. The synchronize_rcu_tasks() will not wait for such tasks to
      complete. In such a case the trampoline image will be freed, and when the
      task wakes up the return IP will point to freed memory, causing a crash.
      Solve this by adding percpu_ref_get/put for the duration of the trampoline
      and separating the trampoline's lifetime from its image's. The "half page"
      optimization has to be removed, since the
      first_half->second_half->first_half transition cannot be guaranteed to
      complete in deterministic time. Every trampoline update becomes a new
      image.
      The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
      call_rcu_tasks. Together they will wait for the original function and
      trampoline asm to complete. The trampoline is patched from nop to jmp to skip
      fexit progs. They are freed independently from the trampoline. The image with
      fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
      will wait for both sleepable and non-sleepable progs to complete.
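
      A simplified sketch of the image pinning (helper and field names follow
      the description and may not be the exact ones):

        /* the generated trampoline brackets its body with enter/exit calls that
         * hold a percpu_ref on the image, so the image cannot be freed while
         * any task is still executing it
         */
        void notrace __bpf_tramp_enter(struct bpf_tramp_image *im)
        {
                percpu_ref_get(&im->pcref);
        }

        void notrace __bpf_tramp_exit(struct bpf_tramp_image *im)
        {
                percpu_ref_put(&im->pcref);
        }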
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
      Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
  27. 05 Mar, 2021 1 commit
  28. 22 Feb, 2021 1 commit
    • bpf: Clear percpu pointers in bpf_prog_clone_free() · 53f523f3
      Cong Wang authored
      Similar to bpf_prog_realloc(), bpf_prog_clone_create() also copies
      the percpu pointers, but the clone still shares them with the original
      prog, so we have to clear these two percpu pointers in
      bpf_prog_clone_free(). Otherwise we would get a double free:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP PTI
       CPU: 13 PID: 8140 Comm: kworker/13:247 Kdump: loaded Tainted: G                W    OE
        5.11.0-rc4.bm.1-amd64+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
       test_bpf: #1 TXA
       Workqueue: events bpf_prog_free_deferred
       RIP: 0010:percpu_ref_get_many.constprop.97+0x42/0xf0
       Code: [...]
       RSP: 0018:ffffa6bce1f9bda0 EFLAGS: 00010002
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000021dfc7b
       RDX: ffffffffae2eeb90 RSI: 867f92637e338da5 RDI: 0000000000000046
       RBP: ffffa6bce1f9bda8 R08: 0000000000000000 R09: 0000000000000001
       R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000280
       R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b5f3ffdedc0
       FS:    0000000000000000(0000) GS:ffff9b5f2fb40000(0000) knlGS:0000000000000000
       CS:    0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000027c36c002 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
          refill_obj_stock+0x5e/0xd0
          free_percpu+0xee/0x550
          __bpf_prog_free+0x4d/0x60
          process_one_work+0x26a/0x590
          worker_thread+0x3c/0x390
          ? process_one_work+0x590/0x590
          kthread+0x130/0x150
          ? kthread_park+0x80/0x80
          ret_from_fork+0x1f/0x30
      
      This bug is 100% reproducible with test_kmod.sh.
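
      A simplified sketch of the fix described above (the two percpu members are
      assumed to be the stats and recursion-prevention pointers referenced in
      the Fixes tags):

        static void bpf_prog_clone_free(struct bpf_prog *fp)
        {
                /* the clone does not own the aux data */
                fp->aux = NULL;
                /* the clone shares these two percpu pointers with the original
                 * prog; clear them so __bpf_prog_free() does not free_percpu()
                 * them a second time
                 */
                fp->stats = NULL;
                fp->active = NULL;
                __bpf_prog_free(fp);
        }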
      
      Fixes: 700d4796 ("bpf: Optimize program stats")
      Fixes: ca06f55b ("bpf: Add per-program recursion prevention mechanism")
      Reported-by: Jiang Wang <jiang.wang@bytedance.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210218001647.71631-1-xiyou.wangcong@gmail.com
  29. 12 Feb, 2021 1 commit
  30. 11 Feb, 2021 2 commits
  31. 15 Jan, 2021 1 commit