1. 13 May, 2023 4 commits
  2. 12 May, 2023 3 commits
    • Andrii Nakryiko's avatar
      libbpf: fix offsetof() and container_of() to work with CO-RE · bdeeed34
      Andrii Nakryiko authored
      It seems like __builtin_offset() doesn't preserve CO-RE field
      relocations properly. So if offsetof() macro is defined through
      __builtin_offset(), CO-RE-enabled BPF code using container_of() will be
      subtly and silently broken.
      
      To avoid this problem, redefine offsetof() and container_of() in the
      form that works with CO-RE relocations more reliably.
      
      Fixes: 5fbc2208 ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h")
      Reported-by: default avatarLennart Poettering <lennart@poettering.net>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230509065502.2306180-1-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      bdeeed34
    • Martin KaFai Lau's avatar
      bpf: Address KCSAN report on bpf_lru_list · ee9fd0ac
      Martin KaFai Lau authored
      KCSAN reported a data-race when accessing node->ref.
      Although node->ref does not have to be accurate,
      take this chance to use a more common READ_ONCE() and WRITE_ONCE()
      pattern instead of data_race().
      
      There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref().
      This patch also adds bpf_lru_node_clear_ref() to do the
      WRITE_ONCE(node->ref, 0) also.
      
      ==================================================================
      BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem
      
      write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
      __bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
      __bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
      __bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
      bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
      bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
      bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
      prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
      __htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
      bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
      bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
      generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
      bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
      __sys_bpf+0x338/0x810
      __do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
      __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
      bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
      __htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
      bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
      bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
      generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
      bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
      __sys_bpf+0x338/0x810
      __do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
      __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x01 -> 0x00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
      ==================================================================
      
      Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.devSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ee9fd0ac
    • Alan Maguire's avatar
      bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25 · 7b99f759
      Alan Maguire authored
      v1.25 of pahole supports filtering out functions with multiple inconsistent
      function prototypes or optimized-out parameters from the BTF representation.
      These present problems because there is no additional info in BTF saying which
      inconsistent prototype matches which function instance to help guide attachment,
      and functions with optimized-out parameters can lead to incorrect assumptions
      about register contents.
      
      So for now, filter out such functions while adding BTF representations for
      functions that have "."-suffixes (foo.isra.0) but not optimized-out parameters.
      This patch assumes that below linked changes land in pahole for v1.25.
      
      Issues with pahole filtering being too aggressive in removing functions
      appear to be resolved now, but CI and further testing will confirm.
      Signed-off-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20230510130241.1696561-1-alan.maguire@oracle.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7b99f759
  3. 06 May, 2023 9 commits
  4. 05 May, 2023 14 commits
    • Pengcheng Yang's avatar
      samples/bpf: Fix buffer overflow in tcp_basertt · f4dea968
      Pengcheng Yang authored
      Using sizeof(nv) or strlen(nv)+1 is correct.
      
      Fixes: c890063e ("bpf: sample BPF_SOCKET_OPS_BASE_RTT program")
      Signed-off-by: default avatarPengcheng Yang <yangpc@wangsu.com>
      Link: https://lore.kernel.org/r/1683276658-2860-1-git-send-email-yangpc@wangsu.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f4dea968
    • Will Hawkins's avatar
      bpf, docs: Update llvm_relocs.rst with typo fixes · 69535186
      Will Hawkins authored
      Correct a few typographical errors and fix some mistakes in examples.
      Signed-off-by: default avatarWill Hawkins <hawkinsw@obs.cr>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230428023015.1698072-2-hawkinsw@obs.crSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      69535186
    • Alexei Starovoitov's avatar
      Merge branch 'Add precision propagation for subprogs and callbacks' · fbc0b025
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      As more and more real-world BPF programs become more complex
      and increasingly use subprograms (both static and global), scalar precision
      tracking and its (previously weak) support for BPF subprograms (and callbacks
      as a special case of that) is becoming more and more of an issue and
      limitation. Couple that with increasing reliance on state equivalence (BPF
      open-coded iterators have a hard requirement for state equivalence to converge
      and successfully validate loops), and it becomes pretty critical to address
      this limitation and make precision tracking universally supported for BPF
      programs of any complexity and composition.
      
      This patch set teaches BPF verifier to support SCALAR precision
      backpropagation across multiple frames (for subprogram calls and callback
      simulations) and addresses most practical situations (SCALAR stack
      loads/stores using registers other than r10 being the last remaining
      limitation, though thankfully rarely used in practice).
      
      Main logic is explained in details in patch #8. The rest are preliminary
      preparations, refactorings, clean ups, and fixes. See respective patches for
      details.
      
      Patch #8 has also veristat comparison of results for selftests, Cilium, and
      some of Meta production BPF programs before and after these changes.
      
      v2->v3:
        - drop bitcnt and ifs from bt_xxx() helpers (Alexei);
      v1->v2:
        - addressed review feedback form Alexei, adjusted commit messages, comments,
          added verbose(), WARN_ONCE(), etc;
        - re-ran all the tests and veristat on selftests, cilium, and meta-internal
          code: no new changes and no kernel warnings.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fbc0b025
    • Andrii Nakryiko's avatar
      selftests/bpf: revert iter test subprog precision workaround · c91ab90c
      Andrii Nakryiko authored
      Now that precision propagation is supported fully in the presence of
      subprogs, there is no need to work around iter test. Revert original
      workaround.
      
      This reverts be7dbd27 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests").
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c91ab90c
    • Andrii Nakryiko's avatar
      selftests/bpf: add precision propagation tests in the presence of subprogs · 3ef3d217
      Andrii Nakryiko authored
      Add a bunch of tests validating verifier's precision backpropagation
      logic in the presence of subprog calls and/or callback-calling
      helpers/kfuncs.
      
      We validate the following conditions:
        - subprog_result_precise: static subprog r0 result precision handling;
        - global_subprog_result_precise: global subprog r0 precision
          shortcutting, similar to BPF helper handling;
        - callback_result_precise: similarly r0 marking precise for
          callback-calling helpers;
        - parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global:
          propagation of precision for callee-saved registers bypassing
          static/global subprogs;
        - parent_callee_saved_reg_precise_with_callback: same as above, but in
          the presence of callback-calling helper;
        - parent_stack_slot_precise, parent_stack_slot_precise_global:
          similar to above, but instead propagating precision of stack slot
          (spilled SCALAR reg);
        - parent_stack_slot_precise_with_callback: same as above, but in the
          presence of callback-calling helper;
        - subprog_arg_precise: propagation of precision of static subprog's
          input argument back to caller;
        - subprog_spill_into_parent_stack_slot_precise: negative test
          validating that verifier currently can't support backtracking of stack
          access with non-r10 register, we validate that we fallback to
          forcing precision for all SCALARs.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3ef3d217
    • Andrii Nakryiko's avatar
      bpf: support precision propagation in the presence of subprogs · fde2a388
      Andrii Nakryiko authored
      Add support precision backtracking in the presence of subprogram frames in
      jump history.
      
      This means supporting a few different kinds of subprogram invocation
      situations, all requiring a slightly different handling in precision
      backtracking handling logic:
        - static subprogram calls;
        - global subprogram calls;
        - callback-calling helpers/kfuncs.
      
      For each of those we need to handle a few precision propagation cases:
        - what to do with precision of subprog returns (r0);
        - what to do with precision of input arguments;
        - for all of them callee-saved registers in caller function should be
          propagated ignoring subprog/callback part of jump history.
      
      N.B. Async callback-calling helpers (currently only
      bpf_timer_set_callback()) are transparent to all this because they set
      a separate async callback environment and thus callback's history is not
      shared with main program's history. So as far as all the changes in this
      commit goes, such helper is just a regular helper.
      
      Let's look at all these situation in more details. Let's start with
      static subprogram being called, using an exxerpt of a simple main
      program and its static subprog, indenting subprog's frame slightly to
      make everything clear.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10;						fr0: r1, r6
      				22: r0 = r1;		fr0: r6;     fr1: r1
      				23: exit		fr0: r6;     fr1: r0
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6
      15: exit
      
      As can be seen above main function is passing 123 as single argument to
      an identity (`return x;`) subprog. Returned value is used to adjust map
      pointer offset, which forces r0 to be marked as precise. Then
      instruction #14 does the same for callee-saved r6, which will have to be
      backtracked all the way to instruction #9. For brevity, precision sets
      for instruction #13 and #14 are combined in the diagram above.
      
      First, for subprog calls, r0 returned from subprog (in frame 0) has to
      go into subprog's frame 1, and should be cleared from frame 0. So we go
      back into subprog's frame knowing we need to mark r0 precise. We then
      see that insn #22 sets r0 from r1, so now we care about marking r1
      precise.  When we pop up from subprog's frame back into caller at
      insn #11 we keep r1, as it's an argument-passing register, so we eventually
      find `10: r1 = 123;` and satify precision propagation chain for insn #13.
      
      This example demonstrates two sets of rules:
        - r0 returned after subprog call has to be moved into subprog's r0 set;
        - *static* subprog arguments (r1-r5) are moved back to caller precision set.
      
      Let's look at what happens with callee-saved precision propagation. Insn #14
      mark r6 as precise. When we get into subprog's frame, we keep r6 in
      frame 0's precision set *only*. Subprog itself has its own set of
      independent r6-r10 registers and is not affected. When we eventually
      made our way out of subprog frame we keep r6 in precision set until we
      reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is
      perhaps the simplest aspect, it always stays in its original frame.
      
      That's pretty much all we have to do to support precision propagation
      across *static subprog* invocation.
      
      Let's look at what happens when we have global subprog invocation.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10; # global subprog			fr0: r6
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Starting from insn #13, r0 has to be precise. We backtrack all the way
      to insn #11 (call pc+10) and see that subprog is global, so was already
      validated in isolation. As opposed to static subprog, global subprog
      always returns unknown scalar r0, so that satisfies precision
      propagation and we drop r0 from precision set. We are done for insns #13.
      
      Now for insn #14. r6 is in precision set, we backtrack to `call pc+10;`.
      Here we need to recognize that this is effectively both exit and entry
      to global subprog, which means we stay in caller's frame. So we carry on
      with r6 still in precision set, until we satisfy it at insn #9. The only
      hard part with global subprogs is just knowing when it's a global func.
      
      Lastly, callback-calling helpers and kfuncs do simulate subprog calls,
      so jump history will have subprog instructions in between caller
      program's instructions, but the rules of propagating r0 and r1-r5
      differ, because we don't actually directly call callback. We actually
      call helper/kfunc, which at runtime will call subprog, so the only
      difference between normal helper/kfunc handling is that we need to make
      sure to skip callback simulatinog part of jump history.
      Let's look at an example to make this clearer.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       8: r6 = 456;
       9: r1 = 123;						fr0: r6
      10: r2 = &callback;					fr0: r6
      11: call bpf_loop;					fr0: r6
      				22: r0 = r1;		fr0: r6      fr1:
      				23: exit		fr0: r6      fr1:
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit`
      we see that this isn't actually a static subprog call (it's `call
      bpf_loop;` helper call instead). So we clear r0 from precision set.
      
      For callee-saved register, there is no difference: it stays in frame 0's
      precision set, we go through insn #22 and #23, ignoring them until we
      get back to caller frame 0, eventually satisfying precision backtrack
      logic at insn #8 (`r6 = 456;`).
      
      Assuming callback needed to set r0 as precise at insn #23, we'd
      backtrack to insn #22, switching from r0 to r1, and then at the point
      when we pop back to frame 0 at insn #11, we'll clear r1-r5 from
      precision set, as we don't really do a subprog call directly, so there
      is no input argument precision propagation.
      
      That's pretty much it. With these changes, it seems like the only still
      unsupported situation for precision backpropagation is the case when
      program is accessing stack through registers other than r10. This is
      still left as unsupported (though rare) case for now.
      
      As for results. For selftests, few positive changes for bigger programs,
      cls_redirect in dynptr variant benefitting the most:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
      File                                      Program        Insns (A)  Insns (B)  Insns     (DIFF)
      ----------------------------------------  -------------  ---------  ---------  ----------------
      pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002      -58 (-2.82%)
      test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
      test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088    -2532 (-4.11%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)
      
      Cilium progress don't really regress. They don't use subprogs and are
      mostly unaffected, but some other fixes and improvements could have
      changed something. This doesn't appear to be the case:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
      File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
      -------------  ------------------------------  ---------  ---------  ------------
      bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
      bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)
      
      Looking at (somewhat anonymized) Meta production programs, we see mostly
      insignificant variation in number of instructions, with one program
      (syar_bind6_protect6) benefitting the most at -17%.
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
      Program                   Insns (A)  Insns (B)  Insns     (DIFF)
      ------------------------  ---------  ---------  ----------------
      on_request_context_event        597        585      -12 (-2.01%)
      read_async_py_stack           43789      43657     -132 (-0.30%)
      read_sync_py_stack            35041      37599    +2558 (+7.30%)
      rrm_usdt                        946        940       -6 (-0.63%)
      sysarmor_inet6_bind           28863      28249     -614 (-2.13%)
      sysarmor_inet_bind            28845      28240     -605 (-2.10%)
      syar_bind4_protect4          154145     147640    -6505 (-4.22%)
      syar_bind6_protect6          165242     137088  -28154 (-17.04%)
      syar_task_exit_setgid         21289      19720    -1569 (-7.37%)
      syar_task_exit_setuid         21290      19721    -1569 (-7.37%)
      do_uprobe                     19967      19413     -554 (-2.77%)
      tw_twfw_ingress              215877     204833   -11044 (-5.12%)
      tw_twfw_tc_in                215877     204833   -11044 (-5.12%)
      
      But checking duration (wall clock) differences, that is the actual time taken
      by verifier to validate programs, we see a sometimes dramatic improvements, all
      the way to about 16x improvements:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
      Program                                   Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
      ----------------------------------------  -----------------  -----------------  --------------------
      tw_twfw_ingress                                     4488374             272836    -4215538 (-93.92%)
      tw_twfw_tc_in                                       4339111             268175    -4070936 (-93.82%)
      tw_twfw_egress                                      3521816             270751    -3251065 (-92.31%)
      tw_twfw_tc_eg                                       3472878             284294    -3188584 (-91.81%)
      balancer_ingress                                     343119             291391      -51728 (-15.08%)
      syar_bind6_protect6                                   78992              64782      -14210 (-17.99%)
      ttls_tc_ingress                                       11739               8176       -3563 (-30.35%)
      kprobe__security_inode_link                           13864              11341       -2523 (-18.20%)
      read_sync_py_stack                                    21927              19442       -2485 (-11.33%)
      read_async_py_stack                                   30444              28136        -2308 (-7.58%)
      syar_task_exit_setuid                                 10256               8440       -1816 (-17.71%)
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fde2a388
    • Andrii Nakryiko's avatar
      bpf: fix mark_all_scalars_precise use in mark_chain_precision · c50c0b57
      Andrii Nakryiko authored
      When precision backtracking bails out due to some unsupported sequence
      of instructions (e.g., stack access through register other than r10), we
      need to mark all SCALAR registers as precise to be safe. Currently,
      though, we mark SCALARs precise only starting from the state we detected
      unsupported condition, which could be one of the parent states of the
      actual current state. This will leave some registers potentially not
      marked as precise, even though they should. So make sure we start
      marking scalars as precise from current state (env->cur_state).
      
      Further, we don't currently detect a situation when we end up with some
      stack slots marked as needing precision, but we ran out of available
      states to find the instructions that populate those stack slots. This is
      akin the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be
      handled similarly by falling back to marking all SCALARs precise. Add
      this check when we run out of states.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c50c0b57
    • Andrii Nakryiko's avatar
      bpf: fix propagate_precision() logic for inner frames · f655badf
      Andrii Nakryiko authored
      Fix propagate_precision() logic to perform propagation of all necessary
      registers and stack slots across all active frames *in one batch step*.
      
      Doing this for each register/slot in each individual frame is wasteful,
      but the main problem is that backtracking of instruction in any frame
      except the deepest one just doesn't work. This is due to backtracking
      logic relying on jump history, and available jump history always starts
      (or ends, depending how you view it) in current frame. So, if
      prog A (frame #0) called subprog B (frame #1) and we need to propagate
      precision of, say, register R6 (callee-saved) within frame #0, we
      actually don't even know where jump history that corresponds to prog
      A even starts. We'd need to skip subprog part of jump history first to
      be able to do this.
      
      Luckily, with struct backtrack_state and __mark_chain_precision()
      handling bitmasks tracking/propagation across all active frames at the
      same time (added in previous patch), propagate_precision() can be both
      fixed and sped up by setting all the necessary bits across all frames
      and then performing one __mark_chain_precision() pass. This makes it
      unnecessary to skip subprog parts of jump history.
      
      We also improve logging along the way, to clearly specify which
      registers' and slots' precision markings are propagated within which
      frame. Each frame will have dedicated line and all registers and stack
      slots from that frame will be reported in format similar to precision
      backtrack regs/stack logging. E.g.:
      
      frame 1: propagating r1,r2,r3,fp-8,fp-16
      frame 0: propagating r3,r9,fp-120
      
      Fixes: 529409ea ("bpf: propagate precision across all frames, not just the last one")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f655badf
    • Andrii Nakryiko's avatar
      bpf: maintain bitmasks across all active frames in __mark_chain_precision · 1ef22b68
      Andrii Nakryiko authored
      Teach __mark_chain_precision logic to maintain register/stack masks
      across all active frames when going from child state to parent state.
      Currently this should be mostly no-op, as precision backtracking usually
      bails out when encountering subprog entry/exit.
      
      It's not very apparent from the diff due to increased indentation, but
      the logic remains the same, except everything is done on specific `fr`
      frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
      with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
      where frame index is passed explicitly, instead of using current frame
      number.
      
      We also adjust logging to emit affected frame number. And we also add
      better logging of human-readable register and stack slot masks, similar
      to previous patch.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1ef22b68
    • Andrii Nakryiko's avatar
      bpf: improve precision backtrack logging · d9439c21
      Andrii Nakryiko authored
      Add helper to format register and stack masks in more human-readable
      format. Adjust logging a bit during backtrack propagation and especially
      during forcing precision fallback logic to make it clearer what's going
      on (with log_level=2, of course), and also start reporting affected
      frame depth. This is in preparation for having more than one active
      frame later when precision propagation between subprog calls is added.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d9439c21
    • Andrii Nakryiko's avatar
      bpf: encapsulate precision backtracking bookkeeping · 407958a0
      Andrii Nakryiko authored
      Add struct backtrack_state and straightforward API around it to keep
      track of register and stack masks used and maintained during precision
      backtracking process. Having this logic separately allow to keep
      high-level backtracking algorithm cleaner, but also it sets us up to
      cleanly keep track of register and stack masks per frame, allowing (with
      some further logic adjustments) to perform precision backpropagation
      across multiple frames (i.e., subprog calls).
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      407958a0
    • Andrii Nakryiko's avatar
      bpf: mark relevant stack slots scratched for register read instructions · e0bf4622
      Andrii Nakryiko authored
      When handling instructions that read register slots, mark relevant stack
      slots as scratched so that verifier log would contain those slots' states, in
      addition to currently emitted registers with stack slot offsets.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e0bf4622
    • Andrii Nakryiko's avatar
      veristat: add -t flag for adding BPF_F_TEST_STATE_FREQ program flag · 5956f301
      Andrii Nakryiko authored
      Sometimes during debugging it's important that BPF program is loaded
      with BPF_F_TEST_STATE_FREQ flag set to force verifier to do frequent
      state checkpointing. Teach veristat to do this when -t ("test state")
      flag is specified.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5956f301
    • Kenjiro Nakayama's avatar
      libbpf: Fix comment about arc and riscv arch in bpf_tracing.h · 7866fc6a
      Kenjiro Nakayama authored
      To make comments about arc and riscv arch in bpf_tracing.h accurate,
      this patch fixes the comment about arc and adds the comment for riscv.
      Signed-off-by: default avatarKenjiro Nakayama <nakayamakenjiro@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
      7866fc6a
  5. 02 May, 2023 2 commits
  6. 01 May, 2023 4 commits
  7. 28 Apr, 2023 1 commit
  8. 27 Apr, 2023 3 commits
    • Yonghong Song's avatar
      selftests/bpf: Fix selftest test_global_funcs/global_func1 failure with latest clang · f1f5553d
      Yonghong Song authored
      The selftest test_global_funcs/global_func1 failed with the latest clang17.
      The reason is due to upstream ArgumentPromotionPass ([1]),
      which may manipulate static function parameters and cause inlining
      although the funciton is marked as noinline.
      
      The original code:
        static __attribute__ ((noinline))
        int f0(int var, struct __sk_buff *skb)
        {
              return skb->len;
        }
      
        __attribute__ ((noinline))
        int f1(struct __sk_buff *skb)
        {
      	...
              return f0(0, skb) + skb->len;
        }
      
        ...
      
        SEC("tc")
        __failure __msg("combined stack size of 4 calls is 544")
        int global_func1(struct __sk_buff *skb)
        {
              return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
        }
      
      After ArgumentPromotionPass, the code is translated to
        static __attribute__ ((noinline))
        int f0(int var, int skb_len)
        {
              return skb_len;
        }
      
        __attribute__ ((noinline))
        int f1(struct __sk_buff *skb)
        {
      	...
              return f0(0, skb->len) + skb->len;
        }
      
        ...
      
        SEC("tc")
        __failure __msg("combined stack size of 4 calls is 544")
        int global_func1(struct __sk_buff *skb)
        {
              return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
        }
      
      And later llvm InstCombine phase recognized that f0()
      simplify returns the value of the second argument and removed f0()
      completely and the final code looks like:
        __attribute__ ((noinline))
        int f1(struct __sk_buff *skb)
        {
      	...
              return skb->len + skb->len;
        }
      
        ...
      
        SEC("tc")
        __failure __msg("combined stack size of 4 calls is 544")
        int global_func1(struct __sk_buff *skb)
        {
              return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4);
        }
      
      If f0() is not inlined, the verification will fail with stack size
      544 for a particular callchain. With f0() inlined, the maximum
      stack size is 512 which is in the limit.
      
      Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass
      from hoisting the code to its caller, and this fixed the test failure.
      
        [1] https://reviews.llvm.org/D148269Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230425174744.1758515-1-yhs@fb.com
      f1f5553d
    • Xueming Feng's avatar
      bpftool: Dump map id instead of value for map_of_maps types · bf06c939
      Xueming Feng authored
      When using `bpftool map dump` with map_of_maps, it is usually
      more convenient to show the inner map id instead of raw value.
      
      We are changing the plain print behavior to show inner_map_id
      instead of hex value, this would help with quick look up of
      inner map with `bpftool map dump id <inner_map_id>`.
      To avoid disrupting scripted behavior, we will add a new
      `inner_map_id` field to json output instead of replacing value.
      
      plain print:
      ```
      $ bpftool map dump id 138
      
      Without Patch:
      key:
      fc 00 00 00 00 00 00 00  00 00 00 00 00 00 00 05
      27 16 06 00
      value:
      8b 00 00 00
      Found 1 element
      
      With Patch:
      key:
      fc 00 00 00 00 00 00 00  00 00 00 00 00 00 00 05
      27 16 06 00
      inner_map_id:
      139
      Found 1 element
      ```
      
      json print:
      ```
      $ bpftool -p map dump id 567
      
      Without Patch:
      [{
              "key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00"
              ],
              "value": ["0x38","0x02","0x00","0x00"
              ]
          }
      ]
      
      With Patch:
      [{
              "key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00"
              ],
              "value": ["0x38","0x02","0x00","0x00"
              ],
              "inner_map_id": 568
          }
      ]
      ```
      Signed-off-by: default avatarXueming Feng <kuro@kuroa.me>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230427120313.43574-1-kuro@kuroa.meSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      bf06c939
    • Kal Conley's avatar
      xsk: Use pool->dma_pages to check for DMA · 6ec7be9a
      Kal Conley authored
      Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an
      active DMA mapping. pool->dma_pages needs to be read anyway to access
      the map so this compiles to more efficient code.
      Signed-off-by: default avatarKal Conley <kal.conley@dectris.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com
      6ec7be9a