1. 05 May, 2023 13 commits
    • bpf, docs: Update llvm_relocs.rst with typo fixes · 69535186
      Will Hawkins authored
      Correct a few typographical errors and fix some mistakes in examples.
      Signed-off-by: Will Hawkins <hawkinsw@obs.cr>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230428023015.1698072-2-hawkinsw@obs.cr
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Add precision propagation for subprogs and callbacks' · fbc0b025
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      As more and more real-world BPF programs become more complex
      and increasingly use subprograms (both static and global), scalar precision
      tracking and its (previously weak) support for BPF subprograms (and callbacks
      as a special case of that) is becoming more and more of an issue and
      limitation. Couple that with increasing reliance on state equivalence (BPF
      open-coded iterators have a hard requirement for state equivalence to converge
      and successfully validate loops), and it becomes pretty critical to address
      this limitation and make precision tracking universally supported for BPF
      programs of any complexity and composition.
      
      This patch set teaches BPF verifier to support SCALAR precision
      backpropagation across multiple frames (for subprogram calls and callback
      simulations) and addresses most practical situations (SCALAR stack
      loads/stores using registers other than r10 being the last remaining
      limitation, though thankfully rarely used in practice).
      
      Main logic is explained in detail in patch #8. The rest are preliminary
      preparations, refactorings, clean ups, and fixes. See respective patches for
      details.
      
      Patch #8 also has a veristat comparison of results for selftests, Cilium, and
      some of Meta production BPF programs before and after these changes.
      
      v2->v3:
        - drop bitcnt and ifs from bt_xxx() helpers (Alexei);
      v1->v2:
        - addressed review feedback from Alexei, adjusted commit messages, comments,
          added verbose(), WARN_ONCE(), etc;
        - re-ran all the tests and veristat on selftests, cilium, and meta-internal
          code: no new changes and no kernel warnings.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: revert iter test subprog precision workaround · c91ab90c
      Andrii Nakryiko authored
      Now that precision propagation is supported fully in the presence of
      subprogs, there is no need to work around iter test. Revert original
      workaround.
      
      This reverts be7dbd27 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests").
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add precision propagation tests in the presence of subprogs · 3ef3d217
      Andrii Nakryiko authored
      Add a bunch of tests validating verifier's precision backpropagation
      logic in the presence of subprog calls and/or callback-calling
      helpers/kfuncs.
      
      We validate the following conditions:
        - subprog_result_precise: static subprog r0 result precision handling;
        - global_subprog_result_precise: global subprog r0 precision
          shortcutting, similar to BPF helper handling;
        - callback_result_precise: similarly r0 marking precise for
          callback-calling helpers;
        - parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global:
          propagation of precision for callee-saved registers bypassing
          static/global subprogs;
        - parent_callee_saved_reg_precise_with_callback: same as above, but in
          the presence of callback-calling helper;
        - parent_stack_slot_precise, parent_stack_slot_precise_global:
          similar to above, but instead propagating precision of stack slot
          (spilled SCALAR reg);
        - parent_stack_slot_precise_with_callback: same as above, but in the
          presence of callback-calling helper;
        - subprog_arg_precise: propagation of precision of static subprog's
          input argument back to caller;
        - subprog_spill_into_parent_stack_slot_precise: negative test
          validating that verifier currently can't support backtracking of stack
          access with non-r10 register; we validate that we fall back to
          forcing precision for all SCALARs.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: support precision propagation in the presence of subprogs · fde2a388
      Andrii Nakryiko authored
      Add support for precision backtracking in the presence of subprogram
      frames in jump history.
      
      This means supporting a few different kinds of subprogram invocation
      situations, each requiring slightly different handling in the precision
      backtracking logic:
        - static subprogram calls;
        - global subprogram calls;
        - callback-calling helpers/kfuncs.
      
      For each of those we need to handle a few precision propagation cases:
        - what to do with precision of subprog returns (r0);
        - what to do with precision of input arguments;
        - for all of them callee-saved registers in caller function should be
          propagated ignoring subprog/callback part of jump history.
      
      N.B. Async callback-calling helpers (currently only
      bpf_timer_set_callback()) are transparent to all this because they set
      a separate async callback environment and thus callback's history is not
      shared with main program's history. So as far as all the changes in this
      commit go, such a helper is just a regular helper.
      
      Let's look at all these situations in more detail. Let's start with
      a static subprogram being called, using an excerpt of a simple main
      program and its static subprog, indenting subprog's frame slightly to
      make everything clear.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10;						fr0: r1, r6
      				22: r0 = r1;		fr0: r6;     fr1: r1
      				23: exit		fr0: r6;     fr1: r0
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6
      15: exit
      
      As can be seen above main function is passing 123 as single argument to
      an identity (`return x;`) subprog. Returned value is used to adjust map
      pointer offset, which forces r0 to be marked as precise. Then
      instruction #14 does the same for callee-saved r6, which will have to be
      backtracked all the way to instruction #9. For brevity, precision sets
      for instruction #13 and #14 are combined in the diagram above.
      
      First, for subprog calls, r0 returned from subprog (in frame 0) has to
      go into subprog's frame 1, and should be cleared from frame 0. So we go
      back into subprog's frame knowing we need to mark r0 precise. We then
      see that insn #22 sets r0 from r1, so now we care about marking r1
      precise.  When we pop up from subprog's frame back into caller at
      insn #11 we keep r1, as it's an argument-passing register, so we eventually
      find `10: r1 = 123;` and satisfy precision propagation chain for insn #13.
      
      This example demonstrates two sets of rules:
        - r0 returned after subprog call has to be moved into subprog's r0 set;
        - *static* subprog arguments (r1-r5) are moved back to caller precision set.
      
      Let's look at what happens with callee-saved precision propagation. Insn #14
      marks r6 as precise. When we get into subprog's frame, we keep r6 in
      frame 0's precision set *only*. Subprog itself has its own set of
      independent r6-r10 registers and is not affected. When we eventually
      make our way out of subprog frame we keep r6 in precision set until we
      reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is
      perhaps the simplest aspect: it always stays in its original frame.
      
      That's pretty much all we have to do to support precision propagation
      across *static subprog* invocation.
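
The static subprog walk-through above can be replayed with a small toy model. This is only a sketch in Python, not the verifier's C code: the instruction tuples, the op names (`mov_const`, `mov_reg`, `call`, `exit`) and the function `backtrack_static` are assumptions made for illustration. It walks the example's history backwards and records the per-frame precision sets after each instruction, which should match the "precision set" column of the diagram.

```python
# Toy model of precision backtracking across a static subprog call.
# Instruction tuples are (insn_idx, frame, op, *args). This encoding is an
# assumption of the sketch, not the verifier's real jump-history format.
ARG_REGS = {"r1", "r2", "r3", "r4", "r5"}

HISTORY = [
    (9, 0, "mov_const", "r6"),        # r6 = 456
    (10, 0, "mov_const", "r1"),       # r1 = 123
    (11, 0, "call"),                  # call pc+10 (static subprog)
    (22, 1, "mov_reg", "r0", "r1"),   # r0 = r1
    (23, 1, "exit"),
    (12, 0, "mov_const", "r1"),       # r1 = <map_pointer>, treated as a constant load
]


def backtrack_static(history, precise):
    """Walk history newest-to-oldest; return (insn_idx, precision sets) pairs."""
    precise = {fr: set(regs) for fr, regs in precise.items()}
    trace = []
    for idx, frame, op, *args in reversed(history):
        cur = precise.setdefault(frame, set())
        if op == "mov_const":
            # A constant assignment satisfies the precision requirement.
            cur.discard(args[0])
        elif op == "mov_reg":
            dst, src = args
            if dst in cur:
                # Precision requirement moves from destination to source.
                cur.discard(dst)
                cur.add(src)
        elif op == "exit" and frame > 0:
            # Going backwards into the subprog: caller's r0 becomes callee's r0.
            caller = precise.setdefault(frame - 1, set())
            if "r0" in caller:
                caller.discard("r0")
                cur.add("r0")
        elif op == "call":
            # Popping back to the caller: static subprog args r1-r5 move up.
            callee = precise.setdefault(frame + 1, set())
            cur |= callee & ARG_REGS
            callee.clear()
        snapshot = {fr: sorted(regs) for fr, regs in precise.items() if regs}
        trace.append((idx, snapshot))
    return trace


# Start with r0 (needed by insn #13) and r6 (needed by insn #14) in frame 0.
for idx, sets in backtrack_static(HISTORY, {0: {"r0", "r6"}}):
    print(idx, sets)
```

Callee-saved r6 never leaves frame 0's set in this model, which is the "stays in its original frame" rule described above.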
      
      Let's look at what happens when we have global subprog invocation.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       9: r6 = 456;
      10: r1 = 123;						fr0: r6
      11: call pc+10; # global subprog			fr0: r6
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Starting from insn #13, r0 has to be precise. We backtrack all the way
      to insn #11 (call pc+10) and see that subprog is global, so was already
      validated in isolation. As opposed to static subprog, global subprog
      always returns unknown scalar r0, so that satisfies precision
      propagation and we drop r0 from precision set. We are done for insn #13.
      
      Now for insn #14. r6 is in precision set, we backtrack to `call pc+10;`.
      Here we need to recognize that this is effectively both exit and entry
      to global subprog, which means we stay in caller's frame. So we carry on
      with r6 still in precision set, until we satisfy it at insn #9. The only
      hard part with global subprogs is just knowing when it's a global func.
      
      Lastly, callback-calling helpers and kfuncs do simulate subprog calls,
      so jump history will have subprog instructions in between caller
      program's instructions, but the rules of propagating r0 and r1-r5
      differ, because we don't actually directly call callback. We actually
      call helper/kfunc, which at runtime will call subprog, so the only
      difference from normal helper/kfunc handling is that we need to make
      sure to skip callback simulation part of jump history.
      Let's look at an example to make this clearer.
      
      frame 0				frame 1			precision set
      =======				=======			=============
      
       8: r6 = 456;
       9: r1 = 123;						fr0: r6
      10: r2 = &callback;					fr0: r6
      11: call bpf_loop;					fr0: r6
      				22: r0 = r1;		fr0: r6      fr1:
      				23: exit		fr0: r6      fr1:
      12: r1 = <map_pointer>					fr0: r0, r6
      13: r1 += r0;						fr0: r0, r6
      14: r1 += r6;						fr0: r6;
      15: exit
      
      Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit`
      we see that this isn't actually a static subprog call (it's `call
      bpf_loop;` helper call instead). So we clear r0 from precision set.
      
      For callee-saved register, there is no difference: it stays in frame 0's
      precision set, we go through insn #22 and #23, ignoring them until we
      get back to caller frame 0, eventually satisfying precision backtrack
      logic at insn #8 (`r6 = 456;`).
      
      Assuming callback needed to set r0 as precise at insn #23, we'd
      backtrack to insn #22, switching from r0 to r1, and then at the point
      when we pop back to frame 0 at insn #11, we'll clear r1-r5 from
      precision set, as we don't really do a subprog call directly, so there
      is no input argument precision propagation.
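
The boundary rules for the three kinds of invocation described in this commit message (static subprog, global subprog, callback-calling helper) can be put side by side. The following Python sketch is an illustration only: the function names `move_or_drop_r0` and `pop_to_call_insn` and the string `kind` values are hypothetical, and precision sets are modeled as plain sets of register names rather than the verifier's bitmasks.

```python
ARG_REGS = {"r1", "r2", "r3", "r4", "r5"}


def move_or_drop_r0(kind, caller):
    """Backtracking reaches the end of a call: decide what happens to r0.

    Returns (caller_set, callee_set) after applying the rule.
    """
    caller = set(caller)
    callee = set()
    if "r0" in caller:
        caller.discard("r0")
        if kind == "static":
            # Keep chasing r0 inside the static subprog's frame.
            callee.add("r0")
        # "global": r0 is always an unknown scalar, so the chain is satisfied.
        # "callback": r0 comes from the helper/kfunc, not from the callback.
    return caller, callee


def pop_to_call_insn(kind, caller, callee):
    """Backtracking pops from the callee frame back to the call instruction."""
    caller = set(caller)
    if kind == "static":
        # Static subprog arguments propagate back into the caller's set.
        caller |= set(callee) & ARG_REGS
    # "callback": callee's r1-r5 are dropped, there is no direct call.
    # "global": there is no callee frame in jump history at all.
    return caller


print(move_or_drop_r0("static", {"r0", "r6"}))
print(move_or_drop_r0("global", {"r0", "r6"}))
print(pop_to_call_insn("callback", {"r6"}, {"r1"}))
```

In all three cases the callee-saved r6 stays untouched in the caller's set.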
      
      That's pretty much it. With these changes, it seems like the only still
      unsupported situation for precision backpropagation is the case when
      program is accessing stack through registers other than r10. This is
      still left as unsupported (though rare) case for now.
      
      As for results: for selftests, there are a few positive changes for bigger
      programs, with cls_redirect in dynptr variant benefitting the most:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
      File                                      Program        Insns (A)  Insns (B)  Insns     (DIFF)
      ----------------------------------------  -------------  ---------  ---------  ----------------
      pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002      -58 (-2.82%)
      test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
      test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088    -2532 (-4.11%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)
      
      Cilium programs don't really regress. They don't use subprogs and are
      mostly unaffected, but some other fixes and improvements could have
      changed something. This doesn't appear to be the case:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
      File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
      -------------  ------------------------------  ---------  ---------  ------------
      bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
      bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
      bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)
      
      Looking at (somewhat anonymized) Meta production programs, we see mostly
      insignificant variation in number of instructions, with one program
      (syar_bind6_protect6) benefitting the most at -17%.
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
      Program                   Insns (A)  Insns (B)  Insns     (DIFF)
      ------------------------  ---------  ---------  ----------------
      on_request_context_event        597        585      -12 (-2.01%)
      read_async_py_stack           43789      43657     -132 (-0.30%)
      read_sync_py_stack            35041      37599    +2558 (+7.30%)
      rrm_usdt                        946        940       -6 (-0.63%)
      sysarmor_inet6_bind           28863      28249     -614 (-2.13%)
      sysarmor_inet_bind            28845      28240     -605 (-2.10%)
      syar_bind4_protect4          154145     147640    -6505 (-4.22%)
      syar_bind6_protect6          165242     137088  -28154 (-17.04%)
      syar_task_exit_setgid         21289      19720    -1569 (-7.37%)
      syar_task_exit_setuid         21290      19721    -1569 (-7.37%)
      do_uprobe                     19967      19413     -554 (-2.77%)
      tw_twfw_ingress              215877     204833   -11044 (-5.12%)
      tw_twfw_tc_in                215877     204833   -11044 (-5.12%)
      
      But checking duration (wall clock) differences, that is the actual time taken
      by verifier to validate programs, we see sometimes dramatic improvements, all
      the way to about 16x:
      
      [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
      Program                                   Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
      ----------------------------------------  -----------------  -----------------  --------------------
      tw_twfw_ingress                                     4488374             272836    -4215538 (-93.92%)
      tw_twfw_tc_in                                       4339111             268175    -4070936 (-93.82%)
      tw_twfw_egress                                      3521816             270751    -3251065 (-92.31%)
      tw_twfw_tc_eg                                       3472878             284294    -3188584 (-91.81%)
      balancer_ingress                                     343119             291391      -51728 (-15.08%)
      syar_bind6_protect6                                   78992              64782      -14210 (-17.99%)
      ttls_tc_ingress                                       11739               8176       -3563 (-30.35%)
      kprobe__security_inode_link                           13864              11341       -2523 (-18.20%)
      read_sync_py_stack                                    21927              19442       -2485 (-11.33%)
      read_async_py_stack                                   30444              28136        -2308 (-7.58%)
      syar_task_exit_setuid                                 10256               8440       -1816 (-17.71%)
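
The DIFF percentages and the "about 16x" figure can be cross-checked with a few lines of arithmetic. This is a sketch in Python, assuming the DIFF column is computed as (B - A) / A; the helper names `diff_pct` and `speedup` are made up for illustration and are not part of veristat.

```python
def diff_pct(a, b):
    """Relative change from A to B, in percent."""
    return (b - a) / a * 100


def speedup(a, b):
    """How many times smaller B is than A."""
    return a / b


# tw_twfw_ingress verification duration in microseconds, before and after.
before_us, after_us = 4488374, 272836
print(f"{diff_pct(before_us, after_us):.2f}%")
print(f"{speedup(before_us, after_us):.2f}x")

# syar_bind6_protect6 instruction counts, before and after.
print(f"{diff_pct(165242, 137088):.2f}%")
```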
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix mark_all_scalars_precise use in mark_chain_precision · c50c0b57
      Andrii Nakryiko authored
      When precision backtracking bails out due to some unsupported sequence
      of instructions (e.g., stack access through register other than r10), we
      need to mark all SCALAR registers as precise to be safe. Currently,
      though, we mark SCALARs precise only starting from the state we detected
      unsupported condition, which could be one of the parent states of the
      actual current state. This will leave some registers potentially not
      marked as precise, even though they should be. So make sure we start
      marking scalars as precise from current state (env->cur_state).
      
      Further, we don't currently detect a situation when we end up with some
      stack slots marked as needing precision, but we ran out of available
      states to find the instructions that populate those stack slots. This is
      akin to the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be
      handled similarly by falling back to marking all SCALARs precise. Add
      this check when we run out of states.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix propagate_precision() logic for inner frames · f655badf
      Andrii Nakryiko authored
      Fix propagate_precision() logic to perform propagation of all necessary
      registers and stack slots across all active frames *in one batch step*.
      
      Doing this for each register/slot in each individual frame is wasteful,
      but the main problem is that backtracking of instruction in any frame
      except the deepest one just doesn't work. This is due to backtracking
      logic relying on jump history, and available jump history always starts
      (or ends, depending on how you view it) in current frame. So, if
      prog A (frame #0) called subprog B (frame #1) and we need to propagate
      precision of, say, register R6 (callee-saved) within frame #0, we
      actually don't even know where jump history that corresponds to prog
      A even starts. We'd need to skip subprog part of jump history first to
      be able to do this.
      
      Luckily, with struct backtrack_state and __mark_chain_precision()
      handling bitmasks tracking/propagation across all active frames at the
      same time (added in previous patch), propagate_precision() can be both
      fixed and sped up by setting all the necessary bits across all frames
      and then performing one __mark_chain_precision() pass. This makes it
      unnecessary to skip subprog parts of jump history.
      
      We also improve logging along the way, to clearly specify which
      registers' and slots' precision markings are propagated within which
      frame. Each frame will have dedicated line and all registers and stack
      slots from that frame will be reported in format similar to precision
      backtrack regs/stack logging. E.g.:
      
      frame 1: propagating r1,r2,r3,fp-8,fp-16
      frame 0: propagating r3,r9,fp-120
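
To make the log format concrete, here is a small Python sketch that renders per-frame register and stack masks into lines like the two above. It is an illustration, not the kernel's formatting helper: the function names are hypothetical, and the mapping of stack mask bit `i` to `fp-(i + 1) * 8` is an assumption chosen to reproduce the example.

```python
def format_frame(frame, reg_mask, stack_mask):
    """Render one frame's masks as 'frame N: propagating r..,fp-..'."""
    regs = [f"r{i}" for i in range(11) if reg_mask & (1 << i)]
    # Assumption: bit i of the stack mask stands for the slot at fp-(i + 1) * 8.
    slots = [f"fp-{(i + 1) * 8}" for i in range(64) if stack_mask & (1 << i)]
    return f"frame {frame}: propagating " + ",".join(regs + slots)


def format_propagation(masks):
    """masks maps frame index -> (reg_mask, stack_mask); deepest frame first."""
    return [format_frame(fr, *masks[fr]) for fr in sorted(masks, reverse=True)]


masks = {
    0: ((1 << 3) | (1 << 9), 1 << 14),   # r3, r9, fp-120
    1: (0b1110, 0b11),                   # r1, r2, r3, fp-8, fp-16
}
for line in format_propagation(masks):
    print(line)
```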
      
      Fixes: 529409ea ("bpf: propagate precision across all frames, not just the last one")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: maintain bitmasks across all active frames in __mark_chain_precision · 1ef22b68
      Andrii Nakryiko authored
      Teach __mark_chain_precision logic to maintain register/stack masks
      across all active frames when going from child state to parent state.
      Currently this should be mostly no-op, as precision backtracking usually
      bails out when encountering subprog entry/exit.
      
      It's not very apparent from the diff due to increased indentation, but
      the logic remains the same, except everything is done on specific `fr`
      frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced
      with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(),
      where frame index is passed explicitly, instead of using current frame
      number.
      
      We also adjust logging to emit affected frame number. And we also add
      better logging of human-readable register and stack slot masks, similar
      to previous patch.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: improve precision backtrack logging · d9439c21
      Andrii Nakryiko authored
      Add helper to format register and stack masks in more human-readable
      format. Adjust logging a bit during backtrack propagation and especially
      during forcing precision fallback logic to make it clearer what's going
      on (with log_level=2, of course), and also start reporting affected
      frame depth. This is in preparation for having more than one active
      frame later when precision propagation between subprog calls is added.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: encapsulate precision backtracking bookkeeping · 407958a0
      Andrii Nakryiko authored
      Add struct backtrack_state and straightforward API around it to keep
      track of register and stack masks used and maintained during precision
      backtracking process. Having this logic separate allows keeping the
      high-level backtracking algorithm cleaner, but it also sets us up to
      cleanly keep track of register and stack masks per frame, allowing (with
      some further logic adjustments) to perform precision backpropagation
      across multiple frames (i.e., subprog calls).
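
The kind of bookkeeping this commit describes could be pictured roughly as follows. This is a Python sketch of the idea, not the kernel's `struct backtrack_state`: the class layout, the method names and the `MAX_FRAMES` limit are all assumptions made for illustration. It keeps one register mask and one stack-slot mask per frame, with convenience methods that act on the current frame.

```python
class BacktrackState:
    """Per-frame register and stack-slot masks used during backtracking."""

    MAX_FRAMES = 8  # assumed frame limit for this sketch

    def __init__(self, frame=0):
        self.frame = frame                            # current frame index
        self.reg_masks = [0] * self.MAX_FRAMES        # bit i set: r<i> needs precision
        self.stack_masks = [0] * self.MAX_FRAMES      # bit i set: stack slot i needs precision

    # Frame-explicit operations.
    def set_frame_reg(self, frame, reg):
        self.reg_masks[frame] |= 1 << reg

    def clear_frame_reg(self, frame, reg):
        self.reg_masks[frame] &= ~(1 << reg)

    def is_frame_reg_set(self, frame, reg):
        return bool(self.reg_masks[frame] & (1 << reg))

    def set_frame_slot(self, frame, slot):
        self.stack_masks[frame] |= 1 << slot

    def clear_frame_slot(self, frame, slot):
        self.stack_masks[frame] &= ~(1 << slot)

    # Shorthands that operate on the current frame.
    def set_reg(self, reg):
        self.set_frame_reg(self.frame, reg)

    def clear_reg(self, reg):
        self.clear_frame_reg(self.frame, reg)

    def empty(self):
        """True when no register or slot in any frame still needs precision."""
        return not any(self.reg_masks) and not any(self.stack_masks)


bt = BacktrackState(frame=1)
bt.set_reg(1)            # r1 in the current frame (frame 1)
bt.set_frame_reg(0, 6)   # callee-saved r6 in the caller's frame 0
print(bt.reg_masks[:2], bt.empty())
```

Keeping the masks behind a small interface like this is what lets the high-level algorithm stay the same while the tracking grows from one frame to several.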
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: mark relevant stack slots scratched for register read instructions · e0bf4622
      Andrii Nakryiko authored
      When handling instructions that read register slots, mark relevant stack
      slots as scratched so that verifier log would contain those slots' states, in
      addition to currently emitted registers with stack slot offsets.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • veristat: add -t flag for adding BPF_F_TEST_STATE_FREQ program flag · 5956f301
      Andrii Nakryiko authored
      Sometimes during debugging it's important that BPF program is loaded
      with BPF_F_TEST_STATE_FREQ flag set to force verifier to do frequent
      state checkpointing. Teach veristat to do this when -t ("test state")
      flag is specified.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: Fix comment about arc and riscv arch in bpf_tracing.h · 7866fc6a
      Kenjiro Nakayama authored
      To make comments about arc and riscv arch in bpf_tracing.h accurate,
      this patch fixes the comment about arc and adds the comment for riscv.
      Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
  2. 02 May, 2023 2 commits
  3. 01 May, 2023 4 commits
  4. 28 Apr, 2023 1 commit
  5. 27 Apr, 2023 18 commits
  6. 26 Apr, 2023 2 commits
    • Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 6e98b09d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
           default value allows for better BIG TCP performance
      
         - Reduce compound page head access for zero-copy data transfers
      
         - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
           possible
      
         - Threaded NAPI improvements, adding defer skb free support and
           unneeded softirq avoidance
      
         - Address dst_entry reference count scalability issues, via false
           sharing avoidance and optimize refcount tracking
      
         - Add lockless accesses annotation to sk_err[_soft]
      
         - Optimize again the skb struct layout
      
         - Extend the skb drop reasons to make them usable by multiple
           subsystems
      
         - Better const qualifier awareness for socket casts
      
        BPF:
      
         - Add skb and XDP typed dynptrs which give BPF programs more
           ergonomic and less brittle iteration through data and
           variable-sized accesses
      
         - Add a new BPF netfilter program type and minimal support to hook
           BPF programs to netfilter hooks such as prerouting or forward
      
         - Add more precise memory usage reporting for all BPF map types
      
         - Add support for using {FOU,GUE} encap with an ipip device
           operating in collect_md mode and add a set of BPF kfuncs for
           controlling encap params
      
         - Allow BPF programs to detect at load time whether a particular
           kfunc exists or not, and also add support for this in light
           skeleton
      
         - Bigger batch of BPF verifier improvements to prepare for upcoming
           BPF open-coded iterators allowing for less restrictive looping
           capabilities
      
         - Rework RCU enforcement in the verifier, add kptr_rcu and enforce
           BPF programs to NULL-check before passing such pointers into kfunc
      
         - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
           in local storage maps
      
         - Enable RCU semantics for task BPF kptrs and allow referenced kptr
           tasks to be stored in BPF maps
      
         - Add support for refcounted local kptrs to the verifier for allowing
           shared ownership, useful for adding a node to both the BPF list and
           rbtree
      
         - Add BPF verifier support for ST instructions in
           convert_ctx_access() which will help new -mcpu=v4 clang flag to
           start emitting them
      
         - Add ARM32 USDT support to libbpf
      
         - Improve bpftool's visual program dump which produces the control
           flow graph in a DOT format by adding C source inline annotations
      
        Protocols:
      
         - IPv4: Allow adding a 'protocol' tag to an IPv4 address. Such value
           indicates the provenance of the IP address
      
         - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
      
         - Add the handshake upcall mechanism, allowing the user-space to
           implement generic TLS handshake on kernel's behalf
      
         - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
           resilience to node failures
      
         - SCTP: add support for Fair Capacity and Weighted Fair Queueing
           schedulers
      
         - MPTCP: delay first subflow allocation up to its first usage. This
           will allow for later better LSM interaction
      
         - xfrm: Remove inner/outer modes from input/output path. These are
           not needed anymore
      
         - WiFi:
            - reduced neighbor report (RNR) handling for AP mode
            - HW timestamping support
            - support for randomized auth/deauth TA for PASN privacy
            - per-link debugfs for multi-link
            - TC offload support for mac80211 drivers
            - mac80211 mesh fast-xmit and fast-rx support
            - enable Wi-Fi 7 (EHT) mesh support
      
        Netfilter:
      
         - Add nf_tables 'brouting' support, to force a packet to be routed
           instead of being bridged
      
         - Update bridge netfilter and ovs conntrack helpers to handle IPv6
           Jumbo packets properly, i.e. fetch the packet length from
           hop-by-hop extension header. This is needed for BIG TCP support
      
         - The iptables 32bit compat interface isn't compiled in by default
           anymore
      
         - Move ip(6)tables builtin icmp matches to the udptcp one. This has
           the advantage that icmp/icmpv6 match doesn't load the
           iptables/ip6tables modules anymore when iptables-nft is used
      
         - Extended netlink error report for netdevice in flowtables and
           netdev/chains. Allow incrementally adding/deleting devices in netdev
           basechain. Allow creating netdev chain without device
      
        Driver API:
      
         - Remove redundant Device Control Error Reporting Enable, as PCI core
           has already error reporting enabled at enumeration time
      
         - Move Multicast DB netlink handlers to core, allowing devices other
           than bridge to use them
      
         - Allow the page_pool to directly recycle the pages from safely
           localized NAPI
      
         - Implement lockless TX queue stop/wake combo macros, allowing for
           further code de-duplication and sanitization
      
         - Add YNL support for user headers and struct attrs
      
         - Add partial YNL specification for devlink
      
         - Add partial YNL specification for ethtool
      
         - Add tc-mqprio and tc-taprio support for preemptible traffic classes
      
         - Add tx push buf len param to ethtool, specifying the maximum number
           of bytes of a transmitted packet a driver can push directly to the
           underlying device
      
         - Add basic LED support for switch/phy
      
         - Add NAPI documentation, stop relying on external links
      
         - Convert dsa_master_ioctl() to netdev notifier. This is
           preparatory work to make the hardware timestamping layer selectable
           by user space
      
         - Add transceiver support and improve the error messages for CAN-FD
           controllers
      
        New hardware / drivers:
      
         - Ethernet:
            - AMD/Pensando core device support
            - MediaTek MT7981 SoC
            - MediaTek MT7988 SoC
            - Broadcom BCM53134 embedded switch
            - Texas Instruments CPSW9G ethernet switch
            - Qualcomm EMAC3 DWMAC ethernet
            - StarFive JH7110 SoC
            - NXP CBTX ethernet PHY
      
         - WiFi:
            - Apple M1 Pro/Max devices
            - RealTek rtl8710bu/rtl8188gu
            - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
      
         - Bluetooth:
            - Realtek RTL8821CS, RTL8851B, RTL8852BS
            - Mediatek MT7663, MT7922
            - NXP w8997
            - Actions Semi ATS2851
            - QTI WCN6855
            - Marvell 88W8997
      
         - CAN:
            - STMicroelectronics bxcan stm32f429
      
        Drivers:
      
         - Ethernet NICs:
            - Intel (1G, igc):
               - add tracking and reporting of QBV config errors
               - add support for configuring max SDU for each Tx queue
            - Intel (100G, ice):
               - refactor mailbox overflow detection to support Scalable IOV
               - GNSS interface optimization
            - Intel (i40e):
               - support XDP multi-buffer
            - nVidia/Mellanox:
               - add support for Linux bridge multicast offload
               - enable TC offload for egress and ingress MACVLAN over bond
               - add support for VxLAN GBP encap/decap flows offload
               - extend packet offload to fully support libreswan
               - support tunnel mode in mlx5 IPsec packet offload
               - extend XDP multi-buffer support
               - support MACsec VLAN offload
               - add support for dynamic MSI-X vector allocation
               - drop RX page_cache and fully use page_pool
               - implement thermal zone to report NIC temperature
            - Netronome/Corigine:
               - add support for multi-zone conntrack offload
            - Solarflare/Xilinx:
               - support offloading TC VLAN push/pop actions to the MAE
               - support TC decap rules
               - support unicast PTP
      
         - Other NICs:
            - Broadcom (bnxt): enforce software based freq adjustments only on
              shared PHC NIC
            - RealTek (r8169): refactor to address ASPM issues during NAPI poll
            - Micrel (lan8841): add support for PTP_PF_PEROUT
            - Cadence (macb): enable PTP unicast
            - Engleder (tsnep): add XDP socket zero-copy support
            - virtio-net: implement exact header length guest feature
            - veth: add page_pool support for page recycling
            - vxlan: add MDB data path support
            - gve: add XDP support for GQI-QPL format
            - geneve: accept every ethertype
            - macvlan: allow some packets to bypass broadcast queue
            - mana: add support for jumbo frame
      
         - Ethernet high-speed switches:
            - Microchip (sparx5): Add support for TC flower templates
      
         - Ethernet embedded switches:
            - Broadcom (b53):
               - configure 6318 and 63268 RGMII ports
            - Marvell (mv88e6xxx):
               - faster C45 bus scan
            - Microchip:
               - lan966x:
                  - add support for IS1 VCAP
                  - better TX/RX from/to CPU performance
               - ksz9477: add ETS Qdisc support
               - ksz8: enhance static MAC table operations and error handling
               - sama7g5: add PTP capability
            - NXP (ocelot):
               - add support for external ports
               - add support for preemptible traffic classes
            - Texas Instruments:
               - add CPSWxG SGMII support for J7200 and J721E
      
         - Intel WiFi (iwlwifi):
            - preparation for Wi-Fi 7 EHT and multi-link support
            - EHT (Wi-Fi 7) sniffer support
            - hardware timestamping support for some devices/firmwares
            - TX beacon protection on newer hardware
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - MU-MIMO parameters support
            - ack signal support for management packets
      
         - RealTek WiFi (rtw88):
            - SDIO bus support
            - better support for some SDIO devices (e.g. MAC address from
              efuse)
      
         - RealTek WiFi (rtw89):
            - HW scan support for 8852b
            - better support for 6 GHz scanning
            - support for various newer firmware APIs
            - framework firmware backwards compatibility
      
         - MediaTek WiFi (mt76):
            - P2P support
            - mesh A-MSDU support
            - EHT (Wi-Fi 7) support
            - coredump support"
      
      * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
        net: phy: hide the PHYLIB_LEDS knob
        net: phy: marvell-88x2222: remove unnecessary (void*) conversions
        tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
        net: amd: Fix link leak when verifying config failed
        net: phy: marvell: Fix inconsistent indenting in led_blink_set
        lan966x: Don't use xdp_frame when action is XDP_TX
        tsnep: Add XDP socket zero-copy TX support
        tsnep: Add XDP socket zero-copy RX support
        tsnep: Move skb receive action to separate function
        tsnep: Add functions for queue enable/disable
        tsnep: Rework TX/RX queue initialization
        tsnep: Replace modulo operation with mask
        net: phy: dp83867: Add led_brightness_set support
        net: phy: Fix reading LED reg property
        drivers: nfc: nfcsim: remove return value check of `dev_dir`
        net: phy: dp83867: Remove unnecessary (void*) conversions
        net: ethtool: coalesce: try to make user settings stick twice
        net: mana: Check if netdev/napi_alloc_frag returns single page
        net: mana: Rename mana_refill_rxoob and remove some empty lines
        net: veth: add page_pool stats
        ...
      6e98b09d
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · b68ee1c6
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
        mpi3mr, hisi_sas, arcmsr).
      
        The major core change is the constification of the host templates
        (which touches everything) along with other minor fixups and clean
        ups"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
        scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
        scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
        scsi: cxlflash: s/semahpore/semaphore/
        scsi: lpfc: Silence an incorrect device output
        scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
        scsi: scsi_debug: Fix missing error code in scsi_debug_init()
        scsi: hisi_sas: Work around build failure in suspend function
        scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
        scsi: mpt3sas: Fix an issue when driver is being removed
        scsi: mpt3sas: Remove HBA BIOS version in the kernel log
        scsi: target: core: Fix invalid memory access
        scsi: scsi_debug: Drop sdebug_queue
        scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
        scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
        scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
        scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
        scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
        scsi: scsi_debug: Use scsi_block_requests() to block queues
        scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
        scsi: scsi_debug: Change shost list lock to a mutex
        ...