1. 13 Feb, 2024 1 commit
  2. 11 Feb, 2024 1 commit
    • bpf: Allow compiler to inline most of bpf_local_storage_lookup() · 68bc61c2
      Marco Elver authored
      In various performance profiles of kernels with BPF programs attached,
      bpf_local_storage_lookup() appears as a significant portion of CPU
      cycles spent. To enable the compiler to generate more optimal code, turn
      bpf_local_storage_lookup() into a static inline function, where only the
      cache insertion code path is outlined.
      
      Notably, outlining cache insertion helps avoid bloating callers by
      duplicating the setup for calls to raw_spin_lock_irqsave() and
      raw_spin_unlock_irqrestore() (on architectures which do not inline
      spin_lock/unlock, such as x86), which would cause the compiler to
      produce worse code by deciding to outline otherwise inlinable functions.
      The call overhead is neutral, because we make 2 calls either way: either
      calling raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(); or
      calling __bpf_local_storage_insert_cache(), which calls
      raw_spin_lock_irqsave(), followed by a tail-call to
      raw_spin_unlock_irqrestore() where the compiler can perform TCO and (in
      optimized uninstrumented builds) turn it into a plain jump. The call to
      __bpf_local_storage_insert_cache() can be elided entirely if
      cacheit_lockit is a false constant expression.
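
      The resulting shape is roughly the following (a simplified sketch of the
      new layout, not the verbatim kernel code; RCU lock-held annotations and
      some details are omitted):

          /* Outlined: keeps the raw_spin_lock_irqsave()/
           * raw_spin_unlock_irqrestore() setup out of every caller.
           */
          void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
                                                struct bpf_local_storage_map *smap,
                                                struct bpf_local_storage_elem *selem);

          static inline struct bpf_local_storage_data *
          bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
                                   struct bpf_local_storage_map *smap,
                                   bool cacheit_lockit)
          {
                  struct bpf_local_storage_data *sdata;
                  struct bpf_local_storage_elem *selem;

                  /* Fast path: per-map cache slot. */
                  sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
                  if (sdata && rcu_access_pointer(sdata->smap) == smap)
                          return sdata;

                  /* Slow path: walk the storage's elem list. */
                  hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
                          if (rcu_access_pointer(SDATA(selem)->smap) == smap)
                                  break;
                  if (!selem)
                          return NULL;

                  /* Elided by the compiler when cacheit_lockit is constant false. */
                  if (cacheit_lockit)
                          __bpf_local_storage_insert_cache(local_storage, smap, selem);
                  return SDATA(selem);
          }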
      
      Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
      reboot between each trial; x86 defconfig + BPF, clang 16) this produces
      improvements in throughput and latency in the majority of cases, with an
      average (geomean) improvement of 8%:
      
      +---- Hashmap Control --------------------
      |
      | + num keys: 10
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
      |   +- hits latency                       | 67.679 ns/op         | 67.879 ns/op   (  ~  )
      |   +- important_hits throughput          | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
      |
      | + num keys: 1000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
      |   +- hits latency                       | 81.754 ns/op         | 82.185 ns/op   (  ~  )
      |   +- important_hits throughput          | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
      |
      | + num keys: 10000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
      |   +- hits latency                       | 138.522 ns/op        | 138.842 ns/op  (  ~  )
      |   +- important_hits throughput          | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
      |
      | + num keys: 100000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
      |   +- hits latency                       | 198.483 ns/op        | 194.270 ns/op  (-2.1%)
      |   +- important_hits throughput          | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
      |
      | + num keys: 4194304
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
      |   +- hits latency                       | 365.220 ns/op        | 361.418 ns/op  (-1.0%)
      |   +- important_hits throughput          | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
      |
      +---- Local Storage ----------------------
      |
      | + num_maps: 1
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
      |   +- hits latency                       | 30.300 ns/op         | 25.598 ns/op   (-15.5%)
      |   +- important_hits throughput          | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
      |   +- hits latency                       | 26.919 ns/op         | 22.259 ns/op   (-17.3%)
      |   +- important_hits throughput          | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
      |
      | + num_maps: 10
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 32.288 M ops/s       | 38.099 M ops/s (+18.0%)
      |   +- hits latency                       | 30.972 ns/op         | 26.248 ns/op   (-15.3%)
      |   +- important_hits throughput          | 3.229 M ops/s        | 3.810 M ops/s  (+18.0%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 34.473 M ops/s       | 41.145 M ops/s (+19.4%)
      |   +- hits latency                       | 29.010 ns/op         | 24.307 ns/op   (-16.2%)
      |   +- important_hits throughput          | 12.312 M ops/s       | 14.695 M ops/s (+19.4%)
      |
      | + num_maps: 16
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 32.524 M ops/s       | 38.341 M ops/s (+17.9%)
      |   +- hits latency                       | 30.748 ns/op         | 26.083 ns/op   (-15.2%)
      |   +- important_hits throughput          | 2.033 M ops/s        | 2.396 M ops/s  (+17.9%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 34.575 M ops/s       | 41.338 M ops/s (+19.6%)
      |   +- hits latency                       | 28.925 ns/op         | 24.193 ns/op   (-16.4%)
      |   +- important_hits throughput          | 11.001 M ops/s       | 13.153 M ops/s (+19.6%)
      |
      | + num_maps: 17
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 28.861 M ops/s       | 32.756 M ops/s (+13.5%)
      |   +- hits latency                       | 34.649 ns/op         | 30.530 ns/op   (-11.9%)
      |   +- important_hits throughput          | 1.700 M ops/s        | 1.929 M ops/s  (+13.5%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 31.529 M ops/s       | 36.110 M ops/s (+14.5%)
      |   +- hits latency                       | 31.719 ns/op         | 27.697 ns/op   (-12.7%)
      |   +- important_hits throughput          | 9.598 M ops/s        | 10.993 M ops/s (+14.5%)
      |
      | + num_maps: 24
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 18.602 M ops/s       | 19.937 M ops/s (+7.2%)
      |   +- hits latency                       | 53.767 ns/op         | 50.166 ns/op   (-6.7%)
      |   +- important_hits throughput          | 0.776 M ops/s        | 0.831 M ops/s  (+7.2%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 21.718 M ops/s       | 23.332 M ops/s (+7.4%)
      |   +- hits latency                       | 46.047 ns/op         | 42.865 ns/op   (-6.9%)
      |   +- important_hits throughput          | 6.110 M ops/s        | 6.564 M ops/s  (+7.4%)
      |
      | + num_maps: 32
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 14.118 M ops/s       | 14.626 M ops/s (+3.6%)
      |   +- hits latency                       | 70.856 ns/op         | 68.381 ns/op   (-3.5%)
      |   +- important_hits throughput          | 0.442 M ops/s        | 0.458 M ops/s  (+3.6%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 17.111 M ops/s       | 17.906 M ops/s (+4.6%)
      |   +- hits latency                       | 58.451 ns/op         | 55.865 ns/op   (-4.4%)
      |   +- important_hits throughput          | 4.776 M ops/s        | 4.998 M ops/s  (+4.6%)
      |
      | + num_maps: 100
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 5.281 M ops/s        | 5.528 M ops/s  (+4.7%)
      |   +- hits latency                       | 192.398 ns/op        | 183.059 ns/op  (-4.9%)
      |   +- important_hits throughput          | 0.053 M ops/s        | 0.055 M ops/s  (+4.9%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 6.265 M ops/s        | 6.498 M ops/s  (+3.7%)
      |   +- hits latency                       | 161.436 ns/op        | 152.877 ns/op  (-5.3%)
      |   +- important_hits throughput          | 1.636 M ops/s        | 1.697 M ops/s  (+3.7%)
      |
      | + num_maps: 1000
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 0.355 M ops/s        | 0.354 M ops/s  (  ~  )
      |   +- hits latency                       | 2826.538 ns/op       | 2827.139 ns/op (  ~  )
      |   +- important_hits throughput          | 0.000 M ops/s        | 0.000 M ops/s  (  ~  )
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 0.404 M ops/s        | 0.403 M ops/s  (  ~  )
      |   +- hits latency                       | 2481.190 ns/op       | 2487.555 ns/op (  ~  )
      |   +- important_hits throughput          | 0.102 M ops/s        | 0.101 M ops/s  (  ~  )
      
      The on_lookup test in {cgrp,task}_ls_recursion.c is removed because
      bpf_local_storage_lookup() is no longer traceable, and adding a
      tracepoint would make the compiler generate worse code:
      https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/

      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Martin KaFai Lau <martin.lau@linux.dev>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      68bc61c2
  3. 08 Feb, 2024 7 commits
  4. 07 Feb, 2024 3 commits
    • Merge branch 'tools-resolve_btfids-fix-cross-compilation-to-non-host-endianness' · abae1ac5
      Andrii Nakryiko authored
      Viktor Malik says:
      
      ====================
      tools/resolve_btfids: fix cross-compilation to non-host endianness
      
      The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
      build and afterwards patched by resolve_btfids with correct values.
      Since resolve_btfids always writes in host-native endianness, it relies
      on libelf to do the translation when the target ELF is cross-compiled to
      a different endianness (this was introduced in commit 61e8aeda
      ("bpf: Fix libelf endian handling in resolv_btfids")).
      
      Unfortunately, the translation will corrupt the flags fields of SET8
      entries because these were written during vmlinux compilation and are in
      the correct endianness already. This will lead to numerous selftests
      failures such as:
      
          $ sudo ./test_verifier 502 502
          #502/p sleepable fentry accept FAIL
          Failed to load prog 'Invalid argument'!
          bpf_fentry_test1 is not sleepable
          verification time 34 usec
          stack depth 0
          processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
          Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      Since it's not possible to instruct libelf to translate just certain
      values, let's manually bswap the flags (both global and entry flags) in
      resolve_btfids when needed, so that libelf then translates everything
      correctly.
      
      The first patch of the series refactors resolve_btfids by using types
      from btf_ids.h instead of accessing the BTF ID data using magic offsets.
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      ---
      Changes in v4:
      - remove unnecessary vars and pointer casts (suggested by Daniel Xu)
      
      Changes in v3:
      - add byte swap of global 'flags' field in btf_id_set8 (suggested by
        Jiri Olsa)
      - cleaner refactoring of sets_patch (suggested by Jiri Olsa)
      - add compile-time assertion that IDs are at the beginning of pairs
        struct in btf_id_set8 (suggested by Daniel Borkmann)
      
      Changes in v2:
      - use type defs from btf_ids.h (suggested by Andrii Nakryiko)
      ====================
      
      Link: https://lore.kernel.org/r/cover.1707223196.git.vmalik@redhat.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      abae1ac5
    • tools/resolve_btfids: Fix cross-compilation to non-host endianness · 903fad43
      Viktor Malik authored
      The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
      build and afterwards patched by resolve_btfids with correct values.
      Since resolve_btfids always writes in host-native endianness, it relies
      on libelf to do the translation when the target ELF is cross-compiled to
      a different endianness (this was introduced in commit 61e8aeda
      ("bpf: Fix libelf endian handling in resolv_btfids")).
      
      Unfortunately, the translation will corrupt the flags fields of SET8
      entries because these were written during vmlinux compilation and are in
      the correct endianness already. This will lead to numerous selftests
      failures such as:
      
          $ sudo ./test_verifier 502 502
          #502/p sleepable fentry accept FAIL
          Failed to load prog 'Invalid argument'!
          bpf_fentry_test1 is not sleepable
          verification time 34 usec
          stack depth 0
          processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
          Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      Since it's not possible to instruct libelf to translate just certain
      values, let's manually bswap the flags (both global and entry flags) in
      resolve_btfids when needed, so that libelf then translates everything
      correctly.
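
      The idea, as a hedged sketch (the helper name and surrounding plumbing
      are illustrative, not the exact resolve_btfids code):

          #include <byteswap.h>
          #include <stdbool.h>
          #include <stdint.h>

          /* Layout of a SET8 in .BTF_ids (see tools/include/linux/btf_ids.h). */
          struct btf_id_set8 {
                  uint32_t cnt;
                  uint32_t flags;
                  struct {
                          uint32_t id;
                          uint32_t flags;
                  } pairs[];
          };

          /* The flags were written during vmlinux compilation and are already
           * in target byte order.  When host and target endianness differ,
           * bswap them so that libelf's whole-section translation on output
           * restores them instead of corrupting them.  'cnt' is assumed to be
           * available in host byte order at this point.
           */
          static void btf_id_set8_pre_swap(struct btf_id_set8 *set, uint32_t cnt,
                                           bool host_matches_target)
          {
                  uint32_t i;

                  if (host_matches_target)
                          return;

                  set->flags = bswap_32(set->flags);
                  for (i = 0; i < cnt; i++)
                          set->pairs[i].flags = bswap_32(set->pairs[i].flags);
          }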
      
      Fixes: ef2c6f37 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
      Signed-off-by: Viktor Malik <vmalik@redhat.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
      903fad43
    • tools/resolve_btfids: Refactor set sorting with types from btf_ids.h · 9707ac4f
      Viktor Malik authored
      Instead of using magic offsets to access BTF ID set data, leverage types
      from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
      layout of the data. Thanks to this change, set sorting should also
      continue working if the layout changes.
      
      This requires syncing the definition of 'struct btf_id_set8' from
      include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
      the rest of the file at the moment, because that would also require
      syncing multiple dependent headers, and we don't need any other
      definitions from btf_ids.h.
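
      For reference, the set layouts now used directly instead of magic
      offsets (simplified from include/linux/btf_ids.h; u32 is the kernel's
      32-bit unsigned type):

          struct btf_id_set {
                  u32 cnt;
                  u32 ids[];
          };

          struct btf_id_set8 {
                  u32 cnt;
                  u32 flags;
                  struct {
                          u32 id;
                          u32 flags;
                  } pairs[];
          };
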
      Signed-off-by: Viktor Malik <vmalik@redhat.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
      9707ac4f
  5. 06 Feb, 2024 14 commits
  6. 05 Feb, 2024 5 commits
  7. 03 Feb, 2024 4 commits
  8. 02 Feb, 2024 5 commits
    • selftests/bpf: trace_helpers.c: do not use poisoned type · a68b50f4
      Shung-Hsi Yu authored
      After commit c698eaeb ("selftests/bpf: trace_helpers.c: Optimize
      kallsyms cache"), trace_helpers.c includes libbpf_internal.h and thus
      can no longer use the u32 type (among others), since these are poisoned
      in libbpf_internal.h. Replace u32 with __u32 to fix the following error
      when building trace_helpers.c on powerpc:
      
        error: attempt to use poisoned "u32"
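
      A reduced illustration of the failure and the fix (not the actual
      trace_helpers.c code):

          #include <linux/types.h>        /* defines __u32 */

          /* libbpf_internal.h poisons the kernel-internal short names,
           * along the lines of:
           */
          #pragma GCC poison u32

          /* static u32 ksym_addr;        -- error: attempt to use poisoned "u32" */
          static __u32 ksym_addr;         /* OK: use the UAPI spelling instead */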
      
      Fixes: c698eaeb ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
      Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      a68b50f4
    • Merge branch 'improvements-for-tracking-scalars-in-the-bpf-verifier' · 6fb3f727
      Andrii Nakryiko authored
      Maxim Mikityanskiy says:
      
      ====================
      Improvements for tracking scalars in the BPF verifier
      
      From: Maxim Mikityanskiy <maxim@isovalent.com>
      
      The goal of this series is to extend the verifier's capabilities of
      tracking scalars when they are spilled to stack, especially when the
      spill or fill is narrowing. It also contains a fix by Eduard for
      infinite loop detection and a state pruning optimization by Eduard that
      compensates for a verification complexity regression introduced by
      tracking unbounded scalars. These improvements reduce the surface of
      false rejections that I saw while working on the Cilium codebase.
      
      Patches 1-9 of the original series were previously applied in v2.
      
      Patches 1-2 (Maxim): Support the case when boundary checks are first
      performed after the register was spilled to the stack.
      
      Patches 3-4 (Maxim): Support narrowing fills.
      
      Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to
      mitigate the verification complexity regression.
      
      veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...
      
       * Without patch 5:
      
      File                  Program   States (A)  States (B)  States    (DIFF)
      --------------------  --------  ----------  ----------  ----------------
      pyperf100.bpf.o       on_event        4878        6528   +1650 (+33.83%)
      pyperf180.bpf.o       on_event        6936       11032   +4096 (+59.05%)
      pyperf600.bpf.o       on_event       22271       39455  +17184 (+77.16%)
      pyperf600_iter.bpf.o  on_event         400         490     +90 (+22.50%)
      strobemeta.bpf.o      on_event        4895       14028  +9133 (+186.58%)
      
       * With patch 5:
      
      File                     Program        States (A)  States (B)  States   (DIFF)
      -----------------------  -------------  ----------  ----------  ---------------
      bpf_xdp.o                tail_lb_ipv4         2770        2224   -546 (-19.71%)
      pyperf100.bpf.o          on_event             4878        5848   +970 (+19.89%)
      pyperf180.bpf.o          on_event             6936        8868  +1932 (+27.85%)
      pyperf600.bpf.o          on_event            22271       29656  +7385 (+33.16%)
      pyperf600_iter.bpf.o     on_event              400         450    +50 (+12.50%)
      xdp_synproxy_kern.bpf.o  syncookie_tc          280         226    -54 (-19.29%)
      xdp_synproxy_kern.bpf.o  syncookie_xdp         302         228    -74 (-24.50%)
      
      v2 changes:
      
      Fixed comments in patch 1, moved endianness checks to header files in
      patch 12 where possible, added Eduard's ACKs.
      
      v3 changes:
      
      Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's
      comments.
      
      Eduard: Patch #5 (#14 in v2) changed significantly:
      - Logical changes:
        - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect:
          a mix of MISC and ZERO in the old state is not equivalent to e.g.
          just MISC in the current state, because the verifier could have
          deduced zero scalars from ZERO slots in the old state for some loads.
        - There is no reason to limit the change only to cases when
          old or current stack is a spill of unbounded scalar,
          it is valid to compare any 64-bit scalar spill with fake
          register impersonating MISC.
        - STACK_ZERO vs spilled zero case was dropped,
          after recent changes for zero handling by Andrii and Yonghong
          it is hard (impossible?) to conjure all ZERO slots for an spi.
          => the case does not make any difference in veristat results.
      - Use global static variable for unbound_reg (Andrii)
      - Code shuffling to remove duplication in stacksafe() (Andrii)
      ====================
      
      Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      6fb3f727
    • selftests/bpf: States pruning checks for scalar vs STACK_MISC · 73a28d9d
      Eduard Zingerman authored
      Check that stacksafe() compares spilled scalars with STACK_MISC.
      The following combinations are explored:
      - old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID}
        (plus error in unpriv mode);
      - old spill of precise scalar is not equivalent to cur STACK_MISC;
      - old STACK_MISC is equivalent to cur scalar;
      - old STACK_MISC is not equivalent to cur non-scalar.
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
      73a28d9d
    • bpf: Handle scalar spill vs all MISC in stacksafe() · 6efbde20
      Eduard Zingerman authored
      When check_stack_read_fixed_off() reads a value from an spi whose stack
      slots are all set to STACK_{MISC,INVALID}, the destination register is
      set to an unbound SCALAR_VALUE.

      Exploit this fact by allowing stacksafe() to use a fake unbound scalar
      register to compare an 'mmmm mmmm' stack value in the old state vs a
      spilled 64-bit scalar in the current state, and vice versa.
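
      Conceptually (a hedged pseudo-C sketch of the idea, not the actual
      stacksafe() code; the helper is illustrative):

          /* A stack slot whose 8 bytes are all STACK_MISC reads back as an
           * unbound scalar, so for pruning purposes it can be compared as if
           * it held a spill of one, using a fake unbound register.
           */
          static bool slot_is_all_misc(const struct bpf_stack_state *slot)
          {
                  int i;

                  for (i = 0; i < BPF_REG_SIZE; i++)
                          if (slot->slot_type[i] != STACK_MISC)
                                  return false;
                  return true;
          }

          /* In the per-slot comparison: if the old slot is all MISC and the
           * current slot is a spilled 64-bit scalar (or vice versa), compare
           * the spilled register against the fake unbound scalar register
           * instead of requiring identical slot types.
           */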
      
      Veristat results after this patch show some gains:
      
      ./veristat -C -e file,prog,states -f 'states_pct>10'  not-opt after
      File                     Program                States   (DIFF)
      -----------------------  ---------------------  ---------------
      bpf_overlay.o            tail_rev_nodeport_lb4    -45 (-15.85%)
      bpf_xdp.o                tail_lb_ipv4            -541 (-19.57%)
      pyperf100.bpf.o          on_event                -680 (-10.42%)
      pyperf180.bpf.o          on_event               -2164 (-19.62%)
      pyperf600.bpf.o          on_event               -9799 (-24.84%)
      strobemeta.bpf.o         on_event               -9157 (-65.28%)
      xdp_synproxy_kern.bpf.o  syncookie_tc             -54 (-19.29%)
      xdp_synproxy_kern.bpf.o  syncookie_xdp            -74 (-24.50%)
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com
      6efbde20
    • selftests/bpf: Add test cases for narrowing fill · 067313a8
      Maxim Mikityanskiy authored
      The previous commit made it possible to preserve boundaries and track
      IDs of scalars on narrowing fills. Add test cases for that pattern.
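
      The pattern under test looks roughly like this (a hedged BPF C sketch,
      not one of the actual selftests; program type and helper choice are
      illustrative, little-endian layout shown):

          // SPDX-License-Identifier: GPL-2.0
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>

          SEC("socket")
          int narrowing_fill_example(void *ctx)
          {
                  __u64 full = bpf_get_prandom_u32();     /* 64-bit scalar */
                  __u64 slot;
                  __u32 narrow;

                  slot = full;                    /* 64-bit spill to the stack */
                  if (slot > 100)                 /* bounds learned on the spill */
                          return 0;
                  /* Narrowing (32-bit) fill of the spilled value; with this
                   * series the verifier keeps the [0, 100] bounds and the
                   * scalar ID instead of treating 'narrow' as unbounded.
                   */
                  narrow = *(volatile __u32 *)&slot;
                  return narrow;
          }

          char _license[] SEC("license") = "GPL";
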
      Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
      067313a8