1. 29 Jun, 2022 7 commits
  2. 28 Jun, 2022 17 commits
  3. 24 Jun, 2022 7 commits
  4. 23 Jun, 2022 9 commits
    • Jörn-Thorben Hinz's avatar
      selftests/bpf: Fix rare segfault in sock_fields prog test · 6dc7a0ba
      Jörn-Thorben Hinz authored
      test_sock_fields__detach() got called with a null pointer here when one
      of the CHECKs or ASSERTs up to the test_sock_fields__open_and_load()
      call resulted in a jump to the "done" label.
      
      A skeletons *__detach() is not safe to call with a null pointer, though.
      This led to a segfault.
      
      Go the easy route and only call test_sock_fields__destroy() which is
      null-pointer safe and includes detaching.
      
      Came across this while looking[1] to introduce the usage of
      bpf_tcp_helpers.h (included in progs/test_sock_fields.c) together with
      vmlinux.h.
      
      [1] https://lore.kernel.org/bpf/629bc069dd807d7ac646f836e9dca28bbc1108e2.camel@mailbox.tu-berlin.de/
      
      Fixes: 8f50f16f ("selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads")
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Reviewed-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220621070116.307221-1-jthinz@mailbox.tu-berlin.de
      6dc7a0ba
    • Alexei Starovoitov's avatar
      Merge branch 'Align BPF TCP CCs implementing cong_control() with non-BPF CCs' · bb7a4257
      Alexei Starovoitov authored
      Jörn-Thorben Hinz says:
      
      ====================
      
      This series corrects some inconveniences for a BPF TCP CC that
      implements and uses tcp_congestion_ops.cong_control(). Until now, such a
      CC did not have all necessary write access to struct sock and
      unnecessarily needed to implement cong_avoid().
      
      v4:
       - Remove braces around single statements after if
       - Don’t check pointer passed to bpf_link__destroy()
      v3:
       - Add a selftest writing sk_pacing_*
       - Add a selftest with incomplete tcp_congestion_ops
       - Add a selftest with unsupported get_info()
       - Remove an unused variable
       - Reword a comment about reg() in bpf_struct_ops_map_update_elem()
      v2:
       - Drop redundant check for required functions and just rely on
         tcp_register_congestion_control() (Martin KaFai Lau)
      ====================
      Reviewed-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      bb7a4257
    • Jörn-Thorben Hinz's avatar
      selftests/bpf: Test a BPF CC implementing the unsupported get_info() · f14a3f64
      Jörn-Thorben Hinz authored
      Test whether a TCP CC implemented in BPF providing get_info() is
      rejected correctly. get_info() is unsupported in a BPF CC. The check for
      required functions in a BPF CC has moved, this test ensures unsupported
      functions are still rejected correctly.
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Reviewed-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20220622191227.898118-6-jthinz@mailbox.tu-berlin.deSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f14a3f64
    • Jörn-Thorben Hinz's avatar
      selftests/bpf: Test an incomplete BPF CC · 0735627d
      Jörn-Thorben Hinz authored
      Test whether a TCP CC implemented in BPF providing neither cong_avoid()
      nor cong_control() is correctly rejected. This check solely depends on
      tcp_register_congestion_control() now, which is invoked during
      bpf_map__attach_struct_ops().
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Link: https://lore.kernel.org/r/20220622191227.898118-5-jthinz@mailbox.tu-berlin.deSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0735627d
    • Jörn-Thorben Hinz's avatar
      selftests/bpf: Test a BPF CC writing sk_pacing_* · 6e945d57
      Jörn-Thorben Hinz authored
      Test whether a TCP CC implemented in BPF is allowed to write
      sk_pacing_rate and sk_pacing_status in struct sock. This is needed when
      cong_control() is implemented and used.
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Link: https://lore.kernel.org/r/20220622191227.898118-4-jthinz@mailbox.tu-berlin.deSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      6e945d57
    • Jörn-Thorben Hinz's avatar
      bpf: Require only one of cong_avoid() and cong_control() from a TCP CC · 9f0265e9
      Jörn-Thorben Hinz authored
      Remove the check for required and optional functions in a struct
      tcp_congestion_ops from bpf_tcp_ca.c. Rely on
      tcp_register_congestion_control() to reject a BPF CC that does not
      implement all required functions, as it will do for a non-BPF CC.
      
      When a CC implements tcp_congestion_ops.cong_control(), the alternate
      cong_avoid() is not in use in the TCP stack. Previously, a BPF CC was
      still forced to implement cong_avoid() as a no-op since it was
      non-optional in bpf_tcp_ca.c.
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Reviewed-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20220622191227.898118-3-jthinz@mailbox.tu-berlin.deSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9f0265e9
    • Jörn-Thorben Hinz's avatar
      bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status · 41c95dd6
      Jörn-Thorben Hinz authored
      A CC that implements tcp_congestion_ops.cong_control() should be able to
      control sk_pacing_rate and set sk_pacing_status, since
      tcp_update_pacing_rate() is never called in this case. A built-in CC or
      one from a kernel module is already able to write to both struct sock
      members. For a BPF program, write access has not been allowed, yet.
      Signed-off-by: default avatarJörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
      Reviewed-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20220622191227.898118-2-jthinz@mailbox.tu-berlin.deSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      41c95dd6
    • Jian Shen's avatar
      test_bpf: fix incorrect netdev features · 9676fecc
      Jian Shen authored
      The prototype of .features is netdev_features_t, it should use
      NETIF_F_LLTX and NETIF_F_HW_VLAN_STAG_TX, not NETIF_F_LLTX_BIT
      and NETIF_F_HW_VLAN_STAG_TX_BIT.
      
      Fixes: cf204a71 ("bpf, testing: Introduce 'gso_linear_no_head_frag' skb_segment test")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20220622135002.8263-1-shenjian15@huawei.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9676fecc
    • Dave Marchevsky's avatar
      selftests/bpf: Add benchmark for local_storage get · 73087489
      Dave Marchevsky authored
      Add a benchmarks to demonstrate the performance cliff for local_storage
      get as the number of local_storage maps increases beyond current
      local_storage implementation's cache size.
      
      "sequential get" and "interleaved get" benchmarks are added, both of
      which do many bpf_task_storage_get calls on sets of task local_storage
      maps of various counts, while considering a single specific map to be
      'important' and counting task_storage_gets to the important map
      separately in addition to normal 'hits' count of all gets. Goal here is
      to mimic scenario where a particular program using one map - the
      important one - is running on a system where many other local_storage
      maps exist and are accessed often.
      
      While "sequential get" benchmark does bpf_task_storage_get for map 0, 1,
      ..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4
      bpf_task_storage_gets for the important map for every 10 map gets. This
      is meant to highlight performance differences when important map is
      accessed far more frequently than non-important maps.
      
      A "hashmap control" benchmark is also included for easy comparison of
      standard bpf hashmap lookup vs local_storage get. The benchmark is
      similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
      instead of local storage. Only one inner map is created - a hashmap
      meant to hold tid -> data mapping for all tasks. Size of the hashmap is
      hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these
      keys which are actually fetched as part of the benchmark is
      configurable.
      
      Addition of this benchmark is inspired by conversation with Alexei in a
      previous patchset's thread [0], which highlighted the need for such a
      benchmark to motivate and validate improvements to local_storage
      implementation. My approach in that series focused on improving
      performance for explicitly-marked 'important' maps and was rejected
      with feedback to make more generally-applicable improvements while
      avoiding explicitly marking maps as important. Thus the benchmark
      reports both general and important-map-focused metrics, so effect of
      future work on both is clear.
      
      Regarding the benchmark results. On a powerful system (Skylake, 20
      cores, 256gb ram):
      
      Hashmap Control
      ===============
              num keys: 10
      hashmap (control) sequential    get:  hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
      
              num keys: 1000
      hashmap (control) sequential    get:  hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
      
              num keys: 10000
      hashmap (control) sequential    get:  hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
      
              num keys: 100000
      hashmap (control) sequential    get:  hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
      
              num keys: 4194304
      hashmap (control) sequential    get:  hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
      
      Local Storage
      =============
              num_maps: 1
      local_storage cache sequential  get:  hits throughput: 47.298 ± 0.180 M ops/s, hits latency: 21.142 ns/op, important_hits throughput: 47.298 ± 0.180 M ops/s
      local_storage cache interleaved get:  hits throughput: 55.277 ± 0.888 M ops/s, hits latency: 18.091 ns/op, important_hits throughput: 55.277 ± 0.888 M ops/s
      
              num_maps: 10
      local_storage cache sequential  get:  hits throughput: 40.240 ± 0.802 M ops/s, hits latency: 24.851 ns/op, important_hits throughput: 4.024 ± 0.080 M ops/s
      local_storage cache interleaved get:  hits throughput: 48.701 ± 0.722 M ops/s, hits latency: 20.533 ns/op, important_hits throughput: 17.393 ± 0.258 M ops/s
      
              num_maps: 16
      local_storage cache sequential  get:  hits throughput: 44.515 ± 0.708 M ops/s, hits latency: 22.464 ns/op, important_hits throughput: 2.782 ± 0.044 M ops/s
      local_storage cache interleaved get:  hits throughput: 49.553 ± 2.260 M ops/s, hits latency: 20.181 ns/op, important_hits throughput: 15.767 ± 0.719 M ops/s
      
              num_maps: 17
      local_storage cache sequential  get:  hits throughput: 38.778 ± 0.302 M ops/s, hits latency: 25.788 ns/op, important_hits throughput: 2.284 ± 0.018 M ops/s
      local_storage cache interleaved get:  hits throughput: 43.848 ± 1.023 M ops/s, hits latency: 22.806 ns/op, important_hits throughput: 13.349 ± 0.311 M ops/s
      
              num_maps: 24
      local_storage cache sequential  get:  hits throughput: 19.317 ± 0.568 M ops/s, hits latency: 51.769 ns/op, important_hits throughput: 0.806 ± 0.024 M ops/s
      local_storage cache interleaved get:  hits throughput: 24.397 ± 0.272 M ops/s, hits latency: 40.989 ns/op, important_hits throughput: 6.863 ± 0.077 M ops/s
      
              num_maps: 32
      local_storage cache sequential  get:  hits throughput: 13.333 ± 0.135 M ops/s, hits latency: 75.000 ns/op, important_hits throughput: 0.417 ± 0.004 M ops/s
      local_storage cache interleaved get:  hits throughput: 16.898 ± 0.383 M ops/s, hits latency: 59.178 ns/op, important_hits throughput: 4.717 ± 0.107 M ops/s
      
              num_maps: 100
      local_storage cache sequential  get:  hits throughput: 6.360 ± 0.107 M ops/s, hits latency: 157.233 ns/op, important_hits throughput: 0.064 ± 0.001 M ops/s
      local_storage cache interleaved get:  hits throughput: 7.303 ± 0.362 M ops/s, hits latency: 136.930 ns/op, important_hits throughput: 1.907 ± 0.094 M ops/s
      
              num_maps: 1000
      local_storage cache sequential  get:  hits throughput: 0.452 ± 0.010 M ops/s, hits latency: 2214.022 ns/op, important_hits throughput: 0.000 ± 0.000 M ops/s
      local_storage cache interleaved get:  hits throughput: 0.542 ± 0.007 M ops/s, hits latency: 1843.341 ns/op, important_hits throughput: 0.136 ± 0.002 M ops/s
      
      Looking at the "sequential get" results, it's clear that as the
      number of task local_storage maps grows beyond the current cache size
      (16), there's a significant reduction in hits throughput. Note that
      current local_storage implementation assigns a cache_idx to maps as they
      are created. Since "sequential get" is creating maps 0..n in order and
      then doing bpf_task_storage_get calls in the same order, the benchmark
      is effectively ensuring that a map will not be in cache when the program
      tries to access it.
      
      For "interleaved get" results, important-map hits throughput is greatly
      increased as the important map is more likely to be in cache by virtue
      of being accessed far more frequently. Throughput still reduces as #
      maps increases, though.
      
      To get a sense of the overhead of the benchmark program, I
      commented out bpf_task_storage_get/bpf_map_lookup_elem in
      local_storage_bench.c and ran the benchmark on the same host as the
      'real' run. Results:
      
      Hashmap Control
      ===============
              num keys: 10
      hashmap (control) sequential    get:  hits throughput: 54.288 ± 0.655 M ops/s, hits latency: 18.420 ns/op, important_hits throughput: 54.288 ± 0.655 M ops/s
      
              num keys: 1000
      hashmap (control) sequential    get:  hits throughput: 52.913 ± 0.519 M ops/s, hits latency: 18.899 ns/op, important_hits throughput: 52.913 ± 0.519 M ops/s
      
              num keys: 10000
      hashmap (control) sequential    get:  hits throughput: 53.480 ± 1.235 M ops/s, hits latency: 18.699 ns/op, important_hits throughput: 53.480 ± 1.235 M ops/s
      
              num keys: 100000
      hashmap (control) sequential    get:  hits throughput: 54.982 ± 1.902 M ops/s, hits latency: 18.188 ns/op, important_hits throughput: 54.982 ± 1.902 M ops/s
      
              num keys: 4194304
      hashmap (control) sequential    get:  hits throughput: 50.858 ± 0.707 M ops/s, hits latency: 19.662 ns/op, important_hits throughput: 50.858 ± 0.707 M ops/s
      
      Local Storage
      =============
              num_maps: 1
      local_storage cache sequential  get:  hits throughput: 110.990 ± 4.828 M ops/s, hits latency: 9.010 ns/op, important_hits throughput: 110.990 ± 4.828 M ops/s
      local_storage cache interleaved get:  hits throughput: 161.057 ± 4.090 M ops/s, hits latency: 6.209 ns/op, important_hits throughput: 161.057 ± 4.090 M ops/s
      
              num_maps: 10
      local_storage cache sequential  get:  hits throughput: 112.930 ± 1.079 M ops/s, hits latency: 8.855 ns/op, important_hits throughput: 11.293 ± 0.108 M ops/s
      local_storage cache interleaved get:  hits throughput: 115.841 ± 2.088 M ops/s, hits latency: 8.633 ns/op, important_hits throughput: 41.372 ± 0.746 M ops/s
      
              num_maps: 16
      local_storage cache sequential  get:  hits throughput: 115.653 ± 0.416 M ops/s, hits latency: 8.647 ns/op, important_hits throughput: 7.228 ± 0.026 M ops/s
      local_storage cache interleaved get:  hits throughput: 138.717 ± 1.649 M ops/s, hits latency: 7.209 ns/op, important_hits throughput: 44.137 ± 0.525 M ops/s
      
              num_maps: 17
      local_storage cache sequential  get:  hits throughput: 112.020 ± 1.649 M ops/s, hits latency: 8.927 ns/op, important_hits throughput: 6.598 ± 0.097 M ops/s
      local_storage cache interleaved get:  hits throughput: 128.089 ± 1.960 M ops/s, hits latency: 7.807 ns/op, important_hits throughput: 38.995 ± 0.597 M ops/s
      
              num_maps: 24
      local_storage cache sequential  get:  hits throughput: 92.447 ± 5.170 M ops/s, hits latency: 10.817 ns/op, important_hits throughput: 3.855 ± 0.216 M ops/s
      local_storage cache interleaved get:  hits throughput: 128.844 ± 2.808 M ops/s, hits latency: 7.761 ns/op, important_hits throughput: 36.245 ± 0.790 M ops/s
      
              num_maps: 32
      local_storage cache sequential  get:  hits throughput: 102.042 ± 1.462 M ops/s, hits latency: 9.800 ns/op, important_hits throughput: 3.194 ± 0.046 M ops/s
      local_storage cache interleaved get:  hits throughput: 126.577 ± 1.818 M ops/s, hits latency: 7.900 ns/op, important_hits throughput: 35.332 ± 0.507 M ops/s
      
              num_maps: 100
      local_storage cache sequential  get:  hits throughput: 111.327 ± 1.401 M ops/s, hits latency: 8.983 ns/op, important_hits throughput: 1.113 ± 0.014 M ops/s
      local_storage cache interleaved get:  hits throughput: 131.327 ± 1.339 M ops/s, hits latency: 7.615 ns/op, important_hits throughput: 34.302 ± 0.350 M ops/s
      
              num_maps: 1000
      local_storage cache sequential  get:  hits throughput: 101.978 ± 0.563 M ops/s, hits latency: 9.806 ns/op, important_hits throughput: 0.102 ± 0.001 M ops/s
      local_storage cache interleaved get:  hits throughput: 141.084 ± 1.098 M ops/s, hits latency: 7.088 ns/op, important_hits throughput: 35.430 ± 0.276 M ops/s
      
      Adjusting for overhead, latency numbers for "hashmap control" and
      "sequential get" are:
      
      hashmap_control_1k:   ~53.8ns
      hashmap_control_10k:  ~124.2ns
      hashmap_control_100k: ~206.5ns
      sequential_get_1:     ~12.1ns
      sequential_get_10:    ~16.0ns
      sequential_get_16:    ~13.8ns
      sequential_get_17:    ~16.8ns
      sequential_get_24:    ~40.9ns
      sequential_get_32:    ~65.2ns
      sequential_get_100:   ~148.2ns
      sequential_get_1000:  ~2204ns
      
      Clearly demonstrating a cliff.
      
      In the discussion for v1 of this patch, Alexei noted that local_storage
      was 2.5x faster than a large hashmap when initially implemented [1]. The
      benchmark results show that local_storage is 5-10x faster: a
      long-running BPF application putting some pid-specific info into a
      hashmap for each pid it sees will probably see on the order of 10-100k
      pids. Bench numbers for hashmaps of this size are ~10x slower than
      sequential_get_16, but as the number of local_storage maps grows far
      past local_storage cache size the performance advantage shrinks and
      eventually reverses.
      
      When running the benchmarks it may be necessary to bump 'open files'
      ulimit for a successful run.
      
        [0]: https://lore.kernel.org/all/20220420002143.1096548-1-davemarchevsky@fb.com
        [1]: https://lore.kernel.org/bpf/20220511173305.ftldpn23m4ski3d3@MBP-98dd607d3435.dhcp.thefacebook.com/Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20220620222554.270578-1-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      73087489