1. 26 Aug, 2021 4 commits
      bpf: selftests: Add dctcp fallback test · 574ee209
      Martin KaFai Lau authored
      This patch makes the bpf_dctcp test fall back to cubic by
      using setsockopt(TCP_CONGESTION) when the tcp flow is not
      ecn ready.
      
      It also checks that setsockopt() is not available to release().
      
      The settimeo() from the network_helpers.h is used, so the local
      one is removed.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210824173026.3979130-1-kafai@fb.com
    • bpf: selftests: Add connect_to_fd_opts to network_helpers · 3d778983
      Martin KaFai Lau authored
      The next test needs to call setsockopt(TCP_CONGESTION) before
      connect(), so connect_to_fd() needs a new arg to specify
      the cc's name.
      
      This patch adds a new "struct network_helper_opts" for the future
      option needs.  It starts with the "cc" and "timeout_ms" options.
      A new helper connect_to_fd_opts() is added to take the new
      "const struct network_helper_opts *opts" as an arg.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210824173019.3977910-1-kafai@fb.com
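The option handling described in this commit can be sketched in plain userspace C. This is a minimal sketch, not the selftest's code: the field names "cc" and "timeout_ms" come from the commit message, while the struct layout and the apply_opts() helper are assumptions made for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Field names follow the commit message; the layout is an assumption. */
struct network_helper_opts {
	const char *cc;   /* congestion control name, NULL keeps the default */
	int timeout_ms;   /* send/receive timeout, 0 leaves it unset */
};

/* Hypothetical helper: apply opts to a TCP socket that is not yet
 * connected. Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int apply_opts(int fd, const struct network_helper_opts *opts)
{
	assert(opts != NULL);

	if (opts->timeout_ms > 0) {
		struct timeval tv = {
			.tv_sec = opts->timeout_ms / 1000,
			.tv_usec = (opts->timeout_ms % 1000) * 1000,
		};

		if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) ||
		    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)))
			return -1;
	}

	/* The test described above needs the cc set before connect(). */
	if (opts->cc &&
	    setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, opts->cc,
		       strlen(opts->cc) + 1))
		return -1;

	return 0;
}
```

A caller would create the socket, call apply_opts(), and only then call connect(). Passing a struct instead of separate arguments lets later options be added without changing every call site, which matches the "future option needs" rationale in the message.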
      bpf: selftests: Add sk_state to bpf_tcp_helpers.h · 700dcf0f
      Martin KaFai Lau authored
      Add sk_state define to bpf_tcp_helpers.h.  Rename the existing
      global variable "sk_state" in the kfunc_call test to "sk_state_res".
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210824173013.3977316-1-kafai@fb.com
      bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt · eb18b49e
      Martin KaFai Lau authored
      This patch allows the bpf-tcp-cc to call bpf_setsockopt.  One use
      case is to allow a bpf-tcp-cc to switch to another cc during init().
      For example, when the tcp flow is not ecn ready, the bpf_dctcp
      can switch to another cc by calling setsockopt(TCP_CONGESTION).
      
      During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
      called and this could cause a recursion but it is stopped by the
      current trampoline's logic (in the prog->active counter).
      
      While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
      the tcp stack calls bpf-tcp-cc's release().  To avoid the retiring
      bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
      available to the bpf-tcp-cc's release().  This prevents release() from
      making a setsockopt() call that could allocate new resources.
      
      Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
      from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
      bpf_setsockopt are available together.  Thus, bpf_getsockopt() is also
      added to all tcp_congestion_ops except release().
      
      When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
      to switch to a new cc, the old bpf-tcp-cc will be released by
      bpf_struct_ops_put().  Thus, this patch also puts the bpf_struct_ops_map
      after a rcu grace period because the trampoline's image cannot be freed
      while the old bpf-tcp-cc is still running.
      
      bpf-tcp-cc can only access icsk_ca_priv as SCALAR.  All of the kernel's
      tcp-cc implementations also access icsk_ca_priv as SCALAR.  The size
      of icsk_ca_priv has already been raised a few times to avoid
      extra kmalloc and memory referencing.  The only exception is the
      kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
      To avoid the old bpf-tcp-cc accidentally overriding this tcp_cdg's pointer
      value stored in icsk_ca_priv after switching and without over-complicating
      the bpf's verifier for this one exception in tcp_cdg, this patch does not
      allow switching to tcp_cdg.  If there is a need, bpf_tcp_cdg can be
      implemented and then use the bpf_sk_storage as the extended storage.
      
      bpf_sk_setsockopt proto has only been recently added and used
      in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
      same proto instead of adding a new proto specifically for bpf-tcp-cc.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210824173007.3976921-1-kafai@fb.com
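The recursion guard and the tcp_cdg restriction described in this commit can be modelled with a toy userspace sketch. Everything here is hypothetical: struct toy_prog, toy_call_init() and toy_setsockopt_cc() are illustration-only names, the model assumes the worst case where the newly selected cc is the same program, and the error code is a placeholder rather than the kernel's actual return value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy stand-in for a struct_ops program; not the kernel's struct bpf_prog. */
struct toy_prog {
	int active;   /* nesting depth of this program */
	int runs;     /* times the body actually ran */
	int misses;   /* times a nested entry was skipped */
};

static int toy_setsockopt_cc(struct toy_prog *prog, const char *name);

/* Body of a hypothetical init(): always asks to switch congestion control. */
static void toy_init_body(struct toy_prog *prog)
{
	prog->runs++;
	toy_setsockopt_cc(prog, "cubic");
}

/* Enter the program only when it is not already running. */
static void toy_call_init(struct toy_prog *prog)
{
	assert(prog != NULL);

	if (++prog->active != 1)
		prog->misses++;          /* nested entry: skip the body */
	else
		toy_init_body(prog);
	prog->active--;
}

/* Switching cc invokes init() again; "cdg" is refused up front. */
static int toy_setsockopt_cc(struct toy_prog *prog, const char *name)
{
	if (strcmp(name, "cdg") == 0)
		return -EOPNOTSUPP;      /* placeholder error code */
	toy_call_init(prog);
	return 0;
}
```

Calling toy_call_init() once runs the body a single time; the nested call triggered by the switch is counted as a miss instead of recursing, which is the behaviour the message attributes to the prog->active counter.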
  2. 25 Aug, 2021 23 commits
  3. 24 Aug, 2021 13 commits
      Merge branch 'Improve XDP samples usability and output' · 3bbc8ee7
      Alexei Starovoitov authored
      Kumar Kartikeya says:
      
      ====================
      
      This set revamps the XDP samples related to redirection to show better output
      and implement missing features. It consolidates their differences and gives them
      a consistent look and feel by implementing common features and command line
      options.  Some of the TODO items like reporting redirect error numbers
      (ENETDOWN, EINVAL, ENOSPC, etc.) have also been implemented.
      
      Some of the features are:
      * Received packet statistics
      * xdp_redirect/xdp_redirect_map tracepoint statistics
      * xdp_redirect_err/xdp_redirect_map_err tracepoint statistics (with support for
        showing exact errno)
      * xdp_cpumap_enqueue/xdp_cpumap_kthread tracepoint statistics
      * xdp_devmap_xmit tracepoint statistics
      * xdp_exception tracepoint statistics
      * Per ifindex pair devmap_xmit stats shown dynamically (for xdp_monitor) to
        decompose the total.
      * Use of BPF skeleton and BPF static linking to share BPF programs.
      * Use of vmlinux.h and tp_btf for raw_tracepoint support.
      * Removal of redundant -N/--native-mode option (enforced by default now)
      * ... and massive cleanups all over the place.
      
      All tracepoints also use raw_tp now, and tracepoints like xdp_redirect
      are only enabled when requested explicitly to capture successful redirection
      statistics.
      
      The set of programs converted as part of this series are:
       * xdp_redirect_cpu
       * xdp_redirect_map_multi
       * xdp_redirect_map
       * xdp_redirect
       * xdp_monitor
      
       Explanation of the output:
      
      There is now a concise output mode by default that shows primarily four fields:
        rx/s        Number of packets received per second
        redir/s     Number of packets successfully redirected per second
        err,drop/s  Aggregated count of errors per second (including dropped packets)
        xmit/s      Number of packets transmitted on the output device per second
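The four per-second fields above can be derived from two reads of monotonically increasing counters. The following is a minimal sketch under that assumption; struct snapshot, struct rates and compute_rates() are hypothetical names and do not come from the samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter snapshot taken at time_ns (nanoseconds). */
struct snapshot {
	uint64_t time_ns;
	uint64_t rx;
	uint64_t redir;
	uint64_t err_drop;
	uint64_t xmit;
};

/* Per-second values matching the four concise output fields. */
struct rates {
	double rx;
	double redir;
	double err_drop;
	double xmit;
};

/* Rate of one counter over a period given in nanoseconds. */
static double per_sec(uint64_t prev, uint64_t cur, uint64_t period_ns)
{
	assert(cur >= prev);
	return (double)(cur - prev) * 1e9 / (double)period_ns;
}

/* Returns all-zero rates when no time has passed between the reads. */
static struct rates compute_rates(const struct snapshot *prev,
				  const struct snapshot *cur)
{
	struct rates r = {0};
	uint64_t period_ns = cur->time_ns - prev->time_ns;

	if (period_ns == 0)
		return r;

	r.rx = per_sec(prev->rx, cur->rx, period_ns);
	r.redir = per_sec(prev->redir, cur->redir, period_ns);
	r.err_drop = per_sec(prev->err_drop, cur->err_drop, period_ns);
	r.xmit = per_sec(prev->xmit, cur->xmit, period_ns);
	return r;
}
```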
      
      Some examples:
       ; sudo ./xdp_redirect_map veth0 veth1 -s
      Redirecting from veth0 (ifindex 15; driver veth) to veth1 (ifindex 14; driver veth)
      veth0->veth1                    0 rx/s                  0 redir/s		0 err,drop/s               0 xmit/s
      veth0->veth1            9,998,660 rx/s          9,998,658 redir/s		0 err,drop/s       9,998,654 xmit/s
      ...
      
      There is also a verbose mode, which can be enabled at startup using -v (--verbose).
      The output mode can be switched dynamically at runtime using Ctrl + \ (SIGQUIT).
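One way to switch a mode flag on SIGQUIT is a plain signal handler that flips a sig_atomic_t. This is only a sketch of the idea; the samples may implement it differently, and the names verbose, on_sigquit() and install_mode_toggle() are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Current output mode: 0 = concise, 1 = verbose. */
static volatile sig_atomic_t verbose;

/* Flip the mode each time SIGQUIT (Ctrl + \) arrives. */
static void on_sigquit(int signo)
{
	(void)signo;
	verbose = !verbose;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_mode_toggle(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigquit;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGQUIT, &sa, NULL);
}
```

The printing loop would read the flag once per interval and pick the concise or verbose layout accordingly.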
      
      To make the concise output more useful, the errors that occur are expanded inline
      (as if verbose mode was enabled) to let the user pin down the source of the
      problem without having to clutter output (or possibly miss it) or always use verbose mode.
      
      For instance, let's consider a case where the output device link state is set to
      down while redirection is happening:
      
      [...]
      veth0->veth1           24,503,376 rx/s                  0 err,drop/s      24,503,372 xmit/s
      veth0->veth1           25,044,775 rx/s                  0 err,drop/s      25,044,783 xmit/s
      veth0->veth1           25,263,046 rx/s                  4 err,drop/s      25,263,028 xmit/s
        redirect_err                  4 error/s
          ENETDOWN                    4 error/s
      [...]
      
      The same holds for xdp_exception actions.
      
      An example of how a complete xdp_redirect_map session would look:
      
       ; sudo ./xdp_redirect_map veth0 veth1
      Redirecting from veth0 (ifindex 5; driver veth) to veth1 (ifindex 4; driver veth)
      veth0->veth1            7,411,506 rx/s                  0 err,drop/s    7,411,470 xmit/s
      veth0->veth1            8,931,770 rx/s                  0 err,drop/s    8,931,771 xmit/s
      ^\
      veth0->veth1            8,787,295 rx/s                  0 err,drop/s    8,787,325 xmit/s
        receive total         8,787,295 pkt/s                 0 drop/s                0 error/s
          cpu:7               8,787,295 pkt/s                 0 drop/s                0 error/s
        redirect_err                  0 error/s
        xdp_exception                 0 hit/s
        xmit veth0->veth1     8,787,325 xmit/s                0 drop/s                0 drv_err/s          2.00 bulk-avg
           cpu:7              8,787,325 xmit/s                0 drop/s                0 drv_err/s          2.00 bulk-avg
      
      veth0->veth1            8,842,610 rx/s                  0 err,drop/s    8,842,606 xmit/s
        receive total         8,842,610 pkt/s                 0 drop/s                0 error/s
          cpu:7               8,842,610 pkt/s                 0 drop/s                0 error/s
        redirect_err                  0 error/s
        xdp_exception                 0 hit/s
        xmit veth0->veth1     8,842,606 xmit/s                0 drop/s                0 drv_err/s          2.00 bulk-avg
           cpu:7              8,842,606 xmit/s                0 drop/s                0 drv_err/s          2.00 bulk-avg
      
      ^C
        Packets received    : 33,973,181
        Average packets/s   : 4,246,648
        Packets transmitted : 33,973,172
        Average transmit/s  : 4,246,647
      
      The xdp_redirect tracepoint (for success stats) needs to be enabled explicitly
      using --stats/-s. Documentation for the entire output and all options is shown
      when the user passes --help/-h to a sample.
      
      Changelog:
      ----------
      v3 -> v4:
      v3: https://lore.kernel.org/bpf/20210728165552.435050-1-memxor@gmail.com
      
       * Address all feedback from Daniel
        * Use READ_ONCE/WRITE_ONCE from linux/compiler.h (cannot directly include
          due to conflicts with vmlinux.h)
        * Fix MAX_CPUS hardcoding by switching to mmapable array maps, that are
          resized based on the value of libbpf_num_possible_cpus
        * s/ELEMENTS_OF/ARRAY_SIZE/g
        * Use tools/include/linux/hashtable.h
        * Coding style fixes
        * Remove hyperlinks for tracepoints
        * Split into smaller reviewable changes
       * Restore support for specifying custom xdp_redirect_cpu cpumap prog with some
         enhancements, including built-in programs for common actions (pass, drop,
         redirect). By default, cpumap prog is now disabled.
       * Misc bug fixes all over the place
      
        The printing stuff is a lot more basic without hyperlink support, hence it
        has not been exported into a more general facility.
      
      v2 -> v3
      v2: https://lore.kernel.org/bpf/20210721212833.701342-1-memxor@gmail.com
      
       * Address all feedback from Andrii
        * Replace usage of libbpf hashmap (internal API) with custom one
        * Rename ATOMIC_* macros to NO_TEAR_* to better reflect their use
        * Use size_t as a portable word sized data type
        * Set libbpf_set_strict_mode
        * Invert conditions in BPF programs to exit early and reduce nesting
        * Use canonical SEC("xdp") naming for all XDP BPF programs
       * Add missing help description for cpumap enqueue and kthread tracepoints
       * Move private struct declarations from xdp_sample_user.h to .c file
       * Improve help output for cpumap enqueue and cpumap kthread tracepoints
       * Fix a bug where keys array for BPF_MAP_LOOKUP_BATCH is overallocated
       * Fix some conditions for printing stats (earlier only checked pps, now pps,
         drop, err and print if any is greater than zero)
       * Fix alloc_stats_record to properly return and cleanup allocated memory on
         allocation failure instead of calling exit(3)
       * Bump bpf_map_lookup_batch count to 32 to reduce lookup time with multiple
         devices in map
       * Fix a bug where devmap_xmit_multi stats are not printed when previous record
         is missing (i.e. when the first time stats are printed), by simply using a
         dummy record that is zeroed out
       * Also print per-CPU counts for devmap_xmit_multi which we collect already
       * Change mac_map to be BPF_MAP_TYPE_HASH instead of array to prevent resizing
         to a large size when max_ifindex is high, in xdp_redirect_map_multi
       * Fix instance of strerror(errno) in sample_install_xdp to use saved errno
       * Provide a usage function from samples helper
       * Provide a fix where incorrect stats are shown for parallel sessions of
         xdp_redirect_* samples by introducing matching support for input device(s),
         output device(s) and cpumap map id for enqueue and kthread stats.
         Only xdp_monitor doesn't filter stats, all others do.
      
      RFC (v1) -> v2
      RFC (v1): https://lore.kernel.org/bpf/20210528235250.2635167-1-memxor@gmail.com
      
       * Address all feedback from Andrii
         * Use BPF static linking
         * Use vmlinux.h
         * Use BPF_PROG macro
         * Use global variables instead of maps
       * Use of tp_btf for raw_tracepoint progs
       * Switch to timerfd for polling
       * Use libbpf hashmap for maintaining device sets for per ifindex pair
         devmap_xmit stats
       * Fix Makefile to specify object dependencies properly
       * Use in-tree bpftool
       * ... misc fixes and cleanups all over the place
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      samples: bpf: Convert xdp_redirect_map_multi to XDP samples helper · 594a116b
      Kumar Kartikeya Dwivedi authored
      Use the libbpf skeleton facility and other utilities provided by XDP
      samples helper. Also adapt to change of type of mac address map, so that
      no resizing is required.
      
      Add a new flag for sample mask that skips printing the
      from_device->to_device heading for each line, as xdp_redirect_map_multi
      may have two devices but the flow of data may be bidirectional, so the
      output would be confusing.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_map_multi_kern.o to XDP samples helper · a29b3ca1
      Kumar Kartikeya Dwivedi authored
      One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array
      map to store mac addresses of devices, as the resizing behavior was
      based on max_ifindex, which unnecessarily maximized the capacity of map
      beyond what was needed.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_map to XDP samples helper · bbe65865
      Kumar Kartikeya Dwivedi authored
      Use the libbpf skeleton facility and other utilities provided by XDP
      samples helper.
      
      Since get_mac_addr is already provided by XDP samples helper, we drop
      it. Also convert to XDP samples helper similar to prior samples to
      minimize duplication of code.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_map_kern.o to XDP samples helper · 54af769d
      Kumar Kartikeya Dwivedi authored
      Also update it to use consistent SEC("xdp") and SEC("xdp_devmap")
      naming, and use global variable instead of BPF map for copying the mac
      address.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_cpu to XDP samples helper · e531a220
      Kumar Kartikeya Dwivedi authored
      Use the libbpf skeleton facility and other utilities provided by XDP
      samples helper.
      
      Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a
      few minor omissions (e.g. redirect errno reporting). All of these have
      been moved to XDP samples helper, hence drop the unneeded code and
      convert to usage of helpers provided by it.
      
      One of the important changes here is dropping of mprog-disable option,
      as we make that the default. Also, we support built-in programs for some
      common actions on the packet when it reaches kthread (pass, drop,
      redirect to device). If the user still needs to install a custom
      program, they can still supply a BPF object, however the program should
      be suitably tagged with SEC("xdp_cpumap") annotation so that the
      expected attach type is correct when updating our cpumap map element.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_cpu_kern.o to XDP samples helper · 79ccf452
      Kumar Kartikeya Dwivedi authored
      Similar to xdp_monitor_kern, a lot of these BPF programs have been
      reimplemented properly consolidating missing features from other XDP
      samples. Hence, drop the unneeded code and rename to .bpf.c suffix.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
      samples: bpf: Convert xdp_redirect to XDP samples helper · b926c55d
      Kumar Kartikeya Dwivedi authored
      Use the libbpf skeleton facility and other utilities provided by XDP
      samples helper.
      
      One important note:
      The XDP samples helper handles ownership of installed XDP programs on
      devices, including responding to SIGINT and SIGTERM, so drop the code
      here and use the helpers we provide going forward for all xdp_redirect*
      conversions.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
      samples: bpf: Convert xdp_redirect_kern.o to XDP samples helper · 66fc4ca8
      Kumar Kartikeya Dwivedi authored
      We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other
      potential users, so drop it while moving code to the new file.
      Also, consistently use SEC("xdp") naming instead.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
      samples: bpf: Convert xdp_monitor to XDP samples helper · 6e1051a5
      Kumar Kartikeya Dwivedi authored
      Use the libbpf skeleton facility and other utilities provided by XDP
      samples helper.
      
      A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to
      the xdp_sample_user.o helper, so we remove the duplicate functions here
      that are no longer needed.
      
      Thanks to BPF skeleton, we no longer depend on order of tracepoints to
      uninstall them on startup. Instead, the sample mask is used to install
      the needed tracepoints.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
      samples: bpf: Convert xdp_monitor_kern.o to XDP samples helper · 3f199560
      Kumar Kartikeya Dwivedi authored
      We already moved all the functionality it provided in XDP samples helper
      userspace and kernel BPF object, so just delete the unneeded code.
      
      We also add generation of BPF skeleton and compilation using clang
      -target bpf for files ending with .bpf.c suffix (to denote that they use
      vmlinux.h).
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
      samples: bpf: Add vmlinux.h generation support · 384b6b3b
      Kumar Kartikeya Dwivedi authored
      Also, take this opportunity to depend on in-tree bpftool, so that we can
      use static linking support in subsequent commits for XDP samples BPF
      helper object.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com
      samples: bpf: Add devmap_xmit tracepoint statistics support · af93d58c
      Kumar Kartikeya Dwivedi authored
      This adds support for retrieval and printing for devmap_xmit total and
      multi mode tracepoint. For multi mode, we keep a hash map entry for each
      redirection stream, such that we can dynamically add and remove entries
      on output.
      
      The from_match and to_match will be set by individual samples when
      setting up the XDP program on these devices.
      
      The multi mode tracepoint is also handy for xdp_redirect_map_multi,
      where up to 32 devices can be specified.
      
      Also add samples_init_pre_load macro to finally set up the resized maps
      and mmap them in place for low overhead stats retrieval.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
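A hash map entry per redirection stream needs a key that identifies the (from, to) device pair. One plausible encoding, shown here as a sketch and not taken from the patch, packs both ifindex values into a single 64-bit key; pair_key(), pair_from() and pair_to() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a (from, to) ifindex pair into one 64-bit hash map key. */
static uint64_t pair_key(uint32_t from_ifindex, uint32_t to_ifindex)
{
	return ((uint64_t)from_ifindex << 32) | to_ifindex;
}

/* Recover the source ifindex from the upper 32 bits. */
static uint32_t pair_from(uint64_t key)
{
	return (uint32_t)(key >> 32);
}

/* Recover the destination ifindex from the lower 32 bits. */
static uint32_t pair_to(uint64_t key)
{
	return (uint32_t)(key & 0xffffffffu);
}
```

Because the direction is part of the key, veth0->veth1 and veth1->veth0 get separate entries, so each stream can be added to or removed from the output independently.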