1. 30 Aug, 2019 18 commits
    • net/mlx5e: Allow XSK frames smaller than a page · 282c0c79
      Maxim Mikityanskiy authored
      Relax the requirements on the XSK frame size to allow it to be smaller
      than a page, and even not a power of two. The current implementation can
      work in this mode, both with Striding RQ and without it.
      
      The code that checks `mtu + headroom <= XSK frame size` is modified
      accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
      
      Functions that worked with pages only now work with XSK frames, even if
      their size is different from PAGE_SIZE.
      
      With XSK queues, regardless of the frame size, Striding RQ uses the
      stride size of PAGE_SIZE, and UMR MTTs are posted using starting
      addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
      packet data will overlap with other frames. UMR MTT size is made equal
      to the stride size of the RQ, because UMEM frames may come in random
      order, and we need to handle them one by one. PAGE_SIZE is just a power
      of two that is bigger than any allowed XSK frame size, and also it
      doesn't require making additional changes to the code.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
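The relaxed size rule described in this commit can be modelled in a few lines. This is only a sketch of the stated constraints (a frame size between 2048 and PAGE_SIZE, and `mtu + headroom <= XSK frame size`), not the driver's code. The function name and the 4096-byte page size are assumptions made for illustration.

```python
PAGE_SIZE = 4096           # assumed page size; the real value depends on the architecture
MIN_XSK_FRAME_SIZE = 2048  # lower bound quoted in the commit message


def xsk_frame_size_ok(frame_size: int, mtu: int, headroom: int) -> bool:
    """Return True if the frame size is in the accepted range and fits one packet.

    Hypothetical helper: it models the two conditions from the commit
    message and nothing else the real driver may check.
    """
    if not MIN_XSK_FRAME_SIZE <= frame_size <= PAGE_SIZE:
        return False
    # A packet plus its headroom must not spill into the next frame.
    return mtu + headroom <= frame_size
```

Under these assumptions a 3000-byte frame, which is neither page-sized nor a power of two, is accepted for a 1500-byte MTU with 256 bytes of headroom.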
    • mlx5e: modify driver for handling offsets · beb3e4b2
      Kevin Laatz authored
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
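As a rough model of "masking the address for each case": in aligned mode an address can be rounded down to its chunk boundary, while in unaligned mode that rounding would discard a legitimate offset. The sketch below illustrates only that distinction. The function name and the way the mode is passed in are assumptions, and the actual driver code may differ.

```python
def chunk_base(addr: int, chunk_size: int, unaligned: bool) -> int:
    """Return the address a driver would treat as the start of the buffer.

    Hypothetical model: aligned mode masks the low bits away,
    unaligned mode leaves the address untouched.
    """
    if unaligned:
        return addr
    if chunk_size <= 0 or chunk_size & (chunk_size - 1):
        raise ValueError("aligned mode assumes a power-of-two chunk size")
    return addr & ~(chunk_size - 1)
```

With a 2k chunk size, address 0x1234 maps to 0x1000 in aligned mode and stays 0x1234 in unaligned mode.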
    • ixgbe: modify driver for handling offsets · d8c3061e
      Kevin Laatz authored
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • i40e: modify driver for handling offsets · 2f86c806
      Kevin Laatz authored
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: add support to allow unaligned chunk placement · c05cd364
      Kevin Laatz authored
      Currently, addresses are chunk-size aligned. This means we are very
      restricted in terms of where we can place a chunk within the umem. For
      example, if we have a chunk size of 2k, then our chunks can only be placed
      at 0, 2k, 4k, 6k, 8k and so on (i.e. every 2k starting from 0).
      
      This patch introduces the ability to use unaligned chunks. With these
      changes, we are no longer bound to having to place chunks at a 2k (or
      whatever your chunk size is) interval. Since we are no longer dealing with
      aligned chunks, they can now cross page boundaries. Checks for page
      contiguity have been added in order to keep track of which pages are
      followed by a physically contiguous page.
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
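The page-crossing problem mentioned in this commit can be illustrated with a small model. It assumes a 4096-byte page and chunks no larger than a page, so a chunk touches at most two pages. All names are hypothetical and the kernel's bookkeeping is not reproduced here.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def crosses_page(addr: int, chunk_size: int) -> bool:
    """True if the chunk's first and last bytes live on different pages."""
    return addr // PAGE_SIZE != (addr + chunk_size - 1) // PAGE_SIZE


def next_page_contiguous(page_phys_addrs: list) -> list:
    """For each umem page, record whether the following page is physically adjacent."""
    return [
        i + 1 < len(page_phys_addrs)
        and page_phys_addrs[i + 1] == page_phys_addrs[i] + PAGE_SIZE
        for i in range(len(page_phys_addrs))
    ]


def chunk_usable(addr: int, chunk_size: int, contiguous: list) -> bool:
    """A chunk that crosses a page boundary needs the next page to be contiguous."""
    if not crosses_page(addr, chunk_size):
        return True
    return contiguous[addr // PAGE_SIZE]
```

An aligned 2k chunk at offset 2048 never crosses a page, whereas an unaligned 2k chunk at offset 3000 does, and is only usable in this model if the two pages it spans are physically adjacent.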
    • ixgbe: simplify Rx buffer recycle · b35a2d3e
      Kevin Laatz authored
      Currently, the dma, addr and handle are modified when we reuse Rx buffers
      in zero-copy mode. However, this is not required as the inputs to the
      function are copies, not the original values themselves. As we use the
      copies within the function, we can use the original 'obi' values
      directly without having to mask and add the headroom.
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • i40e: simplify Rx buffer recycle · 10912fc9
      Kevin Laatz authored
      Currently, the dma, addr and handle are modified when we reuse Rx buffers
      in zero-copy mode. However, this is not required as the inputs to the
      function are copies, not the original values themselves. As we use the
      copies within the function, we can use the original 'old_bi' values
      directly without having to mask and add the headroom.
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: Fix a typo in test_offload.py · 1c6d6e02
      Masanari Iida authored
      This patch fixes a spelling typo in test_offload.py.
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix error check in bpf_tcp_gen_syncookie · 0741be35
      Petar Penkov authored
      If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
      value will be exactly 0, rather than <= 0. Let's change the check to
      reflect that, especially since mss is an unsigned value and cannot be
      negative.
      
      Fixes: 70d66244 ("bpf: add bpf_tcp_gen_syncookie helper")
      Reported-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-nfp-map-op-cache' · 736a5530
      Daniel Borkmann authored
      Jakub Kicinski says:
      
      ====================
      This set adds a small batching and cache mechanism to the driver.
      Map dumps require two operations per element - get next, and
      lookup. Each of those needs a round trip to the device, and on
      a loaded system scheduling out and in of the dumping process.
      This set makes the driver request a number of entries at the same
      time, and if no operation which would modify the map happens
      from the host side those entries are used to serve lookup
      requests for up to 250us, at which point they are considered
      stale.
      
      This set has been measured to provide almost 4x dumping speed
      improvement, Jaco says:
      
      OLD dump times
          500 000 elements: 26.1s
        1 000 000 elements: 54.5s
      
      NEW dump times
          500 000 elements: 7.6s
        1 000 000 elements: 16.5s
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
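For reference, the speedup implied by the dump times quoted in this cover letter can be computed directly. The snippet only divides the quoted figures and adds no new measurements.

```python
# Dump times in seconds, copied from the cover letter above.
old_times = {500_000: 26.1, 1_000_000: 54.5}
new_times = {500_000: 7.6, 1_000_000: 16.5}

# Speedup per map size: old time divided by new time.
speedup = {n: old_times[n] / new_times[n] for n in old_times}

for elements, ratio in speedup.items():
    print(f"{elements:>9} elements: {ratio:.2f}x faster")
```

On these two data points the ratio comes out at roughly 3.3x to 3.4x.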
    • nfp: bpf: add simple map op cache · f24e2909
      Jakub Kicinski authored
      Each get_next and lookup call requires a round trip to the device.
      However, the device is capable of giving us a few entries back,
      instead of just one.
      
      In this patch we ask for a small yet reasonable number of entries
      (4) on every get_next call, and on subsequent get_next/lookup calls
      check this little cache for a hit. The cache is only kept for 250us,
      and is invalidated on every operation which may modify the map
      (e.g. delete or update call). Note that operations may be performed
      simultaneously, so we have to keep track of operations in flight.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
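The caching policy described in this commit (a handful of entries, a 250us lifetime, invalidation on any modifying operation, and awareness of operations in flight) can be sketched as a small class. This is a simplified model and not the driver's data structure. The class and method names are hypothetical, and the clock is injectable so the expiry can be exercised deterministically.

```python
import time

CACHE_ENTRIES = 4     # entries requested per get_next call, per the commit message
CACHE_TTL_S = 250e-6  # cached entries are considered stale after 250us


class MapOpCache:
    """Hypothetical model of a short-lived lookup cache for map dumps."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}
        self._filled_at = None
        self._modifiers_in_flight = 0

    def fill(self, entries):
        """Store up to CACHE_ENTRIES (key, value) pairs returned by the device."""
        if self._modifiers_in_flight:
            return  # a concurrent update/delete could make these entries stale
        self._entries = dict(entries[:CACHE_ENTRIES])
        self._filled_at = self._clock()

    def lookup(self, key):
        """Return the cached value, or None on a miss or an expired cache."""
        if self._filled_at is None:
            return None
        if self._clock() - self._filled_at > CACHE_TTL_S:
            return None
        return self._entries.get(key)

    def begin_modify(self):
        """Call before an update or delete: drop the cache and block refills."""
        self._modifiers_in_flight += 1
        self._entries = {}
        self._filled_at = None

    def end_modify(self):
        """Call once the modifying operation has completed."""
        self._modifiers_in_flight -= 1
```

Counting modifiers in flight, rather than using a single flag, is one way to keep the cache disabled until every overlapping update or delete has finished.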
    • nfp: bpf: rework MTU checking · bc2796db
      Jakub Kicinski authored
      If the control channel MTU is too low to support map operations, a
      warning is printed. This is not enough: we want to make sure probe fails
      in such a scenario, as this would clearly be a faulty configuration.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-bpftool-build-improvements' · c5a2c734
      Daniel Borkmann authored
      Quentin Monnet says:
      
      ====================
      This set attempts to make it easier to build bpftool, in particular when
      passing a specific output directory. This is a follow-up to the
      conversation held last month by Lorenz, Ilya and Jakub [0].
      
      The first patch is a minor fix to bpftool's Makefile, regarding the
      retrieval of kernel version (which currently prints a non-relevant make
      warning on some invocations).
      
      The second patch improves the Makefile commands to support more "make"
      invocations, or to fix building with custom output directory. On Jakub's
      suggestion, a script is also added to BPF selftests in order to keep track
      of the supported build variants.
      
      Building bpftool with "make tools/bpf" from the top of the repository
      generates files in "libbpf/" and "feature/" directories under tools/bpf/
      and tools/bpf/bpftool/. The third patch ensures such directories are taken
      care of on "make clean", and adds them to the relevant .gitignore files.
      
      Lastly, the fourth patch is a slightly modified version of Ilya's fix
      regarding libbpf.a appearing twice on the linking command for bpftool.
      
      [0] https://lore.kernel.org/bpf/CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com/
      
      v2:
      - Return error from check script if one of the make invocations returns
        non-zero (even if binary is successfully produced).
      - Run "make clean" from bpf/ and not only bpf/bpftool/ in that same script,
        when relevant.
      - Add a patch to clean up generated "feature/" and "libbpf/" directories.
      ====================
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Lorenz Bauer <lmb@cloudflare.com>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools: bpftool: do not link twice against libbpf.a in Makefile · 5b84ad2e
      Quentin Monnet authored
      In bpftool's Makefile, $(LIBS) includes $(LIBBPF), therefore the library
      is used twice in the linking command. No need to have $(LIBBPF) (from
      $^) on that command, let's do with "$(OBJS) $(LIBS)" (but move $(LIBBPF)
      _before_ the -l flags in $(LIBS)).
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools: bpf: account for generated feature/ and libbpf/ directories · fbdb620b
      Quentin Monnet authored
      When building "tools/bpf" from the top of the Linux repository, the
      build system passes a value for the $(OUTPUT) Makefile variable to
      tools/bpf/Makefile and tools/bpf/bpftool/Makefile, which results in
      generating "libbpf/" (for bpftool) and "feature/" (bpf and bpftool)
      directories inside the tree.
      
      This commit adds such directories to the relevant .gitignore files, and
      edits the Makefiles to ensure they are removed on "make clean". The use
      of "rm" is also made consistent throughout those Makefiles (relies on
      the $(RM) variable, use "--" to prevent interpreting
      $(OUTPUT)/$(DESTDIR) as options).
      
      v2:
      - New patch.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools: bpftool: improve and check builds for different make invocations · 45c5589d
      Quentin Monnet authored
      There are a number of alternative "make" invocations that can be used to
      compile bpftool. The following invocations are expected to work:
      
        - through the kbuild system, from the top of the repository
          (make tools/bpf)
        - by telling make to change to the bpftool directory
          (make -C tools/bpf/bpftool)
        - by building the BPF tools from tools/
          (cd tools && make bpf)
        - by running make from bpftool directory
          (cd tools/bpf/bpftool && make)
      
      Additionally, setting the O or OUTPUT variables should tell the build
      system to use a custom output path, for each of these alternatives.
      
      This patch fixes the following invocations:
      
        $ make tools/bpf
        $ make tools/bpf O=<dir>
        $ make -C tools/bpf/bpftool OUTPUT=<dir>
        $ make -C tools/bpf/bpftool O=<dir>
        $ cd tools/ && make bpf O=<dir>
        $ cd tools/bpf/bpftool && make OUTPUT=<dir>
        $ cd tools/bpf/bpftool && make O=<dir>
      
      After this commit, the build still fails for two variants when passing
      the OUTPUT variable:
      
        $ make tools/bpf OUTPUT=<dir>
        $ cd tools/ && make bpf OUTPUT=<dir>
      
      In order to remember and check what make invocations are supposed to
      work, and to document the ones which do not, a new script is added to
      the BPF selftests. Note that some invocations require the kernel to be
      configured, so the script skips them if no .config file is found.
      
      v2:
      - In make_and_clean(), set $ERROR to 1 when "make" returns non-zero,
        even if the binary was produced.
      - Run "make clean" from the correct directory (bpf/ instead of bpftool/,
        when relevant).
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
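The checking idea described in this commit can be sketched as follows. The script added to the selftests is not reproduced here. This is a hypothetical Python model that lists the invocations the commit message reports as fixed and applies the v2 rule that a non-zero exit from make counts as a failure even if the binary was produced. The function names, the `binary` argument and the injectable `run` callable are all assumptions.

```python
import subprocess
from pathlib import Path


def build_invocations(out_dir: str) -> list:
    """Return (working directory, argv) pairs for the invocations listed as fixed."""
    return [
        (".", ["make", "tools/bpf"]),
        (".", ["make", "tools/bpf", f"O={out_dir}"]),
        (".", ["make", "-C", "tools/bpf/bpftool", f"OUTPUT={out_dir}"]),
        (".", ["make", "-C", "tools/bpf/bpftool", f"O={out_dir}"]),
        ("tools", ["make", "bpf", f"O={out_dir}"]),
        ("tools/bpf/bpftool", ["make", f"OUTPUT={out_dir}"]),
        ("tools/bpf/bpftool", ["make", f"O={out_dir}"]),
    ]


def make_and_check(cwd: str, argv: list, binary: str, run=subprocess.run) -> bool:
    """Run one invocation and report success.

    Success requires both a zero exit status and the expected binary on disk,
    so a failing make is reported even when a binary happens to exist.
    """
    result = run(argv, cwd=cwd)
    return result.returncode == 0 and Path(binary).is_file()
```

Passing `run` as a parameter keeps the check testable without a kernel tree. With the default, `subprocess.run` executes the real command from the given directory.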
    • tools: bpftool: ignore make built-in rules for getting kernel version · e0a43aa3
      Quentin Monnet authored
      Bpftool calls the toplevel Makefile to get the kernel version for the
      sources it is built from. But when the utility is built from the top of
      the kernel repository, it may dump the following error message for
      certain architectures (including x86):
      
          $ make tools/bpf
          [...]
          make[3]: *** [checkbin] Error 1
          [...]
      
      This does not prevent bpftool compilation, but may feel disconcerting.
      The "checkbin" arch-dependent target is not supposed to be called for
      target "kernelversion", which is a simple "echo" of the version number.
      
      It turns out this is caused by the make invocation in tools/bpf/bpftool,
      which attempts to find implicit rules to apply. Extract from debug
      output:
      
          Reading makefiles...
          Reading makefile 'Makefile'...
          Reading makefile 'scripts/Kbuild.include' (search path) (no ~ expansion)...
          Reading makefile 'scripts/subarch.include' (search path) (no ~ expansion)...
          Reading makefile 'arch/x86/Makefile' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.kcov' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.gcc-plugins' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.kasan' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.extrawarn' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.ubsan' (search path) (no ~ expansion)...
          Updating makefiles....
           Considering target file 'scripts/Makefile.ubsan'.
            Looking for an implicit rule for 'scripts/Makefile.ubsan'.
            Trying pattern rule with stem 'Makefile.ubsan'.
          [...]
            Trying pattern rule with stem 'Makefile.ubsan'.
            Trying implicit prerequisite 'scripts/Makefile.ubsan.o'.
            Looking for a rule with intermediate file 'scripts/Makefile.ubsan.o'.
             Avoiding implicit rule recursion.
             Trying pattern rule with stem 'Makefile.ubsan'.
             Trying rule prerequisite 'prepare'.
             Trying rule prerequisite 'FORCE'.
            Found an implicit rule for 'scripts/Makefile.ubsan'.
              Considering target file 'prepare'.
               File 'prepare' does not exist.
                Considering target file 'prepare0'.
                 File 'prepare0' does not exist.
                  Considering target file 'archprepare'.
                   File 'archprepare' does not exist.
                    Considering target file 'archheaders'.
                     File 'archheaders' does not exist.
                     Finished prerequisites of target file 'archheaders'.
                    Must remake target 'archheaders'.
          Putting child 0x55976f4f6980 (archheaders) PID 31743 on the chain.
      
      To avoid that, pass the -r and -R flags to eliminate the use of make
      built-in rules (and while at it, built-in variables) when running
      command "make kernelversion" from bpftool's Makefile.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: s390: add JIT support for multi-function programs · 1c8f9b91
      Yauheni Kaliuta authored
      This adds support for bpf-to-bpf function calls in the s390 JIT
      compiler. The JIT compiler converts the bpf call instructions to
      native branch instructions. After a round of the usual passes, the
      start addresses of the JITed images for the callee functions are
      known. Finally, to fixup the branch target addresses, we need to
      perform an extra pass.
      
      Because of the address range in which JITed images are allocated on
      s390, the offsets of the start addresses of these images from
      __bpf_call_base are as large as 64 bits. So, for a function call,
      the imm field of the instruction cannot be used to determine the
      callee's address. Use bpf_jit_get_func_addr() helper instead.
      
      The patch borrows a lot from:
      
      commit 8c11ea5c ("bpf, arm64: fix getting subprog addr from aux
      for calls")
      
      commit e2c95a61 ("bpf, ppc64: generalize fetching subprog into
      bpf_jit_get_func_addr")
      
      commit 8484ce83 ("bpf: powerpc64: add JIT support for
      multi-function programs")
      
      (including the commit message).
      
      test_verifier (5.3-rc6 with CONFIG_BPF_JIT_ALWAYS_ON=y):
      
      without patch:
      Summary: 1501 PASSED, 0 SKIPPED, 47 FAILED
      
      with patch:
      Summary: 1540 PASSED, 0 SKIPPED, 8 FAILED
      Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
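The reason the imm field cannot carry the callee address on s390 comes down to range: imm is a signed 32-bit value, while the offset from __bpf_call_base may need up to 64 bits. A minimal sketch of that range check follows. The function name and the example addresses are made up for illustration.

```python
S32_MIN = -(1 << 31)
S32_MAX = (1 << 31) - 1


def offset_fits_in_imm(func_addr: int, call_base: int) -> bool:
    """True if (func_addr - call_base) is representable as a signed 32-bit imm."""
    offset = func_addr - call_base
    return S32_MIN <= offset <= S32_MAX
```

When the check fails, the address has to come from somewhere other than imm, which is the role the commit message assigns to the bpf_jit_get_func_addr() helper.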
  2. 27 Aug, 2019 11 commits
  3. 21 Aug, 2019 10 commits
  4. 20 Aug, 2019 1 commit
    • Merge branch 'btf_get_next_id' · 51746f94
      Alexei Starovoitov authored
      Quentin Monnet says:
      
      ====================
      This set adds a new command BPF_BTF_GET_NEXT_ID to the bpf() system call,
      adds the relevant API function in libbpf, and uses it in bpftool to list
      all BTF objects loaded on the system (and to dump the ids of maps and
      programs associated with them, if any).
      
      The main motivation of listing BTF objects is introspection and debugging
      purposes. By getting BPF program and map information, it should already be
      possible to list all BTF objects associated to at least one map or one
      program. But there may be unattached BTF objects, held by a file descriptor
      from a user space process only, and we may want to list them too.
      
      As a side note, it also turned out to be useful for examining the BTF objects
      attached to offloaded programs, which would not show in program information
      because the BTF id is not copied when retrieving such info. A fix is in
      progress on that side.
      
      v2:
      - Rebase patch with new libbpf function on top of Andrii's changes
        regarding libbpf versioning.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
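A "get next id" command is typically used by starting from id 0 and repeatedly asking for the id that follows the last one seen, until none is left. The sketch below models that iteration pattern against an in-memory stand-in. It does not call bpf() or libbpf. Both function names are hypothetical, and the end of the list is modelled as `None` rather than an error code.

```python
def make_get_next_id(loaded_ids):
    """Build a stand-in for the kernel query over a fixed set of object ids."""
    ordered = sorted(loaded_ids)

    def get_next_id(start_id):
        # Return the smallest id strictly greater than start_id, if any.
        for obj_id in ordered:
            if obj_id > start_id:
                return obj_id
        return None

    return get_next_id


def iter_ids(get_next_id):
    """Yield every id by walking forward from 0 until the query runs dry."""
    current = 0
    while True:
        nxt = get_next_id(current)
        if nxt is None:
            return
        yield nxt
        current = nxt
```

For example, `list(iter_ids(make_get_next_id({7, 3, 42})))` walks the ids in ascending order. A listing tool would then query details for each id it receives.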