1. 17 Aug, 2019 8 commits
    • net/mlx5e: Add AF_XDP need_wakeup support · a7bd4018
      Maxim Mikityanskiy authored
      This commit adds support for the new need_wakeup feature of AF_XDP. The
      applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag.
      When this feature is enabled, some behavior changes:
      
      RX side: If the Fill Ring is empty, instead of busy-polling, set the
      flag to tell the application to kick the driver when it refills the Fill
      Ring.
      
      TX side: If there are pending completions or packets queued for
      transmission, clear the flag to tell the application that it can skip
      the sendto() syscall and save time.
      
      The performance testing was performed on a machine with the following
      configuration:
      
      - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
      - Mellanox ConnectX-5 Ex with 100 Gbit/s link
      
      The results with retpoline disabled:
      
             | without need_wakeup  | with need_wakeup     |
             |----------------------|----------------------|
             | one core | two cores | one core | two cores |
      -------|----------|-----------|----------|-----------|
      txonly | 20.1     | 33.5      | 29.0     | 34.2      |
      rxdrop | 0.065    | 14.1      | 12.0     | 14.1      |
      l2fwd  | 0.032    | 7.3       | 6.6      | 7.2       |
      
      "One core" means the application and NAPI run on the same core. "Two
      cores" means they are pinned to different cores.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/mlx5e: Move the SW XSK code from NAPI poll to a separate function · 871aa189
      Maxim Mikityanskiy authored
      Two XSK tasks performed during NAPI polling are not bound to hardware
      interrupts: transmitting packets and polling for frames in the Fill
      Ring. They are special in that the hardware doesn't know about them,
      so it doesn't trigger interrupts while there is still work to be
      done. It is the driver's responsibility to ensure NAPI is rescheduled
      if needed.
      
      Create a new function to handle these tasks and move the corresponding
      code from mlx5e_napi_poll to the new function to improve modularity and
      prepare for the changes in the following patch.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: add use of need_wakeup flag in xdpsock · 46738f73
      Magnus Karlsson authored
      This commit adds use of the need_wakeup flag to the xdpsock sample
      application. It is turned on by default, as the feature seems to
      always produce a performance benefit if the application has been
      written to take advantage of it. It can be turned off in the sample
      app with the '-m' command line option.
      
      The txpush and l2fwd sub applications have also been updated to
      support poll() with multiple sockets.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: add support for need_wakeup flag in AF_XDP part · a4500432
      Magnus Karlsson authored
      This commit adds support for the new need_wakeup flag in AF_XDP. The
      xsk_socket__create function is updated to handle this, and a new
      function, xsk_ring_prod__needs_wakeup(), is introduced. The
      application can use this function to check whether Rx and/or Tx
      processing needs to be explicitly woken up.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
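The check-then-kick pattern that xsk_ring_prod__needs_wakeup() enables can be sketched without libbpf as follows. This is a minimal illustration, not the libbpf implementation: the demo_* names are hypothetical, the ring is reduced to its shared flags word, and DEMO_RING_NEED_WAKEUP is assumed to mirror the kernel's XDP_RING_NEED_WAKEUP bit. The syscall (sendto() for Tx, poll() for Rx) is replaced by a callback so the sketch stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XDP_RING_NEED_WAKEUP from <linux/if_xdp.h>. */
#define DEMO_RING_NEED_WAKEUP (1u << 0)

/* Hypothetical stand-in for a producer ring: only the shared flags word. */
struct demo_ring {
	const uint32_t *flags;
};

/* Same idea as xsk_ring_prod__needs_wakeup(): test the need_wakeup bit. */
static bool demo_ring_needs_wakeup(const struct demo_ring *r)
{
	return (*r->flags & DEMO_RING_NEED_WAKEUP) != 0;
}

/* Callback standing in for the real syscall (sendto() or poll()). */
typedef int (*demo_kick_fn)(void *ctx);

/* Issue the wakeup only if the kernel asked for it; returns 0 when skipped. */
static int demo_kick_if_needed(const struct demo_ring *r,
			       demo_kick_fn kick, void *ctx)
{
	if (!demo_ring_needs_wakeup(r))
		return 0;
	return kick(ctx);
}

/* Example callback that only counts how many wakeups were issued. */
static int demo_count_kick(void *ctx)
{
	++*(int *)ctx;
	return 0;
}
```

In a real application the callback would wrap sendto() on the socket for the Tx ring, or poll() after refilling the fill ring.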
    • ixgbe: add support for AF_XDP need_wakeup feature · 5c129241
      Magnus Karlsson authored
      This patch adds support for the need_wakeup feature of AF_XDP. If the
      application has told the kernel that it might sleep by using the new
      bind flag XDP_USE_NEED_WAKEUP, the driver sets the need_wakeup flag
      when it has no more buffers on the NIC Rx ring and yields to the
      application. For Tx, it sets the flag when it has no outstanding Tx
      completion interrupts and returns to the application.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • i40e: add support for AF_XDP need_wakeup feature · 3d0c5f1c
      Magnus Karlsson authored
      This patch adds support for the need_wakeup feature of AF_XDP. If the
      application has told the kernel that it might sleep by using the new
      bind flag XDP_USE_NEED_WAKEUP, the driver sets the need_wakeup flag
      when it has no more buffers on the NIC Rx ring and yields to the
      application. For Tx, it sets the flag when it has no outstanding Tx
      completion interrupts and returns to the application.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
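The driver-side rule described in the ixgbe and i40e patches above can be modelled roughly as below. This is a sketch under stated assumptions, not the drivers' code: struct demo_drv_state and demo_update_need_wakeup() are hypothetical, and the two flags words stand in for the fill ring and Tx ring flags shared with user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XDP_RING_NEED_WAKEUP from <linux/if_xdp.h>. */
#define DEMO_RING_NEED_WAKEUP (1u << 0)

/* Hypothetical model of the two flags words shared with user space. */
struct demo_drv_state {
	uint32_t fill_flags; /* need_wakeup bit for Rx (fill ring) */
	uint32_t tx_flags;   /* need_wakeup bit for Tx (Tx ring) */
};

/*
 * Rx: ask for a wakeup when the driver has run out of Rx buffers.
 * Tx: ask for a wakeup only when no completion interrupt is outstanding,
 *     because a pending completion will re-enter the driver anyway.
 */
static void demo_update_need_wakeup(struct demo_drv_state *s,
				    bool rx_out_of_buffers,
				    bool tx_completions_pending)
{
	if (rx_out_of_buffers)
		s->fill_flags |= DEMO_RING_NEED_WAKEUP;
	else
		s->fill_flags &= ~DEMO_RING_NEED_WAKEUP;

	if (tx_completions_pending)
		s->tx_flags &= ~DEMO_RING_NEED_WAKEUP;
	else
		s->tx_flags |= DEMO_RING_NEED_WAKEUP;
}
```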
    • xsk: add support for need_wakeup flag in AF_XDP rings · 77cd0d7b
      Magnus Karlsson authored
      This commit adds support for a new flag called need_wakeup in the
      AF_XDP Tx and fill rings. When this flag is set, it means that the
      application has to explicitly wake up the kernel Rx (for the bit in
      the fill ring) or kernel Tx (for the bit in the Tx ring) processing by
      issuing a syscall. poll() can wake up both, depending on the flags
      submitted, and sendto() will wake up Tx processing only.
      
      The main reason for introducing this new flag is to efficiently
      support the case where the application and the driver execute on the
      same core. Previously, the driver just busy-spun on the fill ring if
      it ran out of buffers in the HW and there were none on the fill ring.
      This approach works when the application is running on another core,
      as it can replenish the fill ring while the driver is busy-spinning.
      However, this is a lousy approach if both of them are running on the
      same core, as the probability of the fill ring getting more entries
      while the driver is busy-spinning is zero. With this new
      feature the driver now sets the need_wakeup flag and returns to the
      application. The application can then replenish the fill queue and
      then explicitly wake up the Rx processing in the kernel using the
      syscall poll(). For Tx, the flag is only set to one if the driver has
      no outstanding Tx completion interrupts. If it has some, the flag is
      zero as it will be woken up by a completion interrupt anyway.
      
      As a nice side effect, this new flag also improves the performance of
      the case where application and driver are running on two different
      cores as it reduces the number of syscalls to the kernel. The kernel
      tells user space if it needs to be woken up by a syscall, and this
      eliminates many of the syscalls.
      
      This flag needs some simple driver support. If the driver does not
      support this, the Rx flag is always zero and the Tx flag is always
      one. This makes any application relying on this feature default to the
      old behaviour of not requiring any syscalls in the Rx path and always
      having to call sendto() in the Tx path.
      
      For backwards compatibility reasons, this feature has to be explicitly
      turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
      that you always turn it on, as it has so far always had a positive
      performance impact.
      
      The name and inspiration of the flag has been taken from io_uring by
      Jens Axboe. Details about this feature in io_uring can be found in
      http://kernel.dk/io_uring.pdf, section 8.3.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
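To make the syscall-saving argument of the xsk commit above concrete, the following sketch counts how many Tx wakeup syscalls an application would issue over a trace of Tx ring flag states. It is an illustration only: demo_count_tx_syscalls() and the trace are hypothetical, with one array entry per send iteration holding the need_wakeup bit the application would observe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count Tx wakeup syscalls over a trace of need_wakeup observations.
 * Without the feature the application calls sendto() on every iteration;
 * with it, only when the flag is set.
 */
static unsigned int demo_count_tx_syscalls(const uint8_t *need_wakeup,
					   size_t n, bool use_need_wakeup)
{
	unsigned int syscalls = 0;

	for (size_t i = 0; i < n; i++) {
		if (!use_need_wakeup || need_wakeup[i])
			syscalls++;
	}
	return syscalls;
}
```

A trace of all ones models a driver without need_wakeup support, where the Tx flag is always one: the count then equals the old behaviour of one sendto() per iteration.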
    • xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup · 9116e5e2
      Magnus Karlsson authored
      This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
      ndo provides the same functionality as before but with the addition of
      a new flags field that is used to specify if Rx, Tx or both should be
      woken up. The previous ndo only woke up Tx, as implied by the
      name. The i40e and ixgbe drivers (which are all the supported ones)
      are updated with this new interface.
      
      This new ndo will be used by the new need_wakeup functionality of XDP
      sockets that need to be able to wake up both Rx and Tx driver
      processing.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
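The role of the new flags field in ndo_xsk_wakeup can be sketched as follows. This is a hypothetical user-space model, not driver code: struct demo_queue and demo_xsk_wakeup() are invented for illustration, the DEMO_WAKEUP_* bits are assumed to mirror the kernel's XDP_WAKEUP_RX and XDP_WAKEUP_TX, and rejecting an empty flags value is a choice made for this sketch. A real driver would schedule its NAPI context rather than bump counters.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror XDP_WAKEUP_RX / XDP_WAKEUP_TX in the kernel headers. */
#define DEMO_WAKEUP_RX (1u << 0)
#define DEMO_WAKEUP_TX (1u << 1)

/* Hypothetical per-queue state: counts wakeups instead of scheduling NAPI. */
struct demo_queue {
	unsigned int rx_wakeups;
	unsigned int tx_wakeups;
};

/* Wake Rx, Tx or both, depending on the flags passed by the caller. */
static int demo_xsk_wakeup(struct demo_queue *q, uint32_t flags)
{
	if (!(flags & (DEMO_WAKEUP_RX | DEMO_WAKEUP_TX)))
		return -EINVAL; /* nothing requested */

	if (flags & DEMO_WAKEUP_RX)
		q->rx_wakeups++;
	if (flags & DEMO_WAKEUP_TX)
		q->tx_wakeups++;
	return 0;
}
```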
  2. 16 Aug, 2019 14 commits
  3. 14 Aug, 2019 12 commits
  4. 13 Aug, 2019 6 commits
    • net: devlink: remove redundant rtnl lock assert · 043b8413
      Vlad Buslov authored
      It is enough for the caller of devlink_compat_switch_id_get() to hold the
      net device to guarantee that the devlink port is not destroyed
      concurrently. Remove the rtnl lock assertion and modify the comment to warn
      users that they must hold either the rtnl lock or a reference to the net
      device. This is necessary to accommodate the future implementation of
      rtnl-unlocked TC offloads driver callbacks.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 708852dc
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      The following pull-request contains BPF updates for your *net-next* tree.
      
      There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
      as well):
      
              for (i = 1; i <= btf__get_nr_types(btf); i++) {
                      t = (struct btf_type *)btf__type_by_id(btf, i);
      
                      if (!has_datasec && btf_is_var(t)) {
                              /* replace VAR with INT */
                              t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
        <<<<<<< HEAD
                              /*
                               * using size = 1 is the safest choice, 4 will be too
                               * big and cause kernel BTF validation failure if
                               * original variable took less than 4 bytes
                               */
                              t->size = 1;
                              *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
                      } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
        =======
                              t->size = sizeof(int);
                              *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
                      } else if (!has_datasec && btf_is_datasec(t)) {
        >>>>>>> 72ef80b5
                              /* replace DATASEC with STRUCT */
      
      Conflict is between the two commits 1d4126c4 ("libbpf: sanitize VAR to
      conservative 1-byte INT") and b03bc685 ("libbpf: convert libbpf code to
      use new btf helpers"), so we need to pick the sanitation fixup as well as
      use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
      the following:
      
        [...]
                      if (!has_datasec && btf_is_var(t)) {
                              /* replace VAR with INT */
                              t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
                              /*
                               * using size = 1 is the safest choice, 4 will be too
                               * big and cause kernel BTF validation failure if
                               * original variable took less than 4 bytes
                               */
                              t->size = 1;
                              *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
                      } else if (!has_datasec && btf_is_datasec(t)) {
                              /* replace DATASEC with STRUCT */
        [...]
      
      The main changes are:
      
      1) Addition of core parts of the compile once - run everywhere (co-re) effort,
         that is, relocation of field offsets in libbpf as well as exposure of
         the kernel's own BTF via sysfs and loading through libbpf, from Andrii.
      
         More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
         and http://vger.kernel.org/lpc-bpf2018.html#session-2
      
      2) Enable passing input flags to the BPF flow dissector to customize parsing
         and allow it to stop early, similar to the C-based one, from Stanislav.
      
      3) Add a BPF helper function that allows generating SYN cookies from XDP and
         tc BPF, from Petar.
      
      4) Add devmap hash-based map type for more flexibility in device lookup for
         redirects, from Toke.
      
      5) Improvements to XDP forwarding sample code now utilizing recently enabled
         devmap lookups, from Jesper.
      
      6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
         and Takshak.
      
      7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.
      
      8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.
      
      9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.
      
      10) Add perf event output helper also for other skb-based program types, from Allan.
      
      11) Fix a co-re related compilation error in selftests, from Yonghong.
      ====================
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
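The libbpf conflict resolution quoted in the merge message above keeps `t->size = 1` together with `BTF_INT_ENC(0, 0, 8)`. Both lines have to come from the same side of the conflict, because the declared byte size and the encoded bit width must agree. The sketch below illustrates that relationship. The demo_* helpers are hypothetical, and the bit layout (encoding in bits 24-27, bit offset in bits 16-23, width in bits 0-7) is assumed to match libbpf's internal BTF_INT_ENC macro.

```c
#include <stdint.h>

/* Assumed layout of the word that follows a BTF INT type descriptor. */
static uint32_t demo_btf_int_enc(uint32_t encoding, uint32_t bits_offset,
				 uint32_t nr_bits)
{
	return (encoding << 24) | (bits_offset << 16) | nr_bits;
}

/* Extract the bit width from an encoded INT word. */
static uint32_t demo_btf_int_bits(uint32_t enc)
{
	return enc & 0xff;
}

/* An INT is consistent only if its bit width fits in its byte size. */
static int demo_int_is_consistent(uint32_t size_bytes, uint32_t enc)
{
	return demo_btf_int_bits(enc) <= size_bytes * 8;
}
```

Under these assumptions, a 1-byte INT with an 8-bit encoding is consistent, while a 1-byte size paired with the 32-bit encoding from the other side of the conflict would not be.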
    • net: hns3: Make hclge_func_reset_sync_vf static · a9a96760
      YueHaibing authored
      Fix sparse warning:
      
      drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:3190:5:
       warning: symbol 'hclge_func_reset_sync_vf' was not declared. Should it be static?
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • devlink: send notifications for deleted snapshots on region destroy · 92b49822
      Jiri Pirko authored
      Currently, notifications for deleted snapshots are sent only when a
      user deletes a snapshot manually. Send the notifications when a region
      is destroyed, too.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • Merge branch 'bpf-libbpf-read-sysfs-btf' · 72ef80b5
      Daniel Borkmann authored
      Andrii Nakryiko says:
      
      ====================
      Now that the kernel's BTF is exposed through sysfs at a well-known location,
      attempt to load it first as a target BTF for the purpose of BPF CO-RE relocations.
      
      Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into
      /sys/kernel/btf/vmlinux.
      
      Patch #2 adds the ability to load raw BTF contents from sysfs and expands the
      list of locations libbpf attempts to load vmlinux BTF from.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: attempt to load kernel BTF from sysfs first · a1916a15
      Andrii Nakryiko authored
      Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux)
      as a target BTF. Also extend the list of on-disk search paths for the
      vmlinux ELF image with the entries that perf searches.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
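The "sysfs first, then on-disk paths" lookup order described in the libbpf commit above amounts to walking an ordered candidate list and taking the first readable entry. The sketch below shows only that ordering logic. demo_first_readable() is a hypothetical helper, not a libbpf function, and the caller supplies the candidate list. The only path taken from the commit message is /sys/kernel/btf/vmlinux, and any on-disk fallback entries are up to the caller.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return the index of the first readable path in an ordered candidate
 * list, or -1 if none is readable. Earlier entries take priority, so
 * putting "/sys/kernel/btf/vmlinux" first makes sysfs the preferred source.
 */
static int demo_first_readable(const char *const *paths, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (access(paths[i], R_OK) == 0)
			return (int)i;
	}
	return -1;
}
```

A caller would pass a list beginning with /sys/kernel/btf/vmlinux, followed by whatever vmlinux image locations it wants to try, and then parse the selected file as raw BTF or as an ELF image accordingly.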