1. 04 Jun, 2018 10 commits
    • samples/bpf: adapted to new uapi · a412ef54
      Björn Töpel authored
      Here, the xdpsock sample application is adjusted to the new descriptor
      format.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: new descriptor addressing scheme · bbff2f32
      Björn Töpel authored
      Currently, AF_XDP only supports a fixed frame-size memory scheme where
      each frame is referenced via an index (idx). A user passes the frame
      index to the kernel, and the kernel acts upon the data.  Some NICs,
      however, do not have a fixed frame-size model; instead, they have a
      model where a memory window is passed to the hardware and multiple
      frames are filled into that window (referred to as the "type-writer"
      model).
      
      By changing the descriptor format from the current frame index
      addressing scheme, AF_XDP can in the future be extended to support
      these kinds of NICs.
      
      In the index-based model, an idx refers to a frame of size
      frame_size. Addressing a frame in the UMEM is done by offsetting the
      UMEM starting address by a global offset, idx * frame_size + offset.
      Communicating via the fill- and completion-rings is done by means of
      idx.
      
      In this commit, the idx is removed in favor of an address (addr),
      which is a relative address ranging over the UMEM. Converting an
      idx-based address to the new addr is simple: addr = idx * frame_size +
      offset.
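
      The conversion above can be sketched in a few lines of C. This is an
      illustration only, not code from the commit: the function name
      xsk_idx_to_addr() is hypothetical, and the check that offset stays
      inside the frame is an assumption about how the old (idx, offset)
      pair was used.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Old scheme: a location is named by (idx, offset).
       * New scheme: one address relative to the start of the UMEM.
       * Hypothetical helper, for illustration only. */
      static inline uint64_t xsk_idx_to_addr(uint32_t idx, uint32_t frame_size,
                                             uint32_t offset)
      {
          /* Assumed: the offset stays inside the frame it refers to. */
          assert(offset < frame_size);

          /* Widen before multiplying so large UMEMs do not overflow 32 bits. */
          return (uint64_t)idx * frame_size + offset;
      }
      ```

      For example, with a frame_size of 2048, idx 3 and offset 16 map to
      addr 3 * 2048 + 16 = 6160.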
      
      We also stop referring to the UMEM "frame" as a frame. Instead it is
      simply called a chunk.
      
      To transfer ownership of a chunk to the kernel, the addr of the chunk
      is passed in the fill-ring. Note that the kernel will mask addr to
      make it chunk-aligned, so there is no need for userspace to do
      that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
      3000 to the fill-ring will refer to the same chunk.
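
      The masking described above amounts to rounding addr down to a chunk
      boundary. A minimal sketch, assuming the chunk size is a power of two
      (as the 2k example is); chunk_base() is a hypothetical name, not a
      kernel function:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Round addr down to the start of the chunk that contains it.
       * Hypothetical helper; assumes chunk_size is a power of two. */
      static inline uint64_t chunk_base(uint64_t addr, uint64_t chunk_size)
      {
          assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);

          /* Clearing the low bits drops the offset within the chunk. */
          return addr & ~(chunk_size - 1);
      }
      ```

      With a chunk size of 2048, the addresses 2048, 2050 and 3000 all
      yield 2048, while 4096 starts the next chunk.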
      
      On the completion-ring, the addr will match that of the Tx descriptor
      passed to the kernel.
      
      Changing the descriptor format to use chunks/addr will allow for
      future changes to move to a type-writer based model, where multiple
      frames can reside in one chunk. In this model, passing a single chunk
      into the fill-ring would potentially result in multiple Rx
      descriptors.
      
      This commit changes the uapi of AF_XDP sockets, and updates the
      documentation.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: proper Rx drop statistics update · a509a955
      Björn Töpel authored
      Previously, rx_dropped could be updated incorrectly, e.g. if the XDP
      program redirected the frame to a socket bound to a different queue
      than where the XDP program was executing.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: proper fill queue descriptor validation · 4e64c835
      Björn Töpel authored
      Previously, the fill queue descriptor was not copied to kernel space
      prior to validating it, making it possible for userland to change the
      descriptor post-kernel-validation.
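
      The general pattern behind this fix is "copy first, then validate the
      copy". The sketch below is a userspace-compilable illustration, not
      the kernel code: fq_read_addr() and its bounds check are hypothetical,
      and the kernel uses its own primitives for reading shared memory.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Read one address from a ring slot that another party can modify
       * at any time. Hypothetical helper, for illustration only. */
      static bool fq_read_addr(const volatile uint64_t *shared_slot,
                               uint64_t umem_size, uint64_t *out)
      {
          /* Read the shared slot exactly once into a private copy. */
          uint64_t addr = *shared_slot;

          /* Validate the private copy, never the shared slot itself. */
          if (addr >= umem_size)
              return false;

          /* Callers only ever see the value that was validated. */
          *out = addr;
          return true;
      }
      ```

      Validating the shared slot and then reading it again would leave a
      window in which the other side could swap in an unchecked value.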
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: flowlabel in bpf_fib_lookup should be flowinfo · bd3a08aa
      David Ahern authored
      As Michal noted, the flow struct takes both the flow label and priority.
      Update the bpf_fib_lookup API to note that it is flowinfo and not just
      the flow label.
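
      To illustrate the distinction: in the IPv6 header, the 20-bit flow
      label sits below the 8-bit traffic class (the "priority"), and
      flowinfo covers both. The sketch below uses host byte order and
      hypothetical names (FLOWLABEL_MASK, FLOWINFO_MASK, make_flowinfo);
      the kernel keeps this field in network byte order.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define FLOWLABEL_MASK 0x000FFFFFu /* low 20 bits: flow label */
      #define FLOWINFO_MASK  0x0FFFFFFFu /* low 28 bits: class + flow label */

      /* Combine traffic class and flow label into one flowinfo value.
       * Host byte order, for illustration only. */
      static inline uint32_t make_flowinfo(uint8_t tclass, uint32_t flowlabel)
      {
          return ((uint32_t)tclass << 20) | (flowlabel & FLOWLABEL_MASK);
      }
      ```

      A caller that fills in only the flow label leaves the traffic class
      bits at zero, which is why the field name matters.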
      
      Cc: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf_get_current_cgroup_id' · 432bdb58
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      bpf has been used extensively for tracing. For example, bcc
      contains an almost full set of bpf-based tools to trace kernel
      and user functions/events. Most tracing tools are currently
      either filtered based on pid or system-wide.
      
      Containers are used extensively in industry, and cgroups are
      often used alongside them to provide resource isolation
      and protection. Several processes may run inside the same
      container. It is often desirable to get container-level tracing
      results as well, e.g. syscall count, function count, I/O
      activity, etc.
      
      This patch implements a new helper, bpf_get_current_cgroup_id(),
      which will return cgroup id based on the cgroup within which
      the current task is running.
      
      Patch #1 implements the new helper in the kernel.
      Patch #2 syncs the uapi bpf.h header and helper between tools
      and kernel.
      Patch #3 shows how to get the same cgroup id in user space,
      so a filter or policy could be configured in the bpf program
      based on current task cgroup.
      
      Changelog:
        v1 -> v2:
           . rebase to resolve merge conflict with latest bpf-next.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper · f269099a
      Yonghong Song authored
      The name_to_handle_at() syscall can be used to get the cgroup id
      for a particular cgroup path in user space. The selftest
      gets the cgroup id from both user space and the kernel, and
      compares them to ensure they are equal.
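
      A minimal sketch of the user-space side, assuming a cgroup v2 mount
      where the file handle of a cgroup directory is 8 bytes wide and holds
      the cgroup id. The function name cgroup_id_from_path() is
      hypothetical; name_to_handle_at() and struct file_handle are the
      glibc interfaces.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <fcntl.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      /* Store the cgroup id for a cgroup directory path in *id.
       * Returns 0 on success and -1 on failure; *id is left untouched
       * on failure. Hypothetical helper, for illustration only. */
      static int cgroup_id_from_path(const char *path, uint64_t *id)
      {
          struct file_handle *fh;
          int mount_id;
          int ret = -1;

          /* Allocate the header plus room for an 8-byte handle. */
          fh = calloc(1, sizeof(*fh) + sizeof(*id));
          if (!fh)
              return -1;
          fh->handle_bytes = sizeof(*id);

          /* Assumed: on cgroup v2 the 8-byte handle is the cgroup id. */
          if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == 0 &&
              fh->handle_bytes == sizeof(*id)) {
              memcpy(id, fh->f_handle, sizeof(*id));
              ret = 0;
          }

          free(fh);
          return ret;
      }
      ```

      The value obtained this way can then be compared with what
      bpf_get_current_cgroup_id() reports for a task inside that cgroup.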
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: sync uapi bpf.h for bpf_get_current_cgroup_id() helper · c7ddbbaf
      Yonghong Song authored
      Sync kernel uapi/linux/bpf.h with tools uapi/linux/bpf.h.
      Also add the necessary helper define in bpf_helpers.h.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: implement bpf_get_current_cgroup_id() helper · bf6fa2c8
      Yonghong Song authored
      bpf has been used extensively for tracing. For example, bcc
      contains an almost full set of bpf-based tools to trace kernel
      and user functions/events. Most tracing tools are currently
      either filtered based on pid or system-wide.
      
      Containers are used extensively in industry, and cgroups are
      often used alongside them to provide resource isolation
      and protection. Several processes may run inside the same
      container. It is often desirable to get container-level tracing
      results as well, e.g. syscall count, function count, I/O
      activity, etc.
      
      This patch implements a new helper, bpf_get_current_cgroup_id(),
      which will return cgroup id based on the cgroup within which
      the current task is running.
      
      A later patch provides an example showing that
      userspace can get the same cgroup id, so it could
      configure a filter or policy in the bpf program based on
      the task's cgroup id.
      
      The helper is currently implemented for tracing. It can
      be added to other program types as well when needed.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 03 Jun, 2018 21 commits
  3. 02 Jun, 2018 9 commits