1. 19 Mar, 2018 3 commits
    • net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG · 312fc2b4
      John Fastabend authored
      When do_tcp_sendpages() is called from within the kernel and we know
      the data has no references from the user side, we can omit the
      SKBTX_SHARED_FRAG flag. This patch adds an internal flag,
      NO_SKBTX_SHARED_FRAG, that callers can use to skip setting
      SKBTX_SHARED_FRAG.
      
      The flag is not exposed to userspace because the sendpage call from
      the splice logic masks out all bits except MSG_MORE.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
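
      The masking argument above can be illustrated with a small user space
      model. This is only a sketch: the MODEL_* names and the numeric values
      are hypothetical stand-ins chosen for illustration, not the kernel's
      definitions, and the two helpers are not kernel functions.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only so that the bits do not overlap. */
#define MODEL_MSG_MORE            0x8000u   /* the one bit splice passes through */
#define MODEL_NO_SHARED_FRAG      0x80000u  /* internal, in-kernel only flag */
#define MODEL_SKBTX_SHARED_FRAG   0x20u     /* tx flag set on the skb */

static_assert((MODEL_MSG_MORE & MODEL_NO_SHARED_FRAG) == 0,
              "internal flag must not alias MSG_MORE");

/* Models the splice path: everything except MSG_MORE is masked out. */
static unsigned int model_splice_flags(unsigned int user_flags)
{
    return user_flags & MODEL_MSG_MORE;
}

/* Models the send path: the shared-frag marker is set unless the
 * internal flag asks for it to be omitted. */
static unsigned int model_tx_flags(unsigned int msg_flags)
{
    unsigned int tx_flags = 0;

    if (!(msg_flags & MODEL_NO_SHARED_FRAG))
        tx_flags |= MODEL_SKBTX_SHARED_FRAG;
    return tx_flags;
}
```

      In this model an in-kernel caller can pass MODEL_NO_SHARED_FRAG straight
      to model_tx_flags(), while anything arriving through
      model_splice_flags() has the bit cleared first and therefore always gets
      the shared-frag marker.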
    • sockmap: convert refcnt to an atomic refcnt · ffa35660
      John Fastabend authored
      Until now the sockmap refcnt has been protected by the
      sk_callback_lock(), so it has not needed any locking of its
      own. The counter tracks the lifetime of the psock object.
      Sockets in a sockmap have a lifetime that is independent of the
      map they are part of, because a single socket may be in multiple
      maps. When this happens we can only release the psock data
      associated with the socket when the refcnt reaches zero.

      There are three paths that decrement the reference count when a
      sock is deleted:

      1. The normal sockmap process: the user deletes the socket from
         the map.
      2. The map is removed and all sockets in the map are removed; the
         delete path is similar to case 1.
      3. An asynchronous socket event, such as closing the socket. This
         case handles removing sockets that are no longer available.

      For completeness, although inc does not pose any problems in this
      patch series, the inc case only happens when a psock is added to a
      map.
      
      Next we plan to add another socket prog type to handle policy and
      monitoring on the TX path. When we do this, however, we will need
      to hold a reference across the sendmsg/sendpage call. Holding the
      sk_callback_lock() there (on every send) seems less than ideal,
      and the call may sleep in cases where we hit memory pressure.
      Instead of dealing with these issues in some clever way, simply
      make the reference count a refcount_t and use proper atomic ops.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
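
      The lifetime rule described above (release the psock only when the last
      reference is dropped) can be sketched in user space with C11 atomics.
      This is an illustrative model, not the kernel code: struct psock_model
      and its helpers are hypothetical names, and C11 atomics stand in for
      the kernel's own reference counting primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-socket psock state. */
struct psock_model {
    atomic_uint refcnt;  /* one reference per map holding the socket */
    bool released;       /* set once the last reference is dropped */
};

/* A psock starts with one reference, held by the first map. */
static void psock_model_init(struct psock_model *p)
{
    atomic_init(&p->refcnt, 1);
    p->released = false;
}

/* Taken when the socket is added to another map. */
static void psock_model_get(struct psock_model *p)
{
    unsigned int old = atomic_fetch_add_explicit(&p->refcnt, 1,
                                                 memory_order_relaxed);

    assert(old > 0);  /* never resurrect an already released object */
    (void)old;
}

/* Dropped on user delete, map removal, or socket close.
 * Returns true if this call dropped the last reference. */
static bool psock_model_put(struct psock_model *p)
{
    unsigned int old = atomic_fetch_sub_explicit(&p->refcnt, 1,
                                                 memory_order_acq_rel);

    assert(old > 0);  /* underflow would mean an unbalanced put */
    if (old == 1) {
        p->released = true;  /* the real code would free psock data here */
        return true;
    }
    return false;
}
```

      Because the decrement and the zero check are a single atomic operation,
      exactly one of the delete paths observes the count reaching zero, with
      no outer lock held.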
    • sock: make static tls function alloc_sg generic sock helper · 2c3682f0
      John Fastabend authored
      The TLS ULP module builds scatterlists from a sock using
      page_frag_refill(). This is going to be useful for other ULPs,
      so move it into the sock file for more general use.

      In the process, remove a useless goto at the end of the while loop.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 16 Mar, 2018 5 commits
  3. 15 Mar, 2018 3 commits
    • Merge branch 'bpf-stackmap-build-id' · 68de5ef4
      Daniel Borkmann authored
      Song Liu says:
      
      ====================
      This work follows up on the discussion at Plumbers'17 on improving addr->sym
      resolution of user stack traces. The following links have more information
      on the discussion:
      
      http://www.linuxplumbersconf.org/2017/ocw/proposals/4764
      https://lwn.net/Articles/734453/     Section "Stack traces and kprobes"
      
      Currently, the bpf stackmap stores an address for each entry in the
      call trace. To map these addresses to user space files, it is
      necessary to maintain the mapping from these virtual addresses to
      symbols in the binary. Usually, the user space profiler (such as
      perf) has to scan /proc/pid/maps at the beginning of profiling, and
      monitor mmap2() calls afterwards. Given the cost of maintaining the
      address map, this solution is not practical for system wide
      profiling that is always on.
      
      This patch tries to address this with a variation to stackmap. Instead
      of storing addresses, the variation stores ELF file build_id + offset.
      After profiling, a user space tool will look up these functions with
      build_id (to find the binary or shared library) and the offset.
      
      I also updated bcc/cc library for the stackmap (no python/lua support yet).
      You can find the work at:
      
        https://github.com/liu-song-6/bcc/commits/bpf_get_stackid_v02
      
      Changes v5 -> v6:
      
      1. When kernel stack is added to stackmap with build_id, use fallback
         mechanism to store ip (status == BPF_STACK_BUILD_ID_IP).
      
      Changes v4 -> v5:
      
      1. Only allow build_id lookup in non-nmi context. Added comment and
         commit message to highlight this limitation.
      2. Minor fix reported by kbuild test robot.
      
      Changes v3 -> v4:
      
      1. Add fallback when build_id lookup failed. In this case, status is set
         to BPF_STACK_BUILD_ID_IP, and ip of this entry is saved.
      2. Handle cases where vma is only part of the file (vma->vm_pgoff != 0).
         Thanks to Teng for helping me identify this issue!
      3. Address feedback on previous versions.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID · 81f77fd0
      Song Liu authored
      test_stacktrace_build_id() is added. It triggers the urandom_read
      tracepoint by running "dd" and "urandom_read", and gathers stack
      traces. Then it reads the stack traces from the stackmap.

      urandom_read is a statically linked binary that reads from
      /dev/urandom. test_stacktrace_build_id() calls readelf to read the
      build ID of urandom_read and compares it with the build ID from the
      stackmap.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
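
      One possible way to do the comparison step is sketched below: parse the
      hex string that readelf prints and compare the bytes against the raw
      build ID read from the stackmap. The helper names are hypothetical, and
      the selftest itself may well perform the comparison differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parses a hex string into out. Returns the number of bytes written,
 * or -1 if the string is malformed or does not fit in out_len bytes. */
static int parse_build_id(const char *hex, unsigned char *out, size_t out_len)
{
    size_t n;

    assert(hex != NULL && out != NULL);
    n = strlen(hex);
    if (n % 2 != 0 || n / 2 > out_len)
        return -1;
    for (size_t i = 0; i < n / 2; i++) {
        int hi = hex_val(hex[2 * i]);
        int lo = hex_val(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return (int)(n / 2);
}

/* Returns 1 if the hex string from readelf equals the raw build ID bytes. */
static int build_id_matches(const char *readelf_hex,
                            const unsigned char *map_id, size_t map_len)
{
    unsigned char parsed[64];
    int n = parse_build_id(readelf_hex, parsed, sizeof(parsed));

    return n >= 0 && (size_t)n == map_len &&
           memcmp(parsed, map_id, map_len) == 0;
}
```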
    • bpf: extend stackmap to save binary_build_id+offset instead of address · 615755a7
      Song Liu authored
      Currently, the bpf stackmap stores an address for each entry in the
      call trace. To map these addresses to user space files, it is
      necessary to maintain the mapping from these virtual addresses to
      symbols in the binary. Usually, the user space profiler (such as
      perf) has to scan /proc/pid/maps at the beginning of profiling, and
      monitor mmap2() calls afterwards. Given the cost of maintaining the
      address map, this solution is not practical for system wide
      profiling that is always on.
      
      This patch tries to solve this problem with a variation of stackmap. This
      variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
      addresses, the variation stores ELF file build_id + offset.
      
      Build ID is a 20-byte unique identifier for ELF files. The following
      command shows the Build ID of /bin/bash:
      
        [user@]$ readelf -n /bin/bash
        ...
          Build ID: XXXXXXXXXX
        ...
      
      With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
      for each entry in the call trace, and translate it into the following
      struct:
      
        struct bpf_stack_build_id_offset {
                __s32           status;
                unsigned char   build_id[BPF_BUILD_ID_SIZE];
                union {
                        __u64   offset;
                        __u64   ip;
                };
        };
      
      The search for build_id is limited to the first page of the file, and
      this page must be in the page cache. Otherwise, we fall back to
      storing ip for this entry (ip field in struct
      bpf_stack_build_id_offset). This requires the build_id to be stored
      in the first page. A quick survey of binary and dynamic library files
      on a few different systems shows that almost all of them have
      build_id in the first page.

      Build_id is only meaningful for user stacks. If a kernel stack is
      added to a stackmap with BPF_F_STACK_BUILD_ID, it automatically falls
      back to storing only ip (status == BPF_STACK_BUILD_ID_IP). Similarly,
      if the build_id lookup fails for some reason, it also falls back to
      storing ip.

      User space can access struct bpf_stack_build_id_offset with the bpf
      syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
      maintain a mapping from build id to binary files. This mostly static
      mapping is much easier to maintain than per process address maps.
      
      Note: Stackmap with build_id only works in non-nmi context at this time.
      This is because we need to take mm->mmap_sem for find_vma(). If this
      changes, we would like to allow build_id lookup in nmi context.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
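
      A user space consumer has to branch on the status field of each entry
      to decide whether it holds build_id + offset or a fallback ip. The
      sketch below shows one way to do that. It uses a local copy of the
      struct from the commit message under different names so that it
      compiles without kernel headers; the numeric status values and
      describe_entry() are assumptions made for illustration and should be
      checked against the uapi header.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define BUILD_ID_SIZE 20  /* the 20-byte build ID described above */

/* Assumed status values; verify against the kernel's uapi header. */
enum build_id_status {
    STACK_BUILD_ID_EMPTY = 0,  /* entry not filled in */
    STACK_BUILD_ID_VALID = 1,  /* build_id + offset are valid */
    STACK_BUILD_ID_IP    = 2,  /* fallback: only ip is valid */
};

/* Local mirror of struct bpf_stack_build_id_offset. */
struct stack_build_id_offset {
    int32_t status;
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;
        uint64_t ip;
    };
};

/* Formats one entry into buf; returns the snprintf() result. */
static int describe_entry(const struct stack_build_id_offset *e,
                          char *buf, size_t len)
{
    assert(e != NULL && buf != NULL);

    if (e->status == STACK_BUILD_ID_VALID) {
        char hex[2 * BUILD_ID_SIZE + 1];

        /* Render the build ID as lowercase hex, two chars per byte. */
        for (int i = 0; i < BUILD_ID_SIZE; i++)
            snprintf(hex + 2 * i, 3, "%02x", (unsigned int)e->build_id[i]);
        return snprintf(buf, len, "%s+0x%" PRIx64, hex, e->offset);
    }
    if (e->status == STACK_BUILD_ID_IP)
        return snprintf(buf, len, "ip 0x%" PRIx64, e->ip);
    return snprintf(buf, len, "(empty)");
}
```

      A profiler would then use the hex build ID as the key into its own
      build id to binary mapping and resolve the offset inside that file.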
  4. 09 Mar, 2018 9 commits
  5. 08 Mar, 2018 3 commits
    • Merge branch 'bpf-perf-sample-addr' · 12ef9bda
      Daniel Borkmann authored
      Teng Qin says:
      
      ====================
      These patches add support that allows bpf programs attached to perf events to
      read the address values recorded with the perf events. These values are
      requested by specifying sample_type with PERF_SAMPLE_ADDR when calling
      perf_event_open().
      
      The main motivation for these changes is to support building memory or lock
      access profiling and tracing tools. For example, on Intel CPUs the recorded
      address values for supported memory or lock access perf events are
      the access or lock target addresses from the PEBS buffer. Such information is
      very valuable for building tools that help understand memory access or
      lock acquisition patterns.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: add example to test reading address · 12fe1225
      Teng Qin authored
      This commit adds an additional test to the trace_event example. It
      attaches the bpf program to the MEM_UOPS_RETIRED.LOCK_LOADS event with
      PERF_SAMPLE_ADDR requested, and prints the lock address value read by
      the bpf program to trace_pipe.
      Signed-off-by: Teng Qin <qinteng@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add support to read sample address in bpf program · 95da0cdb
      Teng Qin authored
      This commit adds a new field "addr" to bpf_perf_event_data which can be
      read and used by bpf programs attached to perf events. The value of the
      field is copied from bpf_perf_event_data_kern.addr and contains the
      address value recorded when sample_type includes PERF_SAMPLE_ADDR
      in the call to perf_event_open().
      Signed-off-by: Teng Qin <qinteng@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
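
      For the "addr" field to carry a value, the perf event has to be opened
      with PERF_SAMPLE_ADDR in sample_type. The sketch below only fills in
      the attribute structure and does not open the event.
      init_addr_sampling_attr() is a hypothetical helper, and the raw event
      code and period are left to the caller because they are CPU specific.
      It assumes the Linux uapi header linux/perf_event.h is installed.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Fills attr for a raw hardware event that records the sampled address.
 * raw_config is the CPU specific event encoding (not provided here). */
static void init_addr_sampling_attr(struct perf_event_attr *attr,
                                    uint64_t raw_config, uint64_t period)
{
    assert(attr != NULL);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;
    attr->config = raw_config;
    attr->sample_period = period;
    /* Ask the kernel to record the address with each sample. */
    attr->sample_type = PERF_SAMPLE_ADDR;
    /* Some precise (PEBS-style) events may also need attr->precise_ip
     * set to a non-zero level; that is left out of this sketch. */
}
```

      The filled attribute would then be passed to perf_event_open(), and a
      bpf program attached to the resulting event could read the recorded
      address from the "addr" field of its bpf_perf_event_data context.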
  6. 07 Mar, 2018 17 commits