1. 02 Apr, 2015 18 commits
    • Alexander Shishkin's avatar
      perf: Add API for PMUs to write to the AUX area · fdc26706
      Alexander Shishkin authored
      For pmus that wish to write data to ring buffer's AUX area, provide
      perf_aux_output_{begin,end}() calls to initiate/commit data writes,
      similarly to perf_output_{begin,end}. These also use the same output
      handle structure. Also, similarly to software counterparts, these
      will direct inherited events' output to parents' ring buffers.
      
      After the perf_aux_output_begin() returns successfully, handle->size
      is set to the maximum amount of data that can be written wrt aux_tail
      pointer, so that no data that the user hasn't seen will be overwritten,
      therefore this should always be called before hardware writing is
      enabled. On success, this will return the pointer to pmu driver's
      private structure allocated for this aux area by pmu::setup_aux. Same
      pointer can also be retrieved using perf_get_aux() while hardware
      writing is enabled.
      
      PMU driver should pass the actual amount of data written as a parameter
      to perf_aux_output_end(). All hardware writes should be completed and
      visible before this one is called.
      
      Additionally, perf_aux_output_skip() will adjust output handle and
      aux_head in case some part of the buffer has to be skipped over to
      maintain hardware's alignment constraints.
      
      Nested writers are forbidden and guards are in place to catch such
      attempts.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-8-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fdc26706
    • Alexander Shishkin's avatar
      perf: Add AUX record · 68db7e98
      Alexander Shishkin authored
      When there's new data in the AUX space, output a record indicating its
      offset and size and a set of flags, such as PERF_AUX_FLAG_TRUNCATED, to
      mean the described data was truncated to fit in the ring buffer.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-7-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68db7e98
    • Alexander Shishkin's avatar
      perf: Add a pmu capability for "exclusive" events · bed5b25a
      Alexander Shishkin authored
      Usually, pmus that do, for example, instruction tracing, would only ever
      be able to have one event per task per cpu (or per perf_event_context). For
      such pmus it makes sense to disallow creating conflicting events early on,
      so as to provide consistent behavior for the user.
      
      This patch adds a pmu capability that indicates such constraint on event
      creation.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1422613866-113186-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bed5b25a
    • Alexander Shishkin's avatar
      perf: Add a capability for AUX_NO_SG pmus to do software double buffering · 6a279230
      Alexander Shishkin authored
      For pmus that don't support scatter-gather for AUX data in hardware, it
      might still make sense to implement software double buffering to avoid
      losing data while the user is reading data out. For this purpose, add
      a pmu capability that guarantees multiple high-order chunks for AUX buffer,
      so that the pmu driver can do switchover tricks.
      
      To make use of this feature, add PERF_PMU_CAP_AUX_SW_DOUBLEBUF to your
      pmu's capability mask. This will make the ring buffer AUX allocation code
      ensure that the biggest high order allocation for the aux buffer pages is
      no bigger than half of the total requested buffer size, thus making sure
      that the buffer has at least two high order allocations.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-5-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6a279230
    • Alexander Shishkin's avatar
      perf: Support high-order allocations for AUX space · 0a4e38e6
      Alexander Shishkin authored
      Some pmus (such as BTS or Intel PT without multiple-entry ToPA capability)
      don't support scatter-gather and will prefer larger contiguous areas for
      their output regions.
      
      This patch adds a new pmu capability to request higher order allocations.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-4-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0a4e38e6
    • Peter Zijlstra's avatar
      perf: Add AUX area to ring buffer for raw data streams · 45bfb2e5
      Peter Zijlstra authored
      This patch introduces "AUX space" in the perf mmap buffer, intended for
      exporting high bandwidth data streams to userspace, such as instruction
      flow traces.
      
      AUX space is a ring buffer, defined by aux_{offset,size} fields in the
      user_page structure, and read/write pointers aux_{head,tail}, which abide
      by the same rules as data_* counterparts of the main perf buffer.
      
      In order to allocate/mmap AUX, userspace needs to set up aux_offset to
      such an offset that will be greater than data_offset+data_size and
      aux_size to be the desired buffer size. Both need to be page aligned.
      Then, same aux_offset and aux_size should be passed to mmap() call and
      if everything adds up, you should have an AUX buffer as a result.
      
      Pages that are mapped into this buffer also come out of user's mlock
      rlimit plus perf_event_mlock_kb allowance.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      45bfb2e5
    • Alexander Shishkin's avatar
      perf: Add data_{offset,size} to user_page · e8c6deac
      Alexander Shishkin authored
      Currently, the actual perf ring buffer is one page into the mmap area,
      following the user page and the userspace follows this convention. This
      patch adds data_{offset,size} fields to user_page that can be used by
      userspace instead for locating perf data in the mmap area. This is also
      helpful when mapping existing or shared buffers if their size is not
      known in advance.
      
      Right now, it is made to follow the existing convention that
      
      	data_offset == PAGE_SIZE and
      	data_offset + data_size == mmap_size.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-2-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e8c6deac
    • Ingo Molnar's avatar
      bpf: Fix the build on BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable · e1abf2cc
      Ingo Molnar authored
      So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
      dependency, it also depends on the tracing infrastructure and on kprobes,
      without which it will fail to build with:
      
        In file included from kernel/trace/bpf_trace.c:14:0:
        kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
        kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
          unsigned int val = current->trace_recursion;
        [...]
      
      It took quite some time to trigger this build failure, because right now
      BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL
      more configurable, not just under CONFIG_EXPERT.
      
      If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
      gateway as well.
      
      We might want to make this an interactive option later on, although
      I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
      an indicator that the user wants BPF support.
      
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e1abf2cc
    • Alexei Starovoitov's avatar
      samples/bpf: Add kmem_alloc()/free() tracker tool · 9811e353
      Alexei Starovoitov authored
      One BPF program attaches to kmem_cache_alloc_node() and
      remembers all allocated objects in the map.
      Another program attaches to kmem_cache_free() and deletes
      corresponding object from the map.
      
      User space walks the map every second and prints any objects
      which are older than 1 second.
      
      Usage:
      
      	$ sudo tracex4
      
      Then start few long living processes. The 'tracex4' will print
      something like this:
      
      	obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff880465848000 is  8sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32
      
      	$ addr2line -fispe vmlinux ffffffff8105dc32
      	do_fork at fork.c:1665
      
      As soon as processes exit the memory is reclaimed and 'tracex4'
      prints nothing.
      
      Similar experiment can be done with the __kmalloc()/kfree() pair.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9811e353
    • Alexei Starovoitov's avatar
      samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool · 5c7fc2d2
      Alexei Starovoitov authored
      BPF C program attaches to
      blk_mq_start_request()/blk_update_request() kprobe events to
      calculate IO latency.
      
      For every completed block IO event it computes the time delta
      in nsec and records in a histogram map:
      
      	map[log10(delta)*10]++
      
      User space reads this histogram map every 2 seconds and prints
      it as a 'heatmap' using gray shades of text terminal. Black
      spaces have many events and white spaces have very few events.
      Left most space is the smallest latency, right most space is
      the largest latency in the range.
      
      Usage:
      
      	$ sudo ./tracex3
      	and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.
      
      Observe IO latencies and how different activity (like 'make
      kernel') affects it.
      
      Similar experiments can be done for network transmit latencies,
      syscalls, etc.
      
      '-t' flag prints the heatmap using normal ascii characters:
      
      $ sudo ./tracex3 -t
        heatmap of IO latency
        # - many events with this latency
          - few events
      	|1us      |10us     |100us    |1ms      |10ms     |100ms    |1s |10s
      				 *ooo. *O.#.                                    # 221
      			      .  *#     .                                       # 125
      				 ..   .o#*..                                    # 55
      			    .  . .  .  .#O                                      # 37
      				 .#                                             # 175
      				       .#*.                                     # 37
      				  #                                             # 199
      		      .              . *#*.                                     # 55
      				       *#..*                                    # 42
      				  #                                             # 266
      			      ...***Oo#*OO**o#* .                               # 629
      				  #                                             # 271
      				      . .#o* o.*o*                              # 221
      				. . o* *#O..                                    # 50
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5c7fc2d2
    • Alexei Starovoitov's avatar
      samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall · d822a192
      Alexei Starovoitov authored
      this example has two probes in one C file that attach to
      different kprove events and use two different maps.
      
      1st probe is x64 specific equivalent of dropmon. It attaches to
      kfree_skb, retrevies 'ip' address of kfree_skb() caller and
      counts number of packet drops at that 'ip' address. User space
      prints 'location - count' map every second.
      
      2nd probe attaches to kprobe:sys_write and computes a histogram
      of different write sizes
      
      Usage:
      	$ sudo tracex2
      	location 0xffffffff81695995 count 1
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 2
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 3
      	location 0xffffffff816d0da9 count 2
      
      	557145+0 records in
      	557145+0 records out
      	285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
      		   syscall write() stats
      	     byte_size       : count     distribution
      	       1 -> 1        : 3        |                                      |
      	       2 -> 3        : 0        |                                      |
      	       4 -> 7        : 0        |                                      |
      	       8 -> 15       : 0        |                                      |
      	      16 -> 31       : 2        |                                      |
      	      32 -> 63       : 3        |                                      |
      	      64 -> 127      : 1        |                                      |
      	     128 -> 255      : 1        |                                      |
      	     256 -> 511      : 0        |                                      |
      	     512 -> 1023     : 1118968  |************************************* |
      
      Ctrl-C at any time. Kernel will auto cleanup maps and programs
      
      	$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
      	0xffffffff816d0da9 0xffffffff81695995:
      	./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
      	./bld_x64/../net/unix/af_unix.c:1231
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d822a192
    • Alexei Starovoitov's avatar
      samples/bpf: Add simple non-portable kprobe filter example · b896c4f9
      Alexei Starovoitov authored
      tracex1_kern.c - C program compiled into BPF.
      
      It attaches to kprobe:netif_receive_skb()
      
      When skb->dev->name == "lo", it prints sample debug message into
      trace_pipe via bpf_trace_printk() helper function.
      
      tracex1_user.c - corresponding user space component that:
        - loads BPF program via bpf() syscall
        - opens kprobes:netif_receive_skb event via perf_event_open()
          syscall
        - attaches the program to event via ioctl(event_fd,
          PERF_EVENT_IOC_SET_BPF, prog_fd);
        - prints from trace_pipe
      
      Note, this BPF program is non-portable. It must be recompiled
      with current kernel headers. kprobe is not a stable ABI and
      BPF+kprobe scripts may no longer be meaningful when kernel
      internals change.
      
      No matter in what way the kernel changes, neither the kprobe,
      nor the BPF program can ever crash or corrupt the kernel,
      assuming the kprobes, perf and BPF subsystem has no bugs.
      
      The verifier will detect that the program is using
      bpf_trace_printk() and the kernel will print 'this is a DEBUG
      kernel' warning banner, which means that bpf_trace_printk()
      should be used for debugging of the BPF program only.
      
      Usage:
      $ sudo tracex1
                  ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84
      
                  ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b896c4f9
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_trace_printk() · 9c959c86
      Alexei Starovoitov authored
      Debugging of BPF programs needs some form of printk from the
      program, so let programs call limited trace_printk() with %d %u
      %x %p modifiers only.
      
      Similar to kernel modules, during program load verifier checks
      whether program is calling bpf_trace_printk() and if so, kernel
      allocates trace_printk buffers and emits big 'this is debug
      only' banner.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9c959c86
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_ktime_get_ns() · d9847d31
      Alexei Starovoitov authored
      bpf_ktime_get_ns() is used by programs to compute time delta
      between events or as a timestamp
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d9847d31
    • Alexei Starovoitov's avatar
      tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Alexei Starovoitov authored
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wraper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are a fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2541517c
    • Alexei Starovoitov's avatar
      tracing: Add kprobe flag · 72cbbc89
      Alexei Starovoitov authored
      add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
      tracepoints, since bpf programs can only be attached to kprobe
      type of PERF_TYPE_TRACEPOINT perf events.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      72cbbc89
    • Daniel Borkmann's avatar
      bpf: Make internal bpf API independent of CONFIG_BPF_SYSCALL #ifdefs · 4e537f7f
      Daniel Borkmann authored
      Socket filter code and other subsystems with upcoming eBPF
      support should not need to deal with the fact that we have
      CONFIG_BPF_SYSCALL defined or not.
      
      Having the bpf syscall as a config option is a nice thing and
      I'd expect it to stay that way for expert users (I presume one
      day the default setting of it might change, though), but code
      making use of it should not care if it's actually enabled or
      not.
      
      Instead, hide this via header files and let the rest deal with it.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-2-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4e537f7f
    • Ingo Molnar's avatar
      Merge branch 'perf/timer' into perf/core · 223aa646
      Ingo Molnar authored
      This WIP branch is now ready to be merged.
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      223aa646
  2. 01 Apr, 2015 1 commit
  3. 31 Mar, 2015 5 commits
  4. 30 Mar, 2015 1 commit
  5. 27 Mar, 2015 15 commits