1. 02 Apr, 2015 9 commits
    • Alexei Starovoitov's avatar
      samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool · 5c7fc2d2
      Alexei Starovoitov authored
      BPF C program attaches to
      blk_mq_start_request()/blk_update_request() kprobe events to
      calculate IO latency.
      
      For every completed block IO event it computes the time delta
      in nsec and records in a histogram map:
      
      	map[log10(delta)*10]++
      
      User space reads this histogram map every 2 seconds and prints
      it as a 'heatmap' using gray shades of text terminal. Black
      spaces have many events and white spaces have very few events.
      Left most space is the smallest latency, right most space is
      the largest latency in the range.
      
      Usage:
      
      	$ sudo ./tracex3
      	and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.
      
      Observe IO latencies and how different activity (like 'make
      kernel') affects it.
      
      Similar experiments can be done for network transmit latencies,
      syscalls, etc.
      
      '-t' flag prints the heatmap using normal ascii characters:
      
      $ sudo ./tracex3 -t
        heatmap of IO latency
        # - many events with this latency
          - few events
      	|1us      |10us     |100us    |1ms      |10ms     |100ms    |1s |10s
      				 *ooo. *O.#.                                    # 221
      			      .  *#     .                                       # 125
      				 ..   .o#*..                                    # 55
      			    .  . .  .  .#O                                      # 37
      				 .#                                             # 175
      				       .#*.                                     # 37
      				  #                                             # 199
      		      .              . *#*.                                     # 55
      				       *#..*                                    # 42
      				  #                                             # 266
      			      ...***Oo#*OO**o#* .                               # 629
      				  #                                             # 271
      				      . .#o* o.*o*                              # 221
      				. . o* *#O..                                    # 50
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5c7fc2d2
    • Alexei Starovoitov's avatar
      samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall · d822a192
      Alexei Starovoitov authored
      this example has two probes in one C file that attach to
      different kprove events and use two different maps.
      
      1st probe is x64 specific equivalent of dropmon. It attaches to
      kfree_skb, retrevies 'ip' address of kfree_skb() caller and
      counts number of packet drops at that 'ip' address. User space
      prints 'location - count' map every second.
      
      2nd probe attaches to kprobe:sys_write and computes a histogram
      of different write sizes
      
      Usage:
      	$ sudo tracex2
      	location 0xffffffff81695995 count 1
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 2
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 3
      	location 0xffffffff816d0da9 count 2
      
      	557145+0 records in
      	557145+0 records out
      	285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
      		   syscall write() stats
      	     byte_size       : count     distribution
      	       1 -> 1        : 3        |                                      |
      	       2 -> 3        : 0        |                                      |
      	       4 -> 7        : 0        |                                      |
      	       8 -> 15       : 0        |                                      |
      	      16 -> 31       : 2        |                                      |
      	      32 -> 63       : 3        |                                      |
      	      64 -> 127      : 1        |                                      |
      	     128 -> 255      : 1        |                                      |
      	     256 -> 511      : 0        |                                      |
      	     512 -> 1023     : 1118968  |************************************* |
      
      Ctrl-C at any time. Kernel will auto cleanup maps and programs
      
      	$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
      	0xffffffff816d0da9 0xffffffff81695995:
      	./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
      	./bld_x64/../net/unix/af_unix.c:1231
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d822a192
    • Alexei Starovoitov's avatar
      samples/bpf: Add simple non-portable kprobe filter example · b896c4f9
      Alexei Starovoitov authored
      tracex1_kern.c - C program compiled into BPF.
      
      It attaches to kprobe:netif_receive_skb()
      
      When skb->dev->name == "lo", it prints sample debug message into
      trace_pipe via bpf_trace_printk() helper function.
      
      tracex1_user.c - corresponding user space component that:
        - loads BPF program via bpf() syscall
        - opens kprobes:netif_receive_skb event via perf_event_open()
          syscall
        - attaches the program to event via ioctl(event_fd,
          PERF_EVENT_IOC_SET_BPF, prog_fd);
        - prints from trace_pipe
      
      Note, this BPF program is non-portable. It must be recompiled
      with current kernel headers. kprobe is not a stable ABI and
      BPF+kprobe scripts may no longer be meaningful when kernel
      internals change.
      
      No matter in what way the kernel changes, neither the kprobe,
      nor the BPF program can ever crash or corrupt the kernel,
      assuming the kprobes, perf and BPF subsystem has no bugs.
      
      The verifier will detect that the program is using
      bpf_trace_printk() and the kernel will print 'this is a DEBUG
      kernel' warning banner, which means that bpf_trace_printk()
      should be used for debugging of the BPF program only.
      
      Usage:
      $ sudo tracex1
                  ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84
      
                  ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b896c4f9
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_trace_printk() · 9c959c86
      Alexei Starovoitov authored
      Debugging of BPF programs needs some form of printk from the
      program, so let programs call limited trace_printk() with %d %u
      %x %p modifiers only.
      
      Similar to kernel modules, during program load verifier checks
      whether program is calling bpf_trace_printk() and if so, kernel
      allocates trace_printk buffers and emits big 'this is debug
      only' banner.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9c959c86
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_ktime_get_ns() · d9847d31
      Alexei Starovoitov authored
      bpf_ktime_get_ns() is used by programs to compute time delta
      between events or as a timestamp
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d9847d31
    • Alexei Starovoitov's avatar
      tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Alexei Starovoitov authored
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wraper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are a fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2541517c
    • Alexei Starovoitov's avatar
      tracing: Add kprobe flag · 72cbbc89
      Alexei Starovoitov authored
      add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
      tracepoints, since bpf programs can only be attached to kprobe
      type of PERF_TYPE_TRACEPOINT perf events.
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      72cbbc89
    • Daniel Borkmann's avatar
      bpf: Make internal bpf API independent of CONFIG_BPF_SYSCALL #ifdefs · 4e537f7f
      Daniel Borkmann authored
      Socket filter code and other subsystems with upcoming eBPF
      support should not need to deal with the fact that we have
      CONFIG_BPF_SYSCALL defined or not.
      
      Having the bpf syscall as a config option is a nice thing and
      I'd expect it to stay that way for expert users (I presume one
      day the default setting of it might change, though), but code
      making use of it should not care if it's actually enabled or
      not.
      
      Instead, hide this via header files and let the rest deal with it.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-2-git-send-email-ast@plumgrid.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4e537f7f
    • Ingo Molnar's avatar
      Merge branch 'perf/timer' into perf/core · 223aa646
      Ingo Molnar authored
      This WIP branch is now ready to be merged.
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      223aa646
  2. 01 Apr, 2015 1 commit
  3. 31 Mar, 2015 5 commits
  4. 30 Mar, 2015 1 commit
  5. 27 Mar, 2015 22 commits
  6. 26 Mar, 2015 2 commits