1. 08 Apr, 2016 9 commits
    • perf script: Use readdir() instead of deprecated readdir_r() · a5e8e825
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in 'perf script', so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Adjust symbol for shared objects · 99e87f7b
      Wang Nan authored
      He Kuang reported in [1] that perf fails to resolve the correct symbol
      on the Android platform. The problem can also be reproduced on a normal
      x86_64 platform; the steps are described in detail at the end of this
      commit message.
      
      The cause is that symbol adjustment is skipped for normal shared
      objects. In most cases skipping the adjustment is harmless. However,
      when the '.text' section has a different 'address' and 'offset', the
      result is wrong. Of all the shared objects on my working platform, only
      wine DLL objects and debug objects (in .debug) have this problem.
      However, it is common on Android.
      For example:
      
       $ readelf -S ./libsurfaceflinger.so | grep \.text
         [10] .text             PROGBITS         0000000000029030  00012030
      
      This patch enables symbol adjustment for dynamic objects so that symbol
      addresses obtained from elfutils are adjusted correctly.
      
      Now nearly all types of ELF files should adjust symbols, so make
      ss->adjust_symbols default to true.
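
      The adjustment can be illustrated with the '.text' header values from
      the readelf output above. This is a minimal sketch, not perf code: the
      function name adjust_symbol and the sample symbol (0x100 bytes into
      '.text') are assumptions made for illustration.

```python
def adjust_symbol(st_value: int, sh_addr: int, sh_offset: int) -> int:
    """Turn a symbol's virtual address into an offset within the file."""
    return st_value - (sh_addr - sh_offset)


# '.text' address and file offset taken from the readelf line above.
SH_ADDR = 0x29030
SH_OFFSET = 0x12030

# Hypothetical symbol located 0x100 bytes into '.text'.
st_value = SH_ADDR + 0x100
adjusted = adjust_symbol(st_value, SH_ADDR, SH_OFFSET)
print(hex(st_value), "->", hex(adjusted))
```

      When sh_addr equals sh_offset the function returns its input unchanged,
      which is why skipping the adjustment is harmless for most shared objects.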
      
      Steps to reproduce the problem:
      
        $ cat ./Makefile
        PWD := $(shell pwd)
        LDFLAGS += "-Wl,-rpath=$(PWD)"
        CFLAGS += -g
        main: main.c libbuggy.so
        libbuggy.so: buggy.c
      	gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
        clean:
      	rm -rf main libbuggy.so *.o
      
        $ cat ./buggy.c
        int fib(int x)
        {
            return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
        }
      
        $ cat ./main.c
        #include <stdio.h>
      
        extern int fib(int x);
        int main()
        {
           int i;
      
           for (i = 0; i < 40; i++)
               printf("%d\n", fib(i));
           return 0;
       }
      
       $ make
       $ perf record ./main
       ...
       $ perf report --stdio
       # Overhead  Command  Shared Object      Symbol
       # ........  .......  .................  ...............................
       #
           14.97%  main     libbuggy.so        [.] 0x000000000000066c
            8.68%  main     libbuggy.so        [.] 0x00000000000006aa
            8.52%  main     libbuggy.so        [.] fib@plt
            7.95%  main     libbuggy.so        [.] 0x0000000000000664
            5.94%  main     libbuggy.so        [.] 0x00000000000006a9
            5.35%  main     libbuggy.so        [.] 0x0000000000000678
       ...
      
      The correct result should be (after this patch):
      
        # Overhead  Command  Shared Object      Symbol
        # ........  .......  .................  ...............................
        #
            91.47%  main     libbuggy.so        [.] fib
             8.52%  main     libbuggy.so        [.] fib@plt
             0.00%  main     [kernel.kallsyms]  [k] kmem_cache_free
      
      [1] http://lkml.kernel.org/g/1452567507-54013-1-git-send-email-hekuang@huawei.com
      
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-3-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Record text offset in dso to calculate objdump address · a58f7033
      Wang Nan authored
      In this patch, the offset of the '.text' section is stored in the dso
      and used to re-calculate the address passed to objdump.
      
      In most cases, executable code is in the '.text' section, so the
      adjustment made to a symbol in dso__load_sym (using
      sym.st_value -= shdr.sh_addr - shdr.sh_offset) should be equal to
      'sym.st_value -= dso->text_offset'. Therefore, adding text_offset back
      yields the objdump address from the symbol address (rip). However, this
      is not true for the kernel and kernel modules, since there can be
      multiple executable sections with different offsets. Exclude the kernel
      for this reason.
      
      After this patch, even if dso->adjust_symbols is set to true for shared
      objects, map__rip_2objdump() and map__objdump_2mem() return the correct
      result, so the behavior of perf annotate won't change.
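
      The round trip described above can be sketched as follows. This is an
      illustration under stated assumptions, not the perf implementation: the
      Dso class and rip_to_objdump() are hypothetical stand-ins, the kernel
      case is left out entirely, and the numbers reuse the '.text' values
      (address 0x29030, offset 0x12030) quoted in the previous commit.

```python
from dataclasses import dataclass


@dataclass
class Dso:
    # sh_addr - sh_offset of the '.text' section, recorded at load time.
    text_offset: int


def adjust_symbol(st_value: int, dso: Dso) -> int:
    """Adjustment applied when symbols are loaded."""
    return st_value - dso.text_offset


def rip_to_objdump(rip: int, dso: Dso) -> int:
    """Add text_offset back to recover the address objdump expects."""
    return rip + dso.text_offset


dso = Dso(text_offset=0x29030 - 0x12030)
original = 0x29130                      # hypothetical symbol address
rip = adjust_symbol(original, dso)      # address perf works with
print(hex(rip), "->", hex(rip_to_objdump(rip, dso)))
```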
      
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1460024671-64774-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Build syscall table .c header from kernel's syscall_64.tbl · 1b700c99
      Arnaldo Carvalho de Melo authored
      We used libaudit to map ids to syscall names and vice-versa, but that
      imposes a delay in supporting new syscalls, having to wait for libaudit
      to get those new syscalls on its tables.
      
      To remove that delay, for x86_64 initially, grab a copy of
      arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
      tables.
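
      A rough sketch of turning such a table into an id-to-name mapping is
      shown here. It is not the generator used by perf: the column layout
      (number, abi, name, entry point), the abbreviated SAMPLE_TBL excerpt and
      the choice to skip 'x32' rows are assumptions made for illustration.

```python
# Abbreviated, hand-written excerpt in the assumed table layout.
SAMPLE_TBL = """\
# 64-bit system call numbers and entry vectors
0\tcommon\tread\t\t\tsys_read
1\tcommon\twrite\t\t\tsys_write
324\tcommon\tmembarrier\t\tsys_membarrier
325\tcommon\tmlock2\t\t\tsys_mlock2
326\tcommon\tcopy_file_range\t\tsys_copy_file_range
512\tx32\trt_sigaction\t\tcompat_sys_rt_sigaction
"""


def parse_syscall_tbl(text: str) -> dict:
    """Map syscall number to name, skipping comments and x32-only rows."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        number, abi, name = int(fields[0]), fields[1], fields[2]
        if abi == "x32":
            continue
        table[number] = name
    return table


table = parse_syscall_tbl(SAMPLE_TBL)
print(table[326])
```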
      
      Syscalls currently not available in audit-libs:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
        Error:	Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd
        Hint:	try 'perf list syscalls:sys_enter_*'
        Hint:	and: 'man syscalls'
        #
      
      With this patch:
      
        # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
          8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
          8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
         30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096
         31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
         31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-51xfjbxevdsucmnbc4ka5r88@git.kernel.org
      [ Added make dep for 'prepare' in 'LIBPERF_IN', fix by Wang Nan to fix parallel build ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Allow generating per-arch syscall table arrays · 5af56fab
      Arnaldo Carvalho de Melo authored
      Tools should use a mechanism similar to arch/x86/entry/syscalls/ to
      generate a header file with the definitions for two variables:
      
        static const char *syscalltbl_x86_64[] = {
      	[0] = "read",
      	[1] = "write",
        <SNIP>
      	[324] = "membarrier",
      	[325] = "mlock2",
      	[326] = "copy_file_range",
        };
        static const int syscalltbl_x86_64_max_id = 326;
      
      These definitions go in a per-arch file that is then included in
      tools/perf/util/syscalltbl.c.
      
      The first one will be for x86_64.
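
      To make the shape of the generated file concrete, here is a small
      sketch that prints the two definitions from an id-to-name mapping. The
      helper name emit_syscalltbl and the five-entry TABLE are assumptions;
      the real generator is not shown in this commit message.

```python
def emit_syscalltbl(arch: str, table: dict) -> str:
    """Render the per-arch array and its max id as C source text."""
    lines = [f"static const char *syscalltbl_{arch}[] = {{"]
    for number in sorted(table):
        lines.append(f'\t[{number}] = "{table[number]}",')
    lines.append("};")
    lines.append(f"static const int syscalltbl_{arch}_max_id = {max(table)};")
    return "\n".join(lines)


# Hypothetical subset of the x86_64 table, matching the entries shown above.
TABLE = {0: "read", 1: "write", 324: "membarrier", 325: "mlock2",
         326: "copy_file_range"}

header = emit_syscalltbl("x86_64", TABLE)
print(header)
```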
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-02uuamkxgccczdth8komspgp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Move syscall table id <-> name routines to separate class · fd0db102
      Arnaldo Carvalho de Melo authored
      We're using libaudit for name to id and id to syscall name
      translations, but that forces 'perf trace' to wait for newer
      libaudit versions that support recently added syscalls, such as
      "userfaultfd" at the time of this changeset.
      
      We have all the information right there, in the kernel sources, so move
      this code to a separate place, wrapped behind functions that will
      progressively use the kernel source files to extract the syscall table
      for use in 'perf trace'.
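
      The idea of hiding the lookup behind a small interface can be sketched
      like this. The SyscallTable class and its method names are hypothetical
      and only illustrate the two directions of translation; they are not the
      functions introduced by this commit.

```python
class SyscallTable:
    """Bidirectional syscall id <-> name lookup behind one interface."""

    def __init__(self, id_to_name: dict):
        self._by_id = dict(id_to_name)
        self._by_name = {name: nr for nr, name in self._by_id.items()}

    def name(self, number: int):
        """Return the syscall name, or None when the id is unknown."""
        return self._by_id.get(number)

    def id(self, name: str) -> int:
        """Return the syscall id, or -1 when the name is unknown."""
        return self._by_name.get(name, -1)


# Illustrative entries using x86_64 syscall numbers.
tbl = SyscallTable({0: "read", 1: "write", 323: "userfaultfd"})
print(tbl.id("userfaultfd"), tbl.name(1))
```

      Callers only see name() and id(), so the backing data can move from
      libaudit to generated tables without touching them.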
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i38opd09ow25mmyrvfwnbvkj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Beautify mode_t arguments · ba2f22cf
      Arnaldo Carvalho de Melo authored
      When reading the syscall tracepoint /format file, look for arguments of type
      "mode_t" and attach a beautifier:
      
        [root@jouet ~]# cat ~/bin/tp_with_fields_of_type
        #!/bin/bash
        grep -w $1 /sys/kernel/tracing/events/syscalls/*/format | sed -r 's%.*sys_enter_(.*)/format.*%\1%g' | paste -d, -s
        # tp_with_fields_of_type umode_t
        chmod,creat,fchmodat,fchmod,mkdirat,mkdir,mknodat,mknod,mq_open,openat,open
        #
      
      Testing it:
      
        #define S_ISUID 0004000
        #define S_ISGID 0002000
        #define S_ISVTX 0001000
        #define S_IRWXU 0000700
        #define S_IRUSR 0000400
        #define S_IWUSR 0000200
        #define S_IXUSR 0000100
      
        #define S_IRWXG 0000070
        #define S_IRGRP 0000040
        #define S_IWGRP 0000020
        #define S_IXGRP 0000010
      
        #define S_IRWXO 0000007
        #define S_IROTH 0000004
        #define S_IWOTH 0000002
        #define S_IXOTH 0000001
      
        # for mode in 4000 2000 1000 700 400 200 100 70 40 20 10 7 4 2 1 ; do \
            echo -n $mode '->' ; trace --no-inherit -e chmod,fchmodat,fchmod chmod $mode x; \
          done
        4000 -> 0.338 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID) = 0
        2000 -> 0.438 ( 0.015 ms): fchmodat(dfd: CWD, filename: x, mode: ISGID) = 0
        1000 -> 0.677 ( 0.040 ms): fchmodat(dfd: CWD, filename: x, mode: ISVTX) = 0
         700 -> 0.394 ( 0.013 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXU) = 0
         400 -> 0.337 ( 0.010 ms): fchmodat(dfd: CWD, filename: x, mode: IRUSR) = 0
         200 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IWUSR) = 0
         100 -> 0.249 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXUSR) = 0
          70 -> 0.266 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXG) = 0
          40 -> 0.329 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRGRP) = 0
          20 -> 0.250 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWGRP) = 0
          10 -> 0.259 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXGRP) = 0
           7 -> 0.249 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXO) = 0
           4 -> 0.278 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IROTH) = 0
           2 -> 0.276 ( 0.009 ms): fchmodat(dfd: CWD, filename: x, mode: IWOTH) = 0
           1 -> 0.250 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: IXOTH) = 0
        #
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7777 x
           0.258 ( 0.011 ms): fchmodat(dfd: CWD, filename: x, mode: IALLUGO) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 7770 x
           0.258 ( 0.008 ms): fchmodat(dfd: CWD, filename: x, mode: ISUID|ISGID|ISVTX|IRWXU|IRWXG) = 0
        # trace --no-inherit -e chmod,fchmodat,fchmod chmod 777 x
           0.293 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRWXUGO
        #
      
      Now let's check by using the tracepoint for that specific syscall,
      instead of raw_syscalls:sys_enter as 'trace' does for its strace fu:
      
        # trace --no-inherit --ev syscalls:sys_enter_fchmodat -e fchmodat chmod 666 x
           0.255 (         ): syscalls:sys_enter_fchmodat:dfd: 0xffffffffffffff9c, filename: 0x55db32a3f0f0, mode: 0x000001b6)
           0.268 ( 0.012 ms): fchmodat(dfd: CWD, filename: x, mode: IRUGO|IWUGO                     ) = 0
        #
      
      Perfect, 0x1b6 == 0666.
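
      The decoding seen in the output above can be approximated with a greedy
      walk over a table of masks, widest combinations first. This is a sketch
      that reproduces the strings shown in this message, not the beautifier's
      source: the MODE_BITS ordering and the octal fallback for leftover bits
      are assumptions.

```python
# Masks are tried in order; combined masks come before single bits.
MODE_BITS = [
    ("IALLUGO", 0o7777), ("IRWXUGO", 0o777),
    ("IRUGO", 0o444), ("IWUGO", 0o222), ("IXUGO", 0o111),
    ("ISUID", 0o4000), ("ISGID", 0o2000), ("ISVTX", 0o1000),
    ("IRWXU", 0o700), ("IRUSR", 0o400), ("IWUSR", 0o200), ("IXUSR", 0o100),
    ("IRWXG", 0o070), ("IRGRP", 0o040), ("IWGRP", 0o020), ("IXGRP", 0o010),
    ("IRWXO", 0o007), ("IROTH", 0o004), ("IWOTH", 0o002), ("IXOTH", 0o001),
]


def beautify_mode(mode: int) -> str:
    """Render a mode value as '|'-joined names, consuming bits as it goes."""
    names = []
    for name, bits in MODE_BITS:
        if (mode & bits) == bits:
            names.append(name)
            mode &= ~bits
    if mode:  # bits that no table entry covers
        names.append(f"{mode:#o}")
    return "|".join(names)


for value in (0o666, 0o777, 0o7770, 0o7777):
    print(f"{value:o} -> {beautify_mode(value)}")
```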
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-18e8zfgbkj83xo87yoom43kd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Process event update events · 91daee30
      Jiri Olsa authored
      Andreas reported that the following command produces no output:
      
        # cat test.py
        #!/usr/bin/env python
      
        def stat__krava(cpu, thread, time, val, ena, run):
            print "event %s cpu %d, thread %d, time %d, val %d, ena %d, run %d" % \
                  ("krava", cpu, thread, time, val, ena, run)
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        ^C
        #
      
      The reason is that 'perf script' does not process event update events,
      so it never gets the event name update and the python callback is never
      called.
      
      The fix is to add the already existing callback we use in 'perf stat
      report'.
      
      Committer note:
      
      After the patch:
      
        # perf stat -a -I 1000 -e cycles,"cpu/config=0x6530160,name=krava/" record | perf script -s test.py
        event krava cpu -1, thread -1, time 1000239179, val 1789051, ena 4000690920, run 4000690920
        event krava cpu -1, thread -1, time 2000479061, val 2391338, ena 4000879596, run 4000879596
        event krava cpu -1, thread -1, time 3000740802, val 1939121, ena 4000977209, run 4000977209
        event krava cpu -1, thread -1, time 4001006730, val 2356115, ena 4001000489, run 4001000489
        ^C
        #
      
      Reported-by: Andreas Hollmann <hollmann@in.tum.de>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add dedicated unwind addr_space member into thread struct · e583d70c
      Jiri Olsa authored
      Milian reported an issue with thread::priv, which was double booked by
      perf trace and the DWARF unwind code, so using the two together is
      impossible at the moment.
      
      Move the DWARF unwind private data into a separate variable so that
      perf trace can keep using thread::priv.
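
      The double booking and its fix can be pictured with a toy model. The
      Thread class below is a hypothetical Python stand-in for the C struct;
      only the member names priv and addr_space come from this commit, and
      the stored values are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Thread:
    tid: int
    priv: Optional[Any] = None        # stays reserved for 'perf trace'
    addr_space: Optional[Any] = None  # dedicated slot for the DWARF unwinder


thread = Thread(tid=1234)

# With a single shared slot the second writer would clobber the first.
# With two members both users keep their own data.
thread.priv = {"owner": "perf trace"}
thread.addr_space = {"owner": "dwarf unwind"}
print(thread.priv["owner"], "/", thread.addr_space["owner"])
```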
      
      Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andreas Hollmann <hollmann@in.tum.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1460013073-18444-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 07 Apr, 2016 1 commit
  3. 06 Apr, 2016 11 commits
    • perf trace: Beautify pid_t arguments · d1d438a3
      Arnaldo Carvalho de Melo authored
      When reading the syscall tracepoint /format file, look for arguments
      of type "pid_t" and attach the PID beautifier, which looks the pid up
      among the threads it knows, i.e. the ones that came from PERF_RECORD_COMM
      events, and adds the COMM after the pid in such args.
      
      Excerpt of a system wide trace for syscalls with pid_t args:
      
        55602.977 ( 0.006 ms): bash/12122 setpgid(pid: 24347 (bash), pgid: 24347 (bash)) = 0
        55603.024 ( 0.004 ms): bash/24347 setpgid(pid: 24347 (bash), pgid: 24347 (bash)) = 0
        55691.527 (88.397 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1720, options: UNTRACED|CONTINUED) ...
        55692.479 ( 0.952 ms): git/24347 wait4(upid: 24368, stat_addr: 0x7ffe030d5724) ...
        55694.549 ( 2.070 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4fc10) = 24369 (pre-commit)
        55694.575 ( 0.002 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f650, options: NOHANG) = -1 ECHILD No child processes
        55695.934 ( 0.010 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f2d0, options: NOHANG) = 24370 (git)
        55695.937 ( 0.001 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f2d0, options: NOHANG) = -1 ECHILD No child processes
        55717.963 ( 0.000 ms): pre-commit/24371  ... [continued]: wait4()) = 24372
        55717.978 (21.468 ms): :24371/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) ...
        55718.087 ( 0.109 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) = 24373 (tr)
        55718.187 ( 0.096 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4f230) = 24374 (wc)
        55718.218 ( 0.002 ms): pre-commit/24371 wait4(upid: -1, stat_addr: 0x7ffc94f4eed0, options: NOHANG) = -1 ECHILD No child processes
        55718.367 ( 0.005 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f1d0, options: NOHANG) = 24371 (pre-commit)
        55718.369 ( 0.001 ms): pre-commit/24368 wait4(upid: -1, stat_addr: 0x7ffc94f4f1d0, options: NOHANG) = -1 ECHILD No child processes
        55741.021 (49.494 ms): git/24347  ... [continued]: wait4()) = 24368 (pre-commit)
        74146.427 (18319.601 ms): git/24347 wait4(upid: 24375 (git), stat_addr: 0x7ffe030d6824) ...
        74149.036 ( 0.891 ms): bash/24391 wait4(upid: -1, stat_addr: 0x7ffe0cee0560) = 24393 (sed)
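
      The formatting rule visible in the excerpt, a pid followed by its COMM
      in parentheses when the thread is known and the bare number otherwise,
      can be sketched as below. The function name beautify_pid and the comms
      dictionary are illustrative assumptions standing in for the thread
      lookup.

```python
def beautify_pid(pid: int, comms: dict) -> str:
    """Append '(comm)' when the pid maps to a known thread name."""
    comm = comms.get(pid)
    return f"{pid} ({comm})" if comm else str(pid)


# Hypothetical set of threads learned from PERF_RECORD_COMM events.
known = {24347: "bash", 24375: "git"}

for pid in (24347, 24368, -1):
    print("upid:", beautify_pid(pid, known))
```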
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-75yl9hzjhb020iadc81gdj8t@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Beautify set_tid_address, getpid, getppid return values · c65f1070
      Arnaldo Carvalho de Melo authored
      Showing the COMM for that return, if available.
      
        # trace -e getpid,getppid,set_tid_address
          490.007 ( 0.005 ms): sh/8250 getpid(...) = 8250 (sh)
          490.014 ( 0.001 ms): sh/8250 getppid(...) = 7886 (make)
          491.156 ( 0.004 ms): install/8251 set_tid_address(tidptr: 0x7f204a9d4ad0) = 8251 (install)
        ^C
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-psbpplqupatom9x4uohbxid5@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Infrastructure to show COMM strings for syscalls returning PIDs · 11c8e39f
      Arnaldo Carvalho de Melo authored
      Starting with clone, waitid and wait4:
      
        # trace -e waitid,wait4
           1.385 ( 1.385 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1720, options: UNTRACED|CONTINUED) = 1210 (ls)
           1.426 ( 0.002 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee1150, options: NOHANG|UNTRACED|CONTINUED) = 0
           3.293 ( 0.604 ms): bash/1211 wait4(upid: -1, stat_addr: 0x7ffe0cee0560                             ) = 1214 (sed)
           3.342 ( 0.002 ms): bash/1211 wait4(upid: -1, stat_addr: 0x7ffe0cee01d0, options: NOHANG            ) = -1 ECHILD No child processes
           3.576 ( 0.016 ms): bash/12122 wait4(upid: -1, stat_addr: 0x7ffe0cee0550, options: NOHANG|UNTRACED|CONTINUED) = 1211 (bash)
        ^C# trace -e clone
           0.027 ( 0.000 ms): systemd/1  ... [continued]: clone()) = 1227 (systemd)
           0.050 ( 0.000 ms): systemd/1227  ... [continued]: clone()) = 0
        ^C[root@jouet ~]#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-lyf5d3y5j15wikjb6pe6ukoi@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Beautify wait4/waitid 'options' argument · 7206b900
      Arnaldo Carvalho de Melo authored
        # trace -e waitid,wait4
      
         0.557 ( 0.557 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f449f0) = 27336
         1.250 ( 0.685 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f449f0) = 27337
         1.312 ( 0.002 ms): bash/27335 wait4(upid: -1, stat_addr: 0x7ffd02f44690, options: NOHANG) = -1 ECHILD No child processes
         1.550 ( 0.015 ms): bash/3856 wait4(upid: -1, stat_addr: 0x7ffd02f44990, options: NOHANG|UNTRACED|CONTINUED) = 27335
         1.552 ( 0.001 ms): bash/3856 wait4(upid: -1, stat_addr: 0x7ffd02f44990, options: NOHANG|UNTRACED|CONTINUED) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i5vlo5n5jv0amt8bkyicmdxh@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Beautify sched_setscheduler 'policy' argument · a3bca91f
      Arnaldo Carvalho de Melo authored
        $ trace -e sched_setscheduler chrt -f 1 usleep 1
        chrt: failed to set pid 0's policy: Operation not permitted
           0.005 ( 0.005 ms): chrt/19189 sched_setscheduler(policy: FIFO, param: 0x7ffec5273d70) = -1 EPERM Operation not permitted
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i5vlo5n5jv0amt8bkyicmdxh@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf list: Document event specifications better · 85f8f966
      Andi Kleen authored
      Document some features for specifying events in the perf list manpage:
      
      - Event groups
      - Leader sampling
      - How to specify raw PMU events in the new syntax
      - Global versus per process PMUs.
      - Access restrictions
      - Fix Intel SDM URL
      
      v2: Lots of new content. Address review feedback.
      
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/1459810686-15913-1-git-send-email-andi@firstfloor.org
      [ Add quotes to some keywords, such as "any" ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa's avatar
    • perf script perl: Do error checking on new backtrace routine · 76e20522
      Arnaldo Carvalho de Melo authored
      This ended up triggering these warnings when building on Ubuntu 12.04.5:
      
        util/scripting-engines/trace-event-perl.c: In function 'perl_process_callchain':
        util/scripting-engines/trace-event-perl.c:293:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:294:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:295:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:297:4: error: value computed is not used [-Werror=unused-value]
        util/scripting-engines/trace-event-perl.c:309:4: error: value computed is not used [-Werror=unused-value]
        cc1: all warnings being treated as errors
        mv: cannot stat `/tmp/build/perf/util/scripting-engines/.trace-event-perl.o.tmp': No such file or directory
        make[4]: *** [/tmp/build/perf/util/scripting-engines/trace-event-perl.o] Error 1
      
      Fix it by doing error checking when building the perl data structures
      related to callchains.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Fixes: f7380c12 ("perf script perl: Perl scripts now get a backtrace, like the python ones")
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Check if dwarf_getlocations() is available · bd0419e2
      Arnaldo Carvalho de Melo authored
      If not, tell the user that:
      
        config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
      
      And return -ENOTSUPP in die_get_var_range(), failing features that
      need it, like the one pointed out above.
      
      This fixes the build on older systems, such as Ubuntu 12.04.5.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Vinson Lee <vlee@freedesktop.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf config: Fix build with older toolchain. · d8e28654
      Vinson Lee authored
      Fix build error on Ubuntu 12.04.5 with GCC 4.6.3.
      
          CC       util/config.o
        util/config.c: In function ‘perf_buildid_config’:
        util/config.c:384:15: error: declaration of ‘dirname’ shadows a global declaration [-Werror=shadow]
      
      Signed-off-by: Vinson Lee <vlee@freedesktop.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 9cb5987c ("perf config: Rework buildid_dir_command_config to perf_buildid_config")
      Link: http://lkml.kernel.org/r/1459807659-9020-1-git-send-email-vlee@freedesktop.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'perf-core-for-mingo-20160401' of... · dad38ca6
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20160401' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
       - Do not use events that don't have timestamps when setting 'perf trace's
         base timestamp, fixing up the timestamp column for syscalls (Arnaldo Carvalho de Melo)
      
       - Make the 'bpf-output' sample_type be the same as tracepoint's, fixing up
         'perf trace's timestamp column for bpf events (Wang Nan)
      
       - Fix PMU term format max value calculation (Kan Liang)
      
       - Pretty print 'seccomp', 'getrandom' syscalls in 'perf trace' (Arnaldo Carvalho de Melo)
      
      Infrastructure changes:
      
       - Add support for using TSC as an ARCH timestamp when synthesizing
         JIT records (Adrian Hunter)
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 01 Apr, 2016 6 commits
    • perf bpf: Add sample types for 'bpf-output' event · d37ba880
      Wang Nan authored
      Before this patch, events that precede the 'bpf-output' event show a
      very large timestamp. For example:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0) ...
        18446744073709.551 (         ): evt:Raise a BPF event!..)
        179427791.076 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179427791.081 (         ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
        ...
      
      We can also see the differences between bpf-output events and
      tracepoint events:
      
      For bpf output event:
         sample_type                    IP|TID|RAW|IDENTIFIER
      
      For tracepoint events:
         sample_type                    IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
      
      This patch fixes these differences by adding more sample types for
      bpf-output events.
      
      After this patch:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450) ...
        179877370.878 (         ): evt:Raise a BPF event!..)
        179877370.878 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179877370.882 (         ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
        179877370.945 (         ): evt:Raise a BPF event!..)
        ...
      
        # ./perf trace -vv -T --ev sched:sched_switch \
                              --ev bpf-output/no-inherit,name=evt/ \
                              --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                              usleep 10 2>&1 | grep sample_type
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
      
      The 'IDENTIFIER' info is not required because all events have the same
      sample_type.
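
      As a rough illustration of the sample_type difference described above, the
      following Python sketch treats the two sample_type strings as bitmasks and
      lists the flags that bpf-output events lacked. The bit values follow the
      PERF_SAMPLE_* definitions in the uapi perf_event.h header; the helper
      names are made up for this example and are not part of perf.

```python
# PERF_SAMPLE_* bit positions as defined in include/uapi/linux/perf_event.h.
# Only the flags that appear in the commit message are listed here.
PERF_SAMPLE = {
    "IP": 1 << 0,
    "TID": 1 << 1,
    "TIME": 1 << 2,
    "ID": 1 << 6,
    "CPU": 1 << 7,
    "PERIOD": 1 << 8,
    "RAW": 1 << 10,
    "IDENTIFIER": 1 << 16,
}


def parse_sample_type(text):
    """Turn a string such as 'IP|TID|RAW' into a bitmask (hypothetical helper)."""
    mask = 0
    for name in text.split("|"):
        mask |= PERF_SAMPLE[name]
    return mask


def names(mask):
    """List the flag names set in mask, lowest bit first (hypothetical helper)."""
    ordered = sorted(PERF_SAMPLE.items(), key=lambda kv: kv[1])
    return [name for name, bit in ordered if mask & bit]


bpf_output_before = parse_sample_type("IP|TID|RAW|IDENTIFIER")
tracepoint_before = parse_sample_type("IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER")

# Flags that the tracepoint events carried but the bpf-output event did not.
missing = tracepoint_before & ~bpf_output_before
print(names(missing))
```

      The missing set includes TIME, which is why the bpf-output samples had no
      usable timestamp before the change.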
      
      Committer notes:
      
      Further testing, on top of the changes making 'perf trace' avoid samples
      from events without PERF_SAMPLE_TIME:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
          0.560 ( 0.001 ms): brk(                                                   ) = 0x55e5a1df8000
          18446640227439.430 (18446640227438.859 ms): nanosleep(rqtp: 0x7ffc96643370) ...
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.576 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.645 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
          0.646 ( 0.076 ms):  ... [continued]: nanosleep()) = 0
        #
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
           0.292 ( 0.001 ms): brk(                          ) = 0x55c7cd6e1000
           0.302 ( 0.004 ms): nanosleep(rqtp: 0x7ffedd8bc0f0) ...
           0.302 (         ): evt:Raise a BPF event!..)
           0.303 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.397 (         ): evt:Raise a BPF event!..)
           0.397 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.398 ( 0.100 ms):  ... [continued]: nanosleep()) = 0
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1459517202-42320-1-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d37ba880
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Don't set the base timestamp using events without PERF_SAMPLE_TIME · 8a07a809
      Arnaldo Carvalho de Melo authored
      This was causing bogus values to be shown at the timestamp column:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        94631143.385 ( 0.001 ms): brk(                                     ) = 0x555555757000
        94631143.398 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7ffff7ff6000
        94631143.406 ( 0.004 ms): access(filename: 0xf7df9e10, mode: R     ) = -1 ENOENT No such file or directory
        94631143.412 ( 0.004 ms): open(filename: 0xf7df8761, flags: CLOEXEC) = 3
        94631143.415 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fffffffd6b0     ) = 0
        94631143.419 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7ffff7fdb000
        94631143.420 ( 0.001 ms): close(fd: 3                              ) = 0
        94631143.432 ( 0.004 ms): open(filename: 0xf7ff6640, flags: CLOEXEC) = 3
        <SNIP>
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        0.022 ( 0.001 ms): brk(                                     ) = 0x55d7668a6000
        0.037 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7f8fbeb97000
        0.123 ( 0.083 ms): access(filename: 0xbe995e10, mode: R     ) = -1 ENOENT No such file or directory
        0.130 ( 0.004 ms): open(filename: 0xbe994761, flags: CLOEXEC) = 3
        0.133 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fff6487a890     ) = 0
        0.138 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8fbeb7c000
        0.140 ( 0.001 ms): close(fd: 3                              ) = 0
        0.151 ( 0.004 ms): open(filename: 0xbeb97640, flags: CLOEXEC) = 3
        <SNIP>
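
      The effect can be modelled in a few lines of Python. This is only a
      sketch, not the code in 'perf trace': it assumes that a sample without
      PERF_SAMPLE_TIME carries an all-ones 64-bit timestamp (which would match
      the 18446744073709.551 values quoted in the bpf-output commit message
      above), and the function names are hypothetical.

```python
UINT64_MAX = (1 << 64) - 1
NO_TIME = UINT64_MAX  # assumption: marker for "this sample has no timestamp"


def fmt_ms(ns):
    """Format nanoseconds as milliseconds with three decimals (truncating)."""
    return f"{ns // 1_000_000}.{(ns % 1_000_000) // 1_000:03d}"


def relative_times(samples, skip_untimed):
    """Return each timed sample's offset from the base timestamp, in ns.

    The base is taken from the first sample seen. With skip_untimed set,
    samples carrying NO_TIME are never used as the base.
    """
    base = None
    offsets = []
    for t in samples:
        if base is None and not (skip_untimed and t == NO_TIME):
            base = t
        if t != NO_TIME:
            offsets.append((t - base) & UINT64_MAX)  # emulate u64 wrap-around
    return offsets


# One untimed event first, then two syscalls with real timestamps (made-up data
# chosen to resemble the "Before" output above).
samples = [NO_TIME, 94631143385000, 94631143398000]

before = [fmt_ms(t) for t in relative_times(samples, skip_untimed=False)]
after = [fmt_ms(t) for t in relative_times(samples, skip_untimed=True)]
print(before, after, fmt_ms(NO_TIME))
```

      With the untimed sample used as the base, the subtraction wraps around and
      the column shows what looks like absolute time; skipping it yields small
      offsets that start at zero.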
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-p7m8llv81iv55ekxexdp5n57@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8a07a809
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Introduce function to set the base timestamp · e6001980
      Arnaldo Carvalho de Melo authored
      That is used both in live runs, i.e.:
      
        # trace ls
      
      and when processing events recorded in a perf.data file:
      
        # trace -i perf.data
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-901l6yebnzeqg7z8mbaf49xb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e6001980
    • Kan Liang's avatar
      perf tools: Fix PMU term format max value calculation · ac0e2cd5
      Kan Liang authored
      Currently the max value of a format is calculated from its number of
      bits, which relies on the bits of the format being contiguous.
      
      However, uncore event formats are not always contiguous. E.g. the uncore
      qpi event format can be 0-7,21.
      
      If bit 21 is set, parsing fails as shown below.
      
        $ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
        event syntax error: '..pi_0/event=0x200002,umask=0x8/'
                                          \___ value too big for format, maximum is 511
      
      This patch returns the real max value by setting all possible bits to 1.
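
      The two calculations can be compared with a small Python sketch. It is
      not the perf implementation; the function names are hypothetical and the
      format string '0-7,21' is the one from the example above.

```python
def parse_format_bits(spec):
    """Parse a format bit list such as '0-7,21' into a set of bit positions."""
    bits = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        bits.update(range(int(lo), int(hi or lo) + 1))
    return bits


def max_by_bit_count(bits):
    """Old calculation as described: only valid for contiguous bits from 0."""
    return (1 << len(bits)) - 1


def max_by_mask(bits):
    """New calculation as described: set every bit the format allows."""
    value = 0
    for b in bits:
        value |= 1 << b
    return value


bits = parse_format_bits("0-7,21")
event = 0x200002

print(max_by_bit_count(bits), hex(max_by_mask(bits)))
print(event <= max_by_bit_count(bits), event <= max_by_mask(bits))
```

      Nine format bits give a count-based maximum of 511, the value in the
      error message, while the mask-based maximum also covers bit 21.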
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1459365375-14285-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ac0e2cd5
    • Adrian Hunter's avatar
      perf intel-pt/bts: Define JITDUMP_USE_ARCH_TIMESTAMP · bd0c7a54
      Adrian Hunter authored
      For Intel PT / BTS, define the environment variable that selects TSC
      timestamps in the jitdump file.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426333-30260-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bd0c7a54
    • Adrian Hunter's avatar
      perf jit: Add support for using TSC as a timestamp · 2a28e230
      Adrian Hunter authored
      Intel PT uses TSC as a timestamp, so add support for using TSC instead
      of the monotonic clock.  Use of TSC is selected by an environment
      variable "JITDUMP_USE_ARCH_TIMESTAMP" and flagged in the jitdump file
      with flag JITDUMP_FLAGS_ARCH_TIMESTAMP.
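
      A minimal sketch of the selection logic, in Python for brevity. The
      variable and flag names come from the text above, but the rule for which
      values count as "enabled" and the numeric flag value are assumptions made
      for this example, not taken from the patch.

```python
import os

# Assumed bit value, for illustration only; the real definition lives in the
# jitdump header.
JITDUMP_FLAGS_ARCH_TIMESTAMP = 1 << 0


def use_arch_timestamp(env=os.environ):
    """Return True if the environment asks for TSC (arch) timestamps.

    Assumption: an unset, empty, or "0" value means "use the monotonic clock".
    """
    value = env.get("JITDUMP_USE_ARCH_TIMESTAMP", "")
    return value not in ("", "0")


def header_flags(env=os.environ):
    """Compute the flags word a jitdump writer would store in the file header."""
    return JITDUMP_FLAGS_ARCH_TIMESTAMP if use_arch_timestamp(env) else 0


print(header_flags({"JITDUMP_USE_ARCH_TIMESTAMP": "1"}))
```

      Recording the choice in the file header lets a reader of the jitdump file
      know which clock the timestamps refer to without seeing the environment.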
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457426330-30226-1-git-send-email-adrian.hunter@intel.com
      [ Added the fixup from He Kuang to make it build on other arches, ]
      [ such as aarch64, to avoid inserting this bisection breakage upstream ]
      Link: http://lkml.kernel.org/r/1459482572-129494-1-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2a28e230
  5. 31 Mar, 2016 13 commits
    • Adrian Hunter's avatar
      perf tools: Add time conversion event · 46bc29b9
      Adrian Hunter authored
      Intel PT uses the time members from the perf_event_mmap_page to convert
      between TSC and perf time.
      
      Due to a lack of foresight when Intel PT was implemented, those time
      members were recorded in the (implementation dependent) AUXTRACE_INFO
      event, the structure of which is generally inaccessible outside of the
      Intel PT decoder.  However now the conversion between TSC and perf time
      is needed when processing a jitdump file when Intel PT has been used for
      tracing.
      
      So add a user event to record the time members.  'perf record' will
      synthesize the event if the information is available.  And session
      processing will put a copy of the event on the session so that tools
      like 'perf inject' can easily access it.
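
      For orientation, the conversion that these time members enable can be
      sketched as follows. The arithmetic follows the formula documented for
      time_shift, time_mult and time_zero in the perf_event_mmap_page header
      comment; the Python class and function names are illustrative only, and
      the sample parameters are made up.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


@dataclass
class TscConversion:
    """The three time members recorded by the time conversion event."""
    time_shift: int
    time_mult: int
    time_zero: int


def tsc_to_perf_time(cyc, tc):
    """Convert a TSC value to perf time in nanoseconds."""
    quot = cyc >> tc.time_shift
    rem = cyc & ((1 << tc.time_shift) - 1)
    ns = tc.time_zero + quot * tc.time_mult + ((rem * tc.time_mult) >> tc.time_shift)
    return ns & MASK64


def perf_time_to_tsc(ns, tc):
    """Convert perf time in nanoseconds back to a TSC value."""
    t = (ns - tc.time_zero) & MASK64
    quot, rem = divmod(t, tc.time_mult)
    return ((quot << tc.time_shift) + (rem << tc.time_shift) // tc.time_mult) & MASK64


# Made-up parameters: 512 / 2**10 = 0.5 ns per cycle, i.e. a 2 GHz TSC.
tc = TscConversion(time_shift=10, time_mult=512, time_zero=100)
print(tsc_to_perf_time(3000, tc), perf_time_to_tsc(1600, tc))
```

      Splitting the cycle count into a quotient and remainder before
      multiplying keeps the intermediate products from overflowing 64 bits for
      large TSC values.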
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426324-30158-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      46bc29b9
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Pretty print getrandom() args · 39878d49
      Arnaldo Carvalho de Melo authored
        # trace -e getrandom
        35622.560 ( 0.023 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.585 ( 0.006 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.594 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35627.395 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35630.940 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        35718.613 ( 0.015 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.629 ( 0.005 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.637 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35719.355 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35721.042 ( 0.030 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41090.830 ( 0.012 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.845 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.851 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41091.750 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41091.823 ( 0.006 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41122.078 ( 0.053 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.129 ( 0.009 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.139 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41124.492 ( 0.007 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41124.470 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41590.832 ( 0.014 ms): chrome/5957 getrandom(buf: 0x7fabac7b15b0, count: 16, flags: NONBLOCK      ) = 16
        41590.884 ( 0.004 ms): chrome/5957 getrandom(buf: 0x7fabac7b15c0, count: 16, flags: NONBLOCK      ) = 16
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-gca0n1p3aca3depey703ph2q@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      39878d49
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Pretty print seccomp() args · 997bba8c
      Arnaldo Carvalho de Melo authored
      E.g:
      
        # trace -e seccomp
         200.061 (0.009 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC                       ) = -1 EFAULT Bad address
         200.910 (0.121 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC, uargs: 0x7fff57479fe0) = 0
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-t369uckshlwp4evkks4bcoo7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      997bba8c
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Do not process PERF_RECORD_LOST twice · 3ed5ca2e
      Arnaldo Carvalho de Melo authored
      We catch this record to provide a visual indication that events are
      getting lost, then call the default method to allow extra logging shared
      with the other tools to take place.
      
      This extra logging was done twice because we were continuing to the
      "default" clause where machine__process_event() will end up calling
      machine__process_lost_event() again, fix it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wus2zlhw3qo24ye84ewu4aqw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3ed5ca2e
    • Wang Nan's avatar
      perf/ring_buffer: Prepare writing into the ring-buffer from the end · d1b26c70
      Wang Nan authored
      Convert perf_output_begin() to __perf_output_begin() and make the latter
      function able to write records from the end of the ring-buffer.
      
      Following commits will utilize the 'backward' flag.
      
      This is the core patch to support writing to the ring-buffer backwards,
      which will be introduced by upcoming patches to support reading from
      overwritable ring-buffers.
      
      In theory, this patch should not introduce any extra performance
      overhead since we use always_inline, but it does not hurt to double
      check that assumption:
      
      When CONFIG_OPTIMIZE_INLINING is disabled, the output object is nearly
      identical to original one. See:
      
         http://lkml.kernel.org/g/56F52E83.70409@huawei.com
      
      When CONFIG_OPTIMIZE_INLINING is enabled, the resulting object file becomes
      smaller:
      
       $ size kernel/events/ring_buffer.o*
         text       data        bss        dec        hex    filename
         4641          4          8       4653       122d kernel/events/ring_buffer.o.old
         4545          4          8       4557       11cd kernel/events/ring_buffer.o.new
      
      Performance testing results:
      
      Call 'close(-1)' 3000000 times and use gettimeofday() to measure the
      duration.  Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
      system calls. Results are in ns.
      
      Testing environment:
      
       CPU    : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
       Kernel : v4.5.0
      
                           MEAN         STDVAR
        BASE            800214.950    2853.083
        PRE            2253846.700    9997.014
        POST           2257495.540    8516.293
      
      Where 'BASE' is pure performance without capturing. 'PRE' is test
      result of pure 'v4.5.0' kernel. 'POST' is test result after this
      patch.
      
      Considering the stdvar, the difference is within the noise margin, so
      this patch doesn't hurt performance.
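
      The figures above can be checked with a few lines of arithmetic. This
      sketch assumes the STDVAR column is a standard deviation in the same unit
      as MEAN, which the commit message does not state explicitly.

```python
# (mean, stdvar) pairs copied from the table above.
results = {
    "BASE": (800214.950, 2853.083),
    "PRE": (2253846.700, 9997.014),
    "POST": (2257495.540, 8516.293),
}

pre_mean, pre_dev = results["PRE"]
post_mean, post_dev = results["POST"]

delta = post_mean - pre_mean          # change introduced by the patch
relative = delta / pre_mean           # the same change as a fraction of PRE
within_noise = abs(delta) < min(pre_dev, post_dev)

# Text size difference from the 'size' output above, in bytes.
text_saved = 4641 - 4545

print(f"delta={delta:.3f} relative={relative:.3%} within_noise={within_noise}")
print(f"text section shrinks by {text_saved} bytes")
```

      Under that assumption, the PRE to POST difference is smaller than either
      run's spread, which is the sense in which the result is within noise.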
      
      For testing details, see:
      
        http://lkml.kernel.org/g/56F89DCD.1040202@huawei.com
      
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-4-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d1b26c70
    • Wang Nan's avatar
      perf/core: Set event's default ::overflow_handler() · 1879445d
      Wang Nan authored
      Set a default event->overflow_handler in perf_event_alloc() so that we
      don't need to check event->overflow_handler in __perf_event_overflow().
      Following commits can give a different default overflow_handler.
      
      Initial idea comes from Peter:
      
        http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
      
      Since the default value of event->overflow_handler is not NULL, existing
      'if (!overflow_handler)' checks need to be changed.
      
      is_default_overflow_handler() is introduced for this.
      
      No extra performance overhead is introduced into the hot path because in the
      original code we still need to read this handler from memory. A conditional
      branch is avoided so actually we remove some instructions.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-3-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1879445d
    • Wang Nan's avatar
      perf/ring_buffer: Introduce new ioctl options to pause and resume the ring-buffer · 86e7972f
      Wang Nan authored
      Add new ioctl() to pause/resume ring-buffer output.
      
      In some situations we want to read from the ring-buffer only when we
      ensure nothing can write to the ring-buffer during reading. Without
      this patch we have to turn off all events attached to this ring-buffer
      to achieve this.
      
      This patch is a prerequisite to enable overwrite support for the
      perf ring-buffer. Following commits will introduce new methods to
      support reading from an overwrite ring buffer. Before reading, the caller
      must ensure the ring buffer is frozen, or the reading is unreliable.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      86e7972f
    • Jiri Olsa's avatar
      ftrace/perf: Check sample types only for sampling events · 0a74c5b3
      Jiri Olsa authored
      Currently we check sample type for ftrace:function events
      even if it's not created as a sampling event. That prevents
      creating ftrace_function event in counting mode.
      
      Make sure we check sample types only for sampling events.
      
      Before:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
           <not supported>      ftrace:function
      
               0.001983662 seconds time elapsed
      
      After:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
                    44,498      ftrace:function
      
               0.037534722 seconds time elapsed
      Suggested-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458138873-1553-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0a74c5b3
    • Alexander Shishkin's avatar
      perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks · 981a4cb3
      Alexander Shishkin authored
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
      bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-6-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      981a4cb3
    • Alexander Shishkin's avatar
      perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks · 66d21901
      Alexander Shishkin authored
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
      pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-5-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66d21901
    • Alexander Shishkin's avatar
      perf/ring_buffer: Document AUX API usage · af5bb4ed
      Alexander Shishkin authored
      In order to ensure safe AUX buffer management, we rely on the assumption
      that pmu::stop() stops its ongoing AUX transaction and not just the hw.
      
      This patch documents this requirement for the perf_aux_output_{begin,end}()
      APIs.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-4-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      af5bb4ed
    • Alexander Shishkin's avatar
      perf/core: Free AUX pages in unmap path · 95ff4ca2
      Alexander Shishkin authored
      Now that we can ensure that when ring buffer's AUX area is on the way
      to getting unmapped new transactions won't start, we only need to stop
      all events that can potentially be writing aux data to our ring buffer.
      
      Having done that, we can safely free the AUX pages and corresponding
      PMU data, as this time it is guaranteed to be the last aux reference
      holder.
      
      This partially reverts:
      
        57ffc5ca ("perf: Fix AUX buffer refcounting")
      
      ... which was made to defer deallocation that was otherwise possible
      from an NMI context. Now it is no longer the case; the last call to
      rb_free_aux() that drops the last AUX reference has to happen in
      perf_mmap_close() on that AUX area.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      95ff4ca2
    • Alexander Shishkin's avatar
      perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops · dcb10a96
      Alexander Shishkin authored
      When ring buffer's AUX area is unmapped and rb->aux_mmap_count drops to
      zero, new AUX transactions into this buffer can still be started,
      even though the buffer is en route to deallocation.
      
      This patch adds a check to perf_aux_output_begin() for rb->aux_mmap_count
      being zero, in which case there is no point starting new transactions.
      In other words, the ring buffers that pass a certain point in
      perf_mmap_close() will not have their events sending new data, which
      clears the path for freeing those buffers' pages right there and then,
      provided that no active transactions are holding the AUX reference.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-2-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dcb10a96