1. 29 Apr, 2016 1 commit
    • Merge tag 'perf-core-for-mingo-20160429' of... · 03d85a63
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20160429' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
       - Allow generating multiple perf.data files with timestamp suffixes upon
         receiving SIGUSR2 in 'perf record', to slice a long-running monitoring
         session so that uninteresting sessions can be dumped (Wang Nan)
      
       - Handle ENOMEM for perf_event_max_stack + PERF_SAMPLE_CALLCHAIN
         in perf_evsel__open_strerror(), showing a more informative
         message when the requested call stack depth can't be allocated by
         the kernel (Arnaldo Carvalho de Melo)
      
      Infrastructure changes:
      
       - Use strbuf for making strings in 'perf probe' (Masami Hiramatsu)
      
       - Do not use sizeof on a pointer type; not a problem since it's a pointer to
         pointer, but fix it nonetheless. Found by Coccinelle (Vaishali Thakkar)
      
      Cleanups:
      
       - Fix for Coverity found issues in the bpf feature build test (Florian Fainelli)

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
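The SIGUSR2 item above slices a session into several timestamp-suffixed output files. As a rough illustration of the naming step only, here is a minimal C sketch. The helper name `timestamped_name`, the `%Y%m%d%H%M%S` suffix layout and the use of UTC are all assumptions made for this example, not the format 'perf record' is confirmed to use.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not taken from perf): build "<base>.<timestamp>"
 * into buf. Returns 0 on success, -1 if the time cannot be formatted or
 * the result does not fit in buf.
 */
static int timestamped_name(char *buf, size_t len, const char *base, time_t when)
{
	char stamp[32];
	struct tm *tm;
	int n;

	assert(buf != NULL && base != NULL);

	/* gmtime() is not thread-safe; acceptable for a single-threaded sketch. */
	tm = gmtime(&when);
	if (tm == NULL || strftime(stamp, sizeof(stamp), "%Y%m%d%H%M%S", tm) == 0)
		return -1;

	n = snprintf(buf, len, "%s.%s", base, stamp);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* formatting error or truncated output */

	return 0;
}
```

A signal handler would normally only set a flag; the main loop would then call something like `timestamped_name(buf, sizeof(buf), "perf.data", time(NULL))` before switching output files.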
  2. 28 Apr, 2016 19 commits
  3. 27 Apr, 2016 3 commits
    • Merge tag 'perf-core-for-mingo-20160427' of... · a8944c5b
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20160427' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
      - perf trace --pf maj/min/all works with --call-graph: (Arnaldo Carvalho de Melo)
      
        Tracing write syscalls and major page faults with callchains while starting
        firefox, limiting the stack to 5 frames:
      
       # perf trace -e write --pf maj --max-stack 5 firefox
         589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
                                             [0xfaed] (/usr/lib64/libpthread-2.22.so)
                                             fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
                                             InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
                                             XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
                                             XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
         760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
                                             gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
                                             gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
                                             g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
                                             [0x115378] (/usr/lib64/libgnutls.so.30.6.3)
      
        This automagically selects "--call-graph dwarf"; use "--call-graph fp" on systems
        where -fno-omit-frame-pointer was used to build the components of interest, to
        incur less overhead, or tune "--call-graph dwarf" appropriately, see 'perf record --help'.
      
      - Allow tuning the maximum stack depth via /proc/sys/kernel/perf_event_max_stack, which
        defaults to the old hard-coded value of PERF_MAX_STACK_DEPTH (127). This is useful for
        huge callstacks for things like Groovy, Ruby, etc., and also to reduce overhead by limiting
        it to a smaller value. Upcoming work will allow this to be done per-event (Arnaldo Carvalho de Melo)
      
      - Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)
      
      - Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo)
      
         # perf record --call lbr usleep 1
         # perf evlist -v
         cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
                  branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
         #
      
      - Clear dummy entry accumulated period, fixing such 'perf top/report' output
        as: (Kan Liang)
      
          4769.98%  0.01%  0.00%  0.01%  tchain_edit  [kernel] [k] update_fast_timekeeper
      
      - System calls with pid_t arguments get them augmented with the COMM event
        more thoroughly:
      
        # trace -e perf_event_open perf stat -e cycles -p 15608
         6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
         6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
         6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                                                  ^^^^^^^^^^^^^^^^^^
         ^C
      
      - Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)
      
      - Fix module probe issue if no dwarf support in 'perf probe' (Ravi Bangoria)
      
      Assorted fixes:
      
      - Fix off-by-one in write_buildid() (Andrey Ryabinin)
      
      - Fix segfault when printing callchains in 'perf script' (Chris Phlipot)
      
      - Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)
      
      - Fix off-by-one comparison in intel-pt code (Colin Ian King)
      
      - Close target file on error path in 'perf probe' (Masami Hiramatsu)
      
      - Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)
      
      - Avoid partial perf_event_header reads (Wang Nan)
      
      Infrastructure changes:
      
      - Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)
      
      - Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)
      
      Cleanups:
      
      - Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)
      
      - Remove duplicate const qualifier (Eric Engestrom)

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf tools: Set the maximum allowed stack from /proc/sys/kernel/perf_event_max_stack · 4cb93446
      Arnaldo Carvalho de Melo authored
      There is an upper limit to what tooling considers a valid callchain,
      and it was tied to the hardcoded value in the kernel,
      PERF_MAX_STACK_DEPTH (127). Now that this can be tuned via a sysctl,
      make the tooling read it and use that as the upper limit, falling back
      to PERF_MAX_STACK_DEPTH for kernels where this sysctl isn't present.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-yjqsd30nnkogvj5oyx9ghir9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
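The read-with-fallback behaviour described in this commit can be sketched in a few lines of C. This is an illustration under stated assumptions, not the actual perf tools code: the helper name `read_max_stack` is hypothetical, and treating a non-positive or unparsable value as "use the fallback" is a choice made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* The kernel's historical hard-coded limit, as stated in the commit message. */
#define PERF_MAX_STACK_DEPTH 127

/*
 * Hypothetical helper: read an integer limit from `path` and return it,
 * or return `fallback` when the file is missing or holds no usable value.
 */
static int read_max_stack(const char *path, int fallback)
{
	int value;
	FILE *fp;

	assert(path != NULL);

	fp = fopen(path, "r");
	if (fp == NULL)
		return fallback;	/* e.g. a kernel without this sysctl */

	/* Assumption for this sketch: reject unparsable or non-positive values. */
	if (fscanf(fp, "%d", &value) != 1 || value <= 0)
		value = fallback;

	fclose(fp);
	return value;
}
```

A caller would pass the sysctl path from the commit title, for instance `read_max_stack("/proc/sys/kernel/perf_event_max_stack", PERF_MAX_STACK_DEPTH)`.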
    • Arnaldo Carvalho de Melo's avatar
      perf core: Allow setting up max frame stack depth via sysctl · c5dfd78e
      Arnaldo Carvalho de Melo authored
      The default remains 127, which is good for most cases, and not even hit
      most of the time, but then for some cases, as reported by Brendan, 1024+
      deep frames are appearing on the radar for things like groovy, ruby.
      
      And in some workloads putting a _lower_ cap on this may make sense. A
      per-event cap still needs to be put in place, though.
      
      The new file is:
      
        # cat /proc/sys/kernel/perf_event_max_stack
        127
      
      Changing it:
      
        # echo 256 > /proc/sys/kernel/perf_event_max_stack
        # cat /proc/sys/kernel/perf_event_max_stack
        256
      
      But as soon as there is some event using callchains we get:
      
        # echo 512 > /proc/sys/kernel/perf_event_max_stack
        -bash: echo: write error: Device or resource busy
        #
      
      Because we only allocate the callchain percpu data structures when there
      is a user, which allows for changing the max easily, it's just a matter
      of having no callchain users at that point.

      Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
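The "Device or resource busy" behaviour described in this commit follows from a simple rule: the limit may only change while nothing sized by it is allocated. The C sketch below is a userspace model of that rule, not kernel code. The struct, its fields and the function names are invented for illustration, and the `-EINVAL` check on non-positive values is an added assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative model only; these names do not come from the kernel. */
struct callchain_state {
	int max_stack;	/* current depth limit, 127 by default */
	int nr_users;	/* events currently using callchains */
};

/* An event that wants callchains registers itself as a user. */
static void get_callchain_user(struct callchain_state *st)
{
	assert(st != NULL);
	st->nr_users++;
}

/* Dropping the last user is what makes the limit changeable again. */
static void put_callchain_user(struct callchain_state *st)
{
	assert(st != NULL);
	if (st->nr_users > 0)
		st->nr_users--;
}

/*
 * Change the limit, refusing with -EBUSY while any user exists, since
 * buffers sized by the old limit would still be in use.
 */
static int set_max_stack(struct callchain_state *st, int new_max)
{
	assert(st != NULL);

	if (new_max <= 0)
		return -EINVAL;	/* assumption: non-positive limits are rejected */
	if (st->nr_users > 0)
		return -EBUSY;

	st->max_stack = new_max;
	return 0;
}
```

In this model, setting 256 with no users succeeds, setting 512 after `get_callchain_user()` fails with `-EBUSY`, and the same request succeeds once `put_callchain_user()` has dropped the last user. This mirrors the shell transcript in the commit message.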
  4. 26 Apr, 2016 14 commits
  5. 25 Apr, 2016 3 commits