1. 05 Oct, 2015 7 commits
  2. 03 Oct, 2015 1 commit
    • Merge tag 'perf-core-for-mingo' of... · e3b0ac1b
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
       - Do event name substring search as last resort in 'perf list'.
         (Arnaldo Carvalho de Melo)
      
         E.g.:
      
          # perf list clock
      
          List of pre-defined events (to be used in -e):
      
           cpu-clock                                          [Software event]
           task-clock                                         [Software event]
      
           uncore_cbox_0/clockticks/                          [Kernel PMU event]
           uncore_cbox_1/clockticks/                          [Kernel PMU event]
      
           kvm:kvm_pvclock_update                             [Tracepoint event]
           kvm:kvm_update_master_clock                        [Tracepoint event]
           power:clock_disable                                [Tracepoint event]
           power:clock_enable                                 [Tracepoint event]
           power:clock_set_rate                               [Tracepoint event]
           syscalls:sys_enter_clock_adjtime                   [Tracepoint event]
           syscalls:sys_enter_clock_getres                    [Tracepoint event]
           syscalls:sys_enter_clock_gettime                   [Tracepoint event]
           syscalls:sys_enter_clock_nanosleep                 [Tracepoint event]
           syscalls:sys_enter_clock_settime                   [Tracepoint event]
           syscalls:sys_exit_clock_adjtime                    [Tracepoint event]
           syscalls:sys_exit_clock_getres                     [Tracepoint event]
           syscalls:sys_exit_clock_gettime                    [Tracepoint event]
           syscalls:sys_exit_clock_nanosleep                  [Tracepoint event]
           syscalls:sys_exit_clock_settime                    [Tracepoint event]
      
       - Reduce min 'perf stat --interval-print/-I' to 10ms. (Kan Liang)
      
         perf stat --interval in action:
      
         # perf stat -e cycles -I 50 -a usleep $((200 * 1000))
         print interval < 100ms. The overhead percentage could be high in some cases. Please proceed with caution.
         #   time                    counts unit events
            0.050233636         48,240,396      cycles
            0.100557098         35,492,594      cycles
            0.150804687         39,295,112      cycles
            0.201032269         33,101,961      cycles
            0.201980732            786,379      cycles
        #
      
       - Allow for max_stack greater than PERF_MAX_STACK_DEPTH, as when
         synthesizing callchains from Intel PT data. (Adrian Hunter)
      
       - Allow probing on kmodules without DWARF. (Masami Hiramatsu)
      
       - Fix a segfault when processing a perf.data file with callchains using
         "perf report --call-graph none". (Namhyung Kim)
      
       - Fix unresolved COMMs in 'perf top' when -s comm is used. (Namhyung Kim)
      
       - Register idle thread in 'perf top'. (Namhyung Kim)
      
       - Change 'record.samples' type to unsigned long long, fixing the output
         of the number of samples on 32-bit architectures. (Yang Shi)

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
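The 'perf list' change above (substring search as a last resort) can be pictured with a minimal Python sketch. This is not perf's C implementation: the function name `select_events` and the short `EVENTS` list are hypothetical, and the assumed order is glob match first, with substring search used only when the glob matches nothing.

```python
from fnmatch import fnmatchcase

def select_events(events, pattern):
    """Return events matching `pattern` as a glob, else fall back to substring search."""
    matches = [e for e in events if fnmatchcase(e, pattern)]
    if matches:
        return matches
    # Last resort: keep every event whose name contains the pattern
    return [e for e in events if pattern in e]

# Hypothetical event list; the names are taken from the 'perf list clock' output
EVENTS = ["cpu-clock", "task-clock", "cycles", "power:clock_enable"]
```

With this sketch, `select_events(EVENTS, "clock")` returns the three names containing "clock" because no event is literally called "clock", while `select_events(EVENTS, "cpu-*")` returns only `cpu-clock` through the glob path.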
  3. 02 Oct, 2015 4 commits
    • perf stat: Reduce min --interval-print to 10ms · 19afd104
      Kan Liang authored
      The --interval-print parameter was limited to 100ms. However, for
      example, 10ms is required to do sophisticated bandwidth analysis using
      uncore events.
      
      Testing shows that the overhead of system-wide uncore monitoring
      with a 10ms interval is only ~2%, so this patch reduces the minimum
      allowed interval-print to 10ms.
      
      But 10ms may not work well in all cases. For example, when the
      number of CPUs/threads is very large, the overhead of system-wide
      core event monitoring could be high.
      
      To handle this, a warning is displayed when interval-print is set to
      a value from 10ms up to, but not including, 100ms, so users can decide
      according to their specific cases.
      
       # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1
      
       print interval < 100ms. The overhead percentage could be high in some
       cases. Please proceed with caution.
       #           time             counts unit events
            0.010200451               0.10 MiB  uncore_imc_1/cas_count_read/
            0.020475117               0.02 MiB  uncore_imc_1/cas_count_read/
            0.030692800               0.01 MiB  uncore_imc_1/cas_count_read/
            0.040948161               0.02 MiB  uncore_imc_1/cas_count_read/
            0.051159564               0.00 MiB  uncore_imc_1/cas_count_read/

      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com
      [ Added warning about overhead when using sub 100ms intervals to the man page ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
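The interval rule described in this commit (reject below 10ms, warn below 100ms, accept silently otherwise) might be sketched as follows. This is a Python illustration, not the code in perf: `check_interval`, the constant names, and the wording of the rejection message are assumptions; only the warning text is taken from the output above.

```python
MIN_INTERVAL_MS = 10    # new lower bound introduced by this patch
WARN_BELOW_MS = 100     # previous lower bound; shorter intervals trigger the warning

WARNING = ("print interval < 100ms. The overhead percentage could be high in "
           "some cases. Please proceed with caution.")

def check_interval(interval_ms):
    """Return (accepted, message) for a requested print interval in milliseconds."""
    if interval_ms < MIN_INTERVAL_MS:
        # Hypothetical error text; the real tool's wording may differ
        return False, f"print interval must be >= {MIN_INTERVAL_MS}ms"
    if interval_ms < WARN_BELOW_MS:
        return True, WARNING
    return True, None
```

Returning the warning instead of printing it keeps the check easy to test; a caller would print the message and continue whenever `accepted` is true.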
    • perf record: Change 'record.samples' type to unsigned long long · 9f065194
      Yang Shi authored
      When running "perf record -e", the number of samples shown is wrong on
      some 32-bit systems, e.g. powerpc and arm.
      
      For example, run the commands below on 32-bit powerpc:
      
        perf probe -x /lib/libc.so.6 malloc
        perf record -e probe_libc:malloc -a ls perf.data
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.036 MB perf.data (13829241621624967218 samples) ]
      
      However, "perf script" shows only 21 samples. The reported number is
      absurd because 'samples' has type long but is printed with PRIu64.
      
      Build tested on x86-64, x86, aarch64, arm, mips, ppc and ppc64.

      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Cc: linaro-kernel@lists.linaro.org
      Link: http://lkml.kernel.org/r/1443563383-4064-1-git-send-email-yang.shi@linaro.org
      [ Bumped the 'hits' var used together with record.samples to 'unsigned long long' too ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
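The width mismatch behind this bug can be modelled loosely in Python. This is only an illustration under stated assumptions: in C the mismatch is undefined behaviour and what is actually read depends on the ABI, so the big-endian layout, the helper `read_as_u64`, and the `neighbour` bytes are all hypothetical.

```python
import struct

def read_as_u64(raw8):
    """Interpret 8 raw bytes as one big-endian unsigned 64-bit integer."""
    return struct.unpack(">Q", raw8)[0]

samples = 21
neighbour = b"\xde\xad\xbe\xef"   # hypothetical unrelated bytes next to the value

# Mismatch: only 4 bytes hold the counter, but 8 bytes are read
mismatched = read_as_u64(struct.pack(">I", samples) + neighbour)

# Fix: store the counter as 64 bits so the read width matches the stored width
widened = read_as_u64(struct.pack(">Q", samples))
```

Here `widened` is the expected 21, while `mismatched` mixes the counter with the unrelated bytes, which is the kind of huge value the commit reports.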
    • perf probe: Allow probing on kmodules without dwarf · 1a8ac29c
      Masami Hiramatsu authored
      Allow probing on kernel modules when 'perf' is built without debuginfo
      support.
      
      Currently perf-probe --module requires linking with libdw, but this
      doesn't make sense.
      
      E.g.
        ----
        # make NO_DWARF=1
        # ./perf probe -m pcspkr pcspkr_event%return
          Error: unknown switch `m'
        ----
      
      With this patch
        ----
        # ./perf probe -m pcspkr pcspkr_event%return
        Added new event:
          probe:pcspkr_event   (on pcspkr_event%return in pcspkr)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:pcspkr_event -aR sleep 1
        ----

      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20151002125832.18617.78721.stgit@localhost.localdomain
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf list: Honour 'event_glob' when printing selectable PMUs · fa52ceab
      Arnaldo Carvalho de Melo authored
      Some PMUs, like 'intel_bts', can be used directly as an event name.
      For example, this is a valid event specification:
      
      	$ perf record -e intel_bts// usleep 1
      
      But the code printing such PMUs was not honouring the 'event_glob'
      parameter, so the following line was always appearing:
      
        $ intel_bts//                                        [Kernel PMU event]
      
      Fix it:
      
        $ [acme@felicio linux]$ perf list data
      
        List of pre-defined events (to be used in -e):
      
          uncore_imc/data_reads/                             [Kernel PMU event]
          uncore_imc/data_writes/                            [Kernel PMU event]
      
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ajb71858n7q7ao77b8pyy74w@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 01 Oct, 2015 8 commits
  5. 30 Sep, 2015 16 commits
  6. 29 Sep, 2015 1 commit
    • Merge tag 'perf-core-for-mingo' of... · 9c17dbc6
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
        - Accept a zero --itrace period, meaning "as often as possible".  In the case
          of Intel PT that is the same as a period of 1 and a unit of 'instructions'
          (i.e.  --itrace=i1i). (Adrian Hunter)
      
        - Harmonize itrace's synthesized callchains with the existing --max-stack
          tool option. (Adrian Hunter)
      
        - Allow time to be displayed in nanoseconds in 'perf script'. (Adrian Hunter)
      
        - Fix potential infinite loop when handling Intel PT timestamps. (Adrian Hunter)
      
        - Slightly improve Intel PT debug logging. (Adrian Hunter)
      
        - Warn when AUX data has been lost, just like when processing PERF_RECORD_LOST.
          (Adrian Hunter)
      
        - Further document export-to-postgresql.py script. (Adrian Hunter)
      
        - Add option to synthesize branch stack from auxtrace data. (Adrian Hunter)
      
        - Use equivalent logic to avoid using dso->kernel. (Arnaldo Carvalho de Melo)
      
        - Show proper error messages when parsing bad terms for hw/sw events. (He Kuang)
      
        - Tracepoint event parsing improvements. (He Kuang)
      
        - Store tracing mountpoint for better error message. (Jiri Olsa)
      
        - Add fixdep to tools/build, bringing it closer to the kernel counterpart, from
          where it is being lifted. (Jiri Olsa)

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 28 Sep, 2015 3 commits
    • perf tools: Enable event_config terms to tracepoint events · e637d177
      He Kuang authored
      This patch enables config terms for tracepoint perf events. Valid terms
      for tracepoint events are 'call-graph' and 'stack-size', so we can use
      different callgraph settings for each event and eliminate unnecessary
      overhead.
      
      Here is an example of using a different call-graph config for each
      tracepoint:
      
        $ perf record -e syscalls:sys_enter_write/call-graph=fp/ \
                      -e syscalls:sys_exit_write/call-graph=no/ \
                      dd if=/dev/zero of=test bs=4k count=10
      
        $ perf report --stdio
      
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'syscalls:sys_enter_write'
        # Event count (approx.): 13
        #
        # Children      Self  Command  Shared Object       Symbol
        # ........  ........  .......  ..................  ......................
        #
            76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
                         |
                         ---__write_nocancel
      
            23.08%    23.08%  dd       libc-2.20.so        [.] write
                         |
                         ---write
                            |
                            |--33.33%-- 0x2031342820736574
                            |
                            |--33.33%-- 0xa6e69207364726f
                            |
                             --33.33%-- 0x34202c7320393039
        ...
      
        # Samples: 13  of event 'syscalls:sys_exit_write'
        # Event count (approx.): 13
        #
        # Children      Self  Command  Shared Object       Symbol
        # ........  ........  .......  ..................  ......................
        #
            76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
            23.08%    23.08%  dd       libc-2.20.so        [.] write
             7.69%     0.00%  dd       [unknown]           [.] 0x0a6e69207364726f
             7.69%     0.00%  dd       [unknown]           [.] 0x2031342820736574
             7.69%     0.00%  dd       [unknown]           [.] 0x34202c7320393039

      Signed-off-by: He Kuang <hekuang@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1443412336-120050-4-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Adds the tracepoint name parsing support · 865582c3
      He Kuang authored
      Add rules for parsing tracepoint names. Change the tracepoint rules,
      which were derived from PE_NAMEs, to use tracepoint names directly, so
      that adding more rules based on tracepoint names will be easier.
      
      Changes v2-v3:
         - Change __event_legacy_tracepoint label in bison file to tracepoint_name
         - Fix formats error.

      Signed-off-by: He Kuang <hekuang@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1443412336-120050-3-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Show proper error message for wrong terms of hw/sw events · ffeb883e
      He Kuang authored
      Show a proper error message and list the valid terms when wrong config
      terms are specified for hw/sw type perf events.
      
      This patch makes the original error formatting function
      formats_error_string() more generic: it outputs only the static config
      terms for hw/sw perf events, and prepends the PMU formats for PMU events.
      
      Before this patch:
      
        $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
        invalid or unsupported event: 'cpu-clock/freqx=200/'
        Run 'perf list' for a list of valid events
      
         usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      After this patch:
      
        $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
        event syntax error: 'cpu-clock/freqx=200/'
                                       \___ unknown term
      
        valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size
      
        Run 'perf list' for a list of valid events
      
         usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events

      Signed-off-by: He Kuang <hekuang@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1443412336-120050-2-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
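The "after this patch" behaviour (point at the unknown term, then list the valid ones) can be approximated with a short Python sketch. It is not perf's parser: `check_event_terms` is a hypothetical helper, the simple split on '/' and ',' is an assumption about the syntax, and only the term list and message wording come from the output above.

```python
# Valid static terms, copied from the "valid terms:" line in the commit message
VALID_TERMS = ["config", "config1", "config2", "name", "period", "freq",
               "branch_type", "time", "call-graph", "stack-size"]

def check_event_terms(event):
    """Return None if every term is known, else a multi-line error string."""
    name, sep, rest = event.partition("/")
    if not sep:
        return None                      # no term list at all, e.g. "cycles"
    offset = len(name) + 1               # index of the first term inside `event`
    for term in rest.rstrip("/").split(","):
        key = term.split("=", 1)[0]
        if key and key not in VALID_TERMS:
            prefix = "event syntax error: '"
            return "\n".join([
                f"{prefix}{event}'",
                # Place the marker directly under the offending term
                " " * (len(prefix) + offset) + "\\___ unknown term",
                "",
                "valid terms: " + ",".join(VALID_TERMS),
            ])
        offset += len(term) + 1          # skip the term and its trailing comma
    return None
```

Calling `check_event_terms("cpu-clock/freqx=200/")` yields a message whose marker lines up under `freqx`, while `check_event_terms("cpu-clock/freq=200/")` returns `None`.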