1. 07 Jun, 2015 10 commits
    • Kan Liang's avatar
      perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES · f38b0dbb
      Kan Liang authored
      After enlarging the PEBS interrupt threshold, there may be some mixed up
      PEBS samples which are discarded by the kernel.
      
      This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with
      the number of possible discarded records when it is impossible to demux
      the samples.
      
      It makes sure the user is not left in the dark about such discards.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f38b0dbb
    • Yan, Zheng's avatar
      perf/intel/x86: Enlarge the PEBS buffer · 15617499
      Yan, Zheng authored
      Currently the PEBS buffer size is 4k, it can only hold about 21
      PEBS records. This patch enlarges the PEBS buffer size to 64k
      (the same as the BTS buffer).
      
      64k memory can hold about 330 PEBS records. This will significantly
      reduce the number of PMIs when batched PEBS interrupts are enabled.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-7-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      15617499
    • Yan, Zheng's avatar
      perf/x86/intel: Drain the PEBS buffer during context switches · 9c964efa
      Yan, Zheng authored
      Flush the PEBS buffer during context switches if PEBS interrupt threshold
      is larger than one. This allows perf to supply TID for sample outputs.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9c964efa
    • Yan, Zheng's avatar
      perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold) · 3569c0d7
      Yan, Zheng authored
      PEBS always had the capability to log samples to its buffers without
      an interrupt. Traditionally perf has not used this but always set the
      PEBS threshold to one.
      
      For frequently occurring events (like cycles or branches or load/store)
      this in term requires using a relatively high sampling period to avoid
      overloading the system, by only processing PMIs. This in term increases
      sampling error.
      
      For the common cases we still need to use the PMI because the PEBS
      hardware has various limitations. The biggest one is that it can not
      supply a callgraph. It also requires setting a fixed period, as the
      hardware does not support adaptive period. Another issue is that it
      cannot supply a time stamp and some other options. To supply a TID it
      requires flushing on context switch. It can however supply the IP, the
      load/store address, TSX information, registers, and some other things.
      
      So we can make PEBS work for some specific cases, basically as long as
      you can do without a callgraph and can set the period you can use this
      new PEBS mode.
      
      The main benefit is the ability to support much lower sampling period
      (down to -c 1000) without extensive overhead.
      
      One use cases is for example to increase the resolution of the c2c tool.
      Another is double checking when you suspect the standard sampling has
      too much sampling error.
      
      Some numbers on the overhead, using cycle soak, comparing the elapsed
      time from "kernbench -M -H" between plain (threshold set to one) and
      multi (large threshold).
      
      The test command for plain:
        "perf record --time -e cycles:p -c $period -- kernbench -M -H"
      
      The test command for multi:
        "perf record --no-time -e cycles:p -c $period -- kernbench -M -H"
      
      ( The only difference of test command between multi and plain is time
        stamp options. Since time stamp is not supported by large PEBS
        threshold, it can be used as a flag to indicate if large threshold is
        enabled during the test. )
      
      	period    plain(Sec)  multi(Sec)  Delta
      	10003     32.7        16.5        16.2
      	20003     30.2        16.2        14.0
      	40003     18.6        14.1        4.5
      	80003     16.8        14.6        2.2
      	100003    16.9        14.1        2.8
      	800003    15.4        15.7        -0.3
      	1000003   15.3        15.2        0.2
      	2000003   15.3        15.1        0.1
      
      With periods below 100003, plain (threshold one) cause much more
      overhead. With 10003 sampling period, the Elapsed Time for multi is
      even 2X faster than plain.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3569c0d7
    • Yan, Zheng's avatar
      perf/x86/intel: Handle multiple records in the PEBS buffer · 21509084
      Yan, Zheng authored
      When the PEBS interrupt threshold is larger than one record and the
      machine supports multiple PEBS events, the records of these events are
      mixed up and we need to demultiplex them.
      
      Demuxing the records is hard because the hardware is deficient. The
      hardware has two issues that, when combined, create impossible
      scenarios to demux.
      
      The first issue is that the 'status' field of the PEBS record is a copy
      of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
      problem let us first describe the regular PEBS cycle:
      
      A) the CTRn value reaches 0:
        - the corresponding bit in GLOBAL_STATUS gets set
        - we start arming the hardware assist
        < some unspecified amount of time later -- this could cover multiple
          events of interest >
      
      B) the hardware assist is armed, any next event will trigger it
      
      C) a matching event happens:
        - the hardware assist triggers and generates a PEBS record
          this includes a copy of GLOBAL_STATUS at this moment
        - if we auto-reload we (re)set CTRn
        - we clear the relevant bit in GLOBAL_STATUS
      
      Now consider the following chain of events:
      
        A0, B0, A1, C0
      
      The event generated for counter 0 will include a status with counter 1
      set, even though its not at all related to the record. A similar thing
      can happen with a !PEBS event if it just happens to overflow at the
      right moment.
      
      The second issue is that the hardware will only emit one record for two
      or more counters if the event that triggers the assist is 'close'. The
      'close' can be several cycles. In some cases even the complete assist,
      if the event is something that doesn't need retirement.
      
      For instance, consider this chain of events:
      
        A0, B0, A1, B1, C01
      
      Where C01 is an event that triggers both hardware assists, we will
      generate but a single record, but again with both counters listed in the
      status field.
      
      This time the record pertains to both events.
      
      Note that these two cases are different but undistinguishable with the
      data as generated. Therefore demuxing records with multiple PEBS bits
      (we can safely ignore status bits for !PEBS counters) is impossible.
      
      Furthermore we cannot emit the record to both events because that might
      cause a data leak -- the events might not have the same privileges -- so
      what this patch does is discard such events.
      
      The assumption/hope is that such discards will be rare.
      
      Here lists some possible ways you may get high discard rate.
      
        - when you count the same thing multiple times. But it is not a useful
          configuration.
        - you can be unfortunate if you measure with a userspace only PEBS
          event along with either a kernel or unrestricted PEBS event. Imagine
          the event triggering and setting the overflow flag right before
          entering the kernel. Then all kernel side events will end up with
          multiple bits set.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      [ Changelog improvements. ]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      21509084
    • Yan, Zheng's avatar
      perf/x86/intel: Introduce setup_pebs_sample_data() · 43cf7631
      Yan, Zheng authored
      Move code that sets up the PEBS sample data to a separate function.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-3-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      43cf7631
    • Yan, Zheng's avatar
      perf/x86/intel: Use the PEBS auto reload mechanism when possible · 851559e3
      Yan, Zheng authored
      When a fixed period is specified, this patch makes perf use the PEBS
      auto reload mechanism. This makes normal profiling faster, because
      it avoids one costly MSR write in the PMI handler.
      
      However, the reset value will be loaded by hardware assist. There is a
      small delay compared to the previous non-auto-reload mechanism. The
      delay time is arbitrary, but very small. The assist cost is 400-800
      cycles, assuming common cases with everything cached. The minimum period
      the patch currently uses is 10000. In that extreme case it can be ~10%
      if cycles are used.
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      851559e3
    • Stephane Eranian's avatar
      perf record: Add support for sampling indirect jumps · 5b68164d
      Stephane Eranian authored
      This patch adds a new branch sampling type support for indirect jumps:
      
        perf record -j ind_jmp .......
      
      It enables analysis of indirect jumps targets. It requires kernel and
      possibly hardware support to operate correctly.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      [ Fixup against: f00898f4 (perf tools: Move branch option parsing to own file) ]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: dsahern@gmail.com
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: namhyung@kernel.org
      Link: http://lkml.kernel.org/r/1431637800-31061-4-git-send-email-eranian@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5b68164d
    • Stephane Eranian's avatar
      perf/x86/intel: add support for PERF_SAMPLE_BRANCH_IND_JUMP · 7b74cfb2
      Stephane Eranian authored
      This patch enables support for branch sampling filter
      for indirect jumps (IND_JUMP). It enables LBR IND_JMP
      filtering where available. There is also software filtering
      support.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: dsahern@gmail.com
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: namhyung@kernel.org
      Link: http://lkml.kernel.org/r/1431637800-31061-3-git-send-email-eranian@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7b74cfb2
    • Stephane Eranian's avatar
      perf: add new PERF_SAMPLE_BRANCH_IND_JUMP branch sample type · c9fdfa14
      Stephane Eranian authored
      This patch adds a new branch_sample_type flag to enable
      filtering branch sampling to indirect jumps. The support
      is subject to hardware or kernel software support on each
      architecture.
      
      Filtering on indirect jump is useful to study the targets
      of the jump.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: dsahern@gmail.com
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: namhyung@kernel.org
      Link: http://lkml.kernel.org/r/1431637800-31061-2-git-send-email-eranian@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c9fdfa14
  2. 04 Jun, 2015 1 commit
  3. 03 Jun, 2015 1 commit
    • Wang Nan's avatar
      perf tools: Deal with kernel module names in '[]' correctly · 1f121b03
      Wang Nan authored
      Before patch ba92732e ('perf kmaps: Check kmaps to make code more
      robust'), 'perf report' and 'perf annotate' will segfault if trace data
      contains kernel module information like this:
      
       # perf report -D -i ./perf.data
       ...
       0 0 0x188 [0x50]: PERF_RECORD_MMAP -1/0: [0xffffffbff1018000(0xf068000) @ 0]: x [test_module]
       ...
      
       # perf report -i ./perf.data --objdump=/path/to/objdump --kallsyms=/path/to/kallsyms
      
       perf: Segmentation fault
       -------- backtrace --------
       /path/to/perf[0x503478]
       /lib64/libc.so.6(+0x3545f)[0x7fb201f3745f]
       /path/to/perf[0x499b56]
       /path/to/perf(dso__load_kallsyms+0x13c)[0x49b56c]
       /path/to/perf(dso__load+0x72e)[0x49c21e]
       /path/to/perf(map__load+0x6e)[0x4ae9ee]
       /path/to/perf(thread__find_addr_map+0x24c)[0x47deec]
       /path/to/perf(perf_event__preprocess_sample+0x88)[0x47e238]
       /path/to/perf[0x43ad02]
       /path/to/perf[0x4b55bc]
       /path/to/perf(ordered_events__flush+0xca)[0x4b57ea]
       /path/to/perf[0x4b1a01]
       /path/to/perf(perf_session__process_events+0x3be)[0x4b428e]
       /path/to/perf(cmd_report+0xf11)[0x43bfc1]
       /path/to/perf[0x474702]
       /path/to/perf(main+0x5f5)[0x42de95]
       /lib64/libc.so.6(__libc_start_main+0xf4)[0x7fb201f23bd4]
       /path/to/perf[0x42dfc4]
      
      This is because __kmod_path__parse treats '[' leading names as kernel
      name instead of names of kernel module.
      
      If perf.data contains build information and the buildid of such modules
      can be found, the dso->kernel of it will be set to DSO_TYPE_KERNEL by
      __event_process_build_id(), not kernel module.
      
      It will then be passed to dso__load() -> dso__load_kernel_sym() ->
      dso__load_kcore() if --kallsyms is provided.
      
      The refered patch adds NULL pointer checker to avoid segfault. However,
      such kernel modules are still processed incorrectly.
      
      This patch fixes __kmod_path__parse, makes it treat names like
      '[test_module]' as kernel modules.
      
      kmod-path.c is also update to reflect the above changes.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1433321541-170245-1-git-send-email-wangnan0@huawei.com
      [ Fixed the merged with 0443f36b ("perf machine: Fix the search
        for the kernel DSO on the unified list" ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1f121b03
  4. 02 Jun, 2015 3 commits
  5. 01 Jun, 2015 1 commit
  6. 30 May, 2015 1 commit
    • Wang Nan's avatar
      perf probe: Fix segfault when glob matching function without debuginfo · 6bb536cc
      Wang Nan authored
      Commit 4c859351 ("perf probe: Support
      glob wildcards for function name") introduces segfault problems when
      debuginfo is not available:
      
       # perf probe 'sys_w*'
        Added new events:
        Segmentation fault
      
      The first problem resides in find_probe_trace_events_from_map(). In
      that function, find_probe_functions() is called to match each symbol
      against glob to find the number of matching functions, but still use
      map__for_each_symbol_by_name() to find 'struct symbol' for matching
      functions. Unfortunately, map__for_each_symbol_by_name() does
      exact matching by searching in an rbtree.
      
      It doesn't know glob matching, and not easy for it to support it because
      it use rbtree based binary search, but we are unable to ensure all names
      matched by the glob (any glob passed by user) reside in one subtree.
      
      This patch drops map__for_each_symbol_by_name(). Since there is no
      rbtree again, re-matching all symbols costs a lot. This patch avoid it
      by saving all matching results into an array (syms).
      
      The second problem is the lost of tp->realname. In
      __add_probe_trace_events(), if pev->point.function is glob, the event
      name should be set to tev->point.realname. This patch ensures its
      existence by strdup sym->name instead of leaving a NULL pointer there.
      
      After this patch:
      
       # perf probe 'sys_w*'
       Added new events:
         probe:sys_waitid     (on sys_w*)
         probe:sys_wait4      (on sys_w*)
         probe:sys_waitpid    (on sys_w*)
         probe:sys_write      (on sys_w*)
         probe:sys_writev     (on sys_w*)
      
       You can now use it in all perf tools, such as:
      
               perf record -e probe:sys_writev -aR sleep 1
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Acked-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1432892747-232506-1-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6bb536cc
  7. 29 May, 2015 15 commits
  8. 28 May, 2015 3 commits
  9. 27 May, 2015 5 commits