- 30 Sep, 2015 16 commits
-
-
Arnaldo Carvalho de Melo authored
If the user doesn't specify any event, try the most precise "cycles" available, i.e. start by "cycles:ppp" and go on removing "p" till it works. E.g. $ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (11 samples) ] $ perf evlist cycles:pp $ perf evlist -v cycles:pp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 $ grep 'model name' /proc/cpuinfo | head -1 model name : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz $ When 'cycles' appears explicitely is specified this will not be tried, i.e. the user has full control of the level of precision to be used: $ perf record -e cycles usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (9 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chandler Carruth <chandlerc@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://www.youtube.com/watch?v=nXaxk27zwlk Link: http://lkml.kernel.org/n/tip-b1ywebmt22pi78vjxau01wth@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
So that one can, for instance, use it with wc -l: # perf list *:*write* | wc -l 60 Or to look for the "bio" tracepoints, without 'perf list' headers: # perf list *:*bio* | head block:block_bio_backmerge [Tracepoint event] block:block_bio_bounce [Tracepoint event] block:block_bio_complete [Tracepoint event] block:block_bio_frontmerge [Tracepoint event] block:block_bio_queue [Tracepoint event] block:block_bio_remap [Tracepoint event] # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ts7sc0x8u4io4cifzkup4j44@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Masami Hiramatsu authored
perf probe shows more precisely message when it finds given %return target function is inlined. Without this fix: ---- # ./perf probe -V getname_flags%return Return probe must be on the head of a real function. Debuginfo analysis failed. Error: Failed to show vars. ---- With this fix: ---- # ./perf probe -V getname_flags%return Failed to find "getname_flags%return", because getname_flags is an inlined function and has no return point. Debuginfo analysis failed. Error: Failed to show vars. ---- Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150930164137.3733.55055.stgit@localhost.localdomainSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Masami Hiramatsu authored
perf probe --list will get a segfault if the first kprobe event is on a module and the second or latter one is on the kernel. e.g. ---- # ./perf probe -q -m pcspkr pcspkr_event # ./perf probe -q vfs_read # ./perf probe -l Segmentation fault (core dumped) ---- This is because the debuginfo_cache fails to handle NULL module name, which causes segfault on strcmp. (Note that strcmp("something", NULL) always causes segfault) To fix this debuginfo_cache__open always translates the NULL module name to "kernel" (this is correct, because NULL module name means opening the debuginfo for the kernel) ---- # ./perf probe -l probe:pcspkr_event (on pcspkr_event@drivers/input/misc/pcspkr.c in pcspkr) probe:vfs_read (on vfs_read@ksrc/linux-3/fs/read_write.c) ---- Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150930164135.3733.23993.stgit@localhost.localdomainSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Masami Hiramatsu authored
Perf probe always failed to find appropriate line numbers because of failing to find .text start address offset from debuginfo. e.g. ---- # ./perf probe -m pcspkr pcspkr_event:5 Added new events: probe:pcspkr_event (on pcspkr_event:5 in pcspkr) probe:pcspkr_event_1 (on pcspkr_event:5 in pcspkr) You can now use it in all perf tools, such as: perf record -e probe:pcspkr_event_1 -aR sleep 1 # ./perf probe -l Failed to find debug information for address ffffffffa031f006 Failed to find debug information for address ffffffffa031f016 probe:pcspkr_event (on pcspkr_event+6 in pcspkr) probe:pcspkr_event_1 (on pcspkr_event+22 in pcspkr) ---- This fixes the above issue as below. 1. Get the relative address of the symbol in .text by using map->start. 2. Adjust the address by adding the offset of .text section in the kernel module binary. With this fix, perf probe -l shows lines correctly. ---- # ./perf probe -l probe:pcspkr_event (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr) probe:pcspkr_event_1 (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr) ---- Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150930164132.3733.24643.stgit@localhost.localdomainSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Masami Hiramatsu authored
Fix a trival bug about libdwfl usage of the report session, it should explicitly begin and end a report session around dwfl_report_offline(). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150930164128.3733.59876.stgit@localhost.localdomainSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Masami Hiramatsu authored
Fix to remove dot suffix (e.g. .const, .isra) from the second or latter events which has suffix numbers. Since the previous commit 35a23ff9 ("perf probe: Cut off the gcc optimization postfixes from function name") didn't care about the suffix numbered events, therefore we'll have an error when we add additional events on the same dot suffix functions. e.g. ---- # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \ -a get_sigframe.isra.2.constprop.3 Failed to write event: Invalid argument Error: Failed to add events. ---- This fixes above issue as below: ---- # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \ -a get_sigframe.isra.2.constprop.3 Added new events: probe:get_sigframe (on get_sigframe.isra.2.constprop.3) probe:get_sigframe_1 (on get_sigframe.isra.2.constprop.3) You can now use it in all perf tools, such as: perf record -e probe:get_sigframe_1 -aR sleep 1 ---- Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20150930164130.3733.26573.stgit@localhost.localdomainSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Map 't', 'T' (text, local, global), 'w' and 'W' (weak text, local, global) as STT_FUNC, and the rest as STT_OBJECT Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-sbwcixulpc5v1xuxn3xvm0nn@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
It is about binding, not type, we have just a letter in kallsyms that should map both for the ELF type (STT_FUNC, etc) and to the ELF symbol binding (STB_WEAK, STB_GLOBAL, etc), so rename it now before introducing kallsyms2_elf_type() Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uu5vj343ms1q2wm55690on6v@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
And it is also a step in the direction of killing the separation of data and text maps in map_groups. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-rrds86kb3wx5wk8v38v56gw8@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
In places where we were using its open coded equivalent. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-khkdugcdoqy3tkszm3jdxgbe@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Sukadev Bhattiprolu authored
The perf_regs.c file does not get built on Powerpc as CONFIG_PERF_REGS is false. So the weak definition for 'sample_regs_masks' doesn't get picked up. Adding perf_regs.o to util/Build unconditionally, exposes a redefinition error for 'perf_reg_value()' function (due to the static inline version in util/perf_regs.h). So use #ifdef HAVE_PERF_REGS_SUPPORT' around that function. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/20150930182836.GA27858@us.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The --max_stack option was added as an optimization to reduce processing time, so people specifying --max-stack might get a increased processing time if combined with synthesized callchains, but otherwise no real harm. A warning about setting both --max_stack and the synthesized callchains max depth seems like overkill. Amend the documentation. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/560A5155.4060105@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Out of map_groups__find_symbol_by_name(), so that we can turn this later one first into a call to maps__find_symbol_by_name(MAP__FUNCTION) + MAP__VARIABLE, and then to just one call, we'll merge MAP__FUNCTION with MAP__VARIABLE maps, to simplify the code. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-pvkar0jacqn92g148u9sqttt@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
The error variable breaks build on CentOS 6.7, due to a collision with a global error symbol: CC util/parse-events.o cc1: warnings being treated as errors util/parse-events.c:419: error: declaration of ‘error’ shadows a global declaration util/util.h:135: error: shadowed declaration is here util/parse-events.c: In function ‘add_tracepoint_multi_event’: ... Using different argument names instead to fix it. Reported-by: Vinson Lee <vlee@twopensource.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-tip-commits@vger.kernel.org Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Raphael Beamonte <raphael.beamonte@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150929150531.GI27383@krava.redhat.com [ Fix one more case, at line 770 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
The error variable breaks build on CentOS 6.7, due to collision with global error symbol: CC util/evlist.o cc1: warnings being treated as errors In file included from util/evlist.c:28: tools/include/linux/err.h: In function ‘ERR_PTR’: tools/include/linux/err.h:34: error: declaration of ‘error’ shadows a global declaration util/util.h:135: error: shadowed declaration is here Using 'error_' name instead to fix it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/n/tip-i9mdgdbrgauy3fe76s9rd125@git.kernel.orgReported-by: Vinson Lee <vlee@twopensource.com> [ Use 'error_' instead of 'err' to, visually, not diverge too much from include/linux/err.h ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 29 Sep, 2015 1 commit
-
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Accept a zero --itrace period, meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i). (Adrian Hunter) - Harmonize itrace's synthesized callchains with the existing --max-stack tool option. (Adrian Hunter) - Allow time to be displayed in nanoseconds in 'perf script'. (Adrian Hunter) - Fix potential infinite loop when handling Intel PT timestamps. (Adrian Hunter) - Slighly improve Intel PT debug logging. (Adrian Hunter) - Warn when AUX data has been lost, just like when processing PERF_RECORD_LOST. (Adrian Hunter) - Further document export-to-postgresql.py script. (Adrian Hunter) - Add option to synthesize branch stack from auxtrace data. (Adrian Hunter) - Use equivalent logic to avoid using dso->kernel. (Arnaldo Carvalho de Melo) - Show proper error messages when parsing bad terms for hw/sw events. (He Kuang) - Tracepoint event parsing improvements. (He Kuang) - Store tracing mountpoint for better error message. (Jiri Olsa) - Add fixdep to tools/build, bringing it closer to the kernel counterpart, from where it is being lifted. (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 28 Sep, 2015 23 commits
-
-
He Kuang authored
This patch enables config terms for tracepoint perf events. Valid terms for tracepoint events are 'call-graph' and 'stack-size', so we can use different callgraph settings for each event and eliminate unnecessary overhead. Here is an example for using different call-graph config for each tracepoint. $ perf record -e syscalls:sys_enter_write/call-graph=fp/ -e syscalls:sys_exit_write/call-graph=no/ dd if=/dev/zero of=test bs=4k count=10 $ perf report --stdio # # Total Lost Samples: 0 # # Samples: 13 of event 'syscalls:sys_enter_write' # Event count (approx.): 13 # # Children Self Command Shared Object Symbol # ........ ........ ....... .................. ...................... # 76.92% 76.92% dd libpthread-2.20.so [.] __write_nocancel | ---__write_nocancel 23.08% 23.08% dd libc-2.20.so [.] write | ---write | |--33.33%-- 0x2031342820736574 | |--33.33%-- 0xa6e69207364726f | --33.33%-- 0x34202c7320393039 ... # Samples: 13 of event 'syscalls:sys_exit_write' # Event count (approx.): 13 # # Children Self Command Shared Object Symbol # ........ ........ ....... .................. ...................... # 76.92% 76.92% dd libpthread-2.20.so [.] __write_nocancel 23.08% 23.08% dd libc-2.20.so [.] write 7.69% 0.00% dd [unknown] [.] 0x0a6e69207364726f 7.69% 0.00% dd [unknown] [.] 0x2031342820736574 7.69% 0.00% dd [unknown] [.] 0x34202c7320393039 Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1443412336-120050-4-git-send-email-hekuang@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
He Kuang authored
Adds rules for parsing tracepoint names. Change rules of tracepoint which derives from PE_NAMEs into tracepoint names directly, so adding more rules based on tracepoint names will be easier. Changes v2-v3: - Change __event_legacy_tracepoint label in bison file to tracepoint_name - Fix formats error. Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1443412336-120050-3-git-send-email-hekuang@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
He Kuang authored
Show proper error message and show valid terms when wrong config terms is specified for hw/sw type perf events. This patch makes the original error format function formats_error_string() more generic, which only outputs the static config terms for hw/sw perf events, and prepends pmu formats for pmu events. Before this patch: $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1 invalid or unsupported event: 'cpu-clock/freqx=200/' Run 'perf list' for a list of valid events usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events After this patch: $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1 event syntax error: 'cpu-clock/freqx=200/' \___ unknown term valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size Run 'perf list' for a list of valid events usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1443412336-120050-2-git-send-email-hekuang@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
He Kuang authored
Currently, function config_term() is used for checking config terms of all types of events, while unknown terms is not reported as an error because pmu events have valid terms in sysfs. But this is wrong when unknown terms are specificed to hw/sw events. This patch Adds the config_term callback so we can use separate check routines for each type of events. Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1443412336-120050-1-git-send-email-hekuang@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
autofdo incorrectly expects branch flags to include either mispred or predicted. In fact mispred = predicted = 0 is valid and means the flags are not supported, which they aren't by Intel PT. To make autofdo work, add a config option which will cause Intel PT decoder to set the mispred flag on all branches. Below is an example of using Intel PT with autofdo. The example is also added to the Intel PT documentation. It requires autofdo (https://github.com/google/autofdo) and gcc version 5. The bubble sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial) amended to take the number of elements as a parameter. $ gcc-5 -O3 sort.c -o sort_optimized $ ./sort_optimized 30000 Bubble sorting array of 30000 elements 2254 ms $ cat ~/.perfconfig [intel-pt] mispred-all $ perf record -e intel_pt//u ./sort 3000 Bubble sorting array of 3000 elements 58 ms [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 3.939 MB perf.data ] $ perf inject -i perf.data -o inj --itrace=i100usle --strip $ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1 $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo $ ./sort_autofdo 30000 Bubble sorting array of 30000 elements 2155 ms Note there is currently no advantage to using Intel PT instead of LBR, but that may change in the future if greater use is made of the data. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add a new option --strip which is used with --itrace to strip out non-synthesized events. This results in a perf.data file that is simpler for external tools to parse. In particular, this can be used to prepare a perf.data file for consumption by autofdo. A subsequent patch makes a change to Intel PT also to enable use with autofdo and gives an example of that use. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-25-git-send-email-adrian.hunter@intel.com [ Made it use perf_evlist__remove() + perf_evsel__delete() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf inject can process instruction traces (using the --itrace option) which removes aux-related events and replaces them with the requested synthesized events. However there are still some leftovers, namely PERF_RECORD_ITRACE_START events and the original evsel (selected event) e.g. intel_pt// For the sake of completeness, remove them too. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-24-git-send-email-adrian.hunter@intel.com [ Made it use perf_evlist__remove() + perf_evsel__delete() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add a counterpart to perf_evlist__add() that does the opposite and deletes the evsel. This will be used by perf inject to remove unwanted evsels. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-23-git-send-email-adrian.hunter@intel.com [ Renamed it from perf_evlist__del() to perf_evlist__remove() and removed the perf_evsel__delete() call ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel() except that it ensures that the id must match. This will be used by perf inject to find a specific evsel that is to be deleted, hence the need to match exactly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-22-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf script has a setting to set the maximum stack depth when processing callchains. The setting defaults to the hard-coded maximum definition PERF_MAX_STACK_DEPTH which is 127. It is possible, when processing instruction traces, to synthesize callchains. Synthesized callchains do not have the kernel size limitation and are whatever size the user requests, although validation presently prevents the user requested a value greater that 1024. The default value is 16. To allow for synthesized callchains, make the scripting_max_stack value at least the same size as the synthesized callchain size. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-21-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Use the scripting_max_stack value to allow for values greater than PERF_MAX_STACK_DEPTH. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-20-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add a setting for maximum stack depth in preparation for allowing for synthesized callchains. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-19-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that arbitrary-sized callchains can be supported. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-17-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf report has an option (--max-stack) to set the maximum stack depth when processing callchains. The option defaults to the hard-coded maximum definition PERF_MAX_STACK_DEPTH which is 127. The intention of the option is to allow the user to reduce the processing time by reducing the amount of the callchain that is processed. It is also possible, when processing instruction traces, to synthesize callchains. Synthesized callchains do not have the kernel size limitation and are whatever size the user requests, although validation presently prevents the user requested a value greater that 1024. The default value is 16. To allow for synthesized callchains, make the max_stack value at least the same size as the synthesized callchain size. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-16-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add support for generating branch stack context for PT samples. The decoder reports a configurable number of branches as branch context for each sample. Internally it keeps track of them by using a simple sliding window. We also flush the last branch buffer on each sample to avoid overlapping intervals. This is useful for: - Reporting accurate basic block edge frequencies through the perf report branch view - Using with --branch-history to get the wider context of samples - Other users of LBRs Also the Documentation is updated. Examples: Record with Intel PT: perf record -e intel_pt//u ls Branch stacks are used by default if synthesized so: perf report --itrace=ile is the same as: perf report --itrace=ile -b Branch history can be requested also: perf report --itrace=igle --branch-history Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-15-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
intel_pt_synth_branch_sample() skips synthesizing if the branch does not match the branch filter. That logic was sitting in the middle of the function but is more efficiently placed at the start of the function, so move it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-14-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The branch stack feature flag is set by 'perf record' when recording data that contains branch stacks. Consequently, when 'perf inject' synthesizes branch stacks, the feature flag should be set also. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-13-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
A non-synthesized event might not have a branch stack if branch stacks have been synthesized (using itrace options). An example of that is when Intel PT records sched_switch events for decoding purposes. Those sched_switch events do not have branch stacks even though the Intel PT decoder may be synthesizing other events that do due to the itrace options. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-12-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The 'perf report' tool will default to displaying branch stacks (-b option) if they are present. Make that also happen for synthesized branch stacks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-11-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf report looks at event sample types to determine if branch stacks have been sampled. Adjust the validation to know about instruction tracing options. This change allows the use of the -b option which otherwise would complain with an error like: Error: Selected -b but no branch data. Did you call perf record without -b? # To display the perf.data header info, # please use --header/--header-only options. # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-10-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK. This is taken into use by Intel PT in a subsequent patch. Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-9-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add some comments to the script and some 'views' to the created database that better illustrate the database structure and how it can be used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-8-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
By default 'perf record' will postprocess the perf.data file to determine build-ids. When that happens, the number of lost perf events is displayed. Make that also happen for AUX events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-7-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-