- 10 Jun, 2019 29 commits
-
-
Arnaldo Carvalho de Melo authored
Suzuki noticed that this should be more useful in a generic header, and after looking I noticed we have it already in our copy of include/linux/bits.h in tools/include, so just use it, test built on x86-64 and ubuntu 19.04 with: perfbuilder@46646c9e848e:/$ aarch64-linux-gnu-gcc --version |& head -1 aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0 perfbuilder@46646c9e848e:/$ Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lkml.kernel.org/r/68c1c548-33cd-31e8-100d-7ffad008c7b2@arm.com Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org, Link: https://lkml.kernel.org/n/tip-69pd3mqvxdlh2shddsc7yhyv@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
This patch adds the necessary intelligence to properly compute the value of 'old' and 'head' when operating in snapshot mode. That way we can get the latest information in the AUX buffer and be compatible with the generic AUX ring buffer mechanic. Tester notes: > Leo, have you had the chance to test/review this one? Suzuki? Sure. I applied this patch on the perf/core branch (with latest commit 3e4fbf36c1e3 'perf augmented_raw_syscalls: Move reading filename to the loop') and passed testing with below steps: # perf record -e cs_etm/@tmc_etr0/ -S -m,64 --per-thread ./sort & [1] 19097 Bubble sorting array of 30000 elements # kill -USR2 19097 # kill -USR2 19097 # kill -USR2 19097 [ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 0.753 MB perf.data ] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190605161633.12245-1-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
The 'die' info isn't in the same array as core and socket ids, and we missed the 'dies' string list, that comes right after the 'core' + 'socket' id variable length array, followed by the VLA for the dies. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: c9cb12c5ba08 ("perf header: Add die information in CPU topology") Link: https://lkml.kernel.org/n/tip-nubi6mxp2n8ofvlx7ph6k3h6@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The existing "thread_siblings" and "thread_siblings_list" attribute will be deprecated. Use the new CPU topology sysfs attributes, "core_cpus" and "core_cpus_list", which are synonymous with the deprecated attributes. Check the new name first. If not available, use the deprecated name to be compatible with old kernel. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1559688644-106558-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The "sibling cores" actually shows the sibling CPUs of a socket. The name "sibling cores" is very misleading. Rename "sibling cores" to "sibling sockets" Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1559688644-106558-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
It is useful to aggregate counts per die. E.g. Uncore becomes die-scope on Xeon Cascade Lake-AP. Introduce a new option "--per-die" to support per-die aggregation. The global id for each core has been changed to socket + die id + core id. The global id for each die is socket + die id. Add die information for per-core aggregation. The output of per-core aggregation will be changed from "S0-C0" to "S0-D0-C0". Any scripts which rely on the output format of per-core aggregation probably be broken. For 'perf stat record/report', there is no die information when processing the old perf.data. The per-die result will be the same as per-socket. Committer notes: Renamed 'die' variable to 'die_id' to fix the build in some systems: CC /tmp/build/perf/builtin-script.o cc1: warnings being treated as errors builtin-stat.c: In function 'perf_env__get_die': builtin-stat.c:963: error: declaration of 'die' shadows a global declaration util/util.h:19: error: shadowed declaration is here mv: cannot stat `/tmp/build/perf/.builtin-stat.o.tmp': No such file or directory Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/n/tip-bsnhx7vgsuu6ei307mw60mbj@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
With the new CPUID.1F, a new level type of CPU topology, 'die', is introduced. The 'die' information in CPU topology should be added in perf header. To be compatible with old perf.data, the patch checks the section size before reading the die information. The new info is added at the end of the cpu_topology section, the old perf tool ignores the extra data. It never reads data crossing the section boundary. The new perf tool with the patch can be used on legacy kernel. Add a new function has_die_topology() to check if die topology information is supported by kernel. The function only check X86 and CPU 0. Assuming other CPUs have same topology. Use similar method for core and socket to support die id and sibling dies string. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1559688644-106558-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
There is no function to retrieve die id information of a given CPU. Add cpu_map__get_die_id() to retrieve die id information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1559688644-106558-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Add support for CPU-wide trace scenarios by correlating range packets with timestamp packets. That way range packets received on different ETMQ/traceID channels can be processed and synthesized in chronological order. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-18-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
This patch deals with timestamp packets received from the decoding library in order to give the front end packet processing loop a handle on the time instruction conveyed by range packets have been executed at. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-17-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Link contextID packets received from the decoder with the perf tool thread mechanic so that we know the specifics of the process currently executing. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-16-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
When operating in CPU-wide trace mode with a source/sink topology of N:1 packets with multiple traceID will end up in the same cs_etm_queue. In order to properly decode packets they need to be split in different queues, i.e one queue per traceID. As such add support for multiple traceID per cs_etm_queue by adding a new cs_etm_traceid_queue every time a new traceID is discovered in the trace stream. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-15-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
When working with CPU-wide traces different traceID may be found in the same stream. As such we need to use the decoder callback that provides the traceID in order to know the thread context being decoded. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-14-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
The tid/pid fields of structure cs_etm_queue are CPU dependent and as such need to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-13-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
The thread field of structure cs_etm_queue is CPU dependent and as such need to be part of the cs_etm_traceid_queue in order to support CPU-wide trace scenarios. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-12-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Nowadays the synthesize code is using the packet's cpu information, making cs_etm_queue::cpu useless. As such simply remove it. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-11-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
In an ideal world there is one CPU per cs_etm_queue and as such, one trace ID per cs_etm_queue. In the real world CoreSight topologies allow multiple CPUs to use the same sink, which translates to multiple trace IDs per cs_etm_queue. To deal with this a new cs_etm_traceid_queue structure is introduced to enclose all the information related to a single trace ID, allowing a cs_etm_queue to handle traces generated by any number of CPUs. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-10-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Fixing wrong indentation of the while() loop - no change of functionality. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Fixes: 3fa0e83e ("perf cs-etm: Modularize main packet processing loop") Link: http://lkml.kernel.org/r/20190524173508.29044-9-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
The decoder needs to work with more than one traceID queue if we want to support CPU-wide scenarios with N:1 source/sink topologies. As such move the packet buffer and related fields out of the decoder structure and into the cs_etm_queue structure. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-8-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
There is no point in having two different error goto statement since the openCSD API to free a decoder handles NULL pointers. As such function cs_etm_decoder__free() can be called to deal with all aspect of freeing decoder memory. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-7-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Add handling of SWITCH-CPU-WIDE events in order to add the tid/pid of the incoming process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-6-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Add handling of ITRACE events in order to add the tid/pid of the executing process to the perf tools machine infrastructure. This information is later retrieved when a contextID packet is found in the trace stream. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-5-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
Ask the perf core to generate an event when processes are swapped in/out of context. That way proper action can be taken by the decoding code when faced with such event. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-4-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
When operating in CPU-wide mode tracers need to generate timestamps in order to correlate the code being traced on one CPU with what is executed on other CPUs. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-3-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mathieu Poirier authored
When operating in CPU-wide mode being notified of contextID changes is required so that the decoding mechanic is aware of the process context switch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190524173508.29044-2-mathieu.poirier@linaro.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
It's already setup in the only caller of this method in perf_evsel__open(), right before calling perf_evsel__alloc_fd(), no need to do it again. Also it's better to have it out of the function before we move it to libperf. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-1k8lhyjxfk7o8v4g3r7eyjc9@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
yuzhoujian authored
One can just record callchains in the kernel or user space with this new options. We can use it together with "--all-kernel" options. This two options is used just like print_stack(sys) or print_ustack(usr) for systemtap. Shown below is the usage of this new option combined with "--all-kernel" options: 1. Configure all used events to run in kernel space and just collect kernel callchains. $ perf record -a -g --all-kernel --kernel-callchains 2. Configure all used events to run in kernel space and just collect user callchains. $ perf record -a -g --all-kernel --user-callchains Committer notes: Improved documentation to state that asking for kernel callchains really is asking for excluding user callchains, and vice versa. Further mentioned that using both won't get both, but nothing, as both will be excluded. Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1559222962-22891-1-git-send-email-ufo19890607@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
So perf_config() uses: int ret = 0; perf_config_set__for_each_entry(config_set, section, item) { ... ret = fn(); if (ret < 0) break; } return ret; Expecting that that break will imediatelly go to function exit to return that error value (ret). The problem is that perf_config_set__for_each_entry() expands into two nested for() loops, one traversing the sections in a config and the second the items in each of those sections, so we have to change that 'break' to a goto label right before that final 'return ret'. With that, for instance 'perf trace' now correctly bails out when a event that is requested to be added via its 'trace.add_events' ~/.perfconfig entry gets rejected by the kernel BPF verifier: # perf trace ls event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o' \___ Kernel verifier blocks program loading (add -v to see detail) Run 'perf list' for a list of valid events Error: wrong config key-value pair trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o # While before it would continue and explode later, when trying to find maps that would have been in place had that augmented_raw_syscalls.o precompiled BPF proggie been accepted by the, humm, bast... rigorous kernel BPF verifier 8-) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Yonghong Song <yhs@fb.com> Fixes: 8a0a9c7e ("perf config: Introduce new init() and exit()") Link: https://lkml.kernel.org/n/tip-qvqxfk9d0rn1l7lcntwiezrr@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Leo Yan authored
On my Juno board with ARM64 CPUs, perf trace command reports the eBPF program building failure but the command will not exit and continue to run. If we define an eBPF event in config file, the event will be parsed with below flow: perf_config() `> trace__config() `> parse_events_option() `> parse_events__scanner() `-> parse_events_parse() `> parse_events_load_bpf() `> llvm__compile_bpf() Though the low level functions return back error values when detect eBPF building failure, but parse_events_option() returns 1 for this case and trace__config() passes 1 to perf_config(); perf_config() doesn't treat the returned value 1 as failure and it continues to parse other configurations. Thus the perf command continues to run even without enabling eBPF event successfully. This patch changes error handling in trace__config(), when it detects failure it will return -1 rather than directly pass error value (1); finally, perf_config() will directly bail out and perf will exit for this case. Committer notes: Simplified the patch to just check directly the return of parse_events_option() and it it is non-zero, change err from its initial zero value to -1. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Yonghong Song <yhs@fb.com> Fixes: ac96287c ("perf trace: Allow specifying a set of events to add in perfconfig") Link: https://lkml.kernel.org/n/tip-x4i63f5kscykfok0hqim3zma@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 05 Jun, 2019 11 commits
-
-
Arnaldo Carvalho de Melo authored
For instance, the rename* family uses "oldname", "newname", so check if "name" is at the end and treat it as a filename. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-wjy7j4bk06g7atzwoz1mid24@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To support the SCA_FILENAME beautifier in more than one syscall arg, as needed for syscalls such as the rename* family, we need to, after processing one such arg, bump the augmented pointers so that the next augmented arg don't reuse data for the previous augmented arguments. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-4e4cmzyjxb3wkonfo1x9a27y@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We are getting false positive gcc warning when we compile with gcc9 (9.1.1): CC jvmti/libjvmti.o In file included from /usr/include/string.h:494, from jvmti/libjvmti.c:5: In function ‘strncpy’, inlined from ‘copy_class_filename.constprop’ at jvmti/libjvmti.c:166:3: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ jvmti/libjvmti.c: In function ‘copy_class_filename.constprop’: jvmti/libjvmti.c:165:26: note: length computed here 165 | size_t file_name_len = strlen(file_name); | ^~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors As per Arnaldo's suggestion use strlcpy(), which does the same thing and keeps gcc silent. Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190531131321.GB1281@kravaSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Almost there, next step is to copy more than one filename payload. Probably to read syscall arg structs, etc we'll need just a variation of this that will decide what to use, if probe_read_str() or plain probe_read for structs, i.e. fixed size. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-uf6u0pld6xe4xuo16f04owlz@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
So that we can use it for multiple args, baby steps not to step into the verifier toes. In the process make sure we handle -EFAULT from bpf_prog_read_str(), as this really is needed now that we'll handle more than one augmented argument, i.e. if there is failure, then we have the argument that fails have: (size = 0, err = -EFAULT, value = [] ) followed by the next, lets say that worked for a second pathname: (size = 4, err = 0, value = "/tmp" ) So we can skip the first while telling the user about the problem and then process the second. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-deyvqi39um6gp6hux6jovos8@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
One more step into copying multiple filenames to support syscalls like rename*. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xdqtjexdyp81oomm1rkzeifl@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Since we know what args are strings from reading the syscall descriptions in tracefs and also already mark such args to be beautified using the syscall_arg__scnprintf_filename() helper, all we need is to fill in this info in the 'syscalls' BPF map we were using to state which syscalls the user is interested in, i.e. the syscall filter. Right now just set that with PATH_MAX and unroll the syscall arg in the BPF program, as the verifier isn't liking something clang generates when unrolling the loop. This also makes the augmented_raw_syscalls.c program support all arches, since we removed that set of defines with the hard coded syscall numbers, all should be automatically set for all arches, with the syscall id mapping done correcly. Doing baby steps here, i.e. just the first string arg for a syscall is printed, syscalls with more than one, say, the various rename* syscalls, need further work, but lets get first something that the BPF verifier accepts before increasing the complexity To test it, something like: # perf trace -e string -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c With: # cat ~/.perfconfig [llvm] dump-obj = true clang-opt = -g [trace] #add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c show_zeros = yes show_duration = no no_inherit = yes show_timestamp = no show_arg_names = no args_alignment = 40 show_prefix = yes # That commented add_events line is needed for developing this augmented_raw_syscalls.c BPF program, as if we add it via the 'add_events' mechanism so as to shorten the 'perf trace' command lines, then we end up not setting up the -v option which precludes us having access to the bpf verifier log :-\ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lkml.kernel.org/n/tip-dn863ya0cbsqycxuy0olvbt1@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The user probably wants to replace the find text, so select the find text when the find bar is activated. That is fairly standard behaviour for search text entry. Entering text will replace the current text, but using edit keys (arrows, home, end etc) cancels the selection and enables editing. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-23-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Enhance the call tree to display IPC information if it is available. Committer testing: [acme@quaco adrian.hunter]$ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py ~/c/adrian.hunter/simple-retpoline.db Reports -> Call Tree, then expand a few trees, then select with the mouse and press control+C (copy): Call Path Object Call Time Time Time(%) Insn Insn Cyc Cyc IPC Branch Branch ▼ simple-retpolin (ns) Cnt Cnt(%) Cnt Cnt(%) Count Count(%) ▼ 23003:23003 ▼ _start ld-2.28.so 112195670 218295 100.0 127746 100.0 207320 100.0 0.62 13046 100.0
▶ unknown unknown 112195987 3202 1.5 0 0.0 0 0.0 0 1 0.0▶ _dl_start ld-2.28.so 112199189 188471 86.3 123394 96.6 180007 86.8 0.69 12529 96.0 ▼ _dl_init ld-2.28.so 112387660 13406 6.1 3207 2.5 14868 7.2 0.22 327 2.5▶ call_init.part.0 ld-2.28.so 112387773 117 0.9 70 2.2 639 4.3 0.11 3 0.9▶ call_init.part.0 ld-2.28.so 112387890 13129 97.9 3103 96.8 14100 94.8 0.22 315 96.3▶ call_init.part.0 ld-2.28.so 112401020 0 0.0 0 0.0 0 0.0 0 2 0.6 ▼ _start simple-retpol 112401066 12899 5.9 1142 0.9 11561 5.6 0.10 184 1.4▶ unknown unknown 112401388 846 6.6 0 0.0 0 0.0 0 1 0.5 ▼ __libc_start_main libc-2.28.so 112402344 11621 90.1 1129 98.9 10350 89.5 0.11 181 98.4▶ __cxa_atexit libc-2.28.so 112402360 2302 19.8 101 8.9 1817 17.6 0.06 13 7.2▶ __libc_csu_init simple-retpol 112404673 121 1.0 43 3.8 340 3.3 0.13 8 4.4▶ _setjmp libc-2.28.so 112404794 74 0.6 46 4.1 206 2.0 0.22 4 2.2 ▼ main simple-retpol 112404892 44 0.4 23 2.0 126 1.2 0.18 12 6.6 ▼ foo simple-retpol 112404892 19 43.2 12 52.2 55 43.7 0.22 5 41.7 bar simple-retpol 112404896 12 63.2 3 25.0 34 61.8 0.09 1 20.0 ▼ foo simple-retpol 112404911 25 56.8 11 47.8 71 56.3 0.15 5 41.7▶ bar simple-retpol 112404924 10 40.0 3 27.3 27 38.0 0.11 1 20.0▶ exit libc-2.28.so 112404936 9029 77.7 878 77.8 7765 75.0 0.11 139 76.8 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-22-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -
Adrian Hunter authored
Enhance the call graph to display IPC information if it is available. Committer testing: [acme@quaco adrian.hunter]$ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py ~/c/adrian.hunter/simple-retpoline.db Reports -> Context Sensitive Callgraph, then expand a few trees, then select with the mouse and press control+C: Call Path Object Count Time(ns) Time(%) Insn Insn Cyc Cyc IPC Branch Branch ▼ simple-retpolin Cnt Cnt(%) Cnt Cnt(%) Cnt Cnt(%) ▼ 23003:23003 ▼ _start ld-2.28.so 1 218295 100.0 127746 100.0 207320 100.0 0.62 13046 100.0
▶ unknown unknown 1 3202 1.5 0 0.0 0 0.0 0 1 0.0▶ _dl_start ld-2.28.so 1 188471 86.3 123394 96.6 180007 86.8 0.69 12529 96.0▶ _dl_init ld-2.28.so 1 13406 6.1 3207 2.5 14868 7.2 0.22 327 2.5 ▼ _start simple-retpoline 1 12899 5.9 1142 0.9 11561 5.6 0.10 184 1.4▶ unknown unknown 1 846 6.6 0 0.0 0 0.0 0 1 0.5 ▼ __libc_start_main libc-2.28.so 1 11621 90.1 1129 98.9 10350 89.5 0.11 181 98.4▶ __cxa_atexit libc-2.28.so 1 2302 19.8 101 8.9 1817 17.6 0.06 13 7.2▶ __libc_csu_init simple-retpoline 1 121 1.0 43 3.8 340 3.3 0.13 8 4.4 ▼ _setjmp libc-2.28.so 1 74 0.6 46 4.1 206 2.0 0.22 4 2.2 ▼ __sigsetjmp libc-2.28.so 1 74 100.0 46 100.0 206 100.0 0.22 3 75.0▶ __sigjmp_save libc-2.28.so 1 0 0.0 0 0.0 0 0.0 0 1 33.3 ▼ main simple-retpoline 1 44 0.4 23 2.0 126 1.2 0.18 12 6.6 ▼ foo simple-retpoline 2 44 100.0 23 100.0 126 100.0 0.18 10 83.3 bar simple-retpoline 2 22 50.0 6 26.1 61 48.4 0.10 2 20.0▶ exit libc-2.28.so 1 9029 77.7 878 77.8 7765 75.0 0.11 139 76.8 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-21-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -
Adrian Hunter authored
Add a parameter to call graph and call tree, to determine whether IPC information is available. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-20-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-