- 24 Jun, 2023 3 commits
-
-
Ian Rogers authored
Commit a8874665 ("perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE") made it so that vmlinux.h was uncondtionally included from tools/perf/util/vmlinux.h. This change reverts part of that change (so that vmlinux.h is once again generated) and makes it so that the vmlinux.h used at build time is selected from the VMLINUX_H variable. By default the VMLINUX_H variable is set to the vmlinux.h added in change a8874665, but if GEN_VMLINUX_H=1 is passed on the build command line then the previous generation behavior kicks in. The build with GEN_VMLINUX_H=1 currently fails with: util/bpf_skel/lock_contention.bpf.c:419:8: error: redefinition of 'rq' struct rq {}; ^ /tmp/perf/util/bpf_skel/.tmp/../vmlinux.h:45630:8: note: previous definition is here struct rq { ^ 1 error generated. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230623041405.4039475-2-irogers@google.com [ Format the error message and add a comment for GEN_VMLINUX_H ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
This test checks if the output of perf stat to match event names and metrics. So it wants the output lines to have both event name and metric. Otherwise it should skip the line. On AMD machines, the instruction event has two metrics and they are printed in separate lines. It makes the line without event name like below: # perf stat -a sleep 1 Performance counter stats for 'system wide': 64,383.34 msec cpu-clock # 64.048 CPUs utilized 14,526 context-switches # 225.617 /sec 112 cpu-migrations # 1.740 /sec 190 page-faults # 2.951 /sec 807,558,652 cycles # 0.013 GHz (83.30%) 69,809,799 stalled-cycles-frontend # 8.64% frontend cycles idle (83.30%) 196,983,266 stalled-cycles-backend # 24.39% backend cycles idle (83.30%) 424,876,008 instructions # 0.53 insn per cycle (here) ---> # 0.46 stalled cycles per insn (83.30%) 97,788,321 branches # 1.519 M/sec (83.34%) 4,147,377 branch-misses # 4.24% of all branches (83.46%) 1.005241409 seconds time elapsed Also modern Intel machines have TopDown metrics which also don't have event names. # perf stat -a sleep 1 Performance counter stats for 'system wide': 8,015.39 msec cpu-clock # 7.996 CPUs utilized 5,823 context-switches # 726.477 /sec 189 cpu-migrations # 23.580 /sec 139 page-faults # 17.342 /sec 435,139,308 cycles # 0.054 GHz 193,891,345 instructions # 0.45 insn per cycle 42,773,028 branches # 5.336 M/sec 2,298,113 branch-misses # 5.37% of all branches TopdownL1 # 25.5 % tma_backend_bound /--> # 7.9 % tma_bad_speculation (here) --+ # 55.7 % tma_frontend_bound \--> # 10.9 % tma_retiring 1.002395924 seconds time elapsed There is a check to skip TopdownL1 and TopdownL2 specifically but it does not cover every affected lines. So there is another check to skip the line if it has nothing on the left side of # sign. Well.. it seems ok but that's not enough too. When aggregation mode (like --per-socket or --per-thread) is used, it adds some prefix (e.g. CPU socket, task name and PID) in the output line. So the test code ignores them to normalize result. A problem can happen for per-thread mode when task name contains one or more spaces. It'd only ignore the first part of the task name, and it thinks there's something more in the line so it would not skip. # perf stat -a --perf-thread sleep 1 ... perf-21276 # 70.2 % tma_backend_bound perf-21276 # 3.9 % tma_bad_speculation perf-21276 # 10.5 % tma_frontend_bound perf-21276 # 15.3 % tma_retiring ^^^^^^^^^^ (ignored) my task-21328 # 70.2 % tma_backend_bound my task-21328 # 3.9 % tma_bad_speculation my task-21328 # 10.5 % tma_frontend_bound my task-21328 # 15.3 % tma_retiring ^^ (ignored) So I think it should look at the metric names instead. Add skip_metric to hold the list of names to skip. It would contain 'stalled cycles per insn' and metrics started by 'tma_'. Fixes: 99a04a48 ("perf test: Add test case for the standard 'perf stat' output") Acked-by: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230623230139.985594-2-namhyung@kernel.orgSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
On AMD machines, the perf stat STD output test failed like below: $ sudo ./perf test -v 98 98: perf stat STD output linter : --- start --- test child forked, pid 1841901 Checking STD output: no argswrong event metric. expected 'GHz' in 108,121 stalled-cycles-frontend # 10.88% frontend cycles idle test child finished with -1 ---- end ---- perf stat STD output linter: FAILED! This is because there are stalled-cycles-{frontend,backend} events are used by default. The current logic checks the event_name array to find which event it's running. But 'cycles' event comes before those stalled cycles event and it matches first. So it tries to find 'GHz' metric in the output (which is for the 'cycles') and fails. Move the stalled-cycles-{frontend,backend} events before 'cycles' so that it can find the stalled cycles events first. Also add a space after 'no args' test name for consistency. Fixes: 99a04a48 ("perf test: Add test case for the standard 'perf stat' output") Acked-by: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230623230139.985594-1-namhyung@kernel.orgSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
- 23 Jun, 2023 6 commits
-
-
Ian Rogers authored
The property of "cpu" when it has no cpu map is true on S390 with the PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the is_core test result from the caller. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Ian Rogers authored
JSON events created in pmu-events.c by jevents.py may not specify a PMU they are associated with, in which case it is implied that it is the first core PMU. Care is needed to select this for regular 'cpu', s390 'cpum_cf' and ARMs many names as at the point the name is first needed the core PMUs list hasn't been initialized. Add a helper in perf_pmus to create this value, in the worst case by scanning sysfs. v2. Add missing close if fdopendir fails. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Ian Rogers authored
The result of thread__find_map is the map in the passed in addr_location. Calling addr_location__exit puts that map and so copies need to do a map__get. Add in the corresponding map__puts. v2. Add missing map__put when dso is missing. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
The task-analyzer.py script (actually every other scripts too) requires PERF_EXEC_PATH env to find dependent libraries and scripts. For scripts test to run correctly, it needs to set PERF_EXEC_PATH to the perf tool source directory. Instead of blindly update the env, let's check the directory structure to make sure it points to the correct location. Fixes: e8478b84 ("perf test: add new task-analyzer tests") Cc: Petar Gligoric <petar.gligoric@rohde-schwarz.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
The buffer is used to save register mapping in a sample. Normally perf samples don't have any register so the string should be empty. But it missed to initialize the buffer when the size is 0. And it's passed to PyUnicode_FromString() with a garbage data. So it returns NULL due to invalid input (instead of an empty unicode string object) which causes a segfault like below: Thread 2.1 "perf" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff7c83780 (LWP 193775)] 0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0 (gdb) bt #0 0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0 #1 0x00007ffff6dbf848 in PyDict_SetItemString () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0 #2 0x000055555575824d in pydict_set_item_string_decref (val=0x0, key=0x5555557f96e3 "iregs", dict=0x7ffff5f7f780) at util/scripting-engines/trace-event-python.c:145 #3 set_regs_in_dict (evsel=0x555555efc370, sample=0x7fffffffb870, dict=0x7ffff5f7f780) at util/scripting-engines/trace-event-python.c:776 #4 get_perf_sample_dict (sample=sample@entry=0x7fffffffb870, evsel=evsel@entry=0x555555efc370, al=al@entry=0x7fffffffb2e0, addr_al=addr_al@entry=0x0, callchain=callchain@entry=0x7ffff63ef440) at util/scripting-engines/trace-event-python.c:923 #5 0x0000555555758ec1 in python_process_tracepoint (sample=0x7fffffffb870, evsel=0x555555efc370, al=0x7fffffffb2e0, addr_al=0x0) at util/scripting-engines/trace-event-python.c:1044 #6 0x00005555555c5db8 in process_sample_event (tool=<optimized out>, event=<optimized out>, sample=<optimized out>, evsel=0x555555efc370, machine=0x555555ef4d68) at builtin-script.c:2421 #7 0x00005555556b7793 in perf_session__deliver_event (session=0x555555ef4b60, event=0x7ffff62ff7d0, tool=0x7fffffffc150, file_offset=30672, file_path=0x555555efb8a0 "perf.data") at util/session.c:1639 #8 0x00005555556bc864 in do_flush (show_progress=true, oe=0x555555efb700) at util/ordered-events.c:245 #9 __ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL, timestamp=timestamp@entry=0) at util/ordered-events.c:324 #10 0x00005555556bd06e in ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL) at util/ordered-events.c:342 #11 0x00005555556b9d63 in __perf_session__process_events (session=0x555555ef4b60) at util/session.c:2465 #12 perf_session__process_events (session=0x555555ef4b60) at util/session.c:2627 #13 0x00005555555cb1d0 in __cmd_script (script=0x7fffffffc150) at builtin-script.c:2839 #14 cmd_script (argc=<optimized out>, argv=<optimized out>) at builtin-script.c:4365 #15 0x0000555555650811 in run_builtin (p=p@entry=0x555555ed8948 <commands+456>, argc=argc@entry=4, argv=argv@entry=0x7fffffffe240) at perf.c:323 #16 0x0000555555597eb3 in handle_internal_command (argv=0x7fffffffe240, argc=4) at perf.c:377 #17 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:421 #18 main (argc=4, argv=0x7fffffffe240) at perf.c:537 Fixes: 51cfe7a3 ("perf python: Avoid 2 leak sanitizer issues") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-
James Clark authored
$TEST_PROGRAM is a command with spaces so it's supposed to be word split. The referenced fix to fix the shellcheck warnings incorrectly quoted this string so unquote it to fix the test. At the same time silence the shellcheck warning for that line and fix two more shellcheck errors at the end of the script. Fixes: 1bb17b4c ("perf tests arm_callgraph_fp: Address shellcheck warnings about signal names and adding double quotes for expression") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: spoorts2@in.ibm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230622101809.2431897-1-james.clark@arm.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
- 22 Jun, 2023 5 commits
-
-
Tiezhu Yang authored
We can see the following definitions in bfd/elfnn-loongarch.c: #define PLT_HEADER_INSNS 8 #define PLT_HEADER_SIZE (PLT_HEADER_INSNS * 4) #define PLT_ENTRY_INSNS 4 #define PLT_ENTRY_SIZE (PLT_ENTRY_INSNS * 4) so plt header size is 32 and plt entry size is 16 on LoongArch, let us add LoongArch case in get_plt_sizes(). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Ingo Molnar <mingo@redhat.com> Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-loongarch.c Link: https://lore.kernel.org/r/1684835873-15956-1-git-send-email-yangtiezhu@loongson.cnSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
The commit fc51fc87 factored out the helper functions to a library but the new file had execute permission. Due to the way it detects the shell test scripts, it showed up in the perf test list unexpectedly. $ ./perf test list 2>&1 | grep 86 76: x86 bp modify 77: x86 Sample parsing 78: x86 hybrid 86: <---- (here) $ ./perf test -v 86 86: : --- start --- test child forked, pid 1932207 test child finished with 0 ---- end ---- : Ok As it's a collection of library functions, it should not run as is. Let's remove the execute permission. Fixes: fc51fc87 ("perf test: Move all the check functions of stat CSV output to lib") Acked-by: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230622055832.83476-1-namhyung@kernel.orgSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Weilin Wang authored
Rerun failed metrics with longer workload to avoid false failure because sometimes metric value test fails when running in very short amount of time. Skip rerun if equal to or more than 20 metrics fail. Signed-off-by: Weilin Wang <weilin.wang@intel.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: ravi.bangoria@amd.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230620170027.1861012-4-weilin.wang@intel.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Weilin Wang authored
Add skip list for metrics known would fail because some of the metrics are very likely to fail due to multiplexing or other errors. So add all of the flaky tests into the skip list. Signed-off-by: Weilin Wang <weilin.wang@intel.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: ravi.bangoria@amd.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230620170027.1861012-3-weilin.wang@intel.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Weilin Wang authored
Add metric value validation test to check if metric values are with in correct value ranges. There are three types of tests included: 1) positive-value test checks if all the metrics collected are non-negative; 2) single-value test checks if the list of metrics have values in given value ranges; 3) relationship test checks if multiple metrics follow a given relationship, e.g. memory_bandwidth_read + memory_bandwidth_write = memory_bandwidth_total. Signed-off-by: Weilin Wang <weilin.wang@intel.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: ravi.bangoria@amd.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230620170027.1861012-2-weilin.wang@intel.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
- 21 Jun, 2023 4 commits
-
-
elisabeth authored
Fixes an issue where an incorrect filename was added in the DWARF line table of an ELF object file when calling 'perf inject --jit' due to not checking the filename of a debug entry against the repeated name marker (/xff/0). The marker is mentioned in the tools/perf/util/jitdump.h header, which describes the jitdump binary format, and indicitates that the filename in a debug entry is the same as the previous enrty. In the function emit_lineno_info(), in the file tools/perf/util/genelf-debug.c, the debug entry filename gets compared to the previous entry filename. If they are not the same, a new filename is added to the DWARF line table. However, since there is no check against '\xff\0', in some cases '\xff\0' is inserted as the filename into the DWARF line table. This can be seen with `objdump --dwarf=line` on the ELF file after `perf inject --jit`. It also makes no source code information show up in 'perf annotate'. Signed-off-by: Elisabeth Panholzer <elisabeth@leaningtech.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230602123815.255001-1-paniii94@gmail.com [ Fixed a trailing white space, removed a subject prefix ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-
WANG Rui authored
In the perf annotate view for LoongArch, there is no arrowed line pointing to the target from the branch instruction. This issue is caused by incorrect instruction association and parsing. $ perf record alloc-6276705c94ad1398 # rust benchmark $ perf report 0.28 │ ori $a1, $zero, 0x63 │ move $a2, $zero 10.55 │ addi.d $a3, $a2, 1(0x1) │ sltu $a4, $a3, $s7 9.53 │ masknez $a4, $s7, $a4 │ sub.d $a3, $a3, $a4 12.12 │ st.d $a1, $fp, 24(0x18) │ st.d $a3, $fp, 16(0x10) 16.29 │ slli.d $a2, $a2, 0x2 │ ldx.w $a2, $s8, $a2 12.77 │ st.w $a2, $sp, 724(0x2d4) │ st.w $s0, $sp, 720(0x2d0) 7.03 │ addi.d $a2, $sp, 720(0x2d0) │ addi.d $a1, $a1, -1(0xfff) 12.03 │ move $a2, $a3 │ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4> 2.50 │ addi.d $a0, $a0, 1(0x1) This patch fixes instruction association issues, such as associating branch instructions with jump_ops instead of call_ops, and corrects false instruction matches. It also implements branch instruction parsing specifically for LoongArch. With this patch, we will be able to see the arrowed line. 0.79 │3ec: ori $a1, $zero, 0x63 │ move $a2, $zero 10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1) │ │ sltu $a4, $a3, $s7 10.44 │ │ masknez $a4, $s7, $a4 │ │ sub.d $a3, $a3, $a4 14.17 │ │ st.d $a1, $fp, 24(0x18) │ │ st.d $a3, $fp, 16(0x10) 13.15 │ │ slli.d $a2, $a2, 0x2 │ │ ldx.w $a2, $s8, $a2 11.00 │ │ st.w $a2, $sp, 724(0x2d4) │ │ st.w $s0, $sp, 720(0x2d0) 8.00 │ │ addi.d $a2, $sp, 720(0x2d0) │ │ addi.d $a1, $a1, -1(0xfff) 11.99 │ │ move $a2, $a3 │ └──bne $a1, $s3, 3f4 3.17 │ addi.d $a0, $a0, 1(0x1) Signed-off-by: WANG Rui <wangrui@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cnSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Ian Rogers authored
Remove the "struct mutex lock" variable from annotation that is allocated per symbol. This removes in the region of 40 bytes per symbol allocation. Use a sharded mutex where the number of shards is set to the number of CPUs. Assuming good hashing of the annotation (done based on the pointer), this means in order to contend there needs to be more threads than CPUs, which is not currently true in any perf command. Were contention an issue it is straightforward to increase the number of shards in the mutex. On my Debian/glibc based machine, this reduces the size of struct annotation from 136 bytes to 96 bytes, or nearly 30%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Ian Rogers authored
Per object mutexes may come with significant memory cost while a global mutex can suffer from unnecessary contention. A sharded mutex is a compromise where objects are hashed and then a particular mutex for the hash of the object used. Contention can be controlled by the number of shards. v2. Use hashmap.h's hash_bits in case of contention from alignment of objects. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
- 20 Jun, 2023 5 commits
-
-
Li Dong authored
What we need to calculate is the size of the object, not the size of the pointer. Fixed: 51cfe7a3 ("perf python: Avoid 2 leak sanitizer issues") Signed-off-by: Li Dong <lidong@vivo.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: opensource.kernel@vivo.com Link: https://lore.kernel.org/r/20230619082036.410-1-lidong@vivo.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Chenyuan Mi authored
The malloc() function may return NULL when it fails, which may cause null pointer deference in add_cmdname(), add Null check for return value of malloc(). Found by our static analysis tool. Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20230614150118.115208-1-cymi20@fudan.edu.cnSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
baomingtong001@208suo.com authored
./tools/perf/util/parse-events.c:1466:2-3: Unneeded semicolon Signed-off-by: Mingtong Bao <baomingtong001@208suo.com> Link: https://lore.kernel.org/r/2c733a91717eae93119ba2226420fd8f@208suo.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Yang Jihong authored
The newline is missing for pr_debug message in evsel__compute_group_pmu_name(), fix it. Before: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u'No PMU found for 'instructions:u'------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> After: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u' No PMU found for 'instructions:u' ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: kan.liang@linux.intel.com Cc: mingo@redhat.com Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230616024515.80814-1-yangjihong1@huawei.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
Yang Jihong authored
The newline is missing for error messages in add_default_attributes() Before: # perf stat --topdown Topdown requested but the topdown metric groups aren't present. (See perf list the metric groups have names like TopdownL1)# After: # perf stat --topdown Topdown requested but the topdown metric groups aren't present. (See perf list the metric groups have names like TopdownL1) # In addition, perf_stat_init_aggr_mode() and perf_stat_init_aggr_mode_file() have the same problem, fixed by the way. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20230614021505.59856-1-yangjihong1@huawei.comSigned-off-by: Namhyung Kim <namhyung@kernel.org>
-
- 16 Jun, 2023 17 commits
-
-
Arnaldo Carvalho de Melo authored
In some architectures we can't encode the PMU number in perf_event_attr.type and thus can't just ask for the same event in multiple CPUs (and thus PMUs), that is what we want in hybrid systems but we can't when that encoding isn't understood by the kernel, such as in ARM64's big.LITTLE. If that is the case, fallback to the previous behaviour till we find a better solution to have consistent output accross architectures with hybrid CPU configurations. Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Will be used when checking if we can encode the PMU number in perf_event_attr.type, part of the logic to use in hybrid systems (multiple types of CPUs, such as Intel's (Alder Lake, etc) or ARM's big.LITTLE). Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Tiezhu Yang authored
There exists the following warning when executing 'perf test record+probe_libc_inet_pton.sh': fgrep: warning: fgrep is obsolescent; using grep -F This is tested on Fedora 38, the version of grep is 3.8, the latest version of grep claims the fgrep is obsolete, use "grep -F" instead of "fgrep" to silence the warning. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: loongson-kernel@lists.loongnix.cn Link: https://lore.kernel.org/r/1686880567-30017-1-git-send-email-yangtiezhu@loongson.cnSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ravi Bangoria authored
Scanning only core PMUs is not sufficient on platforms like AMD since perf mem on AMD uses IBS OP PMU, which is independent of core PMU. Scan all PMUs instead of just core PMUs. There should be negligible performance overhead because of scanning all PMUs, so we should be okay. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ravi Bangoria authored
perf mem/c2c on AMD internally uses IBS OP PMU, not the core PMU. Also, AMD platforms does not have heterogeneous PMUs. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com [ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't from the per-patch (not series) newer version ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ravi Bangoria authored
Notion of 'core_pmus' and 'other_pmus' are independent of hw core and uncore pmus. For example, AMD IBS PMUs are present in each SMT-thread but they belongs to 'other_pmus'. Add a comment describing what these list contains and how they are treated. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
When -r option is used, perf stat runs the command multiple times and update stats in the evsel->stats.res_stats for global aggregation. But the value is never used and the value it prints at the end is just the value from the last run. I think we should print the average number of multiple runs. Add evlist__copy_res_stats() to update the aggr counter (for display) using the values in the evsel->stats.res_stats. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
When it runs multiple times with -r option, it missed to reset the aggregation counters and the values were added up. The aggregation count has the values to be printed in the end. It should reset the counters at the beginning of each run. But the current code does that only when -I/--interval-print option is given. Fixes: 91f85f98 ("perf stat: Display event stats using aggr counts") Reported-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Thomas Richter authored
In linux-next tree the many test cases fail on s390x when running the perf test suite, sometime the perf tool dumps core. Output before: 6.1: Test event parsing : FAILED! 10.3: Parsing of PMU event table metrics : FAILED! 10.4: Parsing of PMU event table metrics with fake PMUs: FAILED! 17: Setup struct perf_event_attr : FAILED! 24: Number of exit events of a simple workload : FAILED! 26: Object code reading : FAILED! 28: Use a dummy software event to keep tracking : FAILED! 35: Track with sched_switch : FAILED! 42.3: BPF prologue generation : FAILED! 66: Parse and process metrics : FAILED! 68: Event expansion for cgroups : FAILED! 69.2: Perf time to TSC : FAILED! 74: build id cache operations : FAILED! 86: Zstd perf.data compression/decompression : FAILED! 87: perf record tests : FAILED! 106: Test java symbol : FAILED! The reason for all these failure is a missing PMU. On s390x the PMU is named cpum_cf which is not detected as core PMU. A similar patch was added before, see commit 9bacbced ("perf list: Add s390 support for detailed PMU event description") which got lost during the recent reworks. Add it again. Output after: 10.2: PMU event map aliases : FAILED! 42.3: BPF prologue generation : FAILED! Most test cases now work and there is not core dump anymore. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20230616081437.1932003-1-tmricht@linux.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Vincent Whitchurch authored
It is currently possible to use --symfs along with a vmlinux which lies outside of the symfs by passing an absolute path to --vmlinux, thanks to the check in dso__load_vmlinux() which handles this explicitly. However, the annotate code lacks this check and thus 'perf annotate' does not work ("Internal error: Invalid -1 error code") for kernel functions with this combination. Add the missing handling. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel@axis.com Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Add the default tags for Hisi hip08 as well. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-6-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Add a new test case to verify the standard 'perf stat' output with different options. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-5-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
These functions can be shared with the stat std output test. There is no functional change. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-4-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
In the default mode, the current output of the metricgroup include both events and metrics, which is not necessary and just makes the output hard to read. Since different ARCHs (even different generations in the same ARCH) may use different events. The output also vary on different platforms. For a metricgroup, only outputting the value of each metric is good enough. Add a new field default_metricgroup in evsel to indicate an event of the default metricgroup. For those events, printout() should print the metricgroup name rather than each event. Add perf_stat__skip_metric_event() to skip the evsel in the Default metricgroup, if it's not running or not the metric event. Add print_metricgroup_header_t to pass the functions which print the display name of each metricgroup in the Default metricgroup. Support all three output methods. Factor out perf_stat__print_shadow_stats_metricgroup() to print out each metrics. On SPR: Before: ./perf_old stat sleep 1 Performance counter stats for 'sleep 1': 0.54 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 125.445 K/sec 540,970 cycles:u # 0.998 GHz 556,325 instructions:u # 1.03 insn per cycle 123,602 branches:u # 228.018 M/sec 6,889 branch-misses:u # 5.57% of all branches 3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound # 17.2 % tma_retiring # 23.1 % tma_bad_speculation # 41.4 % tma_frontend_bound 564,859 topdown-retiring:u 1,370,999 topdown-fe-bound:u 603,271 topdown-be-bound:u 744,874 topdown-bad-spec:u 12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec 1.001798215 seconds time elapsed 0.000193000 seconds user 0.001700000 seconds sys After: $ ./perf stat sleep 1 Performance counter stats for 'sleep 1': 0.51 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 132.683 K/sec 545,228 cycles:u # 1.064 GHz 555,509 instructions:u # 1.02 insn per cycle 123,574 branches:u # 241.120 M/sec 6,957 branch-misses:u # 5.63% of all branches TopdownL1 # 17.5 % tma_backend_bound # 22.6 % tma_bad_speculation # 42.7 % tma_frontend_bound # 17.1 % tma_retiring TopdownL2 # 21.8 % tma_branch_mispredicts # 11.5 % tma_core_bound # 13.4 % tma_fetch_bandwidth # 29.3 % tma_fetch_latency # 2.7 % tma_heavy_operations # 14.5 % tma_light_operations # 0.8 % tma_machine_clears # 6.1 % tma_memory_bound 1.001712086 seconds time elapsed 0.000151000 seconds user 0.001618000 seconds sys Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The new default mode will print the metrics as a metric group. The metrics from the same metric group must be adjacent to each other in the metric list. But the metric_list_cmp() sorts metrics by the number of events. Add a new sort for the Default metricgroup, which sorts by default_metricgroup_name and metric_name. Add is_default in the struct metric_event to indicate that it's from the Default metricgroup. Store the displayed metricgroup name of the Default metricgroup into the metric expr for output. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
There may be multiplexing triggered, e.g., e-core of ADL. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230615135315.3662428-7-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode. Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. It will be printed in the default output to replace the event names. There is nothing changed for the output format. On SPR, both TopdownL1 and TopdownL2 are displayed in the default output. On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is displayed in the default output. Suggested-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-