- 22 Feb, 2024 3 commits
-
-
Ian Rogers authored
Aggregation index was being computed using the evsel's cpumap which may have a different (typically the same or fewer) entries. Before: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 12.8 0.0 12.9 12.7 0.0 12.6 CPU1 1.007806367 seconds time elapsed ``` After: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 15.4 0.0 15.3 15.0 0.0 14.9 CPU18 0.0 0.0 13.5 5.2 0.0 11.9 1.007858736 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> | Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-3-irogers@google.com
-
Ian Rogers authored
When merging counts from multiple uncore PMUs the metric is only computed for the metric leader. When merging/aggregation is disabled, prior to this patch just the leader's metric would be computed. Fix this by computing the metric for each PMU. On a SkylakeX: Before: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,440,851 ns duration_time 1.003440851 seconds time elapsed ``` After: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,353,416 ns duration_time ``` Signed-off-by: Ian Rogers <irogers@google.com> | Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com
-
Ian Rogers authored
Pass metric_expr and evsel rather than specific variables from the struct, thereby reducing the number of arguments. This will enable later fixes. To reduce the size of the diff, local variables are added to match the previous parameter names. This isn't done in the case of "name" as evsel->name is more intention revealing. A whitespace issue is also addressed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com
-
- 21 Feb, 2024 5 commits
-
-
Changbin Du authored
Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-6-changbin.du@huawei.com
-
Changbin Du authored
Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-5-changbin.du@huawei.com
-
Changbin Du authored
In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-4-changbin.du@huawei.com
-
Changbin Du authored
Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool. Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available. The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-3-changbin.du@huawei.com
-
Changbin Du authored
Later we will use libcapstone to disassemble instructions of samples. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-2-changbin.du@huawei.com
-
- 17 Feb, 2024 2 commits
-
-
Ian Rogers authored
If perf list is invoked with 'metricgroups' include the description unless it is invoked with flags to exclude it. Make the description of metricgroup dumping dependent on the desc flag in print_state as with metrics. Before: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend Bad BadSpec ... ``` After: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] BadSpec ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240216192044.119897-1-irogers@google.com
-
Namhyung Kim authored
I got a strange error on ARM to fail on processing FINISHED_ROUND record. It turned out that it was failing in symbol__alloc_hist() because the symbol size is too big. When a sample is captured on a specific BPF program, it failed. I've added a debug code and found the end address of the symbol is from the next module which is placed far way. ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf] ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf] ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic] ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic] ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic] The logic in symbols__fixup_end() just uses curr->start to update the prev->end. But in this case, it won't work as it's too different. I think ARM has a different kernel memory layout for modules and BPF than on x86. Actually there's a logic to handle kernel and module boundary. Let's do the same for symbols between different modules. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Leo Yan <leo.yan@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org
-
- 16 Feb, 2024 30 commits
-
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-31-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-30-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-29-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-28-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Add metrics tma_fp_vector_128b, tma_fp_vector_256b and tma_info_system_cpus_utilized. - Remove metrics tma_info_system_mem_parallel_requests, tma_info_system_core_frequency and tma_info_system_mem_request_latency. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-27-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-26-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-25-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-24-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-23-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-22-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-21-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-20-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-19-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-18-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-17-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-16-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-15-irogers@google.com
-
Ian Rogers authored
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-14-irogers@google.com
-
Ian Rogers authored
Update alderlake events to v1.15 released in: https://github.com/intel/perfmon/commit/282a6951fd9f025cff6c8c0ea16b1fcec786a4cd Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS, deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and UNC_ARB_IFA_OCCUPANCY.ALL. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-13-irogers@google.com
-
Ian Rogers authored
Update skylake events to v58 released in: https://github.com/intel/perfmon/commit/625fb7507373fef8297052c5f9af9ffe78d460c0 Improves documentation. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-12-irogers@google.com
-
Ian Rogers authored
Update sierraforest events to v1.01 released in: https://github.com/intel/perfmon/commit/582bca24aa0d742306cd4697c5bd1b1b529aa3ce Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-11-irogers@google.com
-
Ian Rogers authored
Update alderlake events to v1.02 released in: https://github.com/intel/perfmon/commit/4931178d1ede1099a3e4ac7e04ed9f073e03d219 Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-10-irogers@google.com
-
Ian Rogers authored
Update meteorlake events to v1.07 released in: https://github.com/intel/perfmon/commit/62517223080e46bfa9a905a1195c7febae7fdb3e Umask changed on atom mem_bound events. Adds atom events ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP, FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND, BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT, BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL, BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K, DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M, DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K, and alias events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-9-irogers@google.com
-
Ian Rogers authored
Update icelake events to v1.21 released in: https://github.com/intel/perfmon/commit/54f1246b0496112c1d2b2a49e4859c85caa3dbf4 Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-8-irogers@google.com
-
Ian Rogers authored
Update haswell events to v35 released in: https://github.com/intel/perfmon/commit/c0f9b34d421941bc3e13c6ca5554e6a54e8bd574 Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: linux-perf-users@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-7-irogers@google.com
-
Ian Rogers authored
Update grandridge events to v1.01 released in: https://github.com/intel/perfmon/commit/211d60716509d8248e57450e434de98cc6e511d8 Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-6-irogers@google.com
-
Ian Rogers authored
Update emeraldrapids events to v1.03 released in: https://github.com/intel/perfmon/commit/c7c6f72dae07fee35d5982232829c0cd37f9e28e Adds uncore CHA events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-5-irogers@google.com
-
Ian Rogers authored
Update broadwell events to v29 released in: https://github.com/intel/perfmon/commit/47117146c6b9e38811618beca31eba4e41c3d874 Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-4-irogers@google.com
-
Ian Rogers authored
Update alderlaken events to v1.24 released in: https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-3-irogers@google.com
-
Ian Rogers authored
Update alderlake events to v1.24 released in: https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c Adds aliased events, improves documentation and fix some event fields. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.pySigned-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-2-irogers@google.com
-