Commit e172f1e9 authored by Linus Torvalds

Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Enable turbostat extensions to add both perf and PMT (Intel
   Platform Monitoring Technology) counters via the cmdline

 - Demonstrate PMT access with built-in support for Meteor Lake's
   Die C6 counter

* tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2024.07.26
  tools/power turbostat: Include umask=%x in perf counter's config
  tools/power turbostat: Document PMT in turbostat.8
  tools/power turbostat: Add MTL's PMT DC6 builtin counter
  tools/power turbostat: Add early support for PMT counters
  tools/power turbostat: Add selftests for added perf counters
  tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
  tools/power turbostat: Move verbose counter messages to level 2
  tools/power turbostat: Move debug prints from stdout to stderr
  tools/power turbostat: Fix typo in turbostat.8
  tools/power turbostat: Add perf added counter example to turbostat.8
  tools/power turbostat: Fix formatting in turbostat.8
  tools/power turbostat: Extend --add option with perf counters
  tools/power turbostat: Group SMI counter with APERF and MPERF
  tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
  tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
  tools/power turbostat: Remove anonymous union from rapl_counter_info_t
  tools/power/turbostat: Switch to new Intel CPU model defines
parents e62f81bb 866d2d36
......@@ -46,6 +46,7 @@ snapshot: turbostat
@echo "#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (sizeof(long long) * 8 - 1 - (h))))" >> $(SNAPSHOT)/bits.h
@echo '#define BUILD_BUG_ON(cond) do { enum { compile_time_check ## __COUNTER__ = 1/(!(cond)) }; } while (0)' > $(SNAPSHOT)/build_bug.h
@echo '#define __must_be_array(arr) 0' >> $(SNAPSHOT)/build_bug.h
@echo PWD=. > $(SNAPSHOT)/Makefile
@echo "CFLAGS += -DMSRHEADER='\"msr-index.h\"'" >> $(SNAPSHOT)/Makefile
......
......@@ -28,10 +28,13 @@ name as necessary to disambiguate it from others is necessary. Note that option
.PP
\fB--add attributes\fP add column with counter having specified 'attributes'. The 'location' attribute is required, all others are optional.
.nf
location: {\fBmsrDDD\fP | \fBmsr0xXXX\fP | \fB/sys/path...\fP | \fBperf/<device>/<event>\fP}
msrDDD is a decimal offset, eg. msr16
msr0xXXX is a hex offset, eg. msr0x10
/sys/path... is an absolute path to a sysfs attribute
<device> is a perf device from /sys/bus/event_source/devices/<device> eg. cstate_core
<event> is a perf event for given device from /sys/bus/event_source/devices/<device>/events/<event> eg. c1-residency
perf/cstate_core/c1-residency would then use /sys/bus/event_source/devices/cstate_core/events/c1-residency
scope: {\fBcpu\fP | \fBcore\fP | \fBpackage\fP}
sample and print the counter for every cpu, core, or package.
......@@ -52,6 +55,39 @@ name as necessary to disambiguate it from others is necessary. Note that option
as the column header.
.fi
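The perf/<device>/<event> location described above maps directly onto a sysfs path. A minimal sketch of that mapping follows, using only the path layout quoted in the man page text; the helper name perf_location_to_sysfs is hypothetical and is not part of turbostat.

```python
from pathlib import PurePosixPath

# Base directory named in the man page text for perf devices.
SYSFS_PERF_DEVICES = PurePosixPath("/sys/bus/event_source/devices")


def perf_location_to_sysfs(location: str) -> PurePosixPath:
    """Map 'perf/<device>/<event>' to the sysfs event file it refers to.

    Hypothetical helper for illustration only; turbostat does its own
    parsing in C.
    """
    parts = location.split("/")
    if len(parts) != 3 or parts[0] != "perf" or not parts[1] or not parts[2]:
        raise ValueError(f"not a perf location: {location!r}")
    _, device, event = parts
    return SYSFS_PERF_DEVICES / device / "events" / event


print(perf_location_to_sysfs("perf/cstate_core/c1-residency"))
```

On a system where that event exists, the returned path is the file the man page says turbostat would use for the counter.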
.PP
\fB--add pmt,[attr_name=attr_value, ...]\fP adds a column with a PMT (Intel Platform Monitoring Technology) counter, similarly to the --add option above, but requires PMT metadata to be supplied in order to read and display the counter correctly. The metadata can be found in the Intel PMT XML files, hosted at https://github.com/intel/Intel-PMT. For a complete example see "ADD PMT COUNTER EXAMPLE".
.nf
name="name_string"
For column header.
type={\fBraw\fP}
'raw' shows the counter contents in hex.
default: raw
format={\fBraw\fP | \fBdelta\fP}
'raw' shows the counter contents in hex.
'delta' shows the difference in values during the measurement interval.
default: raw
domain={\fBcpu%u\fP | \fBcore%u\fP | \fBpackage%u\fP}
'cpu' per cpu/thread counter.
'core' per core counter.
'package' per package counter.
'%u' denotes id of the domain that the counter is associated with. For example core4 would mean that the counter is associated with core number 4.
offset=\fB%u\fP
'%u' offset within the PMT MMIO region.
lsb=\fB%u\fP
'%u' least significant bit within the 64 bit value read from 'offset'. Together with 'msb', used to form a read mask.
msb=\fB%u\fP
'%u' most significant bit within the 64 bit value read from 'offset'. Together with 'lsb', used to form a read mask.
guid=\fB%x\fP
'%x' hex identifier of the PMT MMIO region.
.fi
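The 'lsb' and 'msb' attributes above together select a bit field inside the 64 bit value read from 'offset'. The sketch below shows one way such a read mask can be formed and applied. It assumes that both bounds are inclusive and that the masked field is shifted down so that 'lsb' becomes bit 0; the man page text states neither detail, and the function names are hypothetical.

```python
def genmask64(msb: int, lsb: int) -> int:
    """Return a mask with bits lsb..msb (inclusive) set, within 64 bits."""
    if not 0 <= lsb <= msb <= 63:
        raise ValueError("expected 0 <= lsb <= msb <= 63")
    return ((1 << (msb - lsb + 1)) - 1) << lsb


def extract_field(raw: int, lsb: int, msb: int) -> int:
    """Mask out bits lsb..msb of a 64 bit value and shift them down.

    Assumption: the field is reported with 'lsb' aligned to bit 0.
    """
    return (raw & genmask64(msb, lsb)) >> lsb


# lsb=0, msb=63 covers the whole 64 bit value, as in the PMT example.
print(hex(extract_field(0x0000006D4D957CA7, 0, 63)))
# A narrower field: bits 4..11 of 0xABCD.
print(hex(extract_field(0xABCD, 4, 11)))
```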
.PP
\fB--cpu cpu-set\fP limit output to system summary plus the specified cpu-set. If cpu-set is the string "core", then the system summary plus the first CPU in each core are printed -- eg. subsequent HT siblings are not printed. Or if cpu-set is the string "package", then the system summary plus the first CPU in each package is printed. Otherwise, the system summary plus the specified set of CPUs are printed. The cpu-set is ordered from low to high, comma delimited with ".." and "-" permitted to denote a range. eg. 1,2,8,14..17,21-44
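The numeric cpu-set syntax described above (comma delimited, with ".." or "-" denoting a range) can be illustrated with a short parser. This is only a sketch of the format as documented; it does not handle the special "core" and "package" strings, and parse_cpu_set is a hypothetical name rather than turbostat code.

```python
from typing import List


def parse_cpu_set(spec: str) -> List[int]:
    """Expand a cpu-set such as '1,2,8,14..17,21-44' into a sorted CPU list."""
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if ".." in item:
            low, high = item.split("..", 1)
        elif "-" in item:
            low, high = item.split("-", 1)
        else:
            low = high = item
        first, last = int(low), int(high)  # raises ValueError on bad input
        if first > last:
            raise ValueError(f"descending range in cpu-set: {item!r}")
        cpus.update(range(first, last + 1))
    return sorted(cpus)


print(parse_cpu_set("1,2,8,14..17,21-23"))
```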
.PP
\fB--hide column\fP do not show the specified built-in columns. May be invoked multiple times, or with a comma-separated list of column names.
......@@ -67,10 +103,10 @@ The column name "all" can be used to enable all disabled-by-default built-in cou
.PP
\fB--quiet\fP Do not decode and print the system configuration header information.
.PP
\fB--no-msr\fP Disable all the uses of the MSR driver.
.PP
\fB--no-perf\fP Disable all the uses of the perf API.
.PP
\fB--interval seconds\fP overrides the default 5.0 second measurement interval.
.PP
\fB--num_iterations num\fP number of the measurement iterations.
......@@ -320,7 +356,7 @@ available on all processors.
Here we limit turbostat to showing just the CPU number for cpu0 - cpu3.
We add a counter showing the 32-bit raw value of MSR 0x199 (MSR_IA32_PERF_CTL),
labeling it with the column header, "PRF_CTRL", and display it only once,
after the conclusion of a 0.1 second sleep.
.nf
sudo ./turbostat --quiet --cpu 0-3 --show CPU --add msr0x199,u32,raw,PRF_CTRL sleep .1
0.101604 sec
......@@ -333,6 +369,56 @@ CPU PRF_CTRL
.fi
.SH ADD PERF COUNTER EXAMPLE
Here we limit turbostat to showing just the CPU number for cpu0 - cpu3.
We add a counter showing the time spent in the C1 core C-state,
labeling it with the column header, "pCPU%c1", and display it only once,
after the conclusion of a 0.1 second sleep.
We also show the built-in CPU%c1 counter, which should show similar values.
.nf
sudo ./turbostat --quiet --cpu 0-3 --show CPU,CPU%c1 --add perf/cstate_core/c1-residency,cpu,delta,percent,pCPU%c1 sleep .1
0.102448 sec
CPU pCPU%c1 CPU%c1
- 34.89 34.89
0 45.99 45.99
1 45.94 45.94
2 23.83 23.83
3 23.84 23.84
.fi
.SH ADD PMT COUNTER EXAMPLE
Here we limit turbostat to showing just the CPU number 0.
We add two counters, showing crystal clock count and the DC6 residency.
All the parameters passed are based on the metadata found in the PMT XML files.
For the crystal clock count, we
label it with the column header, "XTAL",
we set the type to 'raw', to read the number of clock ticks in hex,
we set the format to 'delta', to display the difference in ticks during the measurement interval,
we set the domain to 'package0', to collect it and associate it with the whole package number 0,
we set the offset to '0', which is an offset of the counter within the PMT MMIO region,
we set the lsb and msb to cover all 64 bits of the read 64 bit value,
and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value.
For the DC6 residency counter, we
label it with the column header, "Die%c6",
we set the type to 'txtal_time', to obtain the percent residency value,
we set the format to 'delta', to display the difference in ticks during the measurement interval,
we set the domain to 'package0', to collect it and associate it with the whole package number 0,
we set the offset to '120', which is an offset of the counter within the PMT MMIO region,
we set the lsb and msb to cover all 64 bits of the read 64 bit value,
and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value.
.nf
sudo ./turbostat --quiet --cpu 0 --show CPU --add pmt,name=XTAL,type=raw,format=delta,domain=package0,offset=0,lsb=0,msb=63,guid=0x1a067102 --add pmt,name=Die%c6,type=txtal_time,format=delta,domain=package0,offset=120,lsb=0,msb=63,guid=0x1a067102
0.104352 sec
CPU XTAL Die%c6
- 0x0000006d4d957ca7 0.00
0 0x0000006d4d957ca7 0.00
0.102448 sec
.fi
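The two --add pmt arguments in the example above follow the attribute list documented for the option. As a sketch, the argument string can be assembled from its parts as shown below; pmt_add_arg is a hypothetical helper, and the values are the ones used in the example command.

```python
def pmt_add_arg(name: str, type_: str, fmt: str, domain: str,
                offset: int, lsb: int, msb: int, guid: int) -> str:
    """Build the value passed to turbostat's --add option for a PMT counter.

    Hypothetical helper for illustration; it only formats the attributes
    listed in the man page and performs no validation against PMT metadata.
    """
    return (f"pmt,name={name},type={type_},format={fmt},domain={domain},"
            f"offset={offset},lsb={lsb},msb={msb},guid={guid:#x}")


# The two counters from the example: crystal clock count and DC6 residency.
xtal = pmt_add_arg("XTAL", "raw", "delta", "package0", 0, 0, 63, 0x1A067102)
die_c6 = pmt_add_arg("Die%c6", "txtal_time", "delta", "package0",
                     120, 0, 63, 0x1A067102)
print("--add", xtal)
print("--add", die_c6)
```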
.SH INPUT
For interval-mode, turbostat will immediately end the current interval
......
#!/bin/env python3
# SPDX-License-Identifier: GPL-2.0

import subprocess
from shutil import which
from os import pread

class PerfCounterInfo:
    def __init__(self, subsys, event):
        self.subsys = subsys
        self.event = event

    def get_perf_event_name(self):
        return f'{self.subsys}/{self.event}/'

    def get_turbostat_perf_id(self, counter_scope, counter_type, column_name):
        return f'perf/{self.subsys}/{self.event},{counter_scope},{counter_type},{column_name}'

PERF_COUNTERS_CANDIDATES = [
    PerfCounterInfo('msr', 'mperf'),
    PerfCounterInfo('msr', 'aperf'),
    PerfCounterInfo('msr', 'tsc'),
    PerfCounterInfo('cstate_core', 'c1-residency'),
    PerfCounterInfo('cstate_core', 'c6-residency'),
    PerfCounterInfo('cstate_core', 'c7-residency'),
    PerfCounterInfo('cstate_pkg', 'c2-residency'),
    PerfCounterInfo('cstate_pkg', 'c3-residency'),
    PerfCounterInfo('cstate_pkg', 'c6-residency'),
    PerfCounterInfo('cstate_pkg', 'c7-residency'),
    PerfCounterInfo('cstate_pkg', 'c8-residency'),
    PerfCounterInfo('cstate_pkg', 'c9-residency'),
    PerfCounterInfo('cstate_pkg', 'c10-residency'),
]
present_perf_counters = []

def check_perf_access():
    perf = which('perf')
    if perf is None:
        print('SKIP: Could not find perf binary, thus could not determine perf access.')
        return False

    def has_perf_counter_access(counter_name):
        proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
                                    capture_output = True)

        if proc_perf.returncode != 0:
            print(f'SKIP: Could not read {counter_name} perf counter.')
            return False

        if b'<not supported>' in proc_perf.stderr:
            print(f'SKIP: Could not read {counter_name} perf counter.')
            return False

        return True

    for counter in PERF_COUNTERS_CANDIDATES:
        if has_perf_counter_access(counter.get_perf_event_name()):
            present_perf_counters.append(counter)

    if len(present_perf_counters) == 0:
        print('SKIP: Could not read any perf counter.')
        return False

    if len(present_perf_counters) != len(PERF_COUNTERS_CANDIDATES):
        print(f'WARN: Could not access all of the counters - some will be left untested')

    return True

if not check_perf_access():
    exit(0)
turbostat_counter_source_opts = ['']

turbostat = which('turbostat')
if turbostat is None:
    print('Could not find turbostat binary')
    exit(1)

timeout = which('timeout')
if timeout is None:
    print('Could not find timeout binary')
    exit(1)

proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

EXPECTED_COLUMNS_DEBUG_DEFAULT = [b'usec', b'Time_Of_Day_Seconds', b'APIC', b'X2APIC']

expected_columns = [b'CPU']
counters_argv = []
for counter in present_perf_counters:
    if counter.subsys == 'cstate_core':
        counter_scope = 'core'
    elif counter.subsys == 'cstate_pkg':
        counter_scope = 'package'
    else:
        counter_scope = 'cpu'

    counter_type = 'delta'
    column_name = counter.event

    cparams = counter.get_turbostat_perf_id(
        counter_scope = counter_scope,
        counter_type = counter_type,
        column_name = column_name
    )
    expected_columns.append(column_name.encode())
    counters_argv.extend(['--add', cparams])

expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + expected_columns

def gen_user_friendly_cmdline(argv_):
    argv = argv_[:]
    ret = ''

    while len(argv) != 0:
        arg = argv.pop(0)
        arg_next = ''

        if arg in ('-i', '--show', '--add'):
            arg_next = argv.pop(0) if len(argv) > 0 else ''

        ret += f'{arg} {arg_next} \\\n\t'

    # Remove the last separator and return
    return ret[:-4]

#
# Run turbostat for some time and send SIGINT
#
timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
turbostat_argv = [turbostat, '-i', '0.50', '--show', 'CPU'] + counters_argv

def check_columns_or_fail(expected_columns: list, actual_columns: list):
    if len(actual_columns) != len(expected_columns):
        print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
        exit(1)

    failed = False
    for expected_column in expected_columns:
        if expected_column not in actual_columns:
            print(f'turbostat column check failed: missing column {expected_column.decode()}')
            failed = True

    if failed:
        exit(1)

cmdline = gen_user_friendly_cmdline(turbostat_argv)
print(f'Running turbostat with:\n\t{cmdline}\n... ', end = '', flush = True)
proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
check_columns_or_fail(expected_columns, actual_columns)
print('OK')

#
# Same, but with --debug
#
# We explicitly specify '--show CPU' to make sure turbostat
# doesn't show a bunch of default counters instead.
#
turbostat_argv.append('--debug')

cmdline = gen_user_friendly_cmdline(turbostat_argv)
print(f'Running turbostat (in debug mode) with:\n\t{cmdline}\n... ', end = '', flush = True)
proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
check_columns_or_fail(expected_columns_debug, actual_columns)
print('OK')
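
The column check in the selftest above relies on the first line of turbostat's stdout being a tab-separated header. The sketch below isolates that step and reports the differences instead of exiting. The function names and the sample output bytes are made up for illustration and are not taken from a real turbostat run.

```python
def header_columns(stdout: bytes) -> list:
    """Return the tab-separated column names from the first output line."""
    return stdout.split(b'\n')[0].split(b'\t')


def column_mismatch(expected: list, actual: list) -> tuple:
    """Return (missing, unexpected) column names; both empty on a match."""
    missing = [col for col in expected if col not in actual]
    unexpected = [col for col in actual if col not in expected]
    return missing, unexpected


# Hypothetical captured output: a header line followed by one data row.
sample_stdout = b'CPU\tmperf\taperf\n-\t1\t2\n'
columns = header_columns(sample_stdout)
print(columns)
print(column_mismatch([b'CPU', b'mperf', b'aperf', b'tsc'], columns))
```

Returning the two lists rather than calling exit(1) keeps the comparison reusable, at the cost of leaving the pass or fail decision to the caller.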
#!/bin/env python3
# SPDX-License-Identifier: GPL-2.0

import subprocess
from shutil import which
from os import pread

# CDLL calls dlopen underneath.
# Calling it with None (null) gives us a handle to our own image (the Python interpreter).
# We hope to find sched_getcpu() inside ;]
# This is a bit ugly, but helps shipping working software, so..
try:
    import ctypes

    this_image = ctypes.CDLL(None)
    BASE_CPU = this_image.sched_getcpu()
except:
    BASE_CPU = 0 # If we fail, set to 0 and pray it's not offline.

MSR_IA32_MPERF = 0x000000e7
MSR_IA32_APERF = 0x000000e8

def check_perf_access():
    perf = which('perf')
    if perf is None:
        print('SKIP: Could not find perf binary, thus could not determine perf access.')
        return False

    def has_perf_counter_access(counter_name):
        proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
                                    capture_output = True)

        if proc_perf.returncode != 0:
            print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
            return False

        if b'<not supported>' in proc_perf.stderr:
            print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
            return False

        return True

    if not has_perf_counter_access('msr/mperf/'):
        return False
    if not has_perf_counter_access('msr/aperf/'):
        return False
    if not has_perf_counter_access('msr/smi/'):
        return False

    return True

def check_msr_access():
    try:
        file_msr = open(f'/dev/cpu/{BASE_CPU}/msr', 'rb')
    except:
        return False

    if len(pread(file_msr.fileno(), 8, MSR_IA32_MPERF)) != 8:
        return False

    if len(pread(file_msr.fileno(), 8, MSR_IA32_APERF)) != 8:
        return False

    return True

has_perf_access = check_perf_access()
has_msr_access = check_msr_access()

turbostat_counter_source_opts = ['']

if has_msr_access:
    turbostat_counter_source_opts.append('--no-perf')
else:
    print('SKIP: doesn\'t have MSR access, skipping run with --no-perf')

if has_perf_access:
    turbostat_counter_source_opts.append('--no-msr')
else:
    print('SKIP: doesn\'t have perf access, skipping run with --no-msr')

if not has_msr_access and not has_perf_access:
    print('SKIP: No MSR nor perf access detected. Skipping the tests entirely')
    exit(0)

turbostat = which('turbostat')
if turbostat is None:
    print('Could not find turbostat binary')
    exit(1)

timeout = which('timeout')
if timeout is None:
    print('Could not find timeout binary')
    exit(1)

proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

EXPECTED_COLUMNS_DEBUG_DEFAULT = b'usec\tTime_Of_Day_Seconds\tAPIC\tX2APIC'

SMI_APERF_MPERF_DEPENDENT_BICS = [
    'SMI',
    'Avg_MHz',
    'Busy%',
    'Bzy_MHz',
]
if has_perf_access:
    SMI_APERF_MPERF_DEPENDENT_BICS.append('IPC')

for bic in SMI_APERF_MPERF_DEPENDENT_BICS:
    for counter_source_opt in turbostat_counter_source_opts:

        # Ugly special case, but it is what it is..
        if counter_source_opt == '--no-perf' and bic == 'IPC':
            continue

        expected_columns = bic.encode()
        expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + f'\t{bic}'.encode()

        #
        # Run turbostat for some time and send SIGINT
        #
        timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
        turbostat_argv = [turbostat, '-i', '0.50', '--show', bic]

        if counter_source_opt:
            turbostat_argv.append(counter_source_opt)

        print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
        proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
        if proc_turbostat.returncode != 0:
            print(f'turbostat failed with {proc_turbostat.returncode}')
            exit(1)

        actual_columns = proc_turbostat.stdout.split(b'\n')[0]
        if expected_columns != actual_columns:
            print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
            exit(1)
        print('OK')

        #
        # Same, but with --debug
        #
        turbostat_argv.append('--debug')

        print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
        proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
        if proc_turbostat.returncode != 0:
            print(f'turbostat failed with {proc_turbostat.returncode}')
            exit(1)

        actual_columns = proc_turbostat.stdout.split(b'\n')[0]
        if expected_columns_debug != actual_columns:
            print(f'turbostat column check failed\n{expected_columns_debug=}\n{actual_columns=}')
            exit(1)
        print('OK')