Commit d1ac1a2b authored by Linus Torvalds

Merge tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:
 "perf tools fixes and improvements:

   - Don't stop building perf if python setuptools isn't installed, just
     disable the affected perf feature.

   - Remove explicit reference to python 2.x devel files, that warning
     is about python-devel, no matter what version, being unavailable
     and thus disabling the linking with libpython.

   - Don't use -Werror=switch-enum when building the python support that
     handles libtraceevent enumerations, as there is no good way to test
     if some specific enum entry is available with the libtraceevent
     installed on the system.

   - Introduce 'perf lock contention' --type-filter and --lock-filter,
     to filter by lock type and lock name:

        $ sudo ./perf lock record -a -- ./perf bench sched messaging

        $ sudo ./perf lock contention -E 5 -Y spinlock
         contended  total wait   max wait  avg wait      type  caller

               802     1.26 ms   11.73 us   1.58 us  spinlock  __wake_up_common_lock+0x62
                13   787.16 us  105.44 us  60.55 us  spinlock  remove_wait_queue+0x14
                12   612.96 us   78.70 us  51.08 us  spinlock  prepare_to_wait+0x27
               114   340.68 us   12.61 us   2.99 us  spinlock  try_to_wake_up+0x1f5
                83   226.38 us    9.15 us   2.73 us  spinlock  folio_lruvec_lock_irqsave+0x5e

        $ sudo ./perf lock contention -l
         contended  total wait  max wait  avg wait           address  symbol

                57     1.11 ms  42.83 us  19.54 us  ffff9f4140059000
                15   280.88 us  23.51 us  18.73 us  ffffffff9d007a40  jiffies_lock
                 1    20.49 us  20.49 us  20.49 us  ffffffff9d0d50c0  rcu_state
                 1     9.02 us   9.02 us   9.02 us  ffff9f41759e9ba0

        $ sudo ./perf lock contention -L jiffies_lock,rcu_state
         contended  total wait  max wait  avg wait      type  caller

                15   280.88 us  23.51 us  18.73 us  spinlock  tick_sched_do_timer+0x93
                 1    20.49 us  20.49 us  20.49 us  spinlock  __softirqentry_text_start+0xeb

        $ sudo ./perf lock contention -L ffff9f4140059000
         contended  total wait  max wait  avg wait      type  caller

                38   779.40 us  42.83 us  20.51 us  spinlock  worker_thread+0x50
                11   216.30 us  39.87 us  19.66 us  spinlock  queue_work_on+0x39
                 8   118.13 us  20.51 us  14.77 us  spinlock  kthread+0xe5

   - Fix splitting CC into compiler and options when checking if an
     option is present in clang to build the python binding, needed in
     systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c".

   - Refresh metrics and events for Intel systems: alderlake,
     alderlake-n, bonnell, broadwell, broadwellde, broadwellx,
     cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
     haswellx, icelake, icelakex, ivybridge, ivytown, jaketown,
     knightslanding, meteorlake, nehalemep, nehalemex, sandybridge,
     sapphirerapids, silvermont, skylake, skylakex, snowridgex,
     tigerlake, westmereep-dp, westmereep-sp, westmereex.

   - Add vendor events files (JSON) for AMD Zen 4, from sections
     2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache
     Performance Monitor Counters" and Section 7.1 "Fabric Performance
     Monitor Counter (PMC) Events" in the Processor Programming
     Reference (PPR) for AMD Family 19h Model 11h Revision B1
     processors.

     This constitutes events which capture op dispatch, execution and
     retirement, branch prediction, L1 and L2 cache activity, TLB
     activity, L3 cache activity and data bandwidth for various links
     and interfaces in the Data Fabric.

   - Also, from the same PPR are metrics taken from Section 2.1.15.2
     "Performance Measurement", including pipeline utilization, which
     are new to Zen 4 processors and useful for finding performance
     bottlenecks by analyzing activity at different stages of the
     pipeline.

   - Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and
     'srcfile' sort keys performance by postponing calling the external
     addr2line utility to the collapse phase of histogram bucketing.

   - Fix 'perf test' "all PMU test" to skip parametrized events, which
     require setup and are not supported by this test.

   - Update tools/ copies of kernel headers: features,
     disabled-features, fscrypt.h, i915_drm.h, msr-index.h, powerpc
     syscall table and kvm.h.

   - Add .DELETE_ON_ERROR special Makefile target to clean up partially
     updated files on error.

   - Simplify the mksyscalltbl script for arm64 by not running the
     host compiler to create the syscall table; do it all with just the
     shell script.

   - Further fixes to honour quiet mode (-q)"

* tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
  perf python: Fix splitting CC into compiler and options
  perf scripting python: Don't be strict at handling libtraceevent enumerations
  perf arm64: Simplify mksyscalltbl
  perf build: Remove explicit reference to python 2.x devel files
  perf vendor events amd: Add Zen 4 mapping
  perf vendor events amd: Add Zen 4 metrics
  perf vendor events amd: Add Zen 4 uncore events
  perf vendor events amd: Add Zen 4 core events
  perf vendor events intel: Refresh westmereex events
  perf vendor events intel: Refresh westmereep-sp events
  perf vendor events intel: Refresh westmereep-dp events
  perf vendor events intel: Refresh tigerlake metrics and events
  perf vendor events intel: Refresh snowridgex events
  perf vendor events intel: Refresh skylakex metrics and events
  perf vendor events intel: Refresh skylake metrics and events
  perf vendor events intel: Refresh silvermont events
  perf vendor events intel: Refresh sapphirerapids metrics and events
  perf vendor events intel: Refresh sandybridge metrics and events
  perf vendor events intel: Refresh nehalemex events
  perf vendor events intel: Refresh nehalemep events
  ...
parents 9d2f6060 09e6f9f9
......@@ -304,10 +304,16 @@
#define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */
#define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
#define X86_FEATURE_SGX_EDECCSSA (11*32+18) /* "" SGX EDECCSSA user leaf function */
#define X86_FEATURE_CALL_DEPTH (11*32+19) /* "" Call depth tracking for RSB stuffing */
#define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* "" CMPccXADD instructions */
#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
#define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
......
......@@ -69,6 +69,12 @@
# define DISABLE_UNRET (1 << (X86_FEATURE_UNRET & 31))
#endif
#ifdef CONFIG_CALL_DEPTH_TRACKING
# define DISABLE_CALL_DEPTH_TRACKING 0
#else
# define DISABLE_CALL_DEPTH_TRACKING (1 << (X86_FEATURE_CALL_DEPTH & 31))
#endif
#ifdef CONFIG_INTEL_IOMMU_SVM
# define DISABLE_ENQCMD 0
#else
......@@ -81,6 +87,12 @@
# define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31))
#endif
#ifdef CONFIG_XEN_PV
# define DISABLE_XENPV 0
#else
# define DISABLE_XENPV (1 << (X86_FEATURE_XENPV & 31))
#endif
#ifdef CONFIG_INTEL_TDX_GUEST
# define DISABLE_TDX_GUEST 0
#else
......@@ -98,10 +110,11 @@
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
#define DISABLED_MASK7 (DISABLE_PTI)
#define DISABLED_MASK8 (DISABLE_TDX_GUEST)
#define DISABLED_MASK8 (DISABLE_XENPV|DISABLE_TDX_GUEST)
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
DISABLE_CALL_DEPTH_TRACKING)
#define DISABLED_MASK12 0
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
......
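The feature numbers in the hunks above encode a capability word and a bit position as (word*32 + bit), and each DISABLE_* macro is the bit mask of that feature within its word. A minimal sketch of that arithmetic follows; the helper names feature_word, feature_mask and feature_disabled are hypothetical and exist only for illustration, they are not kernel APIs.

```c
#include <stdint.h>

/* Value copied from the cpufeatures hunk above. */
#define X86_FEATURE_CALL_DEPTH (11*32+19)

/* Hypothetical helper: index of the 32-bit capability word holding the feature. */
static inline unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Hypothetical helper: mask of the feature inside its word, i.e. 1 << (X & 31). */
static inline uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/*
 * Hypothetical helper: nonzero if the feature's bit is set in the
 * disabled mask that belongs to the feature's word.
 */
static inline int feature_disabled(unsigned int feature, uint32_t disabled_mask)
{
	return (disabled_mask & feature_mask(feature)) != 0;
}
```

With these definitions, X86_FEATURE_CALL_DEPTH lands in word 11, which is why DISABLE_CALL_DEPTH_TRACKING is OR-ed into DISABLED_MASK11 rather than any other mask.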
......@@ -4,12 +4,7 @@
#include <linux/bits.h>
/*
* CPU model specific register (MSR) numbers.
*
* Do not add new entries to this file unless the definitions are shared
* between multiple compilation units.
*/
/* CPU model specific register (MSR) numbers. */
/* x86-64 specific MSRs */
#define MSR_EFER 0xc0000080 /* extended feature register */
......@@ -537,7 +532,7 @@
#define MSR_AMD64_DC_CFG 0xc0011022
#define MSR_AMD64_DE_CFG 0xc0011029
#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1
#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT 1
#define MSR_AMD64_DE_CFG_LFENCE_SERIALIZE BIT_ULL(MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT)
#define MSR_AMD64_BU_CFG2 0xc001102a
......@@ -798,6 +793,7 @@
#define ENERGY_PERF_BIAS_PERFORMANCE 0
#define ENERGY_PERF_BIAS_BALANCE_PERFORMANCE 4
#define ENERGY_PERF_BIAS_NORMAL 6
#define ENERGY_PERF_BIAS_NORMAL_POWERSAVE 7
#define ENERGY_PERF_BIAS_BALANCE_POWERSAVE 8
#define ENERGY_PERF_BIAS_POWERSAVE 15
......@@ -1052,6 +1048,20 @@
#define VMX_BASIC_MEM_TYPE_WB 6LLU
#define VMX_BASIC_INOUT 0x0040000000000000LLU
/* Resctrl MSRs: */
/* - Intel: */
#define MSR_IA32_L3_QOS_CFG 0xc81
#define MSR_IA32_L2_QOS_CFG 0xc82
#define MSR_IA32_QM_EVTSEL 0xc8d
#define MSR_IA32_QM_CTR 0xc8e
#define MSR_IA32_PQR_ASSOC 0xc8f
#define MSR_IA32_L3_CBM_BASE 0xc90
#define MSR_IA32_L2_CBM_BASE 0xd10
#define MSR_IA32_MBA_THRTL_BASE 0xd50
/* - AMD: */
#define MSR_IA32_MBA_BW_BASE 0xc0000200
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
......
......@@ -645,6 +645,22 @@ typedef struct drm_i915_irq_wait {
*/
#define I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP (1ul << 5)
/*
* Query the status of HuC load.
*
* The query can fail in the following scenarios with the listed error codes:
* -ENODEV if HuC is not present on this platform,
* -EOPNOTSUPP if HuC firmware usage is disabled,
* -ENOPKG if HuC firmware fetch failed,
* -ENOEXEC if HuC firmware is invalid or mismatched,
* -ENOMEM if i915 failed to prepare the FW objects for transfer to the uC,
* -EIO if the FW transfer or the FW authentication failed.
*
* If the IOCTL is successful, the returned parameter will be set to one of the
* following values:
* * 0 if HuC firmware load is not complete,
* * 1 if HuC firmware is authenticated and running.
*/
#define I915_PARAM_HUC_STATUS 42
/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
......@@ -749,6 +765,12 @@ typedef struct drm_i915_irq_wait {
/* Query if the kernel supports the I915_USERPTR_PROBE flag. */
#define I915_PARAM_HAS_USERPTR_PROBE 56
/*
* Frequency of the timestamps in OA reports. This used to be the same as the CS
* timestamp frequency, but differs on some platforms.
*/
#define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57
/* Must be kept compact -- no holes and well documented */
/**
......@@ -2650,6 +2672,10 @@ enum drm_i915_oa_format {
I915_OA_FORMAT_A12_B8_C8,
I915_OA_FORMAT_A32u40_A4u32_B8_C8,
/* DG2 */
I915_OAR_FORMAT_A32u40_A4u32_B8_C8,
I915_OA_FORMAT_A24u40_A14u32_B8_C8,
I915_OA_FORMAT_MAX /* non-ABI */
};
......@@ -3493,27 +3519,13 @@ struct drm_i915_gem_create_ext {
*
* The (page-aligned) allocated size for the object will be returned.
*
* DG2 64K min page size implications:
*
* On discrete platforms, starting from DG2, we have to contend with GTT
* page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
* objects. Specifically the hardware only supports 64K or larger GTT
* page sizes for such memory. The kernel will already ensure that all
* I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
* sizes underneath.
*
* Note that the returned size here will always reflect any required
* rounding up done by the kernel, i.e 4K will now become 64K on devices
* such as DG2. The kernel will always select the largest minimum
* page-size for the set of possible placements as the value to use when
* rounding up the @size.
*
* Special DG2 GTT address alignment requirement:
*
* The GTT alignment will also need to be at least 2M for such objects.
* On platforms like DG2/ATS the kernel will always use 64K or larger
* pages for I915_MEMORY_CLASS_DEVICE. The kernel also requires a
* minimum of 64K GTT alignment for such objects.
*
* Note that due to how the hardware implements 64K GTT page support, we
* have some further complications:
* NOTE: Previously the ABI here required a minimum GTT alignment of 2M
* on DG2/ATS, due to how the hardware implemented 64K GTT page support,
* where we had the following complications:
*
* 1) The entire PDE (which covers a 2MB virtual address range), must
* contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
......@@ -3522,12 +3534,10 @@ struct drm_i915_gem_create_ext {
* 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
* objects.
*
* To keep things simple for userland, we mandate that any GTT mappings
* must be aligned to and rounded up to 2MB. The kernel will internally
* pad them out to the next 2MB boundary. As this only wastes virtual
* address space and avoids userland having to copy any needlessly
* complicated PDE sharing scheme (coloring) and only affects DG2, this
* is deemed to be a good compromise.
* However on actual production HW this was completely changed to now
* allow setting a TLB hint at the PTE level (see PS64), which is a lot
* more flexible than the above. With this the 2M restriction was
* dropped where we now only require 64K.
*/
__u64 size;
......
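The comment above states that device-memory objects use 64K or larger pages and need at least 64K GTT alignment. As a rough illustration of what that means for sizes and offsets, the sketch below rounds a size up to a 64K multiple and checks an offset for 64K alignment. SZ_64K is defined locally and both helper names are hypothetical; this is not how i915 or userspace drivers necessarily implement it.

```c
#include <stdint.h>

/* Local definition for this sketch only. */
#define SZ_64K (UINT64_C(64) * 1024)

/* Round size up to the next multiple of align; align must be a power of two. */
static inline uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Nonzero if a GTT offset is a multiple of 64K. */
static inline int gtt_offset_aligned_64k(uint64_t offset)
{
	return (offset & (SZ_64K - 1)) == 0;
}
```

For example, round_up_pow2(4096, SZ_64K) yields 65536, matching the "4K becomes 64K" rounding the comment describes for the returned size.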
......@@ -26,6 +26,8 @@
#define FSCRYPT_MODE_AES_256_CTS 4
#define FSCRYPT_MODE_AES_128_CBC 5
#define FSCRYPT_MODE_AES_128_CTS 6
#define FSCRYPT_MODE_SM4_XTS 7
#define FSCRYPT_MODE_SM4_CTS 8
#define FSCRYPT_MODE_ADIANTUM 9
#define FSCRYPT_MODE_AES_256_HCTR2 10
/* If adding a mode number > 10, update FSCRYPT_MODE_MAX in fscrypt_private.h */
......@@ -185,8 +187,6 @@ struct fscrypt_get_key_status_arg {
#define FS_ENCRYPTION_MODE_AES_256_CTS FSCRYPT_MODE_AES_256_CTS
#define FS_ENCRYPTION_MODE_AES_128_CBC FSCRYPT_MODE_AES_128_CBC
#define FS_ENCRYPTION_MODE_AES_128_CTS FSCRYPT_MODE_AES_128_CTS
#define FS_ENCRYPTION_MODE_SPECK128_256_XTS 7 /* removed */
#define FS_ENCRYPTION_MODE_SPECK128_256_CTS 8 /* removed */
#define FS_ENCRYPTION_MODE_ADIANTUM FSCRYPT_MODE_ADIANTUM
#define FS_KEY_DESC_PREFIX FSCRYPT_KEY_DESC_PREFIX
#define FS_KEY_DESC_PREFIX_SIZE FSCRYPT_KEY_DESC_PREFIX_SIZE
......
......@@ -98,7 +98,7 @@ struct kvm_userspace_memory_region {
/*
* The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
* userspace, other bits are reserved for kvm internal use which are defined
*in include/linux/kvm_host.h.
* in include/linux/kvm_host.h.
*/
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY (1UL << 1)
......@@ -477,6 +477,9 @@ struct kvm_run {
#define KVM_MSR_EXIT_REASON_INVAL (1 << 0)
#define KVM_MSR_EXIT_REASON_UNKNOWN (1 << 1)
#define KVM_MSR_EXIT_REASON_FILTER (1 << 2)
#define KVM_MSR_EXIT_REASON_VALID_MASK (KVM_MSR_EXIT_REASON_INVAL | \
KVM_MSR_EXIT_REASON_UNKNOWN | \
KVM_MSR_EXIT_REASON_FILTER)
__u32 reason; /* kernel -> user */
__u32 index; /* kernel -> user */
__u64 data; /* kernel <-> user */
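One plausible use of a combined mask such as KVM_MSR_EXIT_REASON_VALID_MASK is to reject flag words that contain undefined bits. The sketch below shows that check in isolation; the macro values are copied from the hunk above, while msr_exit_mask_is_valid is a hypothetical helper and not part of the KVM UAPI.

```c
#include <stdint.h>

/* Values copied from the kvm.h hunk above. */
#define KVM_MSR_EXIT_REASON_INVAL	(1 << 0)
#define KVM_MSR_EXIT_REASON_UNKNOWN	(1 << 1)
#define KVM_MSR_EXIT_REASON_FILTER	(1 << 2)
#define KVM_MSR_EXIT_REASON_VALID_MASK	(KVM_MSR_EXIT_REASON_INVAL |   \
					 KVM_MSR_EXIT_REASON_UNKNOWN | \
					 KVM_MSR_EXIT_REASON_FILTER)

/* Hypothetical helper: nonzero if mask contains only defined reason bits. */
static inline int msr_exit_mask_is_valid(uint32_t mask)
{
	return (mask & ~(uint32_t)KVM_MSR_EXIT_REASON_VALID_MASK) == 0;
}
```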
......@@ -1170,6 +1173,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_ZPCI_OP 221
#define KVM_CAP_S390_CPU_TOPOLOGY 222
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
#ifdef KVM_CAP_IRQ_ROUTING
......@@ -1259,6 +1264,7 @@ struct kvm_x86_mce {
#define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3)
#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6)
struct kvm_xen_hvm_config {
__u32 flags;
......@@ -1726,6 +1732,8 @@ enum pv_cmd_id {
KVM_PV_UNSHARE_ALL,
KVM_PV_INFO,
KVM_PV_DUMP,
KVM_PV_ASYNC_CLEANUP_PREPARE,
KVM_PV_ASYNC_CLEANUP_PERFORM,
};
struct kvm_pv_cmd {
......@@ -1756,6 +1764,7 @@ struct kvm_xen_hvm_attr {
union {
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
struct {
__u64 gfn;
} shared_info;
......@@ -1796,6 +1805,8 @@ struct kvm_xen_hvm_attr {
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
#define KVM_XEN_ATTR_TYPE_EVTCHN 0x3
#define KVM_XEN_ATTR_TYPE_XEN_VERSION 0x4
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG 0x5
/* Per-vCPU Xen attributes */
#define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
......
......@@ -143,25 +143,25 @@ CONTENTION OPTIONS
System-wide collection from all CPUs.
-C::
--cpu::
--cpu=<value>::
Collect samples only on the list of CPUs provided. Multiple CPUs can be
provided as a comma-separated list with no space: 0,1. Ranges of CPUs
are specified with -: 0-2. Default is to monitor all CPUs.
-p::
--pid=::
--pid=<value>::
Record events on existing process ID (comma separated list).
--tid=::
--tid=<value>::
Record events on existing thread ID (comma separated list).
--map-nr-entries::
--map-nr-entries=<value>::
Maximum number of BPF map entries (default: 10240).
--max-stack::
--max-stack=<value>::
Maximum stack depth when collecting lock contention (default: 8).
--stack-skip
--stack-skip=<value>::
Number of stack depth to skip when finding a lock caller (default: 3).
-E::
......@@ -172,6 +172,21 @@ CONTENTION OPTIONS
--lock-addr::
Show lock contention stat by address
-Y::
--type-filter=<value>::
Show lock contention only for given lock types (comma separated list).
Available values are:
semaphore, spinlock, rwlock, rwlock:R, rwlock:W, rwsem, rwsem:R, rwsem:W,
rtmutex, rwlock-rt, rwlock-rt:R, rwlock-rt:W, pcpu-sem, pcpu-sem:R, pcpu-sem:W,
mutex
Note that RW-variant of locks have :R and :W suffix. Names without the
suffix are shortcuts for the both variants. Ex) rwsem = rwsem:R + rwsem:W.
-L::
--lock-filter=<value>::
Show lock contention only for given lock addresses or names (comma separated list).
SEE ALSO
--------
......
......@@ -886,12 +886,17 @@ else
else
ifneq ($(feature-libpython), 1)
$(call disable-python,No 'Python.h' (for Python 2.x support) was found: disables Python support - please install python-devel/python-dev)
$(call disable-python,No 'Python.h' was found: disables Python support - please install python-devel/python-dev)
else
LDFLAGS += $(PYTHON_EMBED_LDFLAGS)
EXTLIBS += $(PYTHON_EMBED_LIBADD)
PYTHON_EXTENSION_SUFFIX := $(shell $(PYTHON) -c 'from importlib import machinery; print(machinery.EXTENSION_SUFFIXES[0])')
LANG_BINDINGS += $(obj-perf)python/perf$(PYTHON_EXTENSION_SUFFIX)
PYTHON_SETUPTOOLS_INSTALLED := $(shell $(PYTHON) -c 'import setuptools;' 2> /dev/null && echo "yes" || echo "no")
ifeq ($(PYTHON_SETUPTOOLS_INSTALLED), yes)
PYTHON_EXTENSION_SUFFIX := $(shell $(PYTHON) -c 'from importlib import machinery; print(machinery.EXTENSION_SUFFIXES[0])')
LANG_BINDINGS += $(obj-perf)python/perf$(PYTHON_EXTENSION_SUFFIX)
else
msg := $(warning Missing python setuptools, the python binding won't be built, please install python3-setuptools or equivalent);
endif
CFLAGS += -DHAVE_LIBPYTHON_SUPPORT
$(call detected,CONFIG_LIBPYTHON)
endif
......
......@@ -1151,3 +1151,6 @@ FORCE:
.PHONY: archheaders
endif # force_fixdep
# Delete partially updated (corrupted) files on error
.DELETE_ON_ERROR:
......@@ -23,34 +23,17 @@ create_table_from_c()
{
local sc nr last_sc
create_table_exe=`mktemp ${TMPDIR:-/tmp}/create-table-XXXXXX`
{
cat <<-_EoHEADER
#include <stdio.h>
#include "$input"
int main(int argc, char *argv[])
{
_EoHEADER
while read sc nr; do
printf "%s\n" " printf(\"\\t[%d] = \\\"$sc\\\",\\n\", __NR_$sc);"
printf "%s\n" " [$nr] = \"$sc\","
last_sc=$sc
done
printf "%s\n" " printf(\"#define SYSCALLTBL_ARM64_MAX_ID %d\\n\", __NR_$last_sc);"
printf "}\n"
} | $hostcc -I $incpath/include/uapi -o $create_table_exe -x c -
$create_table_exe
rm -f $create_table_exe
printf "%s\n" "#define SYSCALLTBL_ARM64_MAX_ID __NR_$last_sc"
}
create_table()
{
echo "#include \"$input\""
echo "static const char *syscalltbl_arm64[] = {"
create_table_from_c
echo "};"
......
......@@ -394,8 +394,11 @@
305 common signalfd sys_signalfd compat_sys_signalfd
306 common timerfd_create sys_timerfd_create
307 common eventfd sys_eventfd
308 common sync_file_range2 sys_sync_file_range2 compat_sys_ppc_sync_file_range2
309 nospu fallocate sys_fallocate compat_sys_fallocate
308 32 sync_file_range2 sys_ppc_sync_file_range2 compat_sys_ppc_sync_file_range2
308 64 sync_file_range2 sys_sync_file_range2
308 spu sync_file_range2 sys_sync_file_range2
309 32 fallocate sys_ppc_fallocate compat_sys_fallocate
309 64 fallocate sys_fallocate
310 nospu subpage_prot sys_subpage_prot
311 32 timerfd_settime sys_timerfd_settime32
311 64 timerfd_settime sys_timerfd_settime
......
......@@ -612,6 +612,15 @@ __cmd_probe(int argc, const char **argv)
argc = parse_options(argc, argv, options, probe_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
if (quiet) {
if (verbose != 0) {
pr_err(" Error: -v and -q are exclusive.\n");
return -EINVAL;
}
verbose = -1;
}
if (argc > 0) {
if (strcmp(argv[0], "-") == 0) {
usage_with_options_msg(probe_usage, options,
......@@ -633,14 +642,6 @@ __cmd_probe(int argc, const char **argv)
if (ret)
return ret;
if (quiet) {
if (verbose != 0) {
pr_err(" Error: -v and -q are exclusive.\n");
return -EINVAL;
}
verbose = -1;
}
if (probe_conf.max_probes == 0)
probe_conf.max_probes = MAX_PROBES;
......
......@@ -3629,7 +3629,7 @@ static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map
for (t = 0; t < rec->nr_threads; t++) {
__set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
__set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
if (verbose) {
if (verbose > 0) {
pr_debug("thread_masks[%d]: ", t);
mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
pr_debug("thread_masks[%d]: ", t);
......@@ -3726,7 +3726,7 @@ static int record__init_thread_masks_spec(struct record *rec, struct perf_cpu_ma
}
rec->thread_masks = thread_masks;
rec->thread_masks[t] = thread_mask;
if (verbose) {
if (verbose > 0) {
pr_debug("thread_masks[%d]: ", t);
mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
pr_debug("thread_masks[%d]: ", t);
......
......@@ -2233,7 +2233,7 @@ static void process_event(struct perf_script *script,
if (PRINT_FIELD(METRIC))
perf_sample__fprint_metric(script, thread, evsel, sample, fp);
if (verbose)
if (verbose > 0)
fflush(fp);
}
......
......@@ -266,7 +266,7 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
evsel__group_desc(leader, buf, sizeof(buf));
pr_warning(" %s\n", buf);
if (verbose) {
if (verbose > 0) {
cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
pr_warning(" %s: %s\n", leader->name, buf);
cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
......@@ -2493,7 +2493,7 @@ int cmd_stat(int argc, const char **argv)
if (iostat_mode == IOSTAT_LIST) {
iostat_list(evsel_list, &stat_config);
goto out;
} else if (verbose)
} else if (verbose > 0)
iostat_list(evsel_list, &stat_config);
if (iostat_mode == IOSTAT_RUN && !target__has_cpu(&target))
target.system_wide = true;
......
......@@ -119,7 +119,7 @@ struct perf_dlfilter_fns perf_dlfilter_fns;
static int verbose;
#define pr_debug(fmt, ...) do { \
if (verbose) \
if (verbose > 0) \
fprintf(stderr, fmt, ##__VA_ARGS__); \
} while (0)
......
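The repeated change from "if (verbose)" to "if (verbose > 0)" matters because quiet mode sets verbose to -1 (see the __cmd_probe hunk), and in C any nonzero value, including -1, is true. The sketch below contrasts the two tests; the function names are hypothetical and only wrap the two conditions.

```c
/* Old test: true for any nonzero value, so also true for verbose == -1. */
static int should_print_old(int verbose)
{
	return verbose ? 1 : 0;
}

/* New test: true only when verbosity was actually raised. */
static int should_print_new(int verbose)
{
	return verbose > 0;
}
```

With verbose == -1, the old test would still emit the debug output, while the new test suppresses it as -q intends.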
......@@ -165,14 +165,14 @@
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads. ",
"MetricExpr": "min((TOPDOWN_BE_BOUND.ALL / SLOTS), (LD_HEAD.ANY_AT_RET / CLKS) + tma_store_bound)",
"MetricExpr": "min(tma_backend_bound, LD_HEAD.ANY_AT_RET / CLKS + tma_store_bound)",
"MetricGroup": "TopdownL2;tma_backend_bound_group",
"MetricName": "tma_load_store_bound",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to store buffer full.",
"MetricExpr": "tma_mem_scheduler * (MEM_SCHEDULER_BLOCK.ST_BUF / MEM_SCHEDULER_BLOCK.ALL)",
"MetricExpr": "tma_st_buffer",
"MetricGroup": "TopdownL3;tma_load_store_bound_group",
"MetricName": "tma_store_bound",
"ScaleUnit": "100%"
......@@ -214,21 +214,21 @@
},
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
"MetricExpr": "(MEM_BOUND_STALLS.LOAD_L2_HIT / CLKS) - (MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD)",
"MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_load_store_bound_group",
"MetricName": "tma_l2_bound",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
"MetricExpr": "(MEM_BOUND_STALLS.LOAD_LLC_HIT / CLKS) - (MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD)",
"MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_load_store_bound_group",
"MetricName": "tma_l3_bound",
"ScaleUnit": "100%"
},
{
"BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
"MetricExpr": "(MEM_BOUND_STALLS.LOAD_DRAM_HIT / CLKS) - (MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD)",
"MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / CLKS - MEM_BOUND_STALLS_AT_RET_CORRECTION * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
"MetricGroup": "TopdownL3;tma_load_store_bound_group",
"MetricName": "tma_dram_bound",
"ScaleUnit": "100%"
......@@ -492,22 +492,22 @@
},
{
"BriefDescription": "Percent of instruction miss cost that hit in the L2",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / (MEM_BOUND_STALLS.IFETCH)",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_L2_HIT / MEM_BOUND_STALLS.IFETCH",
"MetricName": "Inst_Miss_Cost_L2Hit_Percent"
},
{
"BriefDescription": "Percent of instruction miss cost that hit in the L3",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / (MEM_BOUND_STALLS.IFETCH)",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_LLC_HIT / MEM_BOUND_STALLS.IFETCH",
"MetricName": "Inst_Miss_Cost_L3Hit_Percent"
},
{
"BriefDescription": "Percent of instruction miss cost that hit in DRAM",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / (MEM_BOUND_STALLS.IFETCH)",
"MetricExpr": "100 * MEM_BOUND_STALLS.IFETCH_DRAM_HIT / MEM_BOUND_STALLS.IFETCH",
"MetricName": "Inst_Miss_Cost_DRAMHit_Percent"
},
{
"BriefDescription": "load ops retired per 1000 instruction",
"MetricExpr": "1000 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
"MetricExpr": "1e3 * MEM_UOPS_RETIRED.ALL_LOADS / INST_RETIRED.ANY",
"MetricName": "MemLoadPKI"
},
{
......
[
{
"EventName": "bp_l2_btb_correct",
"EventCode": "0x8b",
"BriefDescription": "L2 branch prediction overrides existing prediction (speculative)."
},
{
"EventName": "bp_dyn_ind_pred",
"EventCode": "0x8e",
"BriefDescription": "Dynamic indirect predictions (branch used the indirect predictor to make a prediction)."
},
{
"EventName": "bp_de_redirect",
"EventCode": "0x91",
"BriefDescription": "Instruction decoder corrects the predicted target and resteers the branch predictor."
},
{
"EventName": "ex_ret_brn",
"EventCode": "0xc2",
"BriefDescription": "Retired branch instructions (all types of architectural control flow changes, including exceptions and interrupts)."
},
{
"EventName": "ex_ret_brn_misp",
"EventCode": "0xc3",
"BriefDescription": "Retired branch instructions mispredicted."
},
{
"EventName": "ex_ret_brn_tkn",
"EventCode": "0xc4",
"BriefDescription": "Retired taken branch instructions (all types of architectural control flow changes, including exceptions and interrupts)."
},
{
"EventName": "ex_ret_brn_tkn_misp",
"EventCode": "0xc5",
"BriefDescription": "Retired taken branch instructions mispredicted."
},
{
"EventName": "ex_ret_brn_far",
"EventCode": "0xc6",
"BriefDescription": "Retired far control transfers (far call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and interrupts). Far control transfers are not subject to branch prediction."
},
{
"EventName": "ex_ret_near_ret",
"EventCode": "0xc8",
"BriefDescription": "Retired near returns (RET or RET Iw)."
},
{
"EventName": "ex_ret_near_ret_mispred",
"EventCode": "0xc9",
"BriefDescription": "Retired near returns mispredicted. Each misprediction incurs the same penalty as a mispredicted conditional branch instruction."
},
{
"EventName": "ex_ret_brn_ind_misp",
"EventCode": "0xca",
"BriefDescription": "Retired indirect branch instructions mispredicted (only EX mispredicts). Each misprediction incurs the same penalty as a mispredicted conditional branch instruction."
},
{
"EventName": "ex_ret_ind_brch_instr",
"EventCode": "0xcc",
"BriefDescription": "Retired indirect branch instructions."
},
{
"EventName": "ex_ret_cond",
"EventCode": "0xd1",
"BriefDescription": "Retired conditional branch instructions."
},
{
"EventName": "ex_ret_msprd_brnch_instr_dir_msmtch",
"EventCode": "0x1c7",
"BriefDescription": "Retired branch instructions mispredicted due to direction mismatch."
},
{
"EventName": "ex_ret_uncond_brnch_instr_mispred",
"EventCode": "0x1c8",
"BriefDescription": "Retired unconditional indirect branch instructions mispredicted."
},
{
"EventName": "ex_ret_uncond_brnch_instr",
"EventCode": "0x1c9",
"BriefDescription": "Retired unconditional branch instructions."
}
]
[
{
"EventName": "ls_locks.bus_lock",
"EventCode": "0x25",
"BriefDescription": "Retired Lock instructions which caused a bus lock.",
"UMask": "0x01"
},
{
"EventName": "ls_ret_cl_flush",
"EventCode": "0x26",
"BriefDescription": "Retired CLFLUSH instructions."
},
{
"EventName": "ls_ret_cpuid",
"EventCode": "0x27",
"BriefDescription": "Retired CPUID instructions."
},
{
"EventName": "ls_smi_rx",
"EventCode": "0x2b",
"BriefDescription": "SMIs received."
},
{
"EventName": "ls_int_taken",
"EventCode": "0x2c",
"BriefDescription": "Interrupts taken."
},
{
"EventName": "ls_not_halted_cyc",
"EventCode": "0x76",
"BriefDescription": "Core cycles not in halt."
},
{
"EventName": "ex_ret_instr",
"EventCode": "0xc0",
"BriefDescription": "Retired instructions."
},
{
"EventName": "ex_ret_ops",
"EventCode": "0xc1",
"BriefDescription": "Retired macro-ops."
},
{
"EventName": "ex_div_busy",
"EventCode": "0xd3",
"BriefDescription": "Number of cycles the divider is busy."
},
{
"EventName": "ex_div_count",
"EventCode": "0xd4",
"BriefDescription": "Divide ops executed."
},
{
"EventName": "ex_no_retire.empty",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire due to the lack of valid ops in the retire queue (may be caused by front-end bottlenecks or pipeline redirects).",
"UMask": "0x01"
},
{
"EventName": "ex_no_retire.not_complete",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire while the oldest op is waiting to be executed.",
"UMask": "0x02"
},
{
"EventName": "ex_no_retire.other",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire caused by other reasons (retire breaks, traps, faults, etc.).",
"UMask": "0x08"
},
{
"EventName": "ex_no_retire.thread_not_selected",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire because thread arbitration did not select the thread.",
"UMask": "0x10"
},
{
"EventName": "ex_no_retire.load_not_complete",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire while the oldest op is waiting for load data.",
"UMask": "0xa2"
},
{
"EventName": "ex_no_retire.all",
"EventCode": "0xd6",
"BriefDescription": "Cycles with no retire for any reason.",
"UMask": "0x1b"
},
{
"EventName": "ls_not_halted_p0_cyc.p0_freq_cyc",
"EventCode": "0x120",
"BriefDescription": "Reference cycles (P0 frequency) not in halt.",
"UMask": "0x1"
},
{
"EventName": "ex_ret_ucode_instr",
"EventCode": "0x1c1",
"BriefDescription": "Retired microcoded instructions."
},
{
"EventName": "ex_ret_ucode_ops",
"EventCode": "0x1c2",
"BriefDescription": "Retired microcode ops."
},
{
"EventName": "ex_tagged_ibs_ops.ibs_tagged_ops",
"EventCode": "0x1cf",
"BriefDescription": "Ops tagged by IBS.",
"UMask": "0x01"
},
{
"EventName": "ex_tagged_ibs_ops.ibs_tagged_ops_ret",
"EventCode": "0x1cf",
"BriefDescription": "Ops tagged by IBS that retired.",
"UMask": "0x02"
},
{
"EventName": "ex_ret_fused_instr",
"EventCode": "0x1d0",
"BriefDescription": "Retired fused instructions."
}
]
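As an illustration of how the "EventCode" and "UMask" fields above could map onto a raw PMU event, here is a minimal sketch. It assumes the AMD core PMU layout where the event select occupies config bits 0-7 and 32-35 and the unit mask occupies bits 8-15; that layout is an assumption here and should be checked against /sys/bus/event_source/devices/cpu/format/ on the target machine. The helper names `amd_raw_config` and `perf_raw_event` are hypothetical and not part of perf.

```python
def amd_raw_config(event_code: int, umask: int = 0) -> int:
    """Pack a 12-bit event code and an 8-bit unit mask into a raw config value.

    Assumed layout (verify in sysfs): event = config:0-7,32-35, umask = config:8-15.
    """
    if not 0 <= event_code <= 0xFFF:
        raise ValueError("event code must fit in 12 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    low = event_code & 0xFF          # lower 8 bits of the event select
    high = (event_code >> 8) & 0xF   # upper 4 bits, placed at config bit 32
    return low | (umask << 8) | (high << 32)


def perf_raw_event(event_code: int, umask: int = 0) -> str:
    """Format the packed value the way a raw perf event is usually written (rNNN)."""
    return f"r{amd_raw_config(event_code, umask):x}"


if __name__ == "__main__":
    # ex_no_retire.not_complete: EventCode 0xd6, UMask 0x02
    print(perf_raw_event(0xD6, 0x02))
    # ex_tagged_ibs_ops.ibs_tagged_ops: EventCode 0x1cf, UMask 0x01
    print(perf_raw_event(0x1CF, 0x01))
```

Event codes above 0xff, such as 0x1c1 or 0x1cf, are the reason the upper bits need separate handling in this sketch.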
[
{
"EventName": "ls_bad_status2.stli_other",
"EventCode": "0x24",
"BriefDescription": "Store-to-load conflicts (load unable to complete due to a non-forwardable conflict with an older store).",
"UMask": "0x02"
},
{
"EventName": "ls_dispatch.ld_dispatch",
"EventCode": "0x29",
"BriefDescription": "Number of memory load operations dispatched to the load-store unit.",
"UMask": "0x01"
},
{
"EventName": "ls_dispatch.store_dispatch",
"EventCode": "0x29",
"BriefDescription": "Number of memory store operations dispatched to the load-store unit.",
"UMask": "0x02"
},
{
"EventName": "ls_dispatch.ld_st_dispatch",
"EventCode": "0x29",
"BriefDescription": "Number of memory load-store operations dispatched to the load-store unit.",
"UMask": "0x04"
},
{
"EventName": "ls_stlf",
"EventCode": "0x35",
"BriefDescription": "Store-to-load-forward (STLF) hits."
},
{
"EventName": "ls_st_commit_cancel2.st_commit_cancel_wcb_full",
"EventCode": "0x37",
"BriefDescription": "Non-cacheable store commits cancelled due to the non-cacheable commit buffer being full.",
"UMask": "0x01"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_4k_l2_hit",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB hits for 4k pages.",
"UMask": "0x01"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_coalesced_page_hit",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB hits for coalesced pages. A coalesced page is a 16k page created from four adjacent 4k pages.",
"UMask": "0x02"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_2m_l2_hit",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB hits for 2M pages.",
"UMask": "0x04"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_1g_l2_hit",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB hits for 1G pages.",
"UMask": "0x08"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_4k_l2_miss",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB misses (page-table walks are requested) for 4k pages.",
"UMask": "0x10"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_coalesced_page_miss",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB misses (page-table walks are requested) for coalesced pages. A coalesced page is a 16k page created from four adjacent 4k pages.",
"UMask": "0x20"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_2m_l2_miss",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB misses (page-table walks are requested) for 2M pages.",
"UMask": "0x40"
},
{
"EventName": "ls_l1_d_tlb_miss.tlb_reload_1g_l2_miss",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB misses (page-table walks are requested) for 1G pages.",
"UMask": "0x80"
},
{
"EventName": "ls_l1_d_tlb_miss.all_l2_miss",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses with L2 DTLB misses (page-table walks are requested) for all page sizes.",
"UMask": "0xf0"
},
{
"EventName": "ls_l1_d_tlb_miss.all",
"EventCode": "0x45",
"BriefDescription": "L1 DTLB misses for all page sizes.",
"UMask": "0xff"
},
{
"EventName": "ls_misal_loads.ma64",
"EventCode": "0x47",
"BriefDescription": "64B misaligned (cacheline crossing) loads.",
"UMask": "0x01"
},
{
"EventName": "ls_misal_loads.ma4k",
"EventCode": "0x47",
"BriefDescription": "4kB misaligned (page crossing) loads.",
"UMask": "0x02"
},
{
"EventName": "ls_tlb_flush.all",
"EventCode": "0x78",
"BriefDescription": "All TLB Flushes.",
"UMask": "0xff"
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_hit",
"EventCode": "0x84",
"BriefDescription": "Instruction fetches that miss in the L1 ITLB but hit in the L2 ITLB."
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_miss.if4k",
"EventCode": "0x85",
"BriefDescription": "Instruction fetches that miss in both the L1 and L2 ITLBs (page-table walks are requested) for 4k pages.",
"UMask": "0x01"
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_miss.if2m",
"EventCode": "0x85",
"BriefDescription": "Instruction fetches that miss in both the L1 and L2 ITLBs (page-table walks are requested) for 2M pages.",
"UMask": "0x02"
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_miss.if1g",
"EventCode": "0x85",
"BriefDescription": "Instruction fetches that miss in both the L1 and L2 ITLBs (page-table walks are requested) for 1G pages.",
"UMask": "0x04"
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_miss.coalesced_4k",
"EventCode": "0x85",
"BriefDescription": "Instruction fetches that miss in both the L1 and L2 ITLBs (page-table walks are requested) for coalesced pages. A coalesced page is a 16k page created from four adjacent 4k pages.",
"UMask": "0x08"
},
{
"EventName": "bp_l1_tlb_miss_l2_tlb_miss.all",
"EventCode": "0x85",
"BriefDescription": "Instruction fetches that miss in both the L1 and L2 ITLBs (page-table walks are requested) for all page sizes.",
"UMask": "0x0f"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if4k",
"EventCode": "0x94",
"BriefDescription": "Instruction fetches that hit in the L1 ITLB for 4k or coalesced pages. A coalesced page is a 16k page created from four adjacent 4k pages.",
"UMask": "0x01"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if2m",
"EventCode": "0x94",
"BriefDescription": "Instruction fetches that hit in the L1 ITLB for 2M pages.",
"UMask": "0x02"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if1g",
"EventCode": "0x94",
"BriefDescription": "Instruction fetches that hit in the L1 ITLB for 1G pages.",
"UMask": "0x04"
},
{
"EventName": "bp_l1_tlb_fetch_hit.all",
"EventCode": "0x94",
"BriefDescription": "Instruction fetches that hit in the L1 ITLB for all page sizes.",
"UMask": "0x07"
}
]
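The aggregate entries in the list above ("all_l2_miss", "all") carry unit masks that look like the bitwise OR of the per-page-size masks. The following sketch checks that arithmetic for the values listed; the dictionaries simply restate the masks from the JSON, and the helper name `combine_umasks` is hypothetical.

```python
from functools import reduce
from operator import or_


def combine_umasks(umasks):
    """Return the bitwise OR of an iterable of unit masks."""
    return reduce(or_, umasks, 0)


# Unit masks for ls_l1_d_tlb_miss (EventCode 0x45), copied from the entries above.
L2_HIT = {"4k": 0x01, "coalesced": 0x02, "2m": 0x04, "1g": 0x08}
L2_MISS = {"4k": 0x10, "coalesced": 0x20, "2m": 0x40, "1g": 0x80}

# Unit masks for bp_l1_tlb_miss_l2_tlb_miss (EventCode 0x85).
ITLB_MISS = {"if4k": 0x01, "if2m": 0x02, "if1g": 0x04, "coalesced_4k": 0x08}

if __name__ == "__main__":
    print(hex(combine_umasks(L2_MISS.values())))                    # all_l2_miss
    print(hex(combine_umasks([*L2_HIT.values(), *L2_MISS.values()])))  # all
    print(hex(combine_umasks(ITLB_MISS.values())))                  # ITLB all
```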
[
{
"EventName": "resyncs_or_nc_redirects",
"EventCode": "0x96",
"BriefDescription": "Pipeline restarts not caused by branch mispredicts."
},
{
"EventName": "de_op_queue_empty",
"EventCode": "0xa9",
"BriefDescription": "Cycles when the op queue is empty. Such cycles indicate that the front-end is not delivering instructions fast enough."
},
{
"EventName": "de_src_op_disp.decoder",
"EventCode": "0xaa",
"BriefDescription": "Ops fetched from instruction cache and dispatched.",
"UMask": "0x01"
},
{
"EventName": "de_src_op_disp.op_cache",
"EventCode": "0xaa",
"BriefDescription": "Ops fetched from op cache and dispatched.",
"UMask": "0x02"
},
{
"EventName": "de_src_op_disp.loop_buffer",
"EventCode": "0xaa",
"BriefDescription": "Ops dispatched from loop buffer.",
"UMask": "0x04"
},
{
"EventName": "de_src_op_disp.all",
"EventCode": "0xaa",
"BriefDescription": "Ops dispatched from any source.",
"UMask": "0x07"
},
{
"EventName": "de_dis_ops_from_decoder.any_fp_dispatch",
"EventCode": "0xab",
"BriefDescription": "Number of ops dispatched to the floating-point unit.",
"UMask": "0x04"
},
{
"EventName": "de_dis_ops_from_decoder.disp_op_type.any_integer_dispatch",
"EventCode": "0xab",
"BriefDescription": "Number of ops dispatched to the integer execution unit.",
"UMask": "0x08"
},
{
"EventName": "de_dis_dispatch_token_stalls1.int_phy_reg_file_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for integer physical register file tokens.",
"UMask": "0x01"
},
{
"EventName": "de_dis_dispatch_token_stalls1.load_queue_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for Load queue token.",
"UMask": "0x02"
},
{
"EventName": "de_dis_dispatch_token_stalls1.store_queue_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for store queue tokens.",
"UMask": "0x04"
},
{
"EventName": "de_dis_dispatch_token_stalls1.taken_brnch_buffer_rsrc",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for taken branch buffer tokens.",
"UMask": "0x10"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_reg_file_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for floating-point register file tokens.",
"UMask": "0x20"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_sch_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for floating-point scheduler tokens.",
"UMask": "0x40"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_flush_recovery_stall",
"EventCode": "0xae",
"BriefDescription": "Number of cycles dispatch is stalled for floating-point flush recovery.",
"UMask": "0x80"
},
{
"EventName": "de_dis_dispatch_token_stalls2.int_sch0_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Number of cycles dispatch is stalled for integer scheduler queue 0 tokens.",
"UMask": "0x01"
},
{
"EventName": "de_dis_dispatch_token_stalls2.int_sch1_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Number of cycles dispatch is stalled for integer scheduler queue 1 tokens.",
"UMask": "0x02"
},
{
"EventName": "de_dis_dispatch_token_stalls2.int_sch2_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Number of cycles dispatch is stalled for integer scheduler queue 2 tokens.",
"UMask": "0x04"
},
{
"EventName": "de_dis_dispatch_token_stalls2.int_sch3_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Number of cycles dispatch is stalled for integer scheduler queue 3 tokens.",
"UMask": "0x08"
},
{
"EventName": "de_dis_dispatch_token_stalls2.retire_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Number of cycles dispatch is stalled for retire queue tokens.",
"UMask": "0x20"
},
{
"EventName": "de_no_dispatch_per_slot.no_ops_from_frontend",
"EventCode": "0x1a0",
"BriefDescription": "In each cycle counts dispatch slots left empty because the front-end did not supply ops.",
"UMask": "0x01"
},
{
"EventName": "de_no_dispatch_per_slot.backend_stalls",
"EventCode": "0x1a0",
"BriefDescription": "In each cycle counts ops unable to dispatch because of back-end stalls.",
"UMask": "0x1e"
},
{
"EventName": "de_no_dispatch_per_slot.smt_contention",
"EventCode": "0x1a0",
"BriefDescription": "In each cycle counts ops unable to dispatch because the dispatch cycle was granted to the other SMT thread.",
"UMask": "0x60"
}
]
[
{
"MetricName": "total_dispatch_slots",
"BriefDescription": "Total dispatch slots (up to 6 instructions can be dispatched in each cycle).",
"MetricExpr": "6 * ls_not_halted_cyc"
},
{
"MetricName": "frontend_bound",
"BriefDescription": "Fraction of dispatch slots that remained unused because the frontend did not supply enough instructions/ops.",
"MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend, total_dispatch_slots)",
"MetricGroup": "PipelineL1",
"ScaleUnit": "100%"
},
{
"MetricName": "bad_speculation",
"BriefDescription": "Fraction of dispatched ops that did not retire.",
"MetricExpr": "d_ratio(de_src_op_disp.all - ex_ret_ops, total_dispatch_slots)",
"MetricGroup": "PipelineL1",
"ScaleUnit": "100%"
},
{
"MetricName": "backend_bound",
"BriefDescription": "Fraction of dispatch slots that remained unused because of backend stalls.",
"MetricExpr": "d_ratio(de_no_dispatch_per_slot.backend_stalls, total_dispatch_slots)",
"MetricGroup": "PipelineL1",
"ScaleUnit": "100%"
},
{
"MetricName": "smt_contention",
"BriefDescription": "Fraction of dispatch slots that remained unused because the other thread was selected.",
"MetricExpr": "d_ratio(de_no_dispatch_per_slot.smt_contention, total_dispatch_slots)",
"MetricGroup": "PipelineL1",
"ScaleUnit": "100%"
},
{
"MetricName": "retiring",
"BriefDescription": "Fraction of dispatch slots used by ops that retired.",
"MetricExpr": "d_ratio(ex_ret_ops, total_dispatch_slots)",
"MetricGroup": "PipelineL1",
"ScaleUnit": "100%"
},
{
"MetricName": "frontend_bound_latency",
"BriefDescription": "Fraction of dispatch slots that remained unused because of a latency bottleneck in the frontend (such as instruction cache or TLB misses).",
"MetricExpr": "d_ratio((6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
"MetricGroup": "PipelineL2;frontend_bound_group",
"ScaleUnit": "100%"
},
{
"MetricName": "frontend_bound_bandwidth",
"BriefDescription": "Fraction of dispatch slots that remained unused because of a bandwidth bottleneck in the frontend (such as decode or op cache fetch bandwidth).",
"MetricExpr": "d_ratio(de_no_dispatch_per_slot.no_ops_from_frontend - (6 * cpu@de_no_dispatch_per_slot.no_ops_from_frontend\\,cmask\\=0x6@), total_dispatch_slots)",
"MetricGroup": "PipelineL2;frontend_bound_group",
"ScaleUnit": "100%"
},
{
"MetricName": "bad_speculation_mispredicts",
"BriefDescription": "Fraction of dispatched ops that were flushed due to branch mispredicts.",
"MetricExpr": "d_ratio(bad_speculation * ex_ret_brn_misp, ex_ret_brn_misp + resyncs_or_nc_redirects)",
"MetricGroup": "PipelineL2;bad_speculation_group",
"ScaleUnit": "100%"
},
{
"MetricName": "bad_speculation_pipeline_restarts",
"BriefDescription": "Fraction of dispatched ops that were flushed due to pipeline restarts (resyncs).",
"MetricExpr": "d_ratio(bad_speculation * resyncs_or_nc_redirects, ex_ret_brn_misp + resyncs_or_nc_redirects)",
"MetricGroup": "PipelineL2;bad_speculation_group",
"ScaleUnit": "100%"
},
{
"MetricName": "backend_bound_memory",
"BriefDescription": "Fraction of dispatch slots that remained unused because of stalls due to the memory subsystem.",
"MetricExpr": "backend_bound * d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete)",
"MetricGroup": "PipelineL2;backend_bound_group",
"ScaleUnit": "100%"
},
{
"MetricName": "backend_bound_cpu",
"BriefDescription": "Fraction of dispatch slots that remained unused because of stalls not related to the memory subsystem.",
"MetricExpr": "backend_bound * (1 - d_ratio(ex_no_retire.load_not_complete, ex_no_retire.not_complete))",
"MetricGroup": "PipelineL2;backend_bound_group",
"ScaleUnit": "100%"
},
{
"MetricName": "retiring_fastpath",
"BriefDescription": "Fraction of dispatch slots used by fastpath ops that retired.",
"MetricExpr": "retiring * (1 - d_ratio(ex_ret_ucode_ops, ex_ret_ops))",
"MetricGroup": "PipelineL2;retiring_group",
"ScaleUnit": "100%"
},
{
"MetricName": "retiring_microcode",
"BriefDescription": "Fraction of dispatch slots used by microcode ops that retired.",
"MetricExpr": "retiring * d_ratio(ex_ret_ucode_ops, ex_ret_ops)",
"MetricGroup": "PipelineL2;retiring_group",
"ScaleUnit": "100%"
}
]
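To make the "MetricExpr" arithmetic above concrete, here is a minimal sketch that evaluates the PipelineL1 expressions and the backend PipelineL2 split from a dictionary of event counts. It assumes `d_ratio` returns 0 when the denominator is 0; the function names `pipeline_l1` and `backend_split` and the sample counts are made up for illustration and are not measurements.

```python
def d_ratio(numerator: float, denominator: float) -> float:
    """Division that yields 0.0 instead of failing on a zero denominator."""
    return numerator / denominator if denominator else 0.0


def pipeline_l1(counts: dict, slots_per_cycle: int = 6) -> dict:
    """Evaluate the PipelineL1 metric expressions from raw event counts."""
    total_slots = slots_per_cycle * counts["ls_not_halted_cyc"]
    return {
        "frontend_bound": d_ratio(
            counts["de_no_dispatch_per_slot.no_ops_from_frontend"], total_slots),
        "bad_speculation": d_ratio(
            counts["de_src_op_disp.all"] - counts["ex_ret_ops"], total_slots),
        "backend_bound": d_ratio(
            counts["de_no_dispatch_per_slot.backend_stalls"], total_slots),
        "smt_contention": d_ratio(
            counts["de_no_dispatch_per_slot.smt_contention"], total_slots),
        "retiring": d_ratio(counts["ex_ret_ops"], total_slots),
    }


def backend_split(backend_bound: float, counts: dict) -> dict:
    """Split backend_bound into memory and CPU parts, as in the PipelineL2 entries."""
    memory_share = d_ratio(counts["ex_no_retire.load_not_complete"],
                           counts["ex_no_retire.not_complete"])
    return {
        "backend_bound_memory": backend_bound * memory_share,
        "backend_bound_cpu": backend_bound * (1 - memory_share),
    }


# Hypothetical counts, chosen only so the arithmetic is easy to follow.
SAMPLE = {
    "ls_not_halted_cyc": 1000,
    "de_no_dispatch_per_slot.no_ops_from_frontend": 1200,
    "de_no_dispatch_per_slot.backend_stalls": 1200,
    "de_no_dispatch_per_slot.smt_contention": 300,
    "de_src_op_disp.all": 3300,
    "ex_ret_ops": 3000,
    "ex_no_retire.load_not_complete": 100,
    "ex_no_retire.not_complete": 400,
}

if __name__ == "__main__":
    level1 = pipeline_l1(SAMPLE)
    for name, value in level1.items():
        print(f"{name}: {value:.1%}")
    for name, value in backend_split(level1["backend_bound"], SAMPLE).items():
        print(f"{name}: {value:.1%}")
```

The two backend sub-metrics add up to backend_bound by construction, since the same ratio is used for one and its complement for the other.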
[
{
"BriefDescription": "Floating point assists for retired operations.",
"Counter": "0,1",
"EventCode": "0x11",
"EventName": "FP_ASSIST.AR",
"SampleAfterValue": "10000",
......@@ -9,7 +8,6 @@
},
{
"BriefDescription": "Floating point assists.",
"Counter": "0,1",
"EventCode": "0x11",
"EventName": "FP_ASSIST.S",
"SampleAfterValue": "10000",
......@@ -17,15 +15,12 @@
},
{
"BriefDescription": "SIMD assists invoked.",
"Counter": "0,1",
"EventCode": "0xCD",
"EventName": "SIMD_ASSIST",
"SampleAfterValue": "100000",
"UMask": "0x0"
"SampleAfterValue": "100000"
},
{
"BriefDescription": "Retired computational Streaming SIMD Extensions (SSE) packed-single instructions.",
"Counter": "0,1",
"EventCode": "0xCA",
"EventName": "SIMD_COMP_INST_RETIRED.PACKED_SINGLE",
"SampleAfterValue": "2000000",
......@@ -33,7 +28,6 @@
},
{
"BriefDescription": "Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions.",
"Counter": "0,1",
"EventCode": "0xCA",
"EventName": "SIMD_COMP_INST_RETIRED.SCALAR_DOUBLE",
"SampleAfterValue": "2000000",
......@@ -41,7 +35,6 @@
},
{
"BriefDescription": "Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions.",
"Counter": "0,1",
"EventCode": "0xCA",
"EventName": "SIMD_COMP_INST_RETIRED.SCALAR_SINGLE",
"SampleAfterValue": "2000000",
......@@ -49,15 +42,12 @@
},
{
"BriefDescription": "SIMD Instructions retired.",
"Counter": "0,1",
"EventCode": "0xCE",
"EventName": "SIMD_INSTR_RETIRED",
"SampleAfterValue": "2000000",
"UMask": "0x0"
"SampleAfterValue": "2000000"
},
{
"BriefDescription": "Retired Streaming SIMD Extensions (SSE) packed-single instructions.",
"Counter": "0,1",
"EventCode": "0xC7",
"EventName": "SIMD_INST_RETIRED.PACKED_SINGLE",
"SampleAfterValue": "2000000",
......@@ -65,7 +55,6 @@
},
{
"BriefDescription": "Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions.",
"Counter": "0,1",
"EventCode": "0xC7",
"EventName": "SIMD_INST_RETIRED.SCALAR_DOUBLE",
"SampleAfterValue": "2000000",
......@@ -73,7 +62,6 @@
},
{
"BriefDescription": "Retired Streaming SIMD Extensions (SSE) scalar-single instructions.",
"Counter": "0,1",
"EventCode": "0xC7",
"EventName": "SIMD_INST_RETIRED.SCALAR_SINGLE",
"SampleAfterValue": "2000000",
......@@ -81,7 +69,6 @@
},
{
"BriefDescription": "Retired Streaming SIMD Extensions 2 (SSE2) vector instructions.",
"Counter": "0,1",
"EventCode": "0xC7",
"EventName": "SIMD_INST_RETIRED.VECTOR",
"SampleAfterValue": "2000000",
......@@ -89,15 +76,12 @@
},
{
"BriefDescription": "Saturated arithmetic instructions retired.",
"Counter": "0,1",
"EventCode": "0xCF",
"EventName": "SIMD_SAT_INSTR_RETIRED",
"SampleAfterValue": "2000000",
"UMask": "0x0"
"SampleAfterValue": "2000000"
},
{
"BriefDescription": "SIMD saturated arithmetic micro-ops retired.",
"Counter": "0,1",
"EventCode": "0xB1",
"EventName": "SIMD_SAT_UOP_EXEC.AR",
"SampleAfterValue": "2000000",
......@@ -105,15 +89,12 @@
},
{
"BriefDescription": "SIMD saturated arithmetic micro-ops executed.",
"Counter": "0,1",
"EventCode": "0xB1",
"EventName": "SIMD_SAT_UOP_EXEC.S",
"SampleAfterValue": "2000000",
"UMask": "0x0"
"SampleAfterValue": "2000000"
},
{
"BriefDescription": "SIMD micro-ops retired (excluding stores).",
"Counter": "0,1",
"EventCode": "0xB0",
"EventName": "SIMD_UOPS_EXEC.AR",
"PEBS": "2",
......@@ -122,15 +103,12 @@
},
{
"BriefDescription": "SIMD micro-ops executed (excluding stores).",
"Counter": "0,1",
"EventCode": "0xB0",
"EventName": "SIMD_UOPS_EXEC.S",
"SampleAfterValue": "2000000",
"UMask": "0x0"
"SampleAfterValue": "2000000"
},
{
"BriefDescription": "SIMD packed arithmetic micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.ARITHMETIC.AR",
"SampleAfterValue": "2000000",
......@@ -138,7 +116,6 @@
},
{
"BriefDescription": "SIMD packed arithmetic micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.ARITHMETIC.S",
"SampleAfterValue": "2000000",
......@@ -146,7 +123,6 @@
},
{
"BriefDescription": "SIMD packed logical micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.LOGICAL.AR",
"SampleAfterValue": "2000000",
......@@ -154,7 +130,6 @@
},
{
"BriefDescription": "SIMD packed logical micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.LOGICAL.S",
"SampleAfterValue": "2000000",
......@@ -162,7 +137,6 @@
},
{
"BriefDescription": "SIMD packed multiply micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.MUL.AR",
"SampleAfterValue": "2000000",
......@@ -170,7 +144,6 @@
},
{
"BriefDescription": "SIMD packed multiply micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.MUL.S",
"SampleAfterValue": "2000000",
......@@ -178,7 +151,6 @@
},
{
"BriefDescription": "SIMD packed micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.PACK.AR",
"SampleAfterValue": "2000000",
......@@ -186,7 +158,6 @@
},
{
"BriefDescription": "SIMD packed micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.PACK.S",
"SampleAfterValue": "2000000",
......@@ -194,7 +165,6 @@
},
{
"BriefDescription": "SIMD packed shift micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.SHIFT.AR",
"SampleAfterValue": "2000000",
......@@ -202,7 +172,6 @@
},
{
"BriefDescription": "SIMD packed shift micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.SHIFT.S",
"SampleAfterValue": "2000000",
......@@ -210,7 +179,6 @@
},
{
"BriefDescription": "SIMD unpacked micro-ops retired",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.UNPACK.AR",
"SampleAfterValue": "2000000",
......@@ -218,7 +186,6 @@
},
{
"BriefDescription": "SIMD unpacked micro-ops executed",
"Counter": "0,1",
"EventCode": "0xB3",
"EventName": "SIMD_UOP_TYPE_EXEC.UNPACK.S",
"SampleAfterValue": "2000000",
......@@ -226,7 +193,6 @@
},
{
"BriefDescription": "Floating point computational micro-ops retired.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "X87_COMP_OPS_EXE.ANY.AR",
"PEBS": "2",
......@@ -235,7 +201,6 @@
},
{
"BriefDescription": "Floating point computational micro-ops executed.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "X87_COMP_OPS_EXE.ANY.S",
"SampleAfterValue": "2000000",
......@@ -243,7 +208,6 @@
},
{
"BriefDescription": "FXCH uops retired.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "X87_COMP_OPS_EXE.FXCH.AR",
"PEBS": "2",
......@@ -252,7 +216,6 @@
},
{
"BriefDescription": "FXCH uops executed.",
"Counter": "0,1",
"EventCode": "0x10",
"EventName": "X87_COMP_OPS_EXE.FXCH.S",
"SampleAfterValue": "2000000",
......
[
{
"BriefDescription": "Unhalted core cycles when the thread is in ring 0",
"Counter": "0,1,2,3",
"CounterHTOff": "0,1,2,3,4,5,6,7",
"EventCode": "0x5C",
"EventName": "CPL_CYCLES.RING0",
"PublicDescription": "This event counts the unhalted core cycles during which the thread is in the ring 0 privileged mode.",
......@@ -11,8 +9,6 @@
},
{
"BriefDescription": "Number of intervals between processor halts while thread is in ring 0",
"Counter": "0,1,2,3",
"CounterHTOff": "0,1,2,3,4,5,6,7",
"CounterMask": "1",
"EdgeDetect": "1",
"EventCode": "0x5C",
......@@ -23,8 +19,6 @@
},
{
"BriefDescription": "Unhalted core cycles when thread is in rings 1, 2, or 3",
"Counter": "0,1,2,3",
"CounterHTOff": "0,1,2,3,4,5,6,7",
"EventCode": "0x5C",
"EventName": "CPL_CYCLES.RING123",
"PublicDescription": "This event counts unhalted core cycles during which the thread is in rings 1, 2, or 3.",
......@@ -33,8 +27,6 @@
},
{
"BriefDescription": "Cycles when L1 and L2 are locked due to UC or split lock",
"Counter": "0,1,2,3",
"CounterHTOff": "0,1,2,3,4,5,6,7",
"EventCode": "0x63",
"EventName": "LOCK_CYCLES.SPLIT_LOCK_UC_LOCK_DURATION",
"PublicDescription": "This event counts cycles in which the L1 and L2 are locked due to a UC lock or split lock. A lock is asserted in case of locked memory access, due to noncacheable memory, locked operation that spans two cache lines, or a page walk from the noncacheable page table. L1D and L2 locks have a very high performance penalty and it is highly recommended to avoid such access.",
......
[
{
"BriefDescription": "Cycles the FP divide unit is busy",
"CollectPEBSRecord": "1",
"Counter": "0,1,2,3",
"EventCode": "0xCD",
"EventName": "CYCLES_DIV_BUSY.FPDIV",
"PublicDescription": "Counts core cycles the floating point divide unit is busy.",
......@@ -11,8 +9,6 @@
},
{
"BriefDescription": "Machine clears due to FP assists",
"CollectPEBSRecord": "1",
"Counter": "0,1,2,3",
"EventCode": "0xC3",
"EventName": "MACHINE_CLEARS.FP_ASSIST",
"PublicDescription": "Counts machine clears due to floating point (FP) operations needing assists. For instance, if the result was a floating point denormal, the hardware clears the pipeline and reissues uops to produce the correct IEEE compliant denormal result.",
......@@ -21,8 +17,6 @@
},
{
"BriefDescription": "Floating point divide uops retired. (Precise Event Capable)",
"CollectPEBSRecord": "1",
"Counter": "0,1,2,3",
"EventCode": "0xC2",
"EventName": "UOPS_RETIRED.FPDIV",
"PEBS": "2",
......