1. 12 Apr, 2024 1 commit
  2. 08 Apr, 2024 11 commits
  3. 05 Apr, 2024 2 commits
    • perf script: Add capstone support for '-F +brstackdisasm' · d8120446
      Andi Kleen authored
      Support capstone output for the '-F +brstackinsn' branch dump.
      
      The new output is enabled with the new field 'brstackdisasm'.
      
      This was already possible with --xed; now it also works for users
      who don't have xed, through the built-in capstone support.
      
      Before:
      
        perf record -b emacs -Q --batch '()'
        perf script -F +brstackinsn
        ...
                  emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [3]
                00007f0ab2d17105        insn: 73 51
                00007f0ab2d17107        insn: 48 89 c1
                00007f0ab2d1710a        insn: 48 39 ca
                00007f0ab2d1710d        insn: 73 96
                00007f0ab2d1710f        insn: 48 8d 04 11
                00007f0ab2d17113        insn: 48 d1 e8
                00007f0ab2d17116        insn: 49 8d 34 c1
                00007f0ab2d1711a        insn: 44 3a 06
                00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        insn: 73 51                     # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        insn: 48 8d 50 01
                00007f0ab2d1715c        insn: eb 92                     # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        insn: 48 39 ca
                00007f0ab2d170f3        insn: 73 b0                     # PRED 1 cycles [9] 2.00 IPC
      
      After (perf must be compiled with capstone):
      
        perf script -F +brstackdisasm
      
        ...
                   emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [3]
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128
                00007f0ab2d17107        movq %rax, %rcx
                00007f0ab2d1710a        cmpq %rcx, %rdx
                00007f0ab2d1710d        jae intel_check_word.constprop.0+0x75
                00007f0ab2d1710f        leaq (%rcx, %rdx), %rax
                00007f0ab2d17113        shrq $1, %rax
                00007f0ab2d17116        leaq (%r9, %rax, 8), %rsi
                00007f0ab2d1711a        cmpb (%rsi), %r8b
                00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [6] 3.00 IPC
                00007f0ab2d17105        jae intel_check_word.constprop.0+0x128  # PRED 1 cycles [7] 1.00 IPC
                00007f0ab2d17158        leaq 1(%rax), %rdx
                00007f0ab2d1715c        jmp intel_check_word.constprop.0+0xc0   # PRED 1 cycles [8] 2.00 IPC
                00007f0ab2d170f0        cmpq %rcx, %rdx
                00007f0ab2d170f3        jae intel_check_word.constprop.0+0x75   # PRED 1 cycles [9] 2.00 IPC
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-3-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support 32bit code under 64bit OS with capstone · 38ab6013
      Andi Kleen authored
      Use the DSO to determine whether an IP belongs to 32bit or 64bit code
      and configure capstone for the matching mode. This allows 32bit code
      to be disassembled correctly under a 64bit OS.
      
        % cat > loop.c
        volatile int var;
        int main(void)
        {
        	int i;
        	for (i = 0; i < 100000; i++)
        		var++;
        }
        % gcc -m32 -o loop loop.c
        % perf record -e cycles:u ./loop
        % perf script -F +disasm
          loop   82665 1833176.618023:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618029:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618031:      7 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618034:     91 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
          loop   82665 1833176.618036:   1242 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Link: https://lore.kernel.org/r/20240401210925.209671-2-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 04 Apr, 2024 2 commits
    • perf stat: Do not fail on metrics on s390 z/VM systems · c2f3d7df
      Thomas Richter authored
      On s390 z/VM virtual machines, the 'perf list' command also displays metrics:
      
        # perf list | grep -A 20 'Metric Groups:'
        Metric Groups:
      
        No_group:
         cpi
              [Cycles per Instruction]
         est_cpi
              [Estimated Instruction Complexity CPI infinite Level 1]
         finite_cpi
              [Cycles per Instructions from Finite cache/memory]
         l1mp
              [Level One Miss per 100 Instructions]
         l2p
              [Percentage sourced from Level 2 cache]
         l3p
              [Percentage sourced from Level 3 on same chip cache]
         l4lp
              [Percentage sourced from Level 4 Local cache on same book]
         l4rp
              [Percentage sourced from Level 4 Remote cache on different book]
         memp
              [Percentage sourced from memory]
         ....
        #
      
      However, the following command fails:
      
        # perf stat -M cpi -- true
        event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/.....'
                              \___ Bad event or PMU
      
        Unable to find PMU or event on a PMU of 'CPU_CYCLES'
      
         event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/...'
                              \___ Cannot find PMU `CPU_CYCLES'.
                                   Missing kernel support?
       #
      
      'perf stat' should not fail on metrics when the referenced CPU
      Counter Measurement PMU is not available.
      
      Output after:
      
        # perf stat -M est_cpi -- sleep 1
      
        Performance counter stats for 'sleep 1':
      
           1,000,887,494 ns   duration_time   #     0.00 est_cpi
      
             1.000887494 seconds time elapsed
      
             0.000143000 seconds user
             0.000662000 seconds sys
      
       #
      
      Fixes: 7f76b311 ("perf list: Add IBM z16 event description for s390")
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-2-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Fix PAI counter names for s390 virtual machines · b74bc5a6
      Thomas Richter authored
      s390 introduced the Processor Activity Instrumentation (PAI) counter
      facility on LPAR and virtual machines z/VM for models 3931 and 3932.
      
      These counters are stored as raw data in the perf.data file and are
      displayed with:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 <unknown> Value:0x00000000000186a0
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
      	Counter:032 <unknown> Value:0x0000000000000001
       #
      
      However on z/VM virtual machines, the counter names are not retrieved
      from the PMU and are shown as '<unknown>'.  This is caused by the CPU
      string saved in the mapfile.csv for this machine:
      
         ^IBM.393[12].*3\.7.[[:xdigit:]]+$,3,cf_z16,core
      
      This string contains the CPU Measurement facility first and second
      version number and authorization level (3\.7.[[:xdigit:]]+).  These
      numbers do not apply to the PAI counter facility.  In fact they can be
      omitted.
      
      Shorten the CPU identification string for this machine to manufacturer
      and model. This is sufficient for all PMU devices.
      
      Output after:
      
       # perf report -i /tmp//perfout-635468 -D | grep Counter
      	Counter:007 km_aes_128 Value:0x00000000000186a0
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
      	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
       #
      
      Fixes: b539deaf ("perf report: Add s390 raw data interpretation for PAI counters")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240404064806.1362876-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 03 Apr, 2024 13 commits
    • perf annotate: Initialize 'arch' variable not to trip some -Werror=maybe-uninitialized · b6347cb5
      Arnaldo Carvalho de Melo authored
      On some older distros the build fails with
      -Werror=maybe-uninitialized. The warning is a false positive, because
      'arch' gets initialized by evsel__get_arch(). To silence it, make sure
      'arch' is initialized to NULL before returning from
      evsel__get_arch(), as suggested by Ian Rogers.
      
      E.g.:
      
          32    17.12 opensuse:15.5                 : FAIL gcc version 7.5.0 (SUSE Linux)
              util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2269:15: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
            struct arch *arch;
                         ^~~~
          cc1: all warnings being treated as errors
      
            43     7.30 ubuntu:18.04-x-powerpc64el    : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
          util/annotate.c: In function 'hist_entry__get_data_type':
          util/annotate.c:2351:36: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
             if (map__dso(ms->map)->kernel && arch__is(arch, "x86") &&
                                              ^~~~~~~~~~~~~~~~~~~~~
          cc1: all warnings being treated as errors
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/CAP-5=fUqtjxAsmdGrnkjhUTLHs-JvV10TtxyocpYDJK_+LYTiQ@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Add LIBTRACEEVENT_DIR build option · baa2ca59
      Yang Jihong authored
      Currently, when libtraceevent is not linked,
      perf does not support tracepoints:
      
        # ./perf record -e sched:sched_switch -a sleep 10
        event syntax error: 'sched:sched_switch'
                             \___ unsupported tracepoint
      
        libtraceevent is necessary for tracepoint support
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      In a cross-compilation scenario, the library may not be installed in the
      default system path. To handle this case, add a LIBTRACEEVENT_DIR build
      option that specifies the path of libtraceevent.
      
      Example:
      
        1. Cross compile libtraceevent
        # cd /opt/libtraceevent
        # CROSS_COMPILE=aarch64-linux-gnu- make
      
        2. Cross compile perf
        # cd tools/perf
        # make VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- NO_LIBELF=1 LDFLAGS=--static LIBTRACEEVENT_DIR=/opt/libtraceevent
        <SNIP>
        Auto-detecting system features:
        <SNIP>
        ...                       LIBTRACEEVENT_DIR: /opt/libtraceevent
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240314063000.2139877-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Fix AT_EACCESS undeclared build error for system with kernel versions lower than v5.8 · 089ef2f4
      Yang Jihong authored
      In the environment of ubuntu 20.04 (the version of kernel headers is
      5.4), there is an error in building perf:
      
          CC      trace/beauty/fs_at_flags.o
        trace/beauty/fs_at_flags.c: In function ‘faccessat2__scnprintf_flags’:
        trace/beauty/fs_at_flags.c:35:14: error: ‘AT_EACCESS’ undeclared (first use in this function); did you mean ‘DN_ACCESS’?
           35 |  if (flags & AT_EACCESS) {
              |              ^~~~~~~~~~
              |              DN_ACCESS
        trace/beauty/fs_at_flags.c:35:14: note: each undeclared identifier is reported only once for each function it appears in
      
      Commit 8a1ad441 ("tools headers: Remove now unused copies of
      uapi/{fcntl,openat2}.h and asm/fcntl.h") removed fcntl.h from the tools
      headers directory, but fs_at_flags.c uses the 'AT_EACCESS' macro.
      
      This macro was introduced in kernel version v5.8.  On systems whose
      kernel headers are older than that, compilation fails because the
      macro is not defined.
      
      Fixes: 8a1ad441 ("tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h")
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240403122558.1438841-1-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add symbol name when using capstone · 92dfc594
      Namhyung Kim authored
      This is to keep the existing behavior with objdump.  It needs to show
      symbol information of global variables like below:
      
         Percent |      Source code & Disassembly of elf for cycles:P (1 samples, percent: local period)
        ------------------------------------------------------------------------------------------------
                 : 0                0xffffffff81338f70 <vm_normal_page>:
            0.00 :   ffffffff81338f70:       endbr64
            0.00 :   ffffffff81338f74:       callq   0xffffffff81083a40
            0.00 :   ffffffff81338f79:       movq    %rdi, %r8
            0.00 :   ffffffff81338f7c:       movq    %rdx, %rdi
            0.00 :   ffffffff81338f7f:       callq   *0x17021c3(%rip)   # ffffffff82a3b148 <pv_ops+0x1e8>
            0.00 :   ffffffff81338f85:       movq    0xffbf3c(%rip), %rdx       # ffffffff82334ec8 <physical_mask>
            0.00 :   ffffffff81338f8c:       testq   %rax, %rax                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            0.00 :   ffffffff81338f8f:       je      0xffffffff81338fd0                         here
            0.00 :   ffffffff81338f91:       movq    %rax, %rcx
            0.00 :   ffffffff81338f94:       andl    $1, %ecx
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use libcapstone to disassemble · 6d17edc1
      Namhyung Kim authored
      perf can now use the capstone library to disassemble instructions.
      Use it (if available) in perf annotate to speed up disassembly.
      Currently only the x86 architecture is supported.  With this change I
      see a ~3x speedup in data type profiling.
      
      Note that capstone cannot provide source file and line number info.
      For now, users who need that should use the external objdump by
      specifying the --objdump option explicitly.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Split out util/disasm.c · 98f69a57
      Namhyung Kim authored
      util/annotate.c contains both disassembly and sample annotation
      code.  Factor out the disassembly part so that it can be handled
      more easily.
      
      No functional changes intended.
      
      Committer notes:
      
      Add missing include env.h, util.h, bpf-event.h and bpf-util.h to
      disasm.c, to fix things like:
      
        util/disasm.c: In function ‘symbol__disassemble_bpf’:
        util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration]
         1203 |         perf_exe(tpath, sizeof(tpath));
              |         ^~~~~~~~
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add and use ins__is_nop() · 10adbf77
      Namhyung Kim authored
      Likewise, add ins__is_nop() to check if the current instruction is NOP.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use ins__is_xxx() if possible · ad399baa
      Namhyung Kim authored
      This is to prepare separation of disasm related code.  Use the public
      ins API instead of checking the internal data structure.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Use evsel__name_is() helper · 09d2056e
      Yang Jihong authored
      Code cleanup: replace strcmp(evsel__name(evsel), {NAME}) with the
      evsel__name_is() helper.
      
      No functional change.
      
      Committer notes:
      
      Fix this build error:
      
                trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
        -       assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__");
        +       assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__"));
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Fix -g/--call-graph option failure · 6e4b3987
      Yang Jihong authored
      When 'perf sched' enables call-graph recording, the sample_type of the
      dummy event does not have PERF_SAMPLE_CALLCHAIN. timehist_check_attr()
      then sees an evsel without a callchain and sets show_callchain to 0.
      
      Currently 'perf sched timehist' only saves the callchain when processing
      the 'sched:sched_switch' event, so timehist_check_attr() only needs to
      determine whether that event has PERF_SAMPLE_CALLCHAIN.
      
      Before:
      
        # perf sched record -g true
        [ perf record: Woken up 0 times to write data ]
        [ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ]
        # perf sched timehist
        Samples do not have callchains.
                   time    cpu  task name                       wait time  sch delay   run time
                                [tid/pid]                          (msec)     (msec)     (msec)
        --------------- ------  ------------------------------  ---------  ---------  ---------
          147851.826019 [0000]  perf[285035]                        0.000      0.000      0.000
          147851.826029 [0000]  migration/0[15]                     0.000      0.003      0.009
          147851.826063 [0001]  perf[285035]                        0.000      0.000      0.000
          147851.826069 [0001]  migration/1[21]                     0.000      0.003      0.006
        <SNIP>
      
      After:
      
        # perf sched record -g true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.572 MB perf.data (822 samples) ]
        # perf sched timehist
               time cpu task name        waittime  sch delay  runtime
                          [tid/pid]        (msec)  (msec)    (msec)
        ----------- --- ---------------  --------  --------  -----
        4193.035164 [0] perf[277062]        0.000     0.000   0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
        4193.035174 [0] migration/0[15]     0.000     0.003   0.009 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
        4193.035207 [1] perf[277062]        0.000     0.000   0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
        4193.035214 [1] migration/1[21]     0.000     0.003   0.007 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
        <SNIP>
      
      Fixes: 9c95e4ef ("perf evlist: Add evlist__findnew_tracking_event() helper")
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Honor output options with --data-type · bdeaf6ff
      Namhyung Kim authored
      Data type profiling output should be in sync with the normal output,
      so make it display a percentage for each field.  Also use a coloring
      scheme so that users can easily identify fields with high overhead.
      
      Users can use --show-total-period or --show-nr-samples to change the
      output style like in the normal perf annotate output.
      
      Before:
      
        $ perf annotate --data-type
        Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
        ============================================================================
            samples     offset       size  field
                 34          0       9792  struct task_struct    {
                  2          0         24      struct thread_info       thread_info {
                  0          0          8          long unsigned int    flags;
                  1          8          8          long unsigned int    syscall_work;
                  0         16          4          u32  status;
                  1         20          4          u32  cpu;
                                               };
      
      After:
      
        $ perf annotate --data-type
        Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
        ============================================================================
         Percent     offset       size  field
          100.00          0       9792  struct task_struct       {
            3.55          0         24      struct thread_info  thread_info {
            0.00          0          8          long unsigned int       flags;
            1.63          8          8          long unsigned int       syscall_work;
            0.00         16          4          u32     status;
            1.91         20          4          u32     cpu;
                                            };
      
      Committer testing:
      
      First collect a suitable perf.data file for use with 'perf annotate --data-type':
      
        root@number:~# perf mem record -a sleep 1s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ]
        root@number:~#
      
      Then, before:
      
        root@number:~# perf annotate --data-type
        Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
        ============================================================================
            samples     offset       size  field
                  6          0         40  union         {
                  6          0         40      struct __pthread_mutex_s __data {
                  2          0          4          int  __lock;
                  0          4          4          unsigned int __count;
                  0          8          4          int  __owner;
                  1         12          4          unsigned int __nusers;
                  2         16          4          int  __kind;
                  1         20          2          short int    __spins;
                  0         22          2          short int    __elision;
                  0         24         16          __pthread_list_t     __list {
                  0         24          8              struct __pthread_internal_list*  __prev;
                  0         32          8              struct __pthread_internal_list*  __next;
                                                   };
                                               };
                  0          0          0      char*    __size;
                  2          0          8      long int __align;
                                           };
        <SNIP>
      
      And after:
      
        Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
        ============================================================================
         Percent     offset       size  field
          100.00          0         40  union    {
          100.00          0         40      struct __pthread_mutex_s    __data {
           31.27          0          4          int     __lock;
            0.00          4          4          unsigned int    __count;
            0.00          8          4          int     __owner;
            7.67         12          4          unsigned int    __nusers;
           53.10         16          4          int     __kind;
            7.96         20          2          short int       __spins;
            0.00         22          2          short int       __elision;
            0.00         24         16          __pthread_list_t        __list {
            0.00         24          8              struct __pthread_internal_list*     __prev;
            0.00         32          8              struct __pthread_internal_list*     __next;
                                                };
                                            };
            0.00          0          0      char*       __size;
           31.27          0          8      long int    __align;
                                        };
        <SNIP>
      
      Lines with a percentage >= 7.67 have that percentage colored red.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim's avatar
      perf annotate: Get rid of duplicate --group option item · 374af9f1
      Namhyung Kim authored
      The options array in cmd_annotate() has duplicate --group options.  Only
      one is needed, so get rid of the other.
      
        $ perf annotate -h 2>&1 | grep group
              --group           Show event group information together
              --group           Show event group information together
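
      A duplicate like the one above can also be caught mechanically. The
      following is a minimal sketch, not perf's actual option parser: the
      'struct example_option' type and the find_duplicate_option() helper are
      hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an option table entry. */
struct example_option {
	const char *long_name;
	const char *help;
};

/*
 * Return the long name of the first option that appears more than once
 * in the table, or NULL if every long name is unique.
 */
const char *find_duplicate_option(const struct example_option *opts, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		for (size_t j = i + 1; j < nr; j++) {
			if (strcmp(opts[i].long_name, opts[j].long_name) == 0)
				return opts[i].long_name;
		}
	}
	return NULL;
}
```

      The quadratic scan is acceptable here because option tables are small; a
      check like this would fit in a self-test rather than in the hot path.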
      
      Fixes: 7ebaf489 ("perf annotate: Support '--group' option")
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Add Kan Liang to MAINTAINERS as a reviewer · f7a0674e
      Arnaldo Carvalho de Melo authored
      Kan has been reviewing patches regularly. Add him as a perf tools
      reviewer so that people CC him on new patches.
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: "Liang, Kan" <kan.liang@linux.intel.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 21 Mar, 2024 11 commits
    • Arnaldo Carvalho de Melo's avatar
      perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf · 4962e194
      Arnaldo Carvalho de Melo authored
      It is only used to generate string tables, not to build perf, so move it
      to the tools/perf/trace/beauty/include/ hierarchy, which is used only for
      scraping.
      
      This should have happened already, as it did for the linux/socket.h
      scraper. Do it now, as Ian suggested during an audit/refactor session on
      the headers used by perf.
      
      No other code living in tools/ uses this copy; it uses only the
      <linux/vhost.h> that comes from either 'make headers_install' or the
      system /usr/include/ directory.
      Suggested-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf dso: Reorder members to save space in 'struct dso' · b3ad832d
      Ian Rogers authored
      Save 40 bytes and move from 8 to 7 cache lines. Make member dwfl
      dependent on being a powerpc build. Squeeze bits of int/enum types
      when appropriate. Remove holes/padding by reordering variables.
      
      Before:
      
        struct dso {
                struct mutex               lock;                 /*     0    40 */
                struct list_head           node;                 /*    40    16 */
                struct rb_node             rb_node __attribute__((__aligned__(8))); /*    56    24 */
                /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
                struct rb_root *           root;                 /*    80     8 */
                struct rb_root_cached      symbols;              /*    88    16 */
                struct symbol * *          symbol_names;         /*   104     8 */
                size_t                     symbol_names_len;     /*   112     8 */
                struct rb_root_cached      inlined_nodes;        /*   120    16 */
                /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
                struct rb_root_cached      srclines;             /*   136    16 */
                struct {
                        u64                addr;                 /*   152     8 */
                        struct symbol *    symbol;               /*   160     8 */
                } last_find_result;                              /*   152    16 */
                void *                     a2l;                  /*   168     8 */
                char *                     symsrc_filename;      /*   176     8 */
                unsigned int               a2l_fails;            /*   184     4 */
                enum dso_space_type        kernel;               /*   188     4 */
                /* --- cacheline 3 boundary (192 bytes) --- */
                _Bool                      is_kmod;              /*   192     1 */
      
                /* XXX 3 bytes hole, try to pack */
      
                enum dso_swap_type         needs_swap;           /*   196     4 */
                enum dso_binary_type       symtab_type;          /*   200     4 */
                enum dso_binary_type       binary_type;          /*   204     4 */
                enum dso_load_errno        load_errno;           /*   208     4 */
                u8                         adjust_symbols:1;     /*   212: 0  1 */
                u8                         has_build_id:1;       /*   212: 1  1 */
                u8                         header_build_id:1;    /*   212: 2  1 */
                u8                         has_srcline:1;        /*   212: 3  1 */
                u8                         hit:1;                /*   212: 4  1 */
                u8                         annotate_warned:1;    /*   212: 5  1 */
                u8                         auxtrace_warned:1;    /*   212: 6  1 */
                u8                         short_name_allocated:1; /*   212: 7  1 */
                u8                         long_name_allocated:1; /*   213: 0  1 */
                u8                         is_64_bit:1;          /*   213: 1  1 */
      
                /* XXX 6 bits hole, try to pack */
      
                _Bool                      sorted_by_name;       /*   214     1 */
                _Bool                      loaded;               /*   215     1 */
                u8                         rel;                  /*   216     1 */
      
                /* XXX 7 bytes hole, try to pack */
      
                struct build_id            bid;                  /*   224    32 */
                /* --- cacheline 4 boundary (256 bytes) --- */
                u64                        text_offset;          /*   256     8 */
                u64                        text_end;             /*   264     8 */
                const char  *              short_name;           /*   272     8 */
                const char  *              long_name;            /*   280     8 */
                u16                        long_name_len;        /*   288     2 */
                u16                        short_name_len;       /*   290     2 */
      
                /* XXX 4 bytes hole, try to pack */
      
                void *                     dwfl;                 /*   296     8 */
                struct auxtrace_cache *    auxtrace_cache;       /*   304     8 */
                int                        comp;                 /*   312     4 */
      
                /* XXX 4 bytes hole, try to pack */
      
                /* --- cacheline 5 boundary (320 bytes) --- */
                struct {
                        struct rb_root     cache;                /*   320     8 */
                        int                fd;                   /*   328     4 */
                        int                status;               /*   332     4 */
                        u32                status_seen;          /*   336     4 */
      
                        /* XXX 4 bytes hole, try to pack */
      
                        u64                file_size;            /*   344     8 */
                        struct list_head   open_entry;           /*   352    16 */
                        u64                elf_base_addr;        /*   368     8 */
                        u64                debug_frame_offset;   /*   376     8 */
                        /* --- cacheline 6 boundary (384 bytes) --- */
                        u64                eh_frame_hdr_addr;    /*   384     8 */
                        u64                eh_frame_hdr_offset;  /*   392     8 */
                } data;                                          /*   320    80 */
                struct {
                        u32                id;                   /*   400     4 */
                        u32                sub_id;               /*   404     4 */
                        struct perf_env *  env;                  /*   408     8 */
                } bpf_prog;                                      /*   400    16 */
                union {
                        void *             priv;                 /*   416     8 */
                        u64                db_id;                /*   416     8 */
                };                                               /*   416     8 */
                struct nsinfo *            nsinfo;               /*   424     8 */
                struct dso_id              id;                   /*   432    24 */
                /* --- cacheline 7 boundary (448 bytes) was 8 bytes ago --- */
                refcount_t                 refcnt;               /*   456     4 */
                char                       name[];               /*   460     0 */
      
                /* size: 464, cachelines: 8, members: 49 */
                /* sum members: 440, holes: 4, sum holes: 18 */
                /* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 6 bits */
                /* padding: 4 */
                /* forced alignments: 1 */
                /* last cacheline: 16 bytes */
        } __attribute__((__aligned__(8)));
      
      After:
      
        struct dso {
                struct mutex               lock;                 /*     0    40 */
                struct list_head           node;                 /*    40    16 */
                struct rb_node             rb_node __attribute__((__aligned__(8))); /*    56    24 */
                /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
                struct rb_root *           root;                 /*    80     8 */
                struct rb_root_cached      symbols;              /*    88    16 */
                struct symbol * *          symbol_names;         /*   104     8 */
                size_t                     symbol_names_len;     /*   112     8 */
                struct rb_root_cached      inlined_nodes;        /*   120    16 */
                /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
                struct rb_root_cached      srclines;             /*   136    16 */
                struct {
                        u64                addr;                 /*   152     8 */
                        struct symbol *    symbol;               /*   160     8 */
                } last_find_result;                              /*   152    16 */
                struct build_id            bid;                  /*   168    32 */
                /* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
                u64                        text_offset;          /*   200     8 */
                u64                        text_end;             /*   208     8 */
                const char  *              short_name;           /*   216     8 */
                const char  *              long_name;            /*   224     8 */
                void *                     a2l;                  /*   232     8 */
                char *                     symsrc_filename;      /*   240     8 */
                struct nsinfo *            nsinfo;               /*   248     8 */
                /* --- cacheline 4 boundary (256 bytes) --- */
                struct auxtrace_cache *    auxtrace_cache;       /*   256     8 */
                union {
                        void *             priv;                 /*   264     8 */
                        u64                db_id;                /*   264     8 */
                };                                               /*   264     8 */
                struct {
                        struct perf_env *  env;                  /*   272     8 */
                        u32                id;                   /*   280     4 */
                        u32                sub_id;               /*   284     4 */
                } bpf_prog;                                      /*   272    16 */
                struct {
                        struct rb_root     cache;                /*   288     8 */
                        struct list_head   open_entry;           /*   296    16 */
                        u64                file_size;            /*   312     8 */
                        /* --- cacheline 5 boundary (320 bytes) --- */
                        u64                elf_base_addr;        /*   320     8 */
                        u64                debug_frame_offset;   /*   328     8 */
                        u64                eh_frame_hdr_addr;    /*   336     8 */
                        u64                eh_frame_hdr_offset;  /*   344     8 */
                        int                fd;                   /*   352     4 */
                        int                status;               /*   356     4 */
                        u32                status_seen;          /*   360     4 */
                } data;                                          /*   288    80 */
      
                /* XXX last struct has 4 bytes of padding */
      
                struct dso_id              id;                   /*   368    24 */
                /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
                unsigned int               a2l_fails;            /*   392     4 */
                int                        comp;                 /*   396     4 */
                refcount_t                 refcnt;               /*   400     4 */
                enum dso_load_errno        load_errno;           /*   404     4 */
                u16                        long_name_len;        /*   408     2 */
                u16                        short_name_len;       /*   410     2 */
                enum dso_binary_type       symtab_type:8;        /*   412: 0  4 */
                enum dso_binary_type       binary_type:8;        /*   412: 8  4 */
                enum dso_space_type        kernel:2;             /*   412:16  4 */
                enum dso_swap_type         needs_swap:2;         /*   412:18  4 */
      
                /* Bitfield combined with next fields */
      
                _Bool                      is_kmod:1;            /*   414: 4  1 */
                u8                         adjust_symbols:1;     /*   414: 5  1 */
                u8                         has_build_id:1;       /*   414: 6  1 */
                u8                         header_build_id:1;    /*   414: 7  1 */
                u8                         has_srcline:1;        /*   415: 0  1 */
                u8                         hit:1;                /*   415: 1  1 */
                u8                         annotate_warned:1;    /*   415: 2  1 */
                u8                         auxtrace_warned:1;    /*   415: 3  1 */
                u8                         short_name_allocated:1; /*   415: 4  1 */
                u8                         long_name_allocated:1; /*   415: 5  1 */
                u8                         is_64_bit:1;          /*   415: 6  1 */
      
                /* XXX 1 bit hole, try to pack */
      
                _Bool                      sorted_by_name;       /*   416     1 */
                _Bool                      loaded;               /*   417     1 */
                u8                         rel;                  /*   418     1 */
                char                       name[];               /*   419     0 */
      
                /* size: 424, cachelines: 7, members: 48 */
                /* sum members: 415 */
                /* sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits */
                /* padding: 5 */
                /* paddings: 1, sum paddings: 4 */
                /* forced alignments: 1 */
                /* last cacheline: 40 bytes */
        } __attribute__((__aligned__(8)));
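
      The same technique can be seen on a much smaller scale. The sketch below
      is not taken from 'struct dso': the two structs and the bytes_saved()
      helper are hypothetical, and the padding noted in the comments assumes a
      typical LP64 ABI such as x86-64. It shows how ordering members from
      largest to smallest and squeezing small values into bitfields removes
      holes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout with holes: small members are interleaved with
 * 8-byte members, so the compiler pads after each small member to keep
 * the following member aligned (padding noted for a typical LP64 ABI).
 */
struct example_before {
	bool     is_kmod;      /* 1 byte, followed by a 7-byte hole */
	uint64_t text_offset;  /* 8 bytes */
	uint16_t name_len;     /* 2 bytes, followed by a 6-byte hole */
	void    *priv;         /* pointer-sized */
	int      kind;         /* 4 bytes, followed by tail padding */
};

/*
 * Same information, reordered: the widest members come first and the
 * small flag and enum-like values share one bitfield storage unit.
 */
struct example_after {
	uint64_t     text_offset;
	void        *priv;
	uint16_t     name_len;
	unsigned int kind:8;     /* assumes fewer than 256 distinct kinds */
	unsigned int is_kmod:1;
};

/* Number of bytes saved per object by the reordered layout. */
size_t bytes_saved(void)
{
	return sizeof(struct example_before) - sizeof(struct example_after);
}
```

      The exact sizes depend on the ABI, so a tool such as pahole, which
      produced the listings above, remains the reliable way to check the
      result of a reordering.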
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Markus Elfring <Markus.Elfring@web.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240321160300.1635121-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Anne Macedo's avatar
      perf lock contention: Trim backtrace by skipping traceiter functions · 2a5049b7
      Anne Macedo authored
      The 'perf lock contention' program currently shows the caller of the locks
      as __traceiter_contention_begin+0x??. This caller can be ignored, as it
      comes from the traceiter itself. Instead, the real callers of the locks
      should be shown.
      
      Adjusting the --stack-skip parameter makes the actual callers show up,
      but that should not be needed. Instead, ignore the
      __traceiter_contention_begin and __traceiter_contention_end symbols so
      that the actual callers show up by default.
      
      Before this patch is applied:
      
      sudo perf lock con -a -b -- sleep 3
       contended   total wait     max wait     avg wait         type   caller
      
               8      2.33 s       2.28 s     291.18 ms     rwlock:W   __traceiter_contention_begin+0x44
               4      2.33 s       2.28 s     582.35 ms     rwlock:W   __traceiter_contention_begin+0x44
               7    140.30 ms     46.77 ms     20.04 ms     rwlock:W   __traceiter_contention_begin+0x44
               2     63.35 ms     33.76 ms     31.68 ms        mutex   trace_contention_begin+0x84
               2     46.74 ms     46.73 ms     23.37 ms     rwlock:W   __traceiter_contention_begin+0x44
               1     13.54 us     13.54 us     13.54 us        mutex   trace_contention_begin+0x84
               1      3.67 us      3.67 us      3.67 us      rwsem:R   __traceiter_contention_begin+0x44
      
      Before this patch is applied - using --stack-skip 5
      
      sudo perf lock con --stack-skip 5 -a -b -- sleep 3
       contended   total wait     max wait     avg wait         type   caller
      
               2      2.24 s       2.24 s       1.12 s      rwlock:W   do_epoll_wait+0x5a0
               4      1.65 s     824.21 ms    412.08 ms     rwlock:W   do_exit+0x338
               2    824.35 ms    824.29 ms    412.17 ms     spinlock   get_signal+0x108
               2    824.14 ms    824.14 ms    412.07 ms     rwlock:W   release_task+0x68
               1     25.22 ms     25.22 ms     25.22 ms        mutex   cgroup_kn_lock_live+0x58
               1     24.71 us     24.71 us     24.71 us     spinlock   do_exit+0x44
               1     22.04 us     22.04 us     22.04 us      rwsem:R   lock_mm_and_find_vma+0xb0
      
      After this patch is applied:
      
      sudo ./perf lock con -a -b -- sleep 3
       contended   total wait     max wait     avg wait         type   caller
      
               4      4.13 s       2.07 s       1.03 s      rwlock:W   release_task+0x68
               2      2.07 s       2.07 s       1.03 s      rwlock:R   mm_update_next_owner+0x50
               2      2.07 s       2.07 s       1.03 s      rwlock:W   do_exit+0x338
               1     41.56 ms     41.56 ms     41.56 ms        mutex   cgroup_kn_lock_live+0x58
               2     36.12 us     18.83 us     18.06 us     rwlock:W   do_exit+0x338
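
      The idea of skipping traceiter frames can be sketched in isolation. This
      is an illustrative sketch only, not the code from the patch: the
      is_traceiter_symbol() and first_real_caller() helpers are hypothetical,
      and a backtrace is modeled as a plain array of symbol names. It does not
      model any other lock-internal frames that perf may filter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the symbol is one of the two traceiter entry points. */
static bool is_traceiter_symbol(const char *name)
{
	static const char *const skipped[] = {
		"__traceiter_contention_begin",
		"__traceiter_contention_end",
	};

	for (size_t i = 0; i < sizeof(skipped) / sizeof(skipped[0]); i++) {
		if (strcmp(name, skipped[i]) == 0)
			return true;
	}
	return false;
}

/*
 * Walk the backtrace from the innermost frame outwards and return the
 * first symbol that is not a traceiter function, or NULL if there is
 * no such frame.
 */
const char *first_real_caller(const char *const *frames, size_t nr_frames)
{
	for (size_t i = 0; i < nr_frames; i++) {
		if (frames[i] != NULL && !is_traceiter_symbol(frames[i]))
			return frames[i];
	}
	return NULL;
}
```

      Matching on symbol names rather than on a fixed frame count is what
      removes the need to tune --stack-skip by hand.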
      Signed-off-by: Anne Macedo <retpolanne@posteo.net>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240319143629.3422590-1-retpolanne@posteo.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Remove info metrics erroneously in TopdownL1 · af34a16d
      Ian Rogers authored
      The bug affected server metrics only. It doesn't impact the default
      metrics, but it does when the TopdownL1 metric group is specified.
      Passes on the fix in:
      
        https://github.com/intel/perfmon/commit/b09f0a3953234ec592b4a872b87764c78da05d8b
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-13-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update snowridgex to 1.22 · 7bce27f8
      Ian Rogers authored
      Update events from 1.21 to 1.22 as released in:
      
        https://github.com/intel/perfmon/commit/ba4f96039f96231b51e3eb69d5a21e2b00f6de5b
      
      Updates various descriptions and removes the event
      UNC_IIO_NUM_REQ_FROM_CPU.IRP.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-12-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update skylake to v58 · 70e7028c
      Ian Rogers authored
      Update events from:
      
        https://github.com/intel/perfmon/commit/f2e5136e062a91ae554dc40530132e66f9271848
      
      This change didn't increase the version number from v58.
      
      Updates various descriptions.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-11-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update skylakex to 1.33 · d70cc755
      Ian Rogers authored
      Update events from 1.32 to 1.33 as released in:
      
        https://github.com/intel/perfmon/commit/3fe7390dd18496c35ec3a9cf17de0473fd5485cb
      
      Various description updates. Adds the event
      OFFCORE_RESPONSE.ALL_READS.L3_HIT.HIT_OTHER_CORE_FWD.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-10-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update sierraforest to 1.02 · bf270b15
      Ian Rogers authored
      Update events from 1.01 to 1.02 as released in:
      
        https://github.com/intel/perfmon/commit/451dd41ae627b56433ad4065bf3632789eb70834
      
      Various description updates. Adds topdown events
      TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P,
      TOPDOWN_FE_BOUND.ALL_P and TOPDOWN_RETIRING.ALL_P.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-9-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update sapphirerapids to 1.20 · 2edee9e6
      Ian Rogers authored
      Update events from 1.17 to 1.20 as released in:
      
        https://github.com/intel/perfmon/commit/6f674057745acf0125395638ca6be36458a59bda
      
      Various description updates. Adds uncore events
      UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL,
      UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE,
      UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE,
      UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL,
      UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_LOCAL,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_REMOTE,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_LOCAL,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_REMOTE,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_LOCAL,
      UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_REMOTE and removes core events
      AMX_OPS_RETIRED.BF16 and AMX_OPS_RETIRED.INT8.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-8-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update meteorlake to 1.08 · 84d0e8c6
      Ian Rogers authored
      Update events from 1.07 to 1.08 as released in:
      
        https://github.com/intel/perfmon/commit/f0f8f3e163d9eb84e6ce8e2108a22cb43b2527e5
      
      Various description updates. Adds topdown, offcore and uncore events
      OCR.DEMAND_DATA_RD.L3_HIT, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD,
      OCR.DEMAND_RFO.L3_HIT, OCR.DEMAND_DATA_RD.L3_MISS,
      OCR.DEMAND_RFO.L3_MISS, OCR.DEMAND_DATA_RD.ANY_RESPONSE,
      OCR.DEMAND_DATA_RD.DRAM, OCR.DEMAND_RFO.ANY_RESPONSE,
      OCR.DEMAND_RFO.DRAM, TOPDOWN_BAD_SPECULATION.ALL_P,
      TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P,
      TOPDOWN_RETIRING.ALL_P, UNC_ARB_DAT_OCCUPANCY.RD and
      UNC_HAC_ARB_COH_TRK_REQUESTS.ALL.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf vendor events intel: Update lunarlake to 1.01 · 3670ffbd
      Ian Rogers authored
      Update events from 1.00 to 1.01 as released in:
      
        https://github.com/intel/perfmon/commit/56ab8d837ac566d51a4d8748b6b4b817a22c9b84
      
      Various encoding and description updates. Adds the events
      CPU_CLK_UNHALTED.CORE, CPU_CLK_UNHALTED.CORE_P,
      CPU_CLK_UNHALTED.REF_TSC_P, CPU_CLK_UNHALTED.THREAD,
      MISC_RETIRED.LBR_INSERTS, TOPDOWN_BAD_SPECULATION.ALL_P,
      TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P and
      TOPDOWN_RETIRING.ALL_P.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Link: https://lore.kernel.org/r/20240321060016.1464787-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>