1. 12 Aug, 2024 16 commits
    • perf kmem: Use perf_tool__init · f32b37cc
      Ian Rogers authored
      Reduce the scope of the tool from global/static to just that of the
      cmd_kmem function where the session is scoped. Use perf_tool__init()
      to initialize default values.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-7-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tool: Add perf_tool__init() · ae737b61
      Ian Rogers authored
      Add init function that behaves like perf_tool__fill_defaults() but
      assumes all values haven't been initialized.
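
      Illustration (not part of the patch): a minimal sketch of the
      difference between a fill-defaults helper and an init helper. The
      struct and function names below (demo_tool, demo_stub, ...) are
      hypothetical stand-ins, not the real perf definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for perf's struct perf_tool. */
struct demo_tool {
	int (*sample)(int value);
	int (*comm)(int value);
	bool ordered_events;
};

/* Default stub: ignore the event and report success. */
static int demo_stub(int value)
{
	(void)value;
	return 0;
}

/*
 * Fill-defaults style: only replaces callbacks that are still NULL, so the
 * struct must already hold well-defined values (for example, be zeroed).
 */
static void demo_tool__fill_defaults(struct demo_tool *tool)
{
	if (tool->sample == NULL)
		tool->sample = demo_stub;
	if (tool->comm == NULL)
		tool->comm = demo_stub;
}

/*
 * Init style: assumes nothing about the current contents and overwrites
 * every field. Callers set their own callbacks afterwards.
 */
static void demo_tool__init(struct demo_tool *tool, bool ordered_events)
{
	tool->ordered_events = ordered_events;
	tool->sample = demo_stub;
	tool->comm = demo_stub;
}
```

      With the init style, a stack-allocated struct can be passed in without
      being zeroed first, which is what makes a function-local tool practical.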
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-6-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tool: Move fill defaults into tool.c · 564e5cbc
      Ian Rogers authored
      The aim here is to eventually make perf_tool__fill_defaults() an init
      function so that the tools struct is more const.
      
      Create a tool.c to go along with tool.h. Move perf_tool__fill_defaults()
      out of session.c into tool.c along with the default stub values. Add
      perf_tool__compressed_is_stub() for a test in
      perf_session__process_user_event().
      
      perf_session__process_compressed_event() is only used as a
      default-initialized callback, so migrate it into tool.c.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tool: Constify tool pointers · 30f29bae
      Ian Rogers authored
      The tool pointer (to a struct largely of function pointers) is passed
      around but is unchanged except at initialization. Change parameter and
      variable types to be const to lower the possibilities of what could
      happen with a tool.
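
      Illustration (not part of the patch): a small sketch of passing a
      table of function pointers by const pointer. All names here
      (demo_tool, demo_deliver, ...) are hypothetical and only mirror the
      idea described above.

```c
#include <stddef.h>

struct demo_tool;

/* Callback type: the tool is only read, never modified, during delivery. */
typedef int (*demo_event_cb)(const struct demo_tool *tool, int event_type);

/* Hypothetical reduced tool: a struct consisting of function pointers. */
struct demo_tool {
	demo_event_cb sample;
};

/* Example callback: returns a value derived from the event type. */
static int demo_count_sample(const struct demo_tool *tool, int event_type)
{
	(void)tool;
	return event_type + 1;
}

/*
 * The dispatcher takes a const pointer. An accidental assignment such as
 * "tool->sample = NULL;" inside this function would fail to compile.
 */
static int demo_deliver(const struct demo_tool *tool, int event_type)
{
	return tool->sample ? tool->sample(tool, event_type) : 0;
}
```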
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Leo Yan <leo.yan@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf s390-cpumsf: Remove unused struct · 1816dc4b
      Ian Rogers authored
      struct s390_cpumsf_synth was likely cargo culted from other auxtrace
      examples. It has no users, so remove.
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace: Remove dummy tools · 4e322c78
      Ian Rogers authored
      Add perf_session__deliver_synth_attr_event that synthesizes a
      perf_record_header_attr event with one id. Remove use of
      perf_event__synthesize_attr that necessitates the use of the dummy
      tool in order to pass the session.
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Leo Yan <leo.yan@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Fix leader sampling inserting additional samples · 79bcd34e
      Ian Rogers authored
      The processing of leader samples would turn an individual sample with
      a group of read values into multiple samples. 'perf inject' would pass
      through the additional samples increasing the output data file size:
      
        $ perf record -g -e "{instructions,cycles}:S" -o perf.orig.data true
        $ perf script -D -i perf.orig.data | sed -e 's/perf.orig.data/perf.data/g' > orig.txt
        $ perf inject -i perf.orig.data -o perf.new.data
        $ perf script -D -i perf.new.data | sed -e 's/perf.new.data/perf.data/g' > new.txt
        $ diff -u orig.txt new.txt
        --- orig.txt    2024-07-29 14:29:40.606576769 -0700
        +++ new.txt     2024-07-29 14:30:04.142737434 -0700
        ...
        -0xc550@perf.data [0x30]: event: 3
        +0xc550@perf.data [0xd0]: event: 9
        +.
        +. ... raw event: size 208 bytes
        +.  0000:  09 00 00 00 01 00 d0 00 fc 72 01 86 ff ff ff ff  .........r......
        +.  0010:  74 7d 2c 00 74 7d 2c 00 fb c3 79 f9 ba d5 05 00  t},.t},...y.....
        +.  0020:  e6 cb 1a 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
        +.  0030:  02 00 00 00 00 00 00 00 76 01 00 00 00 00 00 00  ........v.......
        +.  0040:  e6 cb 1a 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        +.  0050:  62 18 00 00 00 00 00 00 f6 cb 1a 00 00 00 00 00  b...............
        +.  0060:  00 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00  ................
        +.  0070:  80 ff ff ff ff ff ff ff fc 72 01 86 ff ff ff ff  .........r......
        +.  0080:  f3 0e 6e 85 ff ff ff ff 0c cb 7f 85 ff ff ff ff  ..n.............
        +.  0090:  bc f2 87 85 ff ff ff ff 44 af 7f 85 ff ff ff ff  ........D.......
        +.  00a0:  bd be 7f 85 ff ff ff ff 26 d0 7f 85 ff ff ff ff  ........&.......
        +.  00b0:  6d a4 ff 85 ff ff ff ff ea 00 20 86 ff ff ff ff  m......... .....
        +.  00c0:  00 fe ff ff ff ff ff ff 57 14 4f 43 fc 7e 00 00  ........W.OC.~..
        +
        +1642373909693435 0xc550 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 2915700/2915700: 0xffffffff860172fc period: 1 addr: 0
        +... FP chain: nr:12
        +.....  0: ffffffffffffff80
        +.....  1: ffffffff860172fc
        +.....  2: ffffffff856e0ef3
        +.....  3: ffffffff857fcb0c
        +.....  4: ffffffff8587f2bc
        +.....  5: ffffffff857faf44
        +.....  6: ffffffff857fbebd
        +.....  7: ffffffff857fd026
        +.....  8: ffffffff85ffa46d
        +.....  9: ffffffff862000ea
        +..... 10: fffffffffffffe00
        +..... 11: 00007efc434f1457
        +... sample_read:
        +.... group nr 2
        +..... id 00000000001acbe6, value 0000000000000176, lost 0
        +..... id 00000000001acbf6, value 0000000000001862, lost 0
        +
        +0xc620@perf.data [0x30]: event: 3
        ...
      
      This behavior is incorrect as in the case above 'perf inject' should
      have done nothing. Fix this behavior by disabling separating samples
      for a tool that requests it. Only request this for `perf inject` so as
      to not affect other perf tools. With the patch and the test above
      there are no differences between the orig.txt and new.txt.
      
      Fixes: e4caec0d ("perf evsel: Add PERF_SAMPLE_READ sample related processing")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240729220620.2957754-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Show first-level children by default in TUI · 7f3c8f13
      Namhyung Kim authored
      The default is now to fold everything, but then only the name of the
      top-level data type is shown, which is not very useful. Instead, expand
      the top-level entry so that it shows the layout at a higher level.
      
        Annotate type: 'struct task_struct' (4 samples)
              Percent     Offset       Size  Field
        -      100.00          0       9792  struct task_struct {                           ◆
        +        0.50          0         24      struct thread_info     thread_info;        ▒
                 0.00         24          4      unsigned int   __state;                    ▒
                 0.00         32          8      void*  stack;                              ▒
        +        0.00         40          4      refcount_t     usage;                      ▒
                 0.00         44          4      unsigned int   flags;                      ▒
                 0.00         48          4      unsigned int   ptrace;                     ▒
                 0.00         52          4      int    on_cpu;                             ▒
        +        0.00         56         16      struct __call_single_node      wake_entry; ▒
                 0.00         72          4      unsigned int   wakee_flips;                ▒
                 0.00         80          8      long unsigned int      wakee_flip_decay_ts;▒
                 0.00         88          8      struct task_struct*    last_wakee;         ▒
                 0.00         96          4      int    recent_used_cpu;                    ▒
                 0.00        100          4      int    wake_cpu;                           ▒
                 0.00        104          4      int    on_rq;                              ▒
                 0.00        108          4      int    prio;                               ▒
                 0.00        112          4      int    static_prio;                        ▒
                 0.00        116          4      int    normal_prio;                        ▒
                 0.00        120          4      unsigned int   rt_priority;                ▒
        +        0.00        128        256      struct sched_entity    se;                 ▒
        +        0.00        384         48      struct sched_rt_entity rt;                 ▒
        +        0.00        432        224      struct sched_dl_entity dl;                 ▒
                 0.00        656          8      struct sched_class*    sched_class;        ▒
        ...
      
      Committer testing:
      
        # perf mem record -a sleep 5s
        # perf annotate --group --data-type=pthread_mutex_t
      
       Annotate type: 'pthread_mutex_t' (13 samples)
            Percent     Offset       Size  Field
      -      100.00          0         40  pthread_mutex_t {                                ▒
      -      100.00          0         40      struct __pthread_mutex_s       __data {      ▒
              39.45          0          4          int        __lock;                       ▒
               0.00          4          4          unsigned int       __count;              ▒
               7.80          8          4          int        __owner;                      ▒
               6.88         12          4          unsigned int       __nusers;             ▒
              45.87         16          4          int        __kind;                       ▒
               0.00         20          2          short int  __spins;                      ▒
               0.00         22          2          short int  __elision;                    ▒
      +        0.00         24         16          __pthread_list_t   __list;               ▒
                                               };                                           ▒
               0.00          0          0      char[] __size;                               ▒
              39.45          0          8      long int       __align;
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240812194447.2049187-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Implement folding in TUI browser · af73856e
      Namhyung Kim authored
      Like 'perf report', use 'e' or 'E' key to toggle folding the current
      entry so that it can control displaying child entries.
      
      Note I didn't add the 'c' and 'C' key to collapse the entry because it's
      also handled with the 'e'/'E' since it toggles the state.
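
      Illustration (not part of the patch): a sketch of why a single toggle
      covers both expanding and collapsing. The demo_entry type and
      demo_toggle_fold() are hypothetical, and any difference between the
      'e' and 'E' keys is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical browser entry: a struct member that may have children. */
struct demo_entry {
	int nr_children;
	bool folded;
};

/*
 * Flip the folded state of an entry. Returns true when the state changed,
 * false for leaf entries, which have nothing to fold or unfold.
 */
static bool demo_toggle_fold(struct demo_entry *entry)
{
	if (entry->nr_children == 0)
		return false;

	entry->folded = !entry->folded;
	return true;
}
```

      Calling the helper twice returns the entry to its original state, so a
      separate collapse key is not needed in this model.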
      
      Committer testing:
      
      Do some 'perf mem record' for some workload of the whole system, using
      the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide
      profiling, etc) and then:
      
        # perf annotate --skip-empty --data-type=pthread_mutex_t
      
      That, by default, will start as --tui, then press 'E' to see the whole
      struct unfolded, etc.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240812194447.2049187-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Support folding in TUI browser · 05fc5b7d
      Namhyung Kim authored
      Like in the hists browser, it should support folding current entry so
      that it can hide unwanted details in some data structures.
      
      The folded entries will be displayed with the '+' sign, while unfolded
      entries will have the '-' sign.
      
      Entries that have no children will not show any signs.
      
        Annotate type: 'struct socket' (1 samples)
              Percent     Offset       Size  Field
        -      100.00          0        128  struct socket {                                  ◆
                 0.00          0          4      socket_state   state;                        ▒
                 0.00          4          2      short int      type;                         ▒
                 0.00          8          8      long unsigned int      flags;                ▒
                 0.00         16          8      struct file*   file;                         ▒
               100.00         24          8      struct sock*   sk;                           ▒
                 0.00         32          8      struct proto_ops*      ops;                  ▒
        -        0.00         64         64      struct socket_wq       wq {                  ▒
        -        0.00         64         24          wait_queue_head_t  wait {                ▒
        +        0.00         64          4              spinlock_t     lock;                 ▒
        -        0.00         72         16              struct list_head       head {        ▒
                 0.00         72          8                  struct list_head*  next;         ▒
                 0.00         80          8                  struct list_head*  prev;         ▒
                                                         };                                   ▒
                                                     };                                       ▒
                 0.00         88          8          struct fasync_struct*      fasync_list;  ▒
                 0.00         96          8          long unsigned int  flags;                ▒
        +        0.00        104         16          struct callback_head       rcu;          ▒
                                                 };                                           ▒
                                             };                                               ▒
      
      This just adds the display logic for folding; the actual folding action
      will be implemented in the next patch.
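
      Illustration (not part of the patch): the sign rule described above,
      written as a small helper. The demo_entry type and demo_fold_sign()
      are hypothetical names used only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical browser entry: a struct member that may have children. */
struct demo_entry {
	int nr_children;
	bool folded;
};

/*
 * Pick the marker shown in front of an entry:
 *   '+' folded entry with children,
 *   '-' unfolded entry with children,
 *   ' ' entry without children (no sign).
 */
static char demo_fold_sign(const struct demo_entry *entry)
{
	if (entry->nr_children == 0)
		return ' ';

	return entry->folded ? '+' : '-';
}
```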
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240812194447.2049187-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: SKX, CLX, SNR uncore cache event fixes · 7a75c6c2
      Ian Rogers authored
      Cache home agent (CHA) events were setting the low rather than high
      config1 bits. SNR was using CLX CHA events, however its CHA is similar
      to ICX so remove the events.
      
      Incorporate the updates in:
      
        https://github.com/intel/perfmon/pull/215
        https://github.com/intel/perfmon/pull/216
      
      Fixes: 4cc49942 ("perf vendor events: Update cascadelakex events/metrics")
      Closes: https://lore.kernel.org/linux-perf-users/CAPhsuW4nem9XZP+b=sJJ7kqXG-cafz0djZf51HsgjCiwkGBA+A@mail.gmail.com/
      Reported-by: Song Liu <song@kernel.org>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Co-authored-by: Weilin Wang <weilin.wang@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240811042004.421869-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf lock contention: Change stack_id type to s32 · 040c0f88
      Namhyung Kim authored
      The bpf_get_stackid() helper returns a signed type to check whether it
      failed to get a stacktrace or not.  But it saved the result in u32 and
      checked if the value is negative.
      
            376         if (needs_callstack) {
            377                 pelem->stack_id = bpf_get_stackid(ctx, &stacks,
            378                                                   BPF_F_FAST_STACK_CMP | stack_skip);
        --> 379                 if (pelem->stack_id < 0)
      
        ./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin()
        warn: unsigned 'pelem->stack_id' is never less than zero.
      
      Let's change the type to s32 instead.
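
      Illustration (not part of the patch): a user-space sketch of the
      problem, assuming a helper that reports failure as a negative value.
      demo_get_stackid() and the demo_elem_* structs are hypothetical
      stand-ins for the BPF helper and the map element.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a helper that returns an id, or a negative error code. */
static int32_t demo_get_stackid(bool fail)
{
	return fail ? -14 : 7;	/* arbitrary error code and arbitrary id */
}

struct demo_elem_u32 { uint32_t stack_id; };	/* before the change */
struct demo_elem_s32 { int32_t stack_id; };	/* after the change */

/*
 * Unsigned storage: the negative result wraps to a large positive value,
 * so a "less than zero" test can never be true. The widening cast keeps
 * the stored unsigned value and only makes that explicit.
 */
static bool demo_failed_u32(bool fail)
{
	struct demo_elem_u32 e = { .stack_id = (uint32_t)demo_get_stackid(fail) };

	return (int64_t)e.stack_id < 0;
}

/* Signed storage: the error stays negative and the check works. */
static bool demo_failed_s32(bool fail)
{
	struct demo_elem_s32 e = { .stack_id = demo_get_stackid(fail) };

	return e.stack_id < 0;
}
```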
      
      Fixes: 6d499a6b ("perf lock: Print the number of lost entries for BPF")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240812172533.2015291-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate-data: Fix a buffer overflow in TUI browser · 00b04242
      Namhyung Kim authored
      In get_member_overhead(), k is updated when it has an entry in the
      histogram.  But the entry->hists array is allocated with the number of
      evsel in the group.  So the k should be reset when it iterates the event
      using for_each_group_evsel(), otherwise it'd crash due to a buffer
      overflow.
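
      Illustration (not part of the patch): a reduced sketch of the indexing
      pattern, assuming an outer loop over byte offsets and an inner loop
      over the events in a group. The names and the array layout are
      hypothetical, not the actual perf code.

```c
#include <stddef.h>

#define DEMO_NR_EVENTS 3	/* number of events in the group */

/*
 * Accumulate samples for a member covering `size` byte offsets.
 * samples[offset][event] is the input, empty[event] marks skipped events,
 * and hists[] has one slot per event in the group.
 * Returns the highest hists[] index written, or -1 if none was written.
 */
static int demo_member_overhead(int samples[][DEMO_NR_EVENTS], size_t size,
				const int empty[DEMO_NR_EVENTS],
				int hists[DEMO_NR_EVENTS])
{
	int max_index = -1;

	for (size_t off = 0; off < size; off++) {
		/* Reset per pass over the events; without this, k keeps
		 * growing across offsets and runs past the end of hists[]. */
		int k = 0;

		for (int ev = 0; ev < DEMO_NR_EVENTS; ev++) {
			if (empty[ev])
				continue;	/* skipped events use no slot */

			hists[k] += samples[off][ev];
			if (k > max_index)
				max_index = k;
			k++;
		}
	}

	return max_index;
}
```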
      
      Fixes: cb1898f5 ("perf annotate-data: Support --skip-empty option")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240810191502.1947959-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf docs: Refine the description for the buffer size · 043da846
      Leo Yan authored
      The current description of the AUX trace buffer size is misleading. When
      a user specifies the option '-m,512M', it represents a size value in bytes
      (512MiB), not 512M pages (512M x 4KiB, assuming a 4KiB page size).
      
      Make the document clear that the normal buffer and the AUX tracing
      buffer share the same semantics. Sync the documents for consistent
      text.
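
      Illustration (not part of the patch): the arithmetic behind the two
      readings of "512M", assuming the 4KiB page size used in the text. The
      helper names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed 4KiB page */

/* Convert a size given in MiB to bytes. */
static uint64_t demo_mib_to_bytes(uint64_t mib)
{
	return mib * 1024ULL * 1024ULL;
}

/* Number of whole pages that fit in a byte size. */
static uint64_t demo_bytes_to_pages(uint64_t bytes)
{
	return bytes / DEMO_PAGE_SIZE;
}

/* Total bytes covered by a number of pages. */
static uint64_t demo_pages_to_bytes(uint64_t pages)
{
	return pages * DEMO_PAGE_SIZE;
}
```

      Read as bytes, 512M is 536870912 bytes, or 131072 pages of 4KiB. Read
      as a page count, 512M pages would be 2199023255552 bytes (2TiB).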
      Reviewed-by: James Clark <james.clark@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: add --addr2line option · e6b56ae7
      Martin Liška authored
      Similarly to other subcommands (like report, top), it would be handy to
      provide a path for addr2line command.
      Signed-off-by: Martin Liska <martin.liska@hey.com>
      Cc: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests pmu: Initialize all fields of test_pmu variable · 4f21bfed
      Arnaldo Carvalho de Melo authored
      Instead of explicitly initializing just the .name and .alias_name,
      use struct member named initialization of just the non-null .name field;
      the compiler will initialize all the other fields that are not explicitly
      initialized to NULL.
      
      This makes the code more robust, avoiding the error recently fixed when
      the .alias_name was used and contained a random value.
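
      Illustration (not part of the patch): a sketch of the initialization
      rule relied on here. The demo_pmu struct and its fields are a
      hypothetical reduction, not the real PMU struct.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the PMU struct used by the test. */
struct demo_pmu {
	const char *name;
	const char *alias_name;
	int type;
};

/*
 * Only .name is named in the initializer. C initializes the remaining
 * members as it would objects with static storage duration: pointers
 * become NULL and integers become 0, so no member holds a random value.
 */
static struct demo_pmu demo_make_pmu(void)
{
	struct demo_pmu pmu = { .name = "demo" };

	return pmu;
}
```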
      Reviewed-by: Veronika Molnarova <vmolnaro@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Radostin Stoyanov <rstoyano@redhat.com>
      Link: https://lore.kernel.org/lkml/e26941f9-f86c-4f2e-b812-20c49fb2c0d3@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 09 Aug, 2024 5 commits
    • perf annotate-data: Support --skip-empty option · cb1898f5
      Namhyung Kim authored
      The --skip-empty option hides dummy events in a group. Like the other
      output modes in 'perf report' and 'perf annotate', the data-type
      profiling output should support the option.
      
      Committer testing:
      
      With dummy:
      
        root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
        Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
         event[0] = cpu_atom/mem-loads,ldlat=30/P
         event[1] = cpu_atom/mem-stores/P
         event[2] = dummy:u
        ============================================================================
                         Percent     offset       size  field
          100.00  100.00    0.00          0         40  pthread_mutex_t	 {
          100.00  100.00    0.00          0         40      struct __pthread_mutex_s	__data {
           45.21   84.54    0.00          0          4          int	__lock;
            0.00    0.00    0.00          4          4          unsigned int	__count;
            0.00    1.83    0.00          8          4          int	__owner;
            5.19   10.65    0.00         12          4          unsigned int	__nusers;
           49.61    2.97    0.00         16          4          int	__kind;
            0.00    0.00    0.00         20          2          short int	__spins;
            0.00    0.00    0.00         22          2          short int	__elision;
            0.00    0.00    0.00         24         16          __pthread_list_t	__list {
            0.00    0.00    0.00         24          8              struct __pthread_internal_list*	__prev;
            0.00    0.00    0.00         32          8              struct __pthread_internal_list*	__next;
                                                                };
                                                            };
            0.00    0.00    0.00          0          0      char[]	__size;
           45.21   84.54    0.00          0          8      long int	__align;
                                                      };
      Skipping it:
      
        root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
        Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
         event[0] = cpu_atom/mem-loads,ldlat=30/P
         event[1] = cpu_atom/mem-stores/P
        ============================================================================
                 Percent     offset       size  field
          100.00  100.00          0         40  pthread_mutex_t	 {
          100.00  100.00          0         40      struct __pthread_mutex_s	__data {
           45.21   84.54          0          4          int	__lock;
            0.00    0.00          4          4          unsigned int	__count;
            0.00    1.83          8          4          int	__owner;
            5.19   10.65         12          4          unsigned int	__nusers;
           49.61    2.97         16          4          int	__kind;
            0.00    0.00         20          2          short int	__spins;
            0.00    0.00         22          2          short int	__elision;
            0.00    0.00         24         16          __pthread_list_t	__list {
            0.00    0.00         24          8              struct __pthread_internal_list*	__prev;
            0.00    0.00         32          8              struct __pthread_internal_list*	__next;
                                                        };
                                                    };
            0.00    0.00          0          0      char[]	__size;
           45.21   84.54          0          8      long int	__align;
                                                };
      
        Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples):
        root@number:~#
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240807061713.1642924-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cb1898f5
    • perf annotate: Fix --group behavior when leader has no samples · 336989d0
      Namhyung Kim authored
      When the --group option is used, all events should be displayed together.
      But the current logic only checks whether the first (leader) event has
      samples.  Check the member events as well.
      
      It also failed to put the linked samples from member evsels into the
      output RB-tree, so they were not displayed in the output.
      
      For example:
      
        $ ./perf evlist
        cpu/mem-loads,ldlat=30/P
        cpu/mem-stores/P
        dummy:u
      
      It has three events, but the 'path_put' function has samples only for
      the mem-stores (second) event.
      
        $ sudo ./perf annotate --stdio -f path_put
         Percent |      Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
        ----------------------------------------------------------------------------------------------------------
                 : 0                0xffffffffae600020 <path_put>:
            0.00 :   ffffffffae600020:       endbr64
            0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
           91.22 :   ffffffffae600029:       pushq   %rbx
            0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
            0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
            8.78 :   ffffffffae600031:       callq   0xffffffffae614aa0
            0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
            0.00 :   ffffffffae600039:       popq    %rbx
            0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
            0.00 :   ffffffffae60003f:       nop
      
      Therefore, it didn't show up when --group option is used since the
      leader ("mem-loads") event has no samples.  But now it checks both
      events.
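      The corrected check can be pictured roughly as follows. This is a
      minimal sketch with a hypothetical 'struct evsel_sketch' and helper
      name; the real perf code walks evsel group members and their
      histograms, which is not reproduced here.

        ```c
        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>

        /* Hypothetical, simplified view of one event in a group. */
        struct evsel_sketch {
        	const char *name;
        	unsigned long nr_samples;
        };

        /*
         * Returns true if any event in the group has samples, not just the
         * leader at index 0.
         */
        static bool group_has_samples(const struct evsel_sketch *group, size_t nr)
        {
        	assert(group != NULL || nr == 0);

        	for (size_t i = 0; i < nr; i++) {
        		if (group[i].nr_samples > 0)
        			return true;
        	}
        	return false;
        }
        ```

      With the three events from the example (0, 2 and 0 samples), a
      leader-only check would skip 'path_put', while this loop reports that
      the group has something to display.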
      
      Before:
        $ sudo ./perf annotate --stdio -f --group path_put
        (no output)
      
      After:
        $ sudo ./perf annotate --stdio -f --group path_put
         Percent                 |      Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
        -------------------------------------------------------------------------------------------------------------------------------------------------------------
                                 : 0                0xffffffffae600020 <path_put>:
            0.00    0.00    0.00 :   ffffffffae600020:       endbr64
            0.00    0.00    0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
            0.00   91.22    0.00 :   ffffffffae600029:       pushq   %rbx
            0.00    0.00    0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
            0.00    0.00    0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
            0.00    8.78    0.00 :   ffffffffae600031:       callq   0xffffffffae614aa0
            0.00    0.00    0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
            0.00    0.00    0.00 :   ffffffffae600039:       popq    %rbx
            0.00    0.00    0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
            0.00    0.00    0.00 :   ffffffffae60003f:       nop
      
      Committer testing:
      
      Before:
      
        root@number:~# perf annotate --group --stdio2 clear_page_erms
        root@number:~#
      
      After:
      
        root@number:~# perf annotate --group --stdio2 clear_page_erms
        Samples: 125  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period]
        clear_page_erms() /proc/kcore
        Percent                      0xffffffff990c6cc0 <clear_page_erms>:
                                       endbr64
                                       movl    $0x1000,%ecx
                                       xorl    %eax,%eax
           0.00  100.00    0.00        rep     stosb %al, (%rdi)
                                     ← retq
                                       int3
                                       int3
                                       int3
                                       int3
                                       nop
                                       nop
        root@number:~#
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      336989d0
    • perf tools: Create source symlink in perf object dir · 890a1961
      Andi Kleen authored
      Create a source symlink to the original source in the objdir.
      
      This is similar to what the main kernel build script does.
      
      Committer testing:
      
        ⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
        <SNIP>
        ⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source
        lrwxrwxrwx. 1 acme acme 41 Aug  9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf
        ⬢[acme@toolbox perf-tools-next]$
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240807231823.898979-1-ak@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      890a1961
    • perf debuginfo: Fix the build with !HAVE_DWARF_SUPPORT · 13d675ae
      Arnaldo Carvalho de Melo authored
      In that case we have a set of placeholder functions, one of them uses a
      'Dwarf_Addr' type that is not present as it is defined in the missing
      DWARF libraries, so provide a placeholder typedef for that as well.
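      The pattern described above might look like the sketch below. The
      placeholder type chosen here (uint64_t) and the stub's name are
      assumptions made for illustration; the actual patch may use a different
      placeholder type and the real function name from util/debuginfo.h.

        ```c
        #include <assert.h>
        #include <stdint.h>

        #ifdef HAVE_DWARF_SUPPORT
        #include <elfutils/libdw.h>	/* provides the real Dwarf_Addr */
        #else
        /* Assumption: any complete type works for a stub that ignores the value. */
        typedef uint64_t Dwarf_Addr;

        /* Hypothetical stub; it only has to compile when DWARF support is absent. */
        static inline int stub_get_text_offset(Dwarf_Addr *offs)
        {
        	(void)offs;
        	return -1;	/* report "not supported" */
        }
        #endif
        ```

      Without the typedef, the stub's prototype would fail to compile with
      the "unknown type name" error shown in the build log below.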
      
      The build error before this patch:
      
        In file included from util/annotate.c:28:
        util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
           44 |                                              Dwarf_Addr *offs __maybe_unused,
              |                                              ^~~~~~~~~~
        make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
        make[6]: *** Waiting for unfinished jobs....
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      13d675ae
    • perf script python: Add the 'ins_lat' field to event handler · 05673c42
      Zixian Cai authored
      For example, when using the Alder Lake PMU memory load event, the
      instruction latency is stored in 'ins_lat', while the cache latency
      is stored in 'weight'.
      
      This patch reports the 'ins_lat' field for Python scripting.
      
      Committer testing:
      
      On a Raptor Lake Refresh Intel machine (14th gen):
      
        root@number:~# grep -m1 'model name' /proc/cpuinfo
        model name	: Intel(R) Core(TM) i7-14700K
        root@number:~# perf mem record -a sleep 5
        Memory events are enabled on a subset of CPUs: 16-27
        [ perf record: Woken up 85 times to write data ]
        [ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
        root@number:~# perf evlist -v
        cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
        cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
        dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        root@number:~#
      
      Now generate a Python script and dump the dictionary, which should now
      contain the 'ins_lat' field:
      
        root@number:~# perf script --gen python
        generated Python script: perf-script.py
        root@number:~# vim perf-script.py
        root@number:~# perf script -s perf-script.py | head -40
        in trace_begin
        in trace_end
        root@number:~# vim perf-script.py
      Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Zixian Cai <fzczx123@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      05673c42
  3. 08 Aug, 2024 9 commits
    • perf test shell lbr: Support hybrid x86 systems too · 9e9d0a79
      Arnaldo Carvalho de Melo authored
      Running on a:
      
        root@x1:~# grep 'model name' -m1 /proc/cpuinfo
        model name	: 13th Gen Intel(R) Core(TM) i7-1365U
        root@x1:~#
      
      It skips all the tests with:
      
        root@x1:~# perf test -vvvv LBR
         97: perf record LBR tests:
        --- start ---
        test child forked, pid 2033388
        Skip: only x86 CPUs support LBR
        ---- end(-2) ----
         97: perf record LBR tests                                           : Skip
        root@x1:~#
      
      Because the test checks for the /sys/devices/cpu/caps/branches file,
      that isn't present as we have instead:
      
        root@x1:~# ls -la /sys/devices/cpu*/caps/branches
        -r--r--r--. 1 root root 4096 Aug  8 11:22 /sys/devices/cpu_atom/caps/branches
        -r--r--r--. 1 root root 4096 Aug  8 11:21 /sys/devices/cpu_core/caps/branches
        root@x1:~#
      
      If we check as well for one of those,
      /sys/devices/cpu_core/caps/branches, then we don't skip the tests and
      all are run on these x86 Intel Hybrid systems as well, passing all of
      them:
      
        root@x1:~# perf test -vvvv LBR
         97: perf record LBR tests:
        --- start ---
        test child forked, pid 2034956
        LBR callgraph
        [ perf record: Woken up 5 times to write data ]
        [ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ]
        LBR callgraph [Success]
        LBR any branch test
        [ perf record: Woken up 25 times to write data ]
        [ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ]
        LBR any branch test: 8071 samples
        LBR any branch test [Success]
        LBR any call test
        [ perf record: Woken up 23 times to write data ]
        [ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ]
        LBR any call test: 8092 samples
        LBR any call test [Success]
        LBR any ret test
        [ perf record: Woken up 24 times to write data ]
        [ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ]
        LBR any ret test: 8093 samples
        LBR any ret test [Success]
        LBR any indirect call test
        [ perf record: Woken up 25 times to write data ]
        [ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ]
        LBR any indirect call test: 8067 samples
        LBR any indirect call test [Success]
        LBR any indirect jump test
        [ perf record: Woken up 12 times to write data ]
        [ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ]
        LBR any indirect jump test: 8061 samples
        LBR any indirect jump test [Success]
        LBR direct calls test
        [ perf record: Woken up 25 times to write data ]
        [ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ]
        LBR direct calls test: 8076 samples
        LBR direct calls test [Success]
        LBR any indirect user call test
        [ perf record: Woken up 5 times to write data ]
        [ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ]
        LBR any indirect user call test: 8079 samples
        LBR any indirect user call test [Success]
        LBR system wide any branch test
        [ perf record: Woken up 26 times to write data ]
        [ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ]
        LBR system wide any branch test: 9209 samples
        LBR system wide any branch test [Success]
        LBR system wide any call test
        [ perf record: Woken up 25 times to write data ]
        [ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ]
        LBR system wide any call test: 9333 samples
        LBR system wide any call test [Success]
        LBR parallel any branch test
        LBR parallel any call test
        LBR parallel any ret test
        LBR parallel any indirect call test
        LBR parallel any indirect jump test
        LBR parallel direct calls test
        LBR parallel system wide any branch test
        LBR parallel any indirect user call test
        LBR parallel system wide any call test
        [ perf record: Woken up 9 times to write data ]
        [ perf record: Woken up 51 times to write data ]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Woken up 5 times to write data ]
        [ perf record: Woken up 559 times to write data ]
        [ perf record: Woken up 14 times to write data ]
        [ perf record: Woken up 17 times to write data ]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Woken up 11 times to write data ]
        [ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ]
        [ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ]
        [ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ]
        [ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ]
        [ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ]
        [ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ]
        [ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ]
        LBR parallel any indirect user call test: 1909 samples
        [ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ]
        LBR parallel any indirect user call test [Success]
        [ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ]
        LBR parallel any indirect jump test: 4311 samples
        LBR parallel any indirect jump test [Success]
        LBR parallel direct calls test: 3033 samples
        LBR parallel direct calls test [Success]
        LBR parallel any indirect call test: 1742 samples
        LBR parallel any indirect call test [Success]
        LBR parallel any call test: 4827 samples
        LBR parallel any call test [Success]
        LBR parallel any branch test: 6130 samples
        LBR parallel any branch test [Success]
        LBR parallel system wide any branch test: 29662 samples
        LBR parallel any ret test: 3971 samples
        LBR parallel any ret test [Success]
        LBR parallel system wide any branch test [Success]
        LBR parallel system wide any call test: 29576 samples
        LBR parallel system wide any call test [Success]
        ---- end(0) ----
         97: perf record LBR tests                                           : Ok
        root@x1:~#
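      The skip condition discussed above amounts to "run if either sysfs file
      exists". The real test is a shell script; the C sketch below only
      illustrates that logic, and its helper names are hypothetical. The two
      paths are the ones quoted in this commit message.

        ```c
        #include <assert.h>
        #include <stdbool.h>
        #include <stddef.h>
        #include <unistd.h>

        /* Paths taken from the commit message above. */
        static const char *const lbr_caps_paths[] = {
        	"/sys/devices/cpu/caps/branches",
        	"/sys/devices/cpu_core/caps/branches",
        };

        /* Returns true if any of the given paths exists (F_OK: existence only). */
        static bool any_path_exists(const char *const *paths, size_t nr)
        {
        	assert(paths != NULL || nr == 0);

        	for (size_t i = 0; i < nr; i++) {
        		if (access(paths[i], F_OK) == 0)
        			return true;
        	}
        	return false;
        }

        /* LBR tests run when either the non-hybrid or the hybrid file is present. */
        static bool lbr_caps_present(void)
        {
        	return any_path_exists(lbr_caps_paths,
        			       sizeof(lbr_caps_paths) / sizeof(lbr_caps_paths[0]));
        }
        ```

      On a hybrid machine only the second path exists, so a check limited to
      the first path would skip every test, as in the "Skip" output earlier.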
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e9d0a79
    • perf test: Add set of perf record LBR tests · 32559b99
      Ian Rogers authored
      Adds coverage for LBR operations and LBR callgraph.
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anne Macedo <retpolanne@posteo.net>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      32559b99
    • perf callchain: Fix stitch LBR memory leaks · 599c1939
      Ian Rogers authored
      The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
      and map members are reference counted. Ensure these values use a _get
      routine to increment the reference counts and use map_symbol__exit() to
      release the reference counts.
      
      Do similar for 'struct thread's prev_lbr_cursor, but save the size of
      the prev_lbr_cursor array so that it may be iterated.
      
      Ensure that when stitch_nodes are placed on the free list the
      map_symbols are exited.
      
      Fix resolve_lbr_callchain_sample() by replacing list_replace_init() with
      list_splice_init(), so the whole list is moved and nodes aren't leaked.
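      To illustrate why splicing avoids the leak, here is a minimal sketch of
      a circular doubly linked list modelled on the kernel's list.h. The
      '_sketch' helpers are simplified re-implementations written for this
      example, not the perf or kernel code. Splicing links the source nodes in
      front of whatever the destination already holds, whereas a replace
      operation overwrites the destination head's pointers, so nodes already
      on the destination would presumably become unreachable.

        ```c
        #include <assert.h>
        #include <stddef.h>

        /* Minimal circular doubly linked list, modelled on the kernel's list.h. */
        struct list_head {
        	struct list_head *next, *prev;
        };

        static void list_init(struct list_head *h)
        {
        	h->next = h;
        	h->prev = h;
        }

        static void list_add_tail_sketch(struct list_head *node, struct list_head *head)
        {
        	node->prev = head->prev;
        	node->next = head;
        	head->prev->next = node;
        	head->prev = node;
        }

        /* Splice: move every node of 'from' to the front of 'to', keep what 'to' had. */
        static void list_splice_init_sketch(struct list_head *from, struct list_head *to)
        {
        	if (from->next == from)
        		return;		/* nothing to move */

        	struct list_head *first = from->next, *last = from->prev;

        	first->prev = to;
        	last->next = to->next;
        	to->next->prev = last;
        	to->next = first;
        	list_init(from);	/* source is left empty and reusable */
        }

        static size_t list_count(const struct list_head *head)
        {
        	size_t n = 0;

        	for (const struct list_head *p = head->next; p != head; p = p->next)
        		n++;
        	return n;
        }
        ```

      Splicing a two-node list into a destination that already holds one node
      leaves three reachable nodes on the destination and an empty source.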
      
      The memory leaks can be reproduced with a leak sanitizer build by
      running perf report as follows:
      
        ```
        $ perf record -e cycles --call-graph lbr perf test -w thloop
        $ perf report --stitch-lbr
        ```
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Fixes: ff165628 ("perf callchain: Stitch LBR call stack")
      Signed-off-by: Ian Rogers <irogers@google.com>
      [ Basic tests after applying the patch, repeating the example above ]
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anne Macedo <retpolanne@posteo.net>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      599c1939
    • perf test pmu: Set uninitialized PMU alias to null · 37e2a19c
      Veronika Molnarova authored
      Commit 3e0bf9fd ("perf pmu: Restore full PMU name wildcard
      support") adds a test case "PMU cmdline match" that covers PMU name
      wildcard support provided by function perf_pmu__match().
      
      The test works with a wide range of supported combinations of PMU name
      matching but overlooks that when perf_pmu__match() cannot match the PMU
      name against the wildcard, it tries to match the PMU's alias instead.
      However, the alias is not set up, causing the test case to fail when run
      with subprocesses or to segfault when run as a single process.
      
        ./perf test -vv 9
          9: Sysfs PMU tests                                :
          9.1: Parsing with PMU format directory            : Ok
          9.2: Parsing with PMU event                       : Ok
          9.3: PMU event names                              : Ok
          9.4: PMU name combining                           : Ok
          9.5: PMU name comparison                          : Ok
          9.6: PMU cmdline match                            : FAILED!
      
        ./perf test -F 9
          9.1: Parsing with PMU format directory            : Ok
          9.2: Parsing with PMU event                       : Ok
          9.3: PMU event names                              : Ok
          9.4: PMU name combining                           : Ok
          9.5: PMU name comparison                          : Ok
        Segmentation fault (core dumped)
      
      Initialize the PMU alias to null for all tests of perf_pmu__match()
      as this functionality is not being tested and the alias matching works
      exactly the same as the matching of the PMU name.
      
      ./perf test -F 9
        9.1: Parsing with PMU format directory                             : Ok
        9.2: Parsing with PMU event                                        : Ok
        9.3: PMU event names                                               : Ok
        9.4: PMU name combining                                            : Ok
        9.5: PMU name comparison                                           : Ok
        9.6: PMU cmdline match                                             : Ok
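      The name-then-alias fallback that triggered the crash can be sketched
      as below. This is not perf_pmu__match() itself, which handles more
      cases; 'struct pmu_names' and pmu_matches() are hypothetical, and
      POSIX fnmatch() stands in for the wildcard matching.

        ```c
        #include <assert.h>
        #include <fnmatch.h>
        #include <stdbool.h>
        #include <stddef.h>

        /* Hypothetical stand-in for the two PMU fields the matcher looks at. */
        struct pmu_names {
        	const char *name;
        	const char *alias_name;	/* must be NULL when there is no alias */
        };

        /* Try the name first, then fall back to the alias if one is set. */
        static bool pmu_matches(const struct pmu_names *pmu, const char *pattern)
        {
        	assert(pmu != NULL && pmu->name != NULL && pattern != NULL);

        	if (fnmatch(pattern, pmu->name, 0) == 0)
        		return true;

        	/* An uninitialized alias_name would be dereferenced here. */
        	return pmu->alias_name != NULL &&
        	       fnmatch(pattern, pmu->alias_name, 0) == 0;
        }
        ```

      The NULL test only protects against an alias that was explicitly set to
      NULL; a pointer holding a random value passes the test and is then read,
      which matches the failure and segfault shown above.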
      
      Fixes: 3e0bf9fd ("perf pmu: Restore full PMU name wildcard support")
      Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
      Cc: James Clark <james.clark@linaro.org>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Radostin Stoyanov <rstoyano@redhat.com>
      Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      37e2a19c
    • perf tests ftrace: Add pattern check for time, count · 2df5484b
      Arnaldo Carvalho de Melo authored
      In 'perf ftrace profile sleep 0.1' we know that we'll have a specific
      kernel function that will take a bit more than 0.1 seconds and will take
      place just one time, so we can add a check for that so that we validate
      more than just the presence of some functions in the profile.
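      The kind of check described here can be sketched as follows. The real
      test is a shell script that matches a pattern against the output; this C
      version is only an illustration, and profile_row_ok() is a hypothetical
      helper. It assumes rows laid out as "total avg max count function", as
      in the 'perf ftrace profile' output.

        ```c
        #include <assert.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        /*
         * Parse one row of the form "<total> <avg> <max> <count> <function>" and
         * check that 'func' ran exactly once for at least 'min_total_us'.
         */
        static bool profile_row_ok(const char *row, const char *func, double min_total_us)
        {
        	double total, avg, max;
        	unsigned long count;
        	char name[128];

        	assert(row != NULL && func != NULL);

        	/* Header or malformed lines do not yield five fields and are rejected. */
        	if (sscanf(row, "%lf %lf %lf %lu %127s", &total, &avg, &max, &count, name) != 5)
        		return false;

        	return strcmp(name, func) == 0 && count == 1 && total >= min_total_us;
        }
        ```

      For 'sleep 0.1' the threshold would be 100000 microseconds, so a row for
      the sleep-related function passes only if it was called once and took at
      least that long.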
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2df5484b
    • perf test: Add a new shell test for perf ftrace · ed5bb548
      Namhyung Kim authored
        $ sudo ./perf test ftrace -vv
         86: perf ftrace tests:
        --- start ---
        test child forked, pid 1772223
        perf ftrace list test
        syscalls for sleep:
        __x64_sys_nanosleep
        __ia32_sys_nanosleep
        __x64_sys_clock_nanosleep
        __ia32_sys_clock_nanosleep
        perf ftrace list test  [Success]
        perf ftrace trace test
        # tracer: function_graph
        #
        # CPU  DURATION                  FUNCTION CALLS
        # |     |   |                     |   |   |   |
         0)               |  __x64_sys_clock_nanosleep() {
         0)               |    common_nsleep() {
         0)               |      hrtimer_nanosleep() {
         0)               |        do_nanosleep() {
        perf ftrace trace test  [Success]
        perf ftrace latency test
        target function: __x64_sys_clock_nanosleep
        #   DURATION     |      COUNT | GRAPH                                          |
            32 - 64   ms |          1 | ############################################## |
        perf ftrace latency test  [Success]
        perf ftrace profile test
        # Total (us)   Avg (us)   Max (us)      Count   Function
          100136.400 100136.400 100136.400          1   __x64_sys_clock_nanosleep
          100135.200 100135.200 100135.200          1   common_nsleep
          100134.700 100134.700 100134.700          1   hrtimer_nanosleep
          100133.700 100133.700 100133.700          1   do_nanosleep
          100130.600 100130.600 100130.600          1   schedule
             166.868     55.623     80.299          3   scheduler_tick
               5.926      5.926      5.926          1   native_smp_send_reschedule
             301.941    301.941    301.941          1   __x64_sys_execve
             295.786    295.786    295.786          1   do_execveat_common.isra.0
              71.397     35.699     46.403          2   bprm_execve
               2.519      1.260      1.547          2   sched_mm_cid_before_execve
               1.098      0.549      0.686          2   sched_mm_cid_after_execve
        perf ftrace profile test  [Success]
        ---- end(0) ----
         86: perf ftrace tests                                               : Ok
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: https://lore.kernel.org/r/20240808044954.1775333-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ed5bb548
    • Namhyung Kim's avatar
      perf annotate-data: Show typedef names properly · 90d78e7b
      Namhyung Kim authored
      The die_get_typename() would resolve typedef and get to the original
      type.  But sometimes the original type is a struct without name and it
      makes the output confusing and hard to read.
      
      This is a diff of perf report -s type before and after the change.
      New types such as atomic{,64}_t and sigset_t appeared and the share
      of unnamed structs was reduced.  Also u32, u64 and size_t were split
      from the base types.
      
        --- b   2024-08-01 17:02:34.307809952 -0700
        +++ a   2024-08-07 14:17:05.245853999 -0700
        -     2.40%  long unsigned int
        +     2.26%  long unsigned int
        -     1.56%  unsigned int
        +     1.27%  unsigned int
        -     0.98%  struct
        -     0.79%  long long unsigned int
        +     0.58%  long long unsigned int
        +     0.36%  struct
        +     0.27%  atomic64_t
        +     0.22%  u32
        +     0.21%  u64
        +     0.19%  atomic_t
        +     0.13%  size_t
        -     0.08%  struct seqcount_spinlock
        +     0.08%  seqcount_spinlock_t
        +     0.08%  sigset_t
        +     0.08%  __poll_t
      
      Let's use the typedef name directly and use the resolved type only to
      get the size of the type.
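
As a rough illustration of that split (a Python sketch, not perf's C code; the `Type` class and the `resolve()` and `describe()` helpers are hypothetical), the display name can come from the typedef itself while the size comes from following the typedef chain to the underlying type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Type:
    """Toy stand-in for a DWARF type entry (hypothetical model)."""
    kind: str                        # "typedef", "struct" or "base"
    name: Optional[str] = None       # None models an unnamed struct
    size: int = 0                    # only meaningful for non-typedef kinds
    target: Optional["Type"] = None  # what a typedef points to


def resolve(t: Type) -> Type:
    """Follow typedef links until a non-typedef type is reached."""
    while t.kind == "typedef" and t.target is not None:
        t = t.target
    return t


def describe(t: Type) -> tuple:
    """Return (display name, size): name from the typedef, size from the resolved type."""
    resolved = resolve(t)
    if t.kind == "typedef":
        name = t.name
    elif resolved.kind == "struct":
        name = "struct " + resolved.name if resolved.name else "struct"
    else:
        name = resolved.name
    return name, resolved.size


# An unnamed 4-byte struct reached through a typedef (sizes are made up).
anon = Type(kind="struct", name=None, size=4)
atomic_t = Type(kind="typedef", name="atomic_t", target=anon)
```

In this toy model, `describe(atomic_t)` reports the typedef name with the size of the struct behind it, whereas describing the struct directly can only say `struct`.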
      
      Committer testing:
      
        root@x1:~# diff -u before after | head -30
        --- before	2024-08-08 09:35:13.917325041 -0300
        +++ after	2024-08-08 09:37:35.312257905 -0300
        @@ -10,25 +10,27 @@
         # ........  .........
         #
             79.40%  (unknown)
        -     2.28%  union
              1.96%  (stack operation)
        -     1.24%  struct
        +     1.87%  pthread_mutex_t
              0.99%  u32[]
        -     0.92%  unsigned int
              0.77%  struct task_struct
        +     0.75%  U32
              0.75%  struct pcpu_hot
              0.63%  struct qspinlock
        +     0.61%  atomic_t
              0.59%  struct list_head
        -     0.58%  int
              0.53%  struct cfs_rq
              0.51%  BYTE*
        -     0.48%  unsigned char
        +     0.48%  BYTE
              0.48%  long unsigned int
              0.46%  struct rq
              0.41%  struct worker
              0.41%  struct memcg_vmstats_percpu
        +     0.41%  pthread_cond_t
              0.37%  _Bool
        +     0.36%  int
        root@x1:~#
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      90d78e7b
    • Namhyung Kim's avatar
      perf annotate: Cache debuginfo for data type profiling · 037f1b67
      Namhyung Kim authored
      find_data_type() creates and deletes a debug info structure whenever it
      tries to find the data type for a sample.  This is inefficient, as it
      most likely accesses the same binary again and again.
      
      Let's add a single-entry cache for the debug info structure of the last
      DSO.  Depending on sample data, it usually gives me 2~3x (and sometimes
      more) speedups.
      
      Note that this will introduce a little difference in the output due to
      the order of checking stack operations.  It used to check the stack ops
      before checking the availability of debug info but I moved it after the
      symbol check.  So it'll report stack operations in DSOs without debug
      info as unknown.  But I think it's ok and better to have the checking
      near the caching logic.
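
The caching idea can be sketched in Python (an illustration only, not the perf implementation; `DebugInfoCache` and its `opener` callback are hypothetical names). A single slot remembers the debug info of the most recently used DSO, so consecutive samples from the same binary skip the expensive open:

```python
class DebugInfoCache:
    """Single-entry cache keyed by DSO name (hypothetical model)."""

    def __init__(self, opener):
        self._opener = opener   # callable that loads debug info for a DSO
        self._key = None
        self._value = None
        self.opens = 0          # counts how often the expensive open ran

    def get(self, dso_name):
        # Reuse the cached entry when the same DSO is requested again.
        if self._key == dso_name and self._value is not None:
            return self._value
        # Otherwise replace the single cached entry with the new DSO's info.
        self._value = self._opener(dso_name)
        self._key = dso_name
        self.opens += 1
        return self._value


# Five samples that hit only two distinct DSOs, mostly back to back.
cache = DebugInfoCache(lambda name: {"dso": name})
for dso in ["vmlinux", "vmlinux", "vmlinux", "libc.so.6", "libc.so.6"]:
    cache.get(dso)
```

A single slot only pays off when samples for the same DSO arrive in runs, which matches the observation above that the same binary is accessed again and again.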
      
      Committer testing:
      
        root@x1:~# perf mem record -a sleep 5s
        root@x1:~# perf evlist
        cpu_atom/mem-loads,ldlat=30/P
        cpu_atom/mem-stores/P
        dummy:u
        root@x1:~# diff -u before after
        --- before	2024-08-08 09:33:53.880780784 -0300
        +++ after	2024-08-08 09:35:13.917325041 -0300
        @@ -81,8 +81,8 @@
         # Overhead  Data Type
         # ........  .........
         #
        -    55.43%  (unknown)
        -    11.61%  (stack operation)
        +    55.56%  (unknown)
        +    11.48%  (stack operation)
              4.93%  struct pcpu_hot
              3.26%  unsigned int
              2.48%  struct
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      037f1b67
    • Ian Rogers's avatar
      perf hist: Fix reference counting of branch_info · b2f70c99
      Ian Rogers authored
      iter_finish_branch_entry() doesn't put the branch_info from/to map
      elements creating memory leaks. This can be seen with:
      
      ```
      $ perf record -e cycles -b perf test -w noploop
      $ perf report -D
      ...
      Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
          #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
          #1 0x564d3400d10b in map__get util/map.h:186
          #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
          #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
          #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
          #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
          #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
          #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
          #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
          #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
          #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
          #11 0x564d34032480 in perf_session__process_event util/session.c:1877
          #12 0x564d340336ad in reader__read_event util/session.c:2399
          #13 0x564d34033fdc in reader__process_events util/session.c:2448
          #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
          #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
          #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
          #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
          #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
          #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
          #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
          #21 0x564d33cdd827 in main tools/perf/perf.c:561
      ...
      ```
      
      Cleaning up the map_symbols properly creates map reference count
      issues, so resolve those too. Resolving this issue doesn't improve peak
      heap consumption for the test above.
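
The get/put discipline behind this kind of leak can be sketched in Python (a toy model that does not mirror perf's actual data structures; `Map`, `map_get()`, `map_put()` and the two branch-entry helpers are hypothetical names). Every reference taken while preparing a branch entry has to be dropped when the entry is finished, otherwise the object never reaches a count of zero:

```python
class Map:
    """Toy reference-counted object (hypothetical stand-in for a perf map)."""

    def __init__(self):
        self.refcnt = 1      # the creator holds the initial reference
        self.freed = False


def map_get(m):
    """Take an extra reference and return the same object."""
    m.refcnt += 1
    return m


def map_put(m):
    """Drop one reference; the object is freed when the count hits zero."""
    m.refcnt -= 1
    if m.refcnt == 0:
        m.freed = True


def prepare_branch_entry(from_map, to_map):
    # Each side of the branch takes its own reference.
    return {"from": map_get(from_map), "to": map_get(to_map)}


def finish_branch_entry(bi):
    # Drop both references taken in prepare; skipping this leaks them.
    map_put(bi["from"])
    map_put(bi["to"])
    bi["from"] = bi["to"] = None


m = Map()
bi = prepare_branch_entry(m, m)
finish_branch_entry(bi)
map_put(m)  # the creator drops the initial reference
```

If `finish_branch_entry()` skipped the two puts, `m.refcnt` would still be 2 after the creator's put and the object would never be freed, which is the shape of leak a sanitizer reports.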
      
      Committer testing:
      
        $ sudo dnf install libasan
        $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b2f70c99
  4. 06 Aug, 2024 6 commits
    • Arnaldo Carvalho de Melo's avatar
      Merge remote-tracking branch 'torvalds/master' into perf-tools-next · 37ce8a56
      Arnaldo Carvalho de Melo authored
      To pick up a patch that, despite being for the tools/perf/ directory,
      went through a different tree and ended up breaking some recent tests
      introduced in the perf-tools-next tree to validate duplicate events in
      the JSON performance event files.
      
      Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      37ce8a56
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v6.11-2' of... · eb5e56d1
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Ilpo Järvinen:
       "Fixes:
      
         - Fix ACPI notifier racing with itself (intel-vbtn)
      
         - Initialize local variable to cover a timeout corner case
           (intel/ifs)
      
         - WMI docs spelling
      
        New device IDs:
      
         - amd/{pmc,pmf}: AMD 1Ah model 60h series.
      
         - amd/pmf: SPS quirk support for ASUS ROG Ally X"
      
      * tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86/intel/ifs: Initialize union ifs_status to zero
        platform/x86: msi-wmi-platform: Fix spelling mistakes
        platform/x86/amd/pmf: Add new ACPI ID AMDI0107
        platform/x86/amd/pmc: Send OS_HINT command for new AMD platform
        platform/x86/amd: pmf: Add quirk for ROG Ally X
        platform/x86: intel-vbtn: Protect ACPI notify handler against recursion
      eb5e56d1
    • Ian Rogers's avatar
      perf jevents.py: Ensure event names aren't duplicated · 4bd38039
      Ian Rogers authored
      Duplicate event names break invariants in 'perf list'. Assert that an
      event name isn't duplicated so that broken JSON won't build.
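
A minimal sketch of such a check, assuming each event is a JSON object with an `EventName` field (this is not the code in jevents.py; `check_unique_event_names()` and the `OP_RETIRED` name are illustrative assumptions):

```python
import json


def check_unique_event_names(events):
    """Assert that no event name occurs twice; return the number of names."""
    seen = set()
    for event in events:
        name = event["EventName"]
        # Fail loudly so that broken JSON stops the build.
        assert name not in seen, f"duplicate event name: {name}"
        seen.add(name)
    return len(seen)


# Two small inputs: one valid, one with a repeated name.
good = json.loads('[{"EventName": "OP_SPEC"}, {"EventName": "OP_RETIRED"}]')
bad = json.loads('[{"EventName": "OP_SPEC"}, {"EventName": "OP_SPEC"}]')
```

Calling `check_unique_event_names(bad)` raises an `AssertionError`, so a build step running the check would stop on the duplicated entry.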
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Atish Patra <atishp@rivosinc.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
      Cc: Eric Lin <eric.lin@sifive.com>
      Cc: Greentime Hu <greentime.hu@sifive.com>
      Cc: Guilherme Amadio <amadio@gentoo.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Inochi Amaoto <inochiama@outlook.com>
      Cc: James Clark <james.clark@linaro.org>
      Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Locus Wei-Han Chen <locus84@andestech.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Vincent Chen <vincent.chen@sifive.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xu Yang <xu.yang_2@nxp.com>
      Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4bd38039
    • Ian Rogers's avatar
      perf pmu-events: Remove duplicated ampereone event · c4f74bb6
      Ian Rogers authored
      OP_SPEC is repeated twice in the file which will break invariants in
      'perf list' as discussed in this thread:
      
        https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Atish Patra <atishp@rivosinc.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
      Cc: Eric Lin <eric.lin@sifive.com>
      Cc: Greentime Hu <greentime.hu@sifive.com>
      Cc: Guilherme Amadio <amadio@gentoo.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Inochi Amaoto <inochiama@outlook.com>
      Cc: James Clark <james.clark@linaro.org>
      Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Locus Wei-Han Chen <locus84@andestech.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Vincent Chen <vincent.chen@sifive.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xu Yang <xu.yang_2@nxp.com>
      Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c4f74bb6
    • Ian Rogers's avatar
      perf pmu-events: Change dependencies for empty-pmu-events.c test · b79f9a43
      Ian Rogers authored
      Switch from $? (all the prerequisites that are newer than the target)
      to $^ (all the prerequisites) as touching jevents.py will mean that
      empty-pmu-events.c won't be passed to the diff command breaking the
      build.
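
The difference between the two automatic variables can be modelled in a few lines of Python (a simplified sketch of GNU make's behaviour, not the perf Makefile; the helper names and timestamps are made up for illustration):

```python
def newer_prereqs(target_mtime, prereqs):
    """Model of make's $?: only the prerequisites newer than the target."""
    return [name for name, mtime in prereqs if mtime > target_mtime]


def all_prereqs(prereqs):
    """Model of make's $^: every prerequisite, duplicates removed, order kept."""
    seen = []
    for name, _ in prereqs:
        if name not in seen:
            seen.append(name)
    return seen


# jevents.py was touched after the target was last built; the .c file was not.
prereqs = [("empty-pmu-events.c", 100), ("jevents.py", 300)]
target_mtime = 200
```

With these timestamps, the `$?` model yields only `jevents.py`, so a recipe that passes `$?` to diff never sees `empty-pmu-events.c`, while the `$^` model always yields both files.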
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Atish Patra <atishp@rivosinc.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
      Cc: Eric Lin <eric.lin@sifive.com>
      Cc: Greentime Hu <greentime.hu@sifive.com>
      Cc: Guilherme Amadio <amadio@gentoo.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Inochi Amaoto <inochiama@outlook.com>
      Cc: James Clark <james.clark@linaro.org>
      Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Locus Wei-Han Chen <locus84@andestech.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Vincent Chen <vincent.chen@sifive.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xu Yang <xu.yang_2@nxp.com>
      Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b79f9a43
    • Ian Rogers's avatar
      perf test: Add build test for JEVENTS_ARCH=all · 2576b20a
      Ian Rogers authored
      Building with JEVENTS_ARCH=all builds all CPU types and allows things
      like assertions to check the validity of the input JSON.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Atish Patra <atishp@rivosinc.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
      Cc: Eric Lin <eric.lin@sifive.com>
      Cc: Greentime Hu <greentime.hu@sifive.com>
      Cc: Guilherme Amadio <amadio@gentoo.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Inochi Amaoto <inochiama@outlook.com>
      Cc: James Clark <james.clark@linaro.org>
      Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Locus Wei-Han Chen <locus84@andestech.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Vincent Chen <vincent.chen@sifive.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xu Yang <xu.yang_2@nxp.com>
      Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2576b20a
  5. 05 Aug, 2024 4 commits
    • Linus Torvalds's avatar
      Merge tag 'linux_kselftest-fixes-6.11-rc3' of... · b446a2da
      Linus Torvalds authored
      Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fix from Shuah Khan:
       "A single fix to the conditional in ksft.py script which incorrectly
        flags a test suite failed when there are skipped tests in the mix.
      
        The logic is fixed to take skipped tests into account and report the
        test as passed"
      
      * tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: ksft: Fix finished() helper exit code on skipped tests
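
The corrected conditional can be sketched as follows (an assumption-laden Python illustration, not the actual ksft.py source; `finished_exit_code()` and the `counts` dictionary are hypothetical, and the exit codes follow the usual kselftest convention of 0 for pass and 1 for fail):

```python
KSFT_PASS = 0
KSFT_FAIL = 1


def finished_exit_code(counts, num_tests):
    """Treat a run as passed when every test either passed or was skipped."""
    if counts["pass"] + counts["skip"] == num_tests:
        return KSFT_PASS
    return KSFT_FAIL


# Three tests: two passed and one was skipped, none failed.
with_skips = {"pass": 2, "skip": 1, "fail": 0}
# Three tests: one of them failed.
with_failure = {"pass": 2, "skip": 0, "fail": 1}
```

Counting only `counts["pass"]` against `num_tests` would flag the first run as failed, which is the behaviour the pull request describes as incorrect.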
      b446a2da
    • Namhyung Kim's avatar
      perf annotate: Add --skip-empty option · ce533c9b
      Namhyung Kim authored
      Like in 'perf report', we want to hide empty events in the 'perf annotate'
      output.  This makes the output consistent with 'perf report' when the
      option is set.
      
      For example, the following command would use 3 events including dummy.
      
        $ perf mem record -a -- perf test -w noploop
      
        $ perf evlist
        cpu/mem-loads,ldlat=30/P
        cpu/mem-stores/P
        dummy:u
      
      Just using perf annotate with --group will show all 3 events.
      
        $ perf annotate --group --stdio | head
         Percent                 |	Source code & Disassembly of ...
        --------------------------------------------------------------
                                 : 0     0xe060 <_dl_relocate_object>:
            0.00    0.00    0.00 :    e060:       pushq   %rbp
            0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
            0.00    0.00    0.00 :    e064:       pushq   %r15
            0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
            0.00    0.00    0.00 :    e069:       pushq   %r14
            0.00    0.00    0.00 :    e06b:       pushq   %r13
            0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d
      
      Now with --skip-empty, it'll hide the last dummy event.
      
        $ perf annotate --group --stdio --skip-empty | head
         Percent         |	Source code & Disassembly of ...
        ------------------------------------------------------
                         : 0     0xe060 <_dl_relocate_object>:
            0.00    0.00 :    e060:       pushq   %rbp
            0.00    0.00 :    e061:       movq    %rsp, %rbp
            0.00    0.00 :    e064:       pushq   %r15
            0.00    0.00 :    e066:       movq    %rdi, %r15
            0.00    0.00 :    e069:       pushq   %r14
            0.00    0.00 :    e06b:       pushq   %r13
            0.00    0.00 :    e06d:       movl    %edx, %r13d
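
The effect of the option on the column list can be modelled in Python (an illustrative sketch only; `visible_events()` is a hypothetical helper and the sample counts are made up):

```python
def visible_events(events, skip_empty):
    """Return the event names that get a column in the annotate output."""
    if not skip_empty:
        return [name for name, _ in events]
    # Hide events that recorded no samples at all.
    return [name for name, nr_samples in events if nr_samples > 0]


# (event name, number of samples); the dummy event never has samples here.
events = [
    ("cpu/mem-loads,ldlat=30/P", 12),
    ("cpu/mem-stores/P", 8),
    ("dummy:u", 0),
]
```

With `skip_empty=True` the list shrinks from three names to two, matching the drop from three percent columns to two in the outputs above.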
      
      Committer testing:
      
        root@x1:~# perf evlist
        cpu_atom/mem-loads,ldlat=30/P
        cpu_atom/mem-stores/P
        dummy:u
        root@x1:~#
      
      Before:
      
        root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
        Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
        do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
        Percent                       0x9900 <do_lookup_x>:
                                        pushq      %rbp
                                        movq       %rsp,%rbp
                                        pushq      %r15
                                        pushq      %r14
                                        pushq      %r13
                                        pushq      %r12
                                        pushq      %rbx
                                        subq       $0x88,%rsp
                                        movq       %rdi,-0x50(%rbp)
                                        movl       8(%r9),%edi
                                        movq       0x10(%rbp),%r12
                                        movq       0x28(%rbp),%r10
                                        movq       %rdx,-0x70(%rbp)
                                        movq       %rcx,-0x58(%rbp)
                                        movq       %rdi,%r11
           0.00    5.73    0.00         movq       %r8,-0x68(%rbp)
                                        movq       (%r9),%r8
                                        movl       %esi,%eax
           8.30    0.00    0.00         movl       0x30(%rbp),%r9d
                                        movl       %esi,%r15d
                                        shrl       $6, %eax
                                        movq       %r8,%r13
        root@x1:~#
      
      After:
      
        root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
        Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
        do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
        Percent               0x9900 <do_lookup_x>:
                                pushq      %rbp
                                movq       %rsp,%rbp
                                pushq      %r15
                                pushq      %r14
                                pushq      %r13
                                pushq      %r12
                                pushq      %rbx
                                subq       $0x88,%rsp
                                movq       %rdi,-0x50(%rbp)
                                movl       8(%r9),%edi
                                movq       0x10(%rbp),%r12
                                movq       0x28(%rbp),%r10
                                movq       %rdx,-0x70(%rbp)
                                movq       %rcx,-0x58(%rbp)
                                movq       %rdi,%r11
           0.00    5.73         movq       %r8,-0x68(%rbp)
                                movq       (%r9),%r8
                                movl       %esi,%eax
           8.30    0.00         movl       0x30(%rbp),%r9d
                                movl       %esi,%r15d
                                shrl       $6, %eax
                                movq       %r8,%r13
        root@x1:~#
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ce533c9b
    • Namhyung Kim's avatar
      perf annotate: Set al->data_nr using the notes->src->nr_events · bb588e38
      Namhyung Kim authored
      This is a preparation to support skipping empty events.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bb588e38
    • Namhyung Kim's avatar
      perf annotate: Use annotation__pcnt_width() consistently · b00e4d0d
      Namhyung Kim authored
      The annotation__pcnt_width() calculates the screen width for the
      overhead (percent) area considering event groups properly.  Use this
      function consistently so that we can make sure it has similar output
      in different modes.  But there's a difference in stdio and tui output:
      stdio uses 8 and tui uses 7 for a percent.
      
      Let's use 8 and adjust the print width in __annotation_line__write()
      properly.
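
As a rough model of the width calculation (a Python sketch assuming a fixed 8 characters per event in the group; `pcnt_width()` and `format_percent_columns()` are hypothetical helpers, not perf functions):

```python
PCNT_WIDTH = 8  # characters per percent column, the value chosen above


def pcnt_width(nr_events):
    """Total width of the overhead area for a group of nr_events events."""
    return PCNT_WIDTH * nr_events


def format_percent_columns(percents):
    """Right-align each percentage in its own fixed-width column."""
    return "".join(f"{p:{PCNT_WIDTH}.2f}" for p in percents)


row = format_percent_columns([0.0, 5.73, 0.0])
```

Because every column uses the same width, the overhead area of a row always has the length that `pcnt_width()` predicts for the group size.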
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b00e4d0d