Commits · 7bedcbaefdf5d4f71302cb5428eb72ada4f380e9 · Kirill Smelkov / linux

28 Aug, 2024 20 commits

perf trace: Pass the richer 'struct syscall_arg' pointer to trace__btf_scnprintf() · 7bedcbae

Arnaldo Carvalho de Melo authored Aug 22, 2024

Since we'll need it later in the current patch series and we can get the
syscall_arg_fmt from syscall_arg->fmt.
Based-on-a-patch-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zsd8vqCrTh5h69rp@x1Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

7bedcbae

perf trace: Fix perf trace -p <PID> · 8df1d8c6

Howard Chu authored Aug 15, 2024

'perf trace -p <PID>' work on a syscall that is unaugmented, but doesn't
work on a syscall that's augmented (when it calls perf_event_output() in
BPF).

Let's take open() as an example. open() is augmented in perf trace.

Before:

  $ perf trace -e open -p 3792392
     ? (         ):  ... [continued]: open()) = -1 ENOENT (No such file or directory)
     ? (         ):  ... [continued]: open()) = -1 ENOENT (No such file or directory)

We can see there's no output.

After:

   $ perf trace -e open -p 3792392
      0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
   1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)

Reason:

bpf_perf_event_output() will fail when you specify a pid in 'perf trace' (EOPNOTSUPP).

When using 'perf trace -p 114', before perf_event_open(), we'll have PID
= 114, and CPU = -1.

This is bad for bpf-output event, because the ring buffer won't accept
output from BPF's perf_event_output(), making it fail. I'm still trying
to find out why.

If we open bpf-output for every cpu, instead of setting it to -1, like
this:

  PID = <PID>, CPU = 0
  PID = <PID>, CPU = 1
  PID = <PID>, CPU = 2
  PID = <PID>, CPU = 3

Everything works.

You can test it with this script (open.c):

  #include <unistd.h>
  #include <sys/syscall.h>

  int main()
  {
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	while (1) {
		syscall(SYS_open, s1, i1, i2);
		sleep(1);
	}

	return 0;
  }

save, compile:

  make open

perf trace:

  perf trace -e open <path-to-the-executable>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8df1d8c6

perf evlist: Introduce method to find if there is a bpf-output event · 4451dae4

Howard Chu authored Aug 15, 2024

We'll use it in the next patch, to deciding how to set up the ring
buffer.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-2-howardchu95@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4451dae4

perf report: Name events in stats for pipe mode · 8b48f8ba

Ian Rogers authored Aug 27, 2024

In stats mode PERF_RECORD_EVENT_UPDATE isn't being handled meaning the
evsels aren't named when handling pipe mode output.

Before:

  $ perf record -e inst_retired.any -a -o - sleep 0.1|perf report --stats -i -
  ...
  Aggregated stats:
             TOTAL events:      23358
              COMM events:       2608  (11.2%)
              EXIT events:          1  ( 0.0%)
              FORK events:       2607  (11.2%)
            SAMPLE events:        174  ( 0.7%)
             MMAP2 events:      17936  (76.8%)
              ATTR events:          2  ( 0.0%)
    FINISHED_ROUND events:          2  ( 0.0%)
          ID_INDEX events:          1  ( 0.0%)
        THREAD_MAP events:          1  ( 0.0%)
           CPU_MAP events:          1  ( 0.0%)
      EVENT_UPDATE events:          3  ( 0.0%)
         TIME_CONV events:          1  ( 0.0%)
           FEATURE events:         20  ( 0.1%)
     FINISHED_INIT events:          1  ( 0.0%)
  raw 0xc0 stats:
            SAMPLE events:        174

After:

  $ perf record -e inst_retired.any -a -o - sleep 0.1|perf report --stats -i -
  ...
  Aggregated stats:
             TOTAL events:      23742
              COMM events:       2620  (11.0%)
              EXIT events:          2  ( 0.0%)
              FORK events:       2619  (11.0%)
            SAMPLE events:        165  ( 0.7%)
             MMAP2 events:      18304  (77.1%)
              ATTR events:          2  ( 0.0%)
    FINISHED_ROUND events:          2  ( 0.0%)
          ID_INDEX events:          1  ( 0.0%)
        THREAD_MAP events:          1  ( 0.0%)
           CPU_MAP events:          1  ( 0.0%)
      EVENT_UPDATE events:          3  ( 0.0%)
         TIME_CONV events:          1  ( 0.0%)
           FEATURE events:         20  ( 0.1%)
     FINISHED_INIT events:          1  ( 0.0%)
  inst_retired.any stats:
            SAMPLE events:        165

This makes the pipe output match the regular output.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240827212757.1469340-1-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8b48f8ba

perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target · 097fe67d

Michael Petlan authored Jul 02, 2024

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-13-vmolnaro@redhat.comSigned-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

097fe67d

perf testsuite report: Add test case for perf report · e37cb2a6

Veronika Molnarova authored Jul 02, 2024

Add a new 'perf report' test case that acts as an entry element in 'perf
test list'.

Runs multiple subtests from directory "base_report", which can be
expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-12-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e37cb2a6

perf testsuite report: Add test for perf-report basic functionality · 61f87151

Veronika Molnarova authored Jul 02, 2024

Test basic execution and some options of perf-report subcommand, like
show-nr-samples, header, showcpuutilization, pid and symbol filtering.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-11-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

61f87151

perf testsuite: Add common output checking helper · 13d58a66

Veronika Molnarova authored Jul 02, 2024

As a form of validation, it is a common practice to check the outputs
of commands whether they contain expected patterns or match a certain
regular expression.

This output checking helper is designed to allow checking stderr output
of perf commands for unexpected messages, while ignoring messages that
are known to be harmless, e.g.:
  "Lowering default frequency rate to \d+\."
  "\d+ out of order events recorded."
  etc.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-10-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

13d58a66

perf testsuite probe: Add test for line semantics · c0964af8

Veronika Molnarova authored Jul 02, 2024

The perf-probe command uses a specific semantics to describe probes.
Test some patterns that are known to be both valid and invalid if
they are handled appropriately.

This test is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-9-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

c0964af8

perf testsuite probe: Add test for invalid options · 83b6815d

Veronika Molnarova authored Jul 02, 2024

Test if various incompatible options are correctly handled-rejected.
It is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-8-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

83b6815d

perf testsuite probe: Add test for basic perf-probe options · adc1dd00

Veronika Molnarova authored Jul 02, 2024

Test basic behavior of perf-probe subcommand. It is run as a part of
perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-7-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

adc1dd00

perf testsuite probe: Add test for blacklisted kprobes handling · def5480d

Veronika Molnarova authored Jul 02, 2024

Test perf probe interface. Blacklisted functions should be rejected
when there is an attempt to set a kprobe to them.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-6-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

def5480d

perf testsuite: Fix shellcheck warnings · 32ddd082

Veronika Molnarova authored Jul 02, 2024

Shellcheck is becoming a standard when building perf to prevent
any unnecessary mistakes. Fix shellcheck warnings in perf testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-5-vmolnaro@redhat.comSigned-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

32ddd082

perf testsuite: Merge settings files for shell tests · a3a02a52

Veronika Molnarova authored Jul 02, 2024

Merge perf testsuite setting files into common settings to reduce
duplicates and prevent errors.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-4-vmolnaro@redhat.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a3a02a52

perf tests shell: Skip base_* dirs in test script search · 5a02447c

Michael Petlan authored Jul 02, 2024

The test scripts in base_* directories currently have their own drivers
that run them. Before this patch, the shell test-suite generator causes
them to run twice. Fix that by skipping them in the generator.

A cleaner solution (for future) will be to use the directory structure
idea (introduced by Carsten Haitzler in 7391db64 ("perf test:
Refactor shell tests allowing subdirs")) to generate test entries with
subtests, like:

  $ perf test list
  [...]
  97: perf probe shell tests
  97:1: perf probe basic functionality
  97:2: perf probe tests with arguments
  97:3: perf probe invalid options handling
  [...]

There is already a lot of shell test scripts and many are about to come,
so there is a need for some hierarchy.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240702110849.31904-3-vmolnaro@redhat.comSigned-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

5a02447c

perf test vfs_getname: Look for alternative line where to collect the pathname · a68080e1

Arnaldo Carvalho de Melo authored Aug 27, 2024

The getname_flags() routine changed recently and thus the place where we
were getting the pathname is not probeable anymore, albeit still
present, so use the next line for that, before:

  root@number:/home/acme/git/perf-tools-next# perf test vfs_getname
   91: Add vfs_getname probe to get syscall args filenames             : FAILED!
   93: Use vfs_getname probe to get syscall args filenames             : FAILED!
  126: Check open filename arg using perf trace + vfs_getname          : FAILED!
  root@number:/home/acme/git/perf-tools-next#

Now tests 91 and 126 are passing, some more investigation is needed for
test 93, that continues to fail.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a68080e1

perf test: Update sample filtering tests with multiple events · 150ca9cc

Namhyung Kim authored Aug 20, 2024

Add Multiple bpf-filter test for two or more events with filters.
It uses task-clock and page-faults events with different filter
expressions and check the perf script output

  $ sudo ./perf test filtering -vv
   96: perf record sample filtering (by BPF) tests:
  --- start ---
  test child forked, pid 2804025
  Checking BPF-filter privilege
  Basic bpf-filter test
  Basic bpf-filter test [Success]
  Failing bpf-filter test
  Error: task-clock event does not have PERF_SAMPLE_CPU
  Failing bpf-filter test [Success]
  Group bpf-filter test
  Error: task-clock event does not have PERF_SAMPLE_CPU
  Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE
  Group bpf-filter test [Success]
  Multiple bpf-filter test
  Multiple bpf-filter test [Success]
  ---- end(0) ----
   96: perf record sample filtering (by BPF) tests                     : Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-3-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

150ca9cc

perf tools: Print lost samples due to BPF filter · 1a5474a7

Namhyung Kim authored Aug 20, 2024

Print the actual dropped sample count in the event stat.

  $ sudo perf record -o- -e cycles --filter 'period < 10000' \
      -e instructions --filter 'ip > 0x8000000000000000' perf test -w noploop | \
      perf report --stat -i-
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB - ]

  Aggregated stats:
                 TOTAL events:        469
                  MMAP events:        268  (57.1%)
                  COMM events:          2  ( 0.4%)
                  EXIT events:          1  ( 0.2%)
                SAMPLE events:         16  ( 3.4%)
                 MMAP2 events:         22  ( 4.7%)
          LOST_SAMPLES events:          2  ( 0.4%)
               KSYMBOL events:         89  (19.0%)
             BPF_EVENT events:         39  ( 8.3%)
                  ATTR events:          2  ( 0.4%)
        FINISHED_ROUND events:          1  ( 0.2%)
              ID_INDEX events:          1  ( 0.2%)
            THREAD_MAP events:          1  ( 0.2%)
               CPU_MAP events:          1  ( 0.2%)
          EVENT_UPDATE events:          2  ( 0.4%)
             TIME_CONV events:          1  ( 0.2%)
               FEATURE events:         20  ( 4.3%)
         FINISHED_INIT events:          1  ( 0.2%)
  cycles stats:
                SAMPLE events:          2
    LOST_SAMPLES (BPF) events:       4010
  instructions stats:
                SAMPLE events:         14
    LOST_SAMPLES (BPF) events:       3990
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1a5474a7

perf bpf-filter: Support multiple events properly · 0fe2b18d

Namhyung Kim authored Aug 20, 2024

So far it used tgid as a key to get the filter expressions in the
pinned filters map for regular users but it won't work well if the has
more than one filters at the same time.  Let's add the event id to the
key of the filter hash map so that it can identify the right filter
expression in the BPF program.

As the event can be inherited to child tasks, it should use the primary
id which belongs to the parent (original) event.  Since evsel opens the
event for multiple CPUs and tasks, it needs to maintain a separate hash
map for the event id.

In the user space, it keeps a list for the multiple evsel and release
the entries in the both hash map when it closes the event.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240820154504.128923-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

0fe2b18d

perf hist: Don't set hpp_fmt_value for members in --no-group · 4f3affe0

Kan Liang authored Aug 20, 2024

Perf crashes as below when applying --no-group

  # perf record -e "{cache-misses,branches"} -b sleep 1
  # perf report --stdio --no-group
  free(): invalid next size (fast)
  Aborted (core dumped)
  #

In the __hpp__fmt(), only 1 hpp_fmt_value is allocated for the current
event when --no-group is applied.

However, the current implementation tries to assign the hists from all
members to the hpp_fmt_value, which exceeds the allocated memory.

Fixes: 8f6071a3 ("perf hist: Simplify __hpp_fmt() using hpp_fmt_data")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240820183202.3174323-1-kan.liang@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4f3affe0

26 Aug, 2024 1 commit

perf test: Support external tests for separate objdir · f133c764

Andi Kleen authored Aug 13, 2024

Extend the searching for the test files so that it works when running
perf from a separate objdir, and also when the perf executable is
symlinked.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240813213651.1057362-2-ak@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

f133c764

22 Aug, 2024 6 commits

perf python: Disable -Wno-cast-function-type-mismatch if present on clang · 00dc5146

Arnaldo Carvalho de Melo authored Aug 22, 2024

The -Wcast-function-type-mismatch option was introduced in clang 19 and
its enabled by default, since we use -Werror, and python bindings do
casts that are valid but trips this warning, disable it if present.

Closes: https://lore.kernel.org/all/CA+icZUXoJ6BS3GMhJHV3aZWyb5Cz2haFneX0C5pUMUUhG-UVKQ@mail.gmail.comReported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # To allow building with the upcoming clang 19
Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

00dc5146

perf python: Allow checking for the existence of warning options in clang · b8116230

Arnaldo Carvalho de Melo authored Aug 22, 2024

We'll need to check if an warning option introduced in clang 19 is
available on the clang version being used, so cover the error message
emitted when testing for a -W option.
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/CA+icZUVtHn8X1Tb_Y__c-WswsO0K8U9uy3r2MzKXwTA5THtL7w@mail.gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

b8116230

perf annotate-data: Copy back variable types after move · 1cfd01eb

Namhyung Kim authored Aug 21, 2024

In some cases, compilers don't set the location expression in DWARF
precisely.  For instance, it may assign a variable to a register after
copying it from a different register.  Then it should use the register
for the new type but still uses the old register.  This makes hard to
track the type information properly.

This is an example I found in __tcp_transmit_skb().  The first argument
(sk) of this function is a pointer to sock and there's a variable (tp)
for tcp_sock.

  static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
  				int clone_it, gfp_t gfp_mask, u32 rcv_nxt)
  {
  	...
  	struct tcp_sock *tp;

  	BUG_ON(!skb || !tcp_skb_pcount(skb));
  	tp = tcp_sk(sk);
  	prior_wstamp = tp->tcp_wstamp_ns;
  	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
  	...

So it basically calls tcp_sk(sk) to get the tcp_sock pointer from sk.
But it turned out to be the same value because tcp_sock embeds sock as
the first member.  The sk is located in reg5 (RDI) and tp is in reg3
(RBX).  The offset of tcp_wstamp_ns is 0x748 and tcp_clock_cache is
0x750.  So you need to use RBX (reg3) to access the fields in the
tcp_sock.  But the code used RDI (reg5) as it has the same value.

  $ pahole --hex -C tcp_sock vmlinux | grep -e 748 -e 750
	u64                tcp_wstamp_ns;        /* 0x748   0x8 */
	u64                tcp_clock_cache;      /* 0x750   0x8 */

And this is the disassembly of the part of the function.

  <__tcp_transmit_skb>:
  ...
  44:  mov    %rdi, %rbx
  47:  mov    0x748(%rdi), %rsi
  4e:  mov    0x750(%rdi), %rax
  55:  cmp    %rax, %rsi

Because compiler put the debug info to RBX, it only knows RDI is a
pointer to sock and accessing those two fields resulted in error
due to offset being beyond the type size.

  -----------------------------------------------------------
  find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
  CU for net/ipv4/tcp_output.c (die:0x817f543)
  frame base: cfa=0 fbreg=6
  scope: [1/1] (die:81aac3e)
  bb: [0 - 30]
  var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
  var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg1 type='int' size=0x4 (die:0x818059e)
  var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
  var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)                   <<<--- the first argument ('sk' at %RDI)
  mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
  mov [20] stack canary -> reg0
  mov [29] reg0 -> -0x30(stack) stack canary
  bb: [36 - 3e]
  mov [36] reg4 -> reg15 type='struct sk_buff*' size=0x8 (die:0x8181360)
  bb: [44 - 63]
  mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)          <<<--- calling tcp_sk()
  var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)              <<<--- new variable ('tp' at %RBX)
  var [4e] reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
  mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
  chk [63] reg5 offset=0x748 ok=1 kind=1 (struct sock*) : offset bigger than size    <<<--- access with old variable
  final result: offset bigger than size

While it's a fault in the compiler, we could work around this issue by
using the type of new variable when it's copied directly.  So I've added
copied_from field in the register state to track those direct register
to register copies.  After that new register gets a new type and the old
register still has the same type, it'll update (copy it back) the type
of the old register.

For example, if we can update type of reg5 at __tcp_transmit_skb+0x47,
we can find the target type of the instruction at 0x63 like below:

  -----------------------------------------------------------
  find data type for 0x748(reg5) at __tcp_transmit_skb+0x63
  ...
  bb: [44 - 63]
  mov [44] reg5 -> reg3 type='struct sock*' size=0x8 (die:0x8181a0c)
  var [47] reg3 type='struct tcp_sock*' size=0x8 (die:0x819eead)
  var [47] copyback reg5 type='struct tcp_sock*' size=0x8 (die:0x819eead)     <<<--- here
  mov [47] 0x748(reg5) -> reg4 type='unsigned long long' size=0x8 (die:0x8180edd)
  mov [4e] 0x750(reg5) -> reg0 type='unsigned long long' size=0x8 (die:0x8180edd)
  mov [58] reg4 -> -0xc0(stack) type='unsigned long long' size=0x8 (die:0x8180edd)
  chk [63] reg5 offset=0x748 ok=1 kind=1 (struct tcp_sock*) : Good!           <<<--- new type
  found by insn track: 0x748(reg5) type-offset=0x748
  final result:  type='struct tcp_sock' size=0xa98 (die:0x819eeb2)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-5-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1cfd01eb

perf annotate-data: Update stack slot for the store · 895891da

Namhyung Kim authored Aug 21, 2024

When checking the match variable at the target instruction, it might not
have any information if it's a first write to a stack slot.  In this
case it could spill a register value into the stack so the type info is
in the source operand.

But currently it's hard to get the operand from the checking function.
Let's process the instruction and retry to get the type info from the
stack if there's no information already.

This is an example of __tcp_transmit_skb().  The instructions are

  <__tcp_transmit_skb>:
   0: nopl   0x0(%rax, %rax, 1)
   5: push   %rbp
   6: mov    %rsp, %rbp
   9: push   %r15
   b: push   %r14
   d: push   %r13
   f: push   %r12
  11: push   %rbx
  12: sub    $0x98, %rsp
  19: mov    %r8d, -0xa8(%rbp)
  ...

It cannot find any variable at -0xa8(%rbp) at this point.
  -----------------------------------------------------------
  find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
  CU for net/ipv4/tcp_output.c (die:0x817f543)
  frame base: cfa=0 fbreg=6
  scope: [1/1] (die:81aac3e)
  bb: [0 - 19]
  var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
  var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg1 type='int' size=0x4 (die:0x818059e)
  var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
  var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
  chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : no type information
  no type information

And it was able to find the type after processing the 'mov' instruction.
  -----------------------------------------------------------
  find data type for -0xa8(reg6) at __tcp_transmit_skb+0x19
  CU for net/ipv4/tcp_output.c (die:0x817f543)
  frame base: cfa=0 fbreg=6
  scope: [1/1] (die:81aac3e)
  bb: [0 - 19]
  var [0] -0x98(stack) type='struct tcp_out_options' size=0x28 (die:0x81af3df)
  var [5] reg8 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg2 type='unsigned int' size=0x4 (die:0x8180ed6)
  var [5] reg1 type='int' size=0x4 (die:0x818059e)
  var [5] reg4 type='struct sk_buff*' size=0x8 (die:0x8181360)
  var [5] reg5 type='struct sock*' size=0x8 (die:0x8181a0c)
  chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : retry                    <<<--- here
  mov [19] reg8 -> -0xa8(stack) type='unsigned int' size=0x4 (die:0x8180ed6)
  chk [19] reg6 offset=-0xa8 ok=0 kind=0 fbreg : Good!
  found by insn track: -0xa8(reg6) type-offset=0
  final result:  type='unsigned int' size=0x4 (die:0x8180ed6)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-4-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

895891da

perf annotate-data: Update debug messages · a0d57c60

Namhyung Kim authored Aug 21, 2024

In check_matching_type(), it'd be easier to display the typename in
question if it's available.

For example, check out the line starts with 'chk'.
  -----------------------------------------------------------
  find data type for 0x10(reg0) at cpuacct_charge+0x13
  CU for kernel/sched/build_utility.c (die:0x137ee0b)
  frame base: cfa=1 fbreg=7
  scope: [3/3] (die:13d9632)
  bb: [c - 13]
  var [c] reg5 type='struct task_struct*' size=0x8 (die:0x1381230)
  mov [c] 0xdf8(reg5) -> reg0 type='struct css_set*' size=0x8 (die:0x1385c56)
  chk [13] reg0 offset=0x10 ok=1 kind=1 (struct css_set*) : Good!         <<<--- here
  found by insn track: 0x10(reg0) type-offset=0x10
  final result:  type='struct css_set' size=0x250 (die:0x1385b0e)

Another example:
  -----------------------------------------------------------
  find data type for 0x8(reg0) at menu_select+0x279
  CU for drivers/cpuidle/governors/menu.c (die:0x7b0fe79)
  frame base: cfa=1 fbreg=7
  scope: [2/2] (die:7b11010)
  bb: [273 - 277]
  bb: [279 - 279]
  chk [279] reg0 offset=0x8 ok=0 kind=0 cfa : no type information
  scope: [1/2] (die:7b10cbc)
  bb: [0 - 64]
  ...
  mov [26a] imm=0xffffffff -> reg15
  bb: [273 - 277]
  bb: [279 - 279]
  chk [279] reg0 offset=0x8 ok=1 kind=1 (long long unsigned int) : no/void pointer    <<<--- here
  final result: no/void pointer

Also change some places to print negative offsets properly.

Before:
  -----------------------------------------------------------
  find data type for 0xffffff40(reg6) at __tcp_transmit_skb+0x58

After:
  -----------------------------------------------------------
  find data type for -0xc0(reg6) at __tcp_transmit_skb+0x58
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-3-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a0d57c60

perf dwarf-aux: Handle bitfield members from pointer access · a11b4222

Namhyung Kim authored Aug 21, 2024

The __die_find_member_offset_cb() missed to handle bitfield members
which don't have DW_AT_data_member_location.  Like in adding member
types in __add_member_cb() it should fallback to check the bit offset
when it resolves the member type for an offset.

Fixes: 437683a9 ("perf dwarf-aux: Handle type transfer for memory access")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821232628.353177-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a11b4222

21 Aug, 2024 6 commits

perf annotate-data: Add 'typecln' sort key · fd45d52e

Namhyung Kim authored Aug 19, 2024

Sometimes it's useful to organize member fields in cache-line boundary.

The 'typecln' sort key is short for type-cacheline and to show samples
in each cacheline.  The cacheline size is fixed to 64 for now, but it
can read the actual size once it saves the value from sysfs.

For example, you maybe want to which cacheline in a target is hot or
cold.  The following shows members in the cfs_rq's first cache line.

  $ perf report -s type,typecln,typeoff -H
  ...
  -    2.67%        struct cfs_rq
     +    1.23%        struct cfs_rq: cache-line 2
     +    0.57%        struct cfs_rq: cache-line 4
     +    0.46%        struct cfs_rq: cache-line 6
     -    0.41%        struct cfs_rq: cache-line 0
             0.39%        struct cfs_rq +0x14 (h_nr_running)
             0.02%        struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
  ...

Committer testing:

  # root@number:~# perf report -s type,typecln,typeoff -H --stdio
  # Total Lost Samples: 0
  #
  # Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P'
  # Event count (approx.): 312251
  #
  #       Overhead  Data Type / Data Type Cacheline / Data Type Offset
  # ..............  ..................................................
  #
  <SNIP>
       0.07%        struct sigaction
          0.05%        struct sigaction: cache-line 1
             0.02%        struct sigaction +0x58 (sa_mask)
             0.02%        struct sigaction +0x78 (sa_mask)
          0.03%        struct sigaction: cache-line 0
             0.02%        struct sigaction +0x38 (sa_mask)
             0.01%        struct sigaction +0x8 (sa_mask)
  <SNIP>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

fd45d52e

perf annotate-data: Show offset and size in hex · 7a5c2170

Namhyung Kim authored Aug 19, 2024

It'd be better to have them in hex to check cacheline alignment.

 Percent     offset       size  field
  100.00          0      0x1c0  struct cfs_rq    {
    0.00          0       0x10      struct load_weight  load {
    0.00          0        0x8          long unsigned int       weight;
    0.00        0x8        0x4          u32     inv_weight;
                                    };
    0.00       0x10        0x4      unsigned int        nr_running;
   14.56       0x14        0x4      unsigned int        h_nr_running;
    0.00       0x18        0x4      unsigned int        idle_nr_running;
    0.00       0x1c        0x4      unsigned int        idle_h_nr_running;
  ...

Committer notes:

Justification from Namhyung when asked about why it would be "better":

Cache line sizes are power of 2 so it'd be natural to use hex and
check whether an offset is in the same boundary.  Also 'perf annotate'
shows instruction offsets in hex.

>
> Maybe this should be selectable?

I can add an option and/or a config if you want.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

7a5c2170

perf bpf: Remove redundant check that map is NULL · ce66d7c7

Yang Ruibin authored Aug 21, 2024

The check that map is NULL is already done in the bpf_map__fd(map) and
returns an errno, which does not run further checks.

In addition, even if the check for map is run, the return is a pointer,
which is not consistent with the err_number returned by bpf_map__fd(map).
Signed-off-by: Yang Ruibin <11162571@vivo.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: opensource.kernel@vivo.com
Link: https://lore.kernel.org/r/20240821101500.4568-1-11162571@vivo.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ce66d7c7

perf annotate-data: Fix percpu pointer check · 4d6d6e0f

Namhyung Kim authored Aug 20, 2024

In check_matching_type(), it checks the type state of the register in a
wrong order.  When it's the percpu pointer, it should check the type for
the pointer, but it checks the CFA bit first and thought it has no type
in the stack slot.  This resulted in no type info.

  -----------------------------------------------------------
  find data type for 0x28(reg1) at hrtimer_reprogram+0x88
  CU for kernel/time/hrtimer.c (die:0x18f219f)
  frame base: cfa=1 fbreg=7
  ...
  add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
  bb: [7a - 7e]
  bb: [80 - 86]                        (here)
  bb: [88 - 88]                         vvv
  chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information
  no type information

Here, instruction at 0x72 found reg1 has a (percpu) pointer and got the
correct type.  But when it checks the final result, it wrongly thought
it was stack variable because it checks the cfa bit first.

After changing the order of state check:
  -----------------------------------------------------------
  find data type for 0x28(reg1) at hrtimer_reprogram+0x88
  CU for kernel/time/hrtimer.c (die:0x18f219f)
  frame base: cfa=1 fbreg=7
  ...                                     (here)
                                        vvvvvvvvvv
  chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good!
  found by insn track: 0x28(reg1) type-offset=0x28
  final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-3-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4d6d6e0f

perf annotate-data: Prefer struct/union over base type · 4a32a972

Namhyung Kim authored Aug 20, 2024

Sometimes a compound type can have a single field and the size is the
same as the base type.  But it's still preferred as struct or union
could carry more information than the base type.

Also put a slight priority on the typedef for the same reason.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4a32a972

perf annotate-data: Fix missing constant copy · 922ec313

Namhyung Kim authored Aug 20, 2024

I found it missed to copy the immediate constant when it moves the
register value.  This could result in a wrong type inference since the
address for the per-cpu variable would be 0 always.

Fixes: eb9190af ("perf annotate-data: Handle ADD instructions")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240821065408.285548-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

922ec313

20 Aug, 2024 3 commits

perf cap: Tidy up and improve capability testing · e25ebda7

Ian Rogers authored Aug 06, 2024

Remove dependence on libcap. libcap is only used to query whether a
capability is supported, which is just 1 capget system call.

If the capget system call fails, fall back on root permission
checking. Previously if libcap fails then the permission is assumed
not present which may be pessimistic/wrong.

Add a used_root out argument to perf_cap__capable to say whether the
fall back root check was used. This allows the correct error message,
"root" vs "users with the CAP_PERFMON or CAP_SYS_ADMIN capability", to
be selected.

Tidy uses of perf_cap__capable so that tests aren't repeated if capget
isn't supported.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240806220614.831914-1-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

e25ebda7

perf annotate-data: Set bitfield member offset and size properly · 8b1042c4

Namhyung Kim authored Aug 15, 2024

The bitfield members might not have DW_AT_data_member_location.  Let's
use DW_AT_data_bit_offset to set the member offset correct.  Also use
DW_AT_bit_size for the name like in a C program.

Before:
  Annotate type: 'struct sk_buff' (1 samples)
        Percent     Offset       Size  Field
  -      100.00          0        232  struct sk_buff {
  +        0.00          0         24      union  ;
  +        0.00         24          8      union  ;
  +        0.00         32          8      union  ;
           0.00         40         48      char[] cb;
  +        0.00         88         16      union  ;
           0.00        104          8      long unsigned int      _nfct;
         100.00        112          4      unsigned int   len;
           0.00        116          4      unsigned int   data_len;
           0.00        120          2      __u16  mac_len;
           0.00        122          2      __u16  hdr_len;
           0.00        124          2      __u16  queue_mapping;
           0.00        126          0      __u8[] __cloned_offset;
           0.00          0          1      __u8   cloned;
           0.00          0          1      __u8   nohdr;
           0.00          0          1      __u8   fclone;
           0.00          0          1      __u8   peeked;
           0.00          0          1      __u8   head_frag;
           0.00          0          1      __u8   pfmemalloc;
           0.00          0          1      __u8   pp_recycle;
           0.00        127          1      __u8   active_extensions;
  +        0.00        128         60      union  ;
           0.00        188          4      sk_buff_data_t tail;
           0.00        192          4      sk_buff_data_t end;
           0.00        200          8      unsigned char* head;

After:

  Annotate type: 'struct sk_buff' (1 samples)
        Percent     Offset       Size  Field
  -      100.00          0        232  struct sk_buff {
  +        0.00          0         24      union  ;
  +        0.00         24          8      union  ;
  +        0.00         32          8      union  ;
           0.00         40         48      char[] cb
  +        0.00         88         16      union  ;
           0.00        104          8      long unsigned int      _nfct;
         100.00        112          4      unsigned int   len;
           0.00        116          4      unsigned int   data_len;
           0.00        120          2      __u16  mac_len;
           0.00        122          2      __u16  hdr_len;
           0.00        124          2      __u16  queue_mapping;
           0.00        126          0      __u8[] __cloned_offset;
           0.00        126          1      __u8   cloned:1;
           0.00        126          1      __u8   nohdr:1;
           0.00        126          1      __u8   fclone:2;
           0.00        126          1      __u8   peeked:1;
           0.00        126          1      __u8   head_frag:1;
           0.00        126          1      __u8   pfmemalloc:1;
           0.00        126          1      __u8   pp_recycle:1;
           0.00        127          1      __u8   active_extensions;
  +        0.00        128         60      union  ;
           0.00        188          4      sk_buff_data_t tail;
           0.00        192          4      sk_buff_data_t end;
           0.00        200          8      unsigned char* head;

Commiter notes:

Collect some data:

  root@number:~# perf mem record -a --ldlat 5 -- ping -s 8193 -f 192.168.86.1
  Memory events are enabled on a subset of CPUs: 16-27
  PING 192.168.86.1 (192.168.86.1) 8193(8221) bytes of data.
  .^C
  --- 192.168.86.1 ping statistics ---
  13881 packets transmitted, 13880 received, 0.00720409% packet loss, time 8664ms
  rtt min/avg/max/mdev = 0.510/0.599/7.768/0.115 ms, ipg/ewma 0.624/0.593 ms
  [ perf record: Woken up 8 times to write data ]
  [ perf record: Captured and wrote 14.877 MB perf.data (46785 samples) ]

  root@number:~#
  root@number:~# perf evlist
  cpu_atom/mem-loads,ldlat=5/P
  cpu_atom/mem-stores/P
  dummy:u
  root@number:~# perf evlist -v
  cpu_atom/mem-loads,ldlat=5/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x7
  cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  root@number:~#

Ok, now lets see what changes from before this patch to after it:

  root@number:~# perf annotate --data-type > /tmp/before

Apply the patch, build:

  root@number:~# perf annotate --data-type > /tmp/after

The first hunk of the diff, for a glib data structure, in userspace,
look at those bitfields:

  root@number:~# diff -u10 /tmp/before /tmp/after | head -20
  --- /tmp/before	2024-08-20 17:29:58.306765780 -0300
  +++ /tmp/after	2024-08-20 17:33:13.210582596 -0300
  @@ -163,22 +163,22 @@

   Annotate type: 'GHashTable' in /usr/lib64/libglib-2.0.so.0.8000.3 (1 samples):
   ============================================================================
    Percent     offset       size  field
     100.00          0         96  GHashTable	 {
       0.00          0          8      gsize	size;
       0.00          8          4      gint	mod;
     100.00         12          4      guint	mask;
       0.00         16          4      guint	nnodes;
       0.00         20          4      guint	noccupied;
  -    0.00          0          4      guint	have_big_keys;
  -    0.00          0          4      guint	have_big_values;
  +    0.00         24          1      guint	have_big_keys:1;
  +    0.00         24          1      guint	have_big_values:1;
       0.00         32          8      gpointer	keys;
       0.00         40          8      guint*	hashes;
       0.00         48          8      gpointer	values;
  root@number:~#

As advertised :-)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240815223823.2402285-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8b1042c4

perf daemon: Fix the build on more 32-bit architectures · 6236ebe0

Arnaldo Carvalho de Melo authored Aug 19, 2024

The previous attempt fixed the build on debian:experimental-x-mipsel,
but when building on a larger set of containers I noticed it broke the
build on some other 32-bit architectures such as:

  42     7.87 ubuntu:18.04-x-arm            : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
    builtin-daemon.c: In function 'cmd_session_list':
    builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=]
       fprintf(out, "%c%" PRIu64,
                    ^~~~~
    builtin-daemon.c:694:13:
        csv_sep, (curr - daemon->start) / 60);
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from builtin-daemon.c:3:0:
    /usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here
     # define PRIu64  __PRI64_PREFIX "u"

So lets cast that time_t (32-bit/64-bit) to uint64_t to make sure it
builds everywhere.

Fixes: 4bbe6002 ("perf daemon: Fix the build on 32-bit architectures")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZsPmldtJ0D9Cua9_@x1Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

6236ebe0

19 Aug, 2024 4 commits

perf test: Add cgroup sampling test · 5cc698ba

Namhyung Kim authored Aug 18, 2024

Add it to the record.sh shell test to verify if it tracks cgroup
information correctly.  It records with --all-cgroups option can check
if it has PERF_RECORD_CGROUP and the names are not "unknown".

  $ sudo ./perf test -vv 95
   95: perf record tests:
  --- start ---
  test child forked, pid 2871922
   169c90-169cd0 g test_loop
  perf does have symbol 'test_loop'
  Basic --per-thread mode test
  Basic --per-thread mode test [Success]
  Register capture test
  Register capture test [Success]
  Basic --system-wide mode test
  Basic --system-wide mode test [Success]
  Basic target workload test
  Basic target workload test [Success]
  Branch counter test
  branch counter feature not supported on all core PMUs (/sys/bus/event_source/devices/cpu) [Skipped]
  Cgroup sampling test
  Cgroup sampling test [Success]
  ---- end(0) ----
   95: perf record tests                                               : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240818212948.2873156-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

5cc698ba

perf record: Fix sample cgroup & namespace tracking · 3432bae8

Namhyung Kim authored Aug 18, 2024

The recent change in 'struct perf_tool' constification broke the cgroup
and/or namespace tracking by resetting tool fields.  It should set the
values after perf_tool__init().

Fixes: cecb1cf1 ("perf record: Use perf_tool__init()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240818212948.2873156-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

3432bae8

perf inject: Combine mmap and mmap2 handling · 05c4cfeb

Ian Rogers authored Aug 16, 2024

The handling of mmap and mmap2 events is near identical. Add a common
helper function and call that by the two event handling functions.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-10-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

05c4cfeb

perf inject: Combine different mmap and mmap2 functions · 048a7a93

Ian Rogers authored Aug 16, 2024

There are repipe, build ID and JIT dump variants of the mmap and mmap2
repipe functions. The organization doesn't allow JIT dump to work with
build ID injection and the structure is less than clear. Combine the
function and enable the different behaviors based on ifs.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Link: https://lore.kernel.org/r/20240817064442.2152089-9-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

048a7a93