Commit 7e55b956 authored by Steinar H. Gunderson, committed by Arnaldo Carvalho de Melo

perf intel-pt: Synthesize cycle events

There is no good reason why we cannot synthesize "cycle" events from
Intel PT just as we can synthesize "instruction" events, in particular
when CYC packets are available. This enables using PT to get much
more accurate cycle profiles than regular sampling (record -e cycles)
when the work lasts for very short periods (<10 ms). Thus, add support
for this, based on the existing IPC calculation framework. The new
option to --itrace is "y" (for cYcles), as "c" was taken for calls. Cycle
and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some
other packet is emitted, which in practice is when a branch instruction is
encountered (and not even at every branch). Thus, even with no subsampling
(e.g. --itrace=y0ns), it is impossible to get finer granularity than a
single basic block, and all cycles spent executing that block will be
attributed to the branch instruction that ends the packet. Thus, one
cannot know whether the cycles came from, e.g., a specific load, a
mispredicted branch, or something else. When subsampling (which is the
default), the cycle events get smeared out even more, but are still
generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 1470a108
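The attribution caveat in the commit message can be illustrated with a small sketch that sums synthesized cycle samples per function. It assumes each sample carries the IP of the branch that ended the packet and a period equal to the cycles accumulated since the previous CYC update. The `func_range` and `cycle_sample` types, the helper names and any address ranges are hypothetical; this is not how perf itself resolves symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry covering [start, end). */
struct func_range {
	const char *name;
	uint64_t start;
	uint64_t end;
	uint64_t cycles;	/* cycles credited to this function */
};

/*
 * Hypothetical synthesized "cycles" sample: ip is the branch that ended
 * the packet, period is the cycles counted since the previous CYC update.
 */
struct cycle_sample {
	uint64_t ip;
	uint64_t period;
};

/* Linear lookup; returns NULL if ip falls outside every range. */
static struct func_range *find_func(struct func_range *funcs, size_t n,
				    uint64_t ip)
{
	for (size_t i = 0; i < n; i++) {
		assert(funcs[i].start < funcs[i].end); /* ranges must be non-empty */
		if (ip >= funcs[i].start && ip < funcs[i].end)
			return &funcs[i];
	}
	return NULL;
}

/*
 * Credit each sample's cycles to the function containing its IP. All
 * cycles of a basic block land on its terminating branch, so the result
 * is meaningful per function but not per instruction.
 * Returns the cycles that could not be attributed to any range.
 */
static uint64_t attribute_cycles(struct func_range *funcs, size_t nfuncs,
				 const struct cycle_sample *samples,
				 size_t nsamples)
{
	uint64_t unattributed = 0;

	for (size_t i = 0; i < nsamples; i++) {
		struct func_range *f = find_func(funcs, nfuncs, samples[i].ip);

		if (f)
			f->cycles += samples[i].period;
		else
			unattributed += samples[i].period;
	}
	return unattributed;
}
```

With two hypothetical samples at 0x1040 (12 cycles) and 0x10f8 (30 cycles) inside a `parse` range of [0x1000, 0x1100), `parse` is credited with 42 cycles, with no way to tell which instruction in those blocks actually stalled.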
 		i	synthesize instructions events
+		y	synthesize cycles events
 		b	synthesize branches events (branch misses for Arm SPE)
 		c	synthesize branches events (calls only)
 		r	synthesize branches events (returns only)
@@ -25,7 +26,7 @@
 		A	approximate IPC
 		Z	prefer to ignore timestamps (so-called "timeless" decoding)
 
-	The default is all events i.e. the same as --itrace=ibxwpe,
+	The default is all events i.e. the same as --itrace=iybxwpe,
 	except for perf script where it is --itrace=ce
 
 	In addition, the period (default 100000, except for perf script where it is 1)
@@ -101,12 +101,12 @@ data is available you can use the 'perf script' tool with all itrace sampling
 options, which will list all the samples.
 
 	perf record -e intel_pt//u ls
-	perf script --itrace=ibxwpe
+	perf script --itrace=iybxwpe
 
 An interesting field that is not printed by default is 'flags' which can be
 displayed as follows:
 
-	perf script --itrace=ibxwpe -F+flags
+	perf script --itrace=iybxwpe -F+flags
 
 The flags are "bcrosyiABExghDt" which stand for branch, call, return, conditional,
 system, asynchronous, interrupt, transaction abort, trace begin, trace end,
@@ -147,16 +147,17 @@ displayed as follows:
 There are two ways that instructions-per-cycle (IPC) can be calculated depending
 on the recording.
 
-If the 'cyc' config term (see config terms section below) was used, then IPC is
-calculated using the cycle count from CYC packets, otherwise MTC packets are
-used - refer to the 'mtc' config term. When MTC is used, however, the values
-are less accurate because the timing is less accurate.
+If the 'cyc' config term (see config terms section below) was used, then IPC
+and cycle events are calculated using the cycle count from CYC packets, otherwise
+MTC packets are used - refer to the 'mtc' config term. When MTC is used, however,
+the values are less accurate because the timing is less accurate.
 
 Because Intel PT does not update the cycle count on every branch or instruction,
 the values will often be zero. When there are values, they will be the number
 of instructions and number of cycles since the last update, and thus represent
-the average IPC since the last IPC for that event type. Note IPC for "branches"
-events is calculated separately from IPC for "instructions" events.
+the average IPC cycle count since the last IPC for that event type.
+Note IPC for "branches" events is calculated separately from IPC for "instructions"
+events.
 
 Even with the 'cyc' config term, it is possible to produce IPC information for
 every change of timestamp, but at the expense of accuracy. That is selected by
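The "since the last update, per event type" rule described in this hunk can be sketched as cumulative counters plus one snapshot per consumer, in the spirit of the `last_in_*`, `last_cy_*` and `last_br_*` fields in `struct intel_pt_queue`. The types and the `ipc_since_last()` helper are hypothetical simplifications, not perf code; like the real sample functions, the sketch only advances a snapshot when the cycle count has moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cumulative counters, advanced whenever a cycle-count update arrives. */
struct ipc_counters {
	uint64_t insn_cnt;
	uint64_t cyc_cnt;
};

/* Per-event-type snapshot (hypothetical analogue of last_in_* etc.). */
struct ipc_snapshot {
	uint64_t last_insn_cnt;
	uint64_t last_cyc_cnt;
};

/*
 * If cycles advanced since this consumer's previous report, store the
 * instruction and cycle deltas, advance the snapshot and return true.
 * Otherwise leave the snapshot alone and return false (nothing to report).
 */
static bool ipc_since_last(const struct ipc_counters *c,
			   struct ipc_snapshot *s,
			   uint64_t *insn, uint64_t *cyc)
{
	assert(c->cyc_cnt >= s->last_cyc_cnt); /* counters only grow */
	if (c->cyc_cnt == s->last_cyc_cnt)
		return false;
	*insn = c->insn_cnt - s->last_insn_cnt;
	*cyc = c->cyc_cnt - s->last_cyc_cnt;
	s->last_insn_cnt = c->insn_cnt;
	s->last_cyc_cnt = c->cyc_cnt;
	return true;
}
```

In a hypothetical run where the counters read (100 instructions, 50 cycles) and later (160, 80), an "instructions" consumer reports 100/50 and then 60/30, while a "branches" consumer that reports only at the second point sees 160/80; neither consumer's snapshot affects the other.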
@@ -900,11 +901,12 @@ Having no option is the same as
 
 which, in turn, is the same as
 
-	--itrace=cepwx
+	--itrace=cepwxy
 
 The letters are:
 
 	i	synthesize "instructions" events
+	y	synthesize "cycles" events
 	b	synthesize "branches" events
 	x	synthesize "transactions" events
 	w	synthesize "ptwrite" events
@@ -927,6 +929,16 @@ The letters are:
 "Instructions" events look like they were recorded by "perf record -e
 instructions".
 
+"Cycles" events look like they were recorded by "perf record -e cycles"
+(ie., the default). Note that even with CYC packets enabled and no sampling,
+these are not fully accurate, since CYC packets are not emitted for each
+instruction, only when some other event (like an indirect branch, or a
+TNT packet representing multiple branches) happens causes a packet to
+be emitted. Thus, it is more effective for attributing cycles to functions
+(and possibly basic blocks) than to individual instructions, although it
+is not even perfect for functions (although it becomes better if the noretcomp
+option is active).
+
 "Branches" events look like they were recorded by "perf record -e branches". "c"
 and "r" can be combined to get calls and returns.
@@ -934,9 +946,9 @@ and "r" can be combined to get calls and returns.
 'flags' field can be used in perf script to determine whether the event is a
 transaction start, commit or abort.
 
-Note that "instructions", "branches" and "transactions" events depend on code
-flow packets which can be disabled by using the config term "branch=0". Refer
-to the config terms section above.
+Note that "instructions", "cycles", "branches" and "transactions" events
+depend on code flow packets which can be disabled by using the config term
+"branch=0". Refer to the config terms section above.
 
 "ptwrite" events record the payload of the ptwrite instruction and whether
 "fup_on_ptw" was used. "ptwrite" events depend on PTWRITE packets which are
@@ -1394,6 +1394,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 		synth_opts->calls = true;
 	} else {
 		synth_opts->instructions = true;
+		synth_opts->cycles = true;
 		synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 		synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	}
@@ -1482,7 +1483,11 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 	for (p = str; *p;) {
 		switch (*p++) {
 		case 'i':
-			synth_opts->instructions = true;
+		case 'y':
+			if (p[-1] == 'y')
+				synth_opts->cycles = true;
+			else
+				synth_opts->instructions = true;
 			while (*p == ' ' || *p == ',')
 				p += 1;
 			if (isdigit(*p)) {
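The shared `case 'i': case 'y':` branch above can be sketched in isolation. This simplified parser is hypothetical: it only understands `i`, `y`, spaces, commas and a plain decimal period, whereas the real parser also accepts the `ns|us|ms|i|t` suffixes listed in ITRACE_HELP. It shows why `y` reuses the period handling of `i`: both letters fall through to the same code, `p[-1]` tells which one was just consumed, and both write the same period field.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct itrace_synth_opts. */
struct opts_sketch {
	bool instructions;
	bool cycles;
	bool period_set;
	unsigned long long period;	/* shared by 'i' and 'y' */
};

/*
 * Parse 'i'/'y' letters, each optionally followed by a decimal period.
 * Returns 0 on success, -1 on any other letter.
 */
static int parse_iy(const char *str, struct opts_sketch *o)
{
	const char *p = str;

	assert(str && o);
	while (*p) {
		switch (*p++) {
		case 'i':
		case 'y':
			/* p has already moved past the letter, so look back. */
			if (p[-1] == 'y')
				o->cycles = true;
			else
				o->instructions = true;
			while (*p == ' ' || *p == ',')
				p += 1;
			if (isdigit((unsigned char)*p)) {
				char *end;

				o->period = strtoull(p, &end, 10);
				o->period_set = true;
				p = end;
			}
			break;
		case ' ':
		case ',':
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

For example, `"y1000"` enables only cycles with a period of 1000, while `"iy"` enables both event types and leaves the period unset so the default can be applied afterwards.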
@@ -1641,7 +1646,7 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 		}
 	}
 out:
-	if (synth_opts->instructions) {
+	if (synth_opts->instructions || synth_opts->cycles) {
 		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
@@ -71,6 +71,9 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
+ * @cycles: whether to synthesize 'cycles' events
+ *          (not fully accurate, since CYC packets are only emitted
+ *          together with other events, such as branches)
  * @branches: whether to synthesize 'branches' events
  *            (branch misses only for Arm SPE)
  * @transactions: whether to synthesize events for transactions
@@ -119,6 +122,7 @@ struct itrace_synth_opts {
 	bool			default_no_sample;
 	bool			inject;
 	bool			instructions;
+	bool			cycles;
 	bool			branches;
 	bool			transactions;
 	bool			ptwrites;
@@ -643,6 +647,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 #define ITRACE_HELP \
 " i[period]: synthesize instructions events\n" \
+" y[period]: synthesize cycles events (same period as i)\n" \
 " b: synthesize branches events (branch misses for Arm SPE)\n" \
 " c: synthesize branches events (calls only)\n" \
 " r: synthesize branches events (returns only)\n" \
@@ -674,7 +679,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 " A: approximate IPC\n" \
 " Z: prefer to ignore timestamps (so-called \"timeless\" decoding)\n" \
 " PERIOD[ns|us|ms|i|t]: specify period to sample stream\n" \
-" concatenate multiple options. Default is ibxwpe or cewp\n"
+" concatenate multiple options. Default is iybxwpe or cewp\n"
 
 static inline
 void itrace_synth_opts__set_time_range(struct itrace_synth_opts *opts,
@@ -5,6 +5,7 @@
  */
 
 #include <inttypes.h>
+#include <linux/perf_event.h>
 #include <stdio.h>
 #include <stdbool.h>
 #include <errno.h>
@@ -98,6 +99,10 @@ struct intel_pt {
 	u64 instructions_sample_type;
 	u64 instructions_id;
 
+	bool sample_cycles;
+	u64 cycles_sample_type;
+	u64 cycles_id;
+
 	bool sample_branches;
 	u32 branches_filter;
 	u64 branches_sample_type;
@@ -214,6 +219,8 @@ struct intel_pt_queue {
 	u64 ipc_cyc_cnt;
 	u64 last_in_insn_cnt;
 	u64 last_in_cyc_cnt;
+	u64 last_cy_insn_cnt;
+	u64 last_cy_cyc_cnt;
 	u64 last_br_insn_cnt;
 	u64 last_br_cyc_cnt;
 	unsigned int cbr_seen;
@@ -1319,7 +1326,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	if (pt->filts.cnt > 0)
 		params.pgd_ip = intel_pt_pgd_ip;
 
-	if (pt->synth_opts.instructions) {
+	if (pt->synth_opts.instructions || pt->synth_opts.cycles) {
 		if (pt->synth_opts.period) {
 			switch (pt->synth_opts.period_type) {
 			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
@@ -1830,6 +1837,33 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 					    pt->instructions_sample_type);
 }
 
+static int intel_pt_synth_cycle_sample(struct intel_pt_queue *ptq)
+{
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+	u64 period = 0;
+
+	if (ptq->sample_ipc)
+		period = ptq->ipc_cyc_cnt - ptq->last_cy_cyc_cnt;
+
+	if (!period || intel_pt_skip_event(pt))
+		return 0;
+
+	intel_pt_prep_sample(pt, ptq, event, &sample);
+
+	sample.id = ptq->pt->cycles_id;
+	sample.stream_id = ptq->pt->cycles_id;
+	sample.period = period;
+
+	sample.cyc_cnt = period;
+	sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_cy_insn_cnt;
+	ptq->last_cy_insn_cnt = ptq->ipc_insn_cnt;
+	ptq->last_cy_cyc_cnt = ptq->ipc_cyc_cnt;
+
+	return intel_pt_deliver_synth_event(pt, event, &sample, pt->cycles_sample_type);
+}
+
 static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 {
 	struct intel_pt *pt = ptq->pt;
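The decision made by intel_pt_synth_cycle_sample() reduces to a few lines: compute the cycles since the previous cycles sample, skip if there are none, otherwise report them and advance the `last_cy_*` snapshot. The sketch below uses hypothetical types (`cy_state`, `cy_sample`) and leaves out the `sample_ipc` check, intel_pt_skip_event() and event delivery, so it only models the bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue state: cumulative counts plus the cycles snapshot. */
struct cy_state {
	uint64_t ipc_insn_cnt;
	uint64_t ipc_cyc_cnt;
	uint64_t last_cy_insn_cnt;
	uint64_t last_cy_cyc_cnt;
};

/* Hypothetical sample payload. */
struct cy_sample {
	uint64_t period;	/* cycles since the previous cycles sample */
	uint64_t insn_cnt;	/* instructions over the same span */
};

/*
 * Emit a sample only if cycles advanced since the last one, then move the
 * snapshot forward so no cycle is counted twice.
 * Returns 1 if *out was filled, 0 if the sample was skipped.
 */
static int synth_cycle_sketch(struct cy_state *s, struct cy_sample *out)
{
	uint64_t period;

	assert(s->ipc_cyc_cnt >= s->last_cy_cyc_cnt);
	period = s->ipc_cyc_cnt - s->last_cy_cyc_cnt;
	if (!period)
		return 0;

	out->period = period;
	out->insn_cnt = s->ipc_insn_cnt - s->last_cy_insn_cnt;
	s->last_cy_insn_cnt = s->ipc_insn_cnt;
	s->last_cy_cyc_cnt = s->ipc_cyc_cnt;
	return 1;
}
```

Feeding it hypothetical cumulative (instructions, cycles) readings of (10, 8), (10, 8), (25, 20) and (40, 33) produces three samples with periods 8, 12 and 13: the repeated reading is skipped, and the periods add up to the final count of 33 cycles.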
@@ -2598,10 +2632,17 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 		}
 	}
 
-	if (pt->sample_instructions && (state->type & INTEL_PT_INSTRUCTION)) {
-		err = intel_pt_synth_instruction_sample(ptq);
-		if (err)
-			return err;
+	if (state->type & INTEL_PT_INSTRUCTION) {
+		if (pt->sample_instructions) {
+			err = intel_pt_synth_instruction_sample(ptq);
+			if (err)
+				return err;
+		}
+		if (pt->sample_cycles) {
+			err = intel_pt_synth_cycle_sample(ptq);
+			if (err)
+				return err;
+		}
 	}
 
 	if (pt->sample_transactions && (state->type & INTEL_PT_TRANSACTION)) {
@@ -3731,6 +3772,22 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		id += 1;
 	}
 
+	if (pt->synth_opts.cycles) {
+		attr.config = PERF_COUNT_HW_CPU_CYCLES;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		err = intel_pt_synth_event(session, "cycles", &attr, id);
+		if (err)
+			return err;
+		pt->sample_cycles = true;
+		pt->cycles_sample_type = attr.sample_type;
+		pt->cycles_id = id;
+		id += 1;
+	}
+
 	attr.sample_type &= ~(u64)PERF_SAMPLE_PERIOD;
 	attr.sample_period = 1;
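When the period is given in nanoseconds, the hunk above converts it with intel_pt_ns_to_ticks() before storing it in `attr.sample_period`. The sketch below shows one way such a conversion can work, assuming the `perf_event_mmap_page` convention in which ticks map to nanoseconds roughly as `(ticks * time_mult) >> time_shift`. The helper name and its parameters are hypothetical, and the code is not copied from perf.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: invert ns = (ticks * time_mult) >> time_shift,
 * i.e. ticks = (ns << time_shift) / time_mult. Writing ns as
 * quot * time_mult + rem keeps the intermediate shifts smaller than
 * shifting ns directly, and the quotient part divides exactly.
 */
static uint64_t ns_to_ticks_sketch(uint64_t ns, uint32_t time_mult,
				   uint16_t time_shift)
{
	uint64_t quot, rem;

	assert(time_mult != 0);
	quot = ns / time_mult;
	rem = ns % time_mult;
	return (quot << time_shift) + (rem << time_shift) / time_mult;
}
```

For example, with `time_mult = 512` and `time_shift = 10` (so one tick is 0.5 ns), a 1000 ns period becomes 2000 ticks.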