Commit d1e521ad authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "No new features this release. Mostly clean ups, restructuring and
  documentation.

   - Have ftrace_bug() show ftrace errors before the WARN, as the WARN
     will reboot the box before the error messages are printed if
     panic_on_warn is set.

   - Have traceoff_on_warn disable tracing sooner (before prints)

   - Write a message to the trace buffer that its being disabled when
     disable_trace_on_warning() is set.

   - Separate out synthetic events from histogram code to let it be used
     by other parts of the kernel.

   - More documentation on histogram design.

   - Other small fixes and clean ups"

* tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove obsolete PREEMPTIRQ_EVENTS kconfig option
  tracing/doc: Fix ascii-art in histogram-design.rst
  tracing: Add a trace print when traceoff_on_warning is triggered
  ftrace,bug: Improve traceoff_on_warn
  selftests/ftrace: Distinguish between hist and synthetic event checks
  tracing: Move synthetic events to a separate file
  tracing: Fix events.rst section numbering
  tracing/doc: Fix typos in histogram-design.rst
  tracing: Add hist_debug trace event files for histogram debugging
  tracing: Add histogram-design document
  tracing: Check state.disabled in synth event trace functions
  tracing/probe: reverse arguments to list_add
  tools/bootconfig: Add a summary of test cases and return error
  ftrace: show debugging information when panic_on_warn set
parents 595a56ac 388d8bdb
.. SPDX-License-Identifier: GPL-2.0
======================
Histogram Design Notes
======================
:Author: Tom Zanussi <zanussi@kernel.org>
This document attempts to provide a description of how the ftrace
histograms work and how the individual pieces map to the data
structures used to implement them in trace_events_hist.c and
tracing_map.c.
Note: All the ftrace histogram command examples assume the working
directory is the ftrace /tracing directory. For example::
# cd /sys/kernel/debug/tracing
Also, the histogram output displayed for those commands will be
generally be truncated - only enough to make the point is displayed.
'hist_debug' trace event files
==============================
If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an
event file named 'hist_debug' will appear in each event's
subdirectory. This file can be read at any time and will display some
of the hist trigger internals described in this document. Specific
examples and output will be described in test cases below.
Basic histograms
================
First, basic histograms. Below is pretty much the simplest thing you
can do with histograms - create one with a single key on a single
event and cat the output::
# echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
# cat events/sched/sched_waking/hist
{ pid: 18249 } hitcount: 1
{ pid: 13399 } hitcount: 1
{ pid: 17973 } hitcount: 1
{ pid: 12572 } hitcount: 1
...
{ pid: 10 } hitcount: 921
{ pid: 18255 } hitcount: 1444
{ pid: 25526 } hitcount: 2055
{ pid: 5257 } hitcount: 2055
{ pid: 27367 } hitcount: 2055
{ pid: 1728 } hitcount: 2161
Totals:
Hits: 21305
Entries: 183
Dropped: 0
What this does is create a histogram on the sched_waking event using
pid as a key and with a single value, hitcount, which even if not
explicitly specified, exists for every histogram regardless.
The hitcount value is a per-bucket value that's automatically
incremented on every hit for the given key, which in this case is the
pid.
So in this histogram, there's a separate bucket for each pid, and each
bucket contains a value for that bucket, counting the number of times
sched_waking was called for that pid.
Each histogram is represented by a hist_data struct.
To keep track of each key and value field in the histogram, hist_data
keeps an array of these fields named fields[]. The fields[] array is
an array containing struct hist_field representations of each
histogram val and key in the histogram (variables are also included
here, but are discussed later). So for the above histogram we have one
key and one value; in this case the one value is the hitcount value,
which all histograms have, regardless of whether they define that
value or not, which the above histogram does not.
Each struct hist_field contains a pointer to the ftrace_event_field
from the event's trace_event_file along with various bits related to
that such as the size, offset, type, and a hist_field_fn_t function,
which is used to grab the field's data from the ftrace event buffer
(in most cases - some hist_fields such as hitcount don't directly map
to an event field in the trace buffer - in these cases the function
implementation gets its value from somewhere else). The flags field
indicates which type of field it is - key, value, variable, variable
reference, etc., with value being the default.
The other important hist_data data structure in addition to the
fields[] array is the tracing_map instance created for the histogram,
which is held in the .map member. The tracing_map implements the
lock-free hash table used to implement histograms (see
kernel/trace/tracing_map.h for much more discussion about the
low-level data structures implementing the tracing_map). For the
purposes of this discussion, the tracing_map contains a number of
buckets, each bucket corresponding to a particular tracing_map_elt
object hashed by a given histogram key.
Below is a diagram the first part of which describes the hist_data and
associated key and value fields for the histogram described above. As
you can see, there are two fields in the fields array, one val field
for the hitcount and one key field for the pid key.
Below that is a diagram of a run-time snapshot of what the tracing_map
might look like for a given run. It attempts to show the
relationships between the hist_data fields and the tracing_map
elements for a couple hypothetical keys and values.::
+------------------+
| hist_data |
+------------------+ +----------------+
| .fields[] |---->| val = hitcount |----------------------------+
+----------------+ +----------------+ |
| .map | | .size | |
+----------------+ +--------------+ |
| .offset | |
+--------------+ |
| .fn() | |
+--------------+ |
. |
. |
. |
+----------------+ <--- n_vals |
| key = pid |----------------------------|--+
+----------------+ | |
| .size | | |
+--------------+ | |
| .offset | | |
+--------------+ | |
| .fn() | | |
+----------------+ <--- n_fields | |
| unused | | |
+----------------+ | |
| | | |
+--------------+ | |
| | | |
+--------------+ | |
| | | |
+--------------+ | |
n_keys = n_fields - n_vals | |
The hist_data n_vals and n_fields delineate the extent of the fields[] | |
array and separate keys from values for the rest of the code. | |
Below is a run-time representation of the tracing_map part of the | |
histogram, with pointers from various parts of the fields[] array | |
to corresponding parts of the tracing_map. | |
The tracing_map consists of an array of tracing_map_entrys and a set | |
of preallocated tracing_map_elts (abbreviated below as map_entry and | |
map_elt). The total number of map_entrys in the hist_data.map array = | |
map->max_elts (actually map->map_size but only max_elts of those are | |
used. This is a property required by the map_insert() algorithm). | |
If a map_entry is unused, meaning no key has yet hashed into it, its | |
.key value is 0 and its .val pointer is NULL. Once a map_entry has | |
been claimed, the .key value contains the key's hash value and the | |
.val member points to a map_elt containing the full key and an entry | |
for each key or value in the map_elt.fields[] array. There is an | |
entry in the map_elt.fields[] array corresponding to each hist_field | |
in the histogram, and this is where the continually aggregated sums | |
corresponding to each histogram value are kept. | |
The diagram attempts to show the relationship between the | |
hist_data.fields[] and the map_elt.fields[] with the links drawn | |
between diagrams::
+-----------+ | |
| hist_data | | |
+-----------+ | |
| .fields | | |
+---------+ +-----------+ | |
| .map |---->| map_entry | | |
+---------+ +-----------+ | |
| .key |---> 0 | |
+---------+ | |
| .val |---> NULL | |
+-----------+ | |
| map_entry | | |
+-----------+ | |
| .key |---> pid = 999 | |
+---------+ +-----------+ | |
| .val |--->| map_elt | | |
+---------+ +-----------+ | |
. | .key |---> full key * | |
. +---------+ +---------------+ | |
. | .fields |--->| .sum (val) |<-+ |
+-----------+ +---------+ | 2345 | | |
| map_entry | +---------------+ | |
+-----------+ | .offset (key) |<----+
| .key |---> 0 | 0 | | |
+---------+ +---------------+ | |
| .val |---> NULL . | |
+-----------+ . | |
| map_entry | . | |
+-----------+ +---------------+ | |
| .key | | .sum (val) or | | |
+---------+ +---------+ | .offset (key) | | |
| .val |--->| map_elt | +---------------+ | |
+-----------+ +---------+ | .sum (val) or | | |
| map_entry | | .offset (key) | | |
+-----------+ +---------------+ | |
| .key |---> pid = 4444 | |
+---------+ +-----------+ | |
| .val | | map_elt | | |
+---------+ +-----------+ | |
| .key |---> full key * | |
+---------+ +---------------+ | |
| .fields |--->| .sum (val) |<-+ |
+---------+ | 65523 | |
+---------------+ |
| .offset (key) |<----+
| 0 |
+---------------+
.
.
.
+---------------+
| .sum (val) or |
| .offset (key) |
+---------------+
| .sum (val) or |
| .offset (key) |
+---------------+
Abbreviations used in the diagrams::
hist_data = struct hist_trigger_data
hist_data.fields = struct hist_field
fn = hist_field_fn_t
map_entry = struct tracing_map_entry
map_elt = struct tracing_map_elt
map_elt.fields = struct tracing_map_field
Whenever a new event occurs and it has a hist trigger associated with
it, event_hist_trigger() is called. event_hist_trigger() first deals
with the key: for each subkey in the key (in the above example, there
is just one subkey corresponding to pid), the hist_field that
represents that subkey is retrieved from hist_data.fields[] and the
hist_field_fn_t fn() associated with that field, along with the
field's size and offset, is used to grab that subkey's data from the
current trace record.
Once the complete key has been retrieved, it's used to look that key
up in the tracing_map. If there's no tracing_map_elt associated with
that key, an empty one is claimed and inserted in the map for the new
key. In either case, the tracing_map_elt associated with that key is
returned.
Once a tracing_map_elt available, hist_trigger_elt_update() is called.
As the name implies, this updates the element, which basically means
updating the element's fields. There's a tracing_map_field associated
with each key and value in the histogram, and each of these correspond
to the key and value hist_fields created when the histogram was
created. hist_trigger_elt_update() goes through each value hist_field
and, as for the keys, uses the hist_field's fn() and size and offset
to grab the field's value from the current trace record. Once it has
that value, it simply adds that value to that field's
continually-updated tracing_map_field.sum member. Some hist_field
fn()s, such as for the hitcount, don't actually grab anything from the
trace record (the hitcount fn() just increments the counter sum by 1),
but the idea is the same.
Once all the values have been updated, hist_trigger_elt_update() is
done and returns. Note that there are also tracing_map_fields for
each subkey in the key, but hist_trigger_elt_update() doesn't look at
them or update anything - those exist only for sorting, which can
happen later.
Basic histogram test
--------------------
This is a good example to try. It produces 3 value fields and 2 key
fields in the output::
# echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It
will show the trigger info of the histogram it corresponds to, along
with the address of the hist_data associated with the histogram, which
will become useful in later examples. It then displays the number of
total hist_fields associated with the histogram along with a count of
how many of those correspond to keys and how many correspond to values.
It then goes on to display details for each field, including the
field's flags and the position of each field in the hist_data's
fields[] array, which is useful information for verifying that things
internally appear correct or not, and which again will become even
more useful in further examples::
# cat events/kmem/kmalloc/hist_debug
# event histogram
#
# trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active]
#
hist_data: 000000005e48c9a5
n_vals: 3
n_keys: 2
n_fields: 5
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
VAL: normal u64 value
ftrace_event_field name: bytes_req
type: size_t
size: 8
is_signed: 0
hist_data->fields[2]:
flags:
VAL: normal u64 value
ftrace_event_field name: bytes_alloc
type: size_t
size: 8
is_signed: 0
key fields:
hist_data->fields[3]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: common_pid
type: int
size: 8
is_signed: 1
hist_data->fields[4]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: call_site
type: unsigned long
size: 8
is_signed: 0
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
Variables
=========
Variables allow data from one hist trigger to be saved by one hist
trigger and retrieved by another hist trigger. For example, a trigger
on the sched_waking event can capture a timestamp for a particular
pid, and later a sched_switch event that switches to that pid event
can grab the timestamp and use it to calculate a time delta between
the two events::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>
events/sched/sched_switch/trigger
In terms of the histogram data structures, variables are implemented
as another type of hist_field and for a given hist trigger are added
to the hist_data.fields[] array just after all the val fields. To
distinguish them from the existing key and val fields, they're given a
new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also
make use of a new .var.idx field member in struct hist_field, which
maps them to an index in a new map_elt.vars[] array added to the
map_elt specifically designed to store and retrieve variable values.
The diagram below shows those new elements and adds a new variable
entry, ts0, corresponding to the ts0 variable in the sched_waking
trigger above.
sched_waking histogram
----------------------::
+------------------+
| hist_data |<-------------------------------------------------------+
+------------------+ +-------------------+ |
| .fields[] |-->| val = hitcount | |
+----------------+ +-------------------+ |
| .map | | .size | |
+----------------+ +-----------------+ |
| .offset | |
+-----------------+ |
| .fn() | |
+-----------------+ |
| .flags | |
+-----------------+ |
| .var.idx | |
+-------------------+ |
| var = ts0 | |
+-------------------+ |
| .size | |
+-----------------+ |
| .offset | |
+-----------------+ |
| .fn() | |
+-----------------+ |
| .flags & FL_VAR | |
+-----------------+ |
| .var.idx |----------------------------+-+ |
+-----------------+ | | |
. | | |
. | | |
. | | |
+-------------------+ <--- n_vals | | |
| key = pid | | | |
+-------------------+ | | |
| .size | | | |
+-----------------+ | | |
| .offset | | | |
+-----------------+ | | |
| .fn() | | | |
+-----------------+ | | |
| .flags & FL_KEY | | | |
+-----------------+ | | |
| .var.idx | | | |
+-------------------+ <--- n_fields | | |
| unused | | | |
+-------------------+ | | |
| | | | |
+-----------------+ | | |
| | | | |
+-----------------+ | | |
| | | | |
+-----------------+ | | |
| | | | |
+-----------------+ | | |
| | | | |
+-----------------+ | | |
n_keys = n_fields - n_vals | | |
| | |
This is very similar to the basic case. In the above diagram, we can | | |
see a new .flags member has been added to the struct hist_field | | |
struct, and a new entry added to hist_data.fields representing the ts0 | | |
variable. For a normal val hist_field, .flags is just 0 (modulo | | |
modifier flags), but if the value is defined as a variable, the .flags | | |
contains a set FL_VAR bit. | | |
As you can see, the ts0 entry's .var.idx member contains the index | | |
into the tracing_map_elts' .vars[] array containing variable values. | | |
This idx is used whenever the value of the variable is set or read. | | |
The map_elt.vars idx assigned to the given variable is assigned and | | |
saved in .var.idx by create_tracing_map_fields() after it calls | | |
tracing_map_add_var(). | | |
Below is a representation of the histogram at run-time, which | | |
populates the map, along with correspondence to the above hist_data and | | |
hist_field data structures. | | |
The diagram attempts to show the relationship between the | | |
hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | |
the links drawn between diagrams. For each of the map_elts, you can | | |
see that the .fields[] members point to the .sum or .offset of a key | | |
or val and the .vars[] members point to the value of a variable. The | | |
arrows between the two diagrams show the linkages between those | | |
tracing_map members and the field definitions in the corresponding | | |
hist_data fields[] members.::
+-----------+ | | |
| hist_data | | | |
+-----------+ | | |
| .fields | | | |
+---------+ +-----------+ | | |
| .map |---->| map_entry | | | |
+---------+ +-----------+ | | |
| .key |---> 0 | | |
+---------+ | | |
| .val |---> NULL | | |
+-----------+ | | |
| map_entry | | | |
+-----------+ | | |
| .key |---> pid = 999 | | |
+---------+ +-----------+ | | |
| .val |--->| map_elt | | | |
+---------+ +-----------+ | | |
. | .key |---> full key * | | |
. +---------+ +---------------+ | | |
. | .fields |--->| .sum (val) | | | |
. +---------+ | 2345 | | | |
. +--| .vars | +---------------+ | | |
. | +---------+ | .offset (key) | | | |
. | | 0 | | | |
. | +---------------+ | | |
. | . | | |
. | . | | |
. | . | | |
. | +---------------+ | | |
. | | .sum (val) or | | | |
. | | .offset (key) | | | |
. | +---------------+ | | |
. | | .sum (val) or | | | |
. | | .offset (key) | | | |
. | +---------------+ | | |
. | | | |
. +---------------->+---------------+ | | |
. | ts0 |<--+ | |
. | 113345679876 | | | |
. +---------------+ | | |
. | unused | | | |
. | | | | |
. +---------------+ | | |
. . | | |
. . | | |
. . | | |
. +---------------+ | | |
. | unused | | | |
. | | | | |
. +---------------+ | | |
. | unused | | | |
. | | | | |
. +---------------+ | | |
. | | |
+-----------+ | | |
| map_entry | | | |
+-----------+ | | |
| .key |---> pid = 4444 | | |
+---------+ +-----------+ | | |
| .val |--->| map_elt | | | |
+---------+ +-----------+ | | |
. | .key |---> full key * | | |
. +---------+ +---------------+ | | |
. | .fields |--->| .sum (val) | | | |
+---------+ | 2345 | | | |
+--| .vars | +---------------+ | | |
| +---------+ | .offset (key) | | | |
| | 0 | | | |
| +---------------+ | | |
| . | | |
| . | | |
| . | | |
| +---------------+ | | |
| | .sum (val) or | | | |
| | .offset (key) | | | |
| +---------------+ | | |
| | .sum (val) or | | | |
| | .offset (key) | | | |
| +---------------+ | | |
| | | |
| +---------------+ | | |
+---------------->| ts0 |<--+ | |
| 213499240729 | | |
+---------------+ | |
| unused | | |
| | | |
+---------------+ | |
. | |
. | |
. | |
+---------------+ | |
| unused | | |
| | | |
+---------------+ | |
| unused | | |
| | | |
+---------------+ | |
For each used map entry, there's a map_elt pointing to an array of | |
.vars containing the current value of the variables associated with | |
that histogram entry. So in the above, the timestamp associated with | |
pid 999 is 113345679876, and the timestamp variable in the same | |
.var.idx for pid 4444 is 213499240729. | |
sched_switch histogram | |
---------------------- | |
The sched_switch histogram paired with the above sched_waking | |
histogram is shown below. The most important aspect of the | |
sched_switch histogram is that it references a variable on the | |
sched_waking histogram above. | |
The histogram diagram is very similar to the others so far displayed, | |
but it adds variable references. You can see the normal hitcount and | |
key fields along with a new wakeup_lat variable implemented in the | |
same way as the sched_waking ts0 variable, but in addition there's an | |
entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | |
Associated with the new var ref field are a couple of new hist_field | |
members, var.hist_data and var_ref_idx. For a variable reference, the | |
var.hist_data goes with the var.idx, which together uniquely identify | |
a particular variable on a particular histogram. The var_ref_idx is | |
just the index into the var_ref_vals[] array that caches the values of | |
each variable whenever a hist trigger is updated. Those resulting | |
values are then finally accessed by other code such as trace action | |
code that uses the var_ref_idx values to assign param values. | |
The diagram below describes the situation for the sched_switch | |
histogram referred to before::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | |
events/sched/sched_switch/trigger | |
| |
+------------------+ | |
| hist_data | | |
+------------------+ +-----------------------+ | |
| .fields[] |-->| val = hitcount | | |
+----------------+ +-----------------------+ | |
| .map | | .size | | |
+----------------+ +---------------------+ | |
+--| .var_refs[] | | .offset | | |
| +----------------+ +---------------------+ | |
| | .fn() | | |
| var_ref_vals[] +---------------------+ | |
| +-------------+ | .flags | | |
| | $ts0 |<---+ +---------------------+ | |
| +-------------+ | | .var.idx | | |
| | | | +---------------------+ | |
| +-------------+ | | .var.hist_data | | |
| | | | +---------------------+ | |
| +-------------+ | | .var_ref_idx | | |
| | | | +-----------------------+ | |
| +-------------+ | | var = wakeup_lat | | |
| . | +-----------------------+ | |
| . | | .size | | |
| . | +---------------------+ | |
| +-------------+ | | .offset | | |
| | | | +---------------------+ | |
| +-------------+ | | .fn() | | |
| | | | +---------------------+ | |
| +-------------+ | | .flags & FL_VAR | | |
| | +---------------------+ | |
| | | .var.idx | | |
| | +---------------------+ | |
| | | .var.hist_data | | |
| | +---------------------+ | |
| | | .var_ref_idx | | |
| | +---------------------+ | |
| | . | |
| | . | |
| | . | |
| | +-----------------------+ <--- n_vals | |
| | | key = pid | | |
| | +-----------------------+ | |
| | | .size | | |
| | +---------------------+ | |
| | | .offset | | |
| | +---------------------+ | |
| | | .fn() | | |
| | +---------------------+ | |
| | | .flags | | |
| | +---------------------+ | |
| | | .var.idx | | |
| | +-----------------------+ <--- n_fields | |
| | | unused | | |
| | +-----------------------+ | |
| | | | | |
| | +---------------------+ | |
| | | | | |
| | +---------------------+ | |
| | | | | |
| | +---------------------+ | |
| | | | | |
| | +---------------------+ | |
| | | | | |
| | +---------------------+ | |
| | n_keys = n_fields - n_vals | |
| | | |
| | | |
| | +-----------------------+ | |
+---------------------->| var_ref = $ts0 | | |
| +-----------------------+ | |
| | .size | | |
| +---------------------+ | |
| | .offset | | |
| +---------------------+ | |
| | .fn() | | |
| +---------------------+ | |
| | .flags & FL_VAR_REF | | |
| +---------------------+ | |
| | .var.idx |--------------------------+ |
| +---------------------+ |
| | .var.hist_data |----------------------------+
| +---------------------+
+---| .var_ref_idx |
+---------------------+
Abbreviations used in the diagrams::
hist_data = struct hist_trigger_data
hist_data.fields = struct hist_field
fn = hist_field_fn_t
FL_KEY = HIST_FIELD_FL_KEY
FL_VAR = HIST_FIELD_FL_VAR
FL_VAR_REF = HIST_FIELD_FL_VAR_REF
When a hist trigger makes use of a variable, a new hist_field is
created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the
var.idx and var.hist_data take the same values as the referenced
variable, as well as the referenced variable's size, type, and
is_signed values. The VAR_REF field's .name is set to the name of the
variable it references. If a variable reference was created using the
explicit system.event.$var_ref notation, the hist_field's system and
event_name variables are also set.
So, in order to handle an event for the sched_switch histogram,
because we have a reference to a variable on another histogram, we
need to resolve all variable references first. This is done via the
resolve_var_refs() calls made from event_hist_trigger(). What this
does is grabs the var_refs[] array from the hist_data representing the
sched_switch histogram. For each one of those, the referenced
variable's var.hist_data along with the current key is used to look up
the corresponding tracing_map_elt in that histogram. Once found, the
referenced variable's var.idx is used to look up the variable's value
using tracing_map_read_var(elt, var.idx), which yields the value of
the variable for that element, ts0 in the case above. Note that both
the hist_fields representing both the variable and the variable
reference have the same var.idx, so this is straightforward.
Variable and variable reference test
------------------------------------
This example creates a variable on the sched_waking event, ts0, and
uses it in the sched_switch trigger. The sched_switch trigger also
creates its own variable, wakeup_lat, but nothing yet uses it::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
Looking at the sched_waking 'hist_debug' output, in addition to the
normal key and value hist_fields, in the val fields section we see a
field with the HIST_FIELD_FL_VAR flag, which indicates that that field
represents a variable. Note that in addition to the variable name,
contained in the var.name field, it includes the var.idx, which is the
index into the tracing_map_elt.vars[] array of the actual variable
location. Note also that the output shows that variables live in the
same part of the hist_data->fields[] array as normal values::
# cat events/sched/sched_waking/hist_debug
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 000000009536f554
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: ts0
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 8
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
Moving on to the sched_switch trigger hist_debug output, in addition
to the unused wakeup_lat variable, we see a new section displaying
variable references. Variable references are displayed in a separate
section because in addition to to being logically separate from
variables and values, they actually live in a separate hist_data
array, var_refs[].
In this example, the sched_switch trigger has a reference to a
variable on the sched_waking trigger, $ts0. Looking at the details,
we can see that the var.hist_data value of the referenced variable
matches the previously displayed sched_waking trigger, and the var.idx
value matches the previously displayed var.idx value for that
variable. Also displayed is the var_ref_idx value for that variable
reference, which is where the value for that variable is cached for
use when the trigger is invoked::
# cat events/sched/sched_switch/hist_debug
# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 00000000f4ee8006
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 0
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: next_pid
type: pid_t
size: 8
is_signed: 1
variable reference fields:
hist_data->var_refs[0]:
flags:
HIST_FIELD_FL_VAR_REF
name: ts0
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 000000009536f554
var_ref_idx (into hist_data->var_refs[]): 0
type: u64
size: 8
is_signed: 0
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
Actions and Handlers
====================
Adding onto the previous example, we will now do something with that
wakeup_lat variable, namely send it and another field as a synthetic
event.
The onmatch() action below basically says that whenever we have a
sched_switch event, if we have a matching sched_waking event, in this
case if we have a pid in the sched_waking histogram that matches the
the next_pid field on this sched_switch event, we retrieve the
variables specified in the wakeup_latency() trace action, and use
them to generate a new wakeup_latency event into the trace stream.
Note that the way the trace handlers such as wakeup_latency() (which
could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid)
are implemented, the parameters specified to the trace handler must be
variables. In this case, $wakeup_lat is obviously a variable, but
next_pid isn't, since it's just naming a field in the sched_switch
trace event. Since this is something that almost every trace() and
save() action does, a special shortcut is implemented to allow field
names to be used directly in those cases. How it works is that under
the covers, a temporary variable is created for the named field, and
this variable is what is actually passed to the trace handler. In the
code and documentation, this type of variable is called a 'field
variable'.
Fields on other trace event's histograms can be used as well. In that
case we have to generate a new histogram and an unfortunately named
'synthetic_field' (the use of synthetic here has nothing to do with
synthetic events) and use that special histogram field as a variable.
The diagram below illustrates the new elements described above in the
context of the sched_switch histogram using the onmatch() handler and
the trace() action.
First, we define the wakeup_latency synthetic event::
# echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events
Next, the sched_waking hist trigger as before::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
events/sched/sched_waking/trigger
Finally, we create a hist trigger on the sched_switch event that
generates a wakeup_latency() trace event. In this case we pass
next_pid into the wakeup_latency synthetic event invocation, which
means it will be automatically converted into a field variable::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >>
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
The diagram for the sched_switch event is similar to previous examples
but shows the additional field_vars[] array for hist_data and shows
the linkages between the field_vars and the variables and references
created to implement the field variables. The details are discussed
below::
+------------------+
| hist_data |
+------------------+ +-----------------------+
| .fields[] |-->| val = hitcount |
+----------------+ +-----------------------+
| .map | | .size |
+----------------+ +---------------------+
+---| .field_vars[] | | .offset |
| +----------------+ +---------------------+
|+--| .var_refs[] | | .offset |
|| +----------------+ +---------------------+
|| | .fn() |
|| var_ref_vals[] +---------------------+
|| +-------------+ | .flags |
|| | $ts0 |<---+ +---------------------+
|| +-------------+ | | .var.idx |
|| | $next_pid |<-+ | +---------------------+
|| +-------------+ | | | .var.hist_data |
||+>| $wakeup_lat | | | +---------------------+
||| +-------------+ | | | .var_ref_idx |
||| | | | | +-----------------------+
||| +-------------+ | | | var = wakeup_lat |
||| . | | +-----------------------+
||| . | | | .size |
||| . | | +---------------------+
||| +-------------+ | | | .offset |
||| | | | | +---------------------+
||| +-------------+ | | | .fn() |
||| | | | | +---------------------+
||| +-------------+ | | | .flags & FL_VAR |
||| | | +---------------------+
||| | | | .var.idx |
||| | | +---------------------+
||| | | | .var.hist_data |
||| | | +---------------------+
||| | | | .var_ref_idx |
||| | | +---------------------+
||| | | .
||| | | .
||| | | .
||| | | .
||| +--------------+ | | .
+-->| field_var | | | .
|| +--------------+ | | .
|| | var | | | .
|| +------------+ | | .
|| | val | | | .
|| +--------------+ | | .
|| | field_var | | | .
|| +--------------+ | | .
|| | var | | | .
|| +------------+ | | .
|| | val | | | .
|| +------------+ | | .
|| . | | .
|| . | | .
|| . | | +-----------------------+ <--- n_vals
|| +--------------+ | | | key = pid |
|| | field_var | | | +-----------------------+
|| +--------------+ | | | .size |
|| | var |--+| +---------------------+
|| +------------+ ||| | .offset |
|| | val |-+|| +---------------------+
|| +------------+ ||| | .fn() |
|| ||| +---------------------+
|| ||| | .flags |
|| ||| +---------------------+
|| ||| | .var.idx |
|| ||| +---------------------+ <--- n_fields
|| |||
|| ||| n_keys = n_fields - n_vals
|| ||| +-----------------------+
|| |+->| var = next_pid |
|| | | +-----------------------+
|| | | | .size |
|| | | +---------------------+
|| | | | .offset |
|| | | +---------------------+
|| | | | .flags & FL_VAR |
|| | | +---------------------+
|| | | | .var.idx |
|| | | +---------------------+
|| | | | .var.hist_data |
|| | | +-----------------------+
|| +-->| val for next_pid |
|| | | +-----------------------+
|| | | | .size |
|| | | +---------------------+
|| | | | .offset |
|| | | +---------------------+
|| | | | .fn() |
|| | | +---------------------+
|| | | | .flags |
|| | | +---------------------+
|| | | | |
|| | | +---------------------+
|| | |
|| | |
|| | | +-----------------------+
+|------------------|-|>| var_ref = $ts0 |
| | | +-----------------------+
| | | | .size |
| | | +---------------------+
| | | | .offset |
| | | +---------------------+
| | | | .fn() |
| | | +---------------------+
| | | | .flags & FL_VAR_REF |
| | | +---------------------+
| | +---| .var_ref_idx |
| | +-----------------------+
| | | var_ref = $next_pid |
| | +-----------------------+
| | | .size |
| | +---------------------+
| | | .offset |
| | +---------------------+
| | | .fn() |
| | +---------------------+
| | | .flags & FL_VAR_REF |
| | +---------------------+
| +-----| .var_ref_idx |
| +-----------------------+
| | var_ref = $wakeup_lat |
| +-----------------------+
| | .size |
| +---------------------+
| | .offset |
| +---------------------+
| | .fn() |
| +---------------------+
| | .flags & FL_VAR_REF |
| +---------------------+
+------------------------| .var_ref_idx |
+---------------------+
As you can see, for a field variable, two hist_fields are created: one
representing the variable, in this case next_pid, and one to actually
get the value of the field from the trace stream, like a normal val
field does. These are created separately from normal variable
creation and are saved in the hist_data->field_vars[] array. See
below for how these are used. In addition, a reference hist_field is
also created, which is needed to reference the field variables such as
$next_pid variable in the trace() action.
Note that $wakeup_lat is also a variable reference, referencing the
value of the expression common_timestamp-$ts0, and so also needs to
have a hist field entry representing that reference created.
When hist_trigger_elt_update() is called to get the normal key and
value fields, it also calls update_field_vars(), which goes through
each field_var created for the histogram, and available from
hist_data->field_vars and calls val->fn() to get the data from the
current trace record, and then uses the var's var.idx to set the
variable at the var.idx offset in the appropriate tracing_map_elt's
variable at elt->vars[var.idx].
Once all the variables have been updated, resolve_var_refs() can be
called from event_hist_trigger(), and not only can our $ts0 and
$next_pid references be resolved but the $wakeup_lat reference as
well. At this point, the trace() action can simply access the values
assembled in the var_ref_vals[] array and generate the trace event.
The same process occurs for the field variables associated with the
save() action.
Abbreviations used in the diagram::
hist_data = struct hist_trigger_data
hist_data.fields = struct hist_field
field_var = struct field_var
fn = hist_field_fn_t
FL_KEY = HIST_FIELD_FL_KEY
FL_VAR = HIST_FIELD_FL_VAR
FL_VAR_REF = HIST_FIELD_FL_VAR_REF
trace() action field variable test
----------------------------------
This example adds to the previous test example by finally making use
of the wakeup_lat variable, but in addition also creates a couple of
field variables that then are all passed to the wakeup_latency() trace
action via the onmatch() handler.
First, we create the wakeup_latency synthetic event::
# echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
Next, the sched_waking trigger from previous examples::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
Finally, as in the previous test example, we calculate and assign the
wakeup latency using the $ts0 reference from the sched_waking trigger
to the wakeup_lat variable, and finally use it along with a couple
sched_switch event fields, next_pid and next_comm, to generate a
wakeup_latency trace event. The next_pid and next_comm event fields
are automatically converted into field variables for this purpose::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
The sched_waking hist_debug output shows the same data as in the
previous test example::
# cat events/sched/sched_waking/hist_debug
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 00000000d60ff61f
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: ts0
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 8
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
The sched_switch hist_debug output shows the same key and value fields
as in the previous test example - note that wakeup_lat is still in the
val fields section, but that the new field variables are not there -
although the field variables are variables, they're held separately in
the hist_data's field_vars[] array. Although the field variables and
the normal variables are located in separate places, you can see that
the actual variable locations for those variables in the
tracing_map_elt.vars[] do have increasing indices as expected:
wakeup_lat takes the var.idx = 0 slot, while the field variables for
next_pid and next_comm have values var.idx = 1, and var.idx = 2. Note
also that those are the same values displayed for the variable
references corresponding to those variables in the variable reference
fields section. Since there are two triggers and thus two hist_data
addresses, those addresses also need to be accounted for when doing
the matching - you can see that the first variable refers to the 0
var.idx on the previous hist trigger (see the hist_data address
associated with that trigger), while the second variable refers to the
0 var.idx on the sched_switch hist trigger, as do all the remaining
variable references.
Finally, the action tracking variables section just shows the system
and event name for the onmatch() handler::
# cat events/sched/sched_switch/hist_debug
# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active]
#
hist_data: 0000000008f551b7
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 0
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: next_pid
type: pid_t
size: 8
is_signed: 1
variable reference fields:
hist_data->var_refs[0]:
flags:
HIST_FIELD_FL_VAR_REF
name: ts0
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 00000000d60ff61f
var_ref_idx (into hist_data->var_refs[]): 0
type: u64
size: 8
is_signed: 0
hist_data->var_refs[1]:
flags:
HIST_FIELD_FL_VAR_REF
name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 0000000008f551b7
var_ref_idx (into hist_data->var_refs[]): 1
type: u64
size: 0
is_signed: 0
hist_data->var_refs[2]:
flags:
HIST_FIELD_FL_VAR_REF
name: next_pid
var.idx (into tracing_map_elt.vars[]): 1
var.hist_data: 0000000008f551b7
var_ref_idx (into hist_data->var_refs[]): 2
type: pid_t
size: 4
is_signed: 0
hist_data->var_refs[3]:
flags:
HIST_FIELD_FL_VAR_REF
name: next_comm
var.idx (into tracing_map_elt.vars[]): 2
var.hist_data: 0000000008f551b7
var_ref_idx (into hist_data->var_refs[]): 3
type: char[16]
size: 256
is_signed: 0
field variables:
hist_data->field_vars[0]:
field_vars[0].var:
flags:
HIST_FIELD_FL_VAR
var.name: next_pid
var.idx (into tracing_map_elt.vars[]): 1
field_vars[0].val:
ftrace_event_field name: next_pid
type: pid_t
size: 4
is_signed: 1
hist_data->field_vars[1]:
field_vars[1].var:
flags:
HIST_FIELD_FL_VAR
var.name: next_comm
var.idx (into tracing_map_elt.vars[]): 2
field_vars[1].val:
ftrace_event_field name: next_comm
type: char[16]
size: 256
is_signed: 0
action tracking variables (for onmax()/onchange()/onmatch()):
hist_data->actions[0].match_data.event_system: sched
hist_data->actions[0].match_data.event: sched_waking
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
# echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
action_data and the trace() action
----------------------------------
As mentioned above, when the trace() action generates a synthetic
event, all the parameters to the synthetic event either already are
variables or are converted into variables (via field variables), and
finally all those variable values are collected via references to them
into a var_ref_vals[] array.
The values in the var_ref_vals[] array, however, don't necessarily
follow the same ordering as the synthetic event params. To address
that, struct action_data contains another array, var_ref_idx[] that
maps the trace action params to the var_ref_vals[] values. Below is a
diagram illustrating that for the wakeup_latency() synthetic event::
+------------------+ wakeup_latency()
| action_data | event params var_ref_vals[]
+------------------+ +-----------------+ +-----------------+
| .var_ref_idx[] |--->| $wakeup_lat idx |---+ | |
+----------------+ +-----------------+ | +-----------------+
| .synth_event | | $next_pid idx |---|-+ | $wakeup_lat val |
+----------------+ +-----------------+ | | +-----------------+
. | +->| $next_pid val |
. | +-----------------+
. | .
+-----------------+ | .
| | | .
+-----------------+ | +-----------------+
+--->| $wakeup_lat val |
+-----------------+
Basically, how this ends up getting used in the synthetic event probe
function, trace_event_raw_event_synth(), is as follows::
for each field i in .synth_event
val_idx = .var_ref_idx[i]
val = var_ref_vals[val_idx]
action_data and the onXXX() handlers
------------------------------------
The hist trigger onXXX() actions other than onmatch(), such as onmax()
and onchange(), also make use of and internally create hidden
variables. This information is contained in the
action_data.track_data struct, and is also visible in the hist_debug
output as will be described in the example below.
Typically, the onmax() or onchange() handlers are used in conjunction
with the save() and snapshot() actions. For example::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >>
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
or::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
onmax($wakeup_lat).snapshot()' >>
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
save() action field variable test
---------------------------------
For this example, instead of generating a synthetic event, the save()
action is used to save field values whenever an onmax() handler
detects that a new max latency has been hit. As in the previous
example, the values being saved are also field values, but in this
case, are kept in a separate hist_data array named save_vars[].
As in previous test examples, we set up the sched_waking trigger::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
In this case, however, we set up the sched_switch trigger to save some
sched_switch field values whenever we hit a new maximum latency. For
both the onmax() handler and save() action, variables will be created,
which we can use the hist_debug files to examine::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
The sched_waking hist_debug output shows the same data as in the
previous test examples::
# cat events/sched/sched_waking/hist_debug
#
# trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 00000000e6290f48
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: ts0
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 8
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
The output of the sched_switch trigger shows the same val and key
values as before, but also shows a couple new sections.
First, the action tracking variables section now shows the
actions[].track_data information describing the special tracking
variables and references used to track, in this case, the running
maximum value. The actions[].track_data.var_ref member contains the
reference to the variable being tracked, in this case the $wakeup_lat
variable. In order to perform the onmax() handler function, there
also needs to be a variable that tracks the current maximum by getting
updated whenever a new maximum is hit. In this case, we can see that
an auto-generated variable named ' __max' has been created and is
visible in the actions[].track_data.track_var variable.
Finally, in the new 'save action variables' section, we can see that
the 4 params to the save() function have resulted in 4 field variables
being created for the purposes of saving the values of the named
fields when the max is hit. These variables are kept in a separate
save_vars[] array off of hist_data, so are displayed in a separate
section::
# cat events/sched/sched_switch/hist_debug
# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active]
#
hist_data: 0000000057bcd28d
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 0
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: next_pid
type: pid_t
size: 8
is_signed: 1
variable reference fields:
hist_data->var_refs[0]:
flags:
HIST_FIELD_FL_VAR_REF
name: ts0
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 00000000e6290f48
var_ref_idx (into hist_data->var_refs[]): 0
type: u64
size: 8
is_signed: 0
hist_data->var_refs[1]:
flags:
HIST_FIELD_FL_VAR_REF
name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 0000000057bcd28d
var_ref_idx (into hist_data->var_refs[]): 1
type: u64
size: 0
is_signed: 0
action tracking variables (for onmax()/onchange()/onmatch()):
hist_data->actions[0].track_data.var_ref:
flags:
HIST_FIELD_FL_VAR_REF
name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 0000000057bcd28d
var_ref_idx (into hist_data->var_refs[]): 1
type: u64
size: 0
is_signed: 0
hist_data->actions[0].track_data.track_var:
flags:
HIST_FIELD_FL_VAR
var.name: __max
var.idx (into tracing_map_elt.vars[]): 1
type: u64
size: 8
is_signed: 0
save action variables (save() params):
hist_data->save_vars[0]:
save_vars[0].var:
flags:
HIST_FIELD_FL_VAR
var.name: next_comm
var.idx (into tracing_map_elt.vars[]): 2
save_vars[0].val:
ftrace_event_field name: next_comm
type: char[16]
size: 256
is_signed: 0
hist_data->save_vars[1]:
save_vars[1].var:
flags:
HIST_FIELD_FL_VAR
var.name: prev_pid
var.idx (into tracing_map_elt.vars[]): 3
save_vars[1].val:
ftrace_event_field name: prev_pid
type: pid_t
size: 4
is_signed: 1
hist_data->save_vars[2]:
save_vars[2].var:
flags:
HIST_FIELD_FL_VAR
var.name: prev_prio
var.idx (into tracing_map_elt.vars[]): 4
save_vars[2].val:
ftrace_event_field name: prev_prio
type: int
size: 4
is_signed: 1
hist_data->save_vars[3]:
save_vars[3].var:
flags:
HIST_FIELD_FL_VAR
var.name: prev_comm
var.idx (into tracing_map_elt.vars[]): 5
save_vars[3].val:
ftrace_event_field name: prev_comm
type: char[16]
size: 256
is_signed: 0
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
A couple special cases
======================
While the above covers the basics of the histogram internals, there
are a couple of special cases that should be discussed, since they
tend to create even more confusion. Those are field variables on other
histograms, and aliases, both described below through example tests
using the hist_debug files.
Test of field variables on other histograms
-------------------------------------------
This example is similar to the previous examples, but in this case,
the sched_switch trigger references a hist trigger field on another
event, namely the sched_waking event. In order to accomplish this, a
field variable is created for the other event, but since an existing
histogram can't be used, as existing histograms are immutable, a new
histogram with a matching variable is created and used, and we'll see
that reflected in the hist_debug output shown below.
First, we create the wakeup_latency synthetic event. Note the
addition of the prio field::
# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
As in previous test examples, we set up the sched_waking trigger::
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
Here we set up a hist trigger on sched_switch to send a wakeup_latency
event using an onmatch handler naming the sched_waking event. Note
that the third param being passed to the wakeup_latency() is prio,
which is a field name that needs to have a field variable created for
it. There isn't however any prio field on the sched_switch event so
it would seem that it wouldn't be possible to create a field variable
for it. The matching sched_waking event does have a prio field, so it
should be possible to make use of it for this purpose. The problem
with that is that it's not currently possible to define a new variable
on an existing histogram, so it's not possible to add a new prio field
variable to the existing sched_waking histogram. It is however
possible to create an additional new 'matching' sched_waking histogram
for the same event, meaning that it uses the same key and filters, and
define the new prio field variable on that.
Here's the sched_switch trigger::
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
And here's the output of the hist_debug information for the
sched_waking hist trigger. Note that there are two histograms
displayed in the output: the first is the normal sched_waking
histogram we've seen in the previous examples, and the second is the
special histogram we created to provide the prio field variable.
Looking at the second histogram below, we see a variable with the name
synthetic_prio. This is the field variable created for the prio field
on that sched_waking histogram::
# cat events/sched/sched_waking/hist_debug
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 00000000349570e4
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: ts0
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 8
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active]
#
hist_data: 000000006920cf38
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
ftrace_event_field name: prio
var.name: synthetic_prio
var.idx (into tracing_map_elt.vars[]): 0
type: int
size: 4
is_signed: 1
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
Looking at the sched_switch histogram below, we can see a reference to
the synthetic_prio variable on sched_waking, and looking at the
associated hist_data address we see that it is indeed associated with
the new histogram. Note also that the other references are to a
normal variable, wakeup_lat, and to a normal field variable, next_pid,
the details of which are in the field variables section::
# cat events/sched/sched_switch/hist_debug
# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active]
#
hist_data: 00000000a73b67df
n_vals: 2
n_keys: 1
n_fields: 3
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
var.name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
type: u64
size: 0
is_signed: 0
key fields:
hist_data->fields[2]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: next_pid
type: pid_t
size: 8
is_signed: 1
variable reference fields:
hist_data->var_refs[0]:
flags:
HIST_FIELD_FL_VAR_REF
name: ts0
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 00000000349570e4
var_ref_idx (into hist_data->var_refs[]): 0
type: u64
size: 8
is_signed: 0
hist_data->var_refs[1]:
flags:
HIST_FIELD_FL_VAR_REF
name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 00000000a73b67df
var_ref_idx (into hist_data->var_refs[]): 1
type: u64
size: 0
is_signed: 0
hist_data->var_refs[2]:
flags:
HIST_FIELD_FL_VAR_REF
name: next_pid
var.idx (into tracing_map_elt.vars[]): 1
var.hist_data: 00000000a73b67df
var_ref_idx (into hist_data->var_refs[]): 2
type: pid_t
size: 4
is_signed: 0
hist_data->var_refs[3]:
flags:
HIST_FIELD_FL_VAR_REF
name: synthetic_prio
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 000000006920cf38
var_ref_idx (into hist_data->var_refs[]): 3
type: int
size: 4
is_signed: 1
field variables:
hist_data->field_vars[0]:
field_vars[0].var:
flags:
HIST_FIELD_FL_VAR
var.name: next_pid
var.idx (into tracing_map_elt.vars[]): 1
field_vars[0].val:
ftrace_event_field name: next_pid
type: pid_t
size: 4
is_signed: 1
action tracking variables (for onmax()/onchange()/onmatch()):
hist_data->actions[0].match_data.event_system: sched
hist_data->actions[0].match_data.event: sched_waking
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
# echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
Alias test
----------
This example is very similar to previous examples, but demonstrates
the alias flag.
First, we create the wakeup_latency synthetic event::
# echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
Next, we create a sched_waking trigger similar to previous examples,
but in this case we save the pid in the waking_pid variable::
# echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
For the sched_switch trigger, instead of using $waking_pid directly in
the wakeup_latency synthetic event invocation, we create an alias of
$waking_pid named $woken_pid, and use that in the synthetic event
invocation instead::
# echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
Looking at the sched_waking hist_debug output, in addition to the
normal fields, we can see the waking_pid variable::
# cat events/sched/sched_waking/hist_debug
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
#
hist_data: 00000000a250528c
n_vals: 3
n_keys: 1
n_fields: 4
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
ftrace_event_field name: pid
var.name: waking_pid
var.idx (into tracing_map_elt.vars[]): 0
type: pid_t
size: 4
is_signed: 1
hist_data->fields[2]:
flags:
HIST_FIELD_FL_VAR
var.name: ts0
var.idx (into tracing_map_elt.vars[]): 1
type: u64
size: 8
is_signed: 0
key fields:
hist_data->fields[3]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: pid
type: pid_t
size: 8
is_signed: 1
The sched_switch hist_debug output shows that a variable named
woken_pid has been created but that it also has the
HIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flag
set, which is why it appears in the val field section.
Despite that implementation detail, an alias variable is actually more
like a variable reference; in fact it can be thought of as a reference
to a reference. The implementation copies the var_ref->fn() from the
variable reference being referenced, in this case, the waking_pid
fn(), which is hist_field_var_ref() and makes that the fn() of the
alias. The hist_field_var_ref() fn() requires the var_ref_idx of the
variable reference it's using, so waking_pid's var_ref_idx is also
copied to the alias. The end result is that when the value of alias
is retrieved, in the end it just does the same thing the original
reference would have done and retrieves the same value from the
var_ref_vals[] array. You can verify this in the output by noting
that the var_ref_idx of the alias, in this case woken_pid, is the same
as the var_ref_idx of the reference, waking_pid, in the variable
reference fields section.
Additionally, once it gets that value, since it is also a variable, it
then saves that value into its var.idx. So the var.idx of the
woken_pid alias is 0, which it fills with the value from var_ref_idx 0
when its fn() is called to update itself. You'll also notice that
there's a woken_pid var_ref in the variable refs section. That is the
reference to the woken_pid alias variable, and you can see that it
retrieves the value from the same var.idx as the woken_pid alias, 0,
and then in turn saves that value in its own var_ref_idx slot, 3, and
the value at this position is finally what gets assigned to the
$woken_pid slot in the trace event invocation::
# cat events/sched/sched_switch/hist_debug
# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active]
#
hist_data: 0000000055d65ed0
n_vals: 3
n_keys: 1
n_fields: 4
val fields:
hist_data->fields[0]:
flags:
VAL: HIST_FIELD_FL_HITCOUNT
type: u64
size: 8
is_signed: 0
hist_data->fields[1]:
flags:
HIST_FIELD_FL_VAR
HIST_FIELD_FL_ALIAS
var.name: woken_pid
var.idx (into tracing_map_elt.vars[]): 0
var_ref_idx (into hist_data->var_refs[]): 0
type: pid_t
size: 4
is_signed: 1
hist_data->fields[2]:
flags:
HIST_FIELD_FL_VAR
var.name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 1
type: u64
size: 0
is_signed: 0
key fields:
hist_data->fields[3]:
flags:
HIST_FIELD_FL_KEY
ftrace_event_field name: next_pid
type: pid_t
size: 8
is_signed: 1
variable reference fields:
hist_data->var_refs[0]:
flags:
HIST_FIELD_FL_VAR_REF
name: waking_pid
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 00000000a250528c
var_ref_idx (into hist_data->var_refs[]): 0
type: pid_t
size: 4
is_signed: 1
hist_data->var_refs[1]:
flags:
HIST_FIELD_FL_VAR_REF
name: ts0
var.idx (into tracing_map_elt.vars[]): 1
var.hist_data: 00000000a250528c
var_ref_idx (into hist_data->var_refs[]): 1
type: u64
size: 8
is_signed: 0
hist_data->var_refs[2]:
flags:
HIST_FIELD_FL_VAR_REF
name: wakeup_lat
var.idx (into tracing_map_elt.vars[]): 1
var.hist_data: 0000000055d65ed0
var_ref_idx (into hist_data->var_refs[]): 2
type: u64
size: 0
is_signed: 0
hist_data->var_refs[3]:
flags:
HIST_FIELD_FL_VAR_REF
name: woken_pid
var.idx (into tracing_map_elt.vars[]): 0
var.hist_data: 0000000055d65ed0
var_ref_idx (into hist_data->var_refs[]): 3
type: pid_t
size: 4
is_signed: 1
hist_data->var_refs[4]:
flags:
HIST_FIELD_FL_VAR_REF
name: next_comm
var.idx (into tracing_map_elt.vars[]): 2
var.hist_data: 0000000055d65ed0
var_ref_idx (into hist_data->var_refs[]): 4
type: char[16]
size: 256
is_signed: 0
field variables:
hist_data->field_vars[0]:
field_vars[0].var:
flags:
HIST_FIELD_FL_VAR
var.name: next_comm
var.idx (into tracing_map_elt.vars[]): 2
field_vars[0].val:
ftrace_event_field name: next_comm
type: char[16]
size: 256
is_signed: 0
action tracking variables (for onmax()/onchange()/onmatch()):
hist_data->actions[0].match_data.event_system: sched
hist_data->actions[0].match_data.event: sched_waking
The commands below can be used to clean things up for the next test::
# echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
# echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
# echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
...@@ -249,15 +249,6 @@ config TRACE_PREEMPT_TOGGLE ...@@ -249,15 +249,6 @@ config TRACE_PREEMPT_TOGGLE
Enables hooks which will be called when preemption is first disabled, Enables hooks which will be called when preemption is first disabled,
and last enabled. and last enabled.
config PREEMPTIRQ_EVENTS
bool "Enable trace events for preempt and irq disable/enable"
select TRACE_IRQFLAGS
select TRACE_PREEMPT_TOGGLE if PREEMPTION
select GENERIC_TRACER
default n
help
Enable tracing of disable and enable events for preemption and irqs.
config IRQSOFF_TRACER config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer" bool "Interrupts-off Latency Tracer"
default n default n
...@@ -614,12 +605,30 @@ config TRACING_MAP ...@@ -614,12 +605,30 @@ config TRACING_MAP
generally used outside of that context, and is normally generally used outside of that context, and is normally
selected by tracers that use it. selected by tracers that use it.
config SYNTH_EVENTS
bool "Synthetic trace events"
select TRACING
select DYNAMIC_EVENTS
default n
help
Synthetic events are user-defined trace events that can be
used to combine data from other trace events or in fact any
data source. Synthetic events can be generated indirectly
via the trace() action of histogram triggers or directly
by way of an in-kernel API.
See Documentation/trace/events.rst or
Documentation/trace/histogram.rst for details and examples.
If in doubt, say N.
config HIST_TRIGGERS config HIST_TRIGGERS
bool "Histogram triggers" bool "Histogram triggers"
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
select TRACING_MAP select TRACING_MAP
select TRACING select TRACING
select DYNAMIC_EVENTS select DYNAMIC_EVENTS
select SYNTH_EVENTS
default n default n
help help
Hist triggers allow one or more arbitrary trace event fields Hist triggers allow one or more arbitrary trace event fields
...@@ -815,7 +824,7 @@ config PREEMPTIRQ_DELAY_TEST ...@@ -815,7 +824,7 @@ config PREEMPTIRQ_DELAY_TEST
config SYNTH_EVENT_GEN_TEST config SYNTH_EVENT_GEN_TEST
tristate "Test module for in-kernel synthetic event generation" tristate "Test module for in-kernel synthetic event generation"
depends on HIST_TRIGGERS depends on SYNTH_EVENTS
help help
This option creates a test module to check the base This option creates a test module to check the base
functionality of in-kernel synthetic event definition and functionality of in-kernel synthetic event definition and
...@@ -838,6 +847,29 @@ config KPROBE_EVENT_GEN_TEST ...@@ -838,6 +847,29 @@ config KPROBE_EVENT_GEN_TEST
If unsure, say N. If unsure, say N.
config HIST_TRIGGERS_DEBUG
bool "Hist trigger debug support"
depends on HIST_TRIGGERS
help
Add "hist_debug" file for each event, which when read will
dump out a bunch of internal details about the hist triggers
defined on that event.
The hist_debug file serves a couple of purposes:
- Helps developers verify that nothing is broken.
- Provides educational information to support the details
of the hist trigger internals as described by
Documentation/trace/histogram-design.rst.
The hist_debug output only covers the data structures
related to the histogram definitions themselves and doesn't
display the internals of map buckets or variable values of
running histograms.
If unsure, say N.
endif # FTRACE endif # FTRACE
endif # TRACING_SUPPORT endif # TRACING_SUPPORT
......
...@@ -72,6 +72,7 @@ endif ...@@ -72,6 +72,7 @@ endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
obj-$(CONFIG_TRACE_EVENT_INJECT) += trace_events_inject.o obj-$(CONFIG_TRACE_EVENT_INJECT) += trace_events_inject.o
obj-$(CONFIG_SYNTH_EVENTS) += trace_events_synth.o
obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
......
...@@ -2016,14 +2016,14 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) ...@@ -2016,14 +2016,14 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
{ {
unsigned long ip = rec ? rec->ip : 0; unsigned long ip = rec ? rec->ip : 0;
pr_info("------------[ ftrace bug ]------------\n");
switch (failed) { switch (failed) {
case -EFAULT: case -EFAULT:
FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on modifying "); pr_info("ftrace faulted on modifying ");
print_ip_sym(KERN_INFO, ip); print_ip_sym(KERN_INFO, ip);
break; break;
case -EINVAL: case -EINVAL:
FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace failed to modify "); pr_info("ftrace failed to modify ");
print_ip_sym(KERN_INFO, ip); print_ip_sym(KERN_INFO, ip);
print_ip_ins(" actual: ", (unsigned char *)ip); print_ip_ins(" actual: ", (unsigned char *)ip);
...@@ -2034,12 +2034,10 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) ...@@ -2034,12 +2034,10 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
} }
break; break;
case -EPERM: case -EPERM:
FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on writing "); pr_info("ftrace faulted on writing ");
print_ip_sym(KERN_INFO, ip); print_ip_sym(KERN_INFO, ip);
break; break;
default: default:
FTRACE_WARN_ON_ONCE(1);
pr_info("ftrace faulted on unknown error "); pr_info("ftrace faulted on unknown error ");
print_ip_sym(KERN_INFO, ip); print_ip_sym(KERN_INFO, ip);
} }
...@@ -2066,6 +2064,8 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) ...@@ -2066,6 +2064,8 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
ip = ftrace_get_addr_curr(rec); ip = ftrace_get_addr_curr(rec);
pr_cont("\n expected tramp: %lx\n", ip); pr_cont("\n expected tramp: %lx\n", ip);
} }
FTRACE_WARN_ON_ONCE(1);
} }
static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update) static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
......
...@@ -1299,8 +1299,11 @@ EXPORT_SYMBOL_GPL(tracing_off); ...@@ -1299,8 +1299,11 @@ EXPORT_SYMBOL_GPL(tracing_off);
void disable_trace_on_warning(void) void disable_trace_on_warning(void)
{ {
if (__disable_trace_on_warning) if (__disable_trace_on_warning) {
trace_array_printk_buf(global_trace.array_buffer.buffer, _THIS_IP_,
"Disabling tracing due to warning\n");
tracing_off(); tracing_off();
}
} }
/** /**
......
...@@ -1661,6 +1661,7 @@ extern struct list_head ftrace_events; ...@@ -1661,6 +1661,7 @@ extern struct list_head ftrace_events;
extern const struct file_operations event_trigger_fops; extern const struct file_operations event_trigger_fops;
extern const struct file_operations event_hist_fops; extern const struct file_operations event_hist_fops;
extern const struct file_operations event_hist_debug_fops;
extern const struct file_operations event_inject_fops; extern const struct file_operations event_inject_fops;
#ifdef CONFIG_HIST_TRIGGERS #ifdef CONFIG_HIST_TRIGGERS
......
...@@ -2208,6 +2208,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) ...@@ -2208,6 +2208,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
#ifdef CONFIG_HIST_TRIGGERS #ifdef CONFIG_HIST_TRIGGERS
trace_create_file("hist", 0444, file->dir, file, trace_create_file("hist", 0444, file->dir, file,
&event_hist_fops); &event_hist_fops);
#endif
#ifdef CONFIG_HIST_TRIGGERS_DEBUG
trace_create_file("hist_debug", 0444, file->dir, file,
&event_hist_debug_fops);
#endif #endif
trace_create_file("format", 0444, file->dir, call, trace_create_file("format", 0444, file->dir, call,
&ftrace_event_format_fops); &ftrace_event_format_fops);
......
...@@ -19,13 +19,7 @@ ...@@ -19,13 +19,7 @@
#include <trace/events/mmflags.h> #include <trace/events/mmflags.h>
#include "tracing_map.h" #include "tracing_map.h"
#include "trace.h" #include "trace_synth.h"
#include "trace_dynevent.h"
#define SYNTH_SYSTEM "synthetic"
#define SYNTH_FIELDS_MAX 32
#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
#define ERRORS \ #define ERRORS \
C(NONE, "No error"), \ C(NONE, "No error"), \
...@@ -380,69 +374,6 @@ struct hist_trigger_data { ...@@ -380,69 +374,6 @@ struct hist_trigger_data {
unsigned int n_save_var_str; unsigned int n_save_var_str;
}; };
static int create_synth_event(int argc, const char **argv);
static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
static int synth_event_release(struct dyn_event *ev);
static bool synth_event_is_busy(struct dyn_event *ev);
static bool synth_event_match(const char *system, const char *event,
int argc, const char **argv, struct dyn_event *ev);
static struct dyn_event_operations synth_event_ops = {
.create = create_synth_event,
.show = synth_event_show,
.is_busy = synth_event_is_busy,
.free = synth_event_release,
.match = synth_event_match,
};
struct synth_field {
char *type;
char *name;
size_t size;
unsigned int offset;
bool is_signed;
bool is_string;
};
struct synth_event {
struct dyn_event devent;
int ref;
char *name;
struct synth_field **fields;
unsigned int n_fields;
unsigned int n_u64;
struct trace_event_class class;
struct trace_event_call call;
struct tracepoint *tp;
struct module *mod;
};
static bool is_synth_event(struct dyn_event *ev)
{
return ev->ops == &synth_event_ops;
}
static struct synth_event *to_synth_event(struct dyn_event *ev)
{
return container_of(ev, struct synth_event, devent);
}
static bool synth_event_is_busy(struct dyn_event *ev)
{
struct synth_event *event = to_synth_event(ev);
return event->ref != 0;
}
static bool synth_event_match(const char *system, const char *event,
int argc, const char **argv, struct dyn_event *ev)
{
struct synth_event *sev = to_synth_event(ev);
return strcmp(sev->name, event) == 0 &&
(!system || strcmp(system, SYNTH_SYSTEM) == 0);
}
struct action_data; struct action_data;
typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
...@@ -589,6 +520,7 @@ static struct track_data *track_data_alloc(unsigned int key_len, ...@@ -589,6 +520,7 @@ static struct track_data *track_data_alloc(unsigned int key_len,
track_data_free(data); track_data_free(data);
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);
} }
data->elt.private_data = elt_data; data->elt.private_data = elt_data;
elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL); elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL);
...@@ -597,1788 +529,94 @@ static struct track_data *track_data_alloc(unsigned int key_len, ...@@ -597,1788 +529,94 @@ static struct track_data *track_data_alloc(unsigned int key_len,
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);
} }
return data; return data;
} }
static char last_cmd[MAX_FILTER_STR_VAL];
static char last_cmd_loc[MAX_FILTER_STR_VAL];
static int errpos(char *str)
{
return err_pos(last_cmd, str);
}
static void last_cmd_set(struct trace_event_file *file, char *str)
{
const char *system = NULL, *name = NULL;
struct trace_event_call *call;
if (!str)
return;
strcpy(last_cmd, "hist:");
strncat(last_cmd, str, MAX_FILTER_STR_VAL - 1 - sizeof("hist:"));
if (file) {
call = file->event_call;
system = call->class->system;
if (system) {
name = trace_event_name(call);
if (!name)
system = NULL;
}
}
if (system)
snprintf(last_cmd_loc, MAX_FILTER_STR_VAL, "hist:%s:%s", system, name);
}
static void hist_err(struct trace_array *tr, u8 err_type, u8 err_pos)
{
tracing_log_err(tr, last_cmd_loc, last_cmd, err_text,
err_type, err_pos);
}
static void hist_err_clear(void)
{
last_cmd[0] = '\0';
last_cmd_loc[0] = '\0';
}
struct synth_trace_event {
struct trace_entry ent;
u64 fields[];
};
static int synth_event_define_fields(struct trace_event_call *call)
{
struct synth_trace_event trace;
int offset = offsetof(typeof(trace), fields);
struct synth_event *event = call->data;
unsigned int i, size, n_u64;
char *name, *type;
bool is_signed;
int ret = 0;
for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
size = event->fields[i]->size;
is_signed = event->fields[i]->is_signed;
type = event->fields[i]->type;
name = event->fields[i]->name;
ret = trace_define_field(call, type, name, offset, size,
is_signed, FILTER_OTHER);
if (ret)
break;
event->fields[i]->offset = n_u64;
if (event->fields[i]->is_string) {
offset += STR_VAR_LEN_MAX;
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
offset += sizeof(u64);
n_u64++;
}
}
event->n_u64 = n_u64;
return ret;
}
static bool synth_field_signed(char *type)
{
if (str_has_prefix(type, "u"))
return false;
if (strcmp(type, "gfp_t") == 0)
return false;
return true;
}
static int synth_field_is_string(char *type)
{
if (strstr(type, "char[") != NULL)
return true;
return false;
}
static int synth_field_string_size(char *type)
{
char buf[4], *end, *start;
unsigned int len;
int size, err;
start = strstr(type, "char[");
if (start == NULL)
return -EINVAL;
start += sizeof("char[") - 1;
end = strchr(type, ']');
if (!end || end < start)
return -EINVAL;
len = end - start;
if (len > 3)
return -EINVAL;
strncpy(buf, start, len);
buf[len] = '\0';
err = kstrtouint(buf, 0, &size);
if (err)
return err;
if (size > STR_VAR_LEN_MAX)
return -EINVAL;
return size;
}
static int synth_field_size(char *type)
{
int size = 0;
if (strcmp(type, "s64") == 0)
size = sizeof(s64);
else if (strcmp(type, "u64") == 0)
size = sizeof(u64);
else if (strcmp(type, "s32") == 0)
size = sizeof(s32);
else if (strcmp(type, "u32") == 0)
size = sizeof(u32);
else if (strcmp(type, "s16") == 0)
size = sizeof(s16);
else if (strcmp(type, "u16") == 0)
size = sizeof(u16);
else if (strcmp(type, "s8") == 0)
size = sizeof(s8);
else if (strcmp(type, "u8") == 0)
size = sizeof(u8);
else if (strcmp(type, "char") == 0)
size = sizeof(char);
else if (strcmp(type, "unsigned char") == 0)
size = sizeof(unsigned char);
else if (strcmp(type, "int") == 0)
size = sizeof(int);
else if (strcmp(type, "unsigned int") == 0)
size = sizeof(unsigned int);
else if (strcmp(type, "long") == 0)
size = sizeof(long);
else if (strcmp(type, "unsigned long") == 0)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
else if (strcmp(type, "gfp_t") == 0)
size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
return size;
}
static const char *synth_field_fmt(char *type)
{
const char *fmt = "%llu";
if (strcmp(type, "s64") == 0)
fmt = "%lld";
else if (strcmp(type, "u64") == 0)
fmt = "%llu";
else if (strcmp(type, "s32") == 0)
fmt = "%d";
else if (strcmp(type, "u32") == 0)
fmt = "%u";
else if (strcmp(type, "s16") == 0)
fmt = "%d";
else if (strcmp(type, "u16") == 0)
fmt = "%u";
else if (strcmp(type, "s8") == 0)
fmt = "%d";
else if (strcmp(type, "u8") == 0)
fmt = "%u";
else if (strcmp(type, "char") == 0)
fmt = "%d";
else if (strcmp(type, "unsigned char") == 0)
fmt = "%u";
else if (strcmp(type, "int") == 0)
fmt = "%d";
else if (strcmp(type, "unsigned int") == 0)
fmt = "%u";
else if (strcmp(type, "long") == 0)
fmt = "%ld";
else if (strcmp(type, "unsigned long") == 0)
fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
else if (strcmp(type, "gfp_t") == 0)
fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
return fmt;
}
static void print_synth_event_num_val(struct trace_seq *s,
char *print_fmt, char *name,
int size, u64 val, char *space)
{
switch (size) {
case 1:
trace_seq_printf(s, print_fmt, name, (u8)val, space);
break;
case 2:
trace_seq_printf(s, print_fmt, name, (u16)val, space);
break;
case 4:
trace_seq_printf(s, print_fmt, name, (u32)val, space);
break;
default:
trace_seq_printf(s, print_fmt, name, val, space);
break;
}
}
static enum print_line_t print_synth_event(struct trace_iterator *iter,
int flags,
struct trace_event *event)
{
struct trace_array *tr = iter->tr;
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
unsigned int i, n_u64;
char print_fmt[32];
const char *fmt;
entry = (struct synth_trace_event *)iter->ent;
se = container_of(event, struct synth_event, call.event);
trace_seq_printf(s, "%s: ", se->name);
for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
if (trace_seq_has_overflowed(s))
goto end;
fmt = synth_field_fmt(se->fields[i]->type);
/* parameter types */
if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
trace_seq_printf(s, "%s ", fmt);
snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);
/* parameter values */
if (se->fields[i]->is_string) {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
(char *)&entry->fields[n_u64],
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
char *space = (i == se->n_fields - 1 ? "" : " ");
print_synth_event_num_val(s, print_fmt,
se->fields[i]->name,
se->fields[i]->size,
entry->fields[n_u64],
space);
if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
trace_seq_puts(s, " (");
trace_print_flags_seq(s, "|",
entry->fields[n_u64],
__flags);
trace_seq_putc(s, ')');
}
n_u64++;
}
}
end:
trace_seq_putc(s, '\n');
return trace_handle_return(s);
}
static struct trace_event_functions synth_event_funcs = {
.trace = print_synth_event
};
static notrace void trace_event_raw_event_synth(void *__data,
u64 *var_ref_vals,
unsigned int *var_ref_idx)
{
struct trace_event_file *trace_file = __data;
struct synth_trace_event *entry;
struct trace_event_buffer fbuffer;
struct trace_buffer *buffer;
struct synth_event *event;
unsigned int i, n_u64, val_idx;
int fields_size = 0;
event = trace_file->event_call->data;
if (trace_trigger_soft_disabled(trace_file))
return;
fields_size = event->n_u64 * sizeof(u64);
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
*/
buffer = trace_file->tr->array_buffer.buffer;
ring_buffer_nest_start(buffer);
entry = trace_event_buffer_reserve(&fbuffer, trace_file,
sizeof(*entry) + fields_size);
if (!entry)
goto out;
for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
val_idx = var_ref_idx[i];
if (event->fields[i]->is_string) {
char *str_val = (char *)(long)var_ref_vals[val_idx];
char *str_field = (char *)&entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = event->fields[i];
u64 val = var_ref_vals[val_idx];
switch (field->size) {
case 1:
*(u8 *)&entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&entry->fields[n_u64] = (u32)val;
break;
default:
entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
trace_event_buffer_commit(&fbuffer);
out:
ring_buffer_nest_end(buffer);
}
static void free_synth_event_print_fmt(struct trace_event_call *call)
{
if (call) {
kfree(call->print_fmt);
call->print_fmt = NULL;
}
}
static int __set_synth_event_print_fmt(struct synth_event *event,
char *buf, int len)
{
const char *fmt;
int pos = 0;
int i;
/* When len=0, we just calculate the needed length */
#define LEN_OR_ZERO (len ? len - pos : 0)
pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
for (i = 0; i < event->n_fields; i++) {
fmt = synth_field_fmt(event->fields[i]->type);
pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
event->fields[i]->name, fmt,
i == event->n_fields - 1 ? "" : ", ");
}
pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
for (i = 0; i < event->n_fields; i++) {
pos += snprintf(buf + pos, LEN_OR_ZERO,
", REC->%s", event->fields[i]->name);
}
#undef LEN_OR_ZERO
/* return the length of print_fmt */
return pos;
}
static int set_synth_event_print_fmt(struct trace_event_call *call)
{
struct synth_event *event = call->data;
char *print_fmt;
int len;
/* First: called with 0 length to calculate the needed length */
len = __set_synth_event_print_fmt(event, NULL, 0);
print_fmt = kmalloc(len + 1, GFP_KERNEL);
if (!print_fmt)
return -ENOMEM;
/* Second: actually write the @print_fmt */
__set_synth_event_print_fmt(event, print_fmt, len + 1);
call->print_fmt = print_fmt;
return 0;
}
static void free_synth_field(struct synth_field *field)
{
kfree(field->type);
kfree(field->name);
kfree(field);
}
static struct synth_field *parse_synth_field(int argc, const char **argv,
int *consumed)
{
struct synth_field *field;
const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
int len, ret = 0;
if (field_type[0] == ';')
field_type++;
if (!strcmp(field_type, "unsigned")) {
if (argc < 3)
return ERR_PTR(-EINVAL);
prefix = "unsigned ";
field_type = argv[1];
field_name = argv[2];
*consumed = 3;
} else {
field_name = argv[1];
*consumed = 2;
}
field = kzalloc(sizeof(*field), GFP_KERNEL);
if (!field)
return ERR_PTR(-ENOMEM);
len = strlen(field_name);
array = strchr(field_name, '[');
if (array)
len -= strlen(array);
else if (field_name[len - 1] == ';')
len--;
field->name = kmemdup_nul(field_name, len, GFP_KERNEL);
if (!field->name) {
ret = -ENOMEM;
goto free;
}
if (field_type[0] == ';')
field_type++;
len = strlen(field_type) + 1;
if (array)
len += strlen(array);
if (prefix)
len += strlen(prefix);
field->type = kzalloc(len, GFP_KERNEL);
if (!field->type) {
ret = -ENOMEM;
goto free;
}
if (prefix)
strcat(field->type, prefix);
strcat(field->type, field_type);
if (array) {
strcat(field->type, array);
if (field->type[len - 1] == ';')
field->type[len - 1] = '\0';
}
field->size = synth_field_size(field->type);
if (!field->size) {
ret = -EINVAL;
goto free;
}
if (synth_field_is_string(field->type))
field->is_string = true;
field->is_signed = synth_field_signed(field->type);
out:
return field;
free:
free_synth_field(field);
field = ERR_PTR(ret);
goto out;
}
static void free_synth_tracepoint(struct tracepoint *tp)
{
if (!tp)
return;
kfree(tp->name);
kfree(tp);
}
static struct tracepoint *alloc_synth_tracepoint(char *name)
{
struct tracepoint *tp;
tp = kzalloc(sizeof(*tp), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
tp->name = kstrdup(name, GFP_KERNEL);
if (!tp->name) {
kfree(tp);
return ERR_PTR(-ENOMEM);
}
return tp;
}
typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
unsigned int *var_ref_idx);
static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
unsigned int *var_ref_idx)
{
struct tracepoint *tp = event->tp;
if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
struct tracepoint_func *probe_func_ptr;
synth_probe_func_t probe_func;
void *__data;
if (!(cpu_online(raw_smp_processor_id())))
return;
probe_func_ptr = rcu_dereference_sched((tp)->funcs);
if (probe_func_ptr) {
do {
probe_func = probe_func_ptr->func;
__data = probe_func_ptr->data;
probe_func(__data, var_ref_vals, var_ref_idx);
} while ((++probe_func_ptr)->func);
}
}
}
static struct synth_event *find_synth_event(const char *name)
{
struct dyn_event *pos;
struct synth_event *event;
for_each_dyn_event(pos) {
if (!is_synth_event(pos))
continue;
event = to_synth_event(pos);
if (strcmp(event->name, name) == 0)
return event;
}
return NULL;
}
static struct trace_event_fields synth_event_fields_array[] = {
{ .type = TRACE_FUNCTION_TYPE,
.define_fields = synth_event_define_fields },
{}
};
static int register_synth_event(struct synth_event *event)
{
struct trace_event_call *call = &event->call;
int ret = 0;
event->call.class = &event->class;
event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
if (!event->class.system) {
ret = -ENOMEM;
goto out;
}
event->tp = alloc_synth_tracepoint(event->name);
if (IS_ERR(event->tp)) {
ret = PTR_ERR(event->tp);
event->tp = NULL;
goto out;
}
INIT_LIST_HEAD(&call->class->fields);
call->event.funcs = &synth_event_funcs;
call->class->fields_array = synth_event_fields_array;
ret = register_trace_event(&call->event);
if (!ret) {
ret = -ENODEV;
goto out;
}
call->flags = TRACE_EVENT_FL_TRACEPOINT;
call->class->reg = trace_event_reg;
call->class->probe = trace_event_raw_event_synth;
call->data = event;
call->tp = event->tp;
ret = trace_add_event_call(call);
if (ret) {
pr_warn("Failed to register synthetic event: %s\n",
trace_event_name(call));
goto err;
}
ret = set_synth_event_print_fmt(call);
if (ret < 0) {
trace_remove_event_call(call);
goto err;
}
out:
return ret;
err:
unregister_trace_event(&call->event);
goto out;
}
static int unregister_synth_event(struct synth_event *event)
{
struct trace_event_call *call = &event->call;
int ret;
ret = trace_remove_event_call(call);
return ret;
}
static void free_synth_event(struct synth_event *event)
{
unsigned int i;
if (!event)
return;
for (i = 0; i < event->n_fields; i++)
free_synth_field(event->fields[i]);
kfree(event->fields);
kfree(event->name);
kfree(event->class.system);
free_synth_tracepoint(event->tp);
free_synth_event_print_fmt(&event->call);
kfree(event);
}
static struct synth_event *alloc_synth_event(const char *name, int n_fields,
struct synth_field **fields)
{
struct synth_event *event;
unsigned int i;
event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event) {
event = ERR_PTR(-ENOMEM);
goto out;
}
event->name = kstrdup(name, GFP_KERNEL);
if (!event->name) {
kfree(event);
event = ERR_PTR(-ENOMEM);
goto out;
}
event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
if (!event->fields) {
free_synth_event(event);
event = ERR_PTR(-ENOMEM);
goto out;
}
dyn_event_init(&event->devent, &synth_event_ops);
for (i = 0; i < n_fields; i++)
event->fields[i] = fields[i];
event->n_fields = n_fields;
out:
return event;
}
static void action_trace(struct hist_trigger_data *hist_data,
struct tracing_map_elt *elt, void *rec,
struct ring_buffer_event *rbe, void *key,
struct action_data *data, u64 *var_ref_vals)
{
struct synth_event *event = data->synth_event;
trace_synth(event, var_ref_vals, data->var_ref_idx);
}
struct hist_var_data {
struct list_head list;
struct hist_trigger_data *hist_data;
};
static int synth_event_check_arg_fn(void *data)
{
struct dynevent_arg_pair *arg_pair = data;
int size;
size = synth_field_size((char *)arg_pair->lhs);
return size ? 0 : -EINVAL;
}
/**
* synth_event_add_field - Add a new field to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @type: The type of the new field to add
* @name: The name of the new field to add
*
* Add a new field to a synthetic event cmd object. Field ordering is in
* the same order the fields are added.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_field(struct dynevent_cmd *cmd, const char *type,
const char *name)
{
struct dynevent_arg_pair arg_pair;
int ret;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (!type || !name)
return -EINVAL;
dynevent_arg_pair_init(&arg_pair, 0, ';');
arg_pair.lhs = type;
arg_pair.rhs = name;
ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn);
if (ret)
return ret;
if (++cmd->n_fields > SYNTH_FIELDS_MAX)
ret = -EINVAL;
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_field);
/**
* synth_event_add_field_str - Add a new field to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @type_name: The type and name of the new field to add, as a single string
*
* Add a new field to a synthetic event cmd object, as a single
* string. The @type_name string is expected to be of the form 'type
* name', which will be appended by ';'. No sanity checking is done -
* what's passed in is assumed to already be well-formed. Field
* ordering is in the same order the fields are added.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name)
{
struct dynevent_arg arg;
int ret;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (!type_name)
return -EINVAL;
dynevent_arg_init(&arg, ';');
arg.str = type_name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
if (++cmd->n_fields > SYNTH_FIELDS_MAX)
ret = -EINVAL;
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_field_str);
/**
* synth_event_add_fields - Add multiple fields to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
*
* Add a new set of fields to a synthetic event cmd object. The event
* fields that will be defined for the event should be passed in as an
* array of struct synth_field_desc, and the number of elements in the
* array passed in as n_fields. Field ordering will retain the
* ordering given in the fields array.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_fields(struct dynevent_cmd *cmd,
struct synth_field_desc *fields,
unsigned int n_fields)
{
unsigned int i;
int ret = 0;
for (i = 0; i < n_fields; i++) {
if (fields[i].type == NULL || fields[i].name == NULL) {
ret = -EINVAL;
break;
}
ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
if (ret)
break;
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_fields);
/**
* __synth_event_gen_cmd_start - Start a synthetic event command from arg list
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @name: The name of the synthetic event
* @mod: The module creating the event, NULL if not created from a module
* @args: Variable number of arg (pairs), one pair for each field
*
* NOTE: Users normally won't want to call this function directly, but
* rather use the synth_event_gen_cmd_start() wrapper, which
* automatically adds a NULL to the end of the arg list. If this
* function is used directly, make sure the last arg in the variable
* arg list is NULL.
*
* Generate a synthetic event command to be executed by
* synth_event_gen_cmd_end(). This function can be used to generate
* the complete command or only the first part of it; in the latter
* case, synth_event_add_field(), synth_event_add_field_str(), or
* synth_event_add_fields() can be used to add more fields following
* this.
*
* There should be an even number variable args, each pair consisting
* of a type followed by a field name.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name,
struct module *mod, ...)
{
struct dynevent_arg arg;
va_list args;
int ret;
cmd->event_name = name;
cmd->private_data = mod;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
dynevent_arg_init(&arg, 0);
arg.str = name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
va_start(args, mod);
for (;;) {
const char *type, *name;
type = va_arg(args, const char *);
if (!type)
break;
name = va_arg(args, const char *);
if (!name)
break;
if (++cmd->n_fields > SYNTH_FIELDS_MAX) {
ret = -EINVAL;
break;
}
ret = synth_event_add_field(cmd, type, name);
if (ret)
break;
}
va_end(args);
return ret;
}
EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start);
/**
* synth_event_gen_cmd_array_start - Start synthetic event command from an array
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @name: The name of the synthetic event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
*
* Generate a synthetic event command to be executed by
* synth_event_gen_cmd_end(). This function can be used to generate
* the complete command or only the first part of it; in the latter
* case, synth_event_add_field(), synth_event_add_field_str(), or
* synth_event_add_fields() can be used to add more fields following
* this.
*
* The event fields that will be defined for the event should be
* passed in as an array of struct synth_field_desc, and the number of
* elements in the array passed in as n_fields. Field ordering will
* retain the ordering given in the fields array.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name,
struct module *mod,
struct synth_field_desc *fields,
unsigned int n_fields)
{
struct dynevent_arg arg;
unsigned int i;
int ret = 0;
cmd->event_name = name;
cmd->private_data = mod;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (n_fields > SYNTH_FIELDS_MAX)
return -EINVAL;
dynevent_arg_init(&arg, 0);
arg.str = name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
for (i = 0; i < n_fields; i++) {
if (fields[i].type == NULL || fields[i].name == NULL)
return -EINVAL;
ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
if (ret)
break;
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start);
static int __create_synth_event(int argc, const char *name, const char **argv)
{
struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
struct synth_event *event = NULL;
int i, consumed = 0, n_fields = 0, ret = 0;
/*
* Argument syntax:
* - Add synthetic event: <event_name> field[;field] ...
* - Remove synthetic event: !<event_name> field[;field] ...
* where 'field' = type field_name
*/
if (name[0] == '\0' || argc < 1)
return -EINVAL;
mutex_lock(&event_mutex);
event = find_synth_event(name);
if (event) {
ret = -EEXIST;
goto out;
}
for (i = 0; i < argc - 1; i++) {
if (strcmp(argv[i], ";") == 0)
continue;
if (n_fields == SYNTH_FIELDS_MAX) {
ret = -EINVAL;
goto err;
}
field = parse_synth_field(argc - i, &argv[i], &consumed);
if (IS_ERR(field)) {
ret = PTR_ERR(field);
goto err;
}
fields[n_fields++] = field;
i += consumed - 1;
}
if (i < argc && strcmp(argv[i], ";") != 0) {
ret = -EINVAL;
goto err;
}
event = alloc_synth_event(name, n_fields, fields);
if (IS_ERR(event)) {
ret = PTR_ERR(event);
event = NULL;
goto err;
}
ret = register_synth_event(event);
if (!ret)
dyn_event_add(&event->devent);
else
free_synth_event(event);
out:
mutex_unlock(&event_mutex);
return ret;
err:
for (i = 0; i < n_fields; i++)
free_synth_field(fields[i]);
goto out;
}
/**
* synth_event_create - Create a new synthetic event
* @name: The name of the new sythetic event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
* @mod: The module creating the event, NULL if not created from a module
*
* Create a new synthetic event with the given name under the
* trace/events/synthetic/ directory. The event fields that will be
* defined for the event should be passed in as an array of struct
* synth_field_desc, and the number elements in the array passed in as
* n_fields. Field ordering will retain the ordering given in the
* fields array.
*
* If the new synthetic event is being created from a module, the mod
* param must be non-NULL. This will ensure that the trace buffer
* won't contain unreadable events.
*
* The new synth event should be deleted using synth_event_delete()
* function. The new synthetic event can be generated from modules or
* other kernel code using trace_synth_event() and related functions.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_create(const char *name, struct synth_field_desc *fields,
unsigned int n_fields, struct module *mod)
{
struct dynevent_cmd cmd;
char *buf;
int ret;
buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
if (!buf)
return -ENOMEM;
synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
ret = synth_event_gen_cmd_array_start(&cmd, name, mod,
fields, n_fields);
if (ret)
goto out;
ret = synth_event_gen_cmd_end(&cmd);
out:
kfree(buf);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_create);
static int destroy_synth_event(struct synth_event *se)
{
int ret;
if (se->ref)
ret = -EBUSY;
else {
ret = unregister_synth_event(se);
if (!ret) {
dyn_event_remove(&se->devent);
free_synth_event(se);
}
}
return ret;
}
/**
* synth_event_delete - Delete a synthetic event
* @event_name: The name of the new sythetic event
*
* Delete a synthetic event that was created with synth_event_create().
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_delete(const char *event_name)
{
struct synth_event *se = NULL;
struct module *mod = NULL;
int ret = -ENOENT;
mutex_lock(&event_mutex);
se = find_synth_event(event_name);
if (se) {
mod = se->mod;
ret = destroy_synth_event(se);
}
mutex_unlock(&event_mutex);
if (mod) {
mutex_lock(&trace_types_lock);
/*
* It is safest to reset the ring buffer if the module
* being unloaded registered any events that were
* used. The only worry is if a new module gets
* loaded, and takes on the same id as the events of
* this module. When printing out the buffer, traced
* events left over from this module may be passed to
* the new module events and unexpected results may
* occur.
*/
tracing_reset_all_online_cpus();
mutex_unlock(&trace_types_lock);
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_delete);
static int create_or_delete_synth_event(int argc, char **argv)
{
const char *name = argv[0];
int ret;
/* trace_run_command() ensures argc != 0 */
if (name[0] == '!') {
ret = synth_event_delete(name + 1);
return ret;
}
ret = __create_synth_event(argc - 1, name, (const char **)argv + 1);
return ret == -ECANCELED ? -EINVAL : ret;
}
static int synth_event_run_command(struct dynevent_cmd *cmd)
{
struct synth_event *se;
int ret;
ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event);
if (ret)
return ret;
se = find_synth_event(cmd->event_name);
if (WARN_ON(!se))
return -ENOENT;
se->mod = cmd->private_data;
return ret;
}
/**
* synth_event_cmd_init - Initialize a synthetic event command object
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @buf: A pointer to the buffer used to build the command
* @maxlen: The length of the buffer passed in @buf
*
* Initialize a synthetic event command object. Use this before
* calling any of the other dyenvent_cmd functions.
*/
void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen)
{
dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH,
synth_event_run_command);
}
EXPORT_SYMBOL_GPL(synth_event_cmd_init);
static inline int
__synth_event_trace_start(struct trace_event_file *file,
struct synth_event_trace_state *trace_state)
{
int entry_size, fields_size = 0;
int ret = 0;
memset(trace_state, '\0', sizeof(*trace_state));
/*
* Normal event tracing doesn't get called at all unless the
* ENABLED bit is set (which attaches the probe thus allowing
* this code to be called, etc). Because this is called
* directly by the user, we don't have that but we still need
* to honor not logging when disabled. For the the iterated
* trace case, we save the enabed state upon start and just
* ignore the following data calls.
*/
if (!(file->flags & EVENT_FILE_FL_ENABLED) ||
trace_trigger_soft_disabled(file)) {
trace_state->disabled = true;
ret = -ENOENT;
goto out;
}
trace_state->event = file->event_call->data;
fields_size = trace_state->event->n_u64 * sizeof(u64);
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
*/
trace_state->buffer = file->tr->array_buffer.buffer;
ring_buffer_nest_start(trace_state->buffer);
entry_size = sizeof(*trace_state->entry) + fields_size;
trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer,
file,
entry_size);
if (!trace_state->entry) {
ring_buffer_nest_end(trace_state->buffer);
ret = -EINVAL;
}
out:
return ret;
}
static inline void
__synth_event_trace_end(struct synth_event_trace_state *trace_state)
{
trace_event_buffer_commit(&trace_state->fbuffer);
ring_buffer_nest_end(trace_state->buffer);
}
/**
* synth_event_trace - Trace a synthetic event
* @file: The trace_event_file representing the synthetic event
* @n_vals: The number of values in vals
* @args: Variable number of args containing the event values
*
* Trace a synthetic event using the values passed in the variable
* argument list.
*
* The argument list should be a list 'n_vals' u64 values. The number
* of vals must match the number of field in the synthetic event, and
* must be in the same order as the synthetic event fields.
*
* All vals should be cast to u64, and string vals are just pointers
* to strings, cast to u64. Strings will be copied into space
* reserved in the event for the string, using these pointers.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
{
struct synth_event_trace_state state;
unsigned int i, n_u64;
va_list args;
int ret;
ret = __synth_event_trace_start(file, &state);
if (ret) {
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
if (n_vals != state.event->n_fields) {
ret = -EINVAL;
goto out;
}
va_start(args, n_vals);
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
u64 val;
val = va_arg(args, u64);
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)val;
char *str_field = (char *)&state.entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = state.event->fields[i];
switch (field->size) {
case 1:
*(u8 *)&state.entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&state.entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&state.entry->fields[n_u64] = (u32)val;
break;
default:
state.entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
va_end(args);
out:
__synth_event_trace_end(&state);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace);
/**
* synth_event_trace_array - Trace a synthetic event from an array
* @file: The trace_event_file representing the synthetic event
* @vals: Array of values
* @n_vals: The number of values in vals
*
* Trace a synthetic event using the values passed in as 'vals'.
*
* The 'vals' array is just an array of 'n_vals' u64. The number of
* vals must match the number of field in the synthetic event, and
* must be in the same order as the synthetic event fields.
*
* All vals should be cast to u64, and string vals are just pointers
* to strings, cast to u64. Strings will be copied into space
* reserved in the event for the string, using these pointers.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
unsigned int n_vals)
{
struct synth_event_trace_state state;
unsigned int i, n_u64;
int ret;
ret = __synth_event_trace_start(file, &state);
if (ret) {
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
if (n_vals != state.event->n_fields) {
ret = -EINVAL;
goto out;
}
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)vals[i];
char *str_field = (char *)&state.entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = state.event->fields[i];
u64 val = vals[i];
switch (field->size) {
case 1:
*(u8 *)&state.entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&state.entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&state.entry->fields[n_u64] = (u32)val;
break;
default:
state.entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
out:
__synth_event_trace_end(&state);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace_array);
/**
* synth_event_trace_start - Start piecewise synthetic event trace
* @file: The trace_event_file representing the synthetic event
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Start the trace of a synthetic event field-by-field rather than all
* at once.
*
* This function 'opens' an event trace, which means space is reserved
* for the event in the trace buffer, after which the event's
* individual field values can be set through either
* synth_event_add_next_val() or synth_event_add_val().
*
* A pointer to a trace_state object is passed in, which will keep
* track of the current event trace state until the event trace is
* closed (and the event finally traced) using
* synth_event_trace_end().
*
* Note that synth_event_trace_end() must be called after all values
* have been added for each event trace, regardless of whether adding
* all field values succeeded or not.
*
* Note also that for a given event trace, all fields must be added
* using either synth_event_add_next_val() or synth_event_add_val()
* but not both together or interleaved.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_start(struct trace_event_file *file,
struct synth_event_trace_state *trace_state)
{
int ret;
if (!trace_state)
return -EINVAL;
ret = __synth_event_trace_start(file, trace_state);
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace_start);
static int __synth_event_add_val(const char *field_name, u64 val,
struct synth_event_trace_state *trace_state)
{
struct synth_field *field = NULL;
struct synth_trace_event *entry;
struct synth_event *event;
int i, ret = 0;
if (!trace_state) {
ret = -EINVAL;
goto out;
}
/* can't mix add_next_synth_val() with add_synth_val() */
if (field_name) {
if (trace_state->add_next) {
ret = -EINVAL;
goto out;
}
trace_state->add_name = true;
} else {
if (trace_state->add_name) {
ret = -EINVAL;
goto out;
}
trace_state->add_next = true;
}
if (trace_state->disabled)
goto out;
event = trace_state->event;
if (trace_state->add_name) {
for (i = 0; i < event->n_fields; i++) {
field = event->fields[i];
if (strcmp(field->name, field_name) == 0)
break;
}
if (!field) {
ret = -EINVAL;
goto out;
}
} else {
if (trace_state->cur_field >= event->n_fields) {
ret = -EINVAL;
goto out;
}
field = event->fields[trace_state->cur_field++];
}
entry = trace_state->entry;
if (field->is_string) {
char *str_val = (char *)(long)val;
char *str_field;
if (!str_val) {
ret = -EINVAL;
goto out;
}
str_field = (char *)&entry->fields[field->offset];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
} else {
switch (field->size) {
case 1:
*(u8 *)&trace_state->entry->fields[field->offset] = (u8)val;
break;
case 2:
*(u16 *)&trace_state->entry->fields[field->offset] = (u16)val;
break;
case 4:
*(u32 *)&trace_state->entry->fields[field->offset] = (u32)val;
break;
default:
trace_state->entry->fields[field->offset] = val;
break;
}
}
out:
return ret;
}
/**
* synth_event_add_next_val - Add the next field's value to an open synth trace
* @val: The value to set the next field to
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Set the value of the next field in an event that's been opened by
* synth_event_trace_start().
*
* The val param should be the value cast to u64. If the value points
* to a string, the val param should be a char * cast to u64.
*
* This function assumes all the fields in an event are to be set one
* after another - successive calls to this function are made, one for
* each field, in the order of the fields in the event, until all
* fields have been set. If you'd rather set each field individually
* without regard to ordering, synth_event_add_val() can be used
* instead.
*
* Note however that synth_event_add_next_val() and
* synth_event_add_val() can't be intermixed for a given event trace -
* one or the other but not both can be used at the same time.
*
* Note also that synth_event_trace_end() must be called after all
* values have been added for each event trace, regardless of whether
* adding all field values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_add_next_val(u64 val,
struct synth_event_trace_state *trace_state)
{
return __synth_event_add_val(NULL, val, trace_state);
}
EXPORT_SYMBOL_GPL(synth_event_add_next_val);
/**
* synth_event_add_val - Add a named field's value to an open synth trace
* @field_name: The name of the synthetic event field value to set
* @val: The value to set the next field to
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Set the value of the named field in an event that's been opened by
* synth_event_trace_start().
*
* The val param should be the value cast to u64. If the value points
* to a string, the val param should be a char * cast to u64.
*
* This function looks up the field name, and if found, sets the field
* to the specified value. This lookup makes this function more
* expensive than synth_event_add_next_val(), so use that or the
* none-piecewise synth_event_trace() instead if efficiency is more
* important.
*
* Note however that synth_event_add_next_val() and
* synth_event_add_val() can't be intermixed for a given event trace -
* one or the other but not both can be used at the same time.
*
* Note also that synth_event_trace_end() must be called after all
* values have been added for each event trace, regardless of whether
* adding all field values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_add_val(const char *field_name, u64 val,
struct synth_event_trace_state *trace_state)
{
return __synth_event_add_val(field_name, val, trace_state);
}
EXPORT_SYMBOL_GPL(synth_event_add_val);
/**
* synth_event_trace_end - End piecewise synthetic event trace
* @trace_state: A pointer to object tracking the piecewise trace state
*
* End the trace of a synthetic event opened by
* synth_event_trace__start().
*
* This function 'closes' an event trace, which basically means that
* it commits the reserved event and cleans up other loose ends.
*
* A pointer to a trace_state object is passed in, which will keep
* track of the current event trace state opened with
* synth_event_trace_start().
*
* Note that this function must be called after all values have been
* added for each event trace, regardless of whether adding all field
* values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_end(struct synth_event_trace_state *trace_state)
{
if (!trace_state)
return -EINVAL;
__synth_event_trace_end(trace_state);
return 0;
}
EXPORT_SYMBOL_GPL(synth_event_trace_end);
static int create_synth_event(int argc, const char **argv)
{
const char *name = argv[0];
int len;
if (name[0] != 's' || name[1] != ':')
return -ECANCELED;
name += 2;
/* This interface accepts group name prefix */
if (strchr(name, '/')) {
len = str_has_prefix(name, SYNTH_SYSTEM "/");
if (len == 0)
return -EINVAL;
name += len;
}
return __create_synth_event(argc - 1, name, argv + 1);
}
static int synth_event_release(struct dyn_event *ev)
{
struct synth_event *event = to_synth_event(ev);
int ret;
if (event->ref)
return -EBUSY;
ret = unregister_synth_event(event); static char last_cmd[MAX_FILTER_STR_VAL];
if (ret) static char last_cmd_loc[MAX_FILTER_STR_VAL];
return ret;
dyn_event_remove(ev); static int errpos(char *str)
free_synth_event(event); {
return 0; return err_pos(last_cmd, str);
} }
static int __synth_event_show(struct seq_file *m, struct synth_event *event) static void last_cmd_set(struct trace_event_file *file, char *str)
{ {
struct synth_field *field; const char *system = NULL, *name = NULL;
unsigned int i; struct trace_event_call *call;
seq_printf(m, "%s\t", event->name); if (!str)
return;
for (i = 0; i < event->n_fields; i++) { strcpy(last_cmd, "hist:");
field = event->fields[i]; strncat(last_cmd, str, MAX_FILTER_STR_VAL - 1 - sizeof("hist:"));
/* parameter values */ if (file) {
seq_printf(m, "%s %s%s", field->type, field->name, call = file->event_call;
i == event->n_fields - 1 ? "" : "; "); system = call->class->system;
if (system) {
name = trace_event_name(call);
if (!name)
system = NULL;
}
} }
seq_putc(m, '\n'); if (system)
snprintf(last_cmd_loc, MAX_FILTER_STR_VAL, "hist:%s:%s", system, name);
return 0;
} }
static int synth_event_show(struct seq_file *m, struct dyn_event *ev) static void hist_err(struct trace_array *tr, u8 err_type, u8 err_pos)
{ {
struct synth_event *event = to_synth_event(ev); tracing_log_err(tr, last_cmd_loc, last_cmd, err_text,
err_type, err_pos);
seq_printf(m, "s:%s/", event->class.system);
return __synth_event_show(m, event);
} }
static int synth_events_seq_show(struct seq_file *m, void *v) static void hist_err_clear(void)
{ {
struct dyn_event *ev = v; last_cmd[0] = '\0';
last_cmd_loc[0] = '\0';
if (!is_synth_event(ev))
return 0;
return __synth_event_show(m, to_synth_event(ev));
} }
static const struct seq_operations synth_events_seq_op = { typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
.start = dyn_event_seq_start, unsigned int *var_ref_idx);
.next = dyn_event_seq_next,
.stop = dyn_event_seq_stop,
.show = synth_events_seq_show,
};
static int synth_events_open(struct inode *inode, struct file *file) static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
unsigned int *var_ref_idx)
{ {
int ret; struct tracepoint *tp = event->tp;
ret = security_locked_down(LOCKDOWN_TRACEFS); if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
if (ret) struct tracepoint_func *probe_func_ptr;
return ret; synth_probe_func_t probe_func;
void *__data;
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { if (!(cpu_online(raw_smp_processor_id())))
ret = dyn_events_release_all(&synth_event_ops); return;
if (ret < 0)
return ret;
}
return seq_open(file, &synth_events_seq_op); probe_func_ptr = rcu_dereference_sched((tp)->funcs);
if (probe_func_ptr) {
do {
probe_func = probe_func_ptr->func;
__data = probe_func_ptr->data;
probe_func(__data, var_ref_vals, var_ref_idx);
} while ((++probe_func_ptr)->func);
}
}
} }
static ssize_t synth_events_write(struct file *file, static void action_trace(struct hist_trigger_data *hist_data,
const char __user *buffer, struct tracing_map_elt *elt, void *rec,
size_t count, loff_t *ppos) struct ring_buffer_event *rbe, void *key,
struct action_data *data, u64 *var_ref_vals)
{ {
return trace_parse_run_command(file, buffer, count, ppos, struct synth_event *event = data->synth_event;
create_or_delete_synth_event);
trace_synth(event, var_ref_vals, data->var_ref_idx);
} }
static const struct file_operations synth_events_fops = { struct hist_var_data {
.open = synth_events_open, struct list_head list;
.write = synth_events_write, struct hist_trigger_data *hist_data;
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
}; };
static u64 hist_field_timestamp(struct hist_field *hist_field, static u64 hist_field_timestamp(struct hist_field *hist_field,
...@@ -6491,6 +4729,279 @@ const struct file_operations event_hist_fops = { ...@@ -6491,6 +4729,279 @@ const struct file_operations event_hist_fops = {
.release = single_release, .release = single_release,
}; };
#ifdef CONFIG_HIST_TRIGGERS_DEBUG
static void hist_field_debug_show_flags(struct seq_file *m,
unsigned long flags)
{
seq_puts(m, " flags:\n");
if (flags & HIST_FIELD_FL_KEY)
seq_puts(m, " HIST_FIELD_FL_KEY\n");
else if (flags & HIST_FIELD_FL_HITCOUNT)
seq_puts(m, " VAL: HIST_FIELD_FL_HITCOUNT\n");
else if (flags & HIST_FIELD_FL_VAR)
seq_puts(m, " HIST_FIELD_FL_VAR\n");
else if (flags & HIST_FIELD_FL_VAR_REF)
seq_puts(m, " HIST_FIELD_FL_VAR_REF\n");
else
seq_puts(m, " VAL: normal u64 value\n");
if (flags & HIST_FIELD_FL_ALIAS)
seq_puts(m, " HIST_FIELD_FL_ALIAS\n");
}
static int hist_field_debug_show(struct seq_file *m,
struct hist_field *field, unsigned long flags)
{
if ((field->flags & flags) != flags) {
seq_printf(m, "ERROR: bad flags - %lx\n", flags);
return -EINVAL;
}
hist_field_debug_show_flags(m, field->flags);
if (field->field)
seq_printf(m, " ftrace_event_field name: %s\n",
field->field->name);
if (field->flags & HIST_FIELD_FL_VAR) {
seq_printf(m, " var.name: %s\n", field->var.name);
seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
field->var.idx);
}
if (field->flags & HIST_FIELD_FL_ALIAS)
seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n",
field->var_ref_idx);
if (field->flags & HIST_FIELD_FL_VAR_REF) {
seq_printf(m, " name: %s\n", field->name);
seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
field->var.idx);
seq_printf(m, " var.hist_data: %p\n", field->var.hist_data);
seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n",
field->var_ref_idx);
if (field->system)
seq_printf(m, " system: %s\n", field->system);
if (field->event_name)
seq_printf(m, " event_name: %s\n", field->event_name);
}
seq_printf(m, " type: %s\n", field->type);
seq_printf(m, " size: %u\n", field->size);
seq_printf(m, " is_signed: %u\n", field->is_signed);
return 0;
}
static int field_var_debug_show(struct seq_file *m,
struct field_var *field_var, unsigned int i,
bool save_vars)
{
const char *vars_name = save_vars ? "save_vars" : "field_vars";
struct hist_field *field;
int ret = 0;
seq_printf(m, "\n hist_data->%s[%d]:\n", vars_name, i);
field = field_var->var;
seq_printf(m, "\n %s[%d].var:\n", vars_name, i);
hist_field_debug_show_flags(m, field->flags);
seq_printf(m, " var.name: %s\n", field->var.name);
seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n",
field->var.idx);
field = field_var->val;
seq_printf(m, "\n %s[%d].val:\n", vars_name, i);
if (field->field)
seq_printf(m, " ftrace_event_field name: %s\n",
field->field->name);
else {
ret = -EINVAL;
goto out;
}
seq_printf(m, " type: %s\n", field->type);
seq_printf(m, " size: %u\n", field->size);
seq_printf(m, " is_signed: %u\n", field->is_signed);
out:
return ret;
}
static int hist_action_debug_show(struct seq_file *m,
struct action_data *data, int i)
{
int ret = 0;
if (data->handler == HANDLER_ONMAX ||
data->handler == HANDLER_ONCHANGE) {
seq_printf(m, "\n hist_data->actions[%d].track_data.var_ref:\n", i);
ret = hist_field_debug_show(m, data->track_data.var_ref,
HIST_FIELD_FL_VAR_REF);
if (ret)
goto out;
seq_printf(m, "\n hist_data->actions[%d].track_data.track_var:\n", i);
ret = hist_field_debug_show(m, data->track_data.track_var,
HIST_FIELD_FL_VAR);
if (ret)
goto out;
}
if (data->handler == HANDLER_ONMATCH) {
seq_printf(m, "\n hist_data->actions[%d].match_data.event_system: %s\n",
i, data->match_data.event_system);
seq_printf(m, " hist_data->actions[%d].match_data.event: %s\n",
i, data->match_data.event);
}
out:
return ret;
}
static int hist_actions_debug_show(struct seq_file *m,
struct hist_trigger_data *hist_data)
{
int i, ret = 0;
if (hist_data->n_actions)
seq_puts(m, "\n action tracking variables (for onmax()/onchange()/onmatch()):\n");
for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *action = hist_data->actions[i];
ret = hist_action_debug_show(m, action, i);
if (ret)
goto out;
}
if (hist_data->n_save_vars)
seq_puts(m, "\n save action variables (save() params):\n");
for (i = 0; i < hist_data->n_save_vars; i++) {
ret = field_var_debug_show(m, hist_data->save_vars[i], i, true);
if (ret)
goto out;
}
out:
return ret;
}
static void hist_trigger_debug_show(struct seq_file *m,
struct event_trigger_data *data, int n)
{
struct hist_trigger_data *hist_data;
int i, ret;
if (n > 0)
seq_puts(m, "\n\n");
seq_puts(m, "# event histogram\n#\n# trigger info: ");
data->ops->print(m, data->ops, data);
seq_puts(m, "#\n\n");
hist_data = data->private_data;
seq_printf(m, "hist_data: %p\n\n", hist_data);
seq_printf(m, " n_vals: %u\n", hist_data->n_vals);
seq_printf(m, " n_keys: %u\n", hist_data->n_keys);
seq_printf(m, " n_fields: %u\n", hist_data->n_fields);
seq_puts(m, "\n val fields:\n\n");
seq_puts(m, " hist_data->fields[0]:\n");
ret = hist_field_debug_show(m, hist_data->fields[0],
HIST_FIELD_FL_HITCOUNT);
if (ret)
return;
for (i = 1; i < hist_data->n_vals; i++) {
seq_printf(m, "\n hist_data->fields[%d]:\n", i);
ret = hist_field_debug_show(m, hist_data->fields[i], 0);
if (ret)
return;
}
seq_puts(m, "\n key fields:\n");
for (i = hist_data->n_vals; i < hist_data->n_fields; i++) {
seq_printf(m, "\n hist_data->fields[%d]:\n", i);
ret = hist_field_debug_show(m, hist_data->fields[i],
HIST_FIELD_FL_KEY);
if (ret)
return;
}
if (hist_data->n_var_refs)
seq_puts(m, "\n variable reference fields:\n");
for (i = 0; i < hist_data->n_var_refs; i++) {
seq_printf(m, "\n hist_data->var_refs[%d]:\n", i);
ret = hist_field_debug_show(m, hist_data->var_refs[i],
HIST_FIELD_FL_VAR_REF);
if (ret)
return;
}
if (hist_data->n_field_vars)
seq_puts(m, "\n field variables:\n");
for (i = 0; i < hist_data->n_field_vars; i++) {
ret = field_var_debug_show(m, hist_data->field_vars[i], i, false);
if (ret)
return;
}
ret = hist_actions_debug_show(m, hist_data);
if (ret)
return;
}
static int hist_debug_show(struct seq_file *m, void *v)
{
struct event_trigger_data *data;
struct trace_event_file *event_file;
int n = 0, ret = 0;
mutex_lock(&event_mutex);
event_file = event_file_data(m->private);
if (unlikely(!event_file)) {
ret = -ENODEV;
goto out_unlock;
}
list_for_each_entry(data, &event_file->triggers, list) {
if (data->cmd_ops->trigger_type == ETT_EVENT_HIST)
hist_trigger_debug_show(m, data, n++);
}
out_unlock:
mutex_unlock(&event_mutex);
return ret;
}
static int event_hist_debug_open(struct inode *inode, struct file *file)
{
int ret;
ret = security_locked_down(LOCKDOWN_TRACEFS);
if (ret)
return ret;
return single_open(file, hist_debug_show, file);
}
const struct file_operations event_hist_debug_fops = {
.open = event_hist_debug_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
#endif
static void hist_field_print(struct seq_file *m, struct hist_field *hist_field) static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{ {
const char *field_name = hist_field_name(hist_field, 0); const char *field_name = hist_field_name(hist_field, 0);
...@@ -7393,37 +5904,3 @@ __init int register_trigger_hist_enable_disable_cmds(void) ...@@ -7393,37 +5904,3 @@ __init int register_trigger_hist_enable_disable_cmds(void)
return ret; return ret;
} }
static __init int trace_events_hist_init(void)
{
struct dentry *entry = NULL;
struct dentry *d_tracer;
int err = 0;
err = dyn_event_register(&synth_event_ops);
if (err) {
pr_warn("Could not register synth_event_ops\n");
return err;
}
d_tracer = tracing_init_dentry();
if (IS_ERR(d_tracer)) {
err = PTR_ERR(d_tracer);
goto err;
}
entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
NULL, &synth_events_fops);
if (!entry) {
err = -ENODEV;
goto err;
}
return err;
err:
pr_warn("Could not create tracefs 'synthetic_events' entry\n");
return err;
}
fs_initcall(trace_events_hist_init);
// SPDX-License-Identifier: GPL-2.0
/*
* trace_events_synth - synthetic trace events
*
* Copyright (C) 2015, 2020 Tom Zanussi <tom.zanussi@linux.intel.com>
*/
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/security.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/stacktrace.h>
#include <linux/rculist.h>
#include <linux/tracefs.h>
/* for gfp flag names */
#include <linux/trace_events.h>
#include <trace/events/mmflags.h>
#include "trace_synth.h"
static int create_synth_event(int argc, const char **argv);
static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
static int synth_event_release(struct dyn_event *ev);
static bool synth_event_is_busy(struct dyn_event *ev);
static bool synth_event_match(const char *system, const char *event,
int argc, const char **argv, struct dyn_event *ev);
static struct dyn_event_operations synth_event_ops = {
.create = create_synth_event,
.show = synth_event_show,
.is_busy = synth_event_is_busy,
.free = synth_event_release,
.match = synth_event_match,
};
static bool is_synth_event(struct dyn_event *ev)
{
return ev->ops == &synth_event_ops;
}
static struct synth_event *to_synth_event(struct dyn_event *ev)
{
return container_of(ev, struct synth_event, devent);
}
static bool synth_event_is_busy(struct dyn_event *ev)
{
struct synth_event *event = to_synth_event(ev);
return event->ref != 0;
}
static bool synth_event_match(const char *system, const char *event,
int argc, const char **argv, struct dyn_event *ev)
{
struct synth_event *sev = to_synth_event(ev);
return strcmp(sev->name, event) == 0 &&
(!system || strcmp(system, SYNTH_SYSTEM) == 0);
}
struct synth_trace_event {
struct trace_entry ent;
u64 fields[];
};
static int synth_event_define_fields(struct trace_event_call *call)
{
struct synth_trace_event trace;
int offset = offsetof(typeof(trace), fields);
struct synth_event *event = call->data;
unsigned int i, size, n_u64;
char *name, *type;
bool is_signed;
int ret = 0;
for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
size = event->fields[i]->size;
is_signed = event->fields[i]->is_signed;
type = event->fields[i]->type;
name = event->fields[i]->name;
ret = trace_define_field(call, type, name, offset, size,
is_signed, FILTER_OTHER);
if (ret)
break;
event->fields[i]->offset = n_u64;
if (event->fields[i]->is_string) {
offset += STR_VAR_LEN_MAX;
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
offset += sizeof(u64);
n_u64++;
}
}
event->n_u64 = n_u64;
return ret;
}
static bool synth_field_signed(char *type)
{
if (str_has_prefix(type, "u"))
return false;
if (strcmp(type, "gfp_t") == 0)
return false;
return true;
}
static int synth_field_is_string(char *type)
{
if (strstr(type, "char[") != NULL)
return true;
return false;
}
static int synth_field_string_size(char *type)
{
char buf[4], *end, *start;
unsigned int len;
int size, err;
start = strstr(type, "char[");
if (start == NULL)
return -EINVAL;
start += sizeof("char[") - 1;
end = strchr(type, ']');
if (!end || end < start)
return -EINVAL;
len = end - start;
if (len > 3)
return -EINVAL;
strncpy(buf, start, len);
buf[len] = '\0';
err = kstrtouint(buf, 0, &size);
if (err)
return err;
if (size > STR_VAR_LEN_MAX)
return -EINVAL;
return size;
}
static int synth_field_size(char *type)
{
int size = 0;
if (strcmp(type, "s64") == 0)
size = sizeof(s64);
else if (strcmp(type, "u64") == 0)
size = sizeof(u64);
else if (strcmp(type, "s32") == 0)
size = sizeof(s32);
else if (strcmp(type, "u32") == 0)
size = sizeof(u32);
else if (strcmp(type, "s16") == 0)
size = sizeof(s16);
else if (strcmp(type, "u16") == 0)
size = sizeof(u16);
else if (strcmp(type, "s8") == 0)
size = sizeof(s8);
else if (strcmp(type, "u8") == 0)
size = sizeof(u8);
else if (strcmp(type, "char") == 0)
size = sizeof(char);
else if (strcmp(type, "unsigned char") == 0)
size = sizeof(unsigned char);
else if (strcmp(type, "int") == 0)
size = sizeof(int);
else if (strcmp(type, "unsigned int") == 0)
size = sizeof(unsigned int);
else if (strcmp(type, "long") == 0)
size = sizeof(long);
else if (strcmp(type, "unsigned long") == 0)
size = sizeof(unsigned long);
else if (strcmp(type, "pid_t") == 0)
size = sizeof(pid_t);
else if (strcmp(type, "gfp_t") == 0)
size = sizeof(gfp_t);
else if (synth_field_is_string(type))
size = synth_field_string_size(type);
return size;
}
static const char *synth_field_fmt(char *type)
{
const char *fmt = "%llu";
if (strcmp(type, "s64") == 0)
fmt = "%lld";
else if (strcmp(type, "u64") == 0)
fmt = "%llu";
else if (strcmp(type, "s32") == 0)
fmt = "%d";
else if (strcmp(type, "u32") == 0)
fmt = "%u";
else if (strcmp(type, "s16") == 0)
fmt = "%d";
else if (strcmp(type, "u16") == 0)
fmt = "%u";
else if (strcmp(type, "s8") == 0)
fmt = "%d";
else if (strcmp(type, "u8") == 0)
fmt = "%u";
else if (strcmp(type, "char") == 0)
fmt = "%d";
else if (strcmp(type, "unsigned char") == 0)
fmt = "%u";
else if (strcmp(type, "int") == 0)
fmt = "%d";
else if (strcmp(type, "unsigned int") == 0)
fmt = "%u";
else if (strcmp(type, "long") == 0)
fmt = "%ld";
else if (strcmp(type, "unsigned long") == 0)
fmt = "%lu";
else if (strcmp(type, "pid_t") == 0)
fmt = "%d";
else if (strcmp(type, "gfp_t") == 0)
fmt = "%x";
else if (synth_field_is_string(type))
fmt = "%s";
return fmt;
}
static void print_synth_event_num_val(struct trace_seq *s,
char *print_fmt, char *name,
int size, u64 val, char *space)
{
switch (size) {
case 1:
trace_seq_printf(s, print_fmt, name, (u8)val, space);
break;
case 2:
trace_seq_printf(s, print_fmt, name, (u16)val, space);
break;
case 4:
trace_seq_printf(s, print_fmt, name, (u32)val, space);
break;
default:
trace_seq_printf(s, print_fmt, name, val, space);
break;
}
}
static enum print_line_t print_synth_event(struct trace_iterator *iter,
int flags,
struct trace_event *event)
{
struct trace_array *tr = iter->tr;
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
unsigned int i, n_u64;
char print_fmt[32];
const char *fmt;
entry = (struct synth_trace_event *)iter->ent;
se = container_of(event, struct synth_event, call.event);
trace_seq_printf(s, "%s: ", se->name);
for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
if (trace_seq_has_overflowed(s))
goto end;
fmt = synth_field_fmt(se->fields[i]->type);
/* parameter types */
if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
trace_seq_printf(s, "%s ", fmt);
snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);
/* parameter values */
if (se->fields[i]->is_string) {
trace_seq_printf(s, print_fmt, se->fields[i]->name,
(char *)&entry->fields[n_u64],
i == se->n_fields - 1 ? "" : " ");
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
char *space = (i == se->n_fields - 1 ? "" : " ");
print_synth_event_num_val(s, print_fmt,
se->fields[i]->name,
se->fields[i]->size,
entry->fields[n_u64],
space);
if (strcmp(se->fields[i]->type, "gfp_t") == 0) {
trace_seq_puts(s, " (");
trace_print_flags_seq(s, "|",
entry->fields[n_u64],
__flags);
trace_seq_putc(s, ')');
}
n_u64++;
}
}
end:
trace_seq_putc(s, '\n');
return trace_handle_return(s);
}
static struct trace_event_functions synth_event_funcs = {
.trace = print_synth_event
};
static notrace void trace_event_raw_event_synth(void *__data,
u64 *var_ref_vals,
unsigned int *var_ref_idx)
{
struct trace_event_file *trace_file = __data;
struct synth_trace_event *entry;
struct trace_event_buffer fbuffer;
struct trace_buffer *buffer;
struct synth_event *event;
unsigned int i, n_u64, val_idx;
int fields_size = 0;
event = trace_file->event_call->data;
if (trace_trigger_soft_disabled(trace_file))
return;
fields_size = event->n_u64 * sizeof(u64);
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
*/
buffer = trace_file->tr->array_buffer.buffer;
ring_buffer_nest_start(buffer);
entry = trace_event_buffer_reserve(&fbuffer, trace_file,
sizeof(*entry) + fields_size);
if (!entry)
goto out;
for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
val_idx = var_ref_idx[i];
if (event->fields[i]->is_string) {
char *str_val = (char *)(long)var_ref_vals[val_idx];
char *str_field = (char *)&entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = event->fields[i];
u64 val = var_ref_vals[val_idx];
switch (field->size) {
case 1:
*(u8 *)&entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&entry->fields[n_u64] = (u32)val;
break;
default:
entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
trace_event_buffer_commit(&fbuffer);
out:
ring_buffer_nest_end(buffer);
}
static void free_synth_event_print_fmt(struct trace_event_call *call)
{
if (call) {
kfree(call->print_fmt);
call->print_fmt = NULL;
}
}
static int __set_synth_event_print_fmt(struct synth_event *event,
char *buf, int len)
{
const char *fmt;
int pos = 0;
int i;
/* When len=0, we just calculate the needed length */
#define LEN_OR_ZERO (len ? len - pos : 0)
pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
for (i = 0; i < event->n_fields; i++) {
fmt = synth_field_fmt(event->fields[i]->type);
pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
event->fields[i]->name, fmt,
i == event->n_fields - 1 ? "" : ", ");
}
pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
for (i = 0; i < event->n_fields; i++) {
pos += snprintf(buf + pos, LEN_OR_ZERO,
", REC->%s", event->fields[i]->name);
}
#undef LEN_OR_ZERO
/* return the length of print_fmt */
return pos;
}
static int set_synth_event_print_fmt(struct trace_event_call *call)
{
struct synth_event *event = call->data;
char *print_fmt;
int len;
/* First: called with 0 length to calculate the needed length */
len = __set_synth_event_print_fmt(event, NULL, 0);
print_fmt = kmalloc(len + 1, GFP_KERNEL);
if (!print_fmt)
return -ENOMEM;
/* Second: actually write the @print_fmt */
__set_synth_event_print_fmt(event, print_fmt, len + 1);
call->print_fmt = print_fmt;
return 0;
}
static void free_synth_field(struct synth_field *field)
{
kfree(field->type);
kfree(field->name);
kfree(field);
}
static struct synth_field *parse_synth_field(int argc, const char **argv,
int *consumed)
{
struct synth_field *field;
const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
int len, ret = 0;
if (field_type[0] == ';')
field_type++;
if (!strcmp(field_type, "unsigned")) {
if (argc < 3)
return ERR_PTR(-EINVAL);
prefix = "unsigned ";
field_type = argv[1];
field_name = argv[2];
*consumed = 3;
} else {
field_name = argv[1];
*consumed = 2;
}
field = kzalloc(sizeof(*field), GFP_KERNEL);
if (!field)
return ERR_PTR(-ENOMEM);
len = strlen(field_name);
array = strchr(field_name, '[');
if (array)
len -= strlen(array);
else if (field_name[len - 1] == ';')
len--;
field->name = kmemdup_nul(field_name, len, GFP_KERNEL);
if (!field->name) {
ret = -ENOMEM;
goto free;
}
if (field_type[0] == ';')
field_type++;
len = strlen(field_type) + 1;
if (array)
len += strlen(array);
if (prefix)
len += strlen(prefix);
field->type = kzalloc(len, GFP_KERNEL);
if (!field->type) {
ret = -ENOMEM;
goto free;
}
if (prefix)
strcat(field->type, prefix);
strcat(field->type, field_type);
if (array) {
strcat(field->type, array);
if (field->type[len - 1] == ';')
field->type[len - 1] = '\0';
}
field->size = synth_field_size(field->type);
if (!field->size) {
ret = -EINVAL;
goto free;
}
if (synth_field_is_string(field->type))
field->is_string = true;
field->is_signed = synth_field_signed(field->type);
out:
return field;
free:
free_synth_field(field);
field = ERR_PTR(ret);
goto out;
}
static void free_synth_tracepoint(struct tracepoint *tp)
{
if (!tp)
return;
kfree(tp->name);
kfree(tp);
}
static struct tracepoint *alloc_synth_tracepoint(char *name)
{
struct tracepoint *tp;
tp = kzalloc(sizeof(*tp), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
tp->name = kstrdup(name, GFP_KERNEL);
if (!tp->name) {
kfree(tp);
return ERR_PTR(-ENOMEM);
}
return tp;
}
struct synth_event *find_synth_event(const char *name)
{
struct dyn_event *pos;
struct synth_event *event;
for_each_dyn_event(pos) {
if (!is_synth_event(pos))
continue;
event = to_synth_event(pos);
if (strcmp(event->name, name) == 0)
return event;
}
return NULL;
}
static struct trace_event_fields synth_event_fields_array[] = {
{ .type = TRACE_FUNCTION_TYPE,
.define_fields = synth_event_define_fields },
{}
};
static int register_synth_event(struct synth_event *event)
{
struct trace_event_call *call = &event->call;
int ret = 0;
event->call.class = &event->class;
event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
if (!event->class.system) {
ret = -ENOMEM;
goto out;
}
event->tp = alloc_synth_tracepoint(event->name);
if (IS_ERR(event->tp)) {
ret = PTR_ERR(event->tp);
event->tp = NULL;
goto out;
}
INIT_LIST_HEAD(&call->class->fields);
call->event.funcs = &synth_event_funcs;
call->class->fields_array = synth_event_fields_array;
ret = register_trace_event(&call->event);
if (!ret) {
ret = -ENODEV;
goto out;
}
call->flags = TRACE_EVENT_FL_TRACEPOINT;
call->class->reg = trace_event_reg;
call->class->probe = trace_event_raw_event_synth;
call->data = event;
call->tp = event->tp;
ret = trace_add_event_call(call);
if (ret) {
pr_warn("Failed to register synthetic event: %s\n",
trace_event_name(call));
goto err;
}
ret = set_synth_event_print_fmt(call);
if (ret < 0) {
trace_remove_event_call(call);
goto err;
}
out:
return ret;
err:
unregister_trace_event(&call->event);
goto out;
}
static int unregister_synth_event(struct synth_event *event)
{
struct trace_event_call *call = &event->call;
int ret;
ret = trace_remove_event_call(call);
return ret;
}
static void free_synth_event(struct synth_event *event)
{
unsigned int i;
if (!event)
return;
for (i = 0; i < event->n_fields; i++)
free_synth_field(event->fields[i]);
kfree(event->fields);
kfree(event->name);
kfree(event->class.system);
free_synth_tracepoint(event->tp);
free_synth_event_print_fmt(&event->call);
kfree(event);
}
static struct synth_event *alloc_synth_event(const char *name, int n_fields,
struct synth_field **fields)
{
struct synth_event *event;
unsigned int i;
event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event) {
event = ERR_PTR(-ENOMEM);
goto out;
}
event->name = kstrdup(name, GFP_KERNEL);
if (!event->name) {
kfree(event);
event = ERR_PTR(-ENOMEM);
goto out;
}
event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
if (!event->fields) {
free_synth_event(event);
event = ERR_PTR(-ENOMEM);
goto out;
}
dyn_event_init(&event->devent, &synth_event_ops);
for (i = 0; i < n_fields; i++)
event->fields[i] = fields[i];
event->n_fields = n_fields;
out:
return event;
}
static int synth_event_check_arg_fn(void *data)
{
struct dynevent_arg_pair *arg_pair = data;
int size;
size = synth_field_size((char *)arg_pair->lhs);
return size ? 0 : -EINVAL;
}
/**
* synth_event_add_field - Add a new field to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @type: The type of the new field to add
* @name: The name of the new field to add
*
* Add a new field to a synthetic event cmd object. Field ordering is in
* the same order the fields are added.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_field(struct dynevent_cmd *cmd, const char *type,
const char *name)
{
struct dynevent_arg_pair arg_pair;
int ret;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (!type || !name)
return -EINVAL;
dynevent_arg_pair_init(&arg_pair, 0, ';');
arg_pair.lhs = type;
arg_pair.rhs = name;
ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn);
if (ret)
return ret;
if (++cmd->n_fields > SYNTH_FIELDS_MAX)
ret = -EINVAL;
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_field);
/**
* synth_event_add_field_str - Add a new field to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @type_name: The type and name of the new field to add, as a single string
*
* Add a new field to a synthetic event cmd object, as a single
* string. The @type_name string is expected to be of the form 'type
* name', which will be appended by ';'. No sanity checking is done -
* what's passed in is assumed to already be well-formed. Field
* ordering is in the same order the fields are added.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name)
{
struct dynevent_arg arg;
int ret;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (!type_name)
return -EINVAL;
dynevent_arg_init(&arg, ';');
arg.str = type_name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
if (++cmd->n_fields > SYNTH_FIELDS_MAX)
ret = -EINVAL;
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_field_str);
/**
* synth_event_add_fields - Add multiple fields to a synthetic event cmd
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
*
* Add a new set of fields to a synthetic event cmd object. The event
* fields that will be defined for the event should be passed in as an
* array of struct synth_field_desc, and the number of elements in the
* array passed in as n_fields. Field ordering will retain the
* ordering given in the fields array.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_add_fields(struct dynevent_cmd *cmd,
struct synth_field_desc *fields,
unsigned int n_fields)
{
unsigned int i;
int ret = 0;
for (i = 0; i < n_fields; i++) {
if (fields[i].type == NULL || fields[i].name == NULL) {
ret = -EINVAL;
break;
}
ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
if (ret)
break;
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_add_fields);
/**
* __synth_event_gen_cmd_start - Start a synthetic event command from arg list
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @name: The name of the synthetic event
* @mod: The module creating the event, NULL if not created from a module
* @args: Variable number of arg (pairs), one pair for each field
*
* NOTE: Users normally won't want to call this function directly, but
* rather use the synth_event_gen_cmd_start() wrapper, which
* automatically adds a NULL to the end of the arg list. If this
* function is used directly, make sure the last arg in the variable
* arg list is NULL.
*
* Generate a synthetic event command to be executed by
* synth_event_gen_cmd_end(). This function can be used to generate
* the complete command or only the first part of it; in the latter
* case, synth_event_add_field(), synth_event_add_field_str(), or
* synth_event_add_fields() can be used to add more fields following
* this.
*
* There should be an even number variable args, each pair consisting
* of a type followed by a field name.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name,
struct module *mod, ...)
{
struct dynevent_arg arg;
va_list args;
int ret;
cmd->event_name = name;
cmd->private_data = mod;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
dynevent_arg_init(&arg, 0);
arg.str = name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
va_start(args, mod);
for (;;) {
const char *type, *name;
type = va_arg(args, const char *);
if (!type)
break;
name = va_arg(args, const char *);
if (!name)
break;
if (++cmd->n_fields > SYNTH_FIELDS_MAX) {
ret = -EINVAL;
break;
}
ret = synth_event_add_field(cmd, type, name);
if (ret)
break;
}
va_end(args);
return ret;
}
EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start);
/**
* synth_event_gen_cmd_array_start - Start synthetic event command from an array
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @name: The name of the synthetic event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
*
* Generate a synthetic event command to be executed by
* synth_event_gen_cmd_end(). This function can be used to generate
* the complete command or only the first part of it; in the latter
* case, synth_event_add_field(), synth_event_add_field_str(), or
* synth_event_add_fields() can be used to add more fields following
* this.
*
* The event fields that will be defined for the event should be
* passed in as an array of struct synth_field_desc, and the number of
* elements in the array passed in as n_fields. Field ordering will
* retain the ordering given in the fields array.
*
* See synth_field_size() for available types. If field_name contains
* [n] the field is considered to be an array.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name,
struct module *mod,
struct synth_field_desc *fields,
unsigned int n_fields)
{
struct dynevent_arg arg;
unsigned int i;
int ret = 0;
cmd->event_name = name;
cmd->private_data = mod;
if (cmd->type != DYNEVENT_TYPE_SYNTH)
return -EINVAL;
if (n_fields > SYNTH_FIELDS_MAX)
return -EINVAL;
dynevent_arg_init(&arg, 0);
arg.str = name;
ret = dynevent_arg_add(cmd, &arg, NULL);
if (ret)
return ret;
for (i = 0; i < n_fields; i++) {
if (fields[i].type == NULL || fields[i].name == NULL)
return -EINVAL;
ret = synth_event_add_field(cmd, fields[i].type, fields[i].name);
if (ret)
break;
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start);
static int __create_synth_event(int argc, const char *name, const char **argv)
{
struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
struct synth_event *event = NULL;
int i, consumed = 0, n_fields = 0, ret = 0;
/*
* Argument syntax:
* - Add synthetic event: <event_name> field[;field] ...
* - Remove synthetic event: !<event_name> field[;field] ...
* where 'field' = type field_name
*/
if (name[0] == '\0' || argc < 1)
return -EINVAL;
mutex_lock(&event_mutex);
event = find_synth_event(name);
if (event) {
ret = -EEXIST;
goto out;
}
for (i = 0; i < argc - 1; i++) {
if (strcmp(argv[i], ";") == 0)
continue;
if (n_fields == SYNTH_FIELDS_MAX) {
ret = -EINVAL;
goto err;
}
field = parse_synth_field(argc - i, &argv[i], &consumed);
if (IS_ERR(field)) {
ret = PTR_ERR(field);
goto err;
}
fields[n_fields++] = field;
i += consumed - 1;
}
if (i < argc && strcmp(argv[i], ";") != 0) {
ret = -EINVAL;
goto err;
}
event = alloc_synth_event(name, n_fields, fields);
if (IS_ERR(event)) {
ret = PTR_ERR(event);
event = NULL;
goto err;
}
ret = register_synth_event(event);
if (!ret)
dyn_event_add(&event->devent);
else
free_synth_event(event);
out:
mutex_unlock(&event_mutex);
return ret;
err:
for (i = 0; i < n_fields; i++)
free_synth_field(fields[i]);
goto out;
}
/**
* synth_event_create - Create a new synthetic event
* @name: The name of the new sythetic event
* @fields: An array of type/name field descriptions
* @n_fields: The number of field descriptions contained in the fields array
* @mod: The module creating the event, NULL if not created from a module
*
* Create a new synthetic event with the given name under the
* trace/events/synthetic/ directory. The event fields that will be
* defined for the event should be passed in as an array of struct
* synth_field_desc, and the number elements in the array passed in as
* n_fields. Field ordering will retain the ordering given in the
* fields array.
*
* If the new synthetic event is being created from a module, the mod
* param must be non-NULL. This will ensure that the trace buffer
* won't contain unreadable events.
*
* The new synth event should be deleted using synth_event_delete()
* function. The new synthetic event can be generated from modules or
* other kernel code using trace_synth_event() and related functions.
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_create(const char *name, struct synth_field_desc *fields,
unsigned int n_fields, struct module *mod)
{
struct dynevent_cmd cmd;
char *buf;
int ret;
buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
if (!buf)
return -ENOMEM;
synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
ret = synth_event_gen_cmd_array_start(&cmd, name, mod,
fields, n_fields);
if (ret)
goto out;
ret = synth_event_gen_cmd_end(&cmd);
out:
kfree(buf);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_create);
static int destroy_synth_event(struct synth_event *se)
{
int ret;
if (se->ref)
ret = -EBUSY;
else {
ret = unregister_synth_event(se);
if (!ret) {
dyn_event_remove(&se->devent);
free_synth_event(se);
}
}
return ret;
}
/**
* synth_event_delete - Delete a synthetic event
* @event_name: The name of the new sythetic event
*
* Delete a synthetic event that was created with synth_event_create().
*
* Return: 0 if successful, error otherwise.
*/
int synth_event_delete(const char *event_name)
{
struct synth_event *se = NULL;
struct module *mod = NULL;
int ret = -ENOENT;
mutex_lock(&event_mutex);
se = find_synth_event(event_name);
if (se) {
mod = se->mod;
ret = destroy_synth_event(se);
}
mutex_unlock(&event_mutex);
if (mod) {
mutex_lock(&trace_types_lock);
/*
* It is safest to reset the ring buffer if the module
* being unloaded registered any events that were
* used. The only worry is if a new module gets
* loaded, and takes on the same id as the events of
* this module. When printing out the buffer, traced
* events left over from this module may be passed to
* the new module events and unexpected results may
* occur.
*/
tracing_reset_all_online_cpus();
mutex_unlock(&trace_types_lock);
}
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_delete);
static int create_or_delete_synth_event(int argc, char **argv)
{
const char *name = argv[0];
int ret;
/* trace_run_command() ensures argc != 0 */
if (name[0] == '!') {
ret = synth_event_delete(name + 1);
return ret;
}
ret = __create_synth_event(argc - 1, name, (const char **)argv + 1);
return ret == -ECANCELED ? -EINVAL : ret;
}
static int synth_event_run_command(struct dynevent_cmd *cmd)
{
struct synth_event *se;
int ret;
ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event);
if (ret)
return ret;
se = find_synth_event(cmd->event_name);
if (WARN_ON(!se))
return -ENOENT;
se->mod = cmd->private_data;
return ret;
}
/**
* synth_event_cmd_init - Initialize a synthetic event command object
* @cmd: A pointer to the dynevent_cmd struct representing the new event
* @buf: A pointer to the buffer used to build the command
* @maxlen: The length of the buffer passed in @buf
*
* Initialize a synthetic event command object. Use this before
* calling any of the other dyenvent_cmd functions.
*/
void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen)
{
dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH,
synth_event_run_command);
}
EXPORT_SYMBOL_GPL(synth_event_cmd_init);
static inline int
__synth_event_trace_start(struct trace_event_file *file,
struct synth_event_trace_state *trace_state)
{
int entry_size, fields_size = 0;
int ret = 0;
memset(trace_state, '\0', sizeof(*trace_state));
/*
* Normal event tracing doesn't get called at all unless the
* ENABLED bit is set (which attaches the probe thus allowing
* this code to be called, etc). Because this is called
* directly by the user, we don't have that but we still need
* to honor not logging when disabled. For the the iterated
* trace case, we save the enabed state upon start and just
* ignore the following data calls.
*/
if (!(file->flags & EVENT_FILE_FL_ENABLED) ||
trace_trigger_soft_disabled(file)) {
trace_state->disabled = true;
ret = -ENOENT;
goto out;
}
trace_state->event = file->event_call->data;
fields_size = trace_state->event->n_u64 * sizeof(u64);
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
*/
trace_state->buffer = file->tr->array_buffer.buffer;
ring_buffer_nest_start(trace_state->buffer);
entry_size = sizeof(*trace_state->entry) + fields_size;
trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer,
file,
entry_size);
if (!trace_state->entry) {
ring_buffer_nest_end(trace_state->buffer);
ret = -EINVAL;
}
out:
return ret;
}
static inline void
__synth_event_trace_end(struct synth_event_trace_state *trace_state)
{
trace_event_buffer_commit(&trace_state->fbuffer);
ring_buffer_nest_end(trace_state->buffer);
}
/**
* synth_event_trace - Trace a synthetic event
* @file: The trace_event_file representing the synthetic event
* @n_vals: The number of values in vals
* @args: Variable number of args containing the event values
*
* Trace a synthetic event using the values passed in the variable
* argument list.
*
* The argument list should be a list 'n_vals' u64 values. The number
* of vals must match the number of field in the synthetic event, and
* must be in the same order as the synthetic event fields.
*
* All vals should be cast to u64, and string vals are just pointers
* to strings, cast to u64. Strings will be copied into space
* reserved in the event for the string, using these pointers.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
{
struct synth_event_trace_state state;
unsigned int i, n_u64;
va_list args;
int ret;
ret = __synth_event_trace_start(file, &state);
if (ret) {
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
if (n_vals != state.event->n_fields) {
ret = -EINVAL;
goto out;
}
va_start(args, n_vals);
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
u64 val;
val = va_arg(args, u64);
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)val;
char *str_field = (char *)&state.entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = state.event->fields[i];
switch (field->size) {
case 1:
*(u8 *)&state.entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&state.entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&state.entry->fields[n_u64] = (u32)val;
break;
default:
state.entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
va_end(args);
out:
__synth_event_trace_end(&state);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace);
/**
* synth_event_trace_array - Trace a synthetic event from an array
* @file: The trace_event_file representing the synthetic event
* @vals: Array of values
* @n_vals: The number of values in vals
*
* Trace a synthetic event using the values passed in as 'vals'.
*
* The 'vals' array is just an array of 'n_vals' u64. The number of
* vals must match the number of field in the synthetic event, and
* must be in the same order as the synthetic event fields.
*
* All vals should be cast to u64, and string vals are just pointers
* to strings, cast to u64. Strings will be copied into space
* reserved in the event for the string, using these pointers.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
unsigned int n_vals)
{
struct synth_event_trace_state state;
unsigned int i, n_u64;
int ret;
ret = __synth_event_trace_start(file, &state);
if (ret) {
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
if (n_vals != state.event->n_fields) {
ret = -EINVAL;
goto out;
}
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)vals[i];
char *str_field = (char *)&state.entry->fields[n_u64];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
struct synth_field *field = state.event->fields[i];
u64 val = vals[i];
switch (field->size) {
case 1:
*(u8 *)&state.entry->fields[n_u64] = (u8)val;
break;
case 2:
*(u16 *)&state.entry->fields[n_u64] = (u16)val;
break;
case 4:
*(u32 *)&state.entry->fields[n_u64] = (u32)val;
break;
default:
state.entry->fields[n_u64] = val;
break;
}
n_u64++;
}
}
out:
__synth_event_trace_end(&state);
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace_array);
/**
* synth_event_trace_start - Start piecewise synthetic event trace
* @file: The trace_event_file representing the synthetic event
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Start the trace of a synthetic event field-by-field rather than all
* at once.
*
* This function 'opens' an event trace, which means space is reserved
* for the event in the trace buffer, after which the event's
* individual field values can be set through either
* synth_event_add_next_val() or synth_event_add_val().
*
* A pointer to a trace_state object is passed in, which will keep
* track of the current event trace state until the event trace is
* closed (and the event finally traced) using
* synth_event_trace_end().
*
* Note that synth_event_trace_end() must be called after all values
* have been added for each event trace, regardless of whether adding
* all field values succeeded or not.
*
* Note also that for a given event trace, all fields must be added
* using either synth_event_add_next_val() or synth_event_add_val()
* but not both together or interleaved.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_start(struct trace_event_file *file,
struct synth_event_trace_state *trace_state)
{
int ret;
if (!trace_state)
return -EINVAL;
ret = __synth_event_trace_start(file, trace_state);
if (ret == -ENOENT)
ret = 0; /* just disabled, not really an error */
return ret;
}
EXPORT_SYMBOL_GPL(synth_event_trace_start);
static int __synth_event_add_val(const char *field_name, u64 val,
struct synth_event_trace_state *trace_state)
{
struct synth_field *field = NULL;
struct synth_trace_event *entry;
struct synth_event *event;
int i, ret = 0;
if (!trace_state) {
ret = -EINVAL;
goto out;
}
/* can't mix add_next_synth_val() with add_synth_val() */
if (field_name) {
if (trace_state->add_next) {
ret = -EINVAL;
goto out;
}
trace_state->add_name = true;
} else {
if (trace_state->add_name) {
ret = -EINVAL;
goto out;
}
trace_state->add_next = true;
}
if (trace_state->disabled)
goto out;
event = trace_state->event;
if (trace_state->add_name) {
for (i = 0; i < event->n_fields; i++) {
field = event->fields[i];
if (strcmp(field->name, field_name) == 0)
break;
}
if (!field) {
ret = -EINVAL;
goto out;
}
} else {
if (trace_state->cur_field >= event->n_fields) {
ret = -EINVAL;
goto out;
}
field = event->fields[trace_state->cur_field++];
}
entry = trace_state->entry;
if (field->is_string) {
char *str_val = (char *)(long)val;
char *str_field;
if (!str_val) {
ret = -EINVAL;
goto out;
}
str_field = (char *)&entry->fields[field->offset];
strscpy(str_field, str_val, STR_VAR_LEN_MAX);
} else {
switch (field->size) {
case 1:
*(u8 *)&trace_state->entry->fields[field->offset] = (u8)val;
break;
case 2:
*(u16 *)&trace_state->entry->fields[field->offset] = (u16)val;
break;
case 4:
*(u32 *)&trace_state->entry->fields[field->offset] = (u32)val;
break;
default:
trace_state->entry->fields[field->offset] = val;
break;
}
}
out:
return ret;
}
/**
* synth_event_add_next_val - Add the next field's value to an open synth trace
* @val: The value to set the next field to
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Set the value of the next field in an event that's been opened by
* synth_event_trace_start().
*
* The val param should be the value cast to u64. If the value points
* to a string, the val param should be a char * cast to u64.
*
* This function assumes all the fields in an event are to be set one
* after another - successive calls to this function are made, one for
* each field, in the order of the fields in the event, until all
* fields have been set. If you'd rather set each field individually
* without regard to ordering, synth_event_add_val() can be used
* instead.
*
* Note however that synth_event_add_next_val() and
* synth_event_add_val() can't be intermixed for a given event trace -
* one or the other but not both can be used at the same time.
*
* Note also that synth_event_trace_end() must be called after all
* values have been added for each event trace, regardless of whether
* adding all field values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_add_next_val(u64 val,
struct synth_event_trace_state *trace_state)
{
return __synth_event_add_val(NULL, val, trace_state);
}
EXPORT_SYMBOL_GPL(synth_event_add_next_val);
/**
* synth_event_add_val - Add a named field's value to an open synth trace
* @field_name: The name of the synthetic event field value to set
* @val: The value to set the next field to
* @trace_state: A pointer to object tracking the piecewise trace state
*
* Set the value of the named field in an event that's been opened by
* synth_event_trace_start().
*
* The val param should be the value cast to u64. If the value points
* to a string, the val param should be a char * cast to u64.
*
* This function looks up the field name, and if found, sets the field
* to the specified value. This lookup makes this function more
* expensive than synth_event_add_next_val(), so use that or the
* none-piecewise synth_event_trace() instead if efficiency is more
* important.
*
* Note however that synth_event_add_next_val() and
* synth_event_add_val() can't be intermixed for a given event trace -
* one or the other but not both can be used at the same time.
*
* Note also that synth_event_trace_end() must be called after all
* values have been added for each event trace, regardless of whether
* adding all field values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_add_val(const char *field_name, u64 val,
struct synth_event_trace_state *trace_state)
{
return __synth_event_add_val(field_name, val, trace_state);
}
EXPORT_SYMBOL_GPL(synth_event_add_val);
/**
* synth_event_trace_end - End piecewise synthetic event trace
* @trace_state: A pointer to object tracking the piecewise trace state
*
* End the trace of a synthetic event opened by
* synth_event_trace__start().
*
* This function 'closes' an event trace, which basically means that
* it commits the reserved event and cleans up other loose ends.
*
* A pointer to a trace_state object is passed in, which will keep
* track of the current event trace state opened with
* synth_event_trace_start().
*
* Note that this function must be called after all values have been
* added for each event trace, regardless of whether adding all field
* values succeeded or not.
*
* Return: 0 on success, err otherwise.
*/
int synth_event_trace_end(struct synth_event_trace_state *trace_state)
{
if (!trace_state)
return -EINVAL;
__synth_event_trace_end(trace_state);
return 0;
}
EXPORT_SYMBOL_GPL(synth_event_trace_end);
static int create_synth_event(int argc, const char **argv)
{
const char *name = argv[0];
int len;
if (name[0] != 's' || name[1] != ':')
return -ECANCELED;
name += 2;
/* This interface accepts group name prefix */
if (strchr(name, '/')) {
len = str_has_prefix(name, SYNTH_SYSTEM "/");
if (len == 0)
return -EINVAL;
name += len;
}
return __create_synth_event(argc - 1, name, argv + 1);
}
static int synth_event_release(struct dyn_event *ev)
{
struct synth_event *event = to_synth_event(ev);
int ret;
if (event->ref)
return -EBUSY;
ret = unregister_synth_event(event);
if (ret)
return ret;
dyn_event_remove(ev);
free_synth_event(event);
return 0;
}
static int __synth_event_show(struct seq_file *m, struct synth_event *event)
{
struct synth_field *field;
unsigned int i;
seq_printf(m, "%s\t", event->name);
for (i = 0; i < event->n_fields; i++) {
field = event->fields[i];
/* parameter values */
seq_printf(m, "%s %s%s", field->type, field->name,
i == event->n_fields - 1 ? "" : "; ");
}
seq_putc(m, '\n');
return 0;
}
static int synth_event_show(struct seq_file *m, struct dyn_event *ev)
{
struct synth_event *event = to_synth_event(ev);
seq_printf(m, "s:%s/", event->class.system);
return __synth_event_show(m, event);
}
static int synth_events_seq_show(struct seq_file *m, void *v)
{
struct dyn_event *ev = v;
if (!is_synth_event(ev))
return 0;
return __synth_event_show(m, to_synth_event(ev));
}
static const struct seq_operations synth_events_seq_op = {
.start = dyn_event_seq_start,
.next = dyn_event_seq_next,
.stop = dyn_event_seq_stop,
.show = synth_events_seq_show,
};
static int synth_events_open(struct inode *inode, struct file *file)
{
int ret;
ret = security_locked_down(LOCKDOWN_TRACEFS);
if (ret)
return ret;
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
ret = dyn_events_release_all(&synth_event_ops);
if (ret < 0)
return ret;
}
return seq_open(file, &synth_events_seq_op);
}
static ssize_t synth_events_write(struct file *file,
const char __user *buffer,
size_t count, loff_t *ppos)
{
return trace_parse_run_command(file, buffer, count, ppos,
create_or_delete_synth_event);
}
static const struct file_operations synth_events_fops = {
.open = synth_events_open,
.write = synth_events_write,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
};
static __init int trace_events_synth_init(void)
{
struct dentry *entry = NULL;
struct dentry *d_tracer;
int err = 0;
err = dyn_event_register(&synth_event_ops);
if (err) {
pr_warn("Could not register synth_event_ops\n");
return err;
}
d_tracer = tracing_init_dentry();
if (IS_ERR(d_tracer)) {
err = PTR_ERR(d_tracer);
goto err;
}
entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
NULL, &synth_events_fops);
if (!entry) {
err = -ENODEV;
goto err;
}
return err;
err:
pr_warn("Could not create tracefs 'synthetic_events' entry\n");
return err;
}
fs_initcall(trace_events_synth_init);
...@@ -1006,7 +1006,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event, ...@@ -1006,7 +1006,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event,
INIT_LIST_HEAD(&tp->event->class.fields); INIT_LIST_HEAD(&tp->event->class.fields);
INIT_LIST_HEAD(&tp->event->probes); INIT_LIST_HEAD(&tp->event->probes);
INIT_LIST_HEAD(&tp->list); INIT_LIST_HEAD(&tp->list);
list_add(&tp->event->probes, &tp->list); list_add(&tp->list, &tp->event->probes);
call = trace_probe_event_call(tp); call = trace_probe_event_call(tp);
call->class = &tp->event->class; call->class = &tp->event->class;
......
// SPDX-License-Identifier: GPL-2.0
#ifndef __TRACE_SYNTH_H
#define __TRACE_SYNTH_H
#include "trace_dynevent.h"
#define SYNTH_SYSTEM "synthetic"
#define SYNTH_FIELDS_MAX 32
#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
struct synth_field {
char *type;
char *name;
size_t size;
unsigned int offset;
bool is_signed;
bool is_string;
};
struct synth_event {
struct dyn_event devent;
int ref;
char *name;
struct synth_field **fields;
unsigned int n_fields;
unsigned int n_u64;
struct trace_event_class class;
struct trace_event_call call;
struct tracepoint *tp;
struct module *mod;
};
extern struct synth_event *find_synth_event(const char *name);
#endif /* __TRACE_SYNTH_H */
...@@ -47,6 +47,7 @@ ...@@ -47,6 +47,7 @@
#include <linux/bug.h> #include <linux/bug.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/rculist.h> #include <linux/rculist.h>
#include <linux/ftrace.h>
extern struct bug_entry __start___bug_table[], __stop___bug_table[]; extern struct bug_entry __start___bug_table[], __stop___bug_table[];
...@@ -153,6 +154,8 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) ...@@ -153,6 +154,8 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
if (!bug) if (!bug)
return BUG_TRAP_TYPE_NONE; return BUG_TRAP_TYPE_NONE;
disable_trace_on_warning();
file = NULL; file = NULL;
line = 0; line = 0;
warning = 0; warning = 0;
......
...@@ -124,9 +124,16 @@ for i in samples/good-* ; do ...@@ -124,9 +124,16 @@ for i in samples/good-* ; do
xpass $BOOTCONF -a $i $INITRD xpass $BOOTCONF -a $i $INITRD
done done
echo
echo "=== Summary ==="
echo "# of Passed: $(expr $NO - $NG - 1)"
echo "# of Failed: $NG"
echo echo
if [ $NG -eq 0 ]; then if [ $NG -eq 0 ]; then
echo "All tests passed" echo "All tests passed"
else else
echo "$NG tests failed" echo "$NG tests failed"
exit 1
fi fi
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test field variable support" echo "Test field variable support"
echo 'wakeup_latency u64 lat; pid_t pid; int prio; char comm[16]' > synthetic_events echo 'wakeup_latency u64 lat; pid_t pid; int prio; char comm[16]' > synthetic_events
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test create synthetic event" echo "Test create synthetic event"
echo 'waking_latency u64 lat pid_t pid' > synthetic_events echo 'waking_latency u64 lat pid_t pid' > synthetic_events
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test multiple actions on hist trigger" echo "Test multiple actions on hist trigger"
echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events
TRIGGER1=events/sched/sched_wakeup/trigger TRIGGER1=events/sched/sched_wakeup/trigger
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test create synthetic event" echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test create synthetic event" echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
echo "Test create synthetic event" echo "Test create synthetic event"
echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events
......
...@@ -12,6 +12,11 @@ if [ ! -f set_event ]; then ...@@ -12,6 +12,11 @@ if [ ! -f set_event ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
if [ ! -f snapshot ]; then if [ ! -f snapshot ]; then
echo "snapshot is not supported" echo "snapshot is not supported"
exit_unsupported exit_unsupported
......
...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then ...@@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then
exit_unsupported exit_unsupported
fi fi
if [ ! -f events/sched/sched_process_fork/hist ]; then
echo "hist trigger is not supported"
exit_unsupported
fi
grep -q "trace(<synthetic_event>" README || exit_unsupported # version issue grep -q "trace(<synthetic_event>" README || exit_unsupported # version issue
echo "Test create synthetic event" echo "Test create synthetic event"
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment