Commit bc0f51d3 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
 "A few more minor updates:

   - Show the tgid mappings for user space trace tools to use

   - Fix and optimize the comm and tgid cache recording

   - Sanitize derived kprobe names

   - Ftrace selftest updates

   - trace file header fix

   - Update of Documentation/trace/ftrace.txt

   - Compiler warning fixes

   - Fix possible uninitialized variable"

* tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix uninitialized variable in match_records()
  ftrace: Remove an unneeded NULL check
  ftrace: Hide cached module code for !CONFIG_MODULES
  tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
  tracing: Update Documentation/trace/ftrace.txt
  tracing: Fixup trace file header alignment
  selftests/ftrace: Add a testcase for kprobe event naming
  selftests/ftrace: Add a test to probe module functions
  selftests/ftrace: Update multiple kprobes test for powerpc
  trace/kprobes: Sanitize derived event names
  tracing: Attempt to record other information even if some fail
  tracing: Treat recording tgid for idle task as a success
  tracing: Treat recording comm for idle task as a success
  tracing: Add saved_tgids file to show cached pid to tgid mappings
parents ad51271a 2e028c4f
......@@ -5,10 +5,11 @@ Copyright 2008 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
John Kacur, and David Teigland.
Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
John Kacur, and David Teigland.
Written for: 2.6.28-rc2
Updated for: 3.10
Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt
Introduction
------------
......@@ -26,9 +27,11 @@ a task is woken to the task is actually scheduled in.
One of the most common uses of ftrace is the event tracing.
Through out the kernel is hundreds of static event points that
can be enabled via the debugfs file system to see what is
can be enabled via the tracefs file system to see what is
going on in certain parts of the kernel.
See events.txt for more information.
Implementation Details
----------------------
......@@ -39,34 +42,47 @@ See ftrace-design.txt for details for arch porters and such.
The File System
---------------
Ftrace uses the debugfs file system to hold the control files as
Ftrace uses the tracefs file system to hold the control files as
well as the files to display output.
When debugfs is configured into the kernel (which selecting any ftrace
option will do) the directory /sys/kernel/debug will be created. To mount
When tracefs is configured into the kernel (which selecting any ftrace
option will do) the directory /sys/kernel/tracing will be created. To mount
this directory, you can add to your /etc/fstab file:
debugfs /sys/kernel/debug debugfs defaults 0 0
tracefs /sys/kernel/tracing tracefs defaults 0 0
Or you can mount it at run time with:
mount -t debugfs nodev /sys/kernel/debug
mount -t tracefs nodev /sys/kernel/tracing
For quicker access to that directory you may want to make a soft link to
it:
ln -s /sys/kernel/debug /debug
ln -s /sys/kernel/tracing /tracing
*** NOTICE ***
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
Any selected ftrace option will also create a directory called tracing
within the debugfs. The rest of the document will assume that you are in
the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate
on the files within that directory and not distract from the content with
the extended "/sys/kernel/debug/tracing" path name.
All files located in the tracefs file system will be located in that
debugfs file system directory as well.
*** NOTICE ***
Any selected ftrace option will also create the tracefs file system.
The rest of the document will assume that you are in the ftrace directory
(cd /sys/kernel/tracing) and will only concentrate on the files within that
directory and not distract from the content with the extended
"/sys/kernel/tracing" path name.
That's it! (assuming that you have ftrace configured into your kernel)
After mounting debugfs, you can see a directory called
"tracing". This directory contains the control and output files
After mounting tracefs you will have access to the control and output files
of ftrace. Here is a list of some of the key files:
......@@ -92,10 +108,20 @@ of ftrace. Here is a list of some of the key files:
writing to the ring buffer, the tracing overhead may
still be occurring.
The kernel function tracing_off() can be used within the
kernel to disable writing to the ring buffer, which will
set this file to "0". User space can re-enable tracing by
echoing "1" into the file.
Note, the function and event trigger "traceoff" will also
set this file to zero and stop tracing. Which can also
be re-enabled by user space using this file.
trace:
This file holds the output of the trace in a human
readable format (described below).
readable format (described below). Note, tracing is temporarily
disabled while this file is being read (opened).
trace_pipe:
......@@ -109,7 +135,8 @@ of ftrace. Here is a list of some of the key files:
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data, it will display the same
information every time it is read.
information every time it is read. This file will not
disable tracing while being read.
trace_options:
......@@ -128,12 +155,14 @@ of ftrace. Here is a list of some of the key files:
tracing_max_latency:
Some of the tracers record the max latency.
For example, the time interrupts are disabled.
This time is saved in this file. The max trace
will also be stored, and displayed by "trace".
A new max trace will only be recorded if the
latency is greater than the value in this
file. (in microseconds)
For example, the maximum time that interrupts are disabled.
The maximum time is saved in this file. The max trace will also be
stored, and displayed by "trace". A new max trace will only be
recorded if the latency is greater than the value in this file
(in microseconds).
By echoing in a time into this file, no latency will be recorded
unless it is greater than the time in this file.
tracing_thresh:
......@@ -152,32 +181,34 @@ of ftrace. Here is a list of some of the key files:
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
than requested, the rest of the page will be used,
making the actual allocation bigger than requested.
making the actual allocation bigger than requested or shown.
( Note, the size may not be a multiple of the page size
due to buffer management meta-data. )
Buffer sizes for individual CPUs may vary
(see "per_cpu/cpu0/buffer_size_kb" below), and if they do
this file will show "X".
buffer_total_size_kb:
This displays the total combined size of all the trace buffers.
free_buffer:
If a process is performing the tracing, and the ring buffer
should be shrunk "freed" when the process is finished, even
if it were to be killed by a signal, this file can be used
for that purpose. On close of this file, the ring buffer will
be resized to its minimum size. Having a process that is tracing
also open this file, when the process exits its file descriptor
for this file will be closed, and in doing so, the ring buffer
will be "freed".
If a process is performing tracing, and the ring buffer should be
shrunk "freed" when the process is finished, even if it were to be
killed by a signal, this file can be used for that purpose. On close
of this file, the ring buffer will be resized to its minimum size.
Having a process that is tracing also open this file, when the process
exits its file descriptor for this file will be closed, and in doing so,
the ring buffer will be "freed".
It may also stop tracing if disable_on_free option is set.
tracing_cpumask:
This is a mask that lets the user only trace
on specified CPUs. The format is a hex string
representing the CPUs.
This is a mask that lets the user only trace on specified CPUs.
The format is a hex string representing the CPUs.
set_ftrace_filter:
......@@ -190,6 +221,9 @@ of ftrace. Here is a list of some of the key files:
to be traced. Echoing names of functions into this file
will limit the trace to only those functions.
The functions listed in "available_filter_functions" are what
can be written into this file.
This interface also allows for commands to be used. See the
"Filter commands" section for more details.
......@@ -202,7 +236,14 @@ of ftrace. Here is a list of some of the key files:
set_ftrace_pid:
Have the function tracer only trace a single thread.
Have the function tracer only trace the threads whose PID are
listed in this file.
If the "function-fork" option is set, then when a task whose
PID is listed in this file forks, the child's PID will
automatically be added to this file, and the child will be
traced by the function tracer as well. This option will also
cause PIDs of tasks that exit to be removed from the file.
set_event_pid:
......@@ -217,17 +258,28 @@ of ftrace. Here is a list of some of the key files:
set_graph_function:
Set a "trigger" function where tracing should start
with the function graph tracer (See the section
"dynamic ftrace" for more details).
Functions listed in this file will cause the function graph
tracer to only trace these functions and the functions that
they call. (See the section "dynamic ftrace" for more details).
set_graph_notrace:
Similar to set_graph_function, but will disable function graph
tracing when the function is hit until it exits the function.
This makes it possible to ignore tracing functions that are called
by a specific function.
available_filter_functions:
This lists the functions that ftrace
has processed and can trace. These are the function
names that you can pass to "set_ftrace_filter" or
"set_ftrace_notrace". (See the section "dynamic ftrace"
below for more details.)
This lists the functions that ftrace has processed and can trace.
These are the function names that you can pass to
"set_ftrace_filter" or "set_ftrace_notrace".
(See the section "dynamic ftrace" below for more details.)
dyn_ftrace_total_info:
This file is for debugging purposes. The number of functions that
have been converted to nops and are available to be traced.
enabled_functions:
......@@ -250,12 +302,21 @@ of ftrace. Here is a list of some of the key files:
an 'I' will be displayed on the same line as the function that
can be overridden.
If the architecture supports it, it will also show what callback
is being directly called by the function. If the count is greater
than 1 it most likely will be ftrace_ops_list_func().
If the callback of the function jumps to a trampoline that is
specific to a the callback and not the standard trampoline,
its address will be printed as well as the function that the
trampoline calls.
function_profile_enabled:
When set it will enable all functions with either the function
tracer, or if enabled, the function graph tracer. It will
tracer, or if configured, the function graph tracer. It will
keep a histogram of the number of functions that were called
and if run with the function graph tracer, it will also keep
and if the function graph tracer was configured, it will also keep
track of the time spent in those functions. The histogram
content can be displayed in the files:
......@@ -283,12 +344,11 @@ of ftrace. Here is a list of some of the key files:
printk_formats:
This is for tools that read the raw format files. If an event in
the ring buffer references a string (currently only trace_printk()
does this), only a pointer to the string is recorded into the buffer
and not the string itself. This prevents tools from knowing what
that string was. This file displays the string and address for
the string allowing tools to map the pointers to what the
strings were.
the ring buffer references a string, only a pointer to the string
is recorded into the buffer and not the string itself. This prevents
tools from knowing what that string was. This file displays the string
and address for the string allowing tools to map the pointers to what
the strings were.
saved_cmdlines:
......@@ -298,6 +358,22 @@ of ftrace. Here is a list of some of the key files:
comms for events. If a pid for a comm is not listed, then
"<...>" is displayed in the output.
If the option "record-cmd" is set to "0", then comms of tasks
will not be saved during recording. By default, it is enabled.
saved_cmdlines_size:
By default, 128 comms are saved (see "saved_cmdlines" above). To
increase or decrease the amount of comms that are cached, echo
in a the number of comms to cache, into this file.
saved_tgids:
If the option "record-tgid" is set, on each scheduling context switch
the Task Group ID of a task is saved in a table mapping the PID of
the thread to its TGID. By default, the "record-tgid" option is
disabled.
snapshot:
This displays the "snapshot" buffer and also lets the user
......@@ -336,6 +412,9 @@ of ftrace. Here is a list of some of the key files:
# cat trace_clock
[local] global counter x86-tsc
The clock with the square brackets around it is the one
in effect.
local: Default clock, but may not be in sync across CPUs
global: This clock is in sync with all CPUs but may
......@@ -448,6 +527,23 @@ of ftrace. Here is a list of some of the key files:
See events.txt for more information.
set_event:
By echoing in the event into this file, will enable that event.
See events.txt for more information.
available_events:
A list of events that can be enabled in tracing.
See events.txt for more information.
hwlat_detector:
Directory for the Hardware Latency Detector.
See "Hardware Latency Detector" section below.
per_cpu:
This is a directory that contains the trace per_cpu information.
......@@ -539,13 +635,25 @@ Here is the list of current tracers that may be configured.
to draw a graph of function calls similar to C code
source.
"blk"
The block tracer. The tracer used by the blktrace user
application.
"hwlat"
The Hardware Latency tracer is used to detect if the hardware
produces any latency. See "Hardware Latency Detector" section
below.
"irqsoff"
Traces the areas that disable interrupts and saves
the trace with the longest max latency.
See tracing_max_latency. When a new max is recorded,
it replaces the old trace. It is best to view this
trace with the latency-format option enabled.
trace with the latency-format option enabled, which
happens automatically when the tracer is selected.
"preemptoff"
......@@ -571,6 +679,26 @@ Here is the list of current tracers that may be configured.
RT tasks (as the current "wakeup" does). This is useful
for those interested in wake up timings of RT tasks.
"wakeup_dl"
Traces and records the max latency that it takes for
a SCHED_DEADLINE task to be woken (as the "wakeup" and
"wakeup_rt" does).
"mmiotrace"
A special tracer that is used to trace binary module.
It will trace all the calls that a module makes to the
hardware. Everything it writes and reads from the I/O
as well.
"branch"
This tracer can be configured when tracing likely/unlikely
calls within the kernel. It will trace when a likely and
unlikely branch is hit and if it was correct in its prediction
of being correct.
"nop"
This is the "trace nothing" tracer. To remove all
......@@ -582,7 +710,7 @@ Examples of using the tracer
----------------------------
Here are typical examples of using the tracers when controlling
them only with the debugfs interface (without using any
them only with the tracefs interface (without using any
user-land utilities).
Output format:
......@@ -674,7 +802,7 @@ why a latency happened. Here is a typical trace.
This shows that the current tracer is "irqsoff" tracing the time
for which interrupts were disabled. It gives the trace version (which
never changes) and the version of the kernel upon which this was executed on
(3.10). Then it displays the max latency in microseconds (259 us). The number
(3.8). Then it displays the max latency in microseconds (259 us). The number
of trace entries displayed and the total number (both are four: #4/4).
VP, KP, SP, and HP are always zero and are reserved for later use.
#P is the number of online CPUs (#P:4).
......@@ -709,6 +837,8 @@ explains which is which.
'.' otherwise.
hardirq/softirq:
'Z' - NMI occurred inside a hardirq
'z' - NMI is running
'H' - hard irq occurred inside a softirq.
'h' - hard irq is running
's' - soft irq is running
......@@ -757,24 +887,24 @@ nohex
nobin
noblock
trace_printk
nobranch
annotate
nouserstacktrace
nosym-userobj
noprintk-msg-only
context-info
nolatency-format
sleep-time
graph-time
record-cmd
norecord-tgid
overwrite
nodisable_on_free
irq-info
markers
noevent-fork
function-trace
nofunction-fork
nodisplay-graph
nostacktrace
nobranch
To disable one of the options, echo in the option prepended with
"no".
......@@ -830,8 +960,6 @@ Here are the available options:
trace_printk - Can disable trace_printk() from writing into the buffer.
branch - Enable branch tracing with the tracer.
annotate - It is sometimes confusing when the CPU buffers are full
and one CPU buffer had a lot of events recently, thus
a shorter time frame, were another CPU may have only had
......@@ -850,7 +978,8 @@ Here are the available options:
<idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock
userstacktrace - This option changes the trace. It records a
stacktrace of the current userspace thread.
stacktrace of the current user space thread after
each trace event.
sym-userobj - when user stacktrace are enabled, look up which
object the address belongs to, and print a
......@@ -873,29 +1002,21 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
context-info - Show only the event data. Hides the comm, PID,
timestamp, CPU, and other useful data.
latency-format - This option changes the trace. When
it is enabled, the trace displays
additional information about the
latencies, as described in "Latency
trace format".
sleep-time - When running function graph tracer, to include
the time a task schedules out in its function.
When enabled, it will account time the task has been
scheduled out as part of the function call.
graph-time - When running function profiler with function graph tracer,
to include the time to call nested functions. When this is
not set, the time reported for the function will only
include the time the function itself executed for, not the
time for functions that it called.
latency-format - This option changes the trace output. When it is enabled,
the trace displays additional information about the
latency, as described in "Latency trace format".
record-cmd - When any event or tracer is enabled, a hook is enabled
in the sched_switch trace point to fill comm cache
in the sched_switch trace point to fill comm cache
with mapped pids and comms. But this may cause some
overhead, and if you only care about pids, and not the
name of the task, disabling this option can lower the
impact of tracing.
impact of tracing. See "saved_cmdlines".
record-tgid - When any event or tracer is enabled, a hook is enabled
in the sched_switch trace point to fill the cache of
mapped Thread Group IDs (TGID) mapping to pids. See
"saved_tgids".
overwrite - This controls what happens when the trace buffer is
full. If "1" (default), the oldest events are
......@@ -935,19 +1056,98 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
functions. This keeps the overhead of the tracer down
when performing latency tests.
function-fork - When set, tasks with PIDs listed in set_ftrace_pid will
have the PIDs of their children added to set_ftrace_pid
when those tasks fork. Also, when tasks with PIDs in
set_ftrace_pid exit, their PIDs will be removed from the
file.
display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will
use function graph tracing instead of function tracing.
stacktrace - This is one of the options that changes the trace
itself. When a trace is recorded, so is the stack
of functions. This allows for back traces of
trace sites.
stacktrace - When set, a stack trace is recorded after any trace event
is recorded.
branch - Enable branch tracing with the tracer. This enables branch
tracer along with the currently set tracer. Enabling this
with the "nop" tracer is the same as just enabling the
"branch" tracer.
Note: Some tracers have their own options. They only appear in this
file when the tracer is active. They always appear in the
options directory.
Here are the per tracer options:
Options for function tracer:
func_stack_trace - When set, a stack trace is recorded after every
function that is recorded. NOTE! Limit the functions
that are recorded before enabling this, with
"set_ftrace_filter" otherwise the system performance
will be critically degraded. Remember to disable
this option before clearing the function filter.
Options for function_graph tracer:
Since the function_graph tracer has a slightly different output
it has its own options to control what is displayed.
funcgraph-overrun - When set, the "overrun" of the graph stack is
displayed after each function traced. The
overrun, is when the stack depth of the calls
is greater than what is reserved for each task.
Each task has a fixed array of functions to
trace in the call graph. If the depth of the
calls exceeds that, the function is not traced.
The overrun is the number of functions missed
due to exceeding this array.
funcgraph-cpu - When set, the CPU number of the CPU where the trace
occurred is displayed.
funcgraph-overhead - When set, if the function takes longer than
A certain amount, then a delay marker is
displayed. See "delay" above, under the
header description.
funcgraph-proc - Unlike other tracers, the process' command line
is not displayed by default, but instead only
when a task is traced in and out during a context
switch. Enabling this options has the command
of each process displayed at every line.
funcgraph-duration - At the end of each function (the return)
the duration of the amount of time in the
function is displayed in microseconds.
funcgraph-abstime - When set, the timestamp is displayed at each
line.
funcgraph-irqs - When disabled, functions that happen inside an
interrupt will not be traced.
funcgraph-tail - When set, the return event will include the function
that it represents. By default this is off, and
only a closing curly bracket "}" is displayed for
the return of a function.
sleep-time - When running function graph tracer, to include
the time a task schedules out in its function.
When enabled, it will account time the task has been
scheduled out as part of the function call.
graph-time - When running function profiler with function graph tracer,
to include the time to call nested functions. When this is
not set, the time reported for the function will only
include the time the function itself executed for, not the
time for functions that it called.
Options for blk tracer:
blk_classic - Shows a more minimalistic output.
irqsoff
-------
......@@ -1711,6 +1911,85 @@ events.
<idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep
Hardware Latency Detector
-------------------------
The hardware latency detector is executed by enabling the "hwlat" tracer.
NOTE, this tracer will affect the performance of the system as it will
periodically make a CPU constantly busy with interrupts disabled.
# echo hwlat > current_tracer
# sleep 100
# cat trace
# tracer: hwlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-3638 [001] d... 19452.055471: #1 inner/outer(us): 12/14 ts:1499801089.066141940
<...>-3638 [003] d... 19454.071354: #2 inner/outer(us): 11/9 ts:1499801091.082164365
<...>-3638 [002] dn.. 19461.126852: #3 inner/outer(us): 12/9 ts:1499801098.138150062
<...>-3638 [001] d... 19488.340960: #4 inner/outer(us): 8/12 ts:1499801125.354139633
<...>-3638 [003] d... 19494.388553: #5 inner/outer(us): 8/12 ts:1499801131.402150961
<...>-3638 [003] d... 19501.283419: #6 inner/outer(us): 0/12 ts:1499801138.297435289 nmi-total:4 nmi-count:1
The above output is somewhat the same in the header. All events will have
interrupts disabled 'd'. Under the FUNCTION title there is:
#1 - This is the count of events recorded that were greater than the
tracing_threshold (See below).
inner/outer(us): 12/14
This shows two numbers as "inner latency" and "outer latency". The test
runs in a loop checking a timestamp twice. The latency detected within
the two timestamps is the "inner latency" and the latency detected
after the previous timestamp and the next timestamp in the loop is
the "outer latency".
ts:1499801089.066141940
The absolute timestamp that the event happened.
nmi-total:4 nmi-count:1
On architectures that support it, if an NMI comes in during the
test, the time spent in NMI is reported in "nmi-total" (in
microseconds).
All architectures that have NMIs will show the "nmi-count" if an
NMI comes in during the test.
hwlat files:
tracing_threshold - This gets automatically set to "10" to represent 10
microseconds. This is the threshold of latency that
needs to be detected before the trace will be recorded.
Note, when hwlat tracer is finished (another tracer is
written into "current_tracer"), the original value for
tracing_threshold is placed back into this file.
hwlat_detector/width - The length of time the test runs with interrupts
disabled.
hwlat_detector/window - The length of time of the window which the test
runs. That is, the test will run for "width"
microseconds per "window" microseconds
tracing_cpumask - When the test is started. A kernel thread is created that
runs the test. This thread will alternate between CPUs
listed in the tracing_cpumask between each period
(one "window"). To limit the test to specific CPUs
set the mask in this file to only the CPUs that the test
should run on.
function
--------
......@@ -1821,15 +2100,15 @@ something like this simple program:
#define STR(x) _STR(x)
#define MAX_PATH 256
const char *find_debugfs(void)
const char *find_tracefs(void)
{
static char debugfs[MAX_PATH+1];
static int debugfs_found;
static char tracefs[MAX_PATH+1];
static int tracefs_found;
char type[100];
FILE *fp;
if (debugfs_found)
return debugfs;
if (tracefs_found)
return tracefs;
if ((fp = fopen("/proc/mounts","r")) == NULL) {
perror("/proc/mounts");
......@@ -1839,27 +2118,27 @@ const char *find_debugfs(void)
while (fscanf(fp, "%*s %"
STR(MAX_PATH)
"s %99s %*s %*d %*d\n",
debugfs, type) == 2) {
if (strcmp(type, "debugfs") == 0)
tracefs, type) == 2) {
if (strcmp(type, "tracefs") == 0)
break;
}
fclose(fp);
if (strcmp(type, "debugfs") != 0) {
fprintf(stderr, "debugfs not mounted");
if (strcmp(type, "tracefs") != 0) {
fprintf(stderr, "tracefs not mounted");
return NULL;
}
strcat(debugfs, "/tracing/");
debugfs_found = 1;
strcat(tracefs, "/tracing/");
tracefs_found = 1;
return debugfs;
return tracefs;
}
const char *tracing_file(const char *file_name)
{
static char trace_file[MAX_PATH+1];
snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name);
snprintf(trace_file, MAX_PATH, "%s/%s", find_tracefs(), file_name);
return trace_file;
}
......@@ -1898,12 +2177,12 @@ Or this simple script!
------
#!/bin/bash
debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts`
echo nop > $debugfs/tracing/current_tracer
echo 0 > $debugfs/tracing/tracing_on
echo $$ > $debugfs/tracing/set_ftrace_pid
echo function > $debugfs/tracing/current_tracer
echo 1 > $debugfs/tracing/tracing_on
tracefs=`sed -ne 's/^tracefs \(.*\) tracefs.*/\1/p' /proc/mounts`
echo nop > $tracefs/tracing/current_tracer
echo 0 > $tracefs/tracing/tracing_on
echo $$ > $tracefs/tracing/set_ftrace_pid
echo function > $tracefs/tracing/current_tracer
echo 1 > $tracefs/tracing/tracing_on
exec "$@"
------
......@@ -2145,13 +2424,18 @@ include the -pg switch in the compiling of the kernel.)
At compile time every C file object is run through the
recordmcount program (located in the scripts directory). This
program will parse the ELF headers in the C object to find all
the locations in the .text section that call mcount. (Note, only
white listed .text sections are processed, since processing other
sections like .init.text may cause races due to those sections
being freed unexpectedly).
A new section called "__mcount_loc" is created that holds
references to all the mcount call sites in the .text section.
the locations in the .text section that call mcount. Starting
with gcc verson 4.6, the -mfentry has been added for x86, which
calls "__fentry__" instead of "mcount". Which is called before
the creation of the stack frame.
Note, not all sections are traced. They may be prevented by either
a notrace, or blocked another way and all inline functions are not
traced. Check the "available_filter_functions" file to see what functions
can be traced.
A section called "__mcount_loc" is created that holds
references to all the mcount/fentry call sites in the .text section.
The recordmcount program re-links this section back into the
original object. The final linking stage of the kernel will add all these
references into a single table.
......@@ -2679,7 +2963,7 @@ in time without stopping tracing. Ftrace swaps the current
buffer with a spare buffer, and tracing continues in the new
current (=previous spare) buffer.
The following debugfs files in "tracing" are related to this
The following tracefs files in "tracing" are related to this
feature:
snapshot:
......@@ -2752,7 +3036,7 @@ cat: snapshot: Device or resource busy
Instances
---------
In the debugfs tracing directory is a directory called "instances".
In the tracefs tracing directory is a directory called "instances".
This directory can have new directories created inside of it using
mkdir, and removing directories with rmdir. The directory created
with mkdir in this directory will already contain files and other
......
......@@ -3816,7 +3816,7 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
int exclude_mod = 0;
int found = 0;
int ret;
int clear_filter;
int clear_filter = 0;
if (func) {
func_g.type = filter_parse_regex(func, len, &func_g.search,
......@@ -3950,7 +3950,7 @@ static int cache_mod(struct trace_array *tr,
continue;
/* no func matches all */
if (!func || strcmp(func, "*") == 0 ||
if (strcmp(func, "*") == 0 ||
(ftrace_mod->func &&
strcmp(ftrace_mod->func, func) == 0)) {
ret = 0;
......@@ -3978,6 +3978,7 @@ static int
ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
int reset, int enable);
#ifdef CONFIG_MODULES
static void process_mod_list(struct list_head *head, struct ftrace_ops *ops,
char *mod, bool enable)
{
......@@ -4068,6 +4069,7 @@ static void process_cached_mods(const char *mod_name)
kfree(mod);
}
#endif
/*
* We register the module command as a template to show others how
......
......@@ -1916,7 +1916,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
{
unsigned pid, idx;
if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT))
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;
if (unlikely(tsk->pid > PID_MAX_DEFAULT))
return 0;
/*
......@@ -2002,7 +2006,11 @@ int trace_find_tgid(int pid)
static int trace_save_tgid(struct task_struct *tsk)
{
if (unlikely(!tgid_map || !tsk->pid || tsk->pid > PID_MAX_DEFAULT))
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;
if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT))
return 0;
tgid_map[tsk->pid] = tsk->tgid;
......@@ -2029,11 +2037,20 @@ static bool tracing_record_taskinfo_skip(int flags)
*/
void tracing_record_taskinfo(struct task_struct *task, int flags)
{
bool done;
if (tracing_record_taskinfo_skip(flags))
return;
if ((flags & TRACE_RECORD_CMDLINE) && !trace_save_cmdline(task))
return;
if ((flags & TRACE_RECORD_TGID) && !trace_save_tgid(task))
/*
* Record as much task information as possible. If some fail, continue
* to try to record the others.
*/
done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(task);
done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(task);
/* If recording any information failed, retry again soon. */
if (!done)
return;
__this_cpu_write(trace_taskinfo_save, false);
......@@ -2050,15 +2067,22 @@ void tracing_record_taskinfo(struct task_struct *task, int flags)
void tracing_record_taskinfo_sched_switch(struct task_struct *prev,
struct task_struct *next, int flags)
{
bool done;
if (tracing_record_taskinfo_skip(flags))
return;
if ((flags & TRACE_RECORD_CMDLINE) &&
(!trace_save_cmdline(prev) || !trace_save_cmdline(next)))
return;
/*
* Record as much task information as possible. If some fail, continue
* to try to record the others.
*/
done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(prev);
done &= !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(next);
done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(prev);
done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(next);
if ((flags & TRACE_RECORD_TGID) &&
(!trace_save_tgid(prev) || !trace_save_tgid(next)))
/* If recording any information failed, retry again soon. */
if (!done)
return;
__this_cpu_write(trace_taskinfo_save, false);
......@@ -3334,14 +3358,23 @@ static void print_func_help_header_irq(struct trace_buffer *buf, struct seq_file
unsigned int flags)
{
bool tgid = flags & TRACE_ITER_RECORD_TGID;
seq_printf(m, "# %s _-----=> irqs-off\n", tgid ? " " : "");
seq_printf(m, "# %s / _----=> need-resched\n", tgid ? " " : "");
seq_printf(m, "# %s| / _---=> hardirq/softirq\n", tgid ? " " : "");
seq_printf(m, "# %s|| / _--=> preempt-depth\n", tgid ? " " : "");
seq_printf(m, "# %s||| / delay\n", tgid ? " " : "");
seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n", tgid ? " TGID " : "");
seq_printf(m, "# | | | %s|||| | |\n", tgid ? " | " : "");
const char tgid_space[] = " ";
const char space[] = " ";
seq_printf(m, "# %s _-----=> irqs-off\n",
tgid ? tgid_space : space);
seq_printf(m, "# %s / _----=> need-resched\n",
tgid ? tgid_space : space);
seq_printf(m, "# %s| / _---=> hardirq/softirq\n",
tgid ? tgid_space : space);
seq_printf(m, "# %s|| / _--=> preempt-depth\n",
tgid ? tgid_space : space);
seq_printf(m, "# %s||| / delay\n",
tgid ? tgid_space : space);
seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n",
tgid ? " TGID " : space);
seq_printf(m, "# | | | %s|||| | |\n",
tgid ? " | " : space);
}
void
......@@ -4689,6 +4722,76 @@ static const struct file_operations tracing_readme_fops = {
.llseek = generic_file_llseek,
};
static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
int *ptr = v;
if (*pos || m->count)
ptr++;
(*pos)++;
for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
if (trace_find_tgid(*ptr))
return ptr;
}
return NULL;
}
static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
void *v;
loff_t l = 0;
if (!tgid_map)
return NULL;
v = &tgid_map[0];
while (l <= *pos) {
v = saved_tgids_next(m, v, &l);
if (!v)
return NULL;
}
return v;
}
static void saved_tgids_stop(struct seq_file *m, void *v)
{
}
static int saved_tgids_show(struct seq_file *m, void *v)
{
int pid = (int *)v - tgid_map;
seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
return 0;
}
static const struct seq_operations tracing_saved_tgids_seq_ops = {
.start = saved_tgids_start,
.stop = saved_tgids_stop,
.next = saved_tgids_next,
.show = saved_tgids_show,
};
static int tracing_saved_tgids_open(struct inode *inode, struct file *filp)
{
if (tracing_disabled)
return -ENODEV;
return seq_open(filp, &tracing_saved_tgids_seq_ops);
}
static const struct file_operations tracing_saved_tgids_fops = {
.open = tracing_saved_tgids_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
};
static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos)
{
unsigned int *ptr = v;
......@@ -7921,6 +8024,9 @@ static __init int tracer_init_tracefs(void)
trace_create_file("saved_cmdlines_size", 0644, d_tracer,
NULL, &tracing_saved_cmdlines_size_fops);
trace_create_file("saved_tgids", 0444, d_tracer,
NULL, &tracing_saved_tgids_fops);
trace_eval_init();
trace_create_eval_file(d_tracer);
......
......@@ -598,6 +598,14 @@ static struct notifier_block trace_kprobe_module_nb = {
.priority = 1 /* Invoked after kprobe module callback */
};
/* Convert certain expected symbols into '_' when generating event names */
static inline void sanitize_event_name(char *name)
{
while (*name++ != '\0')
if (*name == ':' || *name == '.')
*name = '_';
}
static int create_trace_kprobe(int argc, char **argv)
{
/*
......@@ -736,6 +744,7 @@ static int create_trace_kprobe(int argc, char **argv)
else
snprintf(buf, MAX_EVENT_NAME_LEN, "%c_0x%p",
is_return ? 'r' : 'p', addr);
sanitize_event_name(buf);
event = buf;
}
tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
......
......@@ -406,6 +406,8 @@ static const struct file_operations stack_trace_fops = {
.release = seq_release,
};
#ifdef CONFIG_DYNAMIC_FTRACE
static int
stack_trace_filter_open(struct inode *inode, struct file *file)
{
......@@ -423,6 +425,8 @@ static const struct file_operations stack_trace_filter_fops = {
.release = ftrace_regex_release,
};
#endif /* CONFIG_DYNAMIC_FTRACE */
int
stack_trace_sysctl(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
......@@ -477,8 +481,10 @@ static __init int stack_trace_init(void)
trace_create_file("stack_trace", 0444, d_tracer,
NULL, &stack_trace_fops);
#ifdef CONFIG_DYNAMIC_FTRACE
trace_create_file("stack_trace_filter", 0444, d_tracer,
&trace_ops, &stack_trace_filter_fops);
#endif
if (stack_trace_filter_buf[0])
ftrace_set_early_filter(&trace_ops, stack_trace_filter_buf, 1);
......
#!/bin/sh
# description: Kprobe event auto/manual naming
[ -f kprobe_events ] || exit_unsupported # this is configurable
disable_events
echo > kprobe_events
:;: "Add an event on function without name" ;:
FUNC=`grep " [tT] .*vfs_read$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "`
[ "x" != "x$FUNC" ] || exit_unresolved
echo "p $FUNC" > kprobe_events
PROBE_NAME=`echo $FUNC | tr ".:" "_"`
test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure
:;: "Add an event on function with new name" ;:
echo "p:event1 $FUNC" > kprobe_events
test -d events/kprobes/event1 || exit_failure
:;: "Add an event on function with new name and group" ;:
echo "p:kprobes2/event2 $FUNC" > kprobe_events
test -d events/kprobes2/event2 || exit_failure
:;: "Add an event on dot function without name" ;:
FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "`
[ "x" != "x$FUNC" ] || exit_unresolved
echo "p $FUNC" > kprobe_events
EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:`
[ "x" != "x$EVENT" ] || exit_failure
test -d events/$EVENT || exit_failure
echo > kprobe_events
#!/bin/sh
# description: Kprobe dynamic event - probing module
[ -f kprobe_events ] || exit_unsupported # this is configurable
disable_events
echo > kprobe_events
:;: "Add an event on a module function without specifying event name" ;:
MOD=`lsmod | head -n 2 | tail -n 1 | cut -f1 -d" "`
FUNC=`grep -m 1 ".* t .*\\[$MOD\\]" /proc/kallsyms | xargs | cut -f3 -d" "`
[ "x" != "x$MOD" -a "y" != "y$FUNC" ] || exit_unresolved
echo "p $MOD:$FUNC" > kprobe_events
PROBE_NAME=`echo $MOD:$FUNC | tr ".:" "_"`
test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure
:;: "Add an event on a module function with new event name" ;:
echo "p:event1 $MOD:$FUNC" > kprobe_events
test -d events/kprobes/event1 || exit_failure
:;: "Add an event on a module function with new event and group name" ;:
echo "p:kprobes1/event1 $MOD:$FUNC" > kprobe_events
test -d events/kprobes1/event1 || exit_failure
echo > kprobe_events
......@@ -2,10 +2,10 @@
# description: Register/unregister many kprobe events
# ftrace fentry skip size depends on the machine architecture.
# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc
# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc64le
case `uname -m` in
x86_64|i[3456]86) OFFS=5;;
ppc*) OFFS=4;;
ppc64le) OFFS=8;;
*) OFFS=0;;
esac
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment