1. 30 Apr, 2021 1 commit
    • tracing: Restructure trace_clock_global() to never block · aafe104a
      Steven Rostedt (VMware) authored
      It was reported that a fix to the ring buffer recursion detection would
      cause a hung machine when performing suspend / resume testing. The
      following backtrace was extracted from debugging that case:
      
      Call Trace:
       trace_clock_global+0x91/0xa0
       __rb_reserve_next+0x237/0x460
       ring_buffer_lock_reserve+0x12a/0x3f0
       trace_buffer_lock_reserve+0x10/0x50
       __trace_graph_return+0x1f/0x80
       trace_graph_return+0xb7/0xf0
       ? trace_clock_global+0x91/0xa0
       ftrace_return_to_handler+0x8b/0xf0
       ? pv_hash+0xa0/0xa0
       return_to_handler+0x15/0x30
       ? ftrace_graph_caller+0xa0/0xa0
       ? trace_clock_global+0x91/0xa0
       ? __rb_reserve_next+0x237/0x460
       ? ring_buffer_lock_reserve+0x12a/0x3f0
       ? trace_event_buffer_lock_reserve+0x3c/0x120
       ? trace_event_buffer_reserve+0x6b/0xc0
       ? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
       ? dpm_run_callback+0x3b/0xc0
       ? pm_ops_is_empty+0x50/0x50
       ? platform_get_irq_byname_optional+0x90/0x90
       ? trace_device_pm_callback_start+0x82/0xd0
       ? dpm_run_callback+0x49/0xc0
      
      With the following RIP:
      
      RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200
      
      Since the fix to the recursion detection would allow a single recursion to
      happen while tracing, this led to trace_clock_global() taking a spin
      lock and then trying to take it again:
      
      ring_buffer_lock_reserve() {
        trace_clock_global() {
          arch_spin_lock() {
            queued_spin_lock_slowpath() {
              /* lock taken */
              (something else gets traced by function graph tracer)
                ring_buffer_lock_reserve() {
                  trace_clock_global() {
                    arch_spin_lock() {
                      queued_spin_lock_slowpath() {
                      /* DEAD LOCK! */
      
      Tracing should *never* block, as it can lead to strange lockups like the
      above.
      
      Restructure the trace_clock_global() code so that, instead of taking a
      lock in order to update the recorded "prev_time", it simply uses it: for
      two events happening on two different CPUs that call this at the same
      time, it really doesn't matter which one goes first. Use a trylock to
      grab the lock for updating prev_time, and if it fails, simply try again
      the next time. If the lock could not be taken, that means something else
      is already updating it.
      
      Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
      Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
      Fixes: b02414c8 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
      Fixes: 14131f2f ("tracing: implement trace_clock_*() APIs") # where the bug happened
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  2. 28 Apr, 2021 1 commit
    • tracing: Map all PIDs to command lines · 785e3c0a
      Steven Rostedt (VMware) authored
      The default max PID is set by PID_MAX_DEFAULT, and the tracing
      infrastructure uses this number to map PIDs to the comm names of the
      tasks, so that the output of the trace can show the names for the PIDs
      recorded in the ring buffer. This mapping is also exported to user space
      via the "saved_cmdlines" file in the tracefs directory.
      
      But currently the mapping expects the PIDs to be less than
      PID_MAX_DEFAULT, which is the default maximum and not the real maximum.
      Recently, systemd has started increasing the maximum PID value on the
      system, and when a task with a PID higher than PID_MAX_DEFAULT is
      traced, its comm is not recorded. This causes the entire trace to show
      "<...>" as the comm name, which is pretty useless.
      
      Instead, keep the mapping array at the size of PID_MAX_DEFAULT, but
      rather than mapping the PID directly to an index, map the PID masked
      with (PID_MAX_DEFAULT - 1) to the comm, and recover the full PID from
      the map_cmdline_to_pid array (which already exists).
      
      This bug goes back to the beginning of ftrace, but hasn't been an issue
      until user space started increasing the maximum value of PIDs.
      
      Link: https://lkml.kernel.org/r/20210427113207.3c601884@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: bc0c38d1 ("ftrace: latency tracer infrastructure")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  3. 15 Apr, 2021 7 commits
  4. 01 Apr, 2021 6 commits
  5. 30 Mar, 2021 1 commit
  6. 25 Mar, 2021 3 commits
  7. 23 Mar, 2021 1 commit
  8. 22 Mar, 2021 2 commits
  9. 18 Mar, 2021 14 commits
  10. 14 Mar, 2021 4 commits