Commit 714f83d5 authored by Linus Torvalds

Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
  tracing, net: fix net tree and tracing tree merge interaction
  tracing, powerpc: fix powerpc tree and tracing tree interaction
  ring-buffer: do not remove reader page from list on ring buffer free
  function-graph: allow unregistering twice
  trace: make argument 'mem' of trace_seq_putmem() const
  tracing: add missing 'extern' keywords to trace_output.h
  tracing: provide trace_seq_reserve()
  blktrace: print out BLK_TN_MESSAGE properly
  blktrace: extract duplidate code
  blktrace: fix memory leak when freeing struct blk_io_trace
  blktrace: fix blk_probes_ref chaos
  blktrace: make classic output more classic
  blktrace: fix off-by-one bug
  blktrace: fix the original blktrace
  blktrace: fix a race when creating blk_tree_root in debugfs
  blktrace: fix timestamp in binary output
  tracing, Text Edit Lock: cleanup
  tracing: filter fix for TRACE_EVENT_FORMAT events
  ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
  x86: kretprobe-booster interrupt emulation code fix
  ...

Fix up trivial conflicts in
 arch/parisc/include/asm/ftrace.h
 include/linux/memory.h
 kernel/extable.c
 kernel/module.c
parents 8901e7ff 645dae96
What: /sys/kernel/debug/kmemtrace/
Date: July 2008
Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Description:
In kmemtrace-enabled kernels, the following files are created:
/sys/kernel/debug/kmemtrace/
cpu<n> (0400) Per-CPU tracing data, see below. (binary)
total_overruns (0400) Total number of bytes which were dropped from
cpu<n> files because of full buffer condition,
non-binary. (text)
abi_version (0400) Kernel's kmemtrace ABI version. (text)
Each per-CPU file should be read according to the relay interface. That is,
the reader should set affinity to that specific CPU and, as currently done by
the userspace application (though there are other methods), use poll() with
an infinite timeout before every read(). Otherwise, erroneous data may be
read. The binary data has the following _core_ format:
Event ID (1 byte) Unsigned integer, one of:
0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
1 - represents a freeing of previously allocated memory
(KMEMTRACE_EVENT_FREE)
Type ID (1 byte) Unsigned integer, one of:
0 - this is a kmalloc() / kfree()
1 - this is a kmem_cache_alloc() / kmem_cache_free()
2 - this is a __get_free_pages() et al.
Event size (2 bytes) Unsigned integer representing the
size of this event. Used to extend
kmemtrace. Discard the bytes you
don't know about.
Sequence number (4 bytes) Signed integer used to reorder data
logged on SMP machines. Wraparound
must be taken into account, although
it is unlikely.
Caller address (8 bytes) Return address to the caller.
Pointer to mem (8 bytes) Pointer to target memory area. Can be
NULL, but not all such calls might be
recorded.
In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:
Requested bytes (8 bytes) Total number of requested bytes,
unsigned, must not be zero.
Allocated bytes (8 bytes) Total number of actually allocated
bytes, unsigned, must not be lower
than requested bytes.
Requested flags (4 bytes) GFP flags supplied by the caller.
Target CPU (4 bytes) Signed integer, valid for event id 1.
If equal to -1, target CPU is the same
as origin CPU, but the reverse might
not be true.
The data is made available in the same endianness the machine has.
Other event ids and type ids may be defined and added. Other fields may be
added by increasing event size, but see below for details.
Every modification to the ABI, including new id definitions, is followed
by bumping the ABI version by one.
Adding new data to the packet (features) is done at the end of the mandatory
data:
Feature size (2 bytes)
Feature ID (1 byte)
Feature data (Feature size - 3 bytes)
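As an illustration only, the core header and the KMEMTRACE_EVENT_ALLOC payload
described above could be modelled in userspace with packed C structures such as
the following; the struct and field names are hypothetical, not part of the ABI,
and the byte order is the host's, as noted above:

	#include <stdint.h>

	/* Core format common to all events (illustrative layout). */
	struct kmemtrace_event_hdr {
		uint8_t  event_id;	/* 0 = ALLOC, 1 = FREE */
		uint8_t  type_id;	/* 0 = kmalloc/kfree, 1 = kmem_cache_*, 2 = __get_free_pages et al. */
		uint16_t event_size;	/* total event size; skip trailing bytes you don't know about */
		int32_t  seq;		/* sequence number, may wrap around */
		uint64_t call_site;	/* return address to the caller */
		uint64_t ptr;		/* pointer to target memory area, may be 0 */
	} __attribute__((packed));

	/* Extra fields that follow the header for KMEMTRACE_EVENT_ALLOC. */
	struct kmemtrace_event_alloc {
		uint64_t bytes_req;	/* requested bytes, never zero */
		uint64_t bytes_alloc;	/* allocated bytes, >= bytes_req */
		uint32_t gfp_flags;	/* GFP flags supplied by the caller */
		int32_t  target_cpu;	/* -1 means same as origin CPU */
	} __attribute__((packed));

	/* Wraparound-safe ordering test for the sequence number,
	 * in the spirit of the kernel's time_after() macro. */
	static inline int seq_before(int32_t a, int32_t b)
	{
		return (int32_t)(a - b) < 0;
	}

Unknown event or type ids, and any bytes beyond the fields a reader understands,
should simply be skipped using event_size, as required by the text above.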
Users:
kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
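To illustrate the read procedure described earlier in this file (pin the reader
to the CPU in question, poll() with an infinite timeout, then read()), a minimal
userspace sketch might look like this; the path, buffer size and error handling
are illustrative only:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <poll.h>
	#include <sched.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Drain one per-CPU relay file, e.g. /sys/kernel/debug/kmemtrace/cpu0. */
	static int drain_cpu(int cpu, const char *path)
	{
		cpu_set_t set;
		struct pollfd pfd;
		char buf[4096];
		ssize_t n;
		int fd;

		/* Read the buffer from the CPU that produced it. */
		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		if (sched_setaffinity(0, sizeof(set), &set) < 0)
			return -1;

		fd = open(path, O_RDONLY);
		if (fd < 0)
			return -1;

		for (;;) {
			pfd.fd = fd;
			pfd.events = POLLIN;
			/* Infinite timeout before every read(), as required above. */
			if (poll(&pfd, 1, -1) < 0)
				break;
			n = read(fd, buf, sizeof(buf));
			if (n <= 0)
				break;
			fwrite(buf, 1, n, stdout);	/* hand the raw binary data on */
		}
		close(fd);
		return 0;
	}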
......@@ -50,6 +50,7 @@ parameter is applicable:
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
KMEMTRACE kmemtrace is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
......@@ -1081,6 +1082,15 @@ and is between 256 and 4096 characters. It is defined in the file
use the HighMem zone if it exists, and the Normal
zone if it does not.
kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
Controls whether kmemtrace is enabled
at boot-time.
kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
subbufs kmemtrace's relay channel has. Set this
higher than default (KMEMTRACE_N_SUBBUFS in code) if
you experience buffer overruns.
movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This parameter
is similar to kernelcore except it specifies the
amount of memory used for migratable allocations.
......@@ -2367,6 +2377,8 @@ and is between 256 and 4096 characters. It is defined in the file
tp720= [HW,PS2]
trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size.
trix= [HW,OSS] MediaTrix AudioTrix Pro
Format:
<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>
......
......@@ -115,6 +115,8 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
'x' - Used by xmon interface on ppc/powerpc platforms.
'z' - Dump the ftrace buffer
'0'-'9' - Sets the console log level, controlling which kernel messages
will be printed to your console. ('0', for example would make
it so that only emergency messages like PANICs or OOPSes would
......
......@@ -45,8 +45,8 @@ In include/trace/subsys.h :
#include <linux/tracepoint.h>
DECLARE_TRACE(subsys_eventname,
TPPROTO(int firstarg, struct task_struct *p),
TPARGS(firstarg, p));
TP_PROTO(int firstarg, struct task_struct *p),
TP_ARGS(firstarg, p));
In subsys/file.c (where the tracing statement must be added) :
......@@ -66,10 +66,10 @@ Where :
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
- TPPROTO(int firstarg, struct task_struct *p) is the prototype of the
- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
function called by this tracepoint.
- TPARGS(firstarg, p) are the parameters names, same as found in the
- TP_ARGS(firstarg, p) are the parameters names, same as found in the
prototype.
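For illustration, a tracepoint declared with the renamed TP_PROTO/TP_ARGS macros
is instantiated and fired roughly as follows; the subsystem, event and function
names are made up, this is only a sketch of the pattern, not the sample module
from the tree:

	#include <linux/tracepoint.h>
	#include <linux/sched.h>

	/* In the hypothetical include/trace/subsys.h */
	DECLARE_TRACE(subsys_eventname,
		TP_PROTO(int firstarg, struct task_struct *p),
		TP_ARGS(firstarg, p));

	/* In the hypothetical subsys/file.c */
	DEFINE_TRACE(subsys_eventname);

	static void somefct(int arg)
	{
		/* Does nothing unless a probe is registered on the tracepoint. */
		trace_subsys_eventname(arg, current);
	}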
Connecting a function (probe) to a tracepoint is done by providing a
......@@ -103,13 +103,14 @@ used to export the defined tracepoints.
* Probe / tracepoint example
See the example provided in samples/tracepoints/src
See the example provided in samples/tracepoints
Compile them with your kernel.
Compile them with your kernel. They are built during 'make' (not
'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m.
Run, as root :
modprobe tracepoint-example (insmod order is not important)
modprobe tracepoint-probe-example
cat /proc/tracepoint-example (returns an expected error)
rmmod tracepoint-example tracepoint-probe-example
modprobe tracepoint-sample (insmod order is not important)
modprobe tracepoint-probe-sample
cat /proc/tracepoint-sample (returns an expected error)
rmmod tracepoint-sample tracepoint-probe-sample
dmesg
kmemtrace - Kernel Memory Tracer
by Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro>
I. Introduction
===============
kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much
To do this, we trace every allocation and export information to the userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.
The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of writing this) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)
Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).
kmemtrace also makes provisions for tracing on one arch and analysing the
data on another.
II. Design and goals
====================
kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting is done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.
As of writing this, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.
Summary of design goals:
- allow logging and analysis to be done across different machines
- be fast and anticipate usage in high-load environments (*)
- be reasonably extensible
- make it possible for GNU/Linux distributions to have kmemtrace
included in their repositories
(*) - one of the reasons Pekka Enberg's original userspace data analysis
tool's code was rewritten from Perl to C (although this is more than a
simple conversion)
III. Quick usage guide
======================
1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).
2) Get the userspace tool and build it:
$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make
3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
# overrun, should
# be zero.
$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
check its correctness]
$ ./kmemtrace-report
Now you should have a nice and short summary of how the allocator performs.
IV. FAQ and known issues
========================
Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
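For instance, something along the lines of the following on the kernel command
line (the value is arbitrary, scale it to how badly you overran):

	kmemtrace.subbufs=8192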
---
Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
- possible bugs in relay code
- possible misuse of relay by kmemtrace
- timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---
Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.
It may also be due to lack of tracing in some unusual allocator functions.
We don't want bug reports regarding this issue yet.
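As a purely hypothetical illustration of the kind of inconsistent pairing meant
here (struct foo and foo_cachep are made-up names):

	static void buggy(struct kmem_cache *foo_cachep)
	{
		struct foo *f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);

		/* ... use f ... */
		kfree(f);	/* mismatched: should be kmem_cache_free(foo_cachep, f) */
	}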
---
V. See also
===========
Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace
......@@ -2654,6 +2654,12 @@ M: jason.wessel@windriver.com
L: kgdb-bugreport@lists.sourceforge.net
S: Maintained
KMEMTRACE
P: Eduard - Gabriel Munteanu
M: eduard.munteanu@linux360.ro
L: linux-kernel@vger.kernel.org
S: Maintained
KPROBES
P: Ananth N Mavinakayanahalli
M: ananth@in.ibm.com
......
......@@ -6,6 +6,7 @@ config OPROFILE
tristate "OProfile system profiling (EXPERIMENTAL)"
depends on PROFILING
depends on HAVE_OPROFILE
depends on TRACING_SUPPORT
select TRACING
select RING_BUFFER
help
......
......@@ -14,17 +14,4 @@ typedef struct {
void ack_bad_irq(unsigned int irq);
#define HARDIRQ_BITS 12
/*
* The hardirq mask has to be large enough to have
* space for potentially nestable IRQ sources in the system
* to nest on a single CPU. On Alpha, interrupts are masked at the CPU
* by IPL as well as at the system level. We only have 8 IPLs (UNIX PALcode)
* so we really only have 8 nestable IRQs, but allow some overhead
*/
#if (1 << HARDIRQ_BITS) < 16
#error HARDIRQ_BITS is too low!
#endif
#endif /* _ALPHA_HARDIRQ_H */
......@@ -20,15 +20,4 @@ void ack_bad_irq(unsigned int irq);
#endif /* __ASSEMBLY__ */
#define HARDIRQ_BITS 12
/*
* The hardirq mask has to be large enough to have
* space for potentially all IRQ sources in the system
* nesting on a single CPU:
*/
#if (1 << HARDIRQ_BITS) < NR_IRQS
# error HARDIRQ_BITS is too low!
#endif
#endif /* __ASM_AVR32_HARDIRQ_H */
......@@ -22,6 +22,9 @@ config IA64
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE if (!ITANIUM)
select HAVE_FUNCTION_TRACER
select HAVE_DMA_ATTRS
select HAVE_KVM
select HAVE_ARCH_TRACEHOOK
......
#ifndef _ASM_IA64_FTRACE_H
#define _ASM_IA64_FTRACE_H
#ifdef CONFIG_FUNCTION_TRACER
#define MCOUNT_INSN_SIZE 32 /* sizeof mcount call */
#ifndef __ASSEMBLY__
extern void _mcount(unsigned long pfs, unsigned long r1, unsigned long b0, unsigned long r0);
#define mcount _mcount
#include <asm/kprobes.h>
/* In IA64, MCOUNT_ADDR is set in link time, so it's not a constant at compile time */
#define MCOUNT_ADDR (((struct fnptr *)mcount)->ip)
#define FTRACE_ADDR (((struct fnptr *)ftrace_caller)->ip)
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/* second bundle, insn 2 */
return addr - 0x12;
}
struct dyn_arch_ftrace {
};
#endif
#endif /* CONFIG_FUNCTION_TRACER */
#endif /* _ASM_IA64_FTRACE_H */
......@@ -20,16 +20,6 @@
#define local_softirq_pending() (local_cpu_data->softirq_pending)
#define HARDIRQ_BITS 14
/*
* The hardirq mask has to be large enough to have space for potentially all IRQ sources
* in the system nesting on a single CPU:
*/
#if (1 << HARDIRQ_BITS) < NR_IRQS
# error HARDIRQ_BITS is too low!
#endif
extern void __iomem *ipi_base_addr;
void ack_bad_irq(unsigned int irq);
......
......@@ -2,6 +2,10 @@
# Makefile for the linux kernel.
#
ifdef CONFIG_DYNAMIC_FTRACE
CFLAGS_REMOVE_ftrace.o = -pg
endif
extra-y := head.o init_task.o vmlinux.lds
obj-y := acpi.o entry.o efi.o efi_stub.o gate-data.o fsys.o ia64_ksyms.o irq.o irq_ia64.o \
......@@ -28,6 +32,7 @@ obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o
obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o
......
......@@ -47,6 +47,7 @@
#include <asm/processor.h>
#include <asm/thread_info.h>
#include <asm/unistd.h>
#include <asm/ftrace.h>
#include "minstate.h"
......@@ -1404,6 +1405,105 @@ GLOBAL_ENTRY(unw_init_running)
br.ret.sptk.many rp
END(unw_init_running)
#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
GLOBAL_ENTRY(_mcount)
br ftrace_stub
END(_mcount)
.here:
br.ret.sptk.many b0
GLOBAL_ENTRY(ftrace_caller)
alloc out0 = ar.pfs, 8, 0, 4, 0
mov out3 = r0
;;
mov out2 = b0
add r3 = 0x20, r3
mov out1 = r1;
br.call.sptk.many b0 = ftrace_patch_gp
//this might be called from module, so we must patch gp
ftrace_patch_gp:
movl gp=__gp
mov b0 = r3
;;
.global ftrace_call;
ftrace_call:
{
.mlx
nop.m 0x0
movl r3 = .here;;
}
alloc loc0 = ar.pfs, 4, 4, 2, 0
;;
mov loc1 = b0
mov out0 = b0
mov loc2 = r8
mov loc3 = r15
;;
adds out0 = -MCOUNT_INSN_SIZE, out0
mov out1 = in2
mov b6 = r3
br.call.sptk.many b0 = b6
;;
mov ar.pfs = loc0
mov b0 = loc1
mov r8 = loc2
mov r15 = loc3
br ftrace_stub
;;
END(ftrace_caller)
#else
GLOBAL_ENTRY(_mcount)
movl r2 = ftrace_stub
movl r3 = ftrace_trace_function;;
ld8 r3 = [r3];;
ld8 r3 = [r3];;
cmp.eq p7,p0 = r2, r3
(p7) br.sptk.many ftrace_stub
;;
alloc loc0 = ar.pfs, 4, 4, 2, 0
;;
mov loc1 = b0
mov out0 = b0
mov loc2 = r8
mov loc3 = r15
;;
adds out0 = -MCOUNT_INSN_SIZE, out0
mov out1 = in2
mov b6 = r3
br.call.sptk.many b0 = b6
;;
mov ar.pfs = loc0
mov b0 = loc1
mov r8 = loc2
mov r15 = loc3
br ftrace_stub
;;
END(_mcount)
#endif
GLOBAL_ENTRY(ftrace_stub)
mov r3 = b0
movl r2 = _mcount_ret_helper
;;
mov b6 = r2
mov b7 = r3
br.ret.sptk.many b6
_mcount_ret_helper:
mov b0 = r42
mov r1 = r41
mov ar.pfs = r40
br b7
END(ftrace_stub)
#endif /* CONFIG_FUNCTION_TRACER */
.rodata
.align 8
.globl sys_call_table
......
/*
* Dynamic function tracing support.
*
* Copyright (C) 2008 Shaohua Li <shaohua.li@intel.com>
*
* For licencing details, see COPYING.
*
* Defines low-level handling of mcount calls when the kernel
* is compiled with the -pg flag. When using dynamic ftrace, the
* mcount call-sites get patched lazily with NOP till they are
* enabled. All code mutation routines here take effect atomically.
*/
#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <asm/cacheflush.h>
#include <asm/patch.h>
/* In IA64, each function will be added below two bundles with -pg option */
static unsigned char __attribute__((aligned(8)))
ftrace_orig_code[MCOUNT_INSN_SIZE] = {
0x02, 0x40, 0x31, 0x10, 0x80, 0x05, /* alloc r40=ar.pfs,12,8,0 */
0xb0, 0x02, 0x00, 0x00, 0x42, 0x40, /* mov r43=r0;; */
0x05, 0x00, 0xc4, 0x00, /* mov r42=b0 */
0x11, 0x48, 0x01, 0x02, 0x00, 0x21, /* mov r41=r1 */
0x00, 0x00, 0x00, 0x02, 0x00, 0x00, /* nop.i 0x0 */
0x08, 0x00, 0x00, 0x50 /* br.call.sptk.many b0 = _mcount;; */
};
struct ftrace_orig_insn {
u64 dummy1, dummy2, dummy3;
u64 dummy4:64-41+13;
u64 imm20:20;
u64 dummy5:3;
u64 sign:1;
u64 dummy6:4;
};
/* mcount stub will be converted below for nop */
static unsigned char ftrace_nop_code[MCOUNT_INSN_SIZE] = {
0x00, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MII] nop.m 0x0 */
0x30, 0x00, 0x00, 0x60, 0x00, 0x00, /* mov r3=ip */
0x00, 0x00, 0x04, 0x00, /* nop.i 0x0 */
0x05, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MLX] nop.m 0x0 */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* nop.x 0x0;; */
0x00, 0x00, 0x04, 0x00
};
static unsigned char *ftrace_nop_replace(void)
{
return ftrace_nop_code;
}
/*
* mcount stub will be converted below for call
* Note: Just the last instruction is changed against nop
* */
static unsigned char __attribute__((aligned(8)))
ftrace_call_code[MCOUNT_INSN_SIZE] = {
0x00, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MII] nop.m 0x0 */
0x30, 0x00, 0x00, 0x60, 0x00, 0x00, /* mov r3=ip */
0x00, 0x00, 0x04, 0x00, /* nop.i 0x0 */
0x05, 0x00, 0x00, 0x00, 0x01, 0x00, /* [MLX] nop.m 0x0 */
0xff, 0xff, 0xff, 0xff, 0x7f, 0x00, /* brl.many .;;*/
0xf8, 0xff, 0xff, 0xc8
};
struct ftrace_call_insn {
u64 dummy1, dummy2;
u64 dummy3:48;
u64 imm39_l:16;
u64 imm39_h:23;
u64 dummy4:13;
u64 imm20:20;
u64 dummy5:3;
u64 i:1;
u64 dummy6:4;
};
static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
struct ftrace_call_insn *code = (void *)ftrace_call_code;
unsigned long offset = addr - (ip + 0x10);
code->imm39_l = offset >> 24;
code->imm39_h = offset >> 40;
code->imm20 = offset >> 4;
code->i = offset >> 63;
return ftrace_call_code;
}
static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code,
unsigned char *new_code, int do_check)
{
unsigned char replaced[MCOUNT_INSN_SIZE];
/*
* Note: Due to modules and __init, code can
* disappear and change, we need to protect against faulting
* as well as code changing. We do this by using the
* probe_kernel_* functions.
*
* No real locking needed, this code is run through
* kstop_machine, or before SMP starts.
*/
if (!do_check)
goto skip_check;
/* read the text we want to modify */
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
/* Make sure it is what we expect it to be */
if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
return -EINVAL;
skip_check:
/* replace the text with the new text */
if (probe_kernel_write(((void *)ip), new_code, MCOUNT_INSN_SIZE))
return -EPERM;
flush_icache_range(ip, ip + MCOUNT_INSN_SIZE);
return 0;
}
static int ftrace_make_nop_check(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned char __attribute__((aligned(8))) replaced[MCOUNT_INSN_SIZE];
unsigned long ip = rec->ip;
if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
if (rec->flags & FTRACE_FL_CONVERTED) {
struct ftrace_call_insn *call_insn, *tmp_call;
call_insn = (void *)ftrace_call_code;
tmp_call = (void *)replaced;
call_insn->imm39_l = tmp_call->imm39_l;
call_insn->imm39_h = tmp_call->imm39_h;
call_insn->imm20 = tmp_call->imm20;
call_insn->i = tmp_call->i;
if (memcmp(replaced, ftrace_call_code, MCOUNT_INSN_SIZE) != 0)
return -EINVAL;
return 0;
} else {
struct ftrace_orig_insn *call_insn, *tmp_call;
call_insn = (void *)ftrace_orig_code;
tmp_call = (void *)replaced;
call_insn->sign = tmp_call->sign;
call_insn->imm20 = tmp_call->imm20;
if (memcmp(replaced, ftrace_orig_code, MCOUNT_INSN_SIZE) != 0)
return -EINVAL;
return 0;
}
}
int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
{
int ret;
char *new;
ret = ftrace_make_nop_check(rec, addr);
if (ret)
return ret;
new = ftrace_nop_replace();
return ftrace_modify_code(rec->ip, NULL, new, 0);
}
int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned long ip = rec->ip;
unsigned char *old, *new;
old= ftrace_nop_replace();
new = ftrace_call_replace(ip, addr);
return ftrace_modify_code(ip, old, new, 1);
}
/* in IA64, _mcount can't directly call ftrace_stub. Only jump is ok */
int ftrace_update_ftrace_func(ftrace_func_t func)
{
unsigned long ip;
unsigned long addr = ((struct fnptr *)ftrace_call)->ip;
if (func == ftrace_stub)
return 0;
ip = ((struct fnptr *)func)->ip;
ia64_patch_imm64(addr + 2, ip);
flush_icache_range(addr, addr + 16);
return 0;
}
/* run from kstop_machine */
int __init ftrace_dyn_arch_init(void *data)
{
*(unsigned long *)data = 0;
return 0;
}
......@@ -112,3 +112,9 @@ EXPORT_SYMBOL_GPL(esi_call_phys);
#endif
extern char ia64_ivt[];
EXPORT_SYMBOL(ia64_ivt);
#include <asm/ftrace.h>
#ifdef CONFIG_FUNCTION_TRACER
/* mcount is defined in assembly */
EXPORT_SYMBOL(_mcount);
#endif
......@@ -557,7 +557,6 @@ extern void mod_return_to_handler(void);
void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
{
unsigned long old;
unsigned long long calltime;
int faulted;
struct ftrace_graph_ent trace;
unsigned long return_hooker = (unsigned long)&return_to_handler;
......@@ -606,10 +605,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
return;
}
calltime = cpu_clock(raw_smp_processor_id());
if (ftrace_push_return_trace(old, calltime,
self_addr, &trace.depth) == -EBUSY) {
if (ftrace_push_return_trace(old, self_addr, &trace.depth) == -EBUSY) {
*parent = old;
return;
}
......
......@@ -34,6 +34,8 @@ config X86
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
select HAVE_FTRACE_SYSCALLS
select HAVE_KVM
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK
......
......@@ -126,6 +126,11 @@ void clflush_cache_range(void *addr, unsigned int size);
#ifdef CONFIG_DEBUG_RODATA
void mark_rodata_ro(void);
extern const int rodata_test_data;
void set_kernel_text_rw(void);
void set_kernel_text_ro(void);
#else
static inline void set_kernel_text_rw(void) { }
static inline void set_kernel_text_ro(void) { }
#endif
#ifdef CONFIG_DEBUG_RODATA_TEST
......
......@@ -111,6 +111,8 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
FIX_TEXT_POKE1,
__end_of_permanent_fixed_addresses,
#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
FIX_OHCI1394_BASE,
......
......@@ -28,6 +28,13 @@
#endif
/* FIXME: I don't want to stay hardcoded */
#ifdef CONFIG_X86_64
# define FTRACE_SYSCALL_MAX 296
#else
# define FTRACE_SYSCALL_MAX 333
#endif
#ifdef CONFIG_FUNCTION_TRACER
#define MCOUNT_ADDR ((long)(mcount))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */
......
......@@ -80,8 +80,6 @@
#define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */
#ifdef CONFIG_X86_PTRACE_BTS
#ifndef __ASSEMBLY__
#include <linux/types.h>
......@@ -140,6 +138,5 @@ struct ptrace_bts_config {
BTS records are read from oldest to newest.
Returns number of BTS records drained.
*/
#endif /* CONFIG_X86_PTRACE_BTS */
#endif /* _ASM_X86_PTRACE_ABI_H */
......@@ -94,6 +94,7 @@ struct thread_info {
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_DEBUGCTLMSR 25 /* uses thread_struct.debugctlmsr */
#define TIF_DS_AREA_MSR 26 /* uses thread_struct.ds_area_msr */
#define TIF_SYSCALL_FTRACE 27 /* for ftrace syscall instrumentation */
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
......@@ -115,15 +116,17 @@ struct thread_info {
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_DEBUGCTLMSR (1 << TIF_DEBUGCTLMSR)
#define _TIF_DS_AREA_MSR (1 << TIF_DS_AREA_MSR)
#define _TIF_SYSCALL_FTRACE (1 << TIF_SYSCALL_FTRACE)
/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_FTRACE | \
_TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP)
/* work to do in syscall_trace_leave() */
#define _TIF_WORK_SYSCALL_EXIT \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP)
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \
_TIF_SYSCALL_FTRACE)
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
......@@ -132,7 +135,7 @@ struct thread_info {
_TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK (0x0000FFFF & ~_TIF_SECCOMP)
#define _TIF_ALLWORK_MASK ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_FTRACE)
/* Only used for 64 bit */
#define _TIF_DO_NOTIFY_MASK \
......
......@@ -66,7 +66,8 @@ obj-$(CONFIG_X86_MPPARSE) += mpparse.o
obj-y += apic/
obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
......
......@@ -5,6 +5,7 @@
#include <linux/kprobes.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/memory.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
......@@ -12,7 +13,9 @@
#include <asm/nmi.h>
#include <asm/vsyscall.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
#include <asm/io.h>
#include <asm/fixmap.h>
#define MAX_PATCH_LEN (255-1)
......@@ -226,6 +229,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
{
u8 **ptr;
mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
......@@ -234,6 +238,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
/* turn DS segment override prefix into lock prefix */
text_poke(*ptr, ((unsigned char []){0xf0}), 1);
};
mutex_unlock(&text_mutex);
}
static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
......@@ -243,6 +248,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
if (noreplace_smp)
return;
mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
......@@ -251,6 +257,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
/* turn lock prefix into DS segment override prefix */
text_poke(*ptr, ((unsigned char []){0x3E}), 1);
};
mutex_unlock(&text_mutex);
}
struct smp_alt_module {
......@@ -500,15 +507,16 @@ void *text_poke_early(void *addr, const void *opcode, size_t len)
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page.
*
* Note: Must be called under text_mutex.
*/
void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
unsigned long flags;
char *vaddr;
int nr_pages = 2;
struct page *pages[2];
int i;
might_sleep();
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
......@@ -518,18 +526,21 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
if (!pages[1])
nr_pages = 1;
vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
BUG_ON(!vaddr);
local_irq_disable();
local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
local_irq_enable();
vunmap(vaddr);
clear_fixmap(FIX_TEXT_POKE0);
if (pages[1])
clear_fixmap(FIX_TEXT_POKE1);
local_flush_tlb();
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
local_irq_restore(flags);
return addr;
}
......@@ -33,7 +33,7 @@
#include <linux/cpufreq.h>
#include <linux/compiler.h>
#include <linux/dmi.h>
#include <linux/ftrace.h>
#include <trace/power.h>
#include <linux/acpi.h>
#include <linux/io.h>
......@@ -72,6 +72,8 @@ struct acpi_cpufreq_data {
static DEFINE_PER_CPU(struct acpi_cpufreq_data *, drv_data);
DEFINE_TRACE(power_mark);
/* acpi_perf_data is a pointer to percpu data. */
static struct acpi_processor_performance *acpi_perf_data;
......
......@@ -15,6 +15,7 @@
#include <linux/bug.h>
#include <linux/nmi.h>
#include <linux/sysfs.h>
#include <linux/ftrace.h>
#include <asm/stacktrace.h>
......@@ -196,6 +197,11 @@ unsigned __kprobes long oops_begin(void)
int cpu;
unsigned long flags;
/* notify the hw-branch tracer so it may disable tracing and
add the last trace to the trace buffer -
the earlier this happens, the more useful the trace. */
trace_hw_branch_oops();
oops_enter();
/* racy, but better than risking deadlock. */
......
......@@ -18,6 +18,7 @@
#include <linux/init.h>
#include <linux/list.h>
#include <asm/cacheflush.h>
#include <asm/ftrace.h>
#include <linux/ftrace.h>
#include <asm/nops.h>
......@@ -26,6 +27,18 @@
#ifdef CONFIG_DYNAMIC_FTRACE
int ftrace_arch_code_modify_prepare(void)
{
set_kernel_text_rw();
return 0;
}
int ftrace_arch_code_modify_post_process(void)
{
set_kernel_text_ro();
return 0;
}
union ftrace_code_union {
char code[MCOUNT_INSN_SIZE];
struct {
......@@ -66,11 +79,11 @@ static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
*
* 1) Put the instruction pointer into the IP buffer
* and the new code into the "code" buffer.
* 2) Set a flag that says we are modifying code
* 3) Wait for any running NMIs to finish.
* 4) Write the code
* 5) clear the flag.
* 6) Wait for any running NMIs to finish.
* 2) Wait for any running NMIs to finish and set a flag that says
* we are modifying code, it is done in an atomic operation.
* 3) Write the code
* 4) clear the flag.
* 5) Wait for any running NMIs to finish.
*
* If an NMI is executed, the first thing it does is to call
* "ftrace_nmi_enter". This will check if the flag is set to write
......@@ -82,9 +95,9 @@ static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
* are the same as what exists.
*/
static atomic_t in_nmi = ATOMIC_INIT(0);
#define MOD_CODE_WRITE_FLAG (1 << 31) /* set when NMI should do the write */
static atomic_t nmi_running = ATOMIC_INIT(0);
static int mod_code_status; /* holds return value of text write */
static int mod_code_write; /* set when NMI should do the write */
static void *mod_code_ip; /* holds the IP to write to */
static void *mod_code_newcode; /* holds the text to write to the IP */
......@@ -101,6 +114,20 @@ int ftrace_arch_read_dyn_info(char *buf, int size)
return r;
}
static void clear_mod_flag(void)
{
int old = atomic_read(&nmi_running);
for (;;) {
int new = old & ~MOD_CODE_WRITE_FLAG;
if (old == new)
break;
old = atomic_cmpxchg(&nmi_running, old, new);
}
}
static void ftrace_mod_code(void)
{
/*
......@@ -111,37 +138,52 @@ static void ftrace_mod_code(void)
*/
mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
MCOUNT_INSN_SIZE);
/* if we fail, then kill any new writers */
if (mod_code_status)
clear_mod_flag();
}
void ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
/* Must have in_nmi seen before reading write flag */
smp_mb();
if (mod_code_write) {
if (atomic_inc_return(&nmi_running) & MOD_CODE_WRITE_FLAG) {
smp_rmb();
ftrace_mod_code();
atomic_inc(&nmi_update_count);
}
/* Must have previous changes seen before executions */
smp_mb();
}
void ftrace_nmi_exit(void)
{
/* Finish all executions before clearing in_nmi */
smp_wmb();
atomic_dec(&in_nmi);
/* Finish all executions before clearing nmi_running */
smp_mb();
atomic_dec(&nmi_running);
}
static void wait_for_nmi_and_set_mod_flag(void)
{
if (!atomic_cmpxchg(&nmi_running, 0, MOD_CODE_WRITE_FLAG))
return;
do {
cpu_relax();
} while (atomic_cmpxchg(&nmi_running, 0, MOD_CODE_WRITE_FLAG));
nmi_wait_count++;
}
static void wait_for_nmi(void)
{
int waited = 0;
if (!atomic_read(&nmi_running))
return;
while (atomic_read(&in_nmi)) {
waited = 1;
do {
cpu_relax();
}
} while (atomic_read(&nmi_running));
if (waited)
nmi_wait_count++;
nmi_wait_count++;
}
static int
......@@ -151,14 +193,9 @@ do_ftrace_mod_code(unsigned long ip, void *new_code)
mod_code_newcode = new_code;
/* The buffers need to be visible before we let NMIs write them */
smp_wmb();
mod_code_write = 1;
/* Make sure write bit is visible before we wait on NMIs */
smp_mb();
wait_for_nmi();
wait_for_nmi_and_set_mod_flag();
/* Make sure all running NMIs have finished before we write the code */
smp_mb();
......@@ -166,13 +203,9 @@ do_ftrace_mod_code(unsigned long ip, void *new_code)
ftrace_mod_code();
/* Make sure the write happens before clearing the bit */
smp_wmb();
mod_code_write = 0;
/* make sure NMIs see the cleared bit */
smp_mb();
clear_mod_flag();
wait_for_nmi();
return mod_code_status;
......@@ -368,25 +401,6 @@ int ftrace_disable_ftrace_graph_caller(void)
return ftrace_mod_jmp(ip, old_offset, new_offset);
}
#else /* CONFIG_DYNAMIC_FTRACE */
/*
* These functions are picked from those used on
* this page for dynamic ftrace. They have been
* simplified to ignore all traces in NMI context.
*/
static atomic_t in_nmi;
void ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
}
void ftrace_nmi_exit(void)
{
atomic_dec(&in_nmi);
}
#endif /* !CONFIG_DYNAMIC_FTRACE */
/*
......@@ -396,14 +410,13 @@ void ftrace_nmi_exit(void)
void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
{
unsigned long old;
unsigned long long calltime;
int faulted;
struct ftrace_graph_ent trace;
unsigned long return_hooker = (unsigned long)
&return_to_handler;
/* Nmi's are currently unsupported */
if (unlikely(atomic_read(&in_nmi)))
if (unlikely(in_nmi()))
return;
if (unlikely(atomic_read(&current->tracing_graph_pause)))
......@@ -439,17 +452,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
return;
}
if (unlikely(!__kernel_text_address(old))) {
ftrace_graph_stop();
*parent = old;
WARN_ON(1);
return;
}
calltime = cpu_clock(raw_smp_processor_id());
if (ftrace_push_return_trace(old, calltime,
self_addr, &trace.depth) == -EBUSY) {
if (ftrace_push_return_trace(old, self_addr, &trace.depth) == -EBUSY) {
*parent = old;
return;
}
......@@ -463,3 +466,66 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
}
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
#ifdef CONFIG_FTRACE_SYSCALLS
extern unsigned long __start_syscalls_metadata[];
extern unsigned long __stop_syscalls_metadata[];
extern unsigned long *sys_call_table;
static struct syscall_metadata **syscalls_metadata;
static struct syscall_metadata *find_syscall_meta(unsigned long *syscall)
{
struct syscall_metadata *start;
struct syscall_metadata *stop;
char str[KSYM_SYMBOL_LEN];
start = (struct syscall_metadata *)__start_syscalls_metadata;
stop = (struct syscall_metadata *)__stop_syscalls_metadata;
kallsyms_lookup((unsigned long) syscall, NULL, NULL, NULL, str);
for ( ; start < stop; start++) {
if (start->name && !strcmp(start->name, str))
return start;
}
return NULL;
}
struct syscall_metadata *syscall_nr_to_meta(int nr)
{
if (!syscalls_metadata || nr >= FTRACE_SYSCALL_MAX || nr < 0)
return NULL;
return syscalls_metadata[nr];
}
void arch_init_ftrace_syscalls(void)
{
int i;
struct syscall_metadata *meta;
unsigned long **psys_syscall_table = &sys_call_table;
static atomic_t refs;
if (atomic_inc_return(&refs) != 1)
goto end;
syscalls_metadata = kzalloc(sizeof(*syscalls_metadata) *
FTRACE_SYSCALL_MAX, GFP_KERNEL);
if (!syscalls_metadata) {
WARN_ON(1);
return;
}
for (i = 0; i < FTRACE_SYSCALL_MAX; i++) {
meta = find_syscall_meta(psys_syscall_table[i]);
syscalls_metadata[i] = meta;
}
return;
/* Paranoid: avoid overflow */
end:
atomic_dec(&refs);
}
#endif
......@@ -638,13 +638,13 @@ static void __used __kprobes kretprobe_trampoline_holder(void)
#else
" pushf\n"
/*
* Skip cs, ip, orig_ax.
* Skip cs, ip, orig_ax and gs.
* trampoline_handler() will plug in these values
*/
" subl $12, %esp\n"
" subl $16, %esp\n"
" pushl %fs\n"
" pushl %ds\n"
" pushl %es\n"
" pushl %ds\n"
" pushl %eax\n"
" pushl %ebp\n"
" pushl %edi\n"
......@@ -655,10 +655,10 @@ static void __used __kprobes kretprobe_trampoline_holder(void)
" movl %esp, %eax\n"
" call trampoline_handler\n"
/* Move flags to cs */
" movl 52(%esp), %edx\n"
" movl %edx, 48(%esp)\n"
" movl 56(%esp), %edx\n"
" movl %edx, 52(%esp)\n"
/* Replace saved flags with true return address. */
" movl %eax, 52(%esp)\n"
" movl %eax, 56(%esp)\n"
" popl %ebx\n"
" popl %ecx\n"
" popl %edx\n"
......@@ -666,8 +666,8 @@ static void __used __kprobes kretprobe_trampoline_holder(void)
" popl %edi\n"
" popl %ebp\n"
" popl %eax\n"
/* Skip ip, orig_ax, es, ds, fs */
" addl $20, %esp\n"
/* Skip ds, es, fs, gs, orig_ax and ip */
" addl $24, %esp\n"
" popf\n"
#endif
" ret\n");
......@@ -691,6 +691,7 @@ static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
regs->cs = __KERNEL_CS;
#else
regs->cs = __KERNEL_CS | get_kernel_rpl();
regs->gs = 0;
#endif
regs->ip = trampoline_address;
regs->orig_ax = ~0UL;
......
......@@ -8,7 +8,7 @@
#include <linux/module.h>
#include <linux/pm.h>
#include <linux/clockchips.h>
#include <linux/ftrace.h>
#include <trace/power.h>
#include <asm/system.h>
#include <asm/apic.h>
#include <asm/idle.h>
......@@ -22,6 +22,9 @@ EXPORT_SYMBOL(idle_nomwait);
struct kmem_cache *task_xstate_cachep;
DEFINE_TRACE(power_start);
DEFINE_TRACE(power_end);
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
*dst = *src;
......
......@@ -21,6 +21,7 @@
#include <linux/audit.h>
#include <linux/seccomp.h>
#include <linux/signal.h>
#include <linux/ftrace.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
......@@ -1415,6 +1416,9 @@ asmregparm long syscall_trace_enter(struct pt_regs *regs)
tracehook_report_syscall_entry(regs))
ret = -1L;
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_enter(regs);
if (unlikely(current->audit_context)) {
if (IS_IA32)
audit_syscall_entry(AUDIT_ARCH_I386,
......@@ -1438,6 +1442,9 @@ asmregparm void syscall_trace_leave(struct pt_regs *regs)
if (unlikely(current->audit_context))
audit_syscall_exit(AUDITSC_RESULT(regs->ax), regs->ax);
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_exit(regs);
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, 0);
......
......@@ -59,7 +59,8 @@ config KVM_AMD
config KVM_TRACE
bool "KVM trace support"
depends on KVM && MARKERS && SYSFS
depends on KVM && SYSFS
select MARKERS
select RELAY
select DEBUG_FS
default n
......
......@@ -1054,17 +1054,47 @@ static noinline int do_test_wp_bit(void)
const int rodata_test_data = 0xC3;
EXPORT_SYMBOL_GPL(rodata_test_data);
static int kernel_set_to_readonly;
void set_kernel_text_rw(void)
{
unsigned long start = PFN_ALIGN(_text);
unsigned long size = PFN_ALIGN(_etext) - start;
if (!kernel_set_to_readonly)
return;
pr_debug("Set kernel text: %lx - %lx for read write\n",
start, start+size);
set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
}
void set_kernel_text_ro(void)
{
unsigned long start = PFN_ALIGN(_text);
unsigned long size = PFN_ALIGN(_etext) - start;
if (!kernel_set_to_readonly)
return;
pr_debug("Set kernel text: %lx - %lx for read only\n",
start, start+size);
set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
}
void mark_rodata_ro(void)
{
unsigned long start = PFN_ALIGN(_text);
unsigned long size = PFN_ALIGN(_etext) - start;
#ifndef CONFIG_DYNAMIC_FTRACE
/* Dynamic tracing modifies the kernel text section */
set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
printk(KERN_INFO "Write protecting the kernel text: %luk\n",
size >> 10);
kernel_set_to_readonly = 1;
#ifdef CONFIG_CPA_DEBUG
printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
start, start+size);
......@@ -1073,7 +1103,6 @@ void mark_rodata_ro(void)
printk(KERN_INFO "Testing CPA: write protecting again\n");
set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
#endif
#endif /* CONFIG_DYNAMIC_FTRACE */
start += size;
size = (unsigned long)__end_rodata - start;
......
......@@ -734,21 +734,48 @@ void __init mem_init(void)
const int rodata_test_data = 0xC3;
EXPORT_SYMBOL_GPL(rodata_test_data);
static int kernel_set_to_readonly;
void set_kernel_text_rw(void)
{
unsigned long start = PFN_ALIGN(_stext);
unsigned long end = PFN_ALIGN(__start_rodata);
if (!kernel_set_to_readonly)
return;
pr_debug("Set kernel text: %lx - %lx for read write\n",
start, end);
set_memory_rw(start, (end - start) >> PAGE_SHIFT);
}
void set_kernel_text_ro(void)
{
unsigned long start = PFN_ALIGN(_stext);
unsigned long end = PFN_ALIGN(__start_rodata);
if (!kernel_set_to_readonly)
return;
pr_debug("Set kernel text: %lx - %lx for read only\n",
start, end);
set_memory_ro(start, (end - start) >> PAGE_SHIFT);
}
void mark_rodata_ro(void)
{
unsigned long start = PFN_ALIGN(_stext), end = PFN_ALIGN(__end_rodata);
unsigned long rodata_start =
((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
#ifdef CONFIG_DYNAMIC_FTRACE
/* Dynamic tracing modifies the kernel text section */
start = rodata_start;
#endif
printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
set_memory_ro(start, (end - start) >> PAGE_SHIFT);
kernel_set_to_readonly = 1;
/*
* The rodata section (but not the kernel text!) should also be
* not-executable.
......
......@@ -44,22 +44,6 @@ config LBD
If unsure, say N.
config BLK_DEV_IO_TRACE
bool "Support for tracing block io actions"
depends on SYSFS
select RELAY
select DEBUG_FS
select TRACEPOINTS
help
Say Y here if you want to be able to trace the block layer actions
on a given queue. Tracing allows you to see any traffic happening
on a block device queue. For more information (and the userspace
support tools needed), fetch the blktrace tools from:
git://git.kernel.dk/blktrace.git
If unsure, say N.
config BLK_DEV_BSG
bool "Block layer SG support v4 (EXPERIMENTAL)"
depends on EXPERIMENTAL
......
......@@ -13,6 +13,5 @@ obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += blk-integrity.o
......@@ -283,7 +283,7 @@ static void sysrq_ftrace_dump(int key, struct tty_struct *tty)
}
static struct sysrq_key_op sysrq_ftrace_dump_op = {
.handler = sysrq_ftrace_dump,
.help_msg = "dumpZ-ftrace-buffer",
.help_msg = "dump-ftrace-buffer(Z)",
.action_msg = "Dump ftrace buffer",
.enable_mask = SYSRQ_ENABLE_DUMP,
};
......
......@@ -161,7 +161,7 @@ struct op_sample
{
entry->event = ring_buffer_lock_reserve
(op_ring_buffer_write, sizeof(struct op_sample) +
size * sizeof(entry->sample->data[0]), &entry->irq_flags);
size * sizeof(entry->sample->data[0]));
if (entry->event)
entry->sample = ring_buffer_event_data(entry->event);
else
......@@ -178,8 +178,7 @@ struct op_sample
int op_cpu_buffer_write_commit(struct op_entry *entry)
{
return ring_buffer_unlock_commit(op_ring_buffer_write, entry->event,
entry->irq_flags);
return ring_buffer_unlock_commit(op_ring_buffer_write, entry->event);
}
struct op_sample *op_cpu_buffer_read_entry(struct op_entry *entry, int cpu)
......
......@@ -30,6 +30,7 @@
static struct vfsmount *debugfs_mount;
static int debugfs_mount_count;
static bool debugfs_registered;
static struct inode *debugfs_get_inode(struct super_block *sb, int mode, dev_t dev)
{
......@@ -496,6 +497,16 @@ struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
}
EXPORT_SYMBOL_GPL(debugfs_rename);
/**
* debugfs_initialized - Tells whether debugfs has been registered
*/
bool debugfs_initialized(void)
{
return debugfs_registered;
}
EXPORT_SYMBOL_GPL(debugfs_initialized);
static struct kobject *debug_kobj;
static int __init debugfs_init(void)
......@@ -509,11 +520,16 @@ static int __init debugfs_init(void)
retval = register_filesystem(&debug_fs_type);
if (retval)
kobject_put(debug_kobj);
else
debugfs_registered = true;
return retval;
}
static void __exit debugfs_exit(void)
{
debugfs_registered = false;
simple_release_fs(&debugfs_mount, &debugfs_mount_count);
unregister_filesystem(&debug_fs_type);
kobject_put(debug_kobj);
......
......@@ -19,6 +19,7 @@
#include <linux/kmod.h>
#include <linux/ctype.h>
#include <linux/genhd.h>
#include <linux/blktrace_api.h>
#include "check.h"
......@@ -294,6 +295,9 @@ static struct attribute_group part_attr_group = {
static struct attribute_group *part_attr_groups[] = {
&part_attr_group,
#ifdef CONFIG_BLK_DEV_IO_TRACE
&blk_trace_attr_group,
#endif
NULL
};
......
......@@ -61,6 +61,30 @@
#define BRANCH_PROFILE()
#endif
#ifdef CONFIG_EVENT_TRACER
#define FTRACE_EVENTS() VMLINUX_SYMBOL(__start_ftrace_events) = .; \
*(_ftrace_events) \
VMLINUX_SYMBOL(__stop_ftrace_events) = .;
#else
#define FTRACE_EVENTS()
#endif
#ifdef CONFIG_TRACING
#define TRACE_PRINTKS() VMLINUX_SYMBOL(__start___trace_bprintk_fmt) = .; \
*(__trace_printk_fmt) /* Trace_printk fmt' pointer */ \
VMLINUX_SYMBOL(__stop___trace_bprintk_fmt) = .;
#else
#define TRACE_PRINTKS()
#endif
#ifdef CONFIG_FTRACE_SYSCALLS
#define TRACE_SYSCALLS() VMLINUX_SYMBOL(__start_syscalls_metadata) = .; \
*(__syscalls_metadata) \
VMLINUX_SYMBOL(__stop_syscalls_metadata) = .;
#else
#define TRACE_SYSCALLS()
#endif
/* .data section */
#define DATA_DATA \
*(.data) \
......@@ -86,7 +110,10 @@
*(__verbose) \
VMLINUX_SYMBOL(__stop___verbose) = .; \
LIKELY_PROFILE() \
BRANCH_PROFILE()
BRANCH_PROFILE() \
TRACE_PRINTKS() \
FTRACE_EVENTS() \
TRACE_SYSCALLS()
#define RO_DATA(align) \
. = ALIGN((align)); \
......
......@@ -144,6 +144,9 @@ struct blk_user_trace_setup {
#ifdef __KERNEL__
#if defined(CONFIG_BLK_DEV_IO_TRACE)
#include <linux/sysfs.h>
struct blk_trace {
int trace_state;
struct rchan *rchan;
......@@ -194,6 +197,8 @@ extern int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
extern int blk_trace_startstop(struct request_queue *q, int start);
extern int blk_trace_remove(struct request_queue *q);
extern struct attribute_group blk_trace_attr_group;
#else /* !CONFIG_BLK_DEV_IO_TRACE */
#define blk_trace_ioctl(bdev, cmd, arg) (-ENOTTY)
#define blk_trace_shutdown(q) do { } while (0)
......
......@@ -68,6 +68,7 @@ struct ftrace_branch_data {
unsigned long miss;
unsigned long hit;
};
unsigned long miss_hit[2];
};
};
......@@ -125,10 +126,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
.line = __LINE__, \
}; \
______r = !!(cond); \
if (______r) \
______f.hit++; \
else \
______f.miss++; \
______f.miss_hit[______r]++; \
______r; \
}))
#endif /* CONFIG_PROFILE_ALL_BRANCHES */
......
......@@ -71,6 +71,9 @@ struct dentry *debugfs_create_bool(const char *name, mode_t mode,
struct dentry *debugfs_create_blob(const char *name, mode_t mode,
struct dentry *parent,
struct debugfs_blob_wrapper *blob);
bool debugfs_initialized(void);
#else
#include <linux/err.h>
......@@ -183,6 +186,11 @@ static inline struct dentry *debugfs_create_blob(const char *name, mode_t mode,
return ERR_PTR(-ENODEV);
}
static inline bool debugfs_initialized(void)
{
return false;
}
#endif
#endif
......@@ -2,7 +2,7 @@
#define _LINUX_FTRACE_IRQ_H
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_FUNCTION_GRAPH_TRACER)
#ifdef CONFIG_FTRACE_NMI_ENTER
extern void ftrace_nmi_enter(void);
extern void ftrace_nmi_exit(void);
#else
......
......@@ -15,55 +15,61 @@
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
*
* The hardirq count can be overridden per architecture, the default is:
* The hardirq count can in theory reach the same as NR_IRQS.
* In reality, the number of nested IRQS is limited to the stack
* size as well. For archs with over 1000 IRQS it is not practical
* to expect that they will all nest. We give a max of 10 bits for
* hardirq nesting. An arch may choose to give less than 10 bits.
* m68k expects it to be 8.
*
* - bits 16-27 are the hardirq count (max # of hardirqs: 4096)
* - ( bit 28 is the PREEMPT_ACTIVE flag. )
* - bits 16-25 are the hardirq count (max # of nested hardirqs: 1024)
* - bit 26 is the NMI_MASK
* - bit 28 is the PREEMPT_ACTIVE flag
*
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_MASK: 0x0fff0000
* HARDIRQ_MASK: 0x03ff0000
* NMI_MASK: 0x04000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define NMI_BITS 1
#ifndef HARDIRQ_BITS
#define HARDIRQ_BITS 12
#define MAX_HARDIRQ_BITS 10
#ifndef MAX_HARDIRQS_PER_CPU
#define MAX_HARDIRQS_PER_CPU NR_IRQS
#ifndef HARDIRQ_BITS
# define HARDIRQ_BITS MAX_HARDIRQ_BITS
#endif
/*
* The hardirq mask has to be large enough to have space for potentially
* all IRQ sources in the system nesting on a single CPU.
*/
#if (1 << HARDIRQ_BITS) < MAX_HARDIRQS_PER_CPU
# error HARDIRQ_BITS is too low!
#endif
#if HARDIRQ_BITS > MAX_HARDIRQ_BITS
#error HARDIRQ_BITS too high!
#endif
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
#define __IRQ_MASK(x) ((1UL << (x))-1)
#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET (1UL << NMI_SHIFT)
#if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
#if PREEMPT_ACTIVE < (1 << (NMI_SHIFT + NMI_BITS))
#error PREEMPT_ACTIVE is too low!
#endif
#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
| NMI_MASK))
/*
* Are we doing bottom half or hardware interrupt processing?
......@@ -73,6 +79,11 @@
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
/*
* Are we in NMI context?
*/
#define in_nmi() (preempt_count() & NMI_MASK)
#if defined(CONFIG_PREEMPT)
# define PREEMPT_INATOMIC_BASE kernel_locked()
# define PREEMPT_CHECK_OFFSET 1
......@@ -164,20 +175,24 @@ extern void irq_enter(void);
*/
extern void irq_exit(void);
#define nmi_enter() \
do { \
ftrace_nmi_enter(); \
lockdep_off(); \
rcu_nmi_enter(); \
__irq_enter(); \
#define nmi_enter() \
do { \
ftrace_nmi_enter(); \
BUG_ON(in_nmi()); \
add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
lockdep_off(); \
rcu_nmi_enter(); \
trace_hardirq_enter(); \
} while (0)
#define nmi_exit() \
do { \
__irq_exit(); \
rcu_nmi_exit(); \
lockdep_on(); \
ftrace_nmi_exit(); \
#define nmi_exit() \
do { \
trace_hardirq_exit(); \
rcu_nmi_exit(); \
lockdep_on(); \
BUG_ON(!in_nmi()); \
sub_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit(); \
} while (0)
#endif /* LINUX_HARDIRQ_H */
......@@ -278,6 +278,11 @@ enum
NR_SOFTIRQS
};
/* map softirq index to softirq name. update 'softirq_to_name' in
* kernel/softirq.c when adding a new softirq.
*/
extern char *softirq_to_name[NR_SOFTIRQS];
/* softirq mask and active fields moved to irq_cpustat_t in
* asm/hardirq.h to get better cache usage. KAO
*/
......
......@@ -391,6 +391,139 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
({ if (0) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); 0; })
#endif
/*
* General tracing related utility functions - trace_printk(),
* tracing_on/tracing_off and tracing_start()/tracing_stop
*
* Use tracing_on/tracing_off when you want to quickly turn on or off
* tracing. It simply enables or disables the recording of the trace events.
* This also corresponds to the user space debugfs/tracing/tracing_on
* file, which gives a means for the kernel and userspace to interact.
* Place a tracing_off() in the kernel where you want tracing to end.
* From user space, examine the trace, and then echo 1 > tracing_on
* to continue tracing.
*
* tracing_stop/tracing_start has slightly more overhead. It is used
* by things like suspend to ram where disabling the recording of the
* trace is not enough, but tracing must actually stop because things
* like calling smp_processor_id() may crash the system.
*
* Most likely, you want to use tracing_on/tracing_off.
*/
#ifdef CONFIG_RING_BUFFER
void tracing_on(void);
void tracing_off(void);
/* trace_off_permanent stops recording with no way to bring it back */
void tracing_off_permanent(void);
int tracing_is_on(void);
#else
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline void tracing_off_permanent(void) { }
static inline int tracing_is_on(void) { return 0; }
#endif
#ifdef CONFIG_TRACING
extern void tracing_start(void);
extern void tracing_stop(void);
extern void ftrace_off_permanent(void);
extern void
ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3);
static inline void __attribute__ ((format (printf, 1, 2)))
____trace_printk_check_format(const char *fmt, ...)
{
}
#define __trace_printk_check_format(fmt, args...) \
do { \
if (0) \
____trace_printk_check_format(fmt, ##args); \
} while (0)
/**
* trace_printk - printf formatting in the ftrace buffer
* @fmt: the printf format for printing
*
* Note: __trace_printk is an internal function for trace_printk and
* the @ip is passed in via the trace_printk macro.
*
* This function allows a kernel developer to debug fast path sections
* that printk is not appropriate for. By scattering in various
* printk like tracing in the code, a developer can quickly see
* where problems are occurring.
*
* This is intended as a debugging tool for the developer only.
* Please refrain from leaving trace_printks scattered around in
* your code.
*/
#define trace_printk(fmt, args...) \
do { \
__trace_printk_check_format(fmt, ##args); \
if (__builtin_constant_p(fmt)) { \
static const char *trace_printk_fmt \
__attribute__((section("__trace_printk_fmt"))) = \
__builtin_constant_p(fmt) ? fmt : NULL; \
\
__trace_bprintk(_THIS_IP_, trace_printk_fmt, ##args); \
} else \
__trace_printk(_THIS_IP_, fmt, ##args); \
} while (0)
extern int
__trace_bprintk(unsigned long ip, const char *fmt, ...)
__attribute__ ((format (printf, 2, 3)));
extern int
__trace_printk(unsigned long ip, const char *fmt, ...)
__attribute__ ((format (printf, 2, 3)));
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
* constant. Even with the outer if statement.
*/
#define ftrace_vprintk(fmt, vargs) \
do { \
if (__builtin_constant_p(fmt)) { \
static const char *trace_printk_fmt \
__attribute__((section("__trace_printk_fmt"))) = \
__builtin_constant_p(fmt) ? fmt : NULL; \
\
__ftrace_vbprintk(_THIS_IP_, trace_printk_fmt, vargs); \
} else \
__ftrace_vprintk(_THIS_IP_, fmt, vargs); \
} while (0)
extern int
__ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
extern int
__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
extern void ftrace_dump(void);
#else
static inline void
ftrace_special(unsigned long arg1, unsigned long arg2, unsigned long arg3) { }
static inline int
trace_printk(const char *fmt, ...) __attribute__ ((format (printf, 1, 2)));
static inline void tracing_start(void) { }
static inline void tracing_stop(void) { }
static inline void ftrace_off_permanent(void) { }
static inline int
trace_printk(const char *fmt, ...)
{
return 0;
}
static inline int
ftrace_vprintk(const char *fmt, va_list ap)
{
return 0;
}
static inline void ftrace_dump(void) { }
#endif /* CONFIG_TRACING */
/*
* Display an IP address in readable format.
*/
......
......@@ -110,4 +110,10 @@ struct memory_accessor {
off_t offset, size_t count);
};
/*
* Kernel text modification mutex, used for code patching. Users of this lock
* can sleep.
*/
extern struct mutex text_mutex;
#endif /* _LINUX_MEMORY_H_ */
......@@ -333,6 +333,11 @@ struct module
unsigned int num_tracepoints;
#endif
#ifdef CONFIG_TRACING
const char **trace_bprintk_fmt_start;
unsigned int num_trace_bprintk_fmt;
#endif
#ifdef CONFIG_MODULE_UNLOAD
/* What modules depend on me? */
struct list_head modules_which_use_me;
......
......@@ -8,7 +8,7 @@ struct ring_buffer;
struct ring_buffer_iter;
/*
* Don't reference this struct directly, use functions below.
* Don't refer to this struct directly, use functions below.
*/
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
......@@ -18,10 +18,13 @@ struct ring_buffer_event {
/**
* enum ring_buffer_type - internal ring buffer types
*
* @RINGBUF_TYPE_PADDING: Left over page padding
* array is ignored
* size is variable depending on how much
* @RINGBUF_TYPE_PADDING: Left over page padding or discarded event
* If time_delta is 0:
* array is ignored
* size is variable depending on how much
* padding is needed
* If time_delta is non zero:
* everything else same as RINGBUF_TYPE_DATA
*
* @RINGBUF_TYPE_TIME_EXTEND: Extend the time delta
* array[0] = time delta (28 .. 59)
......@@ -65,6 +68,8 @@ ring_buffer_event_time_delta(struct ring_buffer_event *event)
return event->time_delta;
}
void ring_buffer_event_discard(struct ring_buffer_event *event);
/*
* size is in bytes for each per CPU buffer.
*/
......@@ -74,13 +79,10 @@ void ring_buffer_free(struct ring_buffer *buffer);
int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
struct ring_buffer_event *
ring_buffer_lock_reserve(struct ring_buffer *buffer,
unsigned long length,
unsigned long *flags);
struct ring_buffer_event *ring_buffer_lock_reserve(struct ring_buffer *buffer,
unsigned long length);
int ring_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags);
struct ring_buffer_event *event);
int ring_buffer_write(struct ring_buffer *buffer,
unsigned long length, void *data);
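The hunk above drops the irq-flags argument from ring_buffer_lock_reserve()/ring_buffer_unlock_commit(); the interrupt state is now tracked inside the ring buffer itself. A hedged sketch of the updated calling sequence, assuming a buffer obtained from ring_buffer_alloc() and a caller-defined record type:

struct my_record {
	u32 value;
};

static int my_log_value(struct ring_buffer *buffer, u32 value)
{
	struct ring_buffer_event *event;
	struct my_record *entry;

	/* No flags pointer is passed any more. */
	event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
	if (!event)
		return -EBUSY;

	entry = ring_buffer_event_data(event);
	entry->value = value;

	/* The commit makes the event visible to readers. */
	return ring_buffer_unlock_commit(buffer, event);
}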
......@@ -121,17 +123,19 @@ unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
u64 ring_buffer_time_stamp(int cpu);
void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu);
void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
int cpu, u64 *ts);
void ring_buffer_set_clock(struct ring_buffer *buffer,
u64 (*clock)(void));
size_t ring_buffer_page_len(void *page);
void tracing_on(void);
void tracing_off(void);
void tracing_off_permanent(void);
void *ring_buffer_alloc_read_page(struct ring_buffer *buffer);
void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data);
int ring_buffer_read_page(struct ring_buffer *buffer,
void **data_page, int cpu, int full);
int ring_buffer_read_page(struct ring_buffer *buffer, void **data_page,
size_t len, int cpu, int full);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
......
......@@ -138,6 +138,8 @@ extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_active(void);
extern unsigned long nr_iowait(void);
extern unsigned long get_parent_ip(unsigned long addr);
struct seq_file;
struct cfs_rq;
struct task_group;
......@@ -1405,6 +1407,8 @@ struct task_struct {
int curr_ret_stack;
/* Stack of return addresses for return function tracing */
struct ftrace_ret_stack *ret_stack;
/* time stamp for last schedule */
unsigned long long ftrace_timestamp;
/*
* Number of functions that haven't been traced
* because of depth overrun.
......
......@@ -14,6 +14,7 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
#include <trace/kmemtrace.h>
/* Size description struct for general caches. */
struct cache_sizes {
......@@ -28,8 +29,26 @@ extern struct cache_sizes malloc_sizes[];
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);
static inline void *kmalloc(size_t size, gfp_t flags)
#ifdef CONFIG_KMEMTRACE
extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags);
extern size_t slab_buffer_size(struct kmem_cache *cachep);
#else
static __always_inline void *
kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags)
{
return kmem_cache_alloc(cachep, flags);
}
static inline size_t slab_buffer_size(struct kmem_cache *cachep)
{
return 0;
}
#endif
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
struct kmem_cache *cachep;
void *ret;
if (__builtin_constant_p(size)) {
int i = 0;
......@@ -47,10 +66,17 @@ static inline void *kmalloc(size_t size, gfp_t flags)
found:
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep,
flags);
cachep = malloc_sizes[i].cs_dmacachep;
else
#endif
return kmem_cache_alloc(malloc_sizes[i].cs_cachep, flags);
cachep = malloc_sizes[i].cs_cachep;
ret = kmem_cache_alloc_notrace(cachep, flags);
kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
size, slab_buffer_size(cachep), flags);
return ret;
}
return __kmalloc(size, flags);
}
......@@ -59,8 +85,25 @@ static inline void *kmalloc(size_t size, gfp_t flags)
extern void *__kmalloc_node(size_t size, gfp_t flags, int node);
extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
#ifdef CONFIG_KMEMTRACE
extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep,
gfp_t flags,
int nodeid);
#else
static __always_inline void *
kmem_cache_alloc_node_notrace(struct kmem_cache *cachep,
gfp_t flags,
int nodeid)
{
return kmem_cache_alloc_node(cachep, flags, nodeid);
}
#endif
static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
struct kmem_cache *cachep;
void *ret;
if (__builtin_constant_p(size)) {
int i = 0;
......@@ -78,11 +121,18 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
found:
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
return kmem_cache_alloc_node(malloc_sizes[i].cs_dmacachep,
flags, node);
cachep = malloc_sizes[i].cs_dmacachep;
else
#endif
return kmem_cache_alloc_node(malloc_sizes[i].cs_cachep,
flags, node);
cachep = malloc_sizes[i].cs_cachep;
ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
ret, size, slab_buffer_size(cachep),
flags, node);
return ret;
}
return __kmalloc_node(size, flags, node);
}
......
......@@ -3,14 +3,15 @@
void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
static inline void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
static __always_inline void *kmem_cache_alloc(struct kmem_cache *cachep,
gfp_t flags)
{
return kmem_cache_alloc_node(cachep, flags, -1);
}
void *__kmalloc_node(size_t size, gfp_t flags, int node);
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
return __kmalloc_node(size, flags, node);
}
......@@ -23,12 +24,12 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
* kmalloc is the normal method of allocating memory
* in the kernel.
*/
static inline void *kmalloc(size_t size, gfp_t flags)
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
return __kmalloc_node(size, flags, -1);
}
static inline void *__kmalloc(size_t size, gfp_t flags)
static __always_inline void *__kmalloc(size_t size, gfp_t flags)
{
return kmalloc(size, flags);
}
......
......@@ -10,6 +10,7 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
#include <trace/kmemtrace.h>
enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
......@@ -217,13 +218,31 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);
#ifdef CONFIG_KMEMTRACE
extern void *kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags);
#else
static __always_inline void *
kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags)
{
return kmem_cache_alloc(s, gfpflags);
}
#endif
static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
{
return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size));
unsigned int order = get_order(size);
void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);
kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
size, PAGE_SIZE << order, flags);
return ret;
}
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
void *ret;
if (__builtin_constant_p(size)) {
if (size > SLUB_MAX_SIZE)
return kmalloc_large(size, flags);
......@@ -234,7 +253,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
if (!s)
return ZERO_SIZE_PTR;
return kmem_cache_alloc(s, flags);
ret = kmem_cache_alloc_notrace(s, flags);
kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
_THIS_IP_, ret,
size, s->size, flags);
return ret;
}
}
return __kmalloc(size, flags);
......@@ -244,8 +269,24 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
void *__kmalloc_node(size_t size, gfp_t flags, int node);
void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
#ifdef CONFIG_KMEMTRACE
extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *s,
gfp_t gfpflags,
int node);
#else
static __always_inline void *
kmem_cache_alloc_node_notrace(struct kmem_cache *s,
gfp_t gfpflags,
int node)
{
return kmem_cache_alloc_node(s, gfpflags, node);
}
#endif
static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
void *ret;
if (__builtin_constant_p(size) &&
size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
struct kmem_cache *s = kmalloc_slab(size);
......@@ -253,7 +294,13 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
if (!s)
return ZERO_SIZE_PTR;
return kmem_cache_alloc_node(s, flags, node);
ret = kmem_cache_alloc_node_notrace(s, flags, node);
kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
_THIS_IP_, ret,
size, s->size, flags, node);
return ret;
}
return __kmalloc_node(size, flags, node);
}
......
......@@ -65,6 +65,7 @@ struct old_linux_dirent;
#include <asm/signal.h>
#include <linux/quota.h>
#include <linux/key.h>
#include <linux/ftrace.h>
#define __SC_DECL1(t1, a1) t1 a1
#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
......@@ -95,7 +96,46 @@ struct old_linux_dirent;
#define __SC_TEST5(t5, a5, ...) __SC_TEST(t5); __SC_TEST4(__VA_ARGS__)
#define __SC_TEST6(t6, a6, ...) __SC_TEST(t6); __SC_TEST5(__VA_ARGS__)
#ifdef CONFIG_FTRACE_SYSCALLS
#define __SC_STR_ADECL1(t, a) #a
#define __SC_STR_ADECL2(t, a, ...) #a, __SC_STR_ADECL1(__VA_ARGS__)
#define __SC_STR_ADECL3(t, a, ...) #a, __SC_STR_ADECL2(__VA_ARGS__)
#define __SC_STR_ADECL4(t, a, ...) #a, __SC_STR_ADECL3(__VA_ARGS__)
#define __SC_STR_ADECL5(t, a, ...) #a, __SC_STR_ADECL4(__VA_ARGS__)
#define __SC_STR_ADECL6(t, a, ...) #a, __SC_STR_ADECL5(__VA_ARGS__)
#define __SC_STR_TDECL1(t, a) #t
#define __SC_STR_TDECL2(t, a, ...) #t, __SC_STR_TDECL1(__VA_ARGS__)
#define __SC_STR_TDECL3(t, a, ...) #t, __SC_STR_TDECL2(__VA_ARGS__)
#define __SC_STR_TDECL4(t, a, ...) #t, __SC_STR_TDECL3(__VA_ARGS__)
#define __SC_STR_TDECL5(t, a, ...) #t, __SC_STR_TDECL4(__VA_ARGS__)
#define __SC_STR_TDECL6(t, a, ...) #t, __SC_STR_TDECL5(__VA_ARGS__)
#define SYSCALL_METADATA(sname, nb) \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
__syscall_meta_##sname = { \
.name = "sys"#sname, \
.nb_args = nb, \
.types = types_##sname, \
.args = args_##sname, \
}
#define SYSCALL_DEFINE0(sname) \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
__syscall_meta_##sname = { \
.name = "sys_"#sname, \
.nb_args = 0, \
}; \
asmlinkage long sys_##sname(void)
#else
#define SYSCALL_DEFINE0(name) asmlinkage long sys_##name(void)
#endif
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
......@@ -117,10 +157,26 @@ struct old_linux_dirent;
#endif
#endif
#ifdef CONFIG_FTRACE_SYSCALLS
#define SYSCALL_DEFINEx(x, sname, ...) \
static const char *types_##sname[] = { \
__SC_STR_TDECL##x(__VA_ARGS__) \
}; \
static const char *args_##sname[] = { \
__SC_STR_ADECL##x(__VA_ARGS__) \
}; \
SYSCALL_METADATA(sname, x); \
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#else
#define SYSCALL_DEFINEx(x, sname, ...) \
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#endif
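To make the CONFIG_FTRACE_SYSCALLS branch above concrete: a one-argument syscall defined through these macros also emits name/type/argument metadata into the __syscalls_metadata section for the syscall tracer. Roughly (the syscall itself is invented for illustration, and the expansion is abbreviated):

SYSCALL_DEFINE1(mydemo, unsigned int, flags)
{
	/*
	 * Expands (roughly) to:
	 *   static const char *types__mydemo[] = { "unsigned int" };
	 *   static const char *args__mydemo[]  = { "flags" };
	 *   SYSCALL_METADATA(_mydemo, 1);
	 *   ... asmlinkage long sys_mydemo(unsigned int flags) ...
	 */
	return 0;
}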
#ifdef CONFIG_HAVE_SYSCALL_WRAPPERS
#define SYSCALL_DEFINE(name) static inline long SYSC_##name
#define SYSCALL_DEFINEx(x, name, ...) \
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__)); \
static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__)); \
asmlinkage long SyS##name(__SC_LONG##x(__VA_ARGS__)) \
......@@ -134,7 +190,7 @@ struct old_linux_dirent;
#else /* CONFIG_HAVE_SYSCALL_WRAPPERS */
#define SYSCALL_DEFINE(name) asmlinkage long sys_##name
#define SYSCALL_DEFINEx(x, name, ...) \
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))
#endif /* CONFIG_HAVE_SYSCALL_WRAPPERS */
......
#ifndef _LINUX_TRACE_CLOCK_H
#define _LINUX_TRACE_CLOCK_H
/*
* 3 trace clock variants, with differing scalability/precision
* tradeoffs:
*
* - local: CPU-local trace clock
* - medium: scalable global clock with some jitter
* - global: globally monotonic, serialized clock
*/
#include <linux/compiler.h>
#include <linux/types.h>
extern u64 notrace trace_clock_local(void);
extern u64 notrace trace_clock(void);
extern u64 notrace trace_clock_global(void);
#endif /* _LINUX_TRACE_CLOCK_H */
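The three variants trade accuracy for scalability; a tracer that needs globally ordered timestamps can install the global clock on its ring buffer through ring_buffer_set_clock(), declared earlier in this series. A minimal sketch, assuming the buffer is created here in a hypothetical tracer init function:

#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>

static struct ring_buffer *my_buffer;

static int __init my_tracer_init(void)
{
	my_buffer = ring_buffer_alloc(PAGE_SIZE, RB_FL_OVERWRITE);
	if (!my_buffer)
		return -ENOMEM;

	/*
	 * trace_clock_local() is cheaper but may drift between CPUs;
	 * the global clock is serialized and monotonic across CPUs.
	 */
	ring_buffer_set_clock(my_buffer, trace_clock_global);

	return 0;
}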
......@@ -31,8 +31,8 @@ struct tracepoint {
* Keep in sync with vmlinux.lds.h.
*/
#define TPPROTO(args...) args
#define TPARGS(args...) args
#define TP_PROTO(args...) args
#define TP_ARGS(args...) args
#ifdef CONFIG_TRACEPOINTS
......@@ -65,7 +65,7 @@ struct tracepoint {
{ \
if (unlikely(__tracepoint_##name.state)) \
__DO_TRACE(&__tracepoint_##name, \
TPPROTO(proto), TPARGS(args)); \
TP_PROTO(proto), TP_ARGS(args)); \
} \
static inline int register_trace_##name(void (*probe)(proto)) \
{ \
......@@ -153,4 +153,114 @@ static inline void tracepoint_synchronize_unregister(void)
synchronize_sched();
}
#define PARAMS(args...) args
#define TRACE_FORMAT(name, proto, args, fmt) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
/*
* For use with the TRACE_EVENT macro:
*
* We define a tracepoint, its arguments, its printk format
* and its 'fast binary record' layout.
*
* Firstly, name your tracepoint via TRACE_EVENT(name, ...); the
* 'subsystem_event' notation is fine.
*
* Think about this whole construct as the
* 'trace_sched_switch() function' from now on.
*
*
* TRACE_EVENT(sched_switch,
*
* *
* * A function has a regular function arguments
* * prototype, declare it via TP_PROTO():
* *
*
* TP_PROTO(struct rq *rq, struct task_struct *prev,
* struct task_struct *next),
*
* *
* * Define the call signature of the 'function'.
* * (Design sidenote: we use this instead of a
* * TP_PROTO1/TP_PROTO2/TP_PROTO3 ugliness.)
* *
*
* TP_ARGS(rq, prev, next),
*
* *
* * Fast binary tracing: define the trace record via
* * TP_STRUCT__entry(). You can think about it like a
* * regular C structure local variable definition.
* *
* * This is how the trace record is structured and will
* * be saved into the ring buffer. These are the fields
* * that will be exposed to user-space in
* * /debug/tracing/events/<*>/format.
* *
* * The declared 'local variable' is called '__entry'
* *
* * __field(pid_t, prev_pid) is equivalent to a standard declaration:
* *
* * pid_t prev_pid;
* *
* * __array(char, prev_comm, TASK_COMM_LEN) is equivalent to:
* *
* * char prev_comm[TASK_COMM_LEN];
* *
*
* TP_STRUCT__entry(
* __array( char, prev_comm, TASK_COMM_LEN )
* __field( pid_t, prev_pid )
* __field( int, prev_prio )
* __array( char, next_comm, TASK_COMM_LEN )
* __field( pid_t, next_pid )
* __field( int, next_prio )
* ),
*
* *
* * Assign the entry into the trace record, by embedding
* * a full C statement block into TP_fast_assign(). You
* * can refer to the trace record as '__entry' -
* * otherwise you can put arbitrary C code in here.
* *
* * Note: this C code will execute every time a trace event
* * happens, on an active tracepoint.
* *
*
* TP_fast_assign(
* memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
* __entry->prev_pid = prev->pid;
* __entry->prev_prio = prev->prio;
* memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
* __entry->next_pid = next->pid;
* __entry->next_prio = next->prio;
* )
*
* *
* * Formatted output of a trace record via TP_printk().
* * This is how the tracepoint will appear under ftrace
* * plugins that make use of this tracepoint.
* *
* * (raw-binary tracing won't actually perform this step.)
* *
*
* TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
* __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
* __entry->next_comm, __entry->next_pid, __entry->next_prio),
*
* );
*
* This macro construct is thus used for the regular printk-format
* tracing setup; it is used to construct a function-pointer-based
* tracepoint callback (this is used by programmatic plugins and
* can also be used by generic instrumentation like SystemTap), and
* it is also used to expose a structured trace record in
* /debug/tracing/events/.
*/
#define TRACE_EVENT(name, proto, args, struct, assign, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#endif
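Once a tracepoint is declared through TRACE_EVENT()/DECLARE_TRACE(), the instrumented code fires it via the generated trace_<name>() stub and consumers attach callbacks with register_trace_<name>(); roughly, for the sched_switch example documented above (the probe module is hypothetical):

/* At the instrumented site, inside the scheduler: */
trace_sched_switch(rq, prev, next);

/* In a tracer, attaching a probe with the signature from TP_PROTO(): */
static void probe_sched_switch(struct rq *rq, struct task_struct *prev,
			       struct task_struct *next)
{
	/* inspect prev/next here */
}

static int __init my_probe_init(void)
{
	return register_trace_sched_switch(probe_sched_switch);
}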
......@@ -5,72 +5,72 @@
#include <linux/tracepoint.h>
DECLARE_TRACE(block_rq_abort,
TPPROTO(struct request_queue *q, struct request *rq),
TPARGS(q, rq));
TP_PROTO(struct request_queue *q, struct request *rq),
TP_ARGS(q, rq));
DECLARE_TRACE(block_rq_insert,
TPPROTO(struct request_queue *q, struct request *rq),
TPARGS(q, rq));
TP_PROTO(struct request_queue *q, struct request *rq),
TP_ARGS(q, rq));
DECLARE_TRACE(block_rq_issue,
TPPROTO(struct request_queue *q, struct request *rq),
TPARGS(q, rq));
TP_PROTO(struct request_queue *q, struct request *rq),
TP_ARGS(q, rq));
DECLARE_TRACE(block_rq_requeue,
TPPROTO(struct request_queue *q, struct request *rq),
TPARGS(q, rq));
TP_PROTO(struct request_queue *q, struct request *rq),
TP_ARGS(q, rq));
DECLARE_TRACE(block_rq_complete,
TPPROTO(struct request_queue *q, struct request *rq),
TPARGS(q, rq));
TP_PROTO(struct request_queue *q, struct request *rq),
TP_ARGS(q, rq));
DECLARE_TRACE(block_bio_bounce,
TPPROTO(struct request_queue *q, struct bio *bio),
TPARGS(q, bio));
TP_PROTO(struct request_queue *q, struct bio *bio),
TP_ARGS(q, bio));
DECLARE_TRACE(block_bio_complete,
TPPROTO(struct request_queue *q, struct bio *bio),
TPARGS(q, bio));
TP_PROTO(struct request_queue *q, struct bio *bio),
TP_ARGS(q, bio));
DECLARE_TRACE(block_bio_backmerge,
TPPROTO(struct request_queue *q, struct bio *bio),
TPARGS(q, bio));
TP_PROTO(struct request_queue *q, struct bio *bio),
TP_ARGS(q, bio));
DECLARE_TRACE(block_bio_frontmerge,
TPPROTO(struct request_queue *q, struct bio *bio),
TPARGS(q, bio));
TP_PROTO(struct request_queue *q, struct bio *bio),
TP_ARGS(q, bio));
DECLARE_TRACE(block_bio_queue,
TPPROTO(struct request_queue *q, struct bio *bio),
TPARGS(q, bio));
TP_PROTO(struct request_queue *q, struct bio *bio),
TP_ARGS(q, bio));
DECLARE_TRACE(block_getrq,
TPPROTO(struct request_queue *q, struct bio *bio, int rw),
TPARGS(q, bio, rw));
TP_PROTO(struct request_queue *q, struct bio *bio, int rw),
TP_ARGS(q, bio, rw));
DECLARE_TRACE(block_sleeprq,
TPPROTO(struct request_queue *q, struct bio *bio, int rw),
TPARGS(q, bio, rw));
TP_PROTO(struct request_queue *q, struct bio *bio, int rw),
TP_ARGS(q, bio, rw));
DECLARE_TRACE(block_plug,
TPPROTO(struct request_queue *q),
TPARGS(q));
TP_PROTO(struct request_queue *q),
TP_ARGS(q));
DECLARE_TRACE(block_unplug_timer,
TPPROTO(struct request_queue *q),
TPARGS(q));
TP_PROTO(struct request_queue *q),
TP_ARGS(q));
DECLARE_TRACE(block_unplug_io,
TPPROTO(struct request_queue *q),
TPARGS(q));
TP_PROTO(struct request_queue *q),
TP_ARGS(q));
DECLARE_TRACE(block_split,
TPPROTO(struct request_queue *q, struct bio *bio, unsigned int pdu),
TPARGS(q, bio, pdu));
TP_PROTO(struct request_queue *q, struct bio *bio, unsigned int pdu),
TP_ARGS(q, bio, pdu));
DECLARE_TRACE(block_remap,
TPPROTO(struct request_queue *q, struct bio *bio, dev_t dev,
sector_t from, sector_t to),
TPARGS(q, bio, dev, from, to));
TP_PROTO(struct request_queue *q, struct bio *bio, dev_t dev,
sector_t from, sector_t to),
TP_ARGS(q, bio, dev, from, to));
#endif
#ifndef _TRACE_IRQ_H
#define _TRACE_IRQ_H
#include <linux/interrupt.h>
#include <linux/tracepoint.h>
#include <trace/irq_event_types.h>
#endif
/* use <trace/irq.h> instead */
#ifndef TRACE_FORMAT
# error Do not include this file directly.
# error Unless you know what you are doing.
#endif
#undef TRACE_SYSTEM
#define TRACE_SYSTEM irq
/*
* Tracepoint for entry of interrupt handler:
*/
TRACE_FORMAT(irq_handler_entry,
TP_PROTO(int irq, struct irqaction *action),
TP_ARGS(irq, action),
TP_FMT("irq=%d handler=%s", irq, action->name)
);
/*
* Tracepoint for return of an interrupt handler:
*/
TRACE_EVENT(irq_handler_exit,
TP_PROTO(int irq, struct irqaction *action, int ret),
TP_ARGS(irq, action, ret),
TP_STRUCT__entry(
__field( int, irq )
__field( int, ret )
),
TP_fast_assign(
__entry->irq = irq;
__entry->ret = ret;
),
TP_printk("irq=%d return=%s",
__entry->irq, __entry->ret ? "handled" : "unhandled")
);
TRACE_FORMAT(softirq_entry,
TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
TP_ARGS(h, vec),
TP_FMT("softirq=%d action=%s", (int)(h - vec), softirq_to_name[h-vec])
);
TRACE_FORMAT(softirq_exit,
TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
TP_ARGS(h, vec),
TP_FMT("softirq=%d action=%s", (int)(h - vec), softirq_to_name[h-vec])
);
#undef TRACE_SYSTEM
/*
* Copyright (C) 2008 Eduard - Gabriel Munteanu
*
* This file is released under GPL version 2.
*/
#ifndef _LINUX_KMEMTRACE_H
#define _LINUX_KMEMTRACE_H
#ifdef __KERNEL__
#include <linux/types.h>
#include <linux/marker.h>
enum kmemtrace_type_id {
KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */
KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
};
#ifdef CONFIG_KMEMTRACE
extern void kmemtrace_init(void);
extern void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr,
size_t bytes_req,
size_t bytes_alloc,
gfp_t gfp_flags,
int node);
extern void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr);
#else /* CONFIG_KMEMTRACE */
static inline void kmemtrace_init(void)
{
}
static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr,
size_t bytes_req,
size_t bytes_alloc,
gfp_t gfp_flags,
int node)
{
}
static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr)
{
}
#endif /* CONFIG_KMEMTRACE */
static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
unsigned long call_site,
const void *ptr,
size_t bytes_req,
size_t bytes_alloc,
gfp_t gfp_flags)
{
kmemtrace_mark_alloc_node(type_id, call_site, ptr,
bytes_req, bytes_alloc, gfp_flags, -1);
}
#endif /* __KERNEL__ */
#endif /* _LINUX_KMEMTRACE_H */
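For completeness, this is roughly how an allocator hooks into the API above; the slab_def.h and slub_def.h hunks earlier in this patch do the same for kmalloc(). The wrapper functions here are illustrative only:

static void *my_traced_alloc(struct kmem_cache *cachep, size_t size,
			     gfp_t flags)
{
	/* The _notrace variant avoids double-accounting the event. */
	void *ret = kmem_cache_alloc_notrace(cachep, flags);

	kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
			     size, size, flags);
	return ret;
}

static void my_traced_free(struct kmem_cache *cachep, void *objp)
{
	kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
	kmem_cache_free(cachep, objp);
}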
#ifndef _TRACE_LOCKDEP_H
#define _TRACE_LOCKDEP_H
#include <linux/lockdep.h>
#include <linux/tracepoint.h>
#include <trace/lockdep_event_types.h>
#endif
#ifndef TRACE_FORMAT
# error Do not include this file directly.
# error Unless you know what you are doing.
#endif
#undef TRACE_SYSTEM
#define TRACE_SYSTEM lock
#ifdef CONFIG_LOCKDEP
TRACE_FORMAT(lock_acquire,
TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check,
struct lockdep_map *next_lock, unsigned long ip),
TP_ARGS(lock, subclass, trylock, read, check, next_lock, ip),
TP_FMT("%s%s%s", trylock ? "try " : "",
read ? "read " : "", lock->name)
);
TRACE_FORMAT(lock_release,
TP_PROTO(struct lockdep_map *lock, int nested, unsigned long ip),
TP_ARGS(lock, nested, ip),
TP_FMT("%s", lock->name)
);
#ifdef CONFIG_LOCK_STAT
TRACE_FORMAT(lock_contended,
TP_PROTO(struct lockdep_map *lock, unsigned long ip),
TP_ARGS(lock, ip),
TP_FMT("%s", lock->name)
);
TRACE_FORMAT(lock_acquired,
TP_PROTO(struct lockdep_map *lock, unsigned long ip),
TP_ARGS(lock, ip),
TP_FMT("%s", lock->name)
);
#endif
#endif
#undef TRACE_SYSTEM
#ifndef _TRACE_POWER_H
#define _TRACE_POWER_H
#include <linux/ktime.h>
#include <linux/tracepoint.h>
enum {
POWER_NONE = 0,
POWER_CSTATE = 1,
POWER_PSTATE = 2,
};
struct power_trace {
ktime_t stamp;
ktime_t end;
int type;
int state;
};
DECLARE_TRACE(power_start,
TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
TP_ARGS(it, type, state));
DECLARE_TRACE(power_mark,
TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
TP_ARGS(it, type, state));
DECLARE_TRACE(power_end,
TP_PROTO(struct power_trace *it),
TP_ARGS(it));
#endif /* _TRACE_POWER_H */
......@@ -4,53 +4,6 @@
#include <linux/sched.h>
#include <linux/tracepoint.h>
DECLARE_TRACE(sched_kthread_stop,
TPPROTO(struct task_struct *t),
TPARGS(t));
DECLARE_TRACE(sched_kthread_stop_ret,
TPPROTO(int ret),
TPARGS(ret));
DECLARE_TRACE(sched_wait_task,
TPPROTO(struct rq *rq, struct task_struct *p),
TPARGS(rq, p));
DECLARE_TRACE(sched_wakeup,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success));
DECLARE_TRACE(sched_wakeup_new,
TPPROTO(struct rq *rq, struct task_struct *p, int success),
TPARGS(rq, p, success));
DECLARE_TRACE(sched_switch,
TPPROTO(struct rq *rq, struct task_struct *prev,
struct task_struct *next),
TPARGS(rq, prev, next));
DECLARE_TRACE(sched_migrate_task,
TPPROTO(struct task_struct *p, int orig_cpu, int dest_cpu),
TPARGS(p, orig_cpu, dest_cpu));
DECLARE_TRACE(sched_process_free,
TPPROTO(struct task_struct *p),
TPARGS(p));
DECLARE_TRACE(sched_process_exit,
TPPROTO(struct task_struct *p),
TPARGS(p));
DECLARE_TRACE(sched_process_wait,
TPPROTO(struct pid *pid),
TPARGS(pid));
DECLARE_TRACE(sched_process_fork,
TPPROTO(struct task_struct *parent, struct task_struct *child),
TPARGS(parent, child));
DECLARE_TRACE(sched_signal_send,
TPPROTO(int sig, struct task_struct *p),
TPARGS(sig, p));
#include <trace/sched_event_types.h>
#endif
......@@ -5,7 +5,7 @@
#include <linux/tracepoint.h>
DECLARE_TRACE(kfree_skb,
TPPROTO(struct sk_buff *skb, void *location),
TPARGS(skb, location));
TP_PROTO(struct sk_buff *skb, void *location),
TP_ARGS(skb, location));
#endif
/* trace/<type>_event_types.h here */
#include <trace/sched_event_types.h>
#include <trace/irq_event_types.h>
#include <trace/lockdep_event_types.h>
/* trace/<type>.h here */
#include <trace/sched.h>
#include <trace/irq.h>
#include <trace/lockdep.h>
#ifndef __TRACE_WORKQUEUE_H
#define __TRACE_WORKQUEUE_H
#include <linux/tracepoint.h>
#include <linux/workqueue.h>
#include <linux/sched.h>
DECLARE_TRACE(workqueue_insertion,
TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),
TP_ARGS(wq_thread, work));
DECLARE_TRACE(workqueue_execution,
TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),
TP_ARGS(wq_thread, work));
/* Trace the creation of one workqueue thread on a cpu */
DECLARE_TRACE(workqueue_creation,
TP_PROTO(struct task_struct *wq_thread, int cpu),
TP_ARGS(wq_thread, cpu));
DECLARE_TRACE(workqueue_destruction,
TP_PROTO(struct task_struct *wq_thread),
TP_ARGS(wq_thread));
#endif /* __TRACE_WORKQUEUE_H */
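As with the other DECLARE_TRACE() headers in this series, a consumer registers a probe whose signature matches the declaration; a minimal, hypothetical user of workqueue_insertion:

static void probe_wq_insertion(struct task_struct *wq_thread,
			       struct work_struct *work)
{
	pr_debug("work %p queued on %s\n", work, wq_thread->comm);
}

static int __init wq_trace_init(void)
{
	return register_trace_workqueue_insertion(probe_wq_insertion);
}

static void __exit wq_trace_exit(void)
{
	unregister_trace_workqueue_insertion(probe_wq_insertion);
}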
......@@ -1007,7 +1007,7 @@ config TRACEPOINTS
config MARKERS
bool "Activate markers"
depends on TRACEPOINTS
select TRACEPOINTS
help
Place an empty function call at each marker site. Can be
dynamically changed for a probe function.
......
......@@ -71,6 +71,7 @@
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/cacheflush.h>
#include <trace/kmemtrace.h>
#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/smp.h>
......@@ -648,6 +649,7 @@ asmlinkage void __init start_kernel(void)
enable_debug_pagealloc();
cpu_hotplug_init();
kmem_cache_init();
kmemtrace_init();
debug_objects_mem_init();
idr_init_cache();
setup_per_cpu_pageset();
......@@ -769,6 +771,7 @@ static void __init do_basic_setup(void)
{
rcu_init_sched(); /* needed by module_init stage. */
init_workqueues();
cpuset_init_smp();
usermodehelper_init();
driver_init();
init_irq_proc();
......@@ -863,8 +866,6 @@ static int __init kernel_init(void * unused)
smp_init();
sched_init_smp();
cpuset_init_smp();
do_basic_setup();
/*
......
......@@ -15,11 +15,22 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/ftrace.h>
#include <linux/memory.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/init.h>
#include <linux/ftrace.h>
#include <asm/uaccess.h>
#include <asm/sections.h>
#include <asm/uaccess.h>
/*
* mutex protecting text section modification (dynamic code patching).
* some users need to sleep (allocating memory...) while they hold this lock.
*
* NOT exported to modules - patching kernel text is a really delicate matter.
*/
DEFINE_MUTEX(text_mutex);
extern struct exception_table_entry __start___ex_table[];
extern struct exception_table_entry __stop___ex_table[];
......@@ -49,7 +60,7 @@ static inline int init_kernel_text(unsigned long addr)
return 0;
}
__notrace_funcgraph int core_kernel_text(unsigned long addr)
int core_kernel_text(unsigned long addr)
{
if (addr >= (unsigned long)_stext &&
addr <= (unsigned long)_etext)
......@@ -61,7 +72,7 @@ __notrace_funcgraph int core_kernel_text(unsigned long addr)
return 0;
}
__notrace_funcgraph int __kernel_text_address(unsigned long addr)
int __kernel_text_address(unsigned long addr)
{
if (core_kernel_text(addr))
return 1;
......
......@@ -17,6 +17,7 @@
#include <linux/kernel_stat.h>
#include <linux/rculist.h>
#include <linux/hash.h>
#include <trace/irq.h>
#include <linux/bootmem.h>
#include "internals.h"
......@@ -338,6 +339,9 @@ irqreturn_t no_action(int cpl, void *dev_id)
return IRQ_NONE;
}
DEFINE_TRACE(irq_handler_entry);
DEFINE_TRACE(irq_handler_exit);
/**
* handle_IRQ_event - irq action chain handler
* @irq: the interrupt number
......@@ -356,7 +360,9 @@ irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
local_irq_enable_in_hardirq();
do {
trace_irq_handler_entry(irq, action);
ret = action->handler(irq, action->dev_id);
trace_irq_handler_exit(irq, action, ret);
if (ret == IRQ_HANDLED)
status |= action->flags;
retval |= ret;
......
......@@ -43,6 +43,7 @@
#include <linux/seq_file.h>
#include <linux/debugfs.h>
#include <linux/kdebug.h>
#include <linux/memory.h>
#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
......@@ -699,9 +700,10 @@ int __kprobes register_kprobe(struct kprobe *p)
goto out;
}
mutex_lock(&text_mutex);
ret = arch_prepare_kprobe(p);
if (ret)
goto out;
goto out_unlock_text;
INIT_HLIST_NODE(&p->hlist);
hlist_add_head_rcu(&p->hlist,
......@@ -710,6 +712,8 @@ int __kprobes register_kprobe(struct kprobe *p)
if (kprobe_enabled)
arch_arm_kprobe(p);
out_unlock_text:
mutex_unlock(&text_mutex);
out:
mutex_unlock(&kprobe_mutex);
......@@ -746,8 +750,11 @@ static int __kprobes __unregister_kprobe_top(struct kprobe *p)
* enabled and not gone - otherwise, the breakpoint would
* already have been removed. We save on flushing icache.
*/
if (kprobe_enabled && !kprobe_gone(old_p))
if (kprobe_enabled && !kprobe_gone(old_p)) {
mutex_lock(&text_mutex);
arch_disarm_kprobe(p);
mutex_unlock(&text_mutex);
}
hlist_del_rcu(&old_p->hlist);
} else {
if (p->break_handler && !kprobe_gone(p))
......@@ -912,10 +919,8 @@ static int __kprobes pre_handler_kretprobe(struct kprobe *p,
ri->rp = rp;
ri->task = current;
if (rp->entry_handler && rp->entry_handler(ri, regs)) {
spin_unlock_irqrestore(&rp->lock, flags);
if (rp->entry_handler && rp->entry_handler(ri, regs))
return 0;
}
arch_prepare_kretprobe(ri, regs);
......@@ -1280,12 +1285,14 @@ static void __kprobes enable_all_kprobes(void)
if (kprobe_enabled)
goto already_enabled;
mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist)
if (!kprobe_gone(p))
arch_arm_kprobe(p);
}
mutex_unlock(&text_mutex);
kprobe_enabled = true;
printk(KERN_INFO "Kprobes globally enabled\n");
......@@ -1310,6 +1317,7 @@ static void __kprobes disable_all_kprobes(void)
kprobe_enabled = false;
printk(KERN_INFO "Kprobes globally disabled\n");
mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
......@@ -1318,6 +1326,7 @@ static void __kprobes disable_all_kprobes(void)
}
}
mutex_unlock(&text_mutex);
mutex_unlock(&kprobe_mutex);
/* Allow all currently running kprobes to complete */
synchronize_sched();
......
......@@ -42,6 +42,7 @@
#include <linux/hash.h>
#include <linux/ftrace.h>
#include <linux/stringify.h>
#include <trace/lockdep.h>
#include <asm/sections.h>
......@@ -2923,6 +2924,8 @@ void lock_set_class(struct lockdep_map *lock, const char *name,
}
EXPORT_SYMBOL_GPL(lock_set_class);
DEFINE_TRACE(lock_acquire);
/*
* We are not always called with irqs disabled - do that here,
* and also avoid lockdep recursion:
......@@ -2933,6 +2936,8 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
{
unsigned long flags;
trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
if (unlikely(current->lockdep_recursion))
return;
......@@ -2947,11 +2952,15 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
}
EXPORT_SYMBOL_GPL(lock_acquire);
DEFINE_TRACE(lock_release);
void lock_release(struct lockdep_map *lock, int nested,
unsigned long ip)
{
unsigned long flags;
trace_lock_release(lock, nested, ip);
if (unlikely(current->lockdep_recursion))
return;
......@@ -3100,10 +3109,14 @@ __lock_acquired(struct lockdep_map *lock, unsigned long ip)
lock->ip = ip;
}
DEFINE_TRACE(lock_contended);
void lock_contended(struct lockdep_map *lock, unsigned long ip)
{
unsigned long flags;
trace_lock_contended(lock, ip);
if (unlikely(!lock_stat))
return;
......@@ -3119,10 +3132,14 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
}
EXPORT_SYMBOL_GPL(lock_contended);
DEFINE_TRACE(lock_acquired);
void lock_acquired(struct lockdep_map *lock, unsigned long ip)
{
unsigned long flags;
trace_lock_acquired(lock, ip);
if (unlikely(!lock_stat))
return;
......