Commit c3fa27d1 authored by Linus Torvalds


Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
  x86: Fix comments of register/stack access functions
  perf tools: Replace %m with %a in sscanf
  hw-breakpoints: Keep track of user disabled breakpoints
  tracing/syscalls: Make syscall events print callbacks static
  tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
  perf: Don't free perf_mmap_data until work has been done
  perf_event: Fix compile error
  perf tools: Fix _GNU_SOURCE macro related strndup() build error
  trace_syscalls: Remove unused syscall_name_to_nr()
  trace_syscalls: Simplify syscall profile
  trace_syscalls: Remove duplicate init_enter_##sname()
  trace_syscalls: Add syscall_nr field to struct syscall_metadata
  trace_syscalls: Remove enter_id exit_id
  trace_syscalls: Set event_enter_##sname->data to its metadata
  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
  perf_event: Initialize data.period in perf_swevent_hrtimer()
  perf probe: Simplify event naming
  perf probe: Add --list option for listing current probe events
  perf probe: Add argv_split() from lib/argv_split.c
  perf probe: Move probe event utility functions to probe-event.c
  ...
parents 96fa2b50 d103d01e
@@ -86,4 +86,9 @@
!Iinclude/trace/events/irq.h
  </chapter>
+ <chapter id="signal">
+  <title>SIGNAL</title>
+!Iinclude/trace/events/signal.h
+ </chapter>
</book>
Kprobe-based Event Tracing
==========================
Documentation is written by Masami Hiramatsu
Overview
--------
These events are similar to tracepoint-based events. Instead of tracepoints,
they are based on kprobes (kprobe and kretprobe), so they can probe wherever
kprobes can probe (that is, anywhere in a function body except for __kprobes
functions). Unlike tracepoint-based events, they can be added and removed
dynamically, on the fly.
To enable this feature, build your kernel with CONFIG_KPROBE_TRACING=y.
Similar to the event tracer, this feature doesn't need to be activated via
current_tracer. Instead, add probe points via
/sys/kernel/debug/tracing/kprobe_events, and enable them via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.
Synopsis of kprobe_events
-------------------------
p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
 GRP : Group name. If omitted, "kprobes" is used.
EVENT : Event name. If omitted, the event name is generated
based on SYMBOL+offs or MEMADDR.
SYMBOL[+offs] : Symbol+offset where the probe is inserted.
MEMADDR : Address where the probe is inserted.
FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
@ADDR : Fetch memory at ADDR (ADDR should be in kernel)
@SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
$stackN : Fetch Nth entry of stack (N >= 0)
$stack : Fetch stack address.
$argN : Fetch function argument. (N >= 0)(*)
$retval : Fetch return value.(**)
+|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***)
NAME=FETCHARG: Set NAME as the argument name of FETCHARG.
(*) $argN may not be correct on asmlinkage'd functions or in the middle of
    a function body.
(**) only for return probes.
(***) this is useful for fetching a field of a data structure.
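
For example (a sketch; the symbol and argument layout here are purely
illustrative), the following combines the NAME=FETCHARG and
+|-offs(FETCHARG) syntax above to record a named argument, a stack slot,
and a dereferenced field:

echo 'p:mygrp/myevent do_fork flags=$arg0 stack0=$stack0 field=+4($arg1)' > /sys/kernel/debug/tracing/kprobe_events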
Per-Probe Event Filtering
-------------------------
The per-probe event filtering feature allows you to set a different filter
on each probe, and to specify which arguments will be shown in the trace
buffer. If an event name is specified right after 'p:' or 'r:' in
kprobe_events, an event is added under tracing/events/kprobes/<EVENT>; in
that directory you can see 'id', 'enable', 'format' and 'filter'.
enable:
  You can enable/disable the probe by writing 1 or 0 to it.
format:
This shows the format of this probe event.
filter:
You can write filtering rules of this event.
id:
This shows the id of this probe event.
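
For example (illustrative; the event and its named arguments are the ones
defined in the usage examples below), you could restrict the "myprobe"
event to opens with mode 0666 by writing an expression against one of its
named arguments:

echo 'mode == 438' > /sys/kernel/debug/tracing/events/kprobes/myprobe/filter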
Event Profiling
---------------
You can check the total number of probe hits and probe miss-hits via
/sys/kernel/debug/tracing/kprobe_profile.
The first column is the event name, the second is the number of probe hits,
and the third is the number of probe miss-hits.
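
For example, after defining the events from the usage section below, you
might see output like this (values illustrative):

cat /sys/kernel/debug/tracing/kprobe_profile
  myprobe                                      237               0
  myretprobe                                   237               0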
Usage examples
--------------
To add a probe as a new event, write a new definition to kprobe_events
as below.
echo p:myprobe do_sys_open dfd=$arg0 filename=$arg1 flags=$arg2 mode=$arg3 > /sys/kernel/debug/tracing/kprobe_events
This sets a kprobe at the top of the do_sys_open() function and records its
1st to 4th arguments as the "myprobe" event. As this example shows, users
can choose more familiar names for each argument.
echo r:myretprobe do_sys_open $retval >> /sys/kernel/debug/tracing/kprobe_events
This sets a kretprobe on the return point of do_sys_open() and records its
return value as the "myretprobe" event.
You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/format.
cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 75
format:
field:unsigned short common_type; offset:0; size:2;
field:unsigned char common_flags; offset:2; size:1;
field:unsigned char common_preempt_count; offset:3; size:1;
field:int common_pid; offset:4; size:4;
field:int common_tgid; offset:8; size:4;
	field: unsigned long ip;	offset:16;	size:8;
	field: int nargs;	offset:24;	size:4;
	field: unsigned long dfd;	offset:32;	size:8;
	field: unsigned long filename;	offset:40;	size:8;
	field: unsigned long flags;	offset:48;	size:8;
	field: unsigned long mode;	offset:56;	size:8;
print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, REC->filename, REC->flags, REC->mode
You can see that the event has four arguments, named as in the expressions
you specified.
echo > /sys/kernel/debug/tracing/kprobe_events
This clears all probe points.
Right after definition, each event is disabled by default. To trace these
events, you need to enable them:
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
And you can see the traced information via /sys/kernel/debug/tracing/trace.
cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
<...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
<...>-1447 [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
<...>-1447 [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
<...>-1447 [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
<...>-1447 [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
Each line shows when the kernel hits an event; "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the
kernel returns from do_sys_open to sys_open+0x1b).
@@ -126,4 +126,11 @@ config HAVE_DMA_API_DEBUG
config HAVE_DEFAULT_NO_SPIN_MUTEXES
	bool

+config HAVE_HW_BREAKPOINT
+	bool
+	depends on HAVE_PERF_EVENTS
+	select ANON_INODES
+	select PERF_EVENTS
+
source "kernel/gcov/Kconfig"
@@ -46,7 +46,7 @@ config DEBUG_STACK_USAGE
config HCALL_STATS
	bool "Hypervisor call instrumentation"
-	depends on PPC_PSERIES && DEBUG_FS
+	depends on PPC_PSERIES && DEBUG_FS && TRACEPOINTS
	help
	  Adds code to keep track of the number of hypervisor calls made and
	  the amount of time spent in hypervisor calls. Wall time spent in
...
@@ -1683,7 +1683,7 @@ CONFIG_HAVE_ARCH_KGDB=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PAGEALLOC is not set
-CONFIG_HCALL_STATS=y
+# CONFIG_HCALL_STATS is not set
# CONFIG_CODE_PATCHING_SELFTEST is not set
# CONFIG_FTR_FIXUP_SELFTEST is not set
# CONFIG_MSI_BITMAP_SELFTEST is not set
...
@@ -19,6 +19,7 @@
#define _ASM_POWERPC_EMULATED_OPS_H

#include <asm/atomic.h>
+#include <linux/perf_event.h>

#ifdef CONFIG_PPC_EMULATED_STATS
@@ -57,7 +58,7 @@ extern u32 ppc_warn_emulated;
extern void ppc_warn_emulated_print(const char *type);

-#define PPC_WARN_EMULATED(type)					\
+#define __PPC_WARN_EMULATED(type)				\
	do {							\
		atomic_inc(&ppc_emulated.type.val);		\
		if (ppc_warn_emulated)				\
@@ -66,8 +67,22 @@ extern void ppc_warn_emulated_print(const char *type);

#else /* !CONFIG_PPC_EMULATED_STATS */

-#define PPC_WARN_EMULATED(type)	do { } while (0)
+#define __PPC_WARN_EMULATED(type)	do { } while (0)

#endif /* !CONFIG_PPC_EMULATED_STATS */

+#define PPC_WARN_EMULATED(type, regs)				\
+	do {							\
+		perf_sw_event(PERF_COUNT_SW_EMULATION_FAULTS,	\
+			      1, 0, regs, 0);			\
+		__PPC_WARN_EMULATED(type);			\
+	} while (0)
+
+#define PPC_WARN_ALIGNMENT(type, regs)				\
+	do {							\
+		perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS,	\
+			      1, 0, regs, regs->dar);		\
+		__PPC_WARN_EMULATED(type);			\
+	} while (0)
+
#endif /* _ASM_POWERPC_EMULATED_OPS_H */
@@ -274,6 +274,8 @@ struct hcall_stats {
	unsigned long	num_calls;	/* number of calls (on this CPU) */
	unsigned long	tb_total;	/* total wall time (mftb) of calls. */
	unsigned long	purr_total;	/* total cpu time (PURR) of calls. */
+	unsigned long	tb_start;
+	unsigned long	purr_start;
};

#define HCALL_STAT_ARRAY_SIZE	((MAX_HCALL_OPCODE >> 2) + 1)
...
@@ -489,6 +489,8 @@
#define   SPRN_MMCR1	798
#define   SPRN_MMCRA	0x312
#define     MMCRA_SDSYNC	0x80000000UL /* SDAR synced with SIAR */
+#define     MMCRA_SDAR_DCACHE_MISS	0x40000000UL
+#define     MMCRA_SDAR_ERAT_MISS	0x20000000UL
#define     MMCRA_SIHV	0x10000000UL /* state of MSR HV when SIAR set */
#define     MMCRA_SIPR	0x08000000UL /* state of MSR PR when SIAR set */
#define     MMCRA_SLOT	0x07000000UL /* SLOT bits (37-39) */
...
#undef TRACE_SYSTEM
#define TRACE_SYSTEM powerpc
#if !defined(_TRACE_POWERPC_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_POWERPC_H
#include <linux/tracepoint.h>
struct pt_regs;
TRACE_EVENT(irq_entry,
TP_PROTO(struct pt_regs *regs),
TP_ARGS(regs),
TP_STRUCT__entry(
__field(struct pt_regs *, regs)
),
TP_fast_assign(
__entry->regs = regs;
),
TP_printk("pt_regs=%p", __entry->regs)
);
TRACE_EVENT(irq_exit,
TP_PROTO(struct pt_regs *regs),
TP_ARGS(regs),
TP_STRUCT__entry(
__field(struct pt_regs *, regs)
),
TP_fast_assign(
__entry->regs = regs;
),
TP_printk("pt_regs=%p", __entry->regs)
);
TRACE_EVENT(timer_interrupt_entry,
TP_PROTO(struct pt_regs *regs),
TP_ARGS(regs),
TP_STRUCT__entry(
__field(struct pt_regs *, regs)
),
TP_fast_assign(
__entry->regs = regs;
),
TP_printk("pt_regs=%p", __entry->regs)
);
TRACE_EVENT(timer_interrupt_exit,
TP_PROTO(struct pt_regs *regs),
TP_ARGS(regs),
TP_STRUCT__entry(
__field(struct pt_regs *, regs)
),
TP_fast_assign(
__entry->regs = regs;
),
TP_printk("pt_regs=%p", __entry->regs)
);
#ifdef CONFIG_PPC_PSERIES
extern void hcall_tracepoint_regfunc(void);
extern void hcall_tracepoint_unregfunc(void);
TRACE_EVENT_FN(hcall_entry,
TP_PROTO(unsigned long opcode, unsigned long *args),
TP_ARGS(opcode, args),
TP_STRUCT__entry(
__field(unsigned long, opcode)
),
TP_fast_assign(
__entry->opcode = opcode;
),
TP_printk("opcode=%lu", __entry->opcode),
hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc
);
TRACE_EVENT_FN(hcall_exit,
TP_PROTO(unsigned long opcode, unsigned long retval,
unsigned long *retbuf),
TP_ARGS(opcode, retval, retbuf),
TP_STRUCT__entry(
__field(unsigned long, opcode)
__field(unsigned long, retval)
),
TP_fast_assign(
__entry->opcode = opcode;
__entry->retval = retval;
),
TP_printk("opcode=%lu retval=%lu", __entry->opcode, __entry->retval),
hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc
);
#endif
#endif /* _TRACE_POWERPC_H */
#undef TRACE_INCLUDE_PATH
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_PATH asm
#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>
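
The TRACE_EVENT()/TRACE_EVENT_FN() definitions above generate per-tracepoint
registration stubs. As a quick illustration (a sketch, not part of this
merge; it assumes this kernel's tracepoint API, where probes are registered
without a private data argument), a module could hook irq_entry like this:

#include <linux/module.h>
#include <asm/trace.h>

/* Probe signature must match the TP_PROTO() above. */
static void my_irq_probe(struct pt_regs *regs)
{
	pr_info("irq entry, regs=%p\n", regs);
}

static int __init my_init(void)
{
	return register_trace_irq_entry(my_irq_probe);
}

static void __exit my_exit(void)
{
	unregister_trace_irq_entry(my_irq_probe);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");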
@@ -732,7 +732,7 @@ int fix_alignment(struct pt_regs *regs)
#ifdef CONFIG_SPE
	if ((instr >> 26) == 0x4) {
-		PPC_WARN_EMULATED(spe);
+		PPC_WARN_ALIGNMENT(spe, regs);
		return emulate_spe(regs, reg, instr);
	}
#endif
@@ -786,7 +786,7 @@ int fix_alignment(struct pt_regs *regs)
			flags |= SPLT;
			nb = 8;
		}
-		PPC_WARN_EMULATED(vsx);
+		PPC_WARN_ALIGNMENT(vsx, regs);
		return emulate_vsx(addr, reg, areg, regs, flags, nb);
	}
#endif
@@ -794,7 +794,7 @@ int fix_alignment(struct pt_regs *regs)
	 * the exception of DCBZ which is handled as a special case here
	 */
	if (instr == DCBZ) {
-		PPC_WARN_EMULATED(dcbz);
+		PPC_WARN_ALIGNMENT(dcbz, regs);
		return emulate_dcbz(regs, addr);
	}
	if (unlikely(nb == 0))
@@ -804,7 +804,7 @@ int fix_alignment(struct pt_regs *regs)
	 * function
	 */
	if (flags & M) {
-		PPC_WARN_EMULATED(multiple);
+		PPC_WARN_ALIGNMENT(multiple, regs);
		return emulate_multiple(regs, addr, reg, nb,
					flags, instr, swiz);
	}
@@ -825,11 +825,11 @@ int fix_alignment(struct pt_regs *regs)
	/* Special case for 16-byte FP loads and stores */
	if (nb == 16) {
-		PPC_WARN_EMULATED(fp_pair);
+		PPC_WARN_ALIGNMENT(fp_pair, regs);
		return emulate_fp_pair(addr, reg, flags);
	}

-	PPC_WARN_EMULATED(unaligned);
+	PPC_WARN_ALIGNMENT(unaligned, regs);

	/* If we are loading, get the data from user space, else
	 * get it from register values
...
@@ -551,7 +551,7 @@ restore:
BEGIN_FW_FTR_SECTION
	ld	r5,SOFTE(r1)
FW_FTR_SECTION_ELSE
-	b	iseries_check_pending_irqs
+	b	.Liseries_check_pending_irqs
ALT_FW_FTR_SECTION_END_IFCLR(FW_FEATURE_ISERIES)
2:
	TRACE_AND_RESTORE_IRQ(r5);
@@ -623,7 +623,7 @@ ALT_FW_FTR_SECTION_END_IFCLR(FW_FEATURE_ISERIES)
#endif /* CONFIG_PPC_BOOK3E */

-iseries_check_pending_irqs:
+.Liseries_check_pending_irqs:
#ifdef CONFIG_PPC_ISERIES
	ld	r5,SOFTE(r1)
	cmpdi	0,r5,0
...
@@ -185,12 +185,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
	 * prolog code of the PerformanceMonitor one. A little
	 * trickery is thus necessary
	 */
+performance_monitor_pSeries_1:
	. = 0xf00
	b	performance_monitor_pSeries

+altivec_unavailable_pSeries_1:
	. = 0xf20
	b	altivec_unavailable_pSeries

+vsx_unavailable_pSeries_1:
	. = 0xf40
	b	vsx_unavailable_pSeries
...
@@ -70,6 +70,8 @@
#include <asm/firmware.h>
#include <asm/lv1call.h>
#endif
+#define CREATE_TRACE_POINTS
+#include <asm/trace.h>

int __irq_offset_value;
static int ppc_spurious_interrupts;
@@ -325,6 +327,8 @@ void do_IRQ(struct pt_regs *regs)
	struct pt_regs *old_regs = set_irq_regs(regs);
	unsigned int irq;

+	trace_irq_entry(regs);
+
	irq_enter();

	check_stack_overflow();
@@ -348,6 +352,8 @@ void do_IRQ(struct pt_regs *regs)
		timer_interrupt(regs);
	}
#endif
+
+	trace_irq_exit(regs);
}

void __init init_IRQ(void)
...
@@ -1165,7 +1165,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
	 */
	if (record) {
		struct perf_sample_data data = {
-			.addr	= 0,
+			.addr	= ~0ULL,
			.period	= event->hw.last_period,
		};
...
@@ -72,10 +72,6 @@
#define MMCR1_PMCSEL_SH(n)	(MMCR1_PMC1SEL_SH - (n) * 8)
#define MMCR1_PMCSEL_MSK	0x7f

-/*
- * Bits in MMCRA
- */
-
/*
 * Layout of constraint bits:
 * 6666555555555544444444443333333333222222222211111111110000000000
...
@@ -72,10 +72,6 @@
#define MMCR1_PMCSEL_SH(n)	(MMCR1_PMC1SEL_SH - (n) * 8)
#define MMCR1_PMCSEL_MSK	0x7f

-/*
- * Bits in MMCRA
- */
-
/*
 * Layout of constraint bits:
 * 6666555555555544444444443333333333222222222211111111110000000000
@@ -390,7 +386,7 @@ static int power5_compute_mmcr(u64 event[], int n_ev,
			       unsigned int hwc[], unsigned long mmcr[])
{
	unsigned long mmcr1 = 0;
-	unsigned long mmcra = 0;
+	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
	unsigned int pmc, unit, byte, psel;
	unsigned int ttm, grp;
	int i, isbus, bit, grsel;
...
@@ -178,7 +178,7 @@ static int p6_compute_mmcr(u64 event[], int n_ev,
			   unsigned int hwc[], unsigned long mmcr[])
{
	unsigned long mmcr1 = 0;
-	unsigned long mmcra = 0;
+	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
	int i;
	unsigned int pmc, ev, b, u, s, psel;
	unsigned int ttmset = 0;
...
@@ -50,10 +50,6 @@
#define MMCR1_PMCSEL_SH(n)	(MMCR1_PMC1SEL_SH - (n) * 8)
#define MMCR1_PMCSEL_MSK	0xff

-/*
- * Bits in MMCRA
- */
-
/*
 * Layout of constraint bits:
 * 6666555555555544444444443333333333222222222211111111110000000000
@@ -230,7 +226,7 @@ static int power7_compute_mmcr(u64 event[], int n_ev,
			       unsigned int hwc[], unsigned long mmcr[])
{
	unsigned long mmcr1 = 0;
-	unsigned long mmcra = 0;
+	unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
	unsigned int pmc, unit, combine, l2sel, psel;
	unsigned int pmc_inuse = 0;
	int i;
...
@@ -83,10 +83,6 @@ static short mmcr1_adder_bits[8] = {
	MMCR1_PMC8_ADDER_SEL_SH
};

-/*
- * Bits in MMCRA
- */
-
/*
 * Layout of constraint bits:
 * 6666555555555544444444443333333333222222222211111111110000000000
...
@@ -660,6 +660,7 @@ late_initcall(check_cache_coherency);

#ifdef CONFIG_DEBUG_FS
struct dentry *powerpc_debugfs_root;
+EXPORT_SYMBOL(powerpc_debugfs_root);

static int powerpc_debugfs_init(void)
{
...
@@ -54,6 +54,7 @@
#include <linux/irq.h>
#include <linux/delay.h>
#include <linux/perf_event.h>
+#include <asm/trace.h>

#include <asm/io.h>
#include <asm/processor.h>
@@ -571,6 +572,8 @@ void timer_interrupt(struct pt_regs * regs)
	struct clock_event_device *evt = &decrementer->event;
	u64 now;

+	trace_timer_interrupt_entry(regs);
+
	/* Ensure a positive value is written to the decrementer, or else
	 * some CPUs will continuue to take decrementer exceptions */
	set_dec(DECREMENTER_MAX);
@@ -590,6 +593,7 @@ void timer_interrupt(struct pt_regs * regs)
		now = decrementer->next_tb - now;
		if (now <= DECREMENTER_MAX)
			set_dec((int)now);
+		trace_timer_interrupt_exit(regs);
		return;
	}
	old_regs = set_irq_regs(regs);
@@ -620,6 +624,8 @@ void timer_interrupt(struct pt_regs * regs)

	irq_exit();
	set_irq_regs(old_regs);
+
+	trace_timer_interrupt_exit(regs);
}

void wakeup_decrementer(void)
...
@@ -759,7 +759,7 @@ static int emulate_instruction(struct pt_regs *regs)

	/* Emulate the mfspr rD, PVR. */
	if ((instword & PPC_INST_MFSPR_PVR_MASK) == PPC_INST_MFSPR_PVR) {
-		PPC_WARN_EMULATED(mfpvr);
+		PPC_WARN_EMULATED(mfpvr, regs);
		rd = (instword >> 21) & 0x1f;
		regs->gpr[rd] = mfspr(SPRN_PVR);
		return 0;
@@ -767,7 +767,7 @@ static int emulate_instruction(struct pt_regs *regs)

	/* Emulating the dcba insn is just a no-op. */
	if ((instword & PPC_INST_DCBA_MASK) == PPC_INST_DCBA) {
-		PPC_WARN_EMULATED(dcba);
+		PPC_WARN_EMULATED(dcba, regs);
		return 0;
	}
@@ -776,7 +776,7 @@ static int emulate_instruction(struct pt_regs *regs)
		int shift = (instword >> 21) & 0x1c;
		unsigned long msk = 0xf0000000UL >> shift;

-		PPC_WARN_EMULATED(mcrxr);
+		PPC_WARN_EMULATED(mcrxr, regs);
		regs->ccr = (regs->ccr & ~msk) | ((regs->xer >> shift) & msk);
		regs->xer &= ~0xf0000000UL;
		return 0;
@@ -784,19 +784,19 @@ static int emulate_instruction(struct pt_regs *regs)

	/* Emulate load/store string insn. */
	if ((instword & PPC_INST_STRING_GEN_MASK) == PPC_INST_STRING) {
-		PPC_WARN_EMULATED(string);
+		PPC_WARN_EMULATED(string, regs);
		return emulate_string_inst(regs, instword);
	}

	/* Emulate the popcntb (Population Count Bytes) instruction. */
	if ((instword & PPC_INST_POPCNTB_MASK) == PPC_INST_POPCNTB) {
-		PPC_WARN_EMULATED(popcntb);
+		PPC_WARN_EMULATED(popcntb, regs);
		return emulate_popcntb_inst(regs, instword);
	}

	/* Emulate isel (Integer Select) instruction */
	if ((instword & PPC_INST_ISEL_MASK) == PPC_INST_ISEL) {
-		PPC_WARN_EMULATED(isel);
+		PPC_WARN_EMULATED(isel, regs);
		return emulate_isel(regs, instword);
	}
@@ -995,7 +995,7 @@ void SoftwareEmulation(struct pt_regs *regs)
#ifdef CONFIG_MATH_EMULATION
	errcode = do_mathemu(regs);
	if (errcode >= 0)
-		PPC_WARN_EMULATED(math);
+		PPC_WARN_EMULATED(math, regs);

	switch (errcode) {
	case 0:
@@ -1018,7 +1018,7 @@ void SoftwareEmulation(struct pt_regs *regs)
#elif defined(CONFIG_8XX_MINIMAL_FPEMU)
	errcode = Soft_emulate_8xx(regs);
	if (errcode >= 0)
-		PPC_WARN_EMULATED(8xx);
+		PPC_WARN_EMULATED(8xx, regs);

	switch (errcode) {
	case 0:
@@ -1129,7 +1129,7 @@ void altivec_assist_exception(struct pt_regs *regs)

	flush_altivec_to_thread(current);

-	PPC_WARN_EMULATED(altivec);
+	PPC_WARN_EMULATED(altivec, regs);
	err = emulate_altivec(regs);
	if (err == 0) {
		regs->nip += 4;	/* skip emulated instruction */
...
@@ -26,11 +26,11 @@ BEGIN_FTR_SECTION
	srd	r8,r5,r11

	mtctr	r8
-setup:
+.Lsetup:
	dcbt	r9,r4
	dcbz	r9,r3
	add	r9,r9,r12
-	bdnz	setup
+	bdnz	.Lsetup
END_FTR_SECTION_IFSET(CPU_FTR_CP_USE_DCBTZ)
	addi	r3,r3,-8
	srdi	r8,r5,7		/* page is copied in 128 byte strides */
...
@@ -14,68 +14,94 @@
#define STK_PARM(i)     (48 + ((i)-3)*8)

-#ifdef CONFIG_HCALL_STATS
+#ifdef CONFIG_TRACEPOINTS
+
+	.section	".toc","aw"
+
+	.globl hcall_tracepoint_refcount
+hcall_tracepoint_refcount:
+	.llong	0
+
+	.section	".text"
+
/*
 * precall must preserve all registers. use unused STK_PARM()
- * areas to save snapshots and opcode.
+ * areas to save snapshots and opcode. We branch around this
+ * in early init (eg when populating the MMU hashtable) by using an
+ * unconditional cpu feature.
 */
-#define HCALL_INST_PRECALL					\
-	std	r3,STK_PARM(r3)(r1);	/* save opcode */	\
-	mftb	r0;			/* get timebase and */	\
-	std	r0,STK_PARM(r5)(r1);	/* save for later */	\
-BEGIN_FTR_SECTION;						\
-	mfspr	r0,SPRN_PURR;		/* get PURR and */	\
-	std	r0,STK_PARM(r6)(r1);	/* save for later */	\
-END_FTR_SECTION_IFSET(CPU_FTR_PURR);
+#define HCALL_INST_PRECALL(FIRST_REG)				\
+BEGIN_FTR_SECTION;						\
+	b	1f;						\
+END_FTR_SECTION(0, 1);						\
+	ld	r12,hcall_tracepoint_refcount@toc(r2);		\
+	cmpdi	r12,0;						\
+	beq+	1f;						\
+	mflr	r0;						\
+	std	r3,STK_PARM(r3)(r1);				\
+	std	r4,STK_PARM(r4)(r1);				\
+	std	r5,STK_PARM(r5)(r1);				\
+	std	r6,STK_PARM(r6)(r1);				\
+	std	r7,STK_PARM(r7)(r1);				\
+	std	r8,STK_PARM(r8)(r1);				\
+	std	r9,STK_PARM(r9)(r1);				\
+	std	r10,STK_PARM(r10)(r1);				\
+	std	r0,16(r1);					\
+	addi	r4,r1,STK_PARM(FIRST_REG);			\
+	stdu	r1,-STACK_FRAME_OVERHEAD(r1);			\
+	bl	.__trace_hcall_entry;				\
+	addi	r1,r1,STACK_FRAME_OVERHEAD;			\
+	ld	r0,16(r1);					\
+	ld	r3,STK_PARM(r3)(r1);				\
+	ld	r4,STK_PARM(r4)(r1);				\
+	ld	r5,STK_PARM(r5)(r1);				\
+	ld	r6,STK_PARM(r6)(r1);				\
+	ld	r7,STK_PARM(r7)(r1);				\
+	ld	r8,STK_PARM(r8)(r1);				\
+	ld	r9,STK_PARM(r9)(r1);				\
+	ld	r10,STK_PARM(r10)(r1);				\
+	mtlr	r0;						\
+1:

/*
 * postcall is performed immediately before function return which
 * allows liberal use of volatile registers. We branch around this
 * in early init (eg when populating the MMU hashtable) by using an
 * unconditional cpu feature.
 */
-#define HCALL_INST_POSTCALL					\
+#define __HCALL_INST_POSTCALL					\
BEGIN_FTR_SECTION;						\
	b	1f;						\
END_FTR_SECTION(0, 1);						\
-	ld	r4,STK_PARM(r3)(r1);	/* validate opcode */	\
-	cmpldi	cr7,r4,MAX_HCALL_OPCODE;			\
-	bgt-	cr7,1f;						\
-								\
-	/* get time and PURR snapshots after hcall */		\
-	mftb	r7;			/* timebase after */	\
-BEGIN_FTR_SECTION;						\
-	mfspr	r8,SPRN_PURR;		/* PURR after */	\
-	ld	r6,STK_PARM(r6)(r1);	/* PURR before */	\
-	subf	r6,r6,r8;		/* delta */		\
-END_FTR_SECTION_IFSET(CPU_FTR_PURR);				\
-	ld	r5,STK_PARM(r5)(r1);	/* timebase before */	\
-	subf	r5,r5,r7;		/* time delta */	\
-								\
-	/* calculate address of stat structure r4 = opcode */	\
-	srdi	r4,r4,2;		/* index into array */	\
-	mulli	r4,r4,HCALL_STAT_SIZE;				\
-	LOAD_REG_ADDR(r7, per_cpu__hcall_stats);		\
-	add	r4,r4,r7;					\
-	ld	r7,PACA_DATA_OFFSET(r13); /* per cpu offset */	\
-	add	r4,r4,r7;					\
-								\
-	/* update stats */					\
-	ld	r7,HCALL_STAT_CALLS(r4); /* count */		\
-	addi	r7,r7,1;					\
-	std	r7,HCALL_STAT_CALLS(r4);			\
-	ld	r7,HCALL_STAT_TB(r4);	/* timebase */		\
-	add	r7,r7,r5;					\
-	std	r7,HCALL_STAT_TB(r4);				\
-BEGIN_FTR_SECTION;						\
-	ld	r7,HCALL_STAT_PURR(r4);	/* PURR */		\
-	add	r7,r7,r6;					\
-	std	r7,HCALL_STAT_PURR(r4);				\
-END_FTR_SECTION_IFSET(CPU_FTR_PURR);				\
+	ld	r12,hcall_tracepoint_refcount@toc(r2);		\
+	cmpdi	r12,0;						\
+	beq+	1f;						\
+	mflr	r0;						\
+	ld	r6,STK_PARM(r3)(r1);				\
+	std	r3,STK_PARM(r3)(r1);				\
+	mr	r4,r3;						\
+	mr	r3,r6;						\
+	std	r0,16(r1);					\
+	stdu	r1,-STACK_FRAME_OVERHEAD(r1);			\
+	bl	.__trace_hcall_exit;				\
+	addi	r1,r1,STACK_FRAME_OVERHEAD;			\
+	ld	r0,16(r1);					\
+	ld	r3,STK_PARM(r3)(r1);				\
+	mtlr	r0;						\
1:

+#define HCALL_INST_POSTCALL_NORETS				\
+	li	r5,0;						\
+	__HCALL_INST_POSTCALL
+
+#define HCALL_INST_POSTCALL(BUFREG)				\
+	mr	r5,BUFREG;					\
+	__HCALL_INST_POSTCALL
+
#else
-#define HCALL_INST_PRECALL
-#define HCALL_INST_POSTCALL
+#define HCALL_INST_PRECALL(FIRST_ARG)
+#define HCALL_INST_POSTCALL_NORETS
+#define HCALL_INST_POSTCALL(BUFREG)
#endif

	.text
@@ -86,11 +112,11 @@ _GLOBAL(plpar_hcall_norets)
	mfcr	r0
	stw	r0,8(r1)

-	HCALL_INST_PRECALL
+	HCALL_INST_PRECALL(r4)

	HVSC				/* invoke the hypervisor */

-	HCALL_INST_POSTCALL
+	HCALL_INST_POSTCALL_NORETS

	lwz	r0,8(r1)
	mtcrf	0xff,r0
@@ -102,7 +128,7 @@ _GLOBAL(plpar_hcall)
	mfcr	r0
	stw	r0,8(r1)

-	HCALL_INST_PRECALL
+	HCALL_INST_PRECALL(r5)

	std	r4,STK_PARM(r4)(r1)     /* Save ret buffer */
@@ -121,7 +147,7 @@ _GLOBAL(plpar_hcall)
	std	r6, 16(r12)
	std	r7, 24(r12)

-	HCALL_INST_POSTCALL
+	HCALL_INST_POSTCALL(r12)

	lwz	r0,8(r1)
	mtcrf	0xff,r0
@@ -168,7 +194,7 @@ _GLOBAL(plpar_hcall9)
	mfcr	r0
	stw	r0,8(r1)

-	HCALL_INST_PRECALL
+	HCALL_INST_PRECALL(r5)

	std	r4,STK_PARM(r4)(r1)     /* Save ret buffer */
@@ -196,7 +222,7 @@ _GLOBAL(plpar_hcall9)
	std	r11,56(r12)
	std	r0, 64(r12)

-	HCALL_INST_POSTCALL
+	HCALL_INST_POSTCALL(r12)

	lwz	r0,8(r1)
	mtcrf	0xff,r0
...
@@ -26,6 +26,7 @@
#include <asm/hvcall.h>
#include <asm/firmware.h>
#include <asm/cputable.h>
+#include <asm/trace.h>

DEFINE_PER_CPU(struct hcall_stats[HCALL_STAT_ARRAY_SIZE], hcall_stats);
@@ -100,6 +101,35 @@ static const struct file_operations hcall_inst_seq_fops = {
#define	HCALL_ROOT_DIR		"hcall_inst"
#define CPU_NAME_BUF_SIZE	32

+
+static void probe_hcall_entry(unsigned long opcode, unsigned long *args)
+{
+	struct hcall_stats *h;
+
+	if (opcode > MAX_HCALL_OPCODE)
+		return;
+
+	h = &get_cpu_var(hcall_stats)[opcode / 4];
+	h->tb_start = mftb();
+	h->purr_start = mfspr(SPRN_PURR);
+}
+
+static void probe_hcall_exit(unsigned long opcode, unsigned long retval,
+			     unsigned long *retbuf)
+{
+	struct hcall_stats *h;
+
+	if (opcode > MAX_HCALL_OPCODE)
+		return;
+
+	h = &__get_cpu_var(hcall_stats)[opcode / 4];
+	h->num_calls++;
+	h->tb_total = mftb() - h->tb_start;
+	h->purr_total = mfspr(SPRN_PURR) - h->purr_start;
+
+	put_cpu_var(hcall_stats);
+}
+
static int __init hcall_inst_init(void)
{
	struct dentry *hcall_root;
@@ -110,6 +140,14 @@ static int __init hcall_inst_init(void)
	if (!firmware_has_feature(FW_FEATURE_LPAR))
		return 0;

+	if (register_trace_hcall_entry(probe_hcall_entry))
+		return -EINVAL;
+
+	if (register_trace_hcall_exit(probe_hcall_exit)) {
+		unregister_trace_hcall_entry(probe_hcall_entry);
+		return -EINVAL;
+	}
+
	hcall_root = debugfs_create_dir(HCALL_ROOT_DIR, NULL);
	if (!hcall_root)
		return -ENOMEM;
...
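With these probes registered, the per-cpu files that hcall_inst already
creates under /sys/kernel/debug/hcall_inst/ keep working, now fed from the
hcall tracepoints instead of inline assembly. For example (a sketch; one
cpuN file exists per cpu, as created by hcall_inst_init()):

cat /sys/kernel/debug/hcall_inst/cpu0

Each line pairs an hcall opcode with the counters kept in struct
hcall_stats above (number of calls, accumulated timebase and PURR deltas).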
@@ -39,6 +39,7 @@
#include <asm/cputable.h>
#include <asm/udbg.h>
#include <asm/smp.h>
+#include <asm/trace.h>

#include "plpar_wrappers.h"
#include "pseries.h"
@@ -661,3 +662,35 @@ void arch_free_page(struct page *page, int order)
EXPORT_SYMBOL(arch_free_page);

#endif
+
+#ifdef CONFIG_TRACEPOINTS
+/*
+ * We optimise our hcall path by placing hcall_tracepoint_refcount
+ * directly in the TOC so we can check if the hcall tracepoints are
+ * enabled via a single load.
+ */
+
+/* NB: reg/unreg are called while guarded with the tracepoints_mutex */
+extern long hcall_tracepoint_refcount;
+
+void hcall_tracepoint_regfunc(void)
+{
+	hcall_tracepoint_refcount++;
+}
+
+void hcall_tracepoint_unregfunc(void)
+{
+	hcall_tracepoint_refcount--;
+}
+
+void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
+{
+	trace_hcall_entry(opcode, args);
+}
+
+void __trace_hcall_exit(long opcode, unsigned long retval,
+			unsigned long *retbuf)
+{
+	trace_hcall_exit(opcode, retval, retbuf);
+}
+#endif
@@ -49,6 +49,7 @@ config X86
	select HAVE_KERNEL_GZIP
	select HAVE_KERNEL_BZIP2
	select HAVE_KERNEL_LZMA
+	select HAVE_HW_BREAKPOINT
	select HAVE_ARCH_KMEMCHECK

config OUTPUT_FORMAT
...
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
config HAVE_MMIOTRACE_SUPPORT
	def_bool y

+config X86_DECODER_SELFTEST
+	bool "x86 instruction decoder selftest"
+	depends on DEBUG_KERNEL
+	---help---
+	 Perform x86 instruction decoder selftests at build time.
+	 This option is useful for checking the sanity of x86 instruction
+	 decoder code.
+	 If unsure, say "N".
+
#
# IO delay types:
#
...
@@ -155,6 +155,9 @@ all: bzImage
KBUILD_IMAGE := $(boot)/bzImage

bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+	$(Q)$(MAKE) $(build)=arch/x86/tools posttest
+endif
	$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
	$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
	$(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@
...
@@ -10,6 +10,7 @@ header-y += ptrace-abi.h
header-y += sigcontext32.h
header-y += ucontext.h
header-y += processor-flags.h
+header-y += hw_breakpoint.h

unifdef-y += e820.h
unifdef-y += ist.h
...
@@ -17,6 +17,7 @@
#include <linux/user.h>
#include <linux/elfcore.h>
+#include <asm/debugreg.h>

/*
 * fill in the user structure for an a.out core dump
@@ -32,14 +33,7 @@ static inline void aout_dump_thread(struct pt_regs *regs, struct user *dump)
			>> PAGE_SHIFT;
	dump->u_dsize -= dump->u_tsize;
	dump->u_ssize = 0;
-	dump->u_debugreg[0] = current->thread.debugreg0;
-	dump->u_debugreg[1] = current->thread.debugreg1;
-	dump->u_debugreg[2] = current->thread.debugreg2;
-	dump->u_debugreg[3] = current->thread.debugreg3;
-	dump->u_debugreg[4] = 0;
-	dump->u_debugreg[5] = 0;
-	dump->u_debugreg[6] = current->thread.debugreg6;
-	dump->u_debugreg[7] = current->thread.debugreg7;
+	aout_dump_debugregs(dump);

	if (dump->start_stack < TASK_SIZE)
		dump->u_ssize = ((unsigned long)(TASK_SIZE - dump->start_stack))
...
@@ -18,6 +18,7 @@
#define DR_TRAP1	(0x2)		/* db1 */
#define DR_TRAP2	(0x4)		/* db2 */
#define DR_TRAP3	(0x8)		/* db3 */
+#define DR_TRAP_BITS	(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)

#define DR_STEP		(0x4000)	/* single-step */
#define DR_SWITCH	(0x8000)	/* task switch */
@@ -49,6 +50,8 @@
#define DR_LOCAL_ENABLE_SHIFT 0    /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1   /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1)      /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2)     /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2           /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55)  /* Set local bits for all 4 regs */
@@ -67,4 +70,34 @@
#define DR_LOCAL_SLOWDOWN (0x100)   /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200)  /* Global slow the pipeline */

+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+DECLARE_PER_CPU(unsigned long, cpu_dr7);
+
+static inline void hw_breakpoint_disable(void)
+{
+	/* Zero the control register for HW Breakpoint */
+	set_debugreg(0UL, 7);
+
+	/* Zero-out the individual HW breakpoint address registers */
+	set_debugreg(0UL, 0);
+	set_debugreg(0UL, 1);
+	set_debugreg(0UL, 2);
+	set_debugreg(0UL, 3);
+}
+
+static inline int hw_breakpoint_active(void)
+{
+	return __get_cpu_var(cpu_dr7) & DR_GLOBAL_ENABLE_MASK;
+}
+
+extern void aout_dump_debugregs(struct user *dump);
+
+extern void hw_breakpoint_restore(void);
+
+#endif /* __KERNEL__ */
+
#endif /* _ASM_X86_DEBUGREG_H */
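
As a usage sketch (not part of this merge; hw_breakpoint_restore() is
implemented elsewhere in the series), code that must keep a critical region
free of debug exceptions could bracket it with the helpers above:

	int active = hw_breakpoint_active();

	if (active)
		hw_breakpoint_disable();

	/* ... region that must not raise debug exceptions ... */

	if (active)
		hw_breakpoint_restore();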
@@ -20,11 +20,11 @@ typedef struct {
	unsigned int irq_call_count;
	unsigned int irq_tlb_count;
#endif
-#ifdef CONFIG_X86_MCE
+#ifdef CONFIG_X86_THERMAL_VECTOR
	unsigned int irq_thermal_count;
-# ifdef CONFIG_X86_MCE_THRESHOLD
+#endif
+#ifdef CONFIG_X86_MCE_THRESHOLD
	unsigned int irq_threshold_count;
-# endif
#endif
} ____cacheline_aligned irq_cpustat_t;
...
#ifndef _I386_HW_BREAKPOINT_H
#define _I386_HW_BREAKPOINT_H
#ifdef __KERNEL__
#define __ARCH_HW_BREAKPOINT_H
/*
 * The name should probably be dealt with at a
 * higher level, when interacting with the user
 * (display/resolving).
 */
struct arch_hw_breakpoint {
char *name; /* Contains name of the symbol to set bkpt */
unsigned long address;
u8 len;
u8 type;
};
#include <linux/kdebug.h>
#include <linux/percpu.h>
#include <linux/list.h>
/* Available HW breakpoint length encodings */
#define X86_BREAKPOINT_LEN_1 0x40
#define X86_BREAKPOINT_LEN_2 0x44
#define X86_BREAKPOINT_LEN_4 0x4c
#define X86_BREAKPOINT_LEN_EXECUTE 0x40
#ifdef CONFIG_X86_64
#define X86_BREAKPOINT_LEN_8 0x48
#endif
/* Available HW breakpoint type encodings */
/* trigger on instruction execute */
#define X86_BREAKPOINT_EXECUTE 0x80
/* trigger on memory write */
#define X86_BREAKPOINT_WRITE 0x81
/* trigger on memory read or write */
#define X86_BREAKPOINT_RW 0x83
/* Total number of available HW breakpoint registers */
#define HBP_NUM 4
struct perf_event;
struct pmu;
extern int arch_check_va_in_userspace(unsigned long va, u8 hbp_len);
extern int arch_validate_hwbkpt_settings(struct perf_event *bp,
struct task_struct *tsk);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
int arch_install_hw_breakpoint(struct perf_event *bp);
void arch_uninstall_hw_breakpoint(struct perf_event *bp);
void hw_breakpoint_pmu_read(struct perf_event *bp);
void hw_breakpoint_pmu_unthrottle(struct perf_event *bp);
extern void
arch_fill_perf_breakpoint(struct perf_event *bp);
unsigned long encode_dr7(int drnum, unsigned int len, unsigned int type);
int decode_dr7(unsigned long dr7, int bpnum, unsigned *len, unsigned *type);
extern int arch_bp_generic_fields(int x86_len, int x86_type,
int *gen_len, int *gen_type);
extern struct pmu perf_ops_bp;
#endif /* __KERNEL__ */
#endif /* _I386_HW_BREAKPOINT_H */
#ifndef _ASM_X86_INAT_H
#define _ASM_X86_INAT_H
/*
* x86 instruction attributes
*
* Written by Masami Hiramatsu <mhiramat@redhat.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
*/
#include <asm/inat_types.h>
/*
* Internal bits. Don't use bitmasks directly, because these bits are
* unstable. You should use checking functions.
*/
#define INAT_OPCODE_TABLE_SIZE 256
#define INAT_GROUP_TABLE_SIZE 8
/* Legacy last prefixes */
#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */
#define INAT_PFX_REPE 2 /* 0xF3 */ /* LPFX2 */
#define INAT_PFX_REPNE 3 /* 0xF2 */ /* LPFX3 */
/* Other Legacy prefixes */
#define INAT_PFX_LOCK 4 /* 0xF0 */
#define INAT_PFX_CS 5 /* 0x2E */
#define INAT_PFX_DS 6 /* 0x3E */
#define INAT_PFX_ES 7 /* 0x26 */
#define INAT_PFX_FS 8 /* 0x64 */
#define INAT_PFX_GS 9 /* 0x65 */
#define INAT_PFX_SS 10 /* 0x36 */
#define INAT_PFX_ADDRSZ 11 /* 0x67 */
/* x86-64 REX prefix */
#define INAT_PFX_REX 12 /* 0x4X */
/* AVX VEX prefixes */
#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
/* Immediate size */
#define INAT_IMM_BYTE 1
#define INAT_IMM_WORD 2
#define INAT_IMM_DWORD 3
#define INAT_IMM_QWORD 4
#define INAT_IMM_PTR 5
#define INAT_IMM_VWORD32 6
#define INAT_IMM_VWORD 7
/* Legacy prefix */
#define INAT_PFX_OFFS 0
#define INAT_PFX_BITS 4
#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
/* Escape opcodes */
#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS)
#define INAT_ESC_BITS 2
#define INAT_ESC_MAX ((1 << INAT_ESC_BITS) - 1)
#define INAT_ESC_MASK (INAT_ESC_MAX << INAT_ESC_OFFS)
/* Group opcodes (1-16) */
#define INAT_GRP_OFFS (INAT_ESC_OFFS + INAT_ESC_BITS)
#define INAT_GRP_BITS 5
#define INAT_GRP_MAX ((1 << INAT_GRP_BITS) - 1)
#define INAT_GRP_MASK (INAT_GRP_MAX << INAT_GRP_OFFS)
/* Immediates */
#define INAT_IMM_OFFS (INAT_GRP_OFFS + INAT_GRP_BITS)
#define INAT_IMM_BITS 3
#define INAT_IMM_MASK (((1 << INAT_IMM_BITS) - 1) << INAT_IMM_OFFS)
/* Flags */
#define INAT_FLAG_OFFS (INAT_IMM_OFFS + INAT_IMM_BITS)
#define INAT_MODRM (1 << (INAT_FLAG_OFFS))
#define INAT_FORCE64 (1 << (INAT_FLAG_OFFS + 1))
#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 2))
#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 3))
#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 4))
#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
#define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM)
#define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS)
/* Attribute search APIs */
extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
extern insn_attr_t inat_get_escape_attribute(insn_byte_t opcode,
insn_byte_t last_pfx,
insn_attr_t esc_attr);
extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
insn_byte_t last_pfx,
insn_attr_t esc_attr);
extern insn_attr_t inat_get_avx_attribute(insn_byte_t opcode,
insn_byte_t vex_m,
insn_byte_t vex_pp);
/* Attribute checking functions */
static inline int inat_is_legacy_prefix(insn_attr_t attr)
{
attr &= INAT_PFX_MASK;
return attr && attr <= INAT_LGCPFX_MAX;
}
static inline int inat_is_address_size_prefix(insn_attr_t attr)
{
return (attr & INAT_PFX_MASK) == INAT_PFX_ADDRSZ;
}
static inline int inat_is_operand_size_prefix(insn_attr_t attr)
{
return (attr & INAT_PFX_MASK) == INAT_PFX_OPNDSZ;
}
static inline int inat_is_rex_prefix(insn_attr_t attr)
{
return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}
static inline int inat_last_prefix_id(insn_attr_t attr)
{
if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
return 0;
else
return attr & INAT_PFX_MASK;
}
static inline int inat_is_vex_prefix(insn_attr_t attr)
{
attr &= INAT_PFX_MASK;
return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
}
static inline int inat_is_vex3_prefix(insn_attr_t attr)
{
return (attr & INAT_PFX_MASK) == INAT_PFX_VEX3;
}
static inline int inat_is_escape(insn_attr_t attr)
{
return attr & INAT_ESC_MASK;
}
static inline int inat_escape_id(insn_attr_t attr)
{
return (attr & INAT_ESC_MASK) >> INAT_ESC_OFFS;
}
static inline int inat_is_group(insn_attr_t attr)
{
return attr & INAT_GRP_MASK;
}
static inline int inat_group_id(insn_attr_t attr)
{
return (attr & INAT_GRP_MASK) >> INAT_GRP_OFFS;
}
static inline int inat_group_common_attribute(insn_attr_t attr)
{
return attr & ~INAT_GRP_MASK;
}
static inline int inat_has_immediate(insn_attr_t attr)
{
return attr & INAT_IMM_MASK;
}
static inline int inat_immediate_size(insn_attr_t attr)
{
return (attr & INAT_IMM_MASK) >> INAT_IMM_OFFS;
}
static inline int inat_has_modrm(insn_attr_t attr)
{
return attr & INAT_MODRM;
}
static inline int inat_is_force64(insn_attr_t attr)
{
return attr & INAT_FORCE64;
}
static inline int inat_has_second_immediate(insn_attr_t attr)
{
return attr & INAT_SCNDIMM;
}
static inline int inat_has_moffset(insn_attr_t attr)
{
return attr & INAT_MOFFSET;
}
static inline int inat_has_variant(insn_attr_t attr)
{
return attr & INAT_VARIANT;
}
static inline int inat_accept_vex(insn_attr_t attr)
{
return attr & INAT_VEXOK;
}
static inline int inat_must_vex(insn_attr_t attr)
{
return attr & INAT_VEXONLY;
}
#endif
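
As a small usage sketch (illustrative, not from the patch), the helpers
above classify a raw instruction byte by looking up its attribute word:

#include <asm/inat.h>

/* Return non-zero if byte 'b' is the 0x66 operand-size override prefix. */
static int is_opnd_size_prefix(insn_byte_t b)
{
	insn_attr_t attr = inat_get_opcode_attribute(b);

	return inat_is_operand_size_prefix(attr);
}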
#ifndef _ASM_X86_INAT_TYPES_H
#define _ASM_X86_INAT_TYPES_H
/*
* x86 instruction attributes
*
* Written by Masami Hiramatsu <mhiramat@redhat.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
*/
/* Instruction attributes */
typedef unsigned int insn_attr_t;
typedef unsigned char insn_byte_t;
typedef signed int insn_value_t;
#endif
#ifndef _ASM_X86_INSN_H
#define _ASM_X86_INSN_H
/*
* x86 instruction analysis
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
* Copyright (C) IBM Corporation, 2009
*/
/* insn_attr_t is defined in inat.h */
#include <asm/inat.h>
struct insn_field {
union {
insn_value_t value;
insn_byte_t bytes[4];
};
/* !0 if we've run insn_get_xxx() for this field */
unsigned char got;
unsigned char nbytes;
};
struct insn {
struct insn_field prefixes; /*
* Prefixes
* prefixes.bytes[3]: last prefix
*/
struct insn_field rex_prefix; /* REX prefix */
struct insn_field vex_prefix; /* VEX prefix */
struct insn_field opcode; /*
* opcode.bytes[0]: opcode1
* opcode.bytes[1]: opcode2
* opcode.bytes[2]: opcode3
*/
struct insn_field modrm;
struct insn_field sib;
struct insn_field displacement;
union {
struct insn_field immediate;
struct insn_field moffset1; /* for 64bit MOV */
struct insn_field immediate1; /* for 64bit imm or off16/32 */
};
union {
struct insn_field moffset2; /* for 64bit MOV */
struct insn_field immediate2; /* for 64bit imm or seg16 */
};
insn_attr_t attr;
unsigned char opnd_bytes;
unsigned char addr_bytes;
unsigned char length;
unsigned char x86_64;
const insn_byte_t *kaddr; /* kernel address of insn to analyze */
const insn_byte_t *next_byte;
};
#define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
#define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
#define X86_MODRM_RM(modrm) ((modrm) & 0x07)
#define X86_SIB_SCALE(sib) (((sib) & 0xc0) >> 6)
#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
#define X86_SIB_BASE(sib) ((sib) & 0x07)
#define X86_REX_W(rex) ((rex) & 8)
#define X86_REX_R(rex) ((rex) & 4)
#define X86_REX_X(rex) ((rex) & 2)
#define X86_REX_B(rex) ((rex) & 1)
/* VEX bit flags */
#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
#define X86_VEX_R(vex) ((vex) & 0x80) /* VEX2/3 Byte1 */
#define X86_VEX_X(vex) ((vex) & 0x40) /* VEX3 Byte1 */
#define X86_VEX_B(vex) ((vex) & 0x20) /* VEX3 Byte1 */
#define X86_VEX_L(vex) ((vex) & 0x04) /* VEX3 Byte2, VEX2 Byte1 */
/* VEX bit fields */
#define X86_VEX3_M(vex) ((vex) & 0x1f) /* VEX3 Byte1 */
#define X86_VEX2_M 1 /* VEX2.M always 1 */
#define X86_VEX_V(vex) (((vex) & 0x78) >> 3) /* VEX3 Byte2, VEX2 Byte1 */
#define X86_VEX_P(vex) ((vex) & 0x03) /* VEX3 Byte2, VEX2 Byte1 */
#define X86_VEX_M_MAX 0x1f /* VEX3.M Maximum value */
/* The last prefix is needed for two-byte and three-byte opcodes */
static inline insn_byte_t insn_last_prefix(struct insn *insn)
{
return insn->prefixes.bytes[3];
}
extern void insn_init(struct insn *insn, const void *kaddr, int x86_64);
extern void insn_get_prefixes(struct insn *insn);
extern void insn_get_opcode(struct insn *insn);
extern void insn_get_modrm(struct insn *insn);
extern void insn_get_sib(struct insn *insn);
extern void insn_get_displacement(struct insn *insn);
extern void insn_get_immediate(struct insn *insn);
extern void insn_get_length(struct insn *insn);
/* Attribute will be determined after getting ModRM (for opcode groups) */
static inline void insn_get_attribute(struct insn *insn)
{
insn_get_modrm(insn);
}
/* Instruction uses RIP-relative addressing */
extern int insn_rip_relative(struct insn *insn);
/* Init insn for kernel text */
static inline void kernel_insn_init(struct insn *insn, const void *kaddr)
{
#ifdef CONFIG_X86_64
insn_init(insn, kaddr, 1);
#else /* CONFIG_X86_32 */
insn_init(insn, kaddr, 0);
#endif
}
static inline int insn_is_avx(struct insn *insn)
{
if (!insn->prefixes.got)
insn_get_prefixes(insn);
return (insn->vex_prefix.value != 0);
}
static inline insn_byte_t insn_vex_m_bits(struct insn *insn)
{
if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
return X86_VEX2_M;
else
return X86_VEX3_M(insn->vex_prefix.bytes[1]);
}
static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
{
if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
return X86_VEX_P(insn->vex_prefix.bytes[1]);
else
return X86_VEX_P(insn->vex_prefix.bytes[2]);
}
/* Offset of each field from kaddr */
static inline int insn_offset_rex_prefix(struct insn *insn)
{
return insn->prefixes.nbytes;
}
static inline int insn_offset_vex_prefix(struct insn *insn)
{
return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes;
}
static inline int insn_offset_opcode(struct insn *insn)
{
return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes;
}
static inline int insn_offset_modrm(struct insn *insn)
{
return insn_offset_opcode(insn) + insn->opcode.nbytes;
}
static inline int insn_offset_sib(struct insn *insn)
{
return insn_offset_modrm(insn) + insn->modrm.nbytes;
}
static inline int insn_offset_displacement(struct insn *insn)
{
return insn_offset_sib(insn) + insn->sib.nbytes;
}
static inline int insn_offset_immediate(struct insn *insn)
{
return insn_offset_displacement(insn) + insn->displacement.nbytes;
}
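Because each helper adds the byte count of the preceding field, the offsets form a running sum. A worked example on an assumed byte sequence:

/*
 * Assumed bytes: f2 0f 58 c1  (addsd %xmm1, %xmm0)
 *   prefixes : f2    -> insn_offset_rex_prefix() == 1
 *   rex/vex  : none  -> insn_offset_opcode()     == 1
 *   opcode   : 0f 58 -> insn_offset_modrm()      == 3
 *   modrm    : c1    -> insn_offset_sib()        == 4
 *   no SIB, displacement, or immediate follows.
 */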
#endif /* _ASM_X86_INSN_H */
@@ -108,6 +108,8 @@ struct mce_log {
 #define K8_MCE_THRESHOLD_BANK_5    (MCE_THRESHOLD_BASE + 5 * 9)
 #define K8_MCE_THRESHOLD_DRAM_ECC  (MCE_THRESHOLD_BANK_4 + 0)

+extern struct atomic_notifier_head x86_mce_decoder_chain;
+
 #ifdef __KERNEL__
 #include <linux/percpu.h>
@@ -118,9 +120,11 @@ extern int mce_disabled;
 extern int mce_p5_enabled;

 #ifdef CONFIG_X86_MCE
-void mcheck_init(struct cpuinfo_x86 *c);
+int mcheck_init(void);
+void mcheck_cpu_init(struct cpuinfo_x86 *c);
 #else
-static inline void mcheck_init(struct cpuinfo_x86 *c) {}
+static inline int mcheck_init(void) { return 0; }
+static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
 #endif

 #ifdef CONFIG_X86_ANCIENT_MCE
@@ -214,5 +218,11 @@ void intel_init_thermal(struct cpuinfo_x86 *c);
 void mce_log_therm_throt_event(__u64 status);

+#ifdef CONFIG_X86_THERMAL_VECTOR
+extern void mcheck_intel_therm_init(void);
+#else
+static inline void mcheck_intel_therm_init(void) { }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_X86_MCE_H */
@@ -28,9 +28,20 @@
  */
 #define ARCH_PERFMON_EVENT_MASK			0xffff

+/*
+ * filter mask to validate fixed counter events.
+ * the following filters disqualify for fixed counters:
+ *  - inv
+ *  - edge
+ *  - cnt-mask
+ * The other filters are supported by fixed counters.
+ * The any-thread option is supported starting with v3.
+ */
+#define ARCH_PERFMON_EVENT_FILTER_MASK		0xff840000
+
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL	0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK	(0x00 << 8)
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX	0
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
		(1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))
...
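For illustration, a hedged sketch of the kind of check this mask enables when deciding whether an event may run on a fixed counter ('config' here is a hypothetical raw event-select value, not a name from this patch):

/* Events requesting inv/edge/cnt-mask cannot use a fixed counter. */
if (config & ARCH_PERFMON_EVENT_FILTER_MASK)
	return -1;	/* fall back to a generic counter */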
@@ -30,6 +30,7 @@ struct mm_struct;
 #include <linux/math64.h>
 #include <linux/init.h>

+#define HBP_NUM 4
 /*
  * Default implementation of macro that returns current
  * instruction pointer ("program counter").
@@ -422,6 +423,8 @@ extern unsigned int xstate_size;
 extern void free_thread_xstate(struct task_struct *);
 extern struct kmem_cache *task_xstate_cachep;

+struct perf_event;
+
 struct thread_struct {
	/* Cached TLS descriptors: */
	struct desc_struct	tls_array[GDT_ENTRY_TLS_ENTRIES];
@@ -443,13 +446,10 @@ struct thread_struct {
	unsigned long		fs;
 #endif
	unsigned long		gs;
-	/* Hardware debugging registers: */
-	unsigned long		debugreg0;
-	unsigned long		debugreg1;
-	unsigned long		debugreg2;
-	unsigned long		debugreg3;
-	unsigned long		debugreg6;
-	unsigned long		debugreg7;
+	/* Save middle states of ptrace breakpoints */
+	struct perf_event	*ptrace_bps[HBP_NUM];
+	/* Debug status used for traps, single steps, etc... */
+	unsigned long		debugreg6;
	/* Fault info: */
	unsigned long		cr2;
	unsigned long		trap_no;
...
@@ -7,6 +7,7 @@
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif

 #ifndef __ASSEMBLY__
@@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
	return regs->sp;
 }

+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned int offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:	pt_regs from which the register value is read.
+ * @offset:	offset of the register within struct pt_regs.
+ *
+ * regs_get_register() returns the value of a register. @offset is the
+ * offset of that register within the struct pt_regs pointed to by @regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+					      unsigned int offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check whether an address is on the kernel stack
+ * @regs:	pt_regs which contains the kernel stack pointer.
+ * @addr:	address to be checked.
+ *
+ * regs_within_kernel_stack() checks whether @addr is within the kernel
+ * stack page(s). It returns true if @addr is within the kernel stack,
+ * and false otherwise.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+					   unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1)) ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get the Nth entry of the kernel stack
+ * @regs:	pt_regs which contains the kernel stack pointer.
+ * @n:		stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns the @n th entry of the kernel stack
+ * specified by @regs. If the @n th entry is NOT on the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+						      unsigned int n)
+{
+	unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+	addr += n;
+	if (regs_within_kernel_stack(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/* Get the Nth argument at a function call */
+extern unsigned long regs_get_argument_nth(struct pt_regs *regs,
+					   unsigned int n);
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
...
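A minimal sketch of how a kprobe handler might use these accessors; the handler name and printed format are illustrative only:

static int sample_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* offset of regs->sp, resolved from its name at runtime */
	int off = regs_query_register_offset("sp");
	unsigned long sp  = regs_get_register(regs, off);
	unsigned long top = regs_get_kernel_stack_nth(regs, 0);

	pr_info("sp=%lx, top of stack=%lx\n", sp, top);
	return 0;
}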
@@ -40,7 +40,7 @@ obj-$(CONFIG_X86_64)	+= sys_x86_64.o x8664_ksyms_64.o
 obj-$(CONFIG_X86_64)	+= syscall_64.o vsyscall_64.o
 obj-y			+= bootflag.o e820.o
 obj-y			+= pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
-obj-y			+= alternative.o i8253.o pci-nommu.o
+obj-y			+= alternative.o i8253.o pci-nommu.o hw_breakpoint.o
 obj-y			+= tsc.o io_delay.o rtc.o

 obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline.o
...
@@ -5,6 +5,7 @@
 # Don't trace early stages of a secondary CPU boot
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_common.o = -pg
+CFLAGS_REMOVE_perf_event.o = -pg
 endif

 # Make sure load_percpu_segment has no stackprotector
...
@@ -837,10 +837,8 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
		boot_cpu_data.x86_capability[i] &= c->x86_capability[i];
	}

-#ifdef CONFIG_X86_MCE
	/* Init Machine Check Exception if available. */
-	mcheck_init(c);
-#endif
+	mcheck_cpu_init(c);

	select_idle_routine(c);
...
@@ -46,6 +46,9 @@
 #include "mce-internal.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/mce.h>
+
 int mce_disabled __read_mostly;

 #define MISC_MCELOG_MINOR	227
@@ -85,18 +88,26 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
 static DEFINE_PER_CPU(struct mce, mces_seen);
 static int			cpu_missing;

-static void default_decode_mce(struct mce *m)
+/*
+ * CPU/chipset specific EDAC code can register a notifier call here to print
+ * MCE errors in a human-readable form.
+ */
+ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+EXPORT_SYMBOL_GPL(x86_mce_decoder_chain);
+
+static int default_decode_mce(struct notifier_block *nb, unsigned long val,
+			      void *data)
 {
	pr_emerg("No human readable MCE decoding support on this CPU type.\n");
	pr_emerg("Run the message through 'mcelog --ascii' to decode.\n");
+
+	return NOTIFY_STOP;
 }

-/*
- * CPU/chipset specific EDAC code can register a callback here to print
- * MCE errors in a human-readable form:
- */
-void (*x86_mce_decode_callback)(struct mce *m) = default_decode_mce;
-EXPORT_SYMBOL(x86_mce_decode_callback);
+static struct notifier_block mce_dec_nb = {
+	.notifier_call = default_decode_mce,
+	.priority      = -1,
+};
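With the callback pointer replaced by a notifier chain, an EDAC driver can hook in roughly like this (the names below are hypothetical). Since the default handler registers at priority -1, a driver registered at the default priority 0 runs first and can return NOTIFY_STOP to suppress the fallback message:

static int my_chipset_decode_mce(struct notifier_block *nb,
				 unsigned long val, void *data)
{
	struct mce *m = data;

	/* decode m->status, m->addr, m->misc for this chipset ... */
	return NOTIFY_STOP;
}

static struct notifier_block my_mce_dec = {
	.notifier_call = my_chipset_decode_mce,
};

/* in the driver's init path: */
atomic_notifier_chain_register(&x86_mce_decoder_chain, &my_mce_dec);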
 /* MCA banks polled by the period polling timer for corrected events */
 DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
@@ -141,6 +152,9 @@ void mce_log(struct mce *mce)
 {
	unsigned next, entry;

+	/* Emit the trace record: */
+	trace_mce_record(mce);
+
	mce->finished = 0;
	wmb();
	for (;;) {
@@ -204,9 +218,9 @@ static void print_mce(struct mce *m)

	/*
	 * Print out human-readable details about the MCE error,
-	 * (if the CPU has an implementation for that):
+	 * (if the CPU has an implementation for that)
	 */
-	x86_mce_decode_callback(m);
+	atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
 }

 static void print_mce_head(void)
@@ -1122,7 +1136,7 @@ static int check_interval = 5 * 60; /* 5 minutes */
 static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
 static DEFINE_PER_CPU(struct timer_list, mce_timer);

-static void mcheck_timer(unsigned long data)
+static void mce_start_timer(unsigned long data)
 {
	struct timer_list *t = &per_cpu(mce_timer, data);
	int *n;
@@ -1187,7 +1201,7 @@ int mce_notify_irq(void)
 }
 EXPORT_SYMBOL_GPL(mce_notify_irq);

-static int mce_banks_init(void)
+static int __cpuinit __mcheck_cpu_mce_banks_init(void)
 {
	int i;
@@ -1206,7 +1220,7 @@ static int mce_banks_init(void)
 /*
  * Initialize Machine Checks for a CPU.
  */
-static int __cpuinit mce_cap_init(void)
+static int __cpuinit __mcheck_cpu_cap_init(void)
 {
	unsigned b;
	u64 cap;
@@ -1228,7 +1242,7 @@ static int __cpuinit mce_cap_init(void)
	WARN_ON(banks != 0 && b != banks);
	banks = b;
	if (!mce_banks) {
-		int err = mce_banks_init();
+		int err = __mcheck_cpu_mce_banks_init();

		if (err)
			return err;
@@ -1244,7 +1258,7 @@ static int __cpuinit mce_cap_init(void)
	return 0;
 }

-static void mce_init(void)
+static void __mcheck_cpu_init_generic(void)
 {
	mce_banks_t all_banks;
	u64 cap;
@@ -1273,7 +1287,7 @@ static void mce_init(void)
 }

 /* Add per CPU specific workarounds here */
-static int __cpuinit mce_cpu_quirks(struct cpuinfo_x86 *c)
+static int __cpuinit __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 {
	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
		pr_info("MCE: unknown CPU type - not enabling MCE support.\n");
@@ -1341,7 +1355,7 @@ static int __cpuinit mce_cpu_quirks(struct cpuinfo_x86 *c)
	return 0;
 }

-static void __cpuinit mce_ancient_init(struct cpuinfo_x86 *c)
+static void __cpuinit __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 {
	if (c->x86 != 5)
		return;
@@ -1355,7 +1369,7 @@ static void __cpuinit mce_ancient_init(struct cpuinfo_x86 *c)
	}
 }

-static void mce_cpu_features(struct cpuinfo_x86 *c)
+static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 {
	switch (c->x86_vendor) {
	case X86_VENDOR_INTEL:
@@ -1369,7 +1383,7 @@ static void mce_cpu_features(struct cpuinfo_x86 *c)
	}
 }

-static void mce_init_timer(void)
+static void __mcheck_cpu_init_timer(void)
 {
	struct timer_list *t = &__get_cpu_var(mce_timer);
	int *n = &__get_cpu_var(mce_next_interval);
@@ -1380,7 +1394,7 @@ static void mce_init_timer(void)
	*n = check_interval * HZ;
	if (!*n)
		return;
-	setup_timer(t, mcheck_timer, smp_processor_id());
+	setup_timer(t, mce_start_timer, smp_processor_id());
	t->expires = round_jiffies(jiffies + *n);
	add_timer_on(t, smp_processor_id());
 }
@@ -1400,27 +1414,28 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off:
  */
-void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
+void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
 {
	if (mce_disabled)
		return;

-	mce_ancient_init(c);
+	__mcheck_cpu_ancient_init(c);

	if (!mce_available(c))
		return;

-	if (mce_cap_init() < 0 || mce_cpu_quirks(c) < 0) {
+	if (__mcheck_cpu_cap_init() < 0 || __mcheck_cpu_apply_quirks(c) < 0) {
		mce_disabled = 1;
		return;
	}

	machine_check_vector = do_machine_check;

-	mce_init();
-	mce_cpu_features(c);
-	mce_init_timer();
+	__mcheck_cpu_init_generic();
+	__mcheck_cpu_init_vendor(c);
+	__mcheck_cpu_init_timer();
	INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
 }

 /*
@@ -1640,6 +1655,15 @@ static int __init mcheck_enable(char *str)
 }
 __setup("mce", mcheck_enable);

+int __init mcheck_init(void)
+{
+	atomic_notifier_chain_register(&x86_mce_decoder_chain, &mce_dec_nb);
+
+	mcheck_intel_therm_init();
+
+	return 0;
+}
+
 /*
  * Sysfs support
  */
@@ -1648,7 +1672,7 @@ __setup("mce", mcheck_enable);
  * Disable machine checks on suspend and shutdown. We can't really handle
  * them later.
  */
-static int mce_disable(void)
+static int mce_disable_error_reporting(void)
 {
	int i;
@@ -1663,12 +1687,12 @@ static int mce_disable(void)

 static int mce_suspend(struct sys_device *dev, pm_message_t state)
 {
-	return mce_disable();
+	return mce_disable_error_reporting();
 }

 static int mce_shutdown(struct sys_device *dev)
 {
-	return mce_disable();
+	return mce_disable_error_reporting();
 }

 /*
@@ -1678,8 +1702,8 @@ static int mce_shutdown(struct sys_device *dev)
  */
 static int mce_resume(struct sys_device *dev)
 {
-	mce_init();
-	mce_cpu_features(&current_cpu_data);
+	__mcheck_cpu_init_generic();
+	__mcheck_cpu_init_vendor(&current_cpu_data);

	return 0;
 }
@@ -1689,8 +1713,8 @@ static void mce_cpu_restart(void *data)
	del_timer_sync(&__get_cpu_var(mce_timer));
	if (!mce_available(&current_cpu_data))
		return;
-	mce_init();
-	mce_init_timer();
+	__mcheck_cpu_init_generic();
+	__mcheck_cpu_init_timer();
 }

 /* Reinit MCEs after user configuration changes */
@@ -1716,7 +1740,7 @@ static void mce_enable_ce(void *all)
	cmci_reenable();
	cmci_recheck();
	if (all)
-		mce_init_timer();
+		__mcheck_cpu_init_timer();
 }

 static struct sysdev_class mce_sysclass = {
@@ -1929,13 +1953,14 @@ static __cpuinit void mce_remove_device(unsigned int cpu)
 }

 /* Make sure there are no machine checks on offlined CPUs. */
-static void mce_disable_cpu(void *h)
+static void __cpuinit mce_disable_cpu(void *h)
 {
	unsigned long action = *(unsigned long *)h;
	int i;

	if (!mce_available(&current_cpu_data))
		return;
+
	if (!(action & CPU_TASKS_FROZEN))
		cmci_clear();
	for (i = 0; i < banks; i++) {
@@ -1946,7 +1971,7 @@ static void mce_disable_cpu(void *h)
	}
 }

-static void mce_reenable_cpu(void *h)
+static void __cpuinit mce_reenable_cpu(void *h)
 {
	unsigned long action = *(unsigned long *)h;
	int i;
@@ -2025,7 +2050,7 @@ static __init void mce_init_banks(void)
	}
 }

-static __init int mce_init_device(void)
+static __init int mcheck_init_device(void)
 {
	int err;
	int i = 0;
@@ -2053,7 +2078,7 @@ static __init int mce_init_device(void)
	return err;
 }

-device_initcall(mce_init_device);
+device_initcall(mcheck_init_device);

 /*
  * Old style boot options parsing. Only for compatibility.
@@ -2101,7 +2126,7 @@ static int fake_panic_set(void *data, u64 val)
 DEFINE_SIMPLE_ATTRIBUTE(fake_panic_fops, fake_panic_get,
			fake_panic_set, "%llu\n");

-static int __init mce_debugfs_init(void)
+static int __init mcheck_debugfs_init(void)
 {
	struct dentry *dmce, *ffake_panic;
@@ -2115,5 +2140,5 @@ static int __init mce_debugfs_init(void)
	return 0;
 }
-late_initcall(mce_debugfs_init);
+late_initcall(mcheck_debugfs_init);
 #endif
@@ -49,6 +49,8 @@ static DEFINE_PER_CPU(struct thermal_state, thermal_state);

 static atomic_t therm_throt_en		= ATOMIC_INIT(0);

+static u32 lvtthmr_init __read_mostly;
+
 #ifdef CONFIG_SYSFS
 #define define_therm_throt_sysdev_one_ro(_name)				\
	static SYSDEV_ATTR(_name, 0444, therm_throt_sysdev_show_##_name, NULL)
@@ -254,6 +256,18 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
	ack_APIC_irq();
 }

+void __init mcheck_intel_therm_init(void)
+{
+	/*
+	 * This function is only called on the boot CPU. Save the BSP's
+	 * initial thermal LVT value so it can later be used to restore
+	 * the BIOS-programmed thermal LVT entry on the APs.
+	 */
+	if (cpu_has(&boot_cpu_data, X86_FEATURE_ACPI) &&
+		cpu_has(&boot_cpu_data, X86_FEATURE_ACC))
+		lvtthmr_init = apic_read(APIC_LVTTHMR);
+}
+
 void intel_init_thermal(struct cpuinfo_x86 *c)
 {
	unsigned int cpu = smp_processor_id();
@@ -270,7 +284,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
	 * since it might be delivered via SMI already:
	 */
	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
-	h = apic_read(APIC_LVTTHMR);
+
+	/*
+	 * The initial value of the thermal LVT entries on all APs always
+	 * reads 0x10000, because the APs are woken up by the BSP issuing
+	 * an INIT-SIPI-SIPI sequence to them: the LVT registers are reset
+	 * to 0 except for the mask bits, which are set to 1 when the APs
+	 * receive the INIT IPI. Always restore the value the BIOS
+	 * programmed, based on the BSP's saved copy, since the BIOS sets
+	 * the same value for all threads/cores.
+	 */
+	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	h = lvtthmr_init;
+
	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
		printk(KERN_DEBUG
		       "CPU%d: Thermal monitoring handled by SMI\n", cpu);
...
@@ -333,6 +333,10 @@ ENTRY(ret_from_fork)
	CFI_ENDPROC
 END(ret_from_fork)

+/*
+ * Interrupt exit functions should be protected against kprobes
+ */
+	.pushsection .kprobes.text, "ax"
 /*
  * Return to user mode is not as complex as all this looks,
  * but we want the default path for a system call return to
@@ -383,6 +387,10 @@ need_resched:
 END(resume_kernel)
 #endif
	CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+	.popsection

 /* SYSENTER_RETURN points to after the "sysenter" instruction in
    the vsyscall page.  See vsyscall-sysentry.S, which defines the symbol.  */
@@ -513,6 +521,10 @@ sysexit_audit:
	PTGS_TO_GS_EX
 ENDPROC(ia32_sysenter_target)

+/*
+ * syscall stub including irq exit should be protected against kprobes
+ */
+	.pushsection .kprobes.text, "ax"
	# system call handler stub
 ENTRY(system_call)
	RING0_INT_FRAME			# can't unwind into user space anyway
@@ -705,6 +717,10 @@ syscall_badsys:
	jmp resume_userspace
 END(syscall_badsys)
	CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+	.popsection

 /*
  * System calls that need a pt_regs pointer.
@@ -814,6 +830,10 @@ common_interrupt:
 ENDPROC(common_interrupt)
	CFI_ENDPROC

+/*
+ * Irq entries should be protected against kprobes
+ */
+	.pushsection .kprobes.text, "ax"
 #define BUILD_INTERRUPT3(name, nr, fn)	\
 ENTRY(name)				\
	RING0_INT_FRAME;		\
@@ -980,6 +1000,10 @@ ENTRY(spurious_interrupt_bug)
	jmp error_code
	CFI_ENDPROC
 END(spurious_interrupt_bug)
+/*
+ * End of kprobes section
+ */
+	.popsection

 ENTRY(kernel_thread_helper)
	pushl $0			# fake return address for unwinder
...
@@ -803,6 +803,10 @@ END(interrupt)
	call \func
	.endm

+/*
+ * Interrupt entry/exit should be protected against kprobes
+ */
+	.pushsection .kprobes.text, "ax"
 /*
  * The interrupt stubs push (~vector+0x80) onto the stack and
  * then jump to common_interrupt.
@@ -941,6 +945,10 @@ ENTRY(retint_kernel)
	CFI_ENDPROC
 END(common_interrupt)

+/*
+ * End of kprobes section
+ */
+	.popsection
 /*
  * APIC interrupts.
  */
...
@@ -92,17 +92,17 @@ static int show_other_interrupts(struct seq_file *p, int prec)
		seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
	seq_printf(p, "  TLB shootdowns\n");
 #endif
-#ifdef CONFIG_X86_MCE
+#ifdef CONFIG_X86_THERMAL_VECTOR
	seq_printf(p, "%*s: ", prec, "TRM");
	for_each_online_cpu(j)
		seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
	seq_printf(p, "  Thermal event interrupts\n");
-# ifdef CONFIG_X86_MCE_THRESHOLD
+#endif
+#ifdef CONFIG_X86_MCE_THRESHOLD
	seq_printf(p, "%*s: ", prec, "THR");
	for_each_online_cpu(j)
		seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
	seq_printf(p, "  Threshold APIC interrupts\n");
-# endif
 #endif
 #ifdef CONFIG_X86_MCE
	seq_printf(p, "%*s: ", prec, "MCE");
@@ -194,11 +194,11 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
	sum += irq_stats(cpu)->irq_call_count;
	sum += irq_stats(cpu)->irq_tlb_count;
 #endif
-#ifdef CONFIG_X86_MCE
+#ifdef CONFIG_X86_THERMAL_VECTOR
	sum += irq_stats(cpu)->irq_thermal_count;
-# ifdef CONFIG_X86_MCE_THRESHOLD
+#endif
+#ifdef CONFIG_X86_MCE_THRESHOLD
	sum += irq_stats(cpu)->irq_threshold_count;
-# endif
 #endif
 #ifdef CONFIG_X86_MCE
	sum += per_cpu(mce_exception_count, cpu);
...
@@ -43,6 +43,7 @@
 #include <linux/smp.h>
 #include <linux/nmi.h>

+#include <asm/debugreg.h>
 #include <asm/apicdef.h>
 #include <asm/system.h>
@@ -434,6 +435,11 @@ single_step_cont(struct pt_regs *regs, struct die_args *args)
			"resuming...\n");
	kgdb_arch_handle_exception(args->trapnr, args->signr,
				   args->err, "c", "", regs);
+	/*
+	 * Reset the BS bit in dr6 (pointed by args->err) to
+	 * denote completion of processing
+	 */
+	(*(unsigned long *)ERR_PTR(args->err)) &= ~DR_STEP;

	return NOTIFY_STOP;
 }
...
@@ -25,6 +25,7 @@
 #include <asm/desc.h>
 #include <asm/system.h>
 #include <asm/cacheflush.h>
+#include <asm/debugreg.h>

 static void set_idt(void *newidt, __u16 limit)
 {
@@ -202,6 +203,7 @@ void machine_kexec(struct kimage *image)

	/* Interrupts aren't acceptable while we reboot */
	local_irq_disable();
+	hw_breakpoint_disable();

	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
...
@@ -18,6 +18,7 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/debugreg.h>

 static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
				unsigned long addr)
@@ -282,6 +283,7 @@ void machine_kexec(struct kimage *image)

	/* Interrupts aren't acceptable while we reboot */
	local_irq_disable();
+	hw_breakpoint_disable();

	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
...
@@ -10,6 +10,7 @@
 #include <linux/clockchips.h>
 #include <linux/random.h>
 #include <trace/events/power.h>
+#include <linux/hw_breakpoint.h>
 #include <asm/system.h>
 #include <asm/apic.h>
 #include <asm/syscalls.h>
@@ -17,6 +18,7 @@
 #include <asm/uaccess.h>
 #include <asm/i387.h>
 #include <asm/ds.h>
+#include <asm/debugreg.h>

 unsigned long idle_halt;
 EXPORT_SYMBOL(idle_halt);
@@ -103,14 +105,7 @@ void flush_thread(void)
	}
 #endif

-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
-
-	tsk->thread.debugreg0 = 0;
-	tsk->thread.debugreg1 = 0;
-	tsk->thread.debugreg2 = 0;
-	tsk->thread.debugreg3 = 0;
-	tsk->thread.debugreg6 = 0;
-	tsk->thread.debugreg7 = 0;
+	flush_ptrace_hw_breakpoint(tsk);
	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
	/*
	 * Forget coprocessor state..
@@ -192,16 +187,6 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
	else if (next->debugctlmsr != prev->debugctlmsr)
		update_debugctlmsr(next->debugctlmsr);

-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg0, 0);
-		set_debugreg(next->debugreg1, 1);
-		set_debugreg(next->debugreg2, 2);
-		set_debugreg(next->debugreg3, 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg6, 6);
-		set_debugreg(next->debugreg7, 7);
-	}
-
	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
		/* prev and next are different */
...
@@ -58,6 +58,7 @@
 #include <asm/idle.h>
 #include <asm/syscalls.h>
 #include <asm/ds.h>
+#include <asm/debugreg.h>

 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
@@ -259,7 +260,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
	task_user_gs(p) = get_user_gs(regs);

+	p->thread.io_bitmap_ptr = NULL;
	tsk = current;
+	err = -ENOMEM;
+
+	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
+
	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
						  IO_BITMAP_BYTES, GFP_KERNEL);
...
@@ -52,6 +52,7 @@
 #include <asm/idle.h>
 #include <asm/syscalls.h>
 #include <asm/ds.h>
+#include <asm/debugreg.h>

 asmlinkage extern void ret_from_fork(void);
@@ -297,12 +298,16 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
	p->thread.fs = me->thread.fs;
	p->thread.gs = me->thread.gs;
+	p->thread.io_bitmap_ptr = NULL;

	savesegment(gs, p->thread.gsindex);
	savesegment(fs, p->thread.fsindex);
	savesegment(es, p->thread.es);
	savesegment(ds, p->thread.ds);

+	err = -ENOMEM;
+	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
+
	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
		p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
		if (!p->thread.io_bitmap_ptr) {
@@ -341,6 +346,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
		kfree(p->thread.io_bitmap_ptr);
		p->thread.io_bitmap_max = 0;
	}
+
	return err;
 }
@@ -495,6 +501,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
	 */
	if (preload_fpu)
		__math_state_restore();
+
	return prev_p;
 }
...
@@ -109,6 +109,7 @@
 #ifdef CONFIG_X86_64
 #include <asm/numa_64.h>
 #endif
+#include <asm/mce.h>

 /*
  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
@@ -1031,6 +1032,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 #endif
	x86_init.oem.banner();
+
+	mcheck_init();
 }

 #ifdef CONFIG_X86_32
...
@@ -799,15 +799,6 @@ static void do_signal(struct pt_regs *regs)

	signr = get_signal_to_deliver(&info, &ka, regs, NULL);
	if (signr > 0) {
-		/*
-		 * Re-enable any watchpoints before delivering the
-		 * signal to user space. The processor register will
-		 * have been cleared if the watchpoint triggered
-		 * inside the kernel.
-		 */
-		if (current->thread.debugreg7)
-			set_debugreg(current->thread.debugreg7, 7);
-
		/* Whee!  Actually deliver the signal.  */
		if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
			/*
...
@@ -529,77 +529,56 @@ asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
 dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
 {
	struct task_struct *tsk = current;
-	unsigned long condition;
+	unsigned long dr6;
	int si_code;

-	get_debugreg(condition, 6);
+	get_debugreg(dr6, 6);

	/* Catch kmemcheck conditions first of all! */
-	if (condition & DR_STEP && kmemcheck_trap(regs))
+	if ((dr6 & DR_STEP) && kmemcheck_trap(regs))
		return;

+	/* DR6 may or may not be cleared by the CPU */
+	set_debugreg(0, 6);
	/*
	 * The processor cleared BTF, so don't mark that we need it set.
	 */
	clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
	tsk->thread.debugctlmsr = 0;

-	if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
-						SIGTRAP) == NOTIFY_STOP)
+	/* Store the virtualized DR6 value */
+	tsk->thread.debugreg6 = dr6;
+
+	if (notify_die(DIE_DEBUG, "debug", regs, PTR_ERR(&dr6), error_code,
+							SIGTRAP) == NOTIFY_STOP)
		return;

	/* It's safe to allow irq's after DR6 has been saved */
	preempt_conditional_sti(regs);

-	/* Mask out spurious debug traps due to lazy DR7 setting */
-	if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
-		if (!tsk->thread.debugreg7)
-			goto clear_dr7;
+	if (regs->flags & X86_VM_MASK) {
+		handle_vm86_trap((struct kernel_vm86_regs *) regs,
+				error_code, 1);
+		return;
	}

-#ifdef CONFIG_X86_32
-	if (regs->flags & X86_VM_MASK)
-		goto debug_vm86;
-#endif
-
-	/* Save debug status register where ptrace can see it */
-	tsk->thread.debugreg6 = condition;
-
	/*
-	 * Single-stepping through TF: make sure we ignore any events in
-	 * kernel space (but re-enable TF when returning to user mode).
+	 * Single-stepping through system calls: ignore any exceptions in
+	 * kernel space, but re-enable TF when returning to user mode.
+	 *
+	 * We already checked v86 mode above, so we can check for kernel mode
+	 * by just checking the CPL of CS.
	 */
-	if (condition & DR_STEP) {
-		if (!user_mode(regs))
-			goto clear_TF_reenable;
+	if ((dr6 & DR_STEP) && !user_mode(regs)) {
+		tsk->thread.debugreg6 &= ~DR_STEP;
+		set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+		regs->flags &= ~X86_EFLAGS_TF;
	}

-	si_code = get_si_code(condition);
-	/* Ok, finally something we can handle */
-	send_sigtrap(tsk, regs, error_code, si_code);
+	si_code = get_si_code(tsk->thread.debugreg6);
+	if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
+		send_sigtrap(tsk, regs, error_code, si_code);

-	/*
-	 * Disable additional traps. They'll be re-enabled when
-	 * the signal is delivered.
-	 */
-clear_dr7:
-	set_debugreg(0, 7);
	preempt_conditional_cli(regs);
-	return;
-
-#ifdef CONFIG_X86_32
-debug_vm86:
-	/* reenable preemption: handle_vm86_trap() might sleep */
-	dec_preempt_count();
-	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	conditional_cli(regs);
-	return;
-#endif
-
-clear_TF_reenable:
-	set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
-	regs->flags &= ~X86_EFLAGS_TF;
-	preempt_conditional_cli(regs);

	return;
 }
...
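The PTR_ERR(&dr6) trick above lets DIE_DEBUG notifiers modify the virtualized DR6 value in place, which is exactly what the kgdb and kmmio hunks elsewhere in this commit do. A hedged sketch of such a notifier (the function name is hypothetical):

static int sample_debug_notify(struct notifier_block *nb,
			       unsigned long cmd, void *data)
{
	struct die_args *args = data;
	/* recover the dr6 pointer that do_debug() encoded via PTR_ERR() */
	unsigned long *dr6_p = (unsigned long *)ERR_PTR(args->err);

	if (cmd != DIE_DEBUG || !(*dr6_p & DR_STEP))
		return NOTIFY_DONE;

	*dr6_p &= ~DR_STEP;	/* mark the single-step trap as handled */
	return NOTIFY_STOP;
}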
@@ -42,6 +42,7 @@
 #define CREATE_TRACE_POINTS
 #include "trace.h"

+#include <asm/debugreg.h>
 #include <asm/uaccess.h>
 #include <asm/msr.h>
 #include <asm/desc.h>
@@ -3643,14 +3644,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
	trace_kvm_entry(vcpu->vcpu_id);
	kvm_x86_ops->run(vcpu, kvm_run);

-	if (unlikely(vcpu->arch.switch_db_regs || test_thread_flag(TIF_DEBUG))) {
-		set_debugreg(current->thread.debugreg0, 0);
-		set_debugreg(current->thread.debugreg1, 1);
-		set_debugreg(current->thread.debugreg2, 2);
-		set_debugreg(current->thread.debugreg3, 3);
-		set_debugreg(current->thread.debugreg6, 6);
-		set_debugreg(current->thread.debugreg7, 7);
-	}
+	/*
+	 * If the guest has used debug registers, at least dr7
+	 * will be disabled while returning to the host.
+	 * If we don't have active breakpoints in the host, we don't
+	 * care about the messed up debug address registers. But if
+	 * we have some of them active, restore the old state.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();

	set_bit(KVM_REQ_KICK, &vcpu->requests);
	local_irq_enable();
...
@@ -2,12 +2,25 @@
 # Makefile for x86 specific library files.
 #

+inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
+inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
+quiet_cmd_inat_tables = GEN     $@
+      cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@
+
+$(obj)/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	$(call cmd,inat_tables)
+
+$(obj)/inat.o: $(obj)/inat-tables.c
+
+clean-files := inat-tables.c
+
 obj-$(CONFIG_SMP) := msr.o
 lib-y := delay.o
 lib-y += thunk_$(BITS).o
 lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
+lib-y += insn.o inat.o

 obj-y += msr-reg.o msr-reg-export.o
...
/*
* x86 instruction attribute tables
*
* Written by Masami Hiramatsu <mhiramat@redhat.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
*/
#include <asm/insn.h>
/* Attribute tables are generated from opcode map */
#include "inat-tables.c"
/* Attribute search APIs */
insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode)
{
return inat_primary_table[opcode];
}
insn_attr_t inat_get_escape_attribute(insn_byte_t opcode, insn_byte_t last_pfx,
insn_attr_t esc_attr)
{
const insn_attr_t *table;
insn_attr_t lpfx_attr;
int n, m = 0;
n = inat_escape_id(esc_attr);
if (last_pfx) {
lpfx_attr = inat_get_opcode_attribute(last_pfx);
m = inat_last_prefix_id(lpfx_attr);
}
table = inat_escape_tables[n][0];
if (!table)
return 0;
if (inat_has_variant(table[opcode]) && m) {
table = inat_escape_tables[n][m];
if (!table)
return 0;
}
return table[opcode];
}
insn_attr_t inat_get_group_attribute(insn_byte_t modrm, insn_byte_t last_pfx,
insn_attr_t grp_attr)
{
const insn_attr_t *table;
insn_attr_t lpfx_attr;
int n, m = 0;
n = inat_group_id(grp_attr);
if (last_pfx) {
lpfx_attr = inat_get_opcode_attribute(last_pfx);
m = inat_last_prefix_id(lpfx_attr);
}
table = inat_group_tables[n][0];
if (!table)
return inat_group_common_attribute(grp_attr);
if (inat_has_variant(table[X86_MODRM_REG(modrm)]) && m) {
table = inat_group_tables[n][m];
if (!table)
return inat_group_common_attribute(grp_attr);
}
return table[X86_MODRM_REG(modrm)] |
inat_group_common_attribute(grp_attr);
}
insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
insn_byte_t vex_p)
{
const insn_attr_t *table;
if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
return 0;
table = inat_avx_tables[vex_m][vex_p];
if (!table)
return 0;
return table[opcode];
}
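Taken together, these lookups follow the decode order: the primary table classifies the first opcode byte; the escape tables (selected by the last legacy prefix) refine multi-byte opcodes; the group tables resolve ModRM.reg-encoded opcode extensions. A hedged sketch of the two-byte-opcode path (inat_is_escape() is assumed from inat.h, and the two variables are assumed to come from the surrounding decode state):

insn_attr_t attr = inat_get_opcode_attribute(0x0f);

if (inat_is_escape(attr))	/* 0x0f opens an escape map */
	attr = inat_get_escape_attribute(second_opcode_byte,
					 last_prefix, attr);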
@@ -38,7 +38,8 @@ enum x86_pf_error_code {
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
-static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
+static inline int __kprobes
+kmmio_fault(struct pt_regs *regs, unsigned long addr)
 {
	if (unlikely(is_kmmio_active()))
		if (kmmio_handler(regs, addr) == 1)
@@ -46,7 +47,7 @@ static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
	return 0;
 }

-static inline int notify_page_fault(struct pt_regs *regs)
+static inline int __kprobes notify_page_fault(struct pt_regs *regs)
 {
	int ret = 0;
@@ -240,7 +241,7 @@ void vmalloc_sync_all(void)
  *
  * Handle a fault on the vmalloc or module mapping area
  */
-static noinline int vmalloc_fault(unsigned long address)
+static noinline __kprobes int vmalloc_fault(unsigned long address)
 {
	unsigned long pgd_paddr;
	pmd_t *pmd_k;
@@ -357,7 +358,7 @@ void vmalloc_sync_all(void)
  *
  * This assumes no large pages in there.
  */
-static noinline int vmalloc_fault(unsigned long address)
+static noinline __kprobes int vmalloc_fault(unsigned long address)
 {
	pgd_t *pgd, *pgd_ref;
	pud_t *pud, *pud_ref;
@@ -860,7 +861,7 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
  * There are no security implications to leaving a stale TLB when
  * increasing the permissions on a page.
  */
-static noinline int
+static noinline __kprobes int
 spurious_fault(unsigned long error_code, unsigned long address)
 {
	pgd_t *pgd;
...
@@ -540,8 +540,14 @@ kmmio_die_notifier(struct notifier_block *nb, unsigned long val, void *args)
	struct die_args *arg = args;

	if (val == DIE_DEBUG && (arg->err & DR_STEP))
-		if (post_kmmio_handler(arg->err, arg->regs) == 1)
+		if (post_kmmio_handler(arg->err, arg->regs) == 1) {
+			/*
+			 * Reset the BS bit in dr6 (pointed by args->err) to
+			 * denote completion of processing
+			 */
+			(*(unsigned long *)ERR_PTR(arg->err)) &= ~DR_STEP;
			return NOTIFY_STOP;
+		}

	return NOTIFY_DONE;
 }
...
@@ -18,6 +18,7 @@
 #include <asm/mce.h>
 #include <asm/xcr.h>
 #include <asm/suspend.h>
+#include <asm/debugreg.h>

 #ifdef CONFIG_X86_32
 static struct saved_context saved_context;
@@ -142,31 +143,6 @@ static void fix_processor_context(void)
 #endif
	load_TR_desc();				/* This does ltr */
	load_LDT(&current->active_mm->context);	/* This does lldt */
-
-	/*
-	 * Now maybe reload the debug registers
-	 */
-	if (current->thread.debugreg7) {
-#ifdef CONFIG_X86_32
-		set_debugreg(current->thread.debugreg0, 0);
-		set_debugreg(current->thread.debugreg1, 1);
-		set_debugreg(current->thread.debugreg2, 2);
-		set_debugreg(current->thread.debugreg3, 3);
-		/* no 4 and 5 */
-		set_debugreg(current->thread.debugreg6, 6);
-		set_debugreg(current->thread.debugreg7, 7);
-#else
-		/* CONFIG_X86_64 */
-		loaddebug(&current->thread, 0);
-		loaddebug(&current->thread, 1);
-		loaddebug(&current->thread, 2);
-		loaddebug(&current->thread, 3);
-		/* no 4 and 5 */
-		loaddebug(&current->thread, 6);
-		loaddebug(&current->thread, 7);
-#endif
-	}
 }

 /**
...
PHONY += posttest
ifeq ($(KBUILD_VERBOSE),1)
posttest_verbose = -v
else
posttest_verbose =
endif
ifeq ($(CONFIG_64BIT),y)
posttest_64bit = -y
else
posttest_64bit = -n
endif
distill_awk = $(srctree)/arch/x86/tools/distill.awk
chkobjdump = $(srctree)/arch/x86/tools/chkobjdump.awk
quiet_cmd_posttest = TEST $@
cmd_posttest = ($(OBJDUMP) -v | $(AWK) -f $(chkobjdump)) || $(OBJDUMP) -d -j .text $(objtree)/vmlinux | $(AWK) -f $(distill_awk) | $(obj)/test_get_len $(posttest_64bit) $(posttest_verbose)
posttest: $(obj)/test_get_len vmlinux
$(call cmd,posttest)
hostprogs-y := test_get_len
# -I is needed for the generated C source and for C source in the kernel tree.
HOSTCFLAGS_test_get_len.o := -Wall -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/ -I$(srctree)/include/
# Dependencies are also needed.
$(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c