Commit 243d657e authored by Ashok Raj, committed by Ingo Molnar

x86/mce: Handle Local MCE events

Add the changes do_machine_check() needs in order to process
MCEs that are signaled as local MCEs (LMCE). Typically, only
recoverable errors of the SRAR (software recoverable action
required) type are signaled as LMCE. However, the architecture
does not restrict LMCE to those errors.

When an error is signaled as an LMCE, the MCE handler does not
need to rendezvous with the other logical processors. Earlier
processors broadcast every machine check to all of them, which
forced the handler to perform that rendezvous.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 88d53867
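As a rough model of the decision described in the commit message, choosing between the rendezvous path and the local path comes down to a single bit of the MCG_STATUS register. The sketch below is a user-space illustration, not kernel code: `mce_needs_rendezvous()` is a hypothetical helper, and the bit position (bit 3, LMCE_S) is assumed to match the kernel's `MCG_STATUS_LMCES` definition used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of IA32_MCG_STATUS (LMCE_S): set when the machine check was
 * delivered only to the logical processor handling it. Assumed to
 * mirror the kernel's MCG_STATUS_LMCES definition. */
#define MCG_STATUS_LMCES (1ULL << 3)

/* Hypothetical helper: returns true when the handler must rendezvous
 * with the other CPUs (the mce_start()/mce_end() path), and false for
 * a local MCE, which is handled on this CPU alone. */
static bool mce_needs_rendezvous(uint64_t mcgstatus)
{
	return !(mcgstatus & MCG_STATUS_LMCES);
}
```

For example, a status value of 0x7 (RIPV, EIPV and MCIP set, LMCE_S clear) still takes the broadcast path, while any value with bit 3 set skips it.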
@@ -1047,6 +1047,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	char *msg = "Unknown";
 	u64 recover_paddr = ~0ull;
 	int flags = MF_ACTION_REQUIRED;
+	int lmce = 0;
 
 	prev_state = ist_enter(regs);
 
@@ -1074,11 +1075,20 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		kill_it = 1;
 
 	/*
-	 * Go through all the banks in exclusion of the other CPUs.
-	 * This way we don't report duplicated events on shared banks
-	 * because the first one to see it will clear it.
+	 * Check if this MCE is signaled to only this logical processor
 	 */
-	order = mce_start(&no_way_out);
+	if (m.mcgstatus & MCG_STATUS_LMCES)
+		lmce = 1;
+	else {
+		/*
+		 * Go through all the banks in exclusion of the other CPUs.
+		 * This way we don't report duplicated events on shared banks
+		 * because the first one to see it will clear it.
+		 * If this is a Local MCE, then no need to perform rendezvous.
+		 */
+		order = mce_start(&no_way_out);
+	}
+
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
@@ -1155,8 +1165,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 * Do most of the synchronization with other CPUs.
 	 * When there's any problem use only local no_way_out state.
 	 */
-	if (mce_end(order) < 0)
-		no_way_out = worst >= MCE_PANIC_SEVERITY;
+	if (!lmce) {
+		if (mce_end(order) < 0)
+			no_way_out = worst >= MCE_PANIC_SEVERITY;
+	} else {
+		/*
+		 * Local MCE skipped calling mce_reign()
+		 * If we found a fatal error, we need to panic here.
+		 */
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
+			mce_panic("Machine check from unknown source",
+				NULL, NULL);
+	}
 
 	/*
 	 * At insane "tolerant" levels we take no action. Otherwise
......
@@ -452,4 +452,5 @@ void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
+	intel_init_lmce();
 }
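Because a local MCE skips the rendezvous, no other CPU runs mce_reign() on its behalf, so the handler has to make the panic decision itself. The following user-space sketch models the two branches of the do_machine_check() hunk at line 1165. It is an illustration under stated assumptions, not kernel code: the `sketch_severity` enum and both helpers are hypothetical simplifications, and only the ordering of the severity levels matters.

```c
#include <stdbool.h>

/* Hypothetical, simplified severity scale; the kernel's real enum has
 * more levels. Only the ordering is relied on here. */
enum sketch_severity {
	SEV_NONE,
	SEV_KEEP,
	SEV_AR,    /* action required, but recoverable */
	SEV_PANIC, /* stands in for MCE_PANIC_SEVERITY */
};

/* Local-MCE branch: panic on a fatal error unless the "tolerant"
 * setting is 3, the level at which the handler never panics. */
static bool lmce_should_panic(enum sketch_severity worst, int tolerant)
{
	return worst >= SEV_PANIC && tolerant < 3;
}

/* Broadcast branch: if the rendezvous failed (mce_end() returned a
 * negative value), fall back to the locally observed worst severity;
 * otherwise keep the no_way_out value agreed during the rendezvous. */
static bool broadcast_no_way_out(bool no_way_out, int mce_end_ret,
				 enum sketch_severity worst)
{
	if (mce_end_ret < 0)
		return worst >= SEV_PANIC;
	return no_way_out;
}
```

For example, `lmce_should_panic(SEV_PANIC, 3)` returns false, matching the `mca_cfg.tolerant < 3` guard in the patch.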