    x86/mce: Fix machine_check_poll() tests for error types · f19501aa
    Tony Luck authored
    There has been a lurking "TBD" in the machine check poll routine ever
    since it was first split out from the machine check handler. The
    potential issue is that the poll routine may have just begun a read from
    the STATUS register in a machine check bank when the hardware logs an
    error in that bank and signals a machine check.
    
    That race used to be pretty small back when machine checks were
    broadcast, but the addition of local machine check means that the poll
    code could continue running and clear the error from the bank before the
    local machine check handler on another CPU gets around to reading it.
    
    Fix the code to be sure to only process errors that need to be processed
    in the poll code, leaving other logged errors alone for the machine
    check handler to find and process.
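
The filtering described above can be sketched as a standalone predicate. This is a simplified userspace illustration and not the kernel code itself: the function name `poll_should_log` and its parameters `log_everything` (the poller was asked to log uncorrected errors too, for example at CPU online) and `ser` (the CPU supports software error recovery) are hypothetical names chosen for the example. The bit positions follow the IA32_MCi_STATUS layout in the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_EN  (1ULL << 60) /* error reporting was enabled */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56) /* signaled by a machine check (SER only) */

/*
 * Hypothetical helper: return true if the poll code should log (and
 * later clear) this bank, false if the bank must be left untouched
 * for the machine check handler.
 */
static bool poll_should_log(uint64_t status, bool log_everything, bool ser)
{
	/* Nothing logged in this bank. */
	if (!(status & MCI_STATUS_VAL))
		return false;

	/* Corrected errors, or an explicit request to log everything. */
	if (log_everything || !(status & MCI_STATUS_UC))
		return true;

	/* From here on the error is uncorrected (UC == 1). */

	/* Without software error recovery, leave uncorrected errors alone. */
	if (!ser)
		return false;

	/* "Not enabled" errors never raise a machine check: log them. */
	if (!(status & MCI_STATUS_EN))
		return true;

	/* UCNA: UC == 1, PCC == 0, S == 0. No machine check is signaled. */
	if (!(status & (MCI_STATUS_PCC | MCI_STATUS_S)))
		return true;

	/*
	 * Anything else is presumed to be racing with a machine check.
	 * Leave the bank alone so the handler can find and process it.
	 */
	return false;
}
```

Under these assumptions, a status with VAL, UC, EN and S set is skipped by the poller, while the same status without S (a UCNA error) is logged.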
    
     [ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ]
    
    Fixes: b79109c3 ("x86, mce: separate correct machine check poller and fatal exception handler")
    Fixes: ed7290d0 ("x86, mce: implement new status bits")
    Reported-by: Ashok Raj <ashok.raj@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Ashok Raj <ashok.raj@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: x86-ml <x86@kernel.org>
    Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-desk
core.c 56.9 KB