    x86/mce: Defer processing of early errors · 3bff147b
    Borislav Petkov authored
    When a fatal machine check results in a system reset, Linux does not
    clear the error(s) from machine check bank(s) - hardware preserves the
    machine check banks across a warm reset.
    
    During initialization of the kernel after the reboot, Linux reads, logs,
    and clears all machine check banks.
    
    But there is a problem. In:
    
      5de97c9f ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
    
    the call to mce_register_decode_chain() moved later in the boot
    sequence. This means that /dev/mcelog doesn't see those early error
    logs.
    
    This was partially fixed by:
    
      cd9c57ca ("x86/MCE: Dump MCE to dmesg if no consumers")
    
    which made sure that the logs were not lost completely by printing
    to the console. But parsing console logs is error-prone. Users of
    /dev/mcelog should expect to find any early errors logged to standard
    places.
    
    Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early
    machine check initialization to indicate that any errors found should
    just be queued to genpool. When mcheck_late_init() is called it will
    call mce_schedule_work() to actually log and flush any errors queued in
    the genpool.
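
The queue-then-flush flow described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel implementation: the names MCP_QUEUE_LOG, genpool and mcheck_late_init() come from this commit message, while the flag value, the fixed-size array standing in for the genpool, and the helpers poll_banks(), late_init() and log_record() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed value; only the flag name comes from the commit message. */
#define MCP_QUEUE_LOG (1u << 0)

#define NUM_BANKS 4
#define POOL_SIZE 8

struct err_record {
	int bank;
	unsigned long long status;
};

/* Simulated bank status registers; non-zero means an error survived the reset. */
static unsigned long long bank_status[NUM_BANKS];

/* Fixed-size array standing in for the genpool (hypothetical). */
static struct err_record pool[POOL_SIZE];
static size_t pool_len;

static bool consumers_ready;	/* true once the decode chain is usable */
static size_t delivered;	/* records that reached a consumer */
static size_t console_only;	/* records that were only printed */

/* Hypothetical logger: hand the record to consumers if any, else just print. */
static void log_record(const struct err_record *r)
{
	if (consumers_ready) {
		delivered++;
		printf("consumer: bank %d status %#llx\n", r->bank, r->status);
	} else {
		console_only++;
		printf("console only: bank %d status %#llx\n", r->bank, r->status);
	}
}

/* Hypothetical stand-in for the poll routine: read, log or queue, then clear. */
static void poll_banks(unsigned int flags)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		if (!bank_status[i])
			continue;

		struct err_record r = { .bank = i, .status = bank_status[i] };

		if (flags & MCP_QUEUE_LOG) {
			/* Early boot: only queue. A full pool drops the record. */
			if (pool_len < POOL_SIZE)
				pool[pool_len++] = r;
		} else {
			log_record(&r);
		}
		bank_status[i] = 0;	/* clear the bank after reading it */
	}
}

/* Hypothetical stand-in for late init: consumers exist now, flush the queue. */
static void late_init(void)
{
	assert(pool_len <= POOL_SIZE);
	consumers_ready = true;
	for (size_t i = 0; i < pool_len; i++)
		log_record(&pool[i]);
	pool_len = 0;
}
```

Calling poll_banks(MCP_QUEUE_LOG) with errors latched in two banks queues both records and delivers nothing; a later late_init() hands both to the consumer. Calling poll_banks(0) before late_init() instead takes the console-only path, which is the behaviour this patch avoids for early errors.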
    
     [ Based on an original patch, commit message by and completely
       productized by Tony Luck. ]
    
    Fixes: 5de97c9f ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
    Reported-by: Sumanth Kamatala <skamatala@juniper.net>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
mce.h 12.8 KB