    x86/mce: Handle varying MCA bank counts · 006c0770
    Yazen Ghannam authored
    Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a
    CPU. Currently, this number is the same for all CPUs and a warning is
    shown if there is a difference. The number of banks is overwritten with
    the MCG_CAP[Count] value of each following CPU that boots.
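
As a rough illustration of the read described above: the Count field is the low 8 bits of the MCG_CAP MSR value. The helper name below is hypothetical, and the sketch works on an already-read 64-bit value rather than issuing a real MSR read.

```c
#include <stdint.h>

/* MCG_CAP[Count] occupies bits 7:0 of the MCG_CAP MSR value. */
#define MCG_BANKCNT_MASK 0xffULL

/*
 * Hypothetical helper (not part of this patch): extract the number of
 * MCA banks from a raw MCG_CAP value that was read on the current CPU.
 */
static unsigned int mcg_cap_bank_count(uint64_t mcg_cap)
{
	return (unsigned int)(mcg_cap & MCG_BANKCNT_MASK);
}
```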
    
    According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives
    the number of banks that are available to a "processor implementation".
    The AMD BKDGs/PPRs further clarify that this value is per core. This
    value has historically been the same for every core in the system, but
    that is not an architectural requirement.
    
    Future AMD systems may have different MCG_CAP[Count] values per core,
    so the assumption that all CPUs will have the same MCG_CAP[Count] value
    will no longer be valid.
    
    Also, the first CPU to boot will allocate the struct mce_banks[] array
    using the number of banks based on its MCG_CAP[Count] value. The machine
    check handler and other functions use the global number of banks to
    iterate and index into the mce_banks[] array. So it's possible to use an
    out-of-bounds index on an asymmetric system where a following CPU sees a
    MCG_CAP[Count] value greater than its predecessors.
    
    Thus, allocate the mce_banks[] array to the maximum number of banks.
    This will avoid the potential out-of-bounds index since the value of
    mca_cfg.banks is capped to MAX_NR_BANKS.
    
    Set the value of mca_cfg.banks equal to the max of the previous value
    and the value for the current CPU. This way mca_cfg.banks will always
    represent the max number of banks detected on any CPU in the system.
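
The two steps above (a fixed-size array plus a running maximum) can be sketched roughly as follows. This is a standalone illustration, not the patch itself: the struct and function names are hypothetical, and the MAX_NR_BANKS value of 32 is assumed here only for the example.

```c
#include <assert.h>

/* Assumed value for illustration; the real limit is defined by the kernel. */
#define MAX_NR_BANKS 32

/* Hypothetical stand-ins for the kernel's per-bank and global config data. */
struct bank_sketch { int init; };
struct mca_cfg_sketch { unsigned int banks; };

/* Sized to the maximum, so any index below MAX_NR_BANKS is in bounds. */
static struct bank_sketch banks_sketch[MAX_NR_BANKS];

/*
 * Called once per CPU with that CPU's MCG_CAP[Count] value. The count is
 * capped to MAX_NR_BANKS, then the global value is raised to it if the
 * current CPU reports more banks than any CPU seen so far.
 */
static void update_bank_count(struct mca_cfg_sketch *cfg, unsigned int cpu_count)
{
	assert(cfg != 0);

	if (cpu_count > MAX_NR_BANKS)
		cpu_count = MAX_NR_BANKS;

	if (cpu_count > cfg->banks)
		cfg->banks = cpu_count;
}
```

With this shape, a later CPU that reports fewer banks leaves the global count unchanged, and one that reports more can never push it past the array size.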
    
    This will ensure that all CPUs will access all the banks that are
    visible to them. A CPU that can access fewer than the max number of
    banks will find the registers of the extra banks to be read-as-zero.
    
    Furthermore, print the resulting number of MCA banks in use. Do this in
    mcheck_late_init() so that the final value is printed after all CPUs
    have been initialized.
    
    Finally, get the bank count from the target CPU when doing injection with
    the mce-inject module.
    
     [ bp: Remove out-of-bounds example, passify and cleanup commit message. ]
    Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: Pu Wen <puwen@hygon.cn>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vishal Verma <vishal.l.verma@intel.com>
    Cc: x86-ml <x86@kernel.org>
    Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.com