• HATAYAMA Daisuke's avatar
    perf/x86/intel: ignore CondChgd bit to avoid false NMI handling · a2b2c030
    HATAYAMA Daisuke authored
    commit b292d7a1 upstream.
    
    Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
    if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
    
    For example, we use external NMI to make system panic to get crash
    dump, but in this case, the external NMI is falsely handled do to the
    issue.
    
    This commit deals with the issue simply by ignoring CondChgd bit.
    
    Here is explanation in detail.
    
    On x86 NMI watchdog uses performance monitoring feature to
    periodically signal NMI each time performance counter gets overflowed.
    
    intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
    handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
    owner of a given NMI by looking at overflow status bits in
    MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
    handles the given NMI as its own NMI.
    
    The problem is that the intel_pmu_handle_irq() doesn't distinguish
    CondChgd bit from other bits. Unlike the other status bits, CondChgd
    bit doesn't represent overflow status for performance counters. Thus,
    CondChgd bit cannot be thought of as a mark indicating a given NMI is
    NMI watchdog's.
    
    As a result, if CondChgd bit is set, any NMI is falsely handled by the
    NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
    is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
    action is never performed until CondChgd bit is cleared.
    
    I noticed this behavior on systems with Ivy Bridge processors: Intel
    Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
    CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
    in the beginning at boot. Then the CondChgd bit is immediately cleared
    by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
    0.
    
    On the other hand, on older processors such as Nehalem, Xeon E7540,
    CondChgd bit is not set in the beginning at boot.
    
    I'm not sure about exact behavior of CondChgd bit, in particular when
    this bit is set. Although I read Intel System Programmer's Manual to
    figure out that, the descriptions I found are:
    
      In 18.9.1:
    
      "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
       indicate changes to the state of performancmonitoring hardware"
    
      In Table 35-2 IA-32 Architectural MSRs
    
      63 CondChg: status bits of this register has changed.
    
    These are different from the bahviour I see on the actual system as I
    explained above.
    
    At least, I think ignoring CondChgd bit should be enough for NMI
    watchdog perspective.
    Signed-off-by: default avatarHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    Acked-by: default avatarDon Zickus <dzickus@redhat.com>
    Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    a2b2c030
perf_event_intel.c 50.6 KB