    Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU" · 49097762
    Vitaly Kuznetsov authored
    Guest crashes are observed on a Cascade Lake system when 'perf top' is
    launched on the host, e.g.
    
     BUG: unable to handle kernel paging request at fffffe0000073038
     PGD 7ffa7067 P4D 7ffa7067 PUD 7ffa6067 PMD 7ffa5067 PTE ffffffffff120
     Oops: 0000 [#1] SMP PTI
     CPU: 1 PID: 1 Comm: systemd Not tainted 4.18.0+ #380
    ...
     Call Trace:
      serial8250_console_write+0xfe/0x1f0
      call_console_drivers.constprop.0+0x9d/0x120
      console_unlock+0x1ea/0x460
    
    Call traces differ, but the crash is inevitable. The problem was
    blindly bisected to the commit 041bc42c ("KVM: VMX: Micro-optimize
    vmexit time when not exposing PMU"). It was also confirmed that the
    issue goes away if PMU is exposed to the guest.
    
    With some instrumentation of the guest we can see what is being switched
    (when we do atomic_switch_perf_msrs()):
    
     vmx_vcpu_run: switching 2 msrs
     vmx_vcpu_run: switching MSR38f guest: 70000000d host: 70000000f
     vmx_vcpu_run: switching MSR3f1 guest: 0 host: 2
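
    One way to read the values above is to XOR the guest and host sides and
    look at which bits change. The sketch below does that; the helper names
    (msr_changed_bits, lowest_set_bit) are hypothetical and exist only for
    illustration, they are not kernel functions. It assumes the architectural
    layout in which bit n of MSR 0x38f (global control) enables
    general-purpose counter n and bit n of MSR 0x3f1 enables PEBS on that
    counter.

```c
#include <stdint.h>

/* MSR indices as printed in the instrumentation output. */
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_IA32_PEBS_ENABLE      0x3f1

/* Hypothetical helper: bits that differ between the guest and host values. */
static inline uint64_t msr_changed_bits(uint64_t guest, uint64_t host)
{
	return guest ^ host;
}

/* Hypothetical helper: index of the lowest set bit, or -1 if none is set. */
static inline int lowest_set_bit(uint64_t v)
{
	int i;

	for (i = 0; i < 64; i++)
		if (v & (1ULL << i))
			return i;
	return -1;
}
```

    Applied to the log, msr_changed_bits(0x70000000d, 0x70000000f) and
    msr_changed_bits(0, 2) both give 0x2, i.e. bit 1 in each MSR. Under the
    stated assumption this suggests the host had a PEBS event on counter 1
    that is turned off for the guest.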
    
    The current guess is that PEBS (MSR_IA32_PEBS_ENABLE, 0x3f1) is to blame.
    Regardless of whether PMU is exposed to the guest or not, PEBS needs to
    be disabled upon switch.
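
    As a rough user-space model of that reasoning (not the kernel code
    itself), the decision to switch an MSR can be thought of as a per-MSR
    comparison of host and guest values that does not depend on whether the
    guest sees a PMU. The struct and function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one host/guest MSR value pair. */
struct msr_pair {
	uint32_t msr;
	uint64_t host;
	uint64_t guest;
};

/*
 * Count the MSRs whose guest value differs from the host value.
 * Only those need to be loaded on VM entry and restored on VM exit.
 */
static inline size_t count_msrs_to_switch(const struct msr_pair *msrs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (msrs[i].host != msrs[i].guest)
			count++;
	return count;
}
```

    Fed with the two pairs from the instrumentation output, this model
    reports two MSRs to switch, matching "switching 2 msrs". If the switch is
    skipped entirely, the host value 2 of MSR 0x3f1 would remain in effect
    while the guest runs.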
    
    This reverts commit 041bc42c.
    Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20200619094046.654019-1-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>