• Eiichi Tsukata's avatar
    KVM: x86: Fix APIC page invalidation race · 35153372
    Eiichi Tsukata authored
    [ Upstream commit e649b3f0 ]
    
    Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried
    to fix inappropriate APIC page invalidation by re-introducing arch
    specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
    kvm_mmu_notifier_invalidate_range_start. However, the patch left a
    possible race where the VMCS APIC address cache is updated *before*
    it is unmapped:
    
      (Invalidator) kvm_mmu_notifier_invalidate_range_start()
      (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
      (KVM VCPU) vcpu_enter_guest()
      (KVM VCPU) kvm_vcpu_reload_apic_access_page()
      (Invalidator) actually unmap page
    
    Because of the above race, there can be a mismatch between the
    host physical address stored in the APIC_ACCESS_PAGE VMCS field and
    the host physical address stored in the EPT entry for the APIC GPA
    (0xfee0000).  When this happens, the processor will not trap APIC
    accesses, and will instead show the raw contents of the APIC-access page.
    Because Windows OS periodically checks for unexpected modifications to
    the LAPIC register, this will show up as a BSOD crash with BugCheck
    CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in
    https://bugzilla.redhat.com/show_bug.cgi?id=1751017.
    
    The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
    cannot guarantee that no additional references are taken to the pages in
    the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
    this case is supported by the MMU notifier API, as documented in
    include/linux/mmu_notifier.h:
    
    	 * If the subsystem
             * can't guarantee that no additional references are taken to
             * the pages in the range, it has to implement the
             * invalidate_range() notifier to remove any references taken
             * after invalidate_range_start().
    
    The fix therefore is to reload the APIC-access page field in the VMCS
    from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().
    
    Cc: stable@vger.kernel.org
    Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation")
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951Signed-off-by: default avatarEiichi Tsukata <eiichi.tsukata@nutanix.com>
    Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
    35153372
x86.c 253 KB