    KVM: x86: Do not re-{try,execute} after failed emulation in L2 · 6c3dfeb6
    Sean Christopherson authored
    Commit a6f177ef ("KVM: Reenter guest after emulation failure if
    due to access to non-mmio address") added reexecute_instruction() to
    handle the scenario where two (or more) vCPUs race to write a shadowed
    page, i.e. reexecute_instruction() is intended to return true if and
    only if the instruction being emulated was accessing a shadowed page.
    As L0 is only explicitly shadowing L1 tables, an emulation failure of
    a nested VM instruction cannot be due to a race to write a shadowed
    page and so should never be re-executed.
    
    This fixes an issue where an "MMIO" emulation failure[1] in L2 is all
    but guaranteed to result in an infinite loop when TDP is enabled.
    Because "cr2" is actually an L2 GPA when TDP is enabled, calling
    kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped
    case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA
    and cause reexecute_instruction() to immediately return true.  The
    !mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this
    case because mmio_info_in_cache() returns false for a nested MMU (the
    MMIO caching currently handles L1 only, e.g. to cache nested guests'
    GPAs we'd have to manually flush the cache when switching between
    VMs and when L1 updated its page tables controlling the nested guest).
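
    The loop described above can be sketched with a toy model.  This is
    not kernel code: struct toy_vcpu, TOY_UNMAPPED_GVA and every toy_*
    helper are hypothetical stand-ins, and the translation helper simply
    assumes that an L2 GPA never resolves through the GVA walker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for KVM's UNMAPPED_GVA sentinel. */
#define TOY_UNMAPPED_GVA UINT64_MAX

/* Hypothetical, heavily reduced view of the vCPU state that matters here. */
struct toy_vcpu {
	bool direct_map;	/* active MMU maps GPAs directly (L1 with TDP) */
	bool guest_mode;	/* vCPU is currently running L2 */
};

/*
 * Assumption for the sketch: when running L2 with TDP, "cr2" holds an
 * L2 GPA, so walking it as a GVA fails.  Otherwise the address maps 1:1.
 */
static uint64_t toy_gva_to_gpa_write(const struct toy_vcpu *vcpu, uint64_t cr2)
{
	assert(vcpu != NULL);
	return vcpu->guest_mode ? TOY_UNMAPPED_GVA : cr2;
}

/* Pre-fix decision: an unmapped address means "re-enter the guest". */
static bool toy_reexecute_prefix(const struct toy_vcpu *vcpu, uint64_t cr2)
{
	assert(vcpu != NULL);

	if (!vcpu->direct_map) {
		if (toy_gva_to_gpa_write(vcpu, cr2) == TOY_UNMAPPED_GVA)
			return true;
	}

	/* Toy simplification: no shadowed page is involved otherwise. */
	return false;
}

/* Count how often the same fault would be re-executed, up to a cap. */
static unsigned int toy_count_reentries(const struct toy_vcpu *vcpu,
					uint64_t cr2, unsigned int cap)
{
	unsigned int n = 0;

	while (n < cap && toy_reexecute_prefix(vcpu, cr2))
		n++;
	return n;
}
```

    In this model an L2 vCPU (guest_mode set, direct_map clear) always
    reaches the cap, because nothing changes between attempts, whereas a
    direct-mapped L1 vCPU is never re-executed.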
    
    Way back when, commit 68be0803 ("KVM: x86: never re-execute
    instruction with enabled tdp") changed reexecute_instruction() to
    always return false when using TDP under the assumption that KVM would
    only get into the emulator for MMIO.  Commit 95b3cf69 ("KVM: x86:
    let reexecute_instruction work for tdp") effectively reverted that
    behavior in order to handle the scenario where emulation failed due to
    an access from L1 to the shadow page tables for L2, but it didn't
    account for the case where emulation failed in L2 with TDP enabled.
    
    All of the above logic also applies to retry_instruction(), added by
    commit 1cb3f3ae ("KVM: x86: retry non-page-table writing
    instructions").  An indefinite loop in retry_instruction() should be
    impossible as it protects against retrying the same instruction over
    and over, but it's still correct to not retry an L2 instruction in
    the first place.
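
    The protection against retrying the same instruction over and over
    can be illustrated roughly as follows.  This is a sketch under
    assumptions, not a copy of retry_instruction(): struct
    toy_retry_state and its field names are hypothetical, and the guard
    is assumed to remember the last retried instruction pointer and
    faulting address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the most recent retry. */
struct toy_retry_state {
	uint64_t last_eip;	/* instruction pointer of the last retry */
	uint64_t last_addr;	/* faulting address of the last retry */
};

/*
 * Return true if the instruction may be retried.  A second consecutive
 * attempt with the same (eip, addr) pair is refused, which is what
 * bounds the retry loop in this model.
 */
static bool toy_retry_instruction(struct toy_retry_state *s,
				  uint64_t eip, uint64_t addr)
{
	uint64_t last_eip, last_addr;

	assert(s != NULL);

	/* Snapshot and clear, so a refusal does not stick forever. */
	last_eip = s->last_eip;
	last_addr = s->last_addr;
	s->last_eip = 0;
	s->last_addr = 0;

	if (eip == last_eip && addr == last_addr)
		return false;

	s->last_eip = eip;
	s->last_addr = addr;
	return true;
}
```

    The first attempt for a given pair is allowed and the immediate
    repeat is refused, so the model cannot spin on a single instruction
    the way the re-execute path can.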
    
    Fix the immediate issue by adding a check for a nested guest when
    determining whether or not to allow retry in kvm_mmu_page_fault().
    In addition to fixing the immediate bug, add WARN_ON_ONCE in the
    retry functions since they are not designed to handle nested cases,
    i.e. they need to be modified even if there is some scenario in the
    future where we want to allow retrying a nested guest.
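
    The shape of the fix can be sketched as below.  Again this is a toy
    model rather than the actual patch: struct toy_fault_ctx and the
    toy_* helpers are hypothetical, toy_warn_on_once() only imitates the
    "warn once, return the condition" behavior of WARN_ON_ONCE, and the
    final "return true" assumes the remaining checks would have passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical inputs to the page fault path. */
struct toy_fault_ctx {
	bool guest_mode;	/* vCPU is currently running L2 */
	bool mmio_info_cached;	/* stand-in for mmio_info_in_cache() */
};

/* Page fault path: only allow retry for non-cached, non-nested faults. */
static bool toy_allow_retry(const struct toy_fault_ctx *ctx)
{
	assert(ctx != NULL);
	return !ctx->mmio_info_cached && !ctx->guest_mode;
}

/* Imitation of WARN_ON_ONCE: report the first hit, always return cond. */
static bool toy_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "toy WARN: retry helper called for L2\n");
	}
	return cond;
}

/* Retry helper: bail out early, and complain, if reached for L2. */
static bool toy_reexecute_fixed(const struct toy_fault_ctx *ctx,
				bool allow_retry)
{
	assert(ctx != NULL);

	if (!allow_retry)
		return false;

	if (toy_warn_on_once(ctx->guest_mode))
		return false;

	/* Toy simplification: assume a shadowed page race was detected. */
	return true;
}
```

    With both checks in place an L2 fault is never re-executed, even if
    a future caller wrongly passes allow_retry for a nested guest; in
    that case the warning fires once and the helper still returns false.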
    
    [1] This issue was encountered after commit 3a2936de ("kvm: mmu:
        Don't expose private memslots to L2") changed the page fault path
        to return KVM_PFN_NOSLOT when translating an L2 access to a
        private memslot.  Returning KVM_PFN_NOSLOT is semantically correct
        when we want to hide a memslot from L2, i.e. there effectively is
        no defined memory region for L2, but it has the unfortunate side
        effect of making KVM think the GFN is an MMIO page, thus triggering
        emulation.  The failure occurred with in-development code that
        deliberately exposed a private memslot to L2, which L2 accessed
        with an instruction that is not emulated by KVM.
    
    Fixes: 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp")
    Fixes: 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions")
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Cc: Jim Mattson <jmattson@google.com>
    Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Cc: Xiao Guangrong <xiaoguangrong@tencent.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
mmu.c 153 KB