    KVM: nVMX: add option to perform early consistency checks via H/W

    KVM defers many VMX consistency checks to the CPU, ostensibly for
    performance reasons[1], including checks that result in VMFail (as
    opposed to VMExit).  This behavior may be undesirable for some users
    since this means KVM detects certain classes of VMFail only after it
    has processed guest state, e.g. emulated MSR load-on-entry.  Because
    there is a strict ordering between checks that cause VMFail and those
    that cause VMExit, i.e. all VMFail checks are performed before any
    checks that cause VMExit, we can detect (almost) all VMFail conditions
    via a dry run of sorts.  The "almost" qualifier exists because some
    state in vmcs02 comes from L0, e.g. VPID, which means that hardware
    will never detect an invalid VPID in vmcs12 because it never sees
    said value.  Software must (continue to) explicitly check such fields.
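    
    For example, the VPID check must stay in software because vmcs02
    carries L0's VPID rather than vmcs12's.  A minimal sketch of that
    check, using KVM's existing vmcs12 helpers (the exact placement in
    the prereq checks is illustrative):
    
        /*
         * Hardware never sees vmcs12's VPID, so it cannot reject an
         * invalid value; per the SDM, a VPID of 0 is reserved and must
         * VMFail if the "enable VPID" execution control is set.
         */
        if (nested_cpu_has_vpid(vmcs12) && !vmcs12->virtual_processor_id)
                return -EINVAL;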
    
    After preparing vmcs02 with all state needed to pass the VMFail
    consistency checks, optionally do a "test" VMEnter with an invalid
    GUEST_RFLAGS.  If the VMEnter results in a VMExit (due to bad guest
    state), then we can safely say that the nested VMEnter should not
    VMFail, i.e. any VMFail encountered in nested_vmx_vmexit() must
    be due to an L0 bug.  GUEST_RFLAGS is used to induce the VMExit
    because it is unconditionally loaded by all implementations of VMX,
    it has an invalid value that is writable even on a 32-bit system,
    and its consistency check is performed relatively early on all
    implementations (the exact order of consistency checks is
    micro-architectural).
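    
    A sketch of the dry run built around that observation (names follow
    KVM conventions, but __early_vmenter() is a hypothetical stand-in
    for the inline asm that executes VMLAUNCH/VMRESUME with
    HOST_RSP/HOST_RIP wired up so the induced VMExit resumes here):
    
        static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
        {
                struct vcpu_vmx *vmx = to_vmx(vcpu);
    
                /*
                 * RFLAGS bit 1 is reserved to '1', so GUEST_RFLAGS == 0
                 * is invalid on every VMX implementation.  If all VMFail
                 * checks pass, hardware VMExits with "VM-entry failure
                 * due to invalid guest state" before executing any L2
                 * code.  GUEST_RFLAGS is rewritten before the real
                 * VMEnter, so the bogus value need not be undone here.
                 */
                vmcs_writel(GUEST_RFLAGS, 0);
    
                if (__early_vmenter(vmx))  /* true on VMFail (CF or ZF) */
                        return 1;  /* genuine VMFail, reflect it to L1 */
    
                /* VMExit: hardware accepted all VMFail-checked state. */
                WARN_ON(!(vmcs_read32(VM_EXIT_REASON) &
                          VMX_EXIT_REASONS_FAILED_VMENTRY));
                return 0;
        }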
    
    Unfortunately, since the "passing" case causes a VMExit, KVM must
    be extra diligent to ensure that host state is restored, e.g. DR7
    and RFLAGS are reset on VMExit.  Failure to restore RFLAGS.IF is
    particularly fatal.
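    
    A sketch of that cleanup, using real kernel APIs (placement is
    illustrative): the induced VMExit loads RFLAGS with 0x2 (IF clear)
    and DR7 with 0x400, so both must be restored before returning to
    code that expects normal host state.
    
        /* Re-enable interrupts; VMExit cleared RFLAGS.IF. */
        local_irq_enable();
    
        /* Restore DR7 if host breakpoints are in use. */
        if (hw_breakpoint_active())
                set_debugreg(__this_cpu_read(cpu_dr7), 7);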
    
    And of course the extra VMEnter and VMExit impact performance.
    The raw overhead of the early consistency checks is ~6% on modern
    hardware (though this could easily vary based on configuration),
    while the added latency observed from the L1 VMM is ~10%.  The
    early consistency checks do not occur in a vacuum, e.g. spending
    more time in L0 can lead to more interrupts being serviced while
    emulating VMEnter, thereby increasing the latency observed by L1.
    
    Add a module param, early_consistency_checks, to provide control
    over whether or not VMX performs the early consistency checks.
    In addition to standard on/off behavior, the param accepts a value
    of -1, which is essentially an "auto" setting whereby KVM does
    the early checks only when it thinks it's running on bare metal.
    When running nested, doing early checks is of dubious value since
    the resulting behavior is heavily dependent on L0.  In the future,
    the "auto" setting could also be used to default to skipping the
    early hardware checks for certain configurations/platforms if KVM
    reaches a state where it has 100% coverage of VMFail conditions.
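    
    A sketch of the tri-state param (the exact name and plumbing in the
    patch may differ); use_early_consistency_checks() is a hypothetical
    helper showing how "auto" could key off the hypervisor CPUID bit:
    
        static int __read_mostly early_consistency_checks = -1;
        module_param(early_consistency_checks, int, 0444);
    
        static bool use_early_consistency_checks(void)
        {
                if (early_consistency_checks >= 0)
                        return early_consistency_checks;
    
                /* "auto": do the hardware dry run only on bare metal. */
                return !boot_cpu_has(X86_FEATURE_HYPERVISOR);
        }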
    
    [1] To my knowledge no one has implemented and tested full software
        emulation of the VMFail consistency checks.  Until that happens,
        one can only speculate about the actual performance overhead of
        doing all VMFail consistency checks in software.  Obviously any
        code is slower than no code, but in the grand scheme of nested
        virtualization it's entirely possible the overhead is negligible.
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>