1. 05 Jul, 2019 10 commits
  2. 03 Jul, 2019 2 commits
  3. 02 Jul, 2019 11 commits
  4. 20 Jun, 2019 2 commits
  5. 18 Jun, 2019 15 commits
    • KVM: nVMX: shadow pin based execution controls · eceb9973
      Paolo Bonzini authored
      The VMX_PREEMPTION_TIMER flag may be toggled frequently, though not
      *very* frequently.  Since it does not affect KVM's dirty logic, e.g.
      the preemption timer value is loaded from vmcs12 even if vmcs12 is
      "clean", there is no need to mark vmcs12 dirty when L1 writes pin
      controls, and shadowing the field achieves that.
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Leave preemption timer running when it's disabled · 804939ea
      Sean Christopherson authored
      VMWRITEs to the major VMCS controls, pin controls included, are
      deceptively expensive.  CPUs with VMCS caching (Westmere and later) also
      optimize away consistency checks on VM-Entry, i.e. skip consistency
      checks if the relevant fields have not changed since the last successful
      VM-Entry (of the cached VMCS).  Because uops are a precious commodity,
      uCode's dirty VMCS field tracking isn't as precise as software would
      prefer.  Notably, writing any of the major VMCS fields effectively marks
      the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
      consistency checks, which consumes several hundred cycles.
      
      As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
      doubles the latency of the next VM-Entry (and again when/if the flag is
      toggled back).  In a non-nested scenario, running a "standard" guest
      with the preemption timer enabled, toggling the timer flag is uncommon
      but not rare, e.g. roughly 1 in 10 entries.  Disabling the preemption
      timer can change these numbers due to its use for "immediate exits",
      even when explicitly disabled by userspace.
      
      Nested virtualization in particular is painful, as the timer flag is set
      for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
      pin controls to *clear* the flag since the timer's final state isn't
      known until vmx_vcpu_run().  I.e. the majority of nested VM-Enters end
      up unnecessarily writing pin controls *twice*.
      
      Rather than toggle the timer flag in pin controls, set the timer value
      itself to the largest allowed value to put it into a "soft disabled"
      state, and ignore any spurious preemption timer exits.
      
      Sadly, the timer is a 32-bit value and so theoretically it can fire
      before the heat death of the universe, i.e. spurious exits are possible.
      But because KVM does *not* save the timer value on VM-Exit and because
      the timer runs at a slower rate than the TSC, the maximum timer value
      is still sufficiently large for KVM's purposes.  E.g. on a modern CPU
      with a timer that runs at 1/32 the frequency of a 2.4 GHz constant-rate
      TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
      execution.  In other words, spurious VM-Exits are effectively only
      possible if the host is completely tickless on the logical CPU, the
      guest is not using the preemption timer, and the guest is not generating
      VM-Exits for any other reason.
      
      To be safe from bad/weird hardware, disable the preemption timer if its
      maximum delay is less than ten seconds.  Ten seconds is mostly arbitrary
      and was selected in no small part because it's a nice round number.
      For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
      if the preemption timer is disabled by KVM or userspace.  Previously
      KVM continued to use the preemption timer to force immediate exits even
      when the timer was disabled by userspace.  Now that KVM leaves the timer
      running instead of truly disabling it, allow userspace to kill it
      entirely in the unlikely event the timer (or KVM) malfunctions.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
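The timer arithmetic in the message above can be checked with a small standalone sketch. The helper names, the kHz unit for the TSC frequency, and the millisecond result are assumptions made for illustration; this is not the code the commit adds. It assumes the timer ticks once every 2^rate_shift TSC cycles, so a rate of 1/32 corresponds to a shift of 5.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: longest delay, in milliseconds, that a 32-bit
 * preemption timer can express when it ticks once every 2^rate_shift
 * TSC cycles.
 */
static uint64_t max_timer_delay_ms(uint64_t tsc_khz, unsigned int rate_shift)
{
	uint64_t max_tsc_cycles;

	/* Keep the shifted 32-bit value inside 64 bits. */
	assert(rate_shift < 32);
	max_tsc_cycles = (uint64_t)UINT32_MAX << rate_shift;

	/* tsc_khz is TSC cycles per millisecond. */
	return max_tsc_cycles / tsc_khz;
}

/* Ten-second floor described in the commit message. */
static int timer_usable(uint64_t tsc_khz, unsigned int rate_shift)
{
	return max_timer_delay_ms(tsc_khz, rate_shift) >= 10 * 1000;
}
```

With a 2.4 GHz TSC (2400000 kHz) and a shift of 5, the helper returns 57266 ms, the same order of magnitude as the figure quoted in the message. A hypothetical timer running at the full TSC rate would max out below two seconds and fail the ten-second check.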
    • KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs' · 9d99cc49
      Sean Christopherson authored
      ... now that it is fully redundant with the pin controls shadow.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Preset *DT exiting in vmcs02 when emulating UMIP · 469debdb
      Sean Christopherson authored
      KVM dynamically toggles SECONDARY_EXEC_DESC to intercept (a subset of)
      instructions that are subject to User-Mode Instruction Prevention, i.e.
      VMCS.SECONDARY_EXEC_DESC == CR4.UMIP when emulating UMIP.  Preset the
      VMCS control when preparing vmcs02 to avoid unnecessary VMWRITEs,
      e.g. KVM would otherwise clear VMCS.SECONDARY_EXEC_DESC in
      prepare_vmcs02_early() and then set it in vmx_set_cr4().
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
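The invariant stated above, VMCS.SECONDARY_EXEC_DESC == CR4.UMIP while UMIP is emulated, can be sketched as a pure function that computes the control value up front instead of clearing the bit and setting it again later. The helper name and its parameters are hypothetical. The bit positions are the architectural ones: CR4 bit 11 for UMIP, and bit 2 of the secondary controls for descriptor-table exiting.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_UMIP        (1ULL << 11) /* CR4.UMIP */
#define SECONDARY_EXEC_DESC (1u << 2)    /* descriptor-table exiting */

/*
 * Hypothetical helper: return the secondary controls with the
 * descriptor-table exiting bit already matching the guest's CR4.UMIP,
 * so a later CR4 update has nothing left to change.
 */
static uint32_t preset_desc_exiting(uint32_t secondary_ctls,
				    uint64_t guest_cr4, bool emulating_umip)
{
	secondary_ctls &= ~SECONDARY_EXEC_DESC;
	if (emulating_umip && (guest_cr4 & X86_CR4_UMIP))
		secondary_ctls |= SECONDARY_EXEC_DESC;
	return secondary_ctls;
}
```

Because the final value is computed before the control is written, the field needs a single write per nested entry rather than a clear followed by a set.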
    • KVM: nVMX: Preserve last USE_MSR_BITMAPS when preparing vmcs02 · de0286b7
      Sean Christopherson authored
      KVM dynamically toggles the CPU_BASED_USE_MSR_BITMAPS execution control
      for nested guests based on whether or not both L0 and L1 want to pass
      through the same MSRs to L2.  Preserve the last used value from vmcs02
      so as to avoid multiple VMWRITEs to (re)set/(re)clear the bit on nested
      VM-Entry.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Explicitly initialize controls shadow at VMCS allocation · 3af80fec
      Sean Christopherson authored
      Or: Don't re-initialize vmcs02's controls on every nested VM-Entry.
      
      VMWRITEs to the major VMCS controls are deceptively expensive.  Intel
      CPUs with VMCS caching (Westmere and later) also optimize away
      consistency checks on VM-Entry, i.e. skip consistency checks if the
      relevant fields have not changed since the last successful VM-Entry (of
      the cached VMCS).  Because uops are a precious commodity, uCode's dirty
      VMCS field tracking isn't as precise as software would prefer.  Notably,
      writing any of the major VMCS fields effectively marks the entire VMCS
      dirty, i.e. causes the next VM-Entry to perform all consistency checks,
      which consumes several hundred cycles.
      
      Zero out the controls' shadow copies during VMCS allocation and use the
      optimized setter when "initializing" controls.  While this technically
      affects both non-nested and nested virtualization, nested virtualization
      is the primary beneficiary, as avoiding VMWRITEs when preparing vmcs02
      allows hardware to optimize away consistency checks.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Don't reset VMCS controls shadow on VMCS switch · ae81d089
      Sean Christopherson authored
      ... now that the shadow copies are per-VMCS.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Shadow VMCS controls on a per-VMCS basis · 09e226cf
      Sean Christopherson authored
      ... to pave the way for not preserving the shadow copies across switches
      between vmcs01 and vmcs02, and eventually to avoid VMWRITEs to vmcs02
      when the desired value is unchanged across nested VM-Enters.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Shadow VMCS secondary execution controls · fe7f895d
      Sean Christopherson authored
      Prepare to shadow all major control fields on a per-VMCS basis, which
      allows KVM to avoid costly VMWRITEs when switching between vmcs01 and
      vmcs02.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Shadow VMCS primary execution controls · 2183f564
      Sean Christopherson authored
      Prepare to shadow all major control fields on a per-VMCS basis, which
      allows KVM to avoid VMREADs when switching between vmcs01 and vmcs02,
      and more importantly can eliminate costly VMWRITEs to controls when
      preparing vmcs02.
      
      Shadowing exec controls also saves a VMREAD when opening virtual
      INTR/NMI windows, yay...
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Shadow VMCS pin controls · c5f2c766
      Sean Christopherson authored
      Prepare to shadow all major control fields on a per-VMCS basis, which
      allows KVM to avoid costly VMWRITEs when switching between vmcs01 and
      vmcs02.
      
      Shadowing pin controls also allows a future patch to remove the per-VMCS
      'hv_timer_armed' flag, as the shadow copy is a superset of said flag.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Add builder macros for shadowing controls · 70f932ec
      Sean Christopherson authored
      ... to pave the way for shadowing all (five) major VMCS control fields
      without massive amounts of error prone copy+paste+modify.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
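The idea behind a builder macro for shadowed controls can be illustrated with a minimal user-space sketch. Everything here is a hypothetical stand-in: the struct, the macro name, and the generated accessors are invented for illustration, and a plain field assignment plus a counter takes the place of the real VMWRITE. The setter only touches "hardware" when the cached value differs, and the getter never reads it.

```c
#include <stdint.h>

/* Hypothetical stand-in for a VMCS with two shadowed control fields. */
struct shadowed_vmcs {
	uint32_t hw_pin;       /* pretend hardware fields */
	uint32_t hw_exec;
	uint32_t pin_shadow;   /* cached software copies */
	uint32_t exec_shadow;
	unsigned int vmwrites; /* counts simulated VMWRITEs */
};

/* Generate a set/get pair per control so the logic is written once. */
#define BUILD_CONTROLS_SHADOW(name)					\
static void name##_controls_set(struct shadowed_vmcs *v, uint32_t val)	\
{									\
	if (v->name##_shadow != val) {					\
		v->hw_##name = val; /* the expensive write */		\
		v->vmwrites++;						\
		v->name##_shadow = val;					\
	}								\
}									\
static uint32_t name##_controls_get(const struct shadowed_vmcs *v)	\
{									\
	return v->name##_shadow; /* served from the cache */		\
}

BUILD_CONTROLS_SHADOW(pin)
BUILD_CONTROLS_SHADOW(exec)
```

In this sketch, setting the same pin controls value twice on a zero-initialized struct bumps `vmwrites` only once, and writing zero to a freshly zeroed shadow costs nothing. That is the effect the series relies on when it zeroes the shadows at allocation time.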
    • KVM: nVMX: Use adjusted pin controls for vmcs02 · c075c3e4
      Sean Christopherson authored
      KVM provides a module parameter to allow disabling virtual NMI support
      to simplify testing (hardware *without* virtual NMI support is hard to
      come by but it does have users).  When preparing vmcs02, use the accessor
      for pin controls to ensure that the module param is respected for nested
      guests.
      
      Opportunistically swap the order of applying L0's and L1's pin controls
      to better align with other controls and to prepare for a future patch
      that will ignore L1's, but not L0's, preemption timer flag.
      
      Fixes: d02fcf50 ("kvm: vmx: Allow disabling virtual NMI support")
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary · c7554efc
      Sean Christopherson authored
      Per Intel's SDM:
      
        ... the logical processor uses PAE paging if CR0.PG=1, CR4.PAE=1 and
        IA32_EFER.LME=0.  A VM entry to a guest that uses PAE paging loads the
        PDPTEs into internal, non-architectural registers based on the setting
        of the "enable EPT" VM-execution control.
      
      and:
      
        [GUEST_PDPTR] values are saved into the four PDPTE fields as follows:
      
          - If the "enable EPT" VM-execution control is 0 or the logical
            processor was not using PAE paging at the time of the VM exit,
            the values saved are undefined.
      
      In other words, if EPT is disabled or the guest isn't using PAE paging,
      then the PDPTRs aren't consumed by hardware on VM-Entry and are loaded
      with junk on VM-Exit.  From a nesting perspective, all of the above hold
      true, i.e. KVM can effectively ignore the VMCS PDPTRs.  E.g. KVM already
      loads the PDPTRs from memory when nested EPT is disabled (see
      nested_vmx_load_cr3()).
      
      Because KVM intercepts setting CR4.PAE, there is no danger of consuming
      a stale value or crushing L1's VMWRITEs regardless of whether L1
      intercepts CR4.PAE. The vmcs12's values are unchanged up until the
      VM-Exit where L2 sets CR4.PAE, i.e. L0 will see the new PAE state on the
      subsequent VM-Entry and propagate the PDPTRs from vmcs12 to vmcs02.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: introduce is_pae_paging · bf03d4f9
      Paolo Bonzini authored
      Checking for 32-bit PAE is quite common around code that fiddles with
      the PDPTRs.  Add a function to compress all checks into a single
      invocation.
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
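As a rough illustration of folding the PAE checks into one predicate, the sketch below follows the SDM condition quoted in the PDPTR commit above (CR0.PG=1, CR4.PAE=1, IA32_EFER.LME=0). The `struct cpu_mode` container is hypothetical and this is a standalone approximation, not the kernel's implementation, which operates on a vCPU and may test different state. The bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG  (1ULL << 31) /* paging enabled */
#define X86_CR4_PAE (1ULL << 5)  /* physical address extension */
#define EFER_LME    (1ULL << 8)  /* long mode enable */

/* Hypothetical container for the three registers the check needs. */
struct cpu_mode {
	uint64_t cr0;
	uint64_t cr4;
	uint64_t efer;
};

/* True only for 32-bit PAE paging: paging on, PAE on, long mode off. */
static bool is_pae_paging(const struct cpu_mode *m)
{
	return (m->cr0 & X86_CR0_PG) &&
	       (m->cr4 & X86_CR4_PAE) &&
	       !(m->efer & EFER_LME);
}
```

Callers that previously spelled out all three conditions can then use a single call, which also keeps the definition of "PAE paging" in one place.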