1. 11 Jul, 2019 1 commit
    • KVM: x86: PMU Event Filter · 66bb8a06
      Eric Hankland authored
      Some events can provide a guest with information about other guests or the
      host (e.g. L3 cache stats); providing the capability to restrict access
      to a "safe" set of events would limit the potential for the PMU to be used
      in any side channel attacks. This change introduces a new VM ioctl that
      sets an event filter. If the guest attempts to program a counter for
      any blacklisted or non-whitelisted event, the kernel counter won't be
      created, so any RDPMC/RDMSR will show 0 instances of that event.
      Signed-off-by: Eric Hankland <ehankland@google.com>
      [Lots of changes. All remaining bugs are probably mine. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
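The allow/deny behavior described in the PMU Event Filter message can be modeled with a small stand-alone sketch. It is not the KVM UAPI: the type and function names below (`event_filter`, `FILTER_ALLOW`, `FILTER_DENY`, `event_is_permitted`) are hypothetical, and the event encoding is assumed to be an opaque 64-bit value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the KVM UAPI definitions. */
enum filter_action { FILTER_ALLOW, FILTER_DENY };

struct event_filter {
    enum filter_action action;  /* how to treat events found in the list */
    size_t nevents;             /* number of entries in events[] */
    const uint64_t *events;     /* opaque event encodings (assumed) */
};

/* Return true if a counter may be created for the given event. */
static bool event_is_permitted(const struct event_filter *f, uint64_t event)
{
    bool listed = false;

    if (!f)  /* no filter installed: every event is permitted */
        return true;

    for (size_t i = 0; i < f->nevents; i++) {
        if (f->events[i] == event) {
            listed = true;
            break;
        }
    }

    /* Allow list: only listed events pass. Deny list: listed events fail. */
    return f->action == FILTER_ALLOW ? listed : !listed;
}
```

When the check fails, the message says the kernel counter is simply not created, so the guest reads 0 for that event rather than receiving an error.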
  2. 10 Jul, 2019 2 commits
  3. 05 Jul, 2019 15 commits
  4. 03 Jul, 2019 2 commits
  5. 02 Jul, 2019 11 commits
  6. 20 Jun, 2019 2 commits
  7. 18 Jun, 2019 7 commits
    • KVM: nVMX: shadow pin based execution controls · eceb9973
      Paolo Bonzini authored
      The VMX_PREEMPTION_TIMER flag may be toggled frequently, though not
      *very* frequently.  Since it does not affect KVM's dirty logic, e.g.
      the preemption timer value is loaded from vmcs12 even if vmcs12 is
      "clean", there is no need to mark vmcs12 dirty when L1 writes pin
      controls, and shadowing the field achieves that.
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Leave preemption timer running when it's disabled · 804939ea
      Sean Christopherson authored
      VMWRITEs to the major VMCS controls, pin controls included, are
      deceptively expensive.  CPUs with VMCS caching (Westmere and later) also
      optimize away consistency checks on VM-Entry, i.e. skip consistency
      checks if the relevant fields have not changed since the last successful
      VM-Entry (of the cached VMCS).  Because uops are a precious commodity,
      uCode's dirty VMCS field tracking isn't as precise as software would
      prefer.  Notably, writing any of the major VMCS fields effectively marks
      the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
      consistency checks, which consumes several hundred cycles.
      
      As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
      doubles the latency of the next VM-Entry (and again when/if the flag is
      toggled back).  In a non-nested scenario, running a "standard" guest
      with the preemption timer enabled, toggling the timer flag is uncommon
      but not rare, e.g. roughly 1 in 10 entries.  Disabling the preemption
      timer can change these numbers due to its use for "immediate exits",
      even when explicitly disabled by userspace.
      
      Nested virtualization in particular is painful, as the timer flag is set
      for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
      pin controls to *clear* the flag since the timer's final state isn't
      known until vmx_vcpu_run().  I.e. the majority of nested VM-Enters end
      up unnecessarily writing pin controls *twice*.
      
      Rather than toggle the timer flag in pin controls, set the timer value
      itself to the largest allowed value to put it into a "soft disabled"
      state, and ignore any spurious preemption timer exits.
      
      Sadly, the timer is a 32-bit value and so theoretically it can fire
      before the heat death of the universe, i.e. spurious exits are possible.
      But because KVM does *not* save the timer value on VM-Exit and because
      the timer runs at a slower rate than the TSC, the maximum timer value
      is still sufficiently large for KVM's purposes.  E.g. on a modern CPU
      with a timer that runs at 1/32 the frequency of a 2.4GHz constant-rate
      TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
      execution.  In other words, spurious VM-Exits are effectively only
      possible if the host is completely tickless on the logical CPU, the
      guest is not using the preemption timer, and the guest is not generating
      VM-Exits for any other reason.
      
      To be safe from bad/weird hardware, disable the preemption timer if its
      maximum delay is less than ten seconds.  Ten seconds is mostly arbitrary
      and was selected in no small part because it's a nice round number.
      For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
      if the preemption timer is disabled by KVM or userspace.  Previously
      KVM continued to use the preemption timer to force immediate exits even
      when the timer was disabled by userspace.  Now that KVM leaves the timer
      running instead of truly disabling it, allow userspace to kill it
      entirely in the unlikely event the timer (or KVM) malfunctions.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
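The timing argument in the message above can be checked with a short stand-alone sketch. It assumes the timer ticks once every 2^rate_shift TSC cycles and that "soft disabled" means programming the maximum 32-bit value; the function and macro names are hypothetical, not KVM's.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_MAX_VALUE      UINT32_MAX  /* 32-bit timer; the "soft disabled" value */
#define MIN_USABLE_DELAY_SEC 10u         /* threshold named in the commit message */

/* Timer frequency in Hz, assuming one tick per 2^rate_shift TSC cycles. */
static uint64_t timer_hz(uint64_t tsc_hz, unsigned rate_shift)
{
    return tsc_hz >> rate_shift;
}

/* Whole seconds of uninterrupted guest execution before a max-valued timer fires. */
static uint64_t max_delay_sec(uint64_t tsc_hz, unsigned rate_shift)
{
    uint64_t hz = timer_hz(tsc_hz, rate_shift);

    if (hz == 0)  /* degenerate input: treat the timer as never firing */
        return UINT64_MAX;
    return TIMER_MAX_VALUE / hz;
}

/* Leaving the timer running is only reasonable if its maximum delay is long enough. */
static bool soft_disable_is_safe(uint64_t tsc_hz, unsigned rate_shift)
{
    return max_delay_sec(tsc_hz, rate_shift) >= MIN_USABLE_DELAY_SEC;
}
```

With a 2.4GHz TSC and a 1/32 rate (`rate_shift` of 5), `max_delay_sec` gives 57 whole seconds, the same ballpark as the ~55 seconds quoted in the message. A timer running at the full TSC rate would fire after about 1 second and fall under the ten-second threshold.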
    • KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs' · 9d99cc49
      Sean Christopherson authored
      ... now that it is fully redundant with the pin controls shadow.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Preset *DT exiting in vmcs02 when emulating UMIP · 469debdb
      Sean Christopherson authored
      KVM dynamically toggles SECONDARY_EXEC_DESC to intercept (a subset of)
      instructions that are subject to User-Mode Instruction Prevention, i.e.
      VMCS.SECONDARY_EXEC_DESC == CR4.UMIP when emulating UMIP.  Preset the
      VMCS control when preparing vmcs02 to avoid unnecessary VMWRITEs,
      e.g. KVM will clear VMCS.SECONDARY_EXEC_DESC in prepare_vmcs02_early()
      and then set it in vmx_set_cr4().
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
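The invariant stated in the UMIP message (the descriptor-table exiting control mirrors CR4.UMIP while UMIP is emulated) can be sketched as a pure function that computes the control value up front, so that no later correction is needed. The names `preset_desc_exiting`, `CR4_UMIP`, and `EXEC_DESC` are hypothetical stand-ins, and the bit positions are the architectural ones as far as I know.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_UMIP  (1u << 11)  /* CR4.UMIP */
#define EXEC_DESC (1u << 2)   /* descriptor-table exiting, secondary controls */

/*
 * Compute the secondary execution controls with descriptor-table exiting
 * already matching the guest's CR4.UMIP, rather than clearing the bit first
 * and setting it again later with a second write.
 */
static uint32_t preset_desc_exiting(uint32_t exec_control, uint32_t guest_cr4,
                                    bool emulating_umip)
{
    exec_control &= ~EXEC_DESC;
    if (emulating_umip && (guest_cr4 & CR4_UMIP))
        exec_control |= EXEC_DESC;
    return exec_control;
}
```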
    • KVM: nVMX: Preserve last USE_MSR_BITMAPS when preparing vmcs02 · de0286b7
      Sean Christopherson authored
      KVM dynamically toggles the CPU_BASED_USE_MSR_BITMAPS execution control
      for nested guests based on whether or not both L0 and L1 want to pass
      through the same MSRs to L2.  Preserve the last used value from vmcs02
      so as to avoid multiple VMWRITEs to (re)set/(re)clear the bit on nested
      VM-Entry.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Explicitly initialize controls shadow at VMCS allocation · 3af80fec
      Sean Christopherson authored
      Or: Don't re-initialize vmcs02's controls on every nested VM-Entry.
      
      VMWRITEs to the major VMCS controls are deceptively expensive.  Intel
      CPUs with VMCS caching (Westmere and later) also optimize away
      consistency checks on VM-Entry, i.e. skip consistency checks if the
      relevant fields have not changed since the last successful VM-Entry (of
      the cached VMCS).  Because uops are a precious commodity, uCode's dirty
      VMCS field tracking isn't as precise as software would prefer.  Notably,
      writing any of the major VMCS fields effectively marks the entire VMCS
      dirty, i.e. causes the next VM-Entry to perform all consistency checks,
      which consumes several hundred cycles.
      
      Zero out the controls' shadow copies during VMCS allocation and use the
      optimized setter when "initializing" controls.  While this technically
      affects both non-nested and nested virtualization, nested virtualization
      is the primary beneficiary as avoiding VMWRITEs when preparing vmcs02
      allows hardware to optimize away consistency checks.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
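The "optimized setter" idea from the message above can be illustrated with a minimal sketch: keep a software copy of the control value and skip the hardware write when nothing changes. The struct and function names are hypothetical, the write counter stands in for an actual VMWRITE, and the sketch assumes a freshly allocated VMCS reads back zero for the field, so that a zeroed shadow starts out in sync.

```c
#include <stdint.h>

/* Hypothetical per-VMCS shadow of one control field. */
struct controls_shadow {
    uint32_t pin;        /* last value written to the pin controls */
    unsigned hw_writes;  /* counts simulated VMWRITEs, for illustration only */
};

/* Zero the shadow at allocation time (assumes the hardware field is also zero). */
static void shadow_init(struct controls_shadow *s)
{
    s->pin = 0;
    s->hw_writes = 0;
}

/* Write the field only when the value actually changes. */
static void pin_controls_set(struct controls_shadow *s, uint32_t val)
{
    if (s->pin == val)
        return;          /* unchanged: skip the expensive write */
    s->pin = val;
    s->hw_writes++;      /* stands in for the real VMWRITE */
}
```

Re-preparing the same VMCS with an unchanged value then costs no write at all, which is the case the message identifies as the main benefit for nested VM-Entry.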
    • KVM: nVMX: Don't reset VMCS controls shadow on VMCS switch · ae81d089
      Sean Christopherson authored
      ... now that the shadow copies are per-VMCS.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>