1. 20 Jun, 2022 11 commits
    • Sean Christopherson's avatar
      KVM: VMX: Refactor 32-bit PSE PT creation to avoid using MMU macro · 1ae20e0b
      Sean Christopherson authored
      Compute the number of PTEs to be filled for the 32-bit PSE page tables
      using the page size and the size of each entry.  While using the MMU's
      PT32_ENT_PER_PAGE macro is arguably better in isolation, removing VMX's
      usage will allow a future namespacing cleanup to move the guest page
      table macros into paging_tmpl.h, out of the reach of code that isn't
      directly related to shadow paging.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614233328.3896033-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1ae20e0b
    • Sean Christopherson's avatar
      KVM: x86: Use lapic_in_kernel() to query in-kernel APIC in APICv helper · b8e1b962
      Sean Christopherson authored
      Use lapic_in_kernel() in kvm_vcpu_apicv_active() to take advantage of the
      kvm_has_noapic_vcpu static branch.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614230548.3852141-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b8e1b962
    • Sean Christopherson's avatar
      KVM: x86: Move "apicv_active" into "struct kvm_lapic" · ce0a58f4
      Sean Christopherson authored
      Move the per-vCPU apicv_active flag into KVM's local APIC instance.
      APICv is fully dependent on an in-kernel local APIC, but that's not at
      all clear when reading the current code due to the flag being stored in
      the generic kvm_vcpu_arch struct.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614230548.3852141-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ce0a58f4
    • Sean Christopherson's avatar
      KVM: x86: Check for in-kernel xAPIC when querying APICv for directed yield · ae801e13
      Sean Christopherson authored
      Use kvm_vcpu_apicv_active() to check if APICv is active when seeing if a
      vCPU is a candidate for directed yield due to a pending ACPIv interrupt.
      This will allow moving apicv_active into kvm_lapic without introducing a
      potential NULL pointer deref (kvm_vcpu_apicv_active() effectively adds a
      pre-check on the vCPU having an in-kernel APIC).
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614230548.3852141-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ae801e13
    • Sean Christopherson's avatar
      KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update() · d39850f5
      Sean Christopherson authored
      Drop the unused @vcpu parameter from hwapic_isr_update().  AMD/AVIC is
      unlikely to implement the helper, and VMX/APICv doesn't need the vCPU as
      it operates on the current VMCS.  The result is somewhat odd, but allows
      for a decent amount of (future) cleanup in the APIC code.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614230548.3852141-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d39850f5
    • Sean Christopherson's avatar
      KVM: SVM: Drop unused AVIC / kvm_x86_ops declarations · ec1d7e6a
      Sean Christopherson authored
      Drop a handful of unused AVIC function declarations whose implementations
      were removed during the conversion to optional static calls.
      
      No functional change intended.
      
      Fixes: abb6d479 ("KVM: x86: make several APIC virtualization callbacks optional")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614230548.3852141-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ec1d7e6a
    • Sean Christopherson's avatar
      KVM: nVMX: Update vmcs12 on BNDCFGS write, not at vmcs02=>vmcs12 sync · 913d6c9b
      Sean Christopherson authored
      Update vmcs12->guest_bndcfgs on intercepted writes to BNDCFGS from L2
      instead of waiting until vmcs02 is synchronized to vmcs12.  KVM always
      intercepts BNDCFGS accesses, so the only way the value in vmcs02 can
      change is via KVM's explicit VMWRITE during emulation.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614215831.3762138-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      913d6c9b
    • Sean Christopherson's avatar
      KVM: nVMX: Save BNDCFGS to vmcs12 iff relevant controls are exposed to L1 · 308a4fff
      Sean Christopherson authored
      Save BNDCFGS to vmcs12 (from vmcs02) if and only if at least of one of
      the load-on-entry or clear-on-exit fields for BNDCFGS is enumerated as an
      allowed-1 bit in vmcs12.  Skipping the field avoids an unnecessary VMREAD
      when MPX is supported but not exposed to L1.
      
      Per Intel's SDM:
      
        If the processor supports either the 1-setting of the "load IA32_BNDCFGS"
        VM-entry control or that of the "clear IA32_BNDCFGS" VM-exit control, the
        contents of the IA32_BNDCFGS MSR are saved into the corresponding field.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614215831.3762138-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      308a4fff
    • Sean Christopherson's avatar
      KVM: nVMX: Rename nested.vmcs01_* fields to nested.pre_vmenter_* · 5d76b1f8
      Sean Christopherson authored
      Rename the fields in struct nested_vmx used to snapshot pre-VM-Enter
      values to reflect that they can hold L2's values when restoring nested
      state, e.g. if userspace restores MSRs before nested state.  As crazy as
      it seems, restoring MSRs before nested state actually works (because KVM
      goes out if it's way to make it work), even though the initial MSR writes
      will hit vmcs01 despite holding L2 values.
      
      Add a related comment to vmx_enter_smm() to call out that using the
      common VM-Exit and VM-Enter helpers to emulate SMI and RSM is wrong and
      broken.  The few MSRs that have snapshots _could_ be fixed by taking a
      snapshot prior to the forced VM-Exit instead of at forced VM-Enter, but
      that's just the tip of the iceberg as the rather long list of MSRs that
      aren't snapshotted (hello, VM-Exit MSR load list) can't be handled this
      way.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614215831.3762138-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5d76b1f8
    • Sean Christopherson's avatar
      KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case · 764643a6
      Sean Christopherson authored
      If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL
      irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in
      vmcs12.  When restoring nested state, e.g. after migration, without a
      nested run pending, prepare_vmcs02() will propagate
      nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into
      vmcs02.GUEST_IA32_DEBUGCTL.
      
      If userspace restores nested state before MSRs, then loading garbage is a
      non-issue as loading DEBUGCTL will also update vmcs02.  But if usersepace
      restores MSRs first, then KVM is responsible for propagating L2's value,
      which is actually thrown into vmcs01, into vmcs02.
      
      Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
      is all kinds of bizarre and ideally would not be supported.  Sadly, some
      VMMs do exactly that and rely on KVM to make things work.
      
      Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL
      to vmcs02 across RSM may corrupt L2's DEBUGCTL.  But KVM's entire VMX+SMM
      emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
      "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.
      
      Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
      Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614215831.3762138-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      764643a6
    • Sean Christopherson's avatar
      KVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case · fa578398
      Sean Christopherson authored
      If a nested run isn't pending, snapshot vmcs01.GUEST_BNDCFGS irrespective
      of whether or not VM_ENTRY_LOAD_BNDCFGS is set in vmcs12.  When restoring
      nested state, e.g. after migration, without a nested run pending,
      prepare_vmcs02() will propagate nested.vmcs01_guest_bndcfgs to vmcs02,
      i.e. will load garbage/zeros into vmcs02.GUEST_BNDCFGS.
      
      If userspace restores nested state before MSRs, then loading garbage is a
      non-issue as loading BNDCFGS will also update vmcs02.  But if usersepace
      restores MSRs first, then KVM is responsible for propagating L2's value,
      which is actually thrown into vmcs01, into vmcs02.
      
      Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
      is all kinds of bizarre and ideally would not be supported.  Sadly, some
      VMMs do exactly that and rely on KVM to make things work.
      
      Note, there's still a lurking SMM bug, as propagating vmcs01.GUEST_BNDFGS
      to vmcs02 across RSM may corrupt L2's BNDCFGS.  But KVM's entire VMX+SMM
      emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
      "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.
      
      Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
      Fixes: 62cf9bd8 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
      Cc: stable@vger.kernel.org
      Cc: Lei Wang <lei4.wang@intel.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220614215831.3762138-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fa578398
  2. 15 Jun, 2022 13 commits
  3. 14 Jun, 2022 5 commits
  4. 11 Jun, 2022 11 commits