- 26 Sep, 2022 40 commits
-
-
Sean Christopherson authored
Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to esclate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2. KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workaround or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways. The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit. Deal with event migration by flagging that a pending exception was queued by userspace and check for interception at the next KVM_RUN, e.g. so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state. When "getting" events from userspace, simply drop any pending excpetion that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish. Fixes: a04aead1 ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7d ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a gigantic comment above vmx_check_nested_events() to document the priorities of all known events on Intel CPUs. Intel's SDM doesn't include VMX-specific events in its "Priority Among Concurrent Events", which makes it painfully difficult to suss out the correct priority between things like Monitor Trap Flag VM-Exits and pending #DBs. Kudos to Jim Mattson for doing the hard work of collecting and interpreting the priorities from various locations throughtout the SDM (because putting them all in one place in the SDM would be too easy). Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-21-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a helper to identify "low"-priority #DB traps, i.e. trap-like #DBs that aren't TSS T flag #DBs, and tweak the related code to operate on any queued exception. A future commit will separate exceptions that are intercepted by L1, i.e. cause nested VM-Exit, from those that do NOT trigger nested VM-Exit. I.e. there will be multiple exception structs and multiple invocations of the helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-20-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Determine whether or not new events can be injected after checking nested events. If a VM-Exit occurred during nested event handling, any previous event that needed re-injection is gone from's KVM perspective; the event is captured in the vmc*12 VM-Exit information, but doesn't exist in terms of what needs to be done for entry to L1. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-19-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Perform nested event checks before re-injecting exceptions/events into L2. If a pending exception causes VM-Exit to L1, re-injecting events into vmcs02 is premature and wasted effort. Take care to ensure events that need to be re-injected are still re-injected if checking for nested events "fails", i.e. if KVM needs to force an immediate entry+exit to complete the to-be-re-injecteed event. Keep the "can_inject" logic the same for now; it too can be pushed below the nested checks, but is a slightly riskier change (see past bugs about events not being properly purged on nested VM-Exit). Add and/or modify comments to better document the various interactions. Of note is the comment regarding "blocking" previously injected NMIs and IRQs if an exception is pending. The old comment isn't wrong strictly speaking, but it failed to capture the reason why the logic even exists. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-18-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Queue #DF by recursing on kvm_multiple_exception() by way of kvm_queue_exception_e() instead of open coding the behavior. This will allow KVM to Just Work when a future commit moves exception interception checks (for L2 => L1) into kvm_multiple_exception(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-17-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Capture nested_run_pending as block_pending_exceptions so that the logic of why exceptions are blocked only needs to be documented once instead of at every place that employs the logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-16-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename the kvm_x86_ops hook for exception injection to better reflect reality, and to align with pretty much every other related function name in KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-14-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated page faults. Practically speaking, this is a glorified nop as the exception is never of the nested flavor, and it's extremely unlikely the guest is relying on the side effect of an implicit INVLPG on the faulting address. Fixes: 70210c04 ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-13-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Clear mtf_pending on nested VM-Exit instead of handling the clear on a case-by-case basis in vmx_check_nested_events(). The pending MTF should never survive nested VM-Exit, as it is a property of KVM's run of the current L2, i.e. should never affect the next L2 run by L1. In practice, this is likely a nop as getting to L1 with nested_run_pending is impossible, and KVM doesn't correctly handle morphing a pending exception that occurs on a prior injected exception (need for re-injected exception being the other case where MTF isn't cleared). However, KVM will hopefully soon correctly deal with a pending exception on top of an injected exception. Add a TODO to document that KVM has an inversion priority bug between SMIs and MTF (and trap-like #DBS), and that KVM also doesn't properly save/restore MTF across SMI/RSM. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-12-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Fall through to handling other pending exception/events for L2 if SIPI is pending while the CPU is not in Wait-for-SIPI. KVM correctly ignores the event, but incorrectly returns immediately, e.g. a SIPI coincident with another event could lead to KVM incorrectly routing the event to L1 instead of L2. Fixes: bf0cd88c ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-11-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use DR7_GD in the emulator instead of open coding the check, and drop a comically wrong comment. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-10-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or trap-like depending the sub-type of #DB, and effectively defer the decision of what to do with the #DB to the caller. For the emulator's two calls to exception_type(), treat the #DB as fault-like, as the emulator handles only code breakpoint and general detect #DBs, both of which are fault-like. For event injection, which uses exception_type() to determine whether to set EFLAGS.RF=1 on the stack, keep the current behavior of not setting RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs, so exempting by failing the "== EXCPT_FAULT" check is correct. The only other fault-like #DB is General Detect, and despite Intel and AMD both strongly implying (through omission) that General Detect #DBs should set RF=1, hardware (multiple generations of both Intel and AMD), in fact does not. Through insider knowledge, extreme foresight, sheer dumb luck, or some combination thereof, KVM correctly handled RF for General Detect #DBs. Fixes: 38827dbd ("KVM: x86: Do not update EFLAGS on faulting emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher priority than MTF. KVM itself doesn't emulate TSS #DBs, and any such exceptions injected from L1 will be handled by hardware (or morphed to a fault-like exception if injection fails), but theoretically userspace could pend a TSS T-flag #DB in conjunction with a pending MTF. Note, there's no known use case this fixes, it's purely to be technically correct with respect to Intel's SDM. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-8-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Exclude General Detect #DBs, which have fault-like behavior but also have a non-zero payload (DR6.BD=1), from nVMX's handling of pending debug traps. Opportunistically rewrite the comment to better document what is being checked, i.e. "has a non-zero payload" vs. "has a payload", and to call out the many caveats surrounding #DBs that KVM dodges one way or another. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: 684c0422 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-7-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Suppress code breakpoints if MOV/POP SS blocking is active and the guest CPU is Intel, i.e. if the guest thinks it's running on an Intel CPU. Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active, whereas AMD (and its descendents) do not. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-6-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Extend force_emulation_prefix to an 'int' and use bit 1 as a flag to indicate that KVM should clear RFLAGS.RF before emulating, e.g. to allow tests to force emulation of code breakpoints in conjunction with MOV/POP SS blocking, which is impossible without KVM intervention as VMX unconditionally sets RFLAGS.RF on intercepted #UD. Make the behavior controllable so that tests can also test RFLAGS.RF=1 (again in conjunction with code #DBs). Note, clearing RFLAGS.RF won't create an infinite #DB loop as the guest's IRET from the #DB handler will return to the instruction and not the prefix, i.e. the restart won't force emulation. Opportunistically convert the permissions to the preferred octal format. Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-5-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Don't check for code breakpoints during instruction emulation if the emulation was triggered by exception interception. Code breakpoints are the highest priority fault-like exception, and KVM only emulates on exceptions that are fault-like. Thus, if hardware signaled a different exception, then the vCPU is already passed the stage of checking for hardware breakpoints. This is likely a glorified nop in terms of functionality, and is more for clarification and is technically an optimization. Intel's SDM explicitly states vmcs.GUEST_RFLAGS.RF on exception interception is the same as the value that would have been saved on the stack had the exception not been intercepted, i.e. will be '1' due to all fault-like exceptions setting RF to '1'. AMD says "guest state saved ... is the processor state as of the moment the intercept triggers", but that begs the question, "when does the intercept trigger?". Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-4-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Deliberately truncate the exception error code when shoving it into the VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12). Intel CPUs are incapable of handling 32-bit error codes and will never generate an error code with bits 31:16, but userspace can provide an arbitrary error code via KVM_SET_VCPU_EVENTS. Failure to drop the bits on exception injection results in failed VM-Entry, as VMX disallows setting bits 31:16. Setting the bits on VM-Exit would at best confuse L1, and at worse induce a nested VM-Entry failure, e.g. if L1 decided to reinject the exception back into L2. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-3-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop pending exceptions and events queued for re-injection when leaving nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced by host userspace. Failure to purge events could result in an event belonging to L2 being injected into L1. This _should_ never happen for VM-Fail as all events should be blocked by nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is the source of VM-Fail when running vmcs02. SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry to SMM is blocked by pending exceptions and re-injected events. Forced exit is definitely buggy, but has likely gone unnoticed because userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or some other ioctl() that purges the queue). Fixes: 4f350c6d ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-2-seanjc@google.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Hou Wenlong authored
Since the RDMSR/WRMSR emulation uses a sepearte emualtor interface, the trace points for RDMSR/WRMSR can be added in emulator path like normal path. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/39181a9f777a72d61a4d0bb9f6984ccbd1de2ea3.1661930557.git.houwenlong.hwl@antgroup.comSigned-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Hou Wenlong authored
The return value of emulator_{get|set}_mst_with_filter() is confused, since msr access error and emulator error are mixed. Although, KVM_MSR_RET_* doesn't conflict with X86EMUL_IO_NEEDED at present, it is better to convert msr access error to emulator error if error value is needed. So move "r < 0" handling for wrmsr emulation into the set helper function, then only X86EMUL_* is returned in the helper functions. Also add "r < 0" check in the get helper function, although KVM doesn't return -errno today, but assuming that will always hold true is unnecessarily risking. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/09b2847fc3bcb8937fb11738f0ccf7be7f61d9dd.1661930557.git.houwenlong.hwl@antgroup.com [sean: wrap changelog less aggressively] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jilin Yuan authored
Delete the redundant word 'to'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Link: https://lore.kernel.org/r/20220831125217.12313-1-yuanjilin@cdjrlc.comSigned-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
vmcs_config has cached host MSR_IA32_VMX_MISC value, use it for setting up nested MSR_IA32_VMX_MISC in nested_vmx_setup_ctls_msrs() and avoid the redundant rdmsr(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-34-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Like other host VMX control MSRs, MSR_IA32_VMX_MISC can be cached in vmcs_config to avoid the need to re-read it later, e.g. from cpu_has_vmx_intel_pt() or cpu_has_vmx_shadow_vmcs(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-33-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Using raw host MSR values for setting up nested VMX control MSRs is incorrect as some features need to disabled, e.g. when KVM runs as a nested hypervisor on Hyper-V and uses Enlightened VMCS or when a workaround for IA32_PERF_GLOBAL_CTRL is applied. For non-nested VMX, this is done in setup_vmcs_config() and the result is stored in vmcs_config. Use it for setting up allowed-1 bits in nested VMX MSRs too. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-32-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Similar to exit_ctls_low, entry_ctls_low, and procbased_ctls_low, pinbased_ctls_low should be set to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR and not host's MSR_IA32_VMX_PINBASED_CTLS value |= PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR. The commit eabeaacc ("KVM: nVMX: Clean up and fix pin-based execution controls") which introduced '|=' doesn't mention anything about why this is needed, the change seems rather accidental. Note: normally, required-1 portion of MSR_IA32_VMX_PINBASED_CTLS should be equal to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR so no behavioral change is expected, however, it is (in theory) possible to observe something different there when e.g. KVM is running as a nested hypervisor. Hope this doesn't happen in practice. Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-31-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
As a preparation to reusing the result of setup_vmcs_config() for setting up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from hardware_setup(). While it seems reasonable to not expose LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs, such change would inevitably break live migration from older KVMs where the controls are exposed. Keep the status quo for now, L1 hypervisor itself is supposed to take care of the errata. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-30-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
Intel processor code names are more familiar to many readers than their decimal model numbers. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-29-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Clear the CR3 and INVLPG interception controls at runtime based on whether or not EPT is being _used_, as opposed to clearing the bits at setup if EPT is _supported_ in hardware, and then restoring them when EPT is not used. Not mucking with the base config will allow using the base config as the starting point for emulating the VMX capability MSRs. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-28-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the CPU based VM execution controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_CPU_BASED_VM_EXEC_CONTROL and filter them out in vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-27-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the VMEXIT controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_VM_EXIT_CONTROLS and filter them out in vmx_vmexit_ctrl(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-26-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering to vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-25-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
When VMX controls macros are used to set or clear a control bit, make sure that this bit was checked in setup_vmcs_config() and thus is properly reflected in vmcs_config. Opportunistically drop pointless "< 0" check for adjust_vmx_controls()'s return value. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-24-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Don't toggle VM_ENTRY_IA32E_MODE in 32-bit kernels/KVM and instead bug the VM if KVM attempts to run the guest with EFER.LMA=1. KVM doesn't support running 64-bit guests with 32-bit hosts. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-23-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
SECONDARY_EXEC_ENCLS_EXITING is the only control which is conditionally added to the 'optional' checklist in setup_vmcs_config() but the special case can be avoided by always checking for its presence first and filtering out the result later. Note: the situation when SECONDARY_EXEC_ENCLS_EXITING is present but cpu_has_sgx() is false is possible when SGX is "soft-disabled", e.g. if software writes MCE control MSRs or there's an uncorrectable #MC. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-22-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
CPU_BASED_{INTR,NMI}_WINDOW_EXITING controls are toggled dynamically by vmx_enable_{irq,nmi}_window, handle_interrupt_window(), handle_nmi_window() but setup_vmcs_config() doesn't check their existence. Add the check and filter the controls out in vmx_exec_control(). Note: KVM explicitly supports CPUs without VIRTUAL_NMIS and all these CPUs are supposedly lacking NMI_WINDOW_EXITING too. Adjust cpu_has_virtual_nmis() accordingly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-21-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer() and setup_vmcs_config() doesn't check its existence. On the contrary, nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing check and filter the bit out in vmx_vmentry_ctrl(). No (real) functional change intended as all existing CPUs supporting long mode and VMX are supposed to have it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-20-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Advertise VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL as being supported for nested VMs irrespective of hardware support. KVM fully emulates the controls, i.e. manually emulates MSR writes on entry/exit, and never propagates the guest settings directly to vmcs02. In addition to allowing L1 VMMs to use the controls on older hardware, unconditionally advertising the controls will also allow KVM to use its vmcs01 configuration as the basis for the nested VMX configuration without causing a regression (due the errata which causes KVM to "hide" the control from vmcs01 but not vmcs12). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-19-vkuznets@redhat.comSigned-off-by: Paolo Bonzini <pbonzini@redhat.com>
-