- 14 Dec, 2018 40 commits
-
Marc Orr authored
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
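A hedged sketch of the direction (kernel context assumed; the x86_fpu_cache name follows the patch, other details are illustrative):

    /* The large FPU state moves behind a pointer allocated from its own
     * cache, so struct kvm_vcpu itself no longer needs an order-3
     * allocation. */
    struct kvm_vcpu_arch {
            /* ... */
            struct fpu *guest_fpu;          /* was: struct fpu guest_fpu; */
            /* ... */
    };

    struct kmem_cache *x86_fpu_cache;
    /* Created at module init, e.g.:
     * x86_fpu_cache = kmem_cache_create("x86_fpu", sizeof(struct fpu),
     *                                   __alignof__(struct fpu),
     *                                   SLAB_ACCOUNT, NULL); */

    static int alloc_guest_fpu(struct kvm_vcpu *vcpu)
    {
            vcpu->arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
                                                     GFP_KERNEL);
            return vcpu->arch.guest_fpu ? 0 : -ENOMEM;
    }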
-
Marc Orr authored
Previously, x86's instantiation of 'struct kvm_vcpu_arch' added an fpu field to save/restore fpu-related architectural state, which will differ from kvm's fpu state. However, this is redundant to the 'struct fpu' field, called fpu, embedded in the task struct, via the thread field. Thus, this patch removes the user_fpu field from the kvm_vcpu_arch struct and replaces it with the task struct's fpu field. This change is significant because the fpu struct is actually quite large. For example, on the system used to develop this patch, this change reduces the size of the vcpu_vmx struct from 23680 bytes down to 19520 bytes, when building the kernel with kvmconfig. This reduction in the size of the vcpu_vmx struct moves us closer to being able to allocate the struct at order 2, rather than order 3. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
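A hedged sketch of the resulting swap path (kernel context; guest_fpu shown as the dynamically allocated state from the entry above):

    /* Swap host/guest FPU state through current->thread.fpu instead of a
     * dedicated user_fpu copy. */
    void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
    {
            preempt_disable();
            /* Park userspace's FPU state in the task struct... */
            copy_fpregs_to_fpstate(&current->thread.fpu);
            /* ...then load the guest's state (PKRU is handled elsewhere). */
            __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
                                    ~XFEATURE_MASK_PKRU);
            preempt_enable();
    }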
-
Krish Sadhukhan authored
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Passing the enum and doing an indirect lookup is silly when we can simply pass the field directly. Remove the "fast path" code in nested_vmx_check_msr_switch_controls() as it's now nothing more than a redundant check. Remove the debug message rather than continue passing the enum for the address field. Having debug messages for the MSRs themselves is useful as MSR legality is a huge space, whereas messing up a physical address means the VMM is fundamentally broken. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
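The shape of the change, as a hedged before/after signature sketch (parameter names illustrative):

    /* Before: callers passed an enum and the helper did an indirect
     * vmcs12 field lookup to find the count and address. */
    static int nested_vmx_check_msr_switch(struct kvm_vcpu *vcpu,
                                           unsigned long count_field,
                                           unsigned long addr_field);

    /* After: callers read the vmcs12 fields themselves and pass the
     * values directly. */
    static int nested_vmx_check_msr_switch(struct kvm_vcpu *vcpu,
                                           u32 count, u64 addr);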
-
Krish Sadhukhan authored
.. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
.. as they are used only in nested vmx context. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Lan Tianyu authored
Initialize ept_pointer to INVALID_PAGE and check it before flushing the EPT TLB; if ept_pointer is invalid, bypass the flush request. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
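A hedged sketch of the idea (kernel context; VALID_PAGE() and INVALID_PAGE are existing KVM helpers, the Hyper-V flush plumbing is elided):

    /* At vCPU setup: vmx->ept_pointer = INVALID_PAGE; */

    static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
    {
            struct kvm_vcpu *vcpu;
            int ret = 0, i;

            kvm_for_each_vcpu(i, vcpu, kvm) {
                    /* EPT never loaded on this vCPU: bypass the flush. */
                    if (!VALID_PAGE(to_vmx(vcpu)->ept_pointer))
                            continue;
                    /* ... issue the Hyper-V flush hypercall here ... */
            }
            return ret;
    }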
-
Krish Sadhukhan authored
According to the section "VM-entry Failures During or After Loading Guest State" in Intel SDM vol 3C, "No MSRs are saved into the VM-exit MSR-store area" when bit 31 of the exit reason is set. Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
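A minimal sketch of the check (VMX_EXIT_REASONS_FAILED_VMENTRY is the existing bit-31 mask; nested_vmx_store_msrs() is a hypothetical wrapper for the MSR-store walk):

    /* Skip the VM-exit MSR store when bit 31 of the exit reason signals
     * a failed VM-entry. */
    if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
            nested_vmx_store_msrs(vcpu, vmcs12);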
-
Jim Mattson authored
If userspace has provided a different value for this MSR (e.g. with the turbo bits set), the userspace-provided value should survive a vCPU reset. For backwards compatibility, MSR_PLATFORM_INFO is initialized in kvm_arch_vcpu_setup. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Drew Schmitt <dasch@google.com> Cc: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wei Huang authored
This patch adds the Intel "Xeon CPU E3-1220 V2", with CPUID.01H.EAX=0x000306A8, to the list of known broken CPUs that fail to support the VMX preemption timer. This bug was found while running the APIC timer test of kvm-unit-test on this specific CPU, even though no erratum for it could be located in the public domain. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
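A hedged sketch of how such quirk entries are matched (identifiers follow vmx.c; the surrounding table entries are elided):

    static const u32 vmx_preemption_cpu_tfms[] = {
            /* ... erratum-affected signatures already in the list ... */
            0x000306A8,             /* Xeon E3-1220 V2 */
    };

    static bool cpu_has_broken_vmx_preemption_timer(void)
    {
            u32 eax = cpuid_eax(0x00000001), i;

            /* Mask off reserved bits before comparing the signature. */
            eax &= ~(0x3U << 14 | 0xfU << 28);
            for (i = 0; i < ARRAY_SIZE(vmx_preemption_cpu_tfms); i++)
                    if (eax == vmx_preemption_cpu_tfms[i])
                            return true;
            return false;
    }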
-
Eduardo Habkost authored
Months ago, we added code to allow direct access to MSR_IA32_SPEC_CTRL to the guest, which makes STIBP available to guests. This was implemented by commits d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") and b2ac58f9 ("KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL"). However, we never updated GET_SUPPORTED_CPUID to let userspace know that STIBP can be enabled in CPUID. Fix that by updating kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
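A sketch of the shape of the fix (flag lists abridged and illustrative; F() maps an X86_FEATURE_* bit into the CPUID word KVM advertises):

    const u32 kvm_cpuid_8000_0008_ebx_x86_features =
            F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_STIBP) | F(VIRT_SSBD);

    const u32 kvm_cpuid_7_0_edx_x86_features =
            F(SPEC_CTRL) | F(INTEL_STIBP) | F(SPEC_CTRL_SSBD) |
            F(ARCH_CAPABILITIES);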
-
Vitaly Kuznetsov authored
Turns out we over-engineered Direct Mode for stimers a bit: unlike traditional stimers where we may want to try to re-inject the message upon EOI, Direct Mode stimers just set the irq in APIC and kvm_apic_set_irq() fails only when APIC is disabled (see APIC_DM_FIXED case in __apic_accept_irq()). Remove the redundant part. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
The stimers_pending optimization only helps us avoid multiple kvm_make_request() calls. This doesn't happen very often, and these calls are very cheap in the first place, so remove the open-coded version of stimer_mark_pending() from kvm_hv_notify_acked_sint(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Turns out Hyper-V on KVM (as of WS2016) will only use synthetic timers if direct mode is available. With direct mode we notify the guest by asserting an APIC irq instead of sending a SynIC message. The implementation uses the existing vec_bitmap for letting the lapic code know that we're interested in the particular IRQ's EOI request. We assume that the same APIC irq won't be used by the guest both for a direct mode stimer and as an SINT source (especially with AutoEOI semantics). It is unclear how things should be handled if that's not true. Direct mode is also somewhat less expensive; in my testing stimer_send_msg() takes no less than 1500 CPU cycles while stimer_notify_direct() can usually be done in 300-400. WS2016 without Hyper-V, however, always sticks to the non-direct version. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
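A hedged sketch of the direct-mode notification path (helper names follow the hyperv code of this era):

    /* Assert the APIC irq configured in the stimer instead of building
     * a SynIC message. */
    static int stimer_notify_direct(struct kvm_vcpu_hv_stimer *stimer)
    {
            struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer);
            struct kvm_lapic_irq irq = {
                    .delivery_mode = APIC_DM_FIXED,
                    .vector = stimer->config.apic_vector,
            };

            /* kvm_apic_set_irq() fails only when the APIC is disabled. */
            return !kvm_apic_set_irq(vcpu->arch.apic, &irq, NULL);
    }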
-
Vitaly Kuznetsov authored
As a preparation to implementing Direct Mode for Hyper-V synthetic timers, switch to using the stimer config definition from hyperv-tlfs.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
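The stimer config layout from hyperv-tlfs.h that this switches to (reproduced here for illustration):

    union hv_stimer_config {
            u64 as_uint64;
            struct {
                    u64 enable:1;
                    u64 periodic:1;
                    u64 lazy:1;
                    u64 auto_enable:1;
                    u64 apic_vector:8;      /* used by direct mode */
                    u64 direct_mode:1;
                    u64 reserved_z0:3;
                    u64 sint:4;             /* used by message mode */
                    u64 reserved_z1:44;
            } __packed;
    };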
-
Vitaly Kuznetsov authored
We implement Hyper-V SynIC and synthetic timers in KVM too so there's some room for code sharing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Add a simple (and stupid) hyperv_cpuid test: check that we got the expected number of entries with and without Enlightened VMCS enabled and that all currently reserved fields are zeroed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
To be able to test ioctls that are expected to fail, we need a variant that doesn't assert on error. Following the _vcpu_run() precedent, implement _vcpu_ioctl(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
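A hedged sketch of the pattern (helper and assert names follow the selftests library conventions):

    /* Underscore variant returns the raw ioctl result... */
    int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid,
                    unsigned long cmd, void *arg)
    {
            struct vcpu *vcpu = vcpu_find(vm, vcpuid);

            return ioctl(vcpu->fd, cmd, arg);
    }

    /* ...while the plain variant asserts success, as vcpu_run() does. */
    void vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid,
                    unsigned long cmd, void *arg)
    {
            int ret = _vcpu_ioctl(vm, vcpuid, cmd, arg);

            TEST_ASSERT(ret == 0, "vcpu ioctl %lu failed, rc: %i errno: %i",
                        cmd, ret, errno);
    }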
-
Vitaly Kuznetsov authored
With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works, it is fairly inconvenient: the majority of the enlightenments we implement have corresponding CPUID feature bit(s), and userspace has to know these anyway to be able to expose the feature to the guest. Add a KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace in case we decide to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: the intent is to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
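A hedged userspace usage sketch (the ioctl is issued on a vCPU fd in this version; sizing and error handling simplified):

    #include <linux/kvm.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    struct kvm_cpuid2 *get_hv_cpuid(int vcpu_fd)
    {
            int nent = 64;  /* assumed ample for the Hyper-V leaves */
            struct kvm_cpuid2 *cpuid =
                    calloc(1, sizeof(*cpuid) +
                              nent * sizeof(struct kvm_cpuid_entry2));

            cpuid->nent = nent;
            /* Returns the 0x4000xxxx Hyper-V feature leaves only, so it
             * cannot collide with KVM_GET_SUPPORTED_CPUID's output. */
            if (ioctl(vcpu_fd, KVM_GET_SUPPORTED_HV_CPUID, cpuid) < 0) {
                    free(cpuid);
                    return NULL;
            }
            return cpuid;
    }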
-
Vitaly Kuznetsov authored
The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return the Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it is enabled. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
BIT(13) in HYPERV_CPUID_FEATURES.EBX is described as "ConfigureProfiler" in TLFS v4.0, but starting with 5.0 it is replaced with 'Reserved'. As we don't currently use it in the kernel, it can just be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted, it's hard to tell which CPUID leaf they belong to, and some items are duplicated (e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY). Do some housekeeping work. While at it, replace all (1 << X) with the BIT(X) macro. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
The TLFS structures are used for hypervisor-guest communication and must exactly meet the specification. Compilers can add alignment padding to structures or reorder struct members for randomization and optimization, which would break the hypervisor ABI. Mark the structures as packed to prevent this. 'struct hv_vp_assist_page' and 'struct hv_enlightened_vmcs' need to be properly padded to support the change. Suggested-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
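An illustration of the pattern only, not the exact TLFS layout:

    /* Make implicit alignment padding explicit, then mark the struct
     * packed so the compiler cannot change the hypervisor-visible ABI. */
    struct hv_abi_example {
            __u32 first;
            __u32 reserved;         /* was implicit alignment padding */
            __u64 second;
    } __packed;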
-
Roman Kagan authored
The SynIC message delivery protocol allows the message originator to request, should the message slot be busy, to be notified when it's free. However, this is unnecessary and even undesirable for messages generated by SynIC timers in periodic mode: if the period is short enough compared to the time the guest spends in the timer interrupt handler, so the timer ticks start piling up, the excessive interactions due to this notification and retried message delivery only make things worse. [This was observed, in particular, with Windows L2 guests setting (temporarily) the periodic timer to 2 kHz, and spending hundreds of microseconds in the timer interrupt handler due to several L2->L1 exits; under some load in L0 this could exceed 500 us so the timer ticks started to pile up and the guest livelocked.] Relieve the situation somewhat by not retrying message delivery for periodic SynIC timers. This appears to remain within the "lazy" lost ticks policy for SynIC timers as implemented in KVM. Note that it doesn't solve the fundamental problem of livelocking the guest with a periodic timer whose period is smaller than the time needed to process a tick, but it makes it a bit less likely to be triggered. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
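A hedged sketch of the change (helper names follow the existing hyperv code): a no_retry flag is threaded from the stimer down to message delivery and set for periodic timers.

    static int stimer_send_msg(struct kvm_vcpu_hv_stimer *stimer)
    {
            struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer);
            /* Periodic ticks pile up anyway; don't request an empty-slot
             * notification and don't retry delivery for them. */
            bool no_retry = stimer->config.periodic;

            return synic_deliver_msg(vcpu_to_synic(vcpu),
                                     stimer->config.sint, &stimer->msg,
                                     no_retry);
    }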
-
Roman Kagan authored
SynIC message delivery is somewhat overengineered: it pretends to follow the ordering rules when grabbing the message slot, using atomic operations and all that, but does it incorrectly and unnecessarily. The correct order would be to first set .msg_pending, then atomically replace .message_type if it was zero, and then clear .msg_pending if the previous step was successful. But this all is done in vcpu context so the whole update looks atomic to the guest (it's assumed to only access the message page from this cpu), and therefore can be done in whatever order is most convenient (and is also the reason why the incorrect order didn't trigger any bugs so far). While at it, also switch to kvm_vcpu_{read,write}_guest_page, and drop the no longer needed synic_clear_sint_msg_pending. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
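A hedged sketch of the simplified delivery (runs in vcpu context, hence no atomics; structure abridged, the no_retry flag is from the entry above):

    static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint,
                                 struct hv_message *src_msg, bool no_retry)
    {
            struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
            int msg_off = offsetof(struct hv_message_page,
                                   sint_message[sint]);
            gfn_t gfn = synic->msg_page >> PAGE_SHIFT;
            struct hv_message_header hdr;

            /* Read the current slot header. */
            if (kvm_vcpu_read_guest_page(vcpu, gfn, &hdr, msg_off,
                                         sizeof(hdr)))
                    return -EFAULT;

            if (hdr.message_type != HVMSG_NONE) {
                    if (no_retry)
                            return 0;   /* periodic timer: drop the tick */
                    /* Ask to be notified on EOI; originator will retry. */
                    hdr.message_flags.msg_pending = 1;
                    kvm_vcpu_write_guest_page(vcpu, gfn, &hdr, msg_off,
                                              sizeof(hdr));
                    return -EAGAIN;
            }

            /* Slot is free: write the whole message in one go. */
            return kvm_vcpu_write_guest_page(vcpu, gfn, src_msg, msg_off,
                                             sizeof(*src_msg));
    }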
-
Peng Hao authored
Previously, the variable apic_sw_disabled influenced recalculate_apic_map(). However, in commit 3b5a5ffa ("KVM: x86: simplify kvm_apic_map"), the access to apic_sw_disabled in recalculate_apic_map() was deleted. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
The structure svm_init_data is never used, so remove it. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
Like IA32_STAR, IA32_LSTAR and IA32_FMASK only need to contain guest values on VM-entry when the guest is in long mode and EFER.SCE is set. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
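A hedged sketch of the gating condition (save_msr() is a hypothetical stand-in for the move_msr_up() bookkeeping in setup_msrs()):

    /* Guest SYSCALL MSRs only need loading when they can actually be
     * consumed: long mode with EFER.SCE set. */
    if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
            save_msr(vmx, MSR_STAR);
            save_msr(vmx, MSR_LSTAR);           /* long-mode SYSCALL target */
            save_msr(vmx, MSR_SYSCALL_MASK);    /* IA32_FMASK */
    }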
-
Jim Mattson authored
SYSCALL raises #UD in compatibility mode on Intel CPUs, so it's pointless to load the guest's IA32_CSTAR value into the hardware MSR. IA32_CSTAR still provides 48 bits of storage on Intel CPUs that have CPUID.80000001:EDX.LM[bit 29] set, so we cannot remove it from the vmx_msr_index[] array. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
Add a comment explaining why MSR_STAR must be included in vmx_msr_index[] even for i386 builds. The elided comment has not been relevant since move_msr_up() was introduced in commit a75beee6 ("KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit"). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
RDTSCP is supported in legacy mode as well as long mode. The IA32_TSC_AUX MSR should be set to the correct guest value before entering any guest that supports RDTSCP. Fixes: 4e47c7a6 ("KVM: VMX: Add instruction rdtscp support for guest") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
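The shape of the fix in setup_msrs() (identifiers as in vmx.c; shown as a sketch): key the IA32_TSC_AUX save slot off RDTSCP support, not guest long mode.

    index = __find_msr_index(vmx, MSR_TSC_AUX);
    if (index >= 0 && guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
            move_msr_up(vmx, index, save_nmsrs++);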
-
Sean Christopherson authored
From a functional perspective, this is (supposed to be) a straight copy-paste of code. Code was moved piecemeal to nested.c as not all code that could/should be moved was obviously nested-only. The nested code was then re-ordered as needed to compile, i.e. stats may not show this as being a "pure" move despite there not being any intended changes in functionality. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Exposing only the function allows @nested, i.e. the module param, to be statically defined in vmx.c, ensuring we aren't unnecessarily checking said variable in the nested code. nested_vmx_allowed() is exposed due to the need to verify nested support in vmx_{get,set}_nested_state(). The downside is that nested_vmx_allowed() likely won't be inlined in vmx_{get,set}_nested_state(), but that should be a non-issue as they're not a hot path. Keeping vmx_{get,set}_nested_state() in vmx.c isn't a viable option as they need access to several nested-only functions. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
...as they're used directly by the nested code. This will allow moving the bulk of the nested code out of vmx.c without concurrent changes to vmx.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Expose vmx_msr_index, vmx_return and host_efer via vmx.h so that the nested code can be moved out of vmx.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
...so that the function doesn't need to be created when moving the nested code out of vmx.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
...so that it doesn't need access to @nested. The only case where the provided struct isn't already zeroed is the call from vmx_create_vcpu() as setup_vmcs_config() zeroes the struct in the other use cases. This will allow @nested to be statically defined in vmx.c, i.e. this removes the last direct reference from nested code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-