Commit 668fffa3 authored by Michael S. Tsirkin, committed by Paolo Bonzini

kvm: better MWAIT emulation for guests

Guests that are heavy on futexes end up IPI'ing each other a lot. That
can lead to significant slowdowns and increased latency for those guests
when running within KVM.

If only a single guest is needed on a host, we have a lot of spare host
CPU time we can throw at the problem. Modern CPUs implement a feature
called "MWAIT" which allows guests to wake up sleeping remote CPUs without
an IPI - thus without an exit - at the expense of never going out of guest
context.
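The wake-without-IPI pattern relies on the MONITOR/MWAIT pair: the sleeping CPU arms a monitor on a cache line and executes MWAIT; any write to that line by another CPU wakes it. Since both instructions are privileged, the following is an illustrative kernel-context sketch (hypothetical names, not runnable user-space code, and not part of this patch):

```
/* Illustrative pseudocode: MONITOR/MWAIT run in a (guest) kernel. */

static volatile unsigned long wake_flag;	/* shared cache line */

static void cpu_idle_mwait(void)
{
	while (!wake_flag) {
		/* Arm the monitor on &wake_flag (eax = address, ecx = edx = 0). */
		asm volatile("monitor" :: "a"(&wake_flag), "c"(0), "d"(0));
		if (wake_flag)	/* re-check after arming to avoid a lost wakeup */
			break;
		/* Sleep until the monitored line is written or an interrupt
		 * arrives (ecx bit 0: break on interrupt even if disabled). */
		asm volatile("mwait" :: "a"(0), "c"(1));
	}
}

/* A remote CPU wakes the sleeper with a plain store - no IPI needed: */
static void wake_remote_cpu(void)
{
	wake_flag = 1;
}
```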

The decision whether this is something sensible to use should be up to the
VM admin, i.e. to user space. We can, however, allow MWAIT execution on
systems whose hardware supports it properly.

This patch adds a CAP to user space and a KVM cpuid leaf to indicate
availability of native MWAIT execution. With that enabled, the worst a
guest can do is waste as many cycles as a "jmp ." would do, so it's not
a privilege problem.

We consciously do *not* expose the feature in our CPUID bitmap, as most
people will want to benefit from sleeping vCPUs to allow for overcommit.
Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[agraf: fix amd, change commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
parent db2336a8
@@ -4111,3 +4111,12 @@ reserved.
  2: MIPS64 or microMIPS64 with access to all address segments.
     Both registers and addresses are 64-bits wide.
     It will be possible to run 64-bit or 32-bit guest code.
+
+8.8 KVM_CAP_X86_GUEST_MWAIT
+
+Architectures: x86
+
+This capability indicates that a guest using memory monitoring instructions
+(MWAIT/MWAITX) to stop a virtual CPU will not cause a VM exit.  As such, time
+spent while a virtual CPU is halted in this way will then be accounted for as
+guest running time on the host (as opposed to e.g. HLT).
@@ -1198,10 +1198,13 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_CLGI);
 	set_intercept(svm, INTERCEPT_SKINIT);
 	set_intercept(svm, INTERCEPT_WBINVD);
-	set_intercept(svm, INTERCEPT_MONITOR);
-	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
+	if (!kvm_mwait_in_guest()) {
+		set_intercept(svm, INTERCEPT_MONITOR);
+		set_intercept(svm, INTERCEPT_MWAIT);
+	}
+
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
 	control->int_ctl = V_INTR_MASKING_MASK;
...
@@ -3527,11 +3527,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
-	      CPU_BASED_MWAIT_EXITING |
-	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
+	if (!kvm_mwait_in_guest())
+		min |= CPU_BASED_MWAIT_EXITING |
+			CPU_BASED_MONITOR_EXITING;
+
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
...
@@ -2687,6 +2687,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
+	case KVM_CAP_X86_GUEST_MWAIT:
+		r = kvm_mwait_in_guest();
+		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
 		 * and SMM handlers might indeed rely on 4G segment limits,
...
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
+#include <asm/processor.h>
+#include <asm/mwait.h>
 #include <linux/kvm_host.h>
 #include <asm/pvclock.h>
 #include "kvm_cache_regs.h"
@@ -212,4 +214,38 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	    __rem;						\
 	})
 
+static inline bool kvm_mwait_in_guest(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
+		return false;
+
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		/* All AMD CPUs have a working MWAIT implementation */
+		return true;
+	case X86_VENDOR_INTEL:
+		/* Handle Intel below */
+		break;
+	default:
+		return false;
+	}
+
+	/*
+	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
+	 * they would allow guest to stop the CPU completely by disabling
+	 * interrupts then invoking MWAIT.
+	 */
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		return false;
+
+	return true;
+}
+
 #endif
@@ -893,6 +893,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_GS 140
 #define KVM_CAP_S390_AIS 141
 #define KVM_CAP_SPAPR_TCE_VFIO 142
+#define KVM_CAP_X86_GUEST_MWAIT 143
 
 #ifdef KVM_CAP_IRQ_ROUTING
...