Commits · 22a7cdcae6a4a3c8974899e62851d270956f58ce · Kirill Smelkov / linux

24 Oct, 2018 1 commit

KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned · 22a7cdca

KarimAllah Ahmed authored Oct 20, 2018

The spec only requires the posted interrupt descriptor address to be
64-bytes aligned (i.e. bits[0:5] == 0). Using page_address_valid also
forces the address to be page aligned.

Only validate that the address does not cross the maximum physical address
without enforcing a page alignment.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 6de84e58 ("nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2")
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhuhan <krish.sadhukhan@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

22a7cdca

23 Oct, 2018 1 commit

Revert "kvm: x86: optimize dr6 restore" · f9dcf08e

Radim Krčmář authored Oct 23, 2018

This reverts commit 0e0a53c5.

As Christian Ehrhardt noted:

  The most common case is that vcpu->arch.dr6 and the host's %dr6 value
  are not related at all because ->switch_db_regs is zero. To do this
  all correctly, we must handle the case where the guest leaves an arbitrary
  unused value in vcpu->arch.dr6 before disabling breakpoints again.

  However, this means that vcpu->arch.dr6 is not suitable to detect the
  need for a %dr6 clear.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

f9dcf08e

21 Oct, 2018 1 commit

Merge tag 'kvm-ppc-next-4.20-2' of... · 574c0cfb

Paolo Bonzini authored Oct 21, 2018

Merge tag 'kvm-ppc-next-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Second PPC KVM update for 4.20.

Two commits; one is an optimization for PCI pass-through, and the
other disables nested HV-KVM on early POWER9 chips that need a
particular hardware bug workaround.

574c0cfb

20 Oct, 2018 1 commit

KVM: PPC: Optimize clearing TCEs for sparse tables · 6e301a8e

Alexey Kardashevskiy authored Oct 15, 2018

The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE
table and a table with userspace addresses. These tables are radix trees,
we allocate indirect levels when they are written to. Since
the memory allocation is problematic in real mode, we have 2 accessors
to the entries:
- for virtual mode: it allocates the memory and it is always expected
to return non-NULL;
- fr real mode: it does not allocate and can return NULL.

Also, DMA windows can span to up to 55 bits of the address space and since
we never have this much RAM, such windows are sparse. However currently
the SPAPR TCE IOMMU driver walks through all TCEs to unpin DMA memory.

Since we maintain a userspace addresses table for VFIO which is a mirror
of the hardware table, we can use it to know which parts of the DMA
window have not been mapped and skip these so does this patch.

The bare metal systems do not have this problem as they use a bypass mode
of a PHB which maps RAM directly.

This helps a lot with sparse DMA windows, reducing the shutdown time from
about 3 minutes per 1 billion TCEs to a few seconds for 32GB sparse guest.
Just skipping the last level seems to be good enough.

As non-allocating accessor is used now in virtual mode as well, rename it
from IOMMU_TABLE_USERSPACE_ENTRY_RM (real mode) to _RO (read only).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

6e301a8e

19 Oct, 2018 5 commits

x86/kvm/nVMX: tweak shadow fields · cbe3f898

Vitaly Kuznetsov authored Oct 19, 2018

It seems we have some leftovers from times when 'unrestricted guest'
wasn't exposed to L1. Stop shadowing GUEST_CS_{BASE,LIMIT,AR_SELECTOR}
and GUEST_ES_BASE, shadow GUEST_SS_AR_BYTES as it was found that some
hypervisors (e.g. Hyper-V without Enlightened VMCS) access it pretty
often.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cbe3f898

selftests/kvm: add missing executables to .gitignore · f15ac811

Anders Roxell authored Oct 19, 2018

Fixes: 18178ff8 ("KVM: selftests: add Enlightened VMCS test")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f15ac811

Merge tag 'kvmarm-for-v4.20' of... · e42b4a50

Paolo Bonzini authored Oct 19, 2018

Merge tag 'kvmarm-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 4.20

- Improved guest IPA space support (32 to 52 bits)
- RAS event delivery for 32bit
- PMU fixes
- Guest entry hardening
- Various cleanups

e42b4a50

KVM: arm64: Safety check PSTATE when entering guest and handle IL · e4e11cc0

Christoffer Dall authored Oct 17, 2018

This commit adds a paranoid check when entering the guest to make sure
we don't attempt running guest code in an equally or more privilged mode
than the hypervisor.  We also catch other accidental programming of the
SPSR_EL2 which results in an illegal exception return and report this
safely back to the user.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

e4e11cc0

KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips · 8d9fcacf

Paul Mackerras authored Oct 19, 2018

This disables the use of the streamlined entry path for radix guests
on early POWER9 chips that need the workaround added in commit
a25bd72b ("powerpc/mm/radix: Workaround prefetch issue with KVM",
2017-07-24), because the streamlined entry path does not include
that workaround.  This also means that we can't do nested HV-KVM
on those chips.

Since the chips that need that workaround are the same ones that can't
run both radix and HPT guests at the same time on different threads of
a core, we use the existing 'no_mixing_hpt_and_radix' variable that
identifies those chips to identify when we can't use the new guest
entry path, and when we can't do nested virtualization.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

8d9fcacf

18 Oct, 2018 2 commits

arm/arm64: KVM: Enable 32 bits kvm vcpu events support · 58bf437f

Dongjiu Geng authored Oct 13, 2018

The commit 539aee0e ("KVM: arm64: Share the parts of
get/set events useful to 32bit") shares the get/set events
helper for arm64 and arm32, but forgot to share the cap
extension code.

User space will check whether KVM supports vcpu events by
checking the KVM_CAP_VCPU_EVENTS extension
Acked-by: James Morse <james.morse@arm.com>
Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

58bf437f

arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() · 375bdd3b

Dongjiu Geng authored Oct 13, 2018

Rename kvm_arch_dev_ioctl_check_extension() to
kvm_arch_vm_ioctl_check_extension(), because it does
not have any relationship with device.

Renaming this function can make code readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

375bdd3b

17 Oct, 2018 8 commits

KVM: arm64: Fix caching of host MDCR_EL2 value · da5a3ce6

Mark Rutland authored Oct 17, 2018

At boot time, KVM stashes the host MDCR_EL2 value, but only does this
when the kernel is not running in hyp mode (i.e. is non-VHE). In these
cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
lead to CONSTRAINED UNPREDICTABLE behaviour.

Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest have been run, the performance counters
do not behave as expected. This has been observed to result in accesses
via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
counters, resulting in events not being counted. In these cases, only
the fixed-purpose cycle counter appears to work as expected.

Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

da5a3ce6

KVM: VMX: enable nested virtualization by default · 1e58e5e5

Paolo Bonzini authored Oct 17, 2018

With live migration support and finally a good solution for exception
event injection, nested VMX should be ready for having a stable userspace
ABI.  The results of syzkaller fuzzing are not perfect but not horrible
either (and might be partially due to running on GCE, so that effectively
we're testing three-level nesting on a fork of upstream KVM!).  Enabling
it by default seems like a nice way to conclude the 4.20 pull request. :)

Unfortunately, enabling nested SVM in 2009 (commit 4b6e4dca) was a
bit premature.  However, until live migration support is in place we can
reasonably expect that it does not offer much in terms of ABI guarantees.
Therefore we are still in time to break things and conform as much as
possible to the interface used for VMX.
Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Celebrated-by: Liran Alon <liran.alon@oracle.com>
Celebrated-by: Wanpeng Li <kernellwp@gmail.com>
Celebrated-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1e58e5e5

KVM/x86: Use 32bit xor to clear registers in svm.c · 43ce76ce

Uros Bizjak authored Oct 17, 2018

x86_64 zero-extends 32bit xor operation to a full 64bit register.

Also add a comment and remove unnecessary instruction suffix in vmx.c
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

43ce76ce

kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD · c4f55198

Jim Mattson authored Oct 16, 2018

This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c4f55198

kvm: vmx: Defer setting of DR6 until #DB delivery · f10c729f

Jim Mattson authored Oct 16, 2018

When exception payloads are enabled by userspace (which is not yet
possible) and a #DB is raised in L2, defer the setting of DR6 until
later. Under VMX, this allows the L1 hypervisor to intercept the fault
before DR6 is modified. Under SVM, DR6 is modified before L1 can
intercept the fault (as has always been the case with DR7).

Note that the payload associated with a #DB exception includes only
the "new DR6 bits." When the payload is delievered, DR6.B0-B3 will be
cleared and DR6.RTM will be set prior to merging in the new DR6 bits.

Also note that bit 16 in the "new DR6 bits" is set to indicate that a
debug exception (#DB) or a breakpoint exception (#BP) occurred inside
an RTM region while advanced debugging of RTM transactional regions
was enabled. Though the reverse of DR6.RTM, this makes the #DB payload
field compatible with both the pending debug exceptions field under
VMX and the exit qualification for #DB exceptions under VMX.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f10c729f

kvm: x86: Defer setting of CR2 until #PF delivery · da998b46

Jim Mattson authored Oct 16, 2018

When exception payloads are enabled by userspace (which is not yet
possible) and a #PF is raised in L2, defer the setting of CR2 until
the #PF is delivered. This allows the L1 hypervisor to intercept the
fault before CR2 is modified.

For backwards compatibility, when exception payloads are not enabled
by userspace, kvm_multiple_exception modifies CR2 when the #PF
exception is raised.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

da998b46

kvm: x86: Add payload operands to kvm_multiple_exception · 91e86d22

Jim Mattson authored Oct 16, 2018

kvm_multiple_exception now takes two additional operands: has_payload
and payload, so that updates to CR2 (and DR6 under VMX) can be delayed
until the exception is delivered. This is necessary to properly
emulate VMX or SVM hardware behavior for nested virtualization.

The new behavior is triggered by
vcpu->kvm->arch.exception_payload_enabled, which will (later) be set
by a new per-VM capability, KVM_CAP_EXCEPTION_PAYLOAD.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

91e86d22

kvm: x86: Add exception payload fields to kvm_vcpu_events · 59073aaf

Jim Mattson authored Oct 16, 2018

The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a
later commit) adds the following fields to struct kvm_vcpu_events:
exception_has_payload, exception_payload, and exception.pending.

With this capability set, all of the details of vcpu->arch.exception,
including the payload for a pending exception, are reported to
userspace in response to KVM_GET_VCPU_EVENTS.

With this capability clear, the original ABI is preserved, and the
exception.injected field is set for either pending or injected
exceptions.

When userspace calls KVM_SET_VCPU_EVENTS with
KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer
translated to exception.pending. KVM_SET_VCPU_EVENTS can now only
establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

59073aaf

16 Oct, 2018 21 commits

kvm: x86: Add has_payload and payload to kvm_queued_exception · c851436a

Jim Mattson authored Oct 16, 2018

The payload associated with a #PF exception is the linear address of
the fault to be loaded into CR2 when the fault is delivered. The
payload associated with a #DB exception is a mask of the DR6 bits to
be set (or in the case of DR6.RTM, cleared) when the fault is
delivered. Add fields has_payload and payload to kvm_queued_exception
to track payloads for pending exceptions.

The new fields are introduced here, but for now, they are just cleared.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c851436a

KVM: Documentation: Fix omission in struct kvm_vcpu_events · bba9ce58

Jim Mattson authored Oct 16, 2018

The header file indicates that there are 36 reserved bytes at the end
of this structure. Adjust the documentation to agree with the header
file.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bba9ce58

KVM: selftests: add Enlightened VMCS test · 18178ff8

Vitaly Kuznetsov authored Oct 16, 2018

Modify test library and add eVMCS test. This includes nVMX save/restore
testing.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

18178ff8

tools/headers: update kvm.h · c939989d

Vitaly Kuznetsov authored Oct 16, 2018

Pick up the latest kvm.h definitions.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c939989d

x86/kvm/nVMX: nested state migration for Enlightened VMCS · 8cab6507

Vitaly Kuznetsov authored Oct 16, 2018

Add support for get/set of nested state when Enlightened VMCS is in use.
A new KVM_STATE_NESTED_EVMCS flag to indicate eVMCS on the vCPU was enabled
is added.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8cab6507

KVM: selftests: state_test: test bare VMXON migration · 1e7ecd1b

Vitaly Kuznetsov authored Oct 16, 2018

Split prepare_for_vmx_operation() into prepare_for_vmx_operation() and
load_vmcs() so we can inject GUEST_SYNC() in between.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1e7ecd1b

x86/kvm/nVMX: allow bare VMXON state migration · a1b0c1c6

Vitaly Kuznetsov authored Oct 16, 2018

It is perfectly valid for a guest to do VMXON and not do VMPTRLD. This
state needs to be preserved on migration.

Cc: stable@vger.kernel.org
Fixes: 8fcc4b59Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a1b0c1c6

x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit · a7c42bb6

Vitaly Kuznetsov authored Oct 16, 2018

vcpu->arch.pv_eoi is accessible through both HV_X64_MSR_VP_ASSIST_PAGE and
MSR_KVM_PV_EOI_EN so on migration userspace may try to restore them in any
order. Values match, however, kvm_lapic_enable_pv_eoi() uses different
length: for Hyper-V case it's the whole struct hv_vp_assist_page, for KVM
native case it is 8. In case we restore KVM-native MSR last cache will
be reinitialized with len=8 so trying to access VP assist page beyond
8 bytes with kvm_read_guest_cached() will fail.

Check if we re-initializing cache for the same address and preserve length
in case it was greater.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a7c42bb6

x86/kvm/hyperv: don't clear VP assist pages on init · 12e0c618

Vitaly Kuznetsov authored Oct 16, 2018

VP assist pages may hold valuable data which needs to be preserved across
migration. Clean PV EOI portion of the data on init, the guest is
responsible for making sure there's no garbage in the rest.

This will be used for nVMX migration, eVMCS address needs to be preserved.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

12e0c618

KVM: nVMX: optimize prepare_vmcs02{,_full} for Enlightened VMCS case · c4ebd629

Vitaly Kuznetsov authored Oct 16, 2018

When Enlightened VMCS is in use by L1 hypervisor we can avoid vmwriting
VMCS fields which did not change.

Our first goal is to achieve minimal impact on traditional VMCS case so
we're not wrapping each vmwrite() with an if-changed checker. We also can't
utilize static keys as Enlightened VMCS usage is per-guest.

This patch implements the simpliest solution: checking fields in groups.
We skip single vmwrite() statements as doing the check will cost us
something even in non-evmcs case and the win is tiny. Unfortunately, this
makes prepare_vmcs02_full{,_full}() code Enlightened VMCS-dependent (and
a bit ugly).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c4ebd629

KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR · b8bbab92

Vitaly Kuznetsov authored Oct 16, 2018

Per Hyper-V TLFS 5.0b:

"The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to
the corresponding field in the VP assist page (see section 7.8.7).
Another field in the VP assist page controls the currently active
enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in
size and must be initially zeroed. No VMPTRLD instruction must be
executed to make an enlightened VMCS active or current.

After the L1 hypervisor performs a VM entry with an enlightened VMCS,
the VMCS is considered active on the processor. An enlightened VMCS
can only be active on a single processor at the same time. The L1
hypervisor can execute a VMCLEAR instruction to transition an
enlightened VMCS from the active to the non-active state. Any VMREAD
or VMWRITE instructions while an enlightened VMCS is active is
unsupported and can result in unexpected behavior."

Keep Enlightened VMCS structure for the current L2 guest permanently mapped
from struct nested_vmx instead of mapping it every time.
Suggested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b8bbab92

KVM: nVMX: add enlightened VMCS state · 945679e3

Vitaly Kuznetsov authored Oct 16, 2018

Adds hv_evmcs pointer and implement copy_enlightened_to_vmcs12() and
copy_enlightened_to_vmcs12().

prepare_vmcs02()/prepare_vmcs02_full() separation is not valid for
Enlightened VMCS, do full sync for now.
Suggested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

945679e3

KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability · 57b119da

Vitaly Kuznetsov authored Oct 16, 2018

Enlightened VMCS is opt-in. The current version does not contain all
fields supported by nested VMX so we must not advertise the
corresponding VMX features if enlightened VMCS is enabled.

Userspace is given the enlightened VMCS version supported by KVM as
part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
be advertised to the nested hypervisor, currently done via a cpuid
leaf for Hyper-V.
Suggested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

57b119da

KVM: VMX: refactor evmcs_sanitize_exec_ctrls() · 5d7a6443

Vitaly Kuznetsov authored Oct 16, 2018

Split off EVMCS1_UNSUPPORTED_* macros so we can re-use them when
enabling Enlightened VMCS for Hyper-V on KVM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5d7a6443

KVM: hyperv: define VP assist page helpers · 72bbf935

Ladi Prosek authored Oct 16, 2018

The state related to the VP assist page is still managed by the LAPIC
code in the pv_eoi field.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72bbf935

KVM: refine the comment of function gfn_to_hva_memslot_prot() · 970c0d4b

Wei Yang authored Oct 09, 2018

The original comment is little hard to understand.

No functional change, just amend the comment a little.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

970c0d4b

KVM: x86: reintroduce pte_list_remove, but including mmu_spte_clear_track_bits · e7912386

Wei Yang authored Oct 04, 2018

rmap_remove() removes the sptep after locating the correct rmap_head but,
in several cases, the caller has already known the correct rmap_head.

This patch introduces a new pte_list_remove(); because it is known that
the spte is present (or it would not have an rmap_head), it is safe
to remove the tracking bits without any previous check.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7912386

KVM: x86: rename pte_list_remove to __pte_list_remove · 8daf3462

Wei Yang authored Oct 04, 2018

This is a patch preparing for further change.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8daf3462

kvm/x86 : add coalesced pio support · 0804c849

Peng Hao authored Oct 14, 2018

Coalesced pio is based on coalesced mmio and can be used for some port
like rtc port, pci-host config port and so on.

Specially in case of rtc as coalesced pio, some versions of windows guest
access rtc frequently because of rtc as system tick. guest access rtc like
this: write register index to 0x70, then write or read data from 0x71.
writing 0x70 port is just as index and do nothing else. So we can use
coalesced pio to handle this scene to reduce VM-EXIT time.

When starting and closing a virtual machine, it will access pci-host config
port frequently. So setting these port as coalesced pio can reduce startup
and shutdown time.

without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
using perf tools: (guest OS : windows 7 64bit)
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%)
0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%)

with my patch
IO Port Access Samples Samples% Time% Min Time Max Time Avg time
0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%)
0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%)
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0804c849

kvm/x86 : add document for coalesced mmio · 9943450b

Peng Hao authored Oct 14, 2018

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9943450b

kvm/x86 : fix some typo · 39337ad1

Peng Hao authored Oct 04, 2018

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

39337ad1