Commits · 8afd74c2961a7544b0d73fad2d92f2cb65502e99 · Kirill Smelkov / linux

21 Apr, 2017 2 commits

Merge branch 'x86/process' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD · 8afd74c2
Paolo Bonzini authored Apr 21, 2017
```
Required for KVM support of the CPUID faulting feature.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
```
8afd74c2

KVM: VMX: drop vmm_exclusive module parameter · fe0e80be

David Hildenbrand authored Mar 10, 2017

vmm_exclusive=0 leads to KVM setting X86_CR4_VMXE always and calling
VMXON only when the vcpu is loaded. X86_CR4_VMXE is used as an
indication in cpu_emergency_vmxoff() (called on kdump) if VMXOFF has to be
called. This is obviously not the case if both are used independtly.
Calling VMXOFF without a previous VMXON will result in an exception.

In addition, X86_CR4_VMXE is used as a mean to test if VMX is already in
use by another VMM in hardware_enable(). So there can't really be
co-existance. If the other VMM is prepared for co-existance and does a
similar check, only one VMM can exist. If the other VMM is not prepared
and blindly sets/clears X86_CR4_VMXE, we will get inconsistencies with
X86_CR4_VMXE.

As we also had bug reports related to clearing of vmcs with vmm_exclusive=0
this seems to be pretty much untested. So let's better drop it.

While at it, directly move setting/clearing X86_CR4_VMXE into
kvm_cpu_vmxon/off.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fe0e80be

13 Apr, 2017 1 commit

KVM: nVMX: fix AD condition when handling EPT violation · 33251870

Radim Krčmář authored Apr 13, 2017

I have introduced this bug when applying and simplifying Paolo's patch
as we agreed on the list.  The original was "x &= ~y; if (z) x |= y;".

Here is the story of a bad workflow:

  A maintainer was already testing with the intended change, but it was
  applied only to a testing repo on a different machine.  When the time
  to push tested patches to kvm/next came, he realized that this change
  was missing and quickly added it to the maintenance repo, didn't test
  again (because the change is trivial, right), and pushed the world to
  fire.

Fixes: ae1e2d10 ("kvm: nVMX: support EPT accessed/dirty bits")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

33251870

12 Apr, 2017 28 commits

KVM: x86: Add MSR_AMD64_DC_CFG to the list of ignored MSRs · 405a353a

Ladi Prosek authored Apr 06, 2017

Hyper-V writes 0x800000000000 to MSR_AMD64_DC_CFG when running on AMD CPUs
as recommended in erratum 383, analogous to our svm_init_erratum_383.

By ignoring the MSR, this patch enables running Hyper-V in L1 on AMD.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

405a353a

x86/kvm: virt_xxx memory barriers instead of mandatory barriers · 5a48a622

Wanpeng Li authored Apr 11, 2017

virt_xxx memory barriers are implemented trivially using the low-level
__smp_xxx macros, __smp_xxx is equal to a compiler barrier for strong
TSO memory model, however, mandatory barriers will unconditional add
memory barriers, this patch replaces the rmb() in kvm_steal_clock() by
virt_rmb().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

5a48a622

KVM: x86: fix maintaining of kvm_clock stability on guest CPU hotplug · bd8fab39

Denis Plotnikov authored Apr 07, 2017

VCPU TSC synchronization is perfromed in kvm_write_tsc() when the TSC
value being set is within 1 second from the expected, as obtained by
extrapolating of the TSC in already synchronized VCPUs.

This is naturally achieved on all VCPUs at VM start and resume;
however on VCPU hotplug it is not: the newly added VCPU is created
with TSC == 0 while others are well ahead.

To compensate for that, consider host-initiated kvm_write_tsc() with
TSC == 0 a special case requiring synchronization regardless of the
current TSC on other VCPUs.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

bd8fab39

KVM: x86: remaster kvm_write_tsc code · c5e8ec8e

Denis Plotnikov authored Apr 07, 2017

Reuse existing code instead of using inline asm.
Make the code more concise and clear in the TSC
synchronization part.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c5e8ec8e

KVM: x86: use irqchip_kernel() to check for pic+ioapic · 900ab14c

David Hildenbrand authored Apr 07, 2017

Although the current check is not wrong, this check explicitly includes
the pic.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

900ab14c

KVM: x86: simplify pic_ioport_read() · b5e7cf52

David Hildenbrand authored Apr 07, 2017

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b5e7cf52

KVM: x86: set data directly in picdev_read() · 84a5c79e

David Hildenbrand authored Apr 07, 2017

Now it looks almost as picdev_write().
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

84a5c79e

KVM: x86: drop picdev_in_range() · 9fecaa9e

David Hildenbrand authored Apr 07, 2017

We already have the exact same checks a couple of lines below.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

9fecaa9e

KVM: x86: make kvm_pic_reset() static · dc24d1d2

David Hildenbrand authored Apr 07, 2017

Not used outside of i8259.c, so let's make it static.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

dc24d1d2

KVM: x86: simplify pic_unlock() · e21d1758

David Hildenbrand authored Apr 07, 2017

We can easily compact this code and get rid of one local variable.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

e21d1758

KVM: x86: cleanup return handling in setup_routing_entry() · 8c6b7828

David Hildenbrand authored Apr 07, 2017

Let's drop the goto and return the error value directly.
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

8c6b7828

KVM: x86: drop goto label in kvm_set_routing_entry() · 43ae312c

David Hildenbrand authored Apr 07, 2017

No need for the goto label + local variable "r".
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

43ae312c

KVM: x86: rename kvm_vcpu_request_scan_ioapic() · 993225ad

David Hildenbrand authored Apr 07, 2017

Let's rename it into a proper arch specific callback.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

993225ad

KVM: x86: directly call kvm_make_scan_ioapic_request() in ioapic.c · ca8ab3f8

David Hildenbrand authored Apr 07, 2017

We know there is an ioapic, so let's call it directly.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

ca8ab3f8

KVM: x86: remove all-vcpu request from kvm_ioapic_init() · d62f270b

David Hildenbrand authored Apr 07, 2017

kvm_ioapic_init() is guaranteed to be called without any created VCPUs,
so doing an all-vcpu request results in a NOP.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d62f270b

KVM: x86: KVM_IRQCHIP_PIC_MASTER only has 8 pins · 445ee82d

David Hildenbrand authored Apr 07, 2017

Currently, one could set pin 8-15, implicitly referring to
KVM_IRQCHIP_PIC_SLAVE.

Get rid of the two local variables max_pin and delta on the way.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

445ee82d

KVM: x86: push usage of slots_lock down · 49f520b9

David Hildenbrand authored Apr 07, 2017

Let's just move it to the place where it is actually needed.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

49f520b9

KVM: x86: don't take kvm->irq_lock when creating IRQCHIP · ba7454e1

David Hildenbrand authored Apr 07, 2017

I don't see any reason any more for this lock, seemed to be used to protect
removal of kvm->arch.vpic / kvm->arch.vioapic when already partially
inititalized, now access is properly protected using kvm->arch.irqchip_mode
and this shouldn't be necessary anymore.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

ba7454e1

KVM: x86: convert kvm_(set|get)_ioapic() into void · 33392b49

David Hildenbrand authored Apr 07, 2017

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

33392b49

KVM: x86: remove duplicate checks for ioapic · 4c0b06d8

David Hildenbrand authored Apr 07, 2017

When handling KVM_GET_IRQCHIP, we already check irqchip_kernel(), which
implies a fully inititalized ioapic.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

4c0b06d8

KVM: x86: use ioapic_in_kernel() to check for ioapic existence · 0bceb15a
David Hildenbrand authored Apr 07, 2017
```
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
```
0bceb15a

KVM: x86: get rid of ioapic_irqchip() · 0191e92d

David Hildenbrand authored Apr 07, 2017

Let's just use kvm->arch.vioapic directly.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0191e92d

KVM: x86: get rid of pic_irqchip() · 90bca052

David Hildenbrand authored Apr 07, 2017

It seemed like a nice idea to encapsulate access to kvm->arch.vpic. But
as the usage is already mixed, internal locks are taken outside of i8259.c
and grepping for "vpic" only is much easier, let's just get rid of
pic_irqchip().
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

90bca052

KVM: x86: check against irqchip_mode in ioapic_in_kernel() · f567080b

David Hildenbrand authored Apr 07, 2017

KVM_IRQCHIP_KERNEL implies a fully inititalized ioapic, while
kvm->arch.vioapic might temporarily be set but invalidated again if e.g.
setting of default routing fails when setting KVM_CREATE_IRQCHIP.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

f567080b

KVM: x86: check against irqchip_mode in pic_in_kernel() · 19d25a0e

David Hildenbrand authored Apr 07, 2017

Let's avoid checking against kvm->arch.vpic. We have kvm->arch.irqchip_mode
for that now.

KVM_IRQCHIP_KERNEL implies a fully inititalized pic, while kvm->arch.vpic
might temporarily be set but invalidated again if e.g. kvm_ioapic_init()
fails when setting KVM_CREATE_IRQCHIP. Although current users seem to be
fine, this avoids future bugs.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

19d25a0e

KVM: x86: check against irqchip_mode in kvm_set_routing_entry() · 8bf463f3

David Hildenbrand authored Apr 07, 2017

Let's replace the checks for pic_in_kernel() and ioapic_in_kernel() by
checks against irqchip_mode.

Also make sure that creation of any route is only possible if we have
an lapic in kernel (irqchip_in_kernel()) or if we are currently
inititalizing the irqchip.

This is necessary to switch pic_in_kernel() and ioapic_in_kernel() to
irqchip_mode, too.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

8bf463f3

KVM: x86: new irqchip mode KVM_IRQCHIP_INIT_IN_PROGRESS · 637e3f86

David Hildenbrand authored Apr 07, 2017

Let's add a new mode and set it while we create the irqchip via
KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP.

This mode will be used later to test if adding routes
(in kvm_set_routing_entry()) is already allowed.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

637e3f86

KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP · 1df6dded

David Hildenbrand authored Apr 07, 2017

Avoid races between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP by taking
the kvm->lock when setting up routes.

If KVM_CREATE_IRQCHIP fails, KVM_SET_GSI_ROUTING could have already set
up routes pointing at pic/ioapic, being silently removed already.

Also, as a side effect, this patch makes sure that KVM_SET_GSI_ROUTING
and KVM_CAP_SPLIT_IRQCHIP cannot run in parallel.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1df6dded

11 Apr, 2017 1 commit

Merge tag 'kvm-s390-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · f7b1a77d

Radim Krčmář authored Apr 11, 2017

From: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: features for 4.12

1. guarded storage support for guests
   This contains an s390 base Linux feature branch that is necessary
   to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
   which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation

f7b1a77d

07 Apr, 2017 8 commits

kvm: nVMX: Disallow userspace-injected exceptions in guest mode · 28d06353

Jim Mattson authored Apr 05, 2017

The userspace exception injection API and code path are entirely
unprepared for exceptions that might cause a VM-exit from L2 to L1, so
the best course of action may be to simply disallow this for now.

1. The API provides no mechanism for userspace to specify the new DR6
bits for a #DB exception or the new CR2 value for a #PF
exception. Presumably, userspace is expected to modify these registers
directly with KVM_SET_SREGS before the next KVM_RUN ioctl. However, in
the event that L1 intercepts the exception, these registers should not
be changed. Instead, the new values should be provided in the
exit_qualification field of vmcs12 (Intel SDM vol 3, section 27.1).

2. In the case of a userspace-injected #DB, inject_pending_event()
clears DR7.GD before calling vmx_queue_exception(). However, in the
event that L1 intercepts the exception, this is too early, because
DR7.GD should not be modified by a #DB that causes a VM-exit directly
(Intel SDM vol 3, section 27.1).

3. If the injected exception is a #PF, nested_vmx_check_exception()
doesn't properly check whether or not L1 is interested in the
associated error code (using the #PF error code mask and match fields
from vmcs12). It may either return 0 when it should call
nested_vmx_vmexit() or vice versa.

4. nested_vmx_check_exception() assumes that it is dealing with a
hardware-generated exception intercept from L2, with some of the
relevant details (the VM-exit interruption-information and the exit
qualification) live in vmcs02. For userspace-injected exceptions, this
is not the case.

5. prepare_vmcs12() assumes that when its exit_intr_info argument
specifies valid information with a valid error code that it can VMREAD
the VM-exit interruption error code from vmcs02. For
userspace-injected exceptions, this is not the case.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

28d06353

KVM: x86: fix user triggerable warning in kvm_apic_accept_events() · 28bf2888

David Hildenbrand authored Mar 23, 2017

If we already entered/are about to enter SMM, don't allow switching to
INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events()
will report a warning.

Same applies if we are already in MP state INIT_RECEIVED and SMM is
requested to be turned on. Refuse to set the VCPU events in this case.

Fixes: cd7764fe ("KVM: x86: latch INITs while in system management mode")
Cc: stable@vger.kernel.org # 4.2+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

28bf2888

kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public · 4b4357e0

Paolo Bonzini authored Mar 31, 2017

Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).

Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

4b4357e0

kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic · 30422558

Paolo Bonzini authored Mar 31, 2017

Remove code from architecture files that can be moved to virt/kvm, since there
is already common code for coalesced MMIO.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[Removed a pointless 'break' after 'return'.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

30422558

KVM: nVMX: support RDRAND and RDSEED exiting · a5f46457

Paolo Bonzini authored Mar 30, 2017

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a5f46457

KVM: VMX: add missing exit reasons · 1f519992

Paolo Bonzini authored Mar 30, 2017

In order to simplify adding exit reasons in the future,
the array of exit reason names is now also sorted by
exit reason code.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1f519992

kvm: nVMX: support EPT accessed/dirty bits · ae1e2d10

Paolo Bonzini authored Mar 30, 2017

Now use bit 6 of EPTP to optionally enable A/D bits for EPTP.  Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads.  When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set.  The MMU didn't know this detail,
so this patch adds it.

We also have to fix up the exit qualification.  It may be wrong because
KVM sets bit 6 but the guest might not.

L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware.  The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed typo in walk_addr_generic() comment and changed bit clear +
 conditional-set pattern in handle_ept_violation() to conditional-clear]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

ae1e2d10

kvm: x86: MMU support for EPT accessed/dirty bits · 86407bcb

Paolo Bonzini authored Mar 30, 2017

This prepares the MMU paging code for EPT accessed and dirty bits,
which can be enabled optionally at runtime.  Code that updates the
accessed and dirty bits will need a pointer to the struct kvm_mmu.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

86407bcb