Commits · 8a7ae055f3533b520401c170ac55e30628b34df5 · nexedi / linux

30 Jan, 2008 40 commits

KVM: MMU: Partial swapping of guest memory · 8a7ae055

Izik Eidus authored Oct 18, 2007

This allows guest memory to be swapped.  Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.

The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
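
The get/release pairing described above can be sketched in user space as follows. This is a toy model, not the kernel code: `struct toy_page`, `toy_gfn_to_page()`, `toy_release_page()` and `toy_page_is_swappable()` are hypothetical names, and only the reference count is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
    int refcount;
};

#define TOY_NPAGES 4
static struct toy_page toy_memory[TOY_NPAGES];

/* Hypothetical lookup: returns the page backing gfn and takes a reference. */
static struct toy_page *toy_gfn_to_page(unsigned long gfn)
{
    struct toy_page *page;

    if (gfn >= TOY_NPAGES)
        return NULL;
    page = &toy_memory[gfn];
    page->refcount++;
    return page;
}

/* Drops the reference taken by toy_gfn_to_page(). */
static void toy_release_page(struct toy_page *page)
{
    assert(page->refcount > 0);
    page->refcount--;
}

/* In this model, a page with no outstanding references may be swapped. */
static int toy_page_is_swappable(const struct toy_page *page)
{
    return page->refcount == 0;
}
```

Every successful lookup must be matched by exactly one release; a caller that forgets the release keeps the page pinned in this model.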

8a7ae055

KVM: MMU: Make gfn_to_page() always safe · cea7bb21

Izik Eidus authored Oct 17, 2007

In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.

This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
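
A minimal sketch of the "always safe" lookup idea, assuming a tiny fixed guest memory map. The names `toy_gfn_to_page_safe()`, `toy_is_error_page()` and `toy_bad_page` are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES    2
#define TOY_PAGE_SIZE 16

static unsigned char toy_guest_pages[TOY_NPAGES][TOY_PAGE_SIZE];
/* Dummy page handed out for gfns outside the memory map. */
static unsigned char toy_bad_page[TOY_PAGE_SIZE];

/* Never returns NULL: unmapped gfns resolve to the dummy page. */
static unsigned char *toy_gfn_to_page_safe(unsigned long gfn)
{
    unsigned char *page;

    page = gfn < TOY_NPAGES ? toy_guest_pages[gfn] : toy_bad_page;
    assert(page != NULL);
    return page;
}

/* Callers that care can still detect the dummy page explicitly. */
static int toy_is_error_page(const unsigned char *page)
{
    return page == toy_bad_page;
}
```

Callers can write through the returned pointer without a NULL check, which is the simplification the commit message refers to.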

cea7bb21

KVM: MMU: Keep a reverse mapping of non-writable translations · 9647c14c

Izik Eidus authored Oct 16, 2007

The current kvm mmu only reverse maps writable translations.  This is used
to write-protect a page in case it becomes a pagetable.

But with swapping support, we need a reverse mapping of read-only pages as
well:  when we evict a page, we need to remove any mapping to it, whether
writable or not.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

9647c14c

KVM: MMU: Add rmap_next(), a helper for walking kvm rmaps · 98348e95

Izik Eidus authored Oct 16, 2007

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

98348e95

KVM: x86 emulator: cmc, clc, cli, sti · b284be57

Nitin A Kamble authored Oct 16, 2007

Instruction: cmc, clc, cli, sti
opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively.

[avi: fix reference to EFLG_IF which is not defined anywhere]
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
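
The effect of the four opcodes on EFLAGS can be illustrated with the sketch below. It is not the emulator's code: `toy_emulate_flag_op()` and the `TOY_EFLAGS_*` macros are hypothetical names, and the privilege checks that real cli/sti perform are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EFLAGS_CF (1u << 0) /* carry flag, EFLAGS bit 0 */
#define TOY_EFLAGS_IF (1u << 9) /* interrupt enable flag, EFLAGS bit 9 */

/* Returns the new EFLAGS value; other opcodes leave it unchanged. */
static uint32_t toy_emulate_flag_op(uint8_t opcode, uint32_t eflags)
{
    /* Bit 1 of EFLAGS is always set on real hardware; not enforced here. */
    assert((TOY_EFLAGS_CF & TOY_EFLAGS_IF) == 0);

    switch (opcode) {
    case 0xf5: /* cmc: complement carry */
        return eflags ^ TOY_EFLAGS_CF;
    case 0xf8: /* clc: clear carry */
        return eflags & ~TOY_EFLAGS_CF;
    case 0xfa: /* cli: clear interrupt flag */
        return eflags & ~TOY_EFLAGS_IF;
    case 0xfb: /* sti: set interrupt flag */
        return eflags | TOY_EFLAGS_IF;
    default:
        return eflags;
    }
}
```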

b284be57

KVM: MMU: Simplify page table walker · 42bf3f0a

Avi Kivity authored Oct 17, 2007

Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.

As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>

42bf3f0a

KVM: x86 emulator: Implement emulation of instruction: inc & dec · d77a2507

Nitin A Kamble authored Oct 12, 2007

Instructions:
	inc r16/r32 (opcode 0x40-0x47)
	dec r16/r32 (opcode 0x48-0x4f)
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
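
As an illustration of the encoding, the one-byte inc/dec opcodes carry the register number in their low three bits and select dec with bit 3. The decoder below is a sketch with hypothetical names (`struct toy_incdec`, `toy_decode_incdec()`); it ignores the fact that in 64-bit mode the same byte range is used for REX prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_incdec {
    int is_dec; /* 0 for inc, 1 for dec */
    int reg;    /* register number 0..7 (ax, cx, dx, bx, sp, bp, si, di) */
};

/* Returns 1 and fills *out if opcode is in 0x40..0x4f, else returns 0. */
static int toy_decode_incdec(uint8_t opcode, struct toy_incdec *out)
{
    assert(out != NULL);

    if (opcode < 0x40 || opcode > 0x4f)
        return 0;
    out->is_dec = (opcode & 0x08) != 0; /* 0x48..0x4f are dec */
    out->reg = opcode & 0x07;           /* low three bits pick the register */
    return 1;
}
```

For example, 0x41 decodes as inc on register 1 (cx/ecx) and 0x4b as dec on register 3 (bx/ebx).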

d77a2507

KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSH · 3176bc3e

Avi Kivity authored Oct 16, 2007

We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests.
Signed-off-by: Avi Kivity <avi@qumranet.com>
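
A request bitmap of this kind can be sketched as below. The bit number chosen for `TOY_REQ_TLB_FLUSH` and the helper names are assumptions for illustration, and the sketch uses plain (non-atomic) operations where kernel code would normally use atomic bit operations.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit number; the real value is not stated in the commit. */
#define TOY_REQ_TLB_FLUSH 0

struct toy_vcpu {
    unsigned long requests; /* one bit per pending request */
};

/* Marks a request as pending. */
static void toy_make_request(struct toy_vcpu *vcpu, unsigned int req)
{
    assert(req < sizeof(vcpu->requests) * CHAR_BIT);
    vcpu->requests |= 1UL << req;
}

/* Returns 1 if the request was pending, and clears it either way. */
static int toy_test_and_clear_request(struct toy_vcpu *vcpu, unsigned int req)
{
    unsigned long mask;
    int pending;

    assert(req < sizeof(vcpu->requests) * CHAR_BIT);
    mask = 1UL << req;
    pending = (vcpu->requests & mask) != 0;
    vcpu->requests &= ~mask;
    return pending;
}
```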

3176bc3e

KVM: Move apic timer interrupt backlog processing to common code · ab6ef34b

Avi Kivity authored Oct 16, 2007

Besides the obvious goodness of making code more common, this prevents
a livelock with the next patch which moves interrupt injection out of the
critical section.
Signed-off-by: Avi Kivity <avi@qumranet.com>

ab6ef34b

KVM: Add some \n in ioapic_debug() · e25e3ed5

Laurent Vivier authored Oct 12, 2007

Add new-line at end of debug strings.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e25e3ed5

KVM: apic round robin cleanup · e4d47f40

Qing He authored Sep 24, 2007

If no apic is enabled in the bitmap of an interrupt delivery with delivery
mode of lowest priority, a warning should be reported rather than selecting
a fallback vcpu.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e4d47f40

KVM: Portability: split kvm_vcpu_ioctl · 313a3dc7

Carsten Otte authored Oct 11, 2007

This patch splits kvm_vcpu_ioctl into architecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.

Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.

x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS

An interesting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

313a3dc7

KVM: MMU: When updating the dirty bit, inform the mmu about it · c4fcc272

Avi Kivity authored Oct 11, 2007

Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>

c4fcc272

KVM: MMU: Move dirty bit updates to a separate function · 5df34a86

Avi Kivity authored Oct 11, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

5df34a86

KVM: MMU: Instantiate real-mode shadows as user writable shadows · 6bfccdc9

Avi Kivity authored Oct 11, 2007

This is consistent with real-mode permissions.
Signed-off-by: Avi Kivity <avi@qumranet.com>

6bfccdc9

KVM: MMU: Disable write access on clean large pages · cc70e737

Avi Kivity authored Oct 11, 2007

By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page.  This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>

cc70e737

KVM: MMU: Fix nx access bit for huge pages · c22e3514

Avi Kivity authored Oct 11, 2007

We must set the bit before the shift, otherwise the wrong bit gets set.
Signed-off-by: Avi Kivity <avi@qumranet.com>

c22e3514

KVM: Move guest pte dirty bit management to the guest pagetable walker · e3c5e7ec

Avi Kivity authored Oct 11, 2007

This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>

e3c5e7ec

KVM: MMU: More struct kvm_vcpu -> struct kvm cleanups · 4a4c9924

Anthony Liguori authored Oct 10, 2007

This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does
not depend on the VCPU state, unlike GVA to GPA, so there's no need to pass in
the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

4a4c9924

KVM: MMU: Clean up MMU functions to take struct kvm when appropriate · f67a46f4

Anthony Liguori authored Oct 10, 2007

Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs.  This patch cleans up some of them to instead take a struct kvm.  This
makes things a bit more clear.

The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

f67a46f4

KVM: Move x86 msr handling to new files x86.[ch] · 043405e1

Carsten Otte authored Oct 10, 2007

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

043405e1

KVM: Support assigning userspace memory to the guest · 6fc138d2

Izik Eidus authored Oct 09, 2007

Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.

This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

6fc138d2

KVM: CodingStyle cleanup · d77c26fc

Mike Day authored Oct 08, 2007

Signed-off-by: Mike D. Day <ncmike@ncultra.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d77c26fc

KVM: Remove gratuitous casts from lapic.c · 7e620d16

Rusty Russell authored Oct 08, 2007

Since vcpu->apic is of the correct type, there's no need to cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

7e620d16

KVM: Hoist kvm_create_lapic() into kvm_vcpu_init() · 76fafa5e

Rusty Russell authored Oct 08, 2007

Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm
and vmx do it.  And make it return the error rather than a fairly
random -ENOMEM.

This also solves the problem that neither svm.c nor vmx.c actually
handles the error path properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

76fafa5e

KVM: Add kvm_free_lapic() to pair with kvm_create_lapic() · d589444e

Rusty Russell authored Oct 08, 2007

Instead of the asymmetry of kvm_free_apic, implement kvm_free_lapic().
And guess what?  I found a minor bug: we don't need to hrtimer_cancel()
from kvm_main.c, because we do that in kvm_free_apic().

Also:
1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init.
2) Don't set apic->regs_page to zero before freeing apic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d589444e

KVM: Allow dynamic allocation of the mmu shadow cache size · 82ce2c96

Izik Eidus authored Oct 02, 2007

The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

82ce2c96

KVM: Add general accessors to read and write guest memory · 195aefde

Izik Eidus authored Oct 01, 2007

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

195aefde

KVM: Remove the usage of page->private field by rmap · 290fc38d

Izik Eidus authored Sep 27, 2007

When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->private is reserved for
the filesystem.  So we move the rmap base pointers to the memory slot.

A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.
Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

290fc38d

KVM: VMX: Simplify vcpu_clear() · f566e09f

Avi Kivity authored Sep 30, 2007

Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.
Signed-off-by: Avi Kivity <avi@qumranet.com>

f566e09f

KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processor · eae5ecb5

Avi Kivity authored Sep 30, 2007

Noted by Eddie Dong.
Signed-off-by: Avi Kivity <avi@qumranet.com>

eae5ecb5

KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effect · b4c6abfe

Laurent Vivier authored Sep 25, 2007

This patch modifies the management of the REX prefix according to the
behavior I saw in Xen 3.1. In Xen, this modification was introduced by
Jan Beulich.

http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
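
The rule in the commit title can be sketched as a prefix scan over an instruction in 64-bit mode: a REX byte (0x40 to 0x4f) is remembered, and any legacy prefix seen afterwards discards it. The function names below are hypothetical and the code is a simplified illustration, not the emulator's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 for the legacy (non-REX) x86 instruction prefixes. */
static int toy_is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e: /* es, cs, ss, ds override */
    case 0x64: case 0x65:                       /* fs, gs override */
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:            /* lock, repne, rep */
        return 1;
    default:
        return 0;
    }
}

/*
 * Scans leading prefix bytes. Returns the REX byte that is still in
 * effect at the opcode (0 if none) and stores the offset of the first
 * non-prefix byte in *opcode_off.
 */
static uint8_t toy_scan_prefixes(const uint8_t *insn, size_t len,
                                 size_t *opcode_off)
{
    uint8_t rex = 0;
    size_t i;

    assert(insn != NULL && opcode_off != NULL);

    for (i = 0; i < len; i++) {
        uint8_t b = insn[i];

        if (toy_is_legacy_prefix(b))
            rex = 0;            /* legacy prefix nullifies an earlier REX */
        else if ((b & 0xf0) == 0x40)
            rex = b;            /* remember the most recent REX byte */
        else
            break;              /* first opcode byte */
    }
    *opcode_off = i;
    return rex;
}
```

With this model, `48 89 c3` keeps REX 0x48, `48 66 89 c3` ends up with no REX in effect, and `66 48 89 c3` keeps REX 0x48 because the legacy prefix comes first.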

b4c6abfe

KVM: Purify x86_decode_insn() error case management · a22436b7

Laurent Vivier authored Sep 24, 2007

The only valid case is on protected page access; other cases are errors.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

a22436b7

KVM: x86_emulator: no writeback for bt · e4f8e039

Qing He authored Sep 24, 2007

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e4f8e039

KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE instead · a01af5ec

Laurent Vivier authored Sep 24, 2007

Remove no_wb, use dst.type = OP_NONE instead, idea stolen from xen-3.1
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

a01af5ec

KVM: x86 emulator: remove _eflags and use directly ctxt->eflags. · 05f086f8

Laurent Vivier authored Sep 24, 2007

Remove _eflags and use ctxt->eflags directly. Caching eflags is not needed, as
it is restored to the vcpu by kvm_main.c:emulate_instruction() from
ctxt->eflags only if emulation doesn't fail.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

05f086f8

KVM: x86 emulator: split some decoding into functions for readability · 8cdbd2c9

Laurent Vivier authored Sep 24, 2007

To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation
parts into functions.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

8cdbd2c9

KVM: MMU: Ignore reserved bits in cr3 in non-pae mode · 21764863

Ryan Harper authored Sep 18, 2007

This patch removes the fault injected when the guest attempts to set reserved
bits in cr3.  X86 hardware doesn't generate a fault when setting reserved bits.
The result of this patch is that vmware-server, running within a kvm guest,
boots and runs memtest from an iso.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

21764863

KVM: MMU: Make flooding detection work when guest page faults are bypassed · 12b7d28f

Avi Kivity authored Sep 23, 2007

When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging. So we provide
an alternate mechanism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
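
The accessed-bit check can be modelled as follows. This is a sketch with hypothetical helper names; `toy_hw_access()` stands in for the hardware setting the accessed bit (bit 5 of an x86 pte) when the guest touches the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_PRESENT  (1ull << 0)
#define TOY_PTE_ACCESSED (1ull << 5) /* x86 pte accessed bit */

/* Installs a shadow pte with the accessed bit cleared. */
static uint64_t toy_set_spte(uint64_t pte)
{
    assert(pte & TOY_PTE_PRESENT);
    return pte & ~TOY_PTE_ACCESSED;
}

/* Models the hardware setting the accessed bit on first use. */
static uint64_t toy_hw_access(uint64_t spte)
{
    return spte | TOY_PTE_ACCESSED;
}

/* Later check: did the guest actually use this translation? */
static int toy_spte_was_used(uint64_t spte)
{
    return (spte & TOY_PTE_ACCESSED) != 0;
}
```

A pte that was installed but never used is the signal this commit relies on in place of fault tracking.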

12b7d28f

KVM: Allow not-present guest page faults to bypass kvm · c7addb90

Avi Kivity authored Sep 16, 2007

There are two classes of page faults trapped by kvm:
 - host page faults, where the fault is needed to allow kvm to install
   the shadow pte or update the guest accessed and dirty bits
 - guest page faults, where the guest has faulted and kvm simply injects
   the fault back into the guest to handle

The second class, guest page faults, is pure overhead.  We can eliminate
some of it on vmx using the following evil trick:
 - when we set up a shadow page table entry, if the corresponding guest pte
   is not present, set up the shadow pte as not present
 - if the guest pte _is_ present, mark the shadow pte as present but also
   set one of the reserved bits in the shadow pte
 - tell the vmx hardware not to trap faults which have the present bit clear

With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.

Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code.  It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
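
The trick above can be modelled in a few lines. The sketch assumes that a fault on a not-present pte reports the P bit of the page fault error code as 0, while a fault caused by a reserved bit in a present pte reports P as 1. The reserved bit chosen here (bit 51) and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_PRESENT   (1ull << 0)
#define TOY_PTE_RESERVED  (1ull << 51) /* hypothetical reserved bit */
#define TOY_PFERR_PRESENT (1u << 0)    /* P bit of the fault error code */

/* Shadow pte used before kvm has installed a real translation. */
static uint64_t toy_initial_shadow_pte(uint64_t guest_pte)
{
    if (!(guest_pte & TOY_PTE_PRESENT))
        return 0; /* guest pte not present: shadow pte not present */
    /* guest pte present: present plus reserved bit, so any access faults */
    return TOY_PTE_PRESENT | TOY_PTE_RESERVED;
}

/* P bit reported for a faulting access through spte. */
static uint32_t toy_fault_error_code(uint64_t spte)
{
    /* The reserved marker is only ever used on present shadow ptes. */
    assert(!(spte & TOY_PTE_RESERVED) || (spte & TOY_PTE_PRESENT));
    return (spte & TOY_PTE_PRESENT) ? TOY_PFERR_PRESENT : 0;
}

/* With faults that have P clear left untrapped, only P=1 reaches kvm. */
static int toy_fault_traps_to_kvm(uint32_t error_code)
{
    return (error_code & TOY_PFERR_PRESENT) != 0;
}
```

In this model a fault on a not-present guest pte is delivered straight to the guest, while a fault on a present guest pte still exits to kvm so that the real shadow pte can be installed.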

c7addb90