Commits · d47510e295c0f82699192a61d715351cf00f65de · nexedi / linux

27 Jan, 2013 3 commits

kvm: Obey read-only mappings in iommu · d47510e2

Alex Williamson authored Jan 24, 2013

We've been ignoring read-only mappings and programming everything
into the iommu as read-write.  Fix this to only include the write
access flag when read-only is not set.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

d47510e2
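
The rule described in this commit message can be sketched in a few lines of C. This is only an illustration, not the kernel's code: the `SKETCH_IOMMU_*` constants and the `sketch_iommu_flags()` helper are hypothetical names chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
enum {
    SKETCH_IOMMU_READ  = 1 << 0,
    SKETCH_IOMMU_WRITE = 1 << 1,
};

/*
 * Build the protection flags for a memory slot: the mapping is always
 * readable, and writable only when the slot is not marked read-only.
 */
static int sketch_iommu_flags(bool slot_readonly)
{
    int flags = SKETCH_IOMMU_READ;

    if (!slot_readonly)
        flags |= SKETCH_IOMMU_WRITE;
    return flags;
}
```

Starting from read access and adding write access conditionally keeps a read-only slot from ever being programmed as read-write by default.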

kvm: Force IOMMU remapping on memory slot read-only flag changes · 261874b0

Alex Williamson authored Jan 24, 2013

Memory slot flags can be altered without changing other parameters of
the slot.  The read-only attribute is the only one the IOMMU cares
about, so generate an un-map, re-map when this occurs.  This also
avoids unnecessarily re-mapping the slot when no IOMMU-visible changes
are made.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

261874b0
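
One way to detect the case this commit describes is to XOR the old and new slot flags and mask out everything except the read-only bit. The sketch below assumes hypothetical flag values and a hypothetical helper name; it is not taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot flag bits for illustration only. */
#define SKETCH_MEM_LOG_DIRTY (1u << 0)
#define SKETCH_MEM_READONLY  (1u << 1)

/*
 * Return true only when the read-only attribute differs between the old
 * and new flags; changes to other bits are invisible to the IOMMU here.
 */
static bool sketch_needs_iommu_remap(uint32_t old_flags, uint32_t new_flags)
{
    return ((old_flags ^ new_flags) & SKETCH_MEM_READONLY) != 0;
}
```

With this check, toggling only the dirty-logging bit would not trigger an un-map and re-map, while toggling the read-only bit would.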

KVM: x86 emulator: fix test_cc() build failure on i386 · 3f0c3d0b

Avi Kivity authored Jan 26, 2013

'pushq' doesn't exist on i386.  Replace with 'push', which should work
since the operand is a register.
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

3f0c3d0b

24 Jan, 2013 17 commits

KVM: VMX: set vmx->emulation_required only when needed. · 14168786

Gleb Natapov authored Jan 21, 2013

If emulate_invalid_guest_state=false, vmx->emulation_required is never
actually used, but it ends up always set to true since
handle_invalid_guest_state(), the only place it is reset back to
false, is never called. Besides not being very clean, this makes the vmexit
and vmentry paths check emulate_invalid_guest_state needlessly.

The patch fixes that by keeping emulation_required coherent with
emulate_invalid_guest_state setting.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14168786

KVM: x86: fix use of uninitialized memory as segment descriptor in emulator. · 378a8b09

Gleb Natapov authored Jan 21, 2013

If VMX reports a segment as unusable, zero the descriptor passed by the
emulator before returning. Such a descriptor will be considered not present
by the emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

378a8b09

KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg. · 91b0aa2c

Gleb Natapov authored Jan 21, 2013

The function deals with the code segment too.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

91b0aa2c

KVM: VMX: don't clobber segment AR of unusable segments. · 25391454

Gleb Natapov authored Jan 21, 2013

Usability is returned in the unusable field, so there is no need to clobber
the entire AR. Callers already have to know how to deal with unusable
segments, since AR is not zeroed if emulate_invalid_guest_state=true.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

25391454

KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted guest is enabled · 218e763f

Gleb Natapov authored Jan 21, 2013

vmx->rmode.vm86_active is never true if unrestricted guest is enabled.
Make it more explicit that neither enter_pmode() nor enter_rmode() is
called in this case.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

218e763f

KVM: VMX: remove hack that disables emulation on vcpu reset/init · 286da415

Gleb Natapov authored Jan 21, 2013

There is no reason for it. If state is suitable for vmentry it
will be detected during guest entry and no emulation will happen.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

286da415

KVM: VMX: if unrestricted guest is enabled vcpu state is always valid. · c5e97c80

Gleb Natapov authored Jan 21, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c5e97c80

KVM: VMX: reset CPL only on CS register write. · 2f143240

Gleb Natapov authored Jan 21, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2f143240

KVM: VMX: remove special CPL cache access during transition to real mode. · 1f3141e8

Gleb Natapov authored Jan 21, 2013

Since vmx_get_cpl() always returns 0 when the VCPU is in real mode, it is no
longer needed. Also reset the CPL cache to zero during the transition to
protected mode, since the transition may happen while CS.selector & 3 != 0,
but in reality CPL is 0.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1f3141e8

KVM: x86 emulator: convert a few freestanding emulations to fastop · 158de57f

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

158de57f

KVM: x86 emulator: rearrange fastop definitions · 34b77652

Avi Kivity authored Jan 19, 2013

Make fastop opcodes usable in other emulations.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

34b77652

KVM: x86 emulator: convert 2-operand IMUL to fastop · 4d758349

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4d758349

KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop · 11c363ba

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

11c363ba

KVM: x86 emulator: convert INC/DEC to fastop · 95413dc4

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

95413dc4

KVM: x86 emulator: convert SETCC to fastop · 9ae9feba

Avi Kivity authored Jan 19, 2013

This is a bit of a special case since we don't have the usual
byte/word/long/quad switch; instead we switch on the condition code embedded
in the instruction.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9ae9feba
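
To illustrate what "switching on the condition code embedded in the instruction" means, here is a plain C sketch that evaluates an x86 condition code against an EFLAGS value. It uses an ordinary `switch` rather than the fastop mechanism the commit refers to, and `sketch_test_cc()` and the `SKETCH_FLAG_*` names are hypothetical. The flag bit positions and the encoding (bits 3:1 select the condition, bit 0 inverts it) follow the x86 architecture.

```c
#include <stdbool.h>

/* EFLAGS bit positions as defined by the x86 architecture. */
#define SKETCH_FLAG_CF (1ul << 0)
#define SKETCH_FLAG_PF (1ul << 2)
#define SKETCH_FLAG_ZF (1ul << 6)
#define SKETCH_FLAG_SF (1ul << 7)
#define SKETCH_FLAG_OF (1ul << 11)

/*
 * Evaluate the 4-bit condition code taken from the low nibble of a
 * SETcc/Jcc/CMOVcc opcode against the given flags.
 */
static bool sketch_test_cc(unsigned int cc, unsigned long flags)
{
    bool cf = (flags & SKETCH_FLAG_CF) != 0;
    bool pf = (flags & SKETCH_FLAG_PF) != 0;
    bool zf = (flags & SKETCH_FLAG_ZF) != 0;
    bool sf = (flags & SKETCH_FLAG_SF) != 0;
    bool of = (flags & SKETCH_FLAG_OF) != 0;
    bool result;

    switch ((cc & 0xf) >> 1) {
    case 0:  result = of;               break; /* O  */
    case 1:  result = cf;               break; /* B  */
    case 2:  result = zf;               break; /* Z  */
    case 3:  result = cf || zf;         break; /* BE */
    case 4:  result = sf;               break; /* S  */
    case 5:  result = pf;               break; /* P  */
    case 6:  result = sf != of;         break; /* L  */
    default: result = zf || (sf != of); break; /* LE */
    }

    /* An odd condition code is the negation of the even one below it. */
    return (cc & 1) ? !result : result;
}
```

For example, condition code 0x4 (as in SETZ) is true exactly when ZF is set, and 0x5 (SETNZ) is its negation.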

KVM: x86 emulator: convert shift/rotate instructions to fastop · 007a3b54

Avi Kivity authored Jan 19, 2013

SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

007a3b54

KVM: x86 emulator: Convert SHLD, SHRD to fastop · 0bdea068

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0bdea068

22 Jan, 2013 3 commits

KVM: x86: improve reexecute_instruction · 93c05d3e

Xiao Guangrong authored Jan 13, 2013

The current reexecute_instruction cannot reliably detect failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, some cases are nested-write-protect: the page we want to
write is used as a PDE but it chains to itself. In this case, we should
stop the emulation and report the case to userspace.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

93c05d3e

KVM: x86: let reexecute_instruction work for tdp · 95b3cf69

Xiao Guangrong authored Jan 13, 2013

Currently, reexecute_instruction refuses to retry any instruction if
tdp is enabled. If nested npt is used, the emulation may be caused by a
shadow page, which can be fixed by dropping the shadow page. The only
condition under which tdp cannot retry the instruction is an access fault
on an error pfn.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

95b3cf69

KVM: x86: clean up reexecute_instruction · 22368028

Xiao Guangrong authored Jan 13, 2013

Small cleanup for reexecute_instruction; also use gpa_to_gfn in
retry_instruction.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

22368028
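
For readers unfamiliar with the helper named above: converting a guest physical address to a guest frame number amounts to dropping the in-page offset bits. The sketch below assumes 4 KiB pages and uses hypothetical `sketch_` names so it is not mistaken for the kernel's own definitions.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, i.e. a 12-bit page offset. */
#define SKETCH_PAGE_SHIFT 12

typedef uint64_t sketch_gpa_t; /* guest physical address */
typedef uint64_t sketch_gfn_t; /* guest frame number */

/* Drop the page-offset bits to get the frame number. */
static inline sketch_gfn_t sketch_gpa_to_gfn(sketch_gpa_t gpa)
{
    return gpa >> SKETCH_PAGE_SHIFT;
}
```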

17 Jan, 2013 4 commits

KVM: set_memory_region: Remove unnecessary variable memslot · a843fac2

Takuya Yoshikawa authored Jan 11, 2013

One such variable, slot, is enough for holding a pointer temporarily.
We also remove another local variable named slot, which is limited in
a block, since it is confusing to have the same name in this function.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

a843fac2

KVM: set_memory_region: Don't check for overlaps unless we create or move a slot · 0a706bee

Takuya Yoshikawa authored Jan 11, 2013

There is no need for the check when deleting an existing slot or just
modifying the flags.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0a706bee

KVM: set_memory_region: Don't jump to out_free unnecessarily · 0ea75e1d

Takuya Yoshikawa authored Jan 11, 2013

This makes the separation between the sanity checks and the rest of the
code a bit clearer.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0ea75e1d

KVM: s390: kvm/sigp.c: fix memory leakage · a046b816

Cong Ding authored Jan 15, 2013

The variable inti should be freed in the CPUSTAT_STOPPED branch.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

a046b816

14 Jan, 2013 8 commits

KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time · 6b81b05e

Takuya Yoshikawa authored Jan 08, 2013

If the userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time such as tens of milliseconds. This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock. But since holding mmu_lock for a long time affects not only
the guest (in other words, the vCPU threads) but also the host as a
whole, we should pay for that.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: on my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP. We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

6b81b05e
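
The trade-off described in this commit (break the lock when the scheduler asks, at the price of an extra TLB flush each time) can be modelled in user space. The following is a simulation only: no real lock or TLB is involved, the counters stand in for those operations, and every name (`sketch_stats`, `sketch_write_protect_slot()`, `sketch_every_1000()`) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counters standing in for the real side effects. */
struct sketch_stats {
    size_t pages_protected;
    size_t tlb_flushes;
    size_t lock_breaks;
};

/* Hypothetical stand-in for asking the scheduler whether to yield. */
typedef bool (*sketch_need_resched_fn)(size_t page_index);

static void sketch_write_protect_slot(size_t npages,
                                      sketch_need_resched_fn need_resched,
                                      struct sketch_stats *st)
{
    /* The lock would be taken here. */
    for (size_t i = 0; i < npages; i++) {
        st->pages_protected++;      /* write-protect one page */

        if (need_resched(i)) {
            st->tlb_flushes++;      /* flush before dropping the lock */
            st->lock_breaks++;      /* drop lock, yield, re-take lock */
        }
    }
    st->tlb_flushes++;              /* final flush for the remaining pages */
    /* The lock would be released here. */
}

/* Example policy for the simulation: yield after every 1000th page. */
static bool sketch_every_1000(size_t page_index)
{
    return page_index % 1000 == 999;
}
```

In this model the number of extra flushes equals the number of lock breaks, which is why the cost stays low when rescheduling is rare.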

KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself · 9d1beefb

Takuya Yoshikawa authored Jan 08, 2013

Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

9d1beefb

KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself · b34cb590

Takuya Yoshikawa authored Jan 08, 2013

No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock as can be seen from the fact that it is read locklessly.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

b34cb590

KVM: Remove unused slot_bitmap from kvm_mmu_page · e12091ce

Takuya Yoshikawa authored Jan 08, 2013

Not needed any more.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

e12091ce

KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based · b99db1d3

Takuya Yoshikawa authored Jan 08, 2013

This makes it possible to release mmu_lock and reschedule conditionally
in a later patch.  Although this may increase the time needed to protect
the whole slot when we start dirty logging, the kernel should not allow
the userspace to trigger something that will hold a spinlock for such a
long time as tens of milliseconds: actually there is no limit since it
is roughly proportional to the number of guest pages.

Another point to note is that this patch removes the only user of
slot_bitmap which will cause some problems when we increase the number
of slots further.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

b99db1d3

KVM: MMU: Remove unused parameter level from __rmap_write_protect() · 245c3912

Takuya Yoshikawa authored Jan 08, 2013

No longer need to care about the mapping level in this function.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

245c3912

KVM: Write protect the updated slot only when dirty logging is enabled · c972f3b1

Takuya Yoshikawa authored Jan 08, 2013

Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
nothing but search for non-existent mmu pages which have mappings to
that deleted memory; this is safe but a waste of time.

Since we want to make the function rmap based in a later patch, in a
manner which makes it unsafe to be called for a deleted slot, we make
the caller check whether the slot is non-zero and being dirty logged.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c972f3b1

Merge branch 'kvm-ppc-next' of https://github.com/agraf/linux-2.6 into queue · aa11e3a8

Gleb Natapov authored Jan 14, 2013

aa11e3a8

10 Jan, 2013 5 commits

KVM: trace: Fix exit decoding. · f79ed82d

Cornelia Huck authored Jan 08, 2013

trace_kvm_userspace_exit has been missing the KVM_EXIT_WATCHDOG exit.

CC: Bharat Bhushan <r65777@freescale.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f79ed82d

KVM: MMU: fix infinite fault access retry · 7751babd

Xiao Guangrong authored Jan 08, 2013

We have two issues in the current code:
- if the target gfn is used as its page table, the guest will refault and then
  kvm will use a small page size to map it. We need two #PFs to fix its shadow
  page table

- sometimes, say when an exception is triggered during a vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before going into the page fault path, which causes an
  infinite loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust the page size early if the target gfn is used as a
page table
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7751babd

KVM: MMU: fix Dirty bit missed if CR0.WP = 0 · c2288505

Xiao Guangrong authored Jan 08, 2013

If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting the pte access: it sets the W bit on the
pte and clears the U bit. This is the point at which kvm can change the pte
access from read-only to writable.

Unfortunately, the pte access is the access of the 'direct' shadow page table,
meaning direct sp.role.access = pte_access, so we will create a writable
spte entry on the read-only shadow page table. As a result, the Dirty bit is
not tracked when two guest ptes point to the same large page. Note that it
has no impact other than on the Dirty bit, since cr0.wp is encoded into
sp.role.

This can be fixed by adjusting the pte access before establishing the shadow
page table. After that, no mmu-specific code remains in the common function,
so two parameters of set_spte are dropped.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c2288505
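
The access adjustment described in the first paragraph of this commit message can be written as a small pure function. This is a sketch under stated assumptions: the `SKETCH_ACC_*` bit values and the `sketch_adjust_pte_access()` helper are hypothetical, and the function only models the rule "supervisor write fault with CR0.WP clear: set W, clear U".

```c
#include <stdbool.h>

/* Hypothetical access bits for illustration only. */
#define SKETCH_ACC_EXEC  (1u << 0)
#define SKETCH_ACC_WRITE (1u << 1)
#define SKETCH_ACC_USER  (1u << 2)

/*
 * For a supervisor write fault with CR0.WP clear on a non-writable pte,
 * grant write access and revoke user access; otherwise leave it unchanged.
 */
static unsigned int sketch_adjust_pte_access(unsigned int pte_access,
                                             bool write_fault,
                                             bool user_fault,
                                             bool cr0_wp)
{
    if (write_fault && !user_fault && !cr0_wp &&
        !(pte_access & SKETCH_ACC_WRITE)) {
        pte_access |= SKETCH_ACC_WRITE;
        pte_access &= ~SKETCH_ACC_USER;
    }
    return pte_access;
}
```

Computing the adjusted access up front, before any shadow page is looked up or created, is the ordering the commit message argues for.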

KVM: PPC: BookE: Add EPR ONE_REG sync · 324b3e63

Alexander Graf authored Jan 04, 2013

We need to be able to read and write the contents of the EPR register
from user space.

This patch implements that logic through the ONE_REG API and declares
its (never implemented) SREGS counterpart as deprecated.
Signed-off-by: Alexander Graf <agraf@suse.de>

324b3e63

KVM: PPC: BookE: Implement EPR exit · 1c810636

Alexander Graf authored Jan 04, 2013

The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.
Signed-off-by: Alexander Graf <agraf@suse.de>

1c810636