Commits · 95413dc41398fec2518abf4e0449503b1306dcbc · Kirill Smelkov / linux

24 Jan, 2013 4 commits

KVM: x86 emulator: convert INC/DEC to fastop · 95413dc4

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: convert SETCC to fastop · 9ae9feba

Avi Kivity authored Jan 19, 2013

This is a bit of a special case since we don't have the usual
byte/word/long/quad switch; instead we switch on the condition code embedded
in the instruction.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
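To illustrate what switching on the condition code means, here is a minimal C sketch. It is not the kernel's fastop code. It assumes the standard x86 encoding, in which the low four bits of the opcode select the condition and the lowest bit negates it. The names `test_cc`, `setcc_result`, and the `FLAG_*` macros are illustrative and do not come from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bits consulted by the x86 condition codes. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Evaluate condition code cc (0..15) against flags (illustrative helper). */
static bool test_cc(unsigned int cc, uint32_t flags)
{
    bool cf = (flags & FLAG_CF) != 0;
    bool pf = (flags & FLAG_PF) != 0;
    bool zf = (flags & FLAG_ZF) != 0;
    bool sf = (flags & FLAG_SF) != 0;
    bool of = (flags & FLAG_OF) != 0;
    bool r = false;

    /* The upper three bits pick the base condition. */
    switch ((cc & 0xf) >> 1) {
    case 0: r = of; break;                 /* O  / NO */
    case 1: r = cf; break;                 /* B  / AE */
    case 2: r = zf; break;                 /* E  / NE */
    case 3: r = cf || zf; break;           /* BE / A  */
    case 4: r = sf; break;                 /* S  / NS */
    case 5: r = pf; break;                 /* P  / NP */
    case 6: r = sf != of; break;           /* L  / GE */
    case 7: r = zf || (sf != of); break;   /* LE / G  */
    }

    /* The lowest bit negates the base condition. */
    return (cc & 1) ? !r : r;
}

/* SETcc stores 1 or 0 in a byte operand; there is no operand-size switch. */
static uint8_t setcc_result(uint8_t opcode, uint32_t flags)
{
    return test_cc(opcode & 0xf, flags) ? 1 : 0;
}
```

For example, `setcc_result(0x94, FLAG_ZF)` models SETE (second opcode byte 0x94) with ZF set and yields 1.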

KVM: x86 emulator: convert shift/rotate instructions to fastop · 007a3b54

Avi Kivity authored Jan 19, 2013

SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: Convert SHLD, SHRD to fastop · 0bdea068

Avi Kivity authored Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

22 Jan, 2013 3 commits

KVM: x86: improve reexecute_instruction · 93c05d3e

Xiao Guangrong authored Jan 13, 2013

The current reexecute_instruction cannot reliably detect failed instruction
emulation. It lets the guest retry every instruction unless the access hits
an error pfn.

One example is nested write protection: the page we want to write is used
as a PDE, but it chains to itself. In this case, we should stop the
emulation and report the case to userspace.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: let reexecute_instruction work for tdp · 95b3cf69

Xiao Guangrong authored Jan 13, 2013

Currently, reexecute_instruction refuses to retry any instruction if
tdp is enabled. If nested npt is used, the emulation may be caused by a
shadow page, which can be fixed by dropping the shadow page. The only
case in which tdp cannot retry the instruction is an access fault on an
error pfn.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: clean up reexecute_instruction · 22368028

Xiao Guangrong authored Jan 13, 2013

Minor cleanup of reexecute_instruction; also use gpa_to_gfn in
retry_instruction.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17 Jan, 2013 4 commits

KVM: set_memory_region: Remove unnecessary variable memslot · a843fac2

Takuya Yoshikawa authored Jan 11, 2013

One such variable, slot, is enough for holding a pointer temporarily.
We also remove another local variable named slot, which is limited in
a block, since it is confusing to have the same name in this function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: set_memory_region: Don't check for overlaps unless we create or move a slot · 0a706bee

Takuya Yoshikawa authored Jan 11, 2013

The check is not needed when deleting an existing slot or just modifying
the flags.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
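The idea can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the code in kvm_main.c: `enum change`, `struct slot`, `ranges_overlap`, and `change_conflicts` are hypothetical names, and a slot is reduced to an id plus a guest frame range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a memory region update. */
enum change { CHANGE_CREATE, CHANGE_MOVE, CHANGE_DELETE, CHANGE_FLAGS_ONLY };

/* Hypothetical, reduced view of a memory slot. */
struct slot {
    int id;
    uint64_t base_gfn;
    uint64_t npages;   /* 0 means the slot is unused */
};

/* Half-open ranges [base, base + npages) overlap if each starts before the other ends. */
static bool ranges_overlap(const struct slot *a, const struct slot *b)
{
    return a->base_gfn < b->base_gfn + b->npages &&
           b->base_gfn < a->base_gfn + a->npages;
}

/* Only creating or moving a slot can introduce a new overlap. */
static bool change_conflicts(enum change change, const struct slot *new_slot,
                             const struct slot *existing, size_t n)
{
    if (change != CHANGE_CREATE && change != CHANGE_MOVE)
        return false;   /* delete and flags-only updates skip the scan */

    for (size_t i = 0; i < n; i++) {
        if (existing[i].npages == 0 || existing[i].id == new_slot->id)
            continue;   /* ignore unused slots and the slot being updated */
        if (ranges_overlap(new_slot, &existing[i]))
            return true;
    }
    return false;
}
```

Skipping the scan is safe for the other two cases because neither one changes the guest frame range that a slot covers.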

KVM: set_memory_region: Don't jump to out_free unnecessarily · 0ea75e1d

Takuya Yoshikawa authored Jan 11, 2013

This makes the separation between the sanity checks and the rest of the
code a bit clearer.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: s390: kvm/sigp.c: fix memory leakage · a046b816

Cong Ding authored Jan 15, 2013

The variable inti should be freed in the CPUSTAT_STOPPED branch.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

14 Jan, 2013 8 commits

KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time · 6b81b05e

Takuya Yoshikawa authored Jan 08, 2013

If the userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time such as tens of milliseconds. This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock. But since holding mmu_lock for a long time affects not only
the guest (in other words, the vCPU threads) but also the host as a
whole, the cost is worth paying.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: in my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP. We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
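The pattern described above can be modeled in userspace C. This is a sketch under stated assumptions, not the kernel implementation: `struct fake_lock` stands in for mmu_lock, the `should_yield` callback stands in for the scheduler's answer, and a counter stands in for TLB flushes. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock; it only records state for the demo. */
struct fake_lock {
    bool held;
    unsigned long drops;   /* how often the lock was released mid-walk */
};

/* Hypothetical stand-in for asking the scheduler whether to yield. */
typedef bool (*should_yield_fn)(size_t pages_done);

/* Example policy for the demo: yield after every fourth page. */
static bool yield_every_4(size_t pages_done)
{
    return pages_done % 4 == 0;
}

/*
 * Clear the writable bit of every page. Before the lock is dropped, any
 * pending flush is performed, which models the extra TLB flush the commit
 * message mentions. Returns the number of flushes.
 */
static unsigned long write_protect_all(struct fake_lock *lock, bool *writable,
                                       size_t npages, should_yield_fn should_yield)
{
    unsigned long flushes = 0;
    bool flush_pending = false;

    lock->held = true;
    for (size_t i = 0; i < npages; i++) {
        if (writable[i]) {
            writable[i] = false;
            flush_pending = true;
        }
        if (i + 1 < npages && should_yield(i + 1)) {
            if (flush_pending) {
                flushes++;            /* flush before releasing the lock */
                flush_pending = false;
            }
            lock->held = false;       /* other lock users could run here */
            lock->drops++;
            lock->held = true;
        }
    }
    if (flush_pending)
        flushes++;                    /* final flush for the last batch */
    lock->held = false;
    return flushes;
}
```

The trade-off is visible in the return value: each mid-walk release costs one extra flush, so the less often the yield policy fires, the closer the total gets to a single flush.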

KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself · 9d1beefb

Takuya Yoshikawa authored Jan 08, 2013

Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself · b34cb590

Takuya Yoshikawa authored Jan 08, 2013

No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock as can be seen from the fact that it is read locklessly.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: Remove unused slot_bitmap from kvm_mmu_page · e12091ce

Takuya Yoshikawa authored Jan 08, 2013

Not needed any more.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based · b99db1d3

Takuya Yoshikawa authored Jan 08, 2013

This makes it possible to release mmu_lock and reschedule conditionally
in a later patch.  Although this may increase the time needed to protect
the whole slot when we start dirty logging, the kernel should not allow
the userspace to trigger something that will hold a spinlock for such a
long time as tens of milliseconds: actually there is no limit since it
is roughly proportional to the number of guest pages.

Another point to note is that this patch removes the only user of
slot_bitmap which will cause some problems when we increase the number
of slots further.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: MMU: Remove unused parameter level from __rmap_write_protect() · 245c3912

Takuya Yoshikawa authored Jan 08, 2013

No longer need to care about the mapping level in this function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

KVM: Write protect the updated slot only when dirty logging is enabled · c972f3b1

Takuya Yoshikawa authored Jan 08, 2013

Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
nothing but search for non-existent mmu pages which have mappings to
that deleted memory; this is safe but a waste of time.

Since we want to make the function rmap based in a later patch, in a
manner which makes it unsafe to be called for a deleted slot, we make
the caller check that the slot is non-zero and being dirty logged.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
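The caller-side check described above amounts to a two-part guard. A minimal sketch, assuming a hypothetical `struct slot_update` and a hypothetical `LOG_DIRTY_PAGES` flag bit standing in for the slot's dirty-logging flag:

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the slot's dirty-logging flag. */
#define LOG_DIRTY_PAGES (1u << 0)

/* Hypothetical, reduced view of the slot being committed. */
struct slot_update {
    unsigned long npages;   /* 0 means the slot was deleted */
    unsigned int flags;
};

/* Write protect only a slot that still exists and is being dirty logged. */
static bool needs_write_protect(const struct slot_update *s)
{
    return s->npages != 0 && (s->flags & LOG_DIRTY_PAGES) != 0;
}
```

With this guard in the caller, the write-protect function is never invoked for a deleted slot, which is the property the later rmap-based version relies on.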

Merge branch 'kvm-ppc-next' of https://github.com/agraf/linux-2.6 into queue · aa11e3a8

Gleb Natapov authored Jan 14, 2013

10 Jan, 2013 11 commits

KVM: trace: Fix exit decoding. · f79ed82d

Cornelia Huck authored Jan 08, 2013

trace_kvm_userspace_exit has been missing the KVM_EXIT_WATCHDOG exit.

CC: Bharat Bhushan <r65777@freescale.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix infinite fault access retry · 7751babd

Xiao Guangrong authored Jan 08, 2013

There are two issues in the current code:
- if the target gfn is used as its page table, the guest will refault and kvm will
  then use a small page size to map it. We need two #PFs to fix its shadow page table

- sometimes, say when an exception is triggered during a vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before going into the page fault path, which causes an
  infinite loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust the page size early if the target gfn is used as a
page table.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix Dirty bit missed if CR0.WP = 0 · c2288505

Xiao Guangrong authored Jan 08, 2013

If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting pte access: it sets the W bit on the pte
and clears the U bit. This is the one case in which kvm can change pte
access from read-only to writable.

Unfortunately, the pte access is the access of the 'direct' shadow page
table, meaning direct sp.role.access = pte_access, so we will create a
writable spte entry on the read-only shadow page table. As a result, the
Dirty bit is not tracked when two guest ptes point to the same large page.
Note that this has no impact other than on the Dirty bit, since cr0.wp is
encoded into sp.role.

It can be fixed by adjusting pte access before establishing the shadow page
table. Also, after that, no mmu-specific code remains in the common function,
and two parameters of set_spte can be dropped.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: PPC: BookE: Add EPR ONE_REG sync · 324b3e63

Alexander Graf authored Jan 04, 2013

We need to be able to read and write the contents of the EPR register
from user space.

This patch implements that logic through the ONE_REG API and declares
its (never implemented) SREGS counterpart as deprecated.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Implement EPR exit · 1c810636

Alexander Graf authored Jan 04, 2013

The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Emulate mfspr on EPR · 37ecb257

Alexander Graf authored Jan 04, 2013

The EPR register is potentially valid for PR KVM as well, so we need
to emulate accesses to it. It's only defined for reading, so only
handle the mfspr case.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Allow irq deliveries to inject requests · b8c649a9

Alexander Graf authored Dec 20, 2012

When injecting an interrupt into guest context, we usually don't need
to check for requests anymore. At least not until today.

With the introduction of EPR, we will have to create a request when the
guest has successfully accepted an external interrupt though.

So we need to prepare the interrupt delivery to abort guest entry
gracefully. Otherwise we'd delay the EPR request.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Fix mfspr/mtspr MMUCFG emulation · f2be6550

Mihai Caraman authored Dec 20, 2012

On the mfspr/mtspr emulation path, Book3E's MMUCFG SPR with value 1015 clashes
with G4's MSSSR0 SPR. Move MSSSR0 emulation from the generic part to Book3S.
MSSSR0 also clashes with Book3S's DABRX SPR. DABRX was not explicitly
handled, so the Book3S execution flow will behave as before.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1 · 50c7bb80

Alexander Graf authored Dec 14, 2012

When running on top of pHyp, the hypercall instruction "sc 1" goes
straight into pHyp without trapping in supervisor mode.

So if we want to support PAPR guests in this configuration we need to
add a second way of accessing PAPR hypercalls, preferably with the
exact same semantics except for the instruction.

So let's overlay an officially reserved instruction and emulate PAPR
hypercalls whenever we hit that one.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Only WARN on invalid emulation · 5a33169e

Alexander Graf authored Dec 14, 2012

When we hit an emulation result that we didn't expect, that is an error,
but it's nothing that warrants a BUG(), because it can be guest triggered.

So instead, let's only WARN() the user that this happened.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Fix SREGS documentation reference · 68e2ffed

Mihai Caraman authored Dec 11, 2012

Reflect the uapi folder change in SREGS API documentation.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

09 Jan, 2013 9 commits

KVM: s390: Gracefully handle busy conditions on ccw_device_start · b26ba22b

Christian Borntraeger authored Jan 07, 2013

In rare cases a virtio command might try to issue a ccw before a former
ccw was answered with a tsch. This will cause CC=2 (busy). Let's just
retry in that case.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
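A retry-on-busy loop of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than the driver's code: the start function is passed in as a hypothetical callback that returns -EBUSY while the previous request is still pending, `struct fake_device` exists only for the demo, and the retry bound is an addition for the sketch that the commit message does not mention.

```c
#include <errno.h>

/* Hypothetical callback: returns 0 on success, -EBUSY while the device is busy. */
typedef int (*start_io_fn)(void *ctx);

/* Demo device that reports busy a fixed number of times before accepting. */
struct fake_device {
    unsigned int busy_left;
};

static int fake_start(void *ctx)
{
    struct fake_device *dev = ctx;

    if (dev->busy_left > 0) {
        dev->busy_left--;
        return -EBUSY;
    }
    return 0;
}

/*
 * Call start() until it stops reporting busy or max_tries is reached.
 * The number of attempts is stored in *tries_out. Any result other than
 * -EBUSY, success or failure, ends the loop immediately.
 */
static int start_with_retry(start_io_fn start, void *ctx,
                            unsigned int max_tries, unsigned int *tries_out)
{
    int ret = -EBUSY;
    unsigned int tries = 0;

    while (tries < max_tries) {
        tries++;
        ret = start(ctx);
        if (ret != -EBUSY)
            break;
    }
    *tries_out = tries;
    return ret;
}
```

A device that is busy twice is accepted on the third attempt, while one that stays busy past the bound surfaces -EBUSY to the caller.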

KVM: s390: Dynamic allocation of virtio-ccw I/O data. · 73fa21ea

Cornelia Huck authored Jan 07, 2013

Dynamically allocate any data structures like ccw used when
doing channel I/O. Otherwise, we'd need to add extra serialization
for the different callbacks using the same data structures.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: convert basic ALU ops to fastop · fb864fbc