Commits · a3e06bbe8445f57eb949e6474c5a9b30f24d2057 · nexedi / linux

05 Oct, 2011 1 commit

KVM: emulate lapic tsc deadline timer for guest · a3e06bbe

Liu, Jinsong authored Sep 22, 2011

This patch emulates the lapic tsc deadline timer for the guest:
Enumerate tsc deadline timer capability by CPUID;
Enable tsc deadline timer mode by lapic MMIO;
Start tsc deadline timer by WRMSR;

[jan: use do_div()]
[avi: fix for !irqchip_in_kernel()]
[marcelo: another fix for !irqchip_in_kernel()]
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a3e06bbe
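
The three steps listed in a3e06bbe (CPUID enumeration, LVT timer mode, WRMSR) correspond to architectural constants from the Intel SDM. The following is a minimal user-space sketch of the mode check and the deadline-to-nanoseconds conversion, not the kernel code; the helper names `lvtt_is_tscdeadline` and `tscdeadline_to_ns` are hypothetical, and the kernel performs the 64-bit division with do_div() rather than a plain `/`.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural constants (Intel SDM); the helper names are hypothetical. */
#define CPUID_1_ECX_TSC_DEADLINE   (1u << 24)  /* CPUID.01H:ECX[24] */
#define MSR_IA32_TSC_DEADLINE      0x6e0u
#define APIC_LVT_TIMER_MODE_MASK   (3u << 17)  /* LVT timer bits 18:17 */
#define APIC_LVT_TIMER_TSCDEADLINE (2u << 17)

/* True when the guest has switched the LVT timer into TSC-deadline mode. */
static int lvtt_is_tscdeadline(uint32_t lvtt)
{
    return (lvtt & APIC_LVT_TIMER_MODE_MASK) == APIC_LVT_TIMER_TSCDEADLINE;
}

/*
 * Convert a deadline written to the MSR into a delay in nanoseconds.
 * Writing 0 disarms the timer and a deadline in the past fires at once;
 * both are reported as 0 here, so a real implementation must tell them
 * apart. Very large deltas would overflow the multiplication.
 */
static uint64_t tscdeadline_to_ns(uint64_t deadline, uint64_t now,
                                  uint32_t tsc_khz)
{
    assert(tsc_khz != 0);
    if (deadline == 0 || deadline <= now)
        return 0;
    /* cycles / kHz gives milliseconds, so scale by 1e6 for nanoseconds. */
    return (deadline - now) * 1000000ull / tsc_khz;
}
```

With a 3 GHz TSC (`tsc_khz` of 3000000), a deadline 3,000,000 cycles ahead corresponds to a one-millisecond delay.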

25 Sep, 2011 39 commits

x86: TSC deadline definitions · b90dfb04

Liu, Jinsong authored Sep 22, 2011

These pre-definitions prepare for KVM tsc deadline timer emulation, but
are not themselves KVM-specific.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b90dfb04

KVM: Fix simultaneous NMIs · 7460fb4a

Avi Kivity authored Sep 20, 2011

If simultaneous NMIs happen, we're supposed to queue the second
and next (collapsing them), but currently we sometimes collapse
the second into the first.

Fix by using a counter for pending NMIs instead of a bool; since
the counter limit depends on whether the processor is currently
in an NMI handler, which can only be checked in vcpu context
(via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
of the counter.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7460fb4a
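
The counter-based scheme described in 7460fb4a can be modelled roughly as below. This is a simplified user-space sketch under assumed names (`struct vcpu_nmi`, `nmi_raise`, `nmi_recalc` are hypothetical); the real code uses atomic operations and the KVM_REQ_NMI request to defer the recalculation to vcpu context.

```c
#include <assert.h>

/* Hypothetical model of a vcpu's NMI bookkeeping. */
struct vcpu_nmi {
    unsigned queued;     /* NMIs raised since the last recalculation */
    unsigned pending;    /* NMIs waiting to be injected */
    int in_nmi_handler;  /* nonzero while NMIs are masked */
};

/* May be called from any context: just record that an NMI arrived. */
static void nmi_raise(struct vcpu_nmi *v)
{
    v->queued++;
}

/* Called in vcpu context, where the NMI mask can be inspected. */
static void nmi_recalc(struct vcpu_nmi *v)
{
    /*
     * Outside a handler one NMI can be delivered and one more latched;
     * inside a handler only a single further NMI can be latched.
     */
    unsigned limit = v->in_nmi_handler ? 1 : 2;

    v->pending += v->queued;
    v->queued = 0;
    if (v->pending > limit)
        v->pending = limit;  /* collapse the excess */
}
```

Three NMIs raised back to back outside a handler leave two pending rather than one, which is the behavior the bool could not express.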

KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode · 1cd196ea

Avi Kivity authored Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1cd196ea

KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode · d4b4325f

Avi Kivity authored Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d4b4325f

KVM: x86 emulator: streamline decode of segment registers · c191a7a0

Avi Kivity authored Sep 13, 2011

The opcodes

  push %seg
  pop %seg
  l%seg, %mem, %reg  (e.g. lds/les/lss/lfs/lgs)

all have a segment register encoded in the instruction.  To allow reuse,
decode the segment number into src2 during the decode stage instead of the
execution stage.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c191a7a0
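
For the push/pop forms mentioned in c191a7a0, the segment number sits in a few bits of the opcode byte. A minimal sketch of that extraction follows, assuming the usual x86 segment numbering (ES=0 through GS=5); the function names are hypothetical and this is an illustration of the encoding, not the emulator's decode code.

```c
#include <assert.h>
#include <stdint.h>

/* x86 segment register numbering. */
enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/*
 * One-byte opcodes 06/0e/16/1e (push es/cs/ss/ds) and 07/17/1f
 * (pop es/ss/ds): the segment number is in opcode bits 4:3.
 */
static enum seg seg_from_onebyte(uint8_t opcode)
{
    return (enum seg)((opcode >> 3) & 3);
}

/*
 * Two-byte opcodes 0f a0/a1 (push/pop fs) and 0f a8/a9 (push/pop gs):
 * taking bits 5:3 of the second byte yields 4 (FS) or 5 (GS).
 */
static enum seg seg_from_twobyte(uint8_t opcode)
{
    return (enum seg)((opcode >> 3) & 7);
}
```

Doing this once at decode time means the execution code for push, pop and the lds family can read the segment number from the same place.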

KVM: x86 emulator: simplify OpMem64 decode · 41ddf978

Avi Kivity authored Sep 13, 2011

Use the same technique as the other OpMem variants, and goto mem_common.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

41ddf978

KVM: x86 emulator: switch src decode to decode_operand() · 0fe59128

Avi Kivity authored Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0fe59128

KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack · 5217973e

Avi Kivity authored Sep 13, 2011

OpReg decoding has a hack that inhibits byte registers for movsx and movzx
instructions.  It should be replaced by something better, but meanwhile,
qualify that the hack is only active for the destination operand.

Note these instructions only use OpReg for the destination, but better to
be explicit about it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5217973e

KVM: x86 emulator: switch OpImmUByte decode to decode_imm() · 608aabe3

Avi Kivity authored Sep 13, 2011

Similar to SrcImmUByte.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

608aabe3

KVM: x86 emulator: free up some flag bits near src, dst · 20c29ff2

Avi Kivity authored Sep 13, 2011

Op fields are going to grow by a bit, so we need two free bits.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

20c29ff2

KVM: x86 emulator: switch src2 to generic decode_operand() · 4dd6a57d

Avi Kivity authored Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4dd6a57d

KVM: x86 emulator: expand decode flags to 64 bits · b1ea50b2

Avi Kivity authored Sep 13, 2011

Unifying the operands means not taking advantage of the fact that some
operand types can only go into certain operands (for example, DI can only
be used by the destination), so we need more bits to hold the operand type.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b1ea50b2

KVM: x86 emulator: split dst decode to a generic decode_operand() · a9945549

Avi Kivity authored Sep 13, 2011

Instead of decoding each operand using its own code, use a generic
function.  Start with the destination operand.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a9945549
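
The idea of one generic decoder shared by all operands, each with its own type field in a wider flags word, might look like the sketch below. The field width, shift values and `enum op_type` members are assumptions chosen for illustration and are not taken from the emulator source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 5-bit operand type field per operand. */
#define OP_BITS    5
#define OP_MASK    ((1ull << OP_BITS) - 1)
#define DST_SHIFT  1
#define SRC_SHIFT  6
#define SRC2_SHIFT 31  /* field spans bits 31..35, so 32 bits are not enough */

enum op_type { OP_NONE, OP_REG, OP_MEM, OP_IMM };

/* One routine serves dst, src and src2; only the shift differs. */
static enum op_type decode_operand(uint64_t flags, unsigned shift)
{
    return (enum op_type)((flags >> shift) & OP_MASK);
}
```

Because every operand now draws from the same set of type codes, each field has to be wide enough for the whole set, which is why the flags outgrow 32 bits.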

KVM: x86 emulator: move memop, memopp into emulation context · f09ed83e

Avi Kivity authored Sep 13, 2011

Simplifies further generalization of decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f09ed83e

KVM: x86 emulator: convert group 3 instructions to direct decode · 3329ece1

Avi Kivity authored Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3329ece1

KVM: Split up MSI-X assigned device IRQ handler · cc079396

Jan Kiszka authored Sep 12, 2011

The threaded IRQ handler for MSI-X has almost nothing in common with the
INTx/MSI handler. Move its code into a dedicated handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cc079396

KVM: x86: Add module parameter for lapic periodic timer limit · 9bc5791d

Jan Kiszka authored Sep 12, 2011

Certain guests, specifically RTOSes, request faster periodic timers than
what we allow by default. Add a module parameter to adjust the limit for
non-standard setups. Also add a rate-limited warning in case the guest
requested more.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9bc5791d
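
A rough model of the limit in 9bc5791d: a requested period below the configured minimum is raised to that minimum, and the caller is told so it can emit a warning. The variable name `min_timer_period_us`, its value of 500 and the helper `clamp_period_ns` are assumptions for this sketch, not quotes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the module parameter, in microseconds. */
static unsigned min_timer_period_us = 500;

/*
 * Return the period to program, in nanoseconds. *too_fast is set when
 * the guest asked for a shorter period than allowed. A period of 0
 * (timer not running) is passed through unchanged.
 */
static uint64_t clamp_period_ns(uint64_t period_ns, int *too_fast)
{
    uint64_t min_ns = (uint64_t)min_timer_period_us * 1000;

    *too_fast = period_ns != 0 && period_ns < min_ns;
    return *too_fast ? min_ns : period_ns;
}
```

An RTOS setup that needs faster ticks would lower the limit rather than patch the code, which is the point of making it a parameter.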

KVM: Clean up and extend rate-limited output · bd80158a

Jan Kiszka authored Sep 12, 2011

The use of printk_ratelimit is discouraged; replace it with
pr*_ratelimited or __ratelimit. While at it, convert remaining
guest-triggerable printks to rate-limited variants.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bd80158a
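
Rate limiting of guest-triggerable messages generally follows an interval-and-burst pattern: allow up to a fixed number of messages per time window and drop the rest. The sketch below is a user-space model of that pattern with hypothetical names (`struct ratelimit`, `ratelimit_ok`); it is not the kernel's __ratelimit() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call-site state: at most `burst` messages per `interval`. */
struct ratelimit {
    uint64_t interval;  /* window length, in arbitrary time units */
    unsigned burst;     /* messages allowed per window */
    uint64_t begin;     /* start of the current window */
    unsigned printed;   /* messages emitted in the current window */
};

/* Return 1 if a message may be printed at time `now`, 0 if it is dropped. */
static int ratelimit_ok(struct ratelimit *rs, uint64_t now)
{
    if (now - rs->begin >= rs->interval) {
        /* Window expired: start a new one. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    return 0;
}
```

Keeping the state per call site, as the pr*_ratelimited variants do, stops one noisy message from silencing unrelated ones.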

KVM: x86: Avoid guest-triggerable printks in APIC model · 7712de87

Jan Kiszka authored Sep 12, 2011

Convert remaining printks that the guest can trigger to apic_printk.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7712de87

KVM: x86: Move kvm_trace_exit into atomic vmexit section · 1e2b1dd7

Jan Kiszka authored Sep 12, 2011

This prevents events that cause the vmexit from being recorded before the
actual exit reason.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1e2b1dd7

KVM: x86 emulator: disable writeback for TEST · caa8a168

Avi Kivity authored Sep 11, 2011

The TEST instruction doesn't write its destination operand, but the
emulator performed a writeback anyway.  This could cause problems if an
MMIO register was accessed using the TEST instruction.  Recently Windows XP
was observed to use TEST against the APIC ICR; this can cause spurious IPIs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

caa8a168
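
Why a redundant writeback matters for MMIO can be shown with a toy model in which every register write has a side effect. Everything here (`struct mmio_reg`, `mmio_write`, `emulate_test`, the `writeback` switch) is hypothetical and only illustrates the failure mode described in caa8a168.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical MMIO register: each write has a side effect, e.g. an IPI. */
struct mmio_reg {
    uint32_t val;
    unsigned writes;  /* number of side effects triggered */
};

static void mmio_write(struct mmio_reg *r, uint32_t v)
{
    r->val = v;
    r->writes++;
}

/*
 * Emulate "test dst, src": AND the operands and report ZF.
 * With writeback enabled the unchanged destination is stored back,
 * which is harmless for RAM but re-triggers the register's side effect.
 */
static int emulate_test(struct mmio_reg *dst, uint32_t src, int writeback)
{
    uint32_t result = dst->val & src;

    if (writeback)
        mmio_write(dst, dst->val);
    return result == 0;  /* ZF */
}
```

With writeback disabled the flags are still computed correctly and the register sees no write at all.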

KVM: Avoid needless registrations of IRQ ack notifier for assigned devices · c61fa9d6

Jan Kiszka authored Sep 11, 2011

We only perform work in kvm_assigned_dev_ack_irq if the guest IRQ is of
INTx type. This completely avoids the callback invocation in non-INTx
cases by registering the IRQ ack notifier only for INTx.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c61fa9d6

KVM: Clean up unneeded void pointer casts · 9f9f6b78

Jan Kiszka authored Sep 11, 2011

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9f9f6b78

KVM: x86 emulator: simplify emulate_1op_rax_rdx() · e8f2b1d6

Avi Kivity authored Sep 07, 2011

emulate_1op_rax_rdx() is always called with the same parameters.  Simplify
by passing just the emulation context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e8f2b1d6

KVM: x86 emulator: merge the two emulate_1op_rax_rdx implementations · 9fef72ce

Avi Kivity authored Sep 07, 2011

We have two emulate-with-extended-accumulator implementations: one
which expects traps (_ex) and one which doesn't (plain).  Drop the
plain implementation and always use the one which expects traps;
it will simply return 0 in the _ex argument and we can happily ignore
it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9fef72ce

KVM: x86 emulator: simplify emulate_1op() · d1eef45d

Avi Kivity authored Sep 07, 2011

emulate_1op() is always called with the same parameters.  Simplify
by passing just the emulation context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d1eef45d

KVM: x86 emulator: simplify emulate_2op_cl() · 29053a60

Avi Kivity authored Sep 07, 2011

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

29053a60

KVM: x86 emulator: simplify emulate_2op_cl() · 761441b9

Avi Kivity authored Sep 07, 2011

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

761441b9

KVM: x86 emulator: simplify emulate_2op_SrcV() · a31b9cea

Avi Kivity authored Sep 07, 2011

emulate_2op_SrcV(), and its siblings, emulate_2op_SrcV_nobyte()
and emulate_2op_SrcB(), all use the same calling conventions
and all get passed exactly the same parameters.  Simplify them
by passing just the emulation context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a31b9cea

KVM: Update documentation to include detailed ENABLE_CAP description · 821246a5

Alexander Graf authored Aug 31, 2011

We have an ioctl that enables capabilities individually, but no description
of what exactly happens when we enable a capability using this ioctl.

This patch adds documentation for capability enabling in a new section
of the API documentation.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

821246a5

KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a

Paul Mackerras authored Jul 23, 2011

With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.

This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.

This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.

This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.

Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.

Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.

Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

19ccb76a
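
The core decision in 19ccb76a (nap in real mode unless every other thread in the core is already idle) can be sketched in C as below. The real logic lives in assembly in book3s_hv_rmhandlers.S; the types and the function `handle_cede` are hypothetical and omit the state saving and wake-up paths.

```c
#include <assert.h>

#define THREADS_PER_CORE 4  /* SMT4 */

enum cede_action { CEDE_NAP, CEDE_EXIT_TO_KERNEL };

/* Hypothetical per-core state: which hardware threads are napping. */
struct vcore {
    int napping[THREADS_PER_CORE];
};

/* Decide what to do when `thread` issues H_CEDE. */
static enum cede_action handle_cede(struct vcore *vc, int thread)
{
    int i;

    assert(thread >= 0 && thread < THREADS_PER_CORE);
    for (i = 0; i < THREADS_PER_CORE; i++) {
        if (i != thread && !vc->napping[i]) {
            /* Someone else is still running: nap without leaving the guest. */
            vc->napping[thread] = 1;
            return CEDE_NAP;
        }
    }
    /* All other threads are idle too, so hand the core back to the kernel. */
    return CEDE_EXIT_TO_KERNEL;
}
```

Only the last thread to go idle pays for a full exit, instead of every H_CEDE pulling the whole core out of the guest.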

KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode · 02143947

Paul Mackerras authored Jul 23, 2011

This simplifies the way that book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

02143947

KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7

Paul Mackerras authored Jul 23, 2011

This makes arch/powerpc/kvm/book3s_rmhandlers.S and
arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
separate compilation units rather than having them #included in
arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
conditional branches between the exception prologs in
exceptions-64s.S and the KVM handlers, so there is no need to
keep their contents close together in the vmlinux image.

In their current location, they are using up part of the limited
space between the first-level interrupt handlers and the firmware
NMI data area at offset 0x7000, and with some kernel configurations
this area will overflow (e.g. allyesconfig), leading to an
"attempt to .org backwards" error when compiling exceptions-64s.S.

Moving them out requires that we add some #includes that the
book3s_{,hv_}rmhandlers.S code was previously getting implicitly
via exceptions-64s.S.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

177339d7

KVM: PPC: Add sanity checking to vcpu_run · af8f38b3

Alexander Graf authored Aug 10, 2011

There are multiple features in PowerPC KVM that can now be enabled
depending on the user's wishes. Some of the combinations don't make
sense or don't work though.

So this patch adds a way to check if the executing environment would
actually be able to run the guest properly. It also adds sanity
checks that PVR is set (should always be true given the current code
flow), that PAPR is only used with book3s_64, where it works, and that
HV KVM is only used in PAPR mode.
Signed-off-by: Alexander Graf <agraf@suse.de>

af8f38b3
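
The three checks listed in af8f38b3 can be written as a single predicate. This is a sketch under assumed names: `struct vcpu_cfg` and `vcpu_sanity_check` are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the configuration relevant to the checks. */
struct vcpu_cfg {
    uint32_t pvr;      /* processor version register, 0 if unset */
    int papr_enabled;  /* guest runs in PAPR mode */
    int is_book3s_64;  /* target is book3s_64 */
    int is_hv;         /* HV KVM rather than PR KVM */
};

/* Return 1 if the combination can run a guest, 0 otherwise. */
static int vcpu_sanity_check(const struct vcpu_cfg *c)
{
    if (c->pvr == 0)
        return 0;  /* PVR must be set */
    if (c->papr_enabled && !c->is_book3s_64)
        return 0;  /* PAPR only works on book3s_64 */
    if (c->is_hv && !c->papr_enabled)
        return 0;  /* HV KVM is only used in PAPR mode */
    return 1;
}
```

Running such a check before entering the guest turns an unsupported combination into an early, explicit failure.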

KVM: PPC: Enable the PAPR CAP for Book3S · 930b412a

Alexander Graf authored Aug 08, 2011

Now that Book3S PR mode can also run PAPR guests, we can add a PAPR cap and
enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
mode.
Signed-off-by: Alexander Graf <agraf@suse.de>

930b412a

KVM: PPC: Support SC1 hypercalls for PAPR in PR mode · a668f2bd

Alexander Graf authored Aug 08, 2011

PAPR defines hypercalls as SC1 instructions. Using these, the guest modifies
page tables and does other privileged operations that it wouldn't be allowed
to do in supervisor mode.

This patch adds support for PR KVM to trap these instructions and route them
through the same PAPR hypercall interface that we already use for HV style
KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>

a668f2bd

KVM: PPC: Stub emulate CFAR and PURR SPRs · aacf9aa3

Alexander Graf authored Aug 08, 2011

Recent Linux versions use the CFAR and PURR SPRs, but don't really care about
their contents (yet). So for now, we can simply return 0 when the guest wants
to read them.
Signed-off-by: Alexander Graf <agraf@suse.de>

aacf9aa3

KVM: PPC: Add PAPR hypercall code for PR mode · 0254f074

Alexander Graf authored Aug 08, 2011

When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).

So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.
Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - whitespace fix

0254f074

KVM: PPC: Add support for explicit HIOR setting · a15bd354

Alexander Graf authored Aug 08, 2011

Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.
Signed-off-by: Alexander Graf <agraf@suse.de>

a15bd354