Commits · 3766a4c693358cff33441310413e3776dbbf8ef0 · Kirill Smelkov / linux

05 Oct, 2012 28 commits

KVM: PPC: Move kvm_guest_enter call into generic code · 3766a4c6

Alexander Graf authored Aug 13, 2012

We need to call kvm_guest_enter in booke and book3s, so move its
call to generic code.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: Rework irq disabling · bd2be683

Alexander Graf authored Aug 13, 2012

Today, we disable preemption while inside guest context, because we need
to expose to the world that we are not in a preemptible context. However,
during that time we already have interrupts disabled, which would indicate
that we are in a non-preemptible context.

The reason the checks for irqs_disabled() fail for us though is that we
manually control hard IRQs and ignore all the lazy EE framework. Let's
stop doing that. Instead, let's always use lazy EE to indicate when we
want to disable IRQs, but do a special final switch that gets us into
EE disabled, but soft enabled state. That way when we get back out of
guest state, we are immediately ready to process interrupts.

This simplifies the code drastically and reduces the time that we appear
as preempt disabled.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Consistentify vcpu exit path · 24afa37b

Alexander Graf authored Aug 12, 2012

When getting out of __vcpu_run, let's be consistent about the state we
return in. We want to always

  * have IRQs enabled
  * have called kvm_guest_exit before
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: Indicate we're out of guest mode · 0652eaae

Alexander Graf authored Aug 12, 2012

When going out of guest mode, indicate this in vcpu->mode. That way,
other CPUs don't need to kick us to process their requests, because
that will happen anyway the next time we enter the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
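
As an illustration of the idea above, here is a minimal single-threaded sketch in C. The `toy_*` names and the struct layout are assumptions made for this example and are not the kernel's code; real KVM uses atomic operations and memory barriers around `vcpu->mode` and the request bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the generic vcpu->mode states. */
enum toy_vcpu_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

struct toy_vcpu {
    enum toy_vcpu_mode mode;
    uint32_t requests;          /* one bit per pending request */
};

/* Post a request; return true only if the vcpu has to be kicked. */
static bool toy_make_request(struct toy_vcpu *v, unsigned req)
{
    assert(req < 32);
    v->requests |= 1u << req;
    return v->mode == TOY_IN_GUEST_MODE;
}

/* On the way into the guest: drain pending requests, then mark "inside". */
static uint32_t toy_prepare_to_enter(struct toy_vcpu *v)
{
    uint32_t handled = v->requests;

    v->requests = 0;
    v->mode = TOY_IN_GUEST_MODE;
    return handled;
}

/* On guest exit: later requests simply wait for the next entry. */
static void toy_guest_exit(struct toy_vcpu *v)
{
    v->mode = TOY_OUTSIDE_GUEST_MODE;
}
```

In this model a request posted after `toy_guest_exit()` needs no kick; it is picked up by the next `toy_prepare_to_enter()` call.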

KVM: PPC: Exit guest context while handling exit · 706fb730

Alexander Graf authored Aug 12, 2012

The x86 implementation of KVM accounts for host time while processing
guest exits. Do the same for us.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: Only do resched check once per exit · c63ddcb4

Alexander Graf authored Aug 12, 2012

Now that we use our generic exit helper, we can safely drop our previous
kvm_resched that we used to trigger at the beginning of the exit handler
function.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Drop redundant vcpu->mode set · e85ad380

Alexander Graf authored Aug 12, 2012

We only need to set vcpu->mode to outside once.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3s: PR: Add (dumb) MMU Notifier support · 9b0cb3c8

Alexander Graf authored Aug 10, 2012

Now that we have very simple MMU Notifier support for e500 in place,
also add the same simple support to book3s. It gets us one step closer
to actual fast support.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Use same kvmppc_prepare_to_enter code for booke and book3s_pr · 03d25c5b

Alexander Graf authored Aug 10, 2012

We need to do the same things when preparing to enter a guest for booke and
book3s_pr cores. Fold the generic code into a generic function that both call.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: No duplicate request != 0 check · 2d8185d4

Alexander Graf authored Aug 10, 2012

We only call kvmppc_check_requests() when vcpu->requests != 0, so drop
the redundant check in the function itself.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Add some more trace points · 6346046c

Alexander Graf authored Aug 08, 2012

Without trace points, debugging what exactly is going on inside guest
code can be very tricky. Add a few more trace points at places that
hopefully tell us more when things go wrong.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: E500: Implement MMU notifiers · 862d31f7

Alexander Graf authored Jul 31, 2012

The e500 target has lived without mmu notifiers ever since it got
introduced, but it fails the user space check for them when hugetlbfs
is used.

So in order to get that one working, implement mmu notifiers in a
reasonably dumb fashion and be happy. On embedded hardware, we almost
never end up with mmu notifier calls, since most people don't overcommit.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Add support for vcpu->mode · d69c6436

Alexander Graf authored Aug 08, 2012

Generic KVM code might want to know whether we are inside guest context
or outside. It also wants to be able to push us out of guest context.

Add support to the BookE code for the generic vcpu->mode field that describes
the above states.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Add check_requests helper function · 4ffc6356

Alexander Graf authored Aug 08, 2012

We need a central place to check for pending requests in. Add one that
only does the timer check we already do in a different place.

Later, this central function can be extended by more checks.
Signed-off-by: Alexander Graf <agraf@suse.de>

powerpc/epapr: export epapr_hypercall_start · 8043e494

Scott Wood authored Aug 10, 2012

This fixes breakage introduced by the following commit:

  commit 6d2d82627f4f1e96a33664ace494fa363e0495cb
  Author: Liu Yu-B13201 <Yu.Liu@freescale.com>
  Date:   Tue Jul 3 05:48:56 2012 +0000

    PPC: Don't use hardcoded opcode for ePAPR hcall invocation

when a driver that uses ePAPR hypercalls is built as a module.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Quieten message about allocating linear regions · 1340f3e8

Paul Mackerras authored Aug 06, 2012

This is printed once for every RMA or HPT region that gets
preallocated.  If one preallocates hundreds of such regions
(in order to run hundreds of KVM guests), that gets rather
painful, so make it a bit quieter.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: E500: Fix clear_tlb_refs · 2bb890f5

Alexander Graf authored Aug 02, 2012

Our mapping code assumes that TLB0 entries are always mapped. However, after
calling clear_tlb_refs() this is no longer the case.

Map them dynamically if we find an entry unmapped in TLB0.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: BookE: Expose remote TLB flushes in debugfs · cf1c5ca4

Alexander Graf authored Aug 01, 2012

We're already counting remote TLB flushes in a variable, but don't export
it to user space yet. Do so, so we know what's going on.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Expose SYNC cap based on mmu notifiers · f4800b1f

Alexander Graf authored Aug 07, 2012

Semantically, the "SYNC" cap means that we have mmu notifiers available.
Express this in our #ifdef'ery around the feature, so that we can be sure
we don't miss out on ppc targets when they get their implementation.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: PR: Use generic tracepoint for guest exit · 97c95059

Alexander Graf authored Aug 02, 2012

We want to have tracing information on guest exits for booke as well
as book3s. Since most information is identical, use a common trace point.
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: Don't use hardcoded opcode for ePAPR hcall invocation · 8e525d59

Liu Yu-B13201 authored Jul 03, 2012

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls · 305bcf26

Scott Wood authored Jul 03, 2012

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: select EPAPR_PARAVIRT for all users of epapr hcalls · 40656397

Stuart Yoder authored Jul 03, 2012

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: ev_idle hcall support for e500 guests · 2f979de8

Liu Yu-B13201 authored Jul 03, 2012

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[varun: 64-bit changes]
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Add support for ePAPR idle hcall in host kernel · 9202e076

Liu Yu-B13201 authored Jul 03, 2012

And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart.yoder@freescale.com: cleanup,fixes for conditions allowing idle]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: fix typo]
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500 · 784bafac

Stuart Yoder authored Jul 03, 2012

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart: factored this out from idle hcall support in host patch]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: use definitions in epapr header for hcalls · fdcf8bd7

Stuart Yoder authored Jul 03, 2012

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: epapr: create define for return code value of success · e13dcc1a

Stuart Yoder authored Jul 03, 2012

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

27 Sep, 2012 1 commit

KVM: s390: Fix vcpu_load handling in interrupt code · 3d11df7a

Christian Borntraeger authored Sep 27, 2012

Recent changes (KVM: make processes waiting on vcpu mutex killable)
now require checking the return value of vcpu_load. This triggered
a warning in s390 specific kvm code. Turns out that we can actually
remove the put/load, since schedule will do the right thing via
the preempt notifiers.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

23 Sep, 2012 2 commits

KVM: x86: Fix guest debug across vcpu INIT reset · c8639010

Jan Kiszka authored Sep 21, 2012

If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.

Fix this by saving the dr7 used for guest debugging and calculating the
effective register value as well as switch_db_regs on any potential
change. This will change the focus of the set_guest_debug vendor op to
update_db_bp_intercept.

Found while trying to stop on start_secondary.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add resampling irqfds for level triggered interrupts · 7a84428a

Alex Williamson authored Sep 21, 2012

To emulate level triggered interrupts, add a resample option to
KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM.  This may, for
instance, indicate an EOI.  Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt.  On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.

All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
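
The assert/resample cycle described above can be modelled in a few lines of C. This is only a toy state machine under the assumption of a single interrupt line; the `toy_*` names are invented for this example, and it does not use the real KVM_IRQFD ioctl or eventfds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered line with a resample notifier. */
struct toy_resample_irq {
    bool asserted;           /* level currently seen by the irqchip */
    unsigned notifications;  /* times the resample side was signalled */
};

/* Posting through the irqfd only asserts the line; it is never pulsed. */
static void toy_irqfd_post(struct toy_resample_irq *irq)
{
    assert(irq);
    irq->asserted = true;
}

/* On resample (for instance a guest EOI): de-assert first, then notify. */
static void toy_irq_resample(struct toy_resample_irq *irq)
{
    assert(irq);
    irq->asserted = false;
    irq->notifications++;
}
```

After a notification, the user of the resample side would check the device and call `toy_irqfd_post()` again if the interrupt condition still holds.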

20 Sep, 2012 9 commits

KVM: optimize apic interrupt delivery · 1e08ec4a

Gleb Natapov authored Sep 13, 2012

Most interrupts are delivered to only one vcpu. Use pre-built tables to
find the interrupt destination instead of looping through all vcpus. In
logical mode, loop only through the vcpus in the logical cluster the irq
is sent to.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
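
A rough sketch of the table-lookup idea for physical destination mode, written in C. The structure and the `toy_*` names are hypothetical and chosen for this example; the logical-cluster case mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_APIC_ID 256

struct toy_apic_vcpu {
    uint8_t apic_id;
};

/* Rebuilt whenever APIC ids change; NULL marks an unused id. */
struct toy_apic_map {
    struct toy_apic_vcpu *phys[TOY_MAX_APIC_ID];
};

static void toy_build_map(struct toy_apic_map *map,
                          struct toy_apic_vcpu *vcpus, size_t n)
{
    assert(map && vcpus);
    for (size_t i = 0; i < TOY_MAX_APIC_ID; i++)
        map->phys[i] = NULL;
    for (size_t i = 0; i < n; i++)
        map->phys[vcpus[i].apic_id] = &vcpus[i];
}

/* One array access replaces a loop over all vcpus. */
static struct toy_apic_vcpu *toy_find_dest(const struct toy_apic_map *map,
                                           uint8_t dest_id)
{
    return map->phys[dest_id];
}
```

The cost of building the table is paid only when the APIC configuration changes, which is rare compared to interrupt delivery.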

Merge branch 'queue' into next · 1d86b5cc

Avi Kivity authored Sep 20, 2012

* queue:
  KVM: MMU: Eliminate pointless temporary 'ac'
  KVM: MMU: Avoid access/dirty update loop if all is well
  KVM: MMU: Eliminate eperm temporary
  KVM: MMU: Optimize is_last_gpte()
  KVM: MMU: Simplify walk_addr_generic() loop
  KVM: MMU: Optimize pte permission checks
  KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  KVM: MMU: Move gpte_access() out of paging_tmpl.h
  KVM: MMU: Optimize gpte_access() slightly
  KVM: MMU: Push clean gpte write protection out of gpte_access()
  KVM: clarify kvmclock documentation
  KVM: make processes waiting on vcpu mutex killable
  KVM: SVM: Make use of asm.h
  KVM: VMX: Make use of asm.h
  KVM: VMX: Make lto-friendly
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Eliminate pointless temporary 'ac' · c5421519

Avi Kivity authored Sep 19, 2012

'ac' essentially reconstructs the 'access' variable we already
have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK.  As
these are not used by callees, just use 'access' directly.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Avoid access/dirty update loop if all is well · b514c30f

Avi Kivity authored Sep 16, 2012

Keep track of accessed/dirty bits; if they are all set, do not
enter the accessed/dirty update loop.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Eliminate eperm temporary · 71331a1d

Avi Kivity authored Sep 16, 2012

'eperm' is no longer used in the walker loop, so we can eliminate it.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Optimize is_last_gpte() · 6fd01b71

Avi Kivity authored Sep 12, 2012

Instead of branchy code depending on level, gpte.ps, and mmu configuration,
prepare everything in a bitmap during mode changes and look it up during
runtime.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
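
To make the bitmap idea concrete, here is a simplified C sketch. It assumes a four-level walk in which level 1 always terminates and levels 2 and 3 terminate only when the page-size bit is set and the corresponding large-page size is enabled. The `toy_*` names and the index layout are assumptions for this example, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_mmu {
    uint8_t last_pte_bitmap;   /* bit index: (level - 1) | (ps << 2) */
};

/* Run on mode changes: evaluate the branchy rules once per combination. */
static void toy_update_last_pte_bitmap(struct toy_mmu *mmu,
                                       bool large_2m, bool large_1g)
{
    uint8_t map = 0;

    for (unsigned level = 1; level <= 4; level++) {
        for (unsigned ps = 0; ps <= 1; ps++) {
            bool last = level == 1 ||
                        (ps && level == 2 && large_2m) ||
                        (ps && level == 3 && large_1g);
            if (last)
                map |= (uint8_t)(1u << ((level - 1) | (ps << 2)));
        }
    }
    mmu->last_pte_bitmap = map;
}

/* Run during the walk: a shift and a mask, no configuration checks. */
static bool toy_is_last_gpte(const struct toy_mmu *mmu, unsigned level, bool ps)
{
    assert(level >= 1 && level <= 4);
    unsigned index = (level - 1) | ((unsigned)ps << 2);

    return (mmu->last_pte_bitmap >> index) & 1u;
}
```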

KVM: MMU: Simplify walk_addr_generic() loop · 13d22b6a

Avi Kivity authored Sep 12, 2012

The page table walk is coded as an infinite loop, with a special
case on the last pte.

Code it as an ordinary loop with a termination condition on the last
pte (large page or walk length exhausted), and put the last pte handling
code after the loop where it belongs.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Optimize pte permission checks · 97d64b78

Avi Kivity authored Sep 12, 2012

walk_addr_generic() permission checks are a maze of branchy code, which is
performed four times per lookup.  It depends on the type of access, efer.nxe,
cr0.wp, cr4.smep, and in the near future, cr4.smap.

Optimize this away by precalculating all variants and storing them in a
bitmap.  The bitmap is recalculated when rarely-changing variables change
(cr0, cr4) and is indexed by the often-changing variables (page fault error
code, pte access permissions).

The permission check is moved to the end of the loop, otherwise an SMEP
fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
Noted by Xiao Guangrong.

The result is short, branch-free code.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
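
The precalculation described above can be sketched in C as follows. This is a reduced model: the bit encodings, the `toy_*` names and the table layout are assumptions for this example, and the rules cover only wp, nxe and smep in a simplified form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_ACC_EXEC = 1, TOY_ACC_WRITE = 2, TOY_ACC_USER = 4 };  /* pte permissions */
enum { TOY_REQ_WRITE = 1, TOY_REQ_USER = 2, TOY_REQ_FETCH = 4 }; /* kind of access */

struct toy_mmu_config { bool wp, nxe, smep; };   /* rarely-changing inputs */
struct toy_perm_table { uint8_t fault[8]; };     /* indexed by access kind */

/* The branchy reference check, evaluated only while building the table. */
static bool toy_slow_fault(struct toy_mmu_config c, unsigned req, unsigned acc)
{
    bool wf = req & TOY_REQ_WRITE, uf = req & TOY_REQ_USER, ff = req & TOY_REQ_FETCH;
    bool x = (acc & TOY_ACC_EXEC) || !c.nxe;
    bool w = (acc & TOY_ACC_WRITE) || (!c.wp && !uf); /* supervisor may write if !wp */
    bool u = acc & TOY_ACC_USER;

    if (c.smep && u && !uf)
        x = false;               /* no supervisor fetch from user pages */
    return (ff && !x) || (uf && !u) || (wf && !w);
}

/* Recalculate all variants; called when the configuration changes. */
static void toy_build_perm_table(struct toy_perm_table *t, struct toy_mmu_config c)
{
    assert(t);
    for (unsigned req = 0; req < 8; req++) {
        uint8_t bits = 0;

        for (unsigned acc = 0; acc < 8; acc++)
            if (toy_slow_fault(c, req, acc))
                bits |= (uint8_t)(1u << acc);
        t->fault[req] = bits;
    }
}

/* Fast path: index by the access kind, test the bit for the pte permissions. */
static bool toy_fault(const struct toy_perm_table *t, unsigned req, unsigned acc)
{
    assert(req < 8 && acc < 8);
    return (t->fault[req] >> acc) & 1u;
}
```

The fast path contains no branches on the configuration at all; everything that depends on it has been folded into the table.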

KVM: MMU: Update accessed and dirty bits after guest pagetable walk · 8cbc7069

Avi Kivity authored Sep 16, 2012

While unspecified, the behaviour of Intel processors is to first
perform the page table walk, then, if the walk was successful, to
atomically update the accessed and dirty bits of walked paging elements.

While we are not required to follow this exactly, doing so will allow us
to perform the access permissions check after the walk is complete, rather
than after each walk step.

(the tricky case is SMEP: a zero in any pte's U bit makes the referenced
page a supervisor page, so we can't fault on a one bit during the walk
itself).
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
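
The walk-then-update ordering can be illustrated with a small C sketch. It assumes the walked entries are already collected in an array and uses plain stores where real code would need atomic updates. The `toy_*` names are invented for this example; the bit positions follow the x86 present, accessed and dirty flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTE_PRESENT  (1ull << 0)
#define TOY_PTE_ACCESSED (1ull << 5)
#define TOY_PTE_DIRTY    (1ull << 6)

/* path[0] is the top-level entry, path[depth - 1] the final pte. */
static bool toy_walk(uint64_t *path[], size_t depth, bool write)
{
    assert(depth > 0);

    /* Phase 1: pure walk; on failure no entry has been modified. */
    for (size_t i = 0; i < depth; i++)
        if (!(*path[i] & TOY_PTE_PRESENT))
            return false;

    /* Phase 2: the walk succeeded, so update accessed/dirty bits now. */
    for (size_t i = 0; i < depth; i++)
        *path[i] |= TOY_PTE_ACCESSED;
    if (write)
        *path[depth - 1] |= TOY_PTE_DIRTY;
    return true;
}
```

Because nothing is written in the first phase, any check that needs the whole walk (such as the permission check) can run between the two phases.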
