Commit 5fabc487 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  KVM: IOMMU: Disable device assignment without interrupt remapping
  KVM: MMU: trace mmio page fault
  KVM: MMU: mmio page fault support
  KVM: MMU: reorganize struct kvm_shadow_walk_iterator
  KVM: MMU: lockless walking shadow page table
  KVM: MMU: do not need atomicly to set/clear spte
  KVM: MMU: introduce the rules to modify shadow page table
  KVM: MMU: abstract some functions to handle fault pfn
  KVM: MMU: filter out the mmio pfn from the fault pfn
  KVM: MMU: remove bypass_guest_pf
  KVM: MMU: split kvm_mmu_free_page
  KVM: MMU: count used shadow pages on prepareing path
  KVM: MMU: rename 'pt_write' to 'emulate'
  KVM: MMU: cleanup for FNAME(fetch)
  KVM: MMU: optimize to handle dirty bit
  KVM: MMU: cache mmio info on page fault path
  KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
  KVM: MMU: do not update slot bitmap if spte is nonpresent
  KVM: MMU: fix walking shadow page table
  KVM guest: KVM Steal time registration
  ...
parents c61264f9 3f68b031
......@@ -1159,10 +1159,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
for all guests.
Default is 1 (enabled) if in 64bit or 32bit-PAE mode
kvm-intel.bypass_guest_pf=
[KVM,Intel] Disables bypassing of guest page faults
on Intel chips. Default is 1 (enabled)
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
......@@ -1737,6 +1733,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
steal time is computed, but won't influence scheduler
behaviour
nolapic [X86-32,APIC] Do not enable or use the local APIC.
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
......
......@@ -180,6 +180,19 @@ KVM_CHECK_EXTENSION ioctl() to determine the value for max_vcpus at run-time.
If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4
cpus max.
On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores. (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.) The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore). The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids. For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
4.8 KVM_GET_DIRTY_LOG (vm ioctl)
Capability: basic
......@@ -1143,15 +1156,10 @@ Assigns an IRQ to a passed-through device.
struct kvm_assigned_irq {
__u32 assigned_dev_id;
__u32 host_irq;
__u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
struct {
__u32 addr_lo;
__u32 addr_hi;
__u32 data;
} guest_msi;
__u32 reserved[12];
};
};
......@@ -1239,8 +1247,10 @@ Type: vm ioctl
Parameters: struct kvm_assigned_msix_nr (in)
Returns: 0 on success, -1 on error
Set the number of MSI-X interrupts for an assigned device. This service can
only be called once in the lifetime of an assigned device.
Set the number of MSI-X interrupts for an assigned device. The number is
reset again by terminating the MSI-X assignment of the device via
KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
point will fail.
struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
......@@ -1291,6 +1301,135 @@ Returns the tsc frequency of the guest. The unit of the return value is
KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
error.
4.56 KVM_GET_LAPIC
Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error
#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
char regs[KVM_APIC_REG_SIZE];
};
Reads the Local APIC registers and copies them into the input argument. The
data format and layout are the same as documented in the architecture manual.
4.57 KVM_SET_LAPIC
Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error
#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
char regs[KVM_APIC_REG_SIZE];
};
Copies the input argument into the the Local APIC registers. The data format
and layout are the same as documented in the architecture manual.
4.58 KVM_IOEVENTFD
Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error
This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest. A guest write in the registered address will signal the
provided event instead of triggering an exit.
struct kvm_ioeventfd {
__u64 datamatch;
__u64 addr; /* legal pio/mmio address */
__u32 len; /* 1, 2, 4, or 8 bytes */
__s32 fd;
__u32 flags;
__u8 pad[36];
};
The following flags are defined:
#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.
4.62 KVM_CREATE_SPAPR_TCE
Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table
This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O. It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.
/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
__u64 liobn;
__u32 window_size;
};
The liobn field gives the logical IO bus number for which to create a
TCE table. The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64
bit TCE entry for every 4kiB of the DMA window.
When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table. H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.
The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace. This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.
4.63 KVM_ALLOCATE_RMA
Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA
This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
};
The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.
The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
......@@ -1473,6 +1612,23 @@ Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.
/* KVM_EXIT_PAPR_HCALL */
struct {
__u64 nr;
__u64 ret;
__u64 args[9];
} papr_hcall;
This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu. It occurs when the
guest does a hypercall using the 'sc 1' instruction. The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12). Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).
/* Fix the size of the union. */
char padding[256];
};
......
......@@ -165,6 +165,10 @@ Shadow pages contain the following information:
Contains the value of efer.nxe for which the page is valid.
role.cr0_wp:
Contains the value of cr0.wp for which the page is valid.
role.smep_andnot_wp:
Contains the value of cr4.smep && !cr0.wp for which the page is valid
(pages for which this is true are different from other pages; see the
treatment of cr0.wp=0 below).
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
......@@ -317,6 +321,20 @@ on fault type:
(user write faults generate a #PF)
In the first case there is an additional complication if CR4.SMEP is
enabled: since we've turned the page into a kernel page, the kernel may now
execute it. We handle this by also setting spte.nx. If we get a user
fetch or read fault, we'll change spte.u=1 and spte.nx=gpte.nx back.
To prevent an spte that was converted into a kernel page with cr0.wp=0
from being written by the kernel after cr0.wp has changed to 1, we make
the value of cr0.wp part of the page role. This means that an spte created
with one value of cr0.wp cannot be used when cr0.wp has a different value -
it will simply be missed by the shadow page lookup code. A similar issue
exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
changing cr4.smep to 1. To avoid this, the value of !cr0.wp && cr4.smep
is also made a part of the page role.
Large pages
===========
......
......@@ -185,3 +185,37 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02
Currently type 2 APF will be always delivered on the same vcpu as
type 1 was, but guest should not rely on that.
MSR_KVM_STEAL_TIME: 0x4b564d03
data: 64-byte alignment physical address of a memory area which must be
in guest RAM, plus an enable bit in bit 0. This memory is expected to
hold a copy of the following structure:
struct kvm_steal_time {
__u64 steal;
__u32 version;
__u32 flags;
__u32 pad[12];
}
whose data will be filled in by the hypervisor periodically. Only one
write, or registration, is needed for each VCPU. The interval between
updates of this structure is arbitrary and implementation-dependent.
The hypervisor may update this structure at any time it sees fit until
anything with bit0 == 0 is written to it. Guest is required to make sure
this structure is initialized to zero.
Fields have the following meanings:
version: a sequence counter. In other words, guest has to check
this field before and after grabbing time information and make
sure they are both equal and even. An odd version indicates an
in-progress update.
flags: At this point, always zero. May be used to indicate
changes in this structure in the future.
steal: the amount of time in which this vCPU did not run, in
nanoseconds. Time during which the vcpu is idle, will not be
reported as steal time.
Nested VMX
==========
Overview
---------
On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.
The "Nested VMX" feature adds this missing capability - of running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.
We describe in much greater detail the theory behind the nested VMX feature,
its implementation and its performance characteristics, in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:
http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
Terminology
-----------
Single-level virtualization has two levels - the host (KVM) and the guests.
In nested virtualization, we have three levels: The host (KVM), which we call
L0, the guest hypervisor, which we call L1, and its nested guest, which we
call L2.
Known limitations
-----------------
The current code supports running Linux guests under KVM guests.
Only 64-bit guest hypervisors are supported.
Additional patches for running Windows under guest KVM, and Linux under
guest VMware server, and support for nested EPT, are currently running in
the lab, and will be sent as follow-on patchsets.
Running nested VMX
------------------
The nested VMX feature is disabled by default. It can be enabled by giving
the "nested=1" option to the kvm-intel module.
No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled, by giving qemu one of the following options:
-cpu host (emulated CPU has all features of the real CPU)
-cpu qemu64,+vmx (add just the vmx feature to a named CPU type)
ABIs
----
Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for the a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).
As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested to know the
internals of this structure; This is struct vmcs12 from arch/x86/kvm/vmx.c.
The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02" is the VMCS
which L0 builds to actually run L2 - how this is done is explained in the
aforementioned paper.
For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure changes, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.
typedef u64 natural_width;
struct __packed vmcs12 {
/* According to the Intel spec, a VMCS region must start with
* these two user-visible fields */
u32 revision_id;
u32 abort;
u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
u32 padding[7]; /* room for future expansion */
u64 io_bitmap_a;
u64 io_bitmap_b;
u64 msr_bitmap;
u64 vm_exit_msr_store_addr;
u64 vm_exit_msr_load_addr;
u64 vm_entry_msr_load_addr;
u64 tsc_offset;
u64 virtual_apic_page_addr;
u64 apic_access_addr;
u64 ept_pointer;
u64 guest_physical_address;
u64 vmcs_link_pointer;
u64 guest_ia32_debugctl;
u64 guest_ia32_pat;
u64 guest_ia32_efer;
u64 guest_pdptr0;
u64 guest_pdptr1;
u64 guest_pdptr2;
u64 guest_pdptr3;
u64 host_ia32_pat;
u64 host_ia32_efer;
u64 padding64[8]; /* room for future expansion */
natural_width cr0_guest_host_mask;
natural_width cr4_guest_host_mask;
natural_width cr0_read_shadow;
natural_width cr4_read_shadow;
natural_width cr3_target_value0;
natural_width cr3_target_value1;
natural_width cr3_target_value2;
natural_width cr3_target_value3;
natural_width exit_qualification;
natural_width guest_linear_address;
natural_width guest_cr0;
natural_width guest_cr3;
natural_width guest_cr4;
natural_width guest_es_base;
natural_width guest_cs_base;
natural_width guest_ss_base;
natural_width guest_ds_base;
natural_width guest_fs_base;
natural_width guest_gs_base;
natural_width guest_ldtr_base;
natural_width guest_tr_base;
natural_width guest_gdtr_base;
natural_width guest_idtr_base;
natural_width guest_dr7;
natural_width guest_rsp;
natural_width guest_rip;
natural_width guest_rflags;
natural_width guest_pending_dbg_exceptions;
natural_width guest_sysenter_esp;
natural_width guest_sysenter_eip;
natural_width host_cr0;
natural_width host_cr3;
natural_width host_cr4;
natural_width host_fs_base;
natural_width host_gs_base;
natural_width host_tr_base;
natural_width host_gdtr_base;
natural_width host_idtr_base;
natural_width host_ia32_sysenter_esp;
natural_width host_ia32_sysenter_eip;
natural_width host_rsp;
natural_width host_rip;
natural_width paddingl[8]; /* room for future expansion */
u32 pin_based_vm_exec_control;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
u32 page_fault_error_code_mask;
u32 page_fault_error_code_match;
u32 cr3_target_count;
u32 vm_exit_controls;
u32 vm_exit_msr_store_count;
u32 vm_exit_msr_load_count;
u32 vm_entry_controls;
u32 vm_entry_msr_load_count;
u32 vm_entry_intr_info_field;
u32 vm_entry_exception_error_code;
u32 vm_entry_instruction_len;
u32 tpr_threshold;
u32 secondary_vm_exec_control;
u32 vm_instruction_error;
u32 vm_exit_reason;
u32 vm_exit_intr_info;
u32 vm_exit_intr_error_code;
u32 idt_vectoring_info_field;
u32 idt_vectoring_error_code;
u32 vm_exit_instruction_len;
u32 vmx_instruction_info;
u32 guest_es_limit;
u32 guest_cs_limit;
u32 guest_ss_limit;
u32 guest_ds_limit;
u32 guest_fs_limit;
u32 guest_gs_limit;
u32 guest_ldtr_limit;
u32 guest_tr_limit;
u32 guest_gdtr_limit;
u32 guest_idtr_limit;
u32 guest_es_ar_bytes;
u32 guest_cs_ar_bytes;
u32 guest_ss_ar_bytes;
u32 guest_ds_ar_bytes;
u32 guest_fs_ar_bytes;
u32 guest_gs_ar_bytes;
u32 guest_ldtr_ar_bytes;
u32 guest_tr_ar_bytes;
u32 guest_interruptibility_info;
u32 guest_activity_state;
u32 guest_sysenter_cs;
u32 host_ia32_sysenter_cs;
u32 padding32[8]; /* room for future expansion */
u16 virtual_processor_id;
u16 guest_es_selector;
u16 guest_cs_selector;
u16 guest_ss_selector;
u16 guest_ds_selector;
u16 guest_fs_selector;
u16 guest_gs_selector;
u16 guest_ldtr_selector;
u16 guest_tr_selector;
u16 host_es_selector;
u16 host_cs_selector;
u16 host_ss_selector;
u16 host_ds_selector;
u16 host_fs_selector;
u16 host_gs_selector;
u16 host_tr_selector;
};
Authors
-------
These patches were written by:
Abel Gordon, abelg <at> il.ibm.com
Nadav Har'El, nyh <at> il.ibm.com
Orit Wasserman, oritw <at> il.ibm.com
Ben-Ami Yassor, benami <at> il.ibm.com
Muli Ben-Yehuda, muli <at> il.ibm.com
With contributions by:
Anthony Liguori, aliguori <at> us.ibm.com
Mike Day, mdday <at> us.ibm.com
Michael Factor, factor <at> il.ibm.com
Zvi Dubitzky, dubi <at> il.ibm.com
And valuable reviews by:
Avi Kivity, avi <at> redhat.com
Gleb Natapov, gleb <at> redhat.com
Marcelo Tosatti, mtosatti <at> redhat.com
Kevin Tian, kevin.tian <at> intel.com
and others.
......@@ -68,9 +68,11 @@ page that contains parts of supervisor visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
With this hypercall issued the guest always gets the magic page mapped at the
desired location in effective and physical address space. For now, we always
map the page to -4096. This way we can access it using absolute load and store
functions. The following instruction reads the first field of the magic page:
desired location. The first parameter indicates the effective address when the
MMU is enabled. The second parameter indicates the address in real mode, if
applicable to the target. For now, we always map the page to -4096. This way we
can access it using absolute load and store functions. The following
instruction reads the first field of the magic page:
ld rX, -4096(0)
......
......@@ -281,6 +281,10 @@ paravirt_init_missing_ticks_accounting(int cpu)
pv_time_ops.init_missing_ticks_accounting(cpu);
}
struct jump_label_key;
extern struct jump_label_key paravirt_steal_enabled;
extern struct jump_label_key paravirt_steal_rq_enabled;
static inline int
paravirt_do_steal_accounting(unsigned long *new_itm)
{
......
......@@ -634,6 +634,8 @@ struct pv_irq_ops pv_irq_ops = {
* pv_time_ops
* time operations
*/
struct jump_label_key paravirt_steal_enabled;
struct jump_label_key paravirt_steal_rq_enabled;
static int
ia64_native_do_steal_accounting(unsigned long *new_itm)
......
......@@ -179,8 +179,9 @@ extern const char *powerpc_base_platform;
#define LONG_ASM_CONST(x) 0
#endif
#define CPU_FTR_HVMODE_206 LONG_ASM_CONST(0x0000000800000000)
#define CPU_FTR_HVMODE LONG_ASM_CONST(0x0000000200000000)
#define CPU_FTR_ARCH_201 LONG_ASM_CONST(0x0000000400000000)
#define CPU_FTR_ARCH_206 LONG_ASM_CONST(0x0000000800000000)
#define CPU_FTR_CFAR LONG_ASM_CONST(0x0000001000000000)
#define CPU_FTR_IABR LONG_ASM_CONST(0x0000002000000000)
#define CPU_FTR_MMCRA LONG_ASM_CONST(0x0000004000000000)
......@@ -401,9 +402,10 @@ extern const char *powerpc_base_platform;
CPU_FTR_MMCRA | CPU_FTR_CP_USE_DCBTZ | \
CPU_FTR_STCX_CHECKS_ADDRESS)
#define CPU_FTRS_PPC970 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_201 | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | CPU_FTR_MMCRA | \
CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS)
CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS | \
CPU_FTR_HVMODE)
#define CPU_FTRS_POWER5 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
......@@ -417,13 +419,13 @@ extern const char *powerpc_base_platform;
CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_CFAR)
#define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_HVMODE_206 |\
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR)
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE)
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
......
......@@ -61,19 +61,22 @@
#define EXC_HV H
#define EXC_STD
#define EXCEPTION_PROLOG_1(area) \
#define __EXCEPTION_PROLOG_1(area, extra, vec) \
GET_PACA(r13); \
std r9,area+EX_R9(r13); /* save r9 - r12 */ \
std r10,area+EX_R10(r13); \
std r11,area+EX_R11(r13); \
std r12,area+EX_R12(r13); \
BEGIN_FTR_SECTION_NESTED(66); \
mfspr r10,SPRN_CFAR; \
std r10,area+EX_CFAR(r13); \
END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \
GET_SCRATCH0(r9); \
std r9,area+EX_R13(r13); \
mfcr r9
mfcr r9; \
extra(vec); \
std r11,area+EX_R11(r13); \
std r12,area+EX_R12(r13); \
GET_SCRATCH0(r10); \
std r10,area+EX_R13(r13)
#define EXCEPTION_PROLOG_1(area, extra, vec) \
__EXCEPTION_PROLOG_1(area, extra, vec)
#define __EXCEPTION_PROLOG_PSERIES_1(label, h) \
ld r12,PACAKBASE(r13); /* get high part of &label */ \
......@@ -85,13 +88,65 @@
mtspr SPRN_##h##SRR1,r10; \
h##rfid; \
b . /* prevent speculative execution */
#define EXCEPTION_PROLOG_PSERIES_1(label, h) \
#define EXCEPTION_PROLOG_PSERIES_1(label, h) \
__EXCEPTION_PROLOG_PSERIES_1(label, h)
#define EXCEPTION_PROLOG_PSERIES(area, label, h) \
EXCEPTION_PROLOG_1(area); \
#define EXCEPTION_PROLOG_PSERIES(area, label, h, extra, vec) \
EXCEPTION_PROLOG_1(area, extra, vec); \
EXCEPTION_PROLOG_PSERIES_1(label, h);
#define __KVMTEST(n) \
lbz r10,HSTATE_IN_GUEST(r13); \
cmpwi r10,0; \
bne do_kvm_##n
#define __KVM_HANDLER(area, h, n) \
do_kvm_##n: \
ld r10,area+EX_R10(r13); \
stw r9,HSTATE_SCRATCH1(r13); \
ld r9,area+EX_R9(r13); \
std r12,HSTATE_SCRATCH0(r13); \
li r12,n; \
b kvmppc_interrupt
#define __KVM_HANDLER_SKIP(area, h, n) \
do_kvm_##n: \
cmpwi r10,KVM_GUEST_MODE_SKIP; \
ld r10,area+EX_R10(r13); \
beq 89f; \
stw r9,HSTATE_SCRATCH1(r13); \
ld r9,area+EX_R9(r13); \
std r12,HSTATE_SCRATCH0(r13); \
li r12,n; \
b kvmppc_interrupt; \
89: mtocrf 0x80,r9; \
ld r9,area+EX_R9(r13); \
b kvmppc_skip_##h##interrupt
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#define KVMTEST(n) __KVMTEST(n)
#define KVM_HANDLER(area, h, n) __KVM_HANDLER(area, h, n)
#define KVM_HANDLER_SKIP(area, h, n) __KVM_HANDLER_SKIP(area, h, n)
#else
#define KVMTEST(n)
#define KVM_HANDLER(area, h, n)
#define KVM_HANDLER_SKIP(area, h, n)
#endif
#ifdef CONFIG_KVM_BOOK3S_PR
#define KVMTEST_PR(n) __KVMTEST(n)
#define KVM_HANDLER_PR(area, h, n) __KVM_HANDLER(area, h, n)
#define KVM_HANDLER_PR_SKIP(area, h, n) __KVM_HANDLER_SKIP(area, h, n)
#else
#define KVMTEST_PR(n)
#define KVM_HANDLER_PR(area, h, n)
#define KVM_HANDLER_PR_SKIP(area, h, n)
#endif
#define NOTEST(n)
/*
* The common exception prolog is used for all except a few exceptions
* such as a segment miss on a kernel address. We have to be prepared
......@@ -164,57 +219,58 @@
.globl label##_pSeries; \
label##_pSeries: \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, EXC_STD)
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
EXC_STD, KVMTEST_PR, vec)
#define STD_EXCEPTION_HV(loc, vec, label) \
. = loc; \
.globl label##_hv; \
label##_hv: \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, EXC_HV)
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
EXC_HV, KVMTEST, vec)
#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h) \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
GET_PACA(r13); \
std r9,PACA_EXGEN+EX_R9(r13); /* save r9, r10 */ \
std r10,PACA_EXGEN+EX_R10(r13); \
#define __SOFTEN_TEST(h) \
lbz r10,PACASOFTIRQEN(r13); \
mfcr r9; \
cmpwi r10,0; \
beq masked_##h##interrupt; \
GET_SCRATCH0(r10); \
std r10,PACA_EXGEN+EX_R13(r13); \
std r11,PACA_EXGEN+EX_R11(r13); \
std r12,PACA_EXGEN+EX_R12(r13); \
ld r12,PACAKBASE(r13); /* get high part of &label */ \
ld r10,PACAKMSR(r13); /* get MSR value for kernel */ \
mfspr r11,SPRN_##h##SRR0; /* save SRR0 */ \
LOAD_HANDLER(r12,label##_common) \
mtspr SPRN_##h##SRR0,r12; \
mfspr r12,SPRN_##h##SRR1; /* and SRR1 */ \
mtspr SPRN_##h##SRR1,r10; \
h##rfid; \
b . /* prevent speculative execution */
#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h) \
__MASKABLE_EXCEPTION_PSERIES(vec, label, h)
beq masked_##h##interrupt
#define _SOFTEN_TEST(h) __SOFTEN_TEST(h)
#define SOFTEN_TEST_PR(vec) \
KVMTEST_PR(vec); \
_SOFTEN_TEST(EXC_STD)
#define SOFTEN_TEST_HV(vec) \
KVMTEST(vec); \
_SOFTEN_TEST(EXC_HV)
#define SOFTEN_TEST_HV_201(vec) \
KVMTEST(vec); \
_SOFTEN_TEST(EXC_STD)
#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
HMT_MEDIUM; \
SET_SCRATCH0(r13); /* save r13 */ \
__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec); \
EXCEPTION_PROLOG_PSERIES_1(label##_common, h);
#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
__MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label) \
. = loc; \
.globl label##_pSeries; \
label##_pSeries: \
_MASKABLE_EXCEPTION_PSERIES(vec, label, EXC_STD)
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_STD, SOFTEN_TEST_PR)
#define MASKABLE_EXCEPTION_HV(loc, vec, label) \
. = loc; \
.globl label##_hv; \
label##_hv: \
_MASKABLE_EXCEPTION_PSERIES(vec, label, EXC_HV)
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_HV, SOFTEN_TEST_HV)
#ifdef CONFIG_PPC_ISERIES
#define DISABLE_INTS \
......
......@@ -29,6 +29,10 @@
#define H_LONG_BUSY_ORDER_100_SEC 9905 /* Long busy, hint that 100sec \
is a good time to retry */
#define H_LONG_BUSY_END_RANGE 9905 /* End of long busy range */
/* Internal value used in book3s_hv kvm support; not returned to guests */
#define H_TOO_HARD 9999
#define H_HARDWARE -1 /* Hardware error */
#define H_FUNCTION -2 /* Function not supported */
#define H_PRIVILEGE -3 /* Caller not privileged */
......@@ -100,6 +104,7 @@
#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
#define H_AVPN (1UL<<(63-32)) /* An avpn is provided as a sanity test */
#define H_ANDCOND (1UL<<(63-33))
#define H_LOCAL (1UL<<(63-35))
#define H_ICACHE_INVALIDATE (1UL<<(63-40)) /* icbi, etc. (ignored for IO pages) */
#define H_ICACHE_SYNCHRONIZE (1UL<<(63-41)) /* dcbst, icbi, etc (ignored for IO pages */
#define H_COALESCE_CAND (1UL<<(63-42)) /* page is a good candidate for coalescing */
......
......@@ -22,6 +22,10 @@
#include <linux/types.h>
/* Select powerpc specific features in <linux/kvm.h> */
#define __KVM_HAVE_SPAPR_TCE
#define __KVM_HAVE_PPC_SMT
struct kvm_regs {
__u64 pc;
__u64 cr;
......@@ -272,4 +276,15 @@ struct kvm_guest_debug_arch {
#define KVM_INTERRUPT_UNSET -2U
#define KVM_INTERRUPT_SET_LEVEL -3U
/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
__u64 liobn;
__u32 window_size;
};
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
};
#endif /* __LINUX_KVM_POWERPC_H */
......@@ -64,8 +64,12 @@
#define BOOK3S_INTERRUPT_PROGRAM 0x700
#define BOOK3S_INTERRUPT_FP_UNAVAIL 0x800
#define BOOK3S_INTERRUPT_DECREMENTER 0x900
#define BOOK3S_INTERRUPT_HV_DECREMENTER 0x980
#define BOOK3S_INTERRUPT_SYSCALL 0xc00
#define BOOK3S_INTERRUPT_TRACE 0xd00
#define BOOK3S_INTERRUPT_H_DATA_STORAGE 0xe00
#define BOOK3S_INTERRUPT_H_INST_STORAGE 0xe20
#define BOOK3S_INTERRUPT_H_EMUL_ASSIST 0xe40
#define BOOK3S_INTERRUPT_PERFMON 0xf00
#define BOOK3S_INTERRUPT_ALTIVEC 0xf20
#define BOOK3S_INTERRUPT_VSX 0xf40
......
......@@ -24,20 +24,6 @@
#include <linux/kvm_host.h>
#include <asm/kvm_book3s_asm.h>
struct kvmppc_slb {
u64 esid;
u64 vsid;
u64 orige;
u64 origv;
bool valid : 1;
bool Ks : 1;
bool Kp : 1;
bool nx : 1;
bool large : 1; /* PTEs are 16MB */
bool tb : 1; /* 1TB segment */
bool class : 1;
};
struct kvmppc_bat {
u64 raw;
u32 bepi;
......@@ -67,11 +53,22 @@ struct kvmppc_sid_map {
#define VSID_POOL_SIZE (SID_CONTEXTS * 16)
#endif
struct hpte_cache {
struct hlist_node list_pte;
struct hlist_node list_pte_long;
struct hlist_node list_vpte;
struct hlist_node list_vpte_long;
struct rcu_head rcu_head;
u64 host_va;
u64 pfn;
ulong slot;
struct kvmppc_pte pte;
};
struct kvmppc_vcpu_book3s {
struct kvm_vcpu vcpu;
struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
struct kvmppc_sid_map sid_map[SID_MAP_NUM];
struct kvmppc_slb slb[64];
struct {
u64 esid;
u64 vsid;
......@@ -81,7 +78,6 @@ struct kvmppc_vcpu_book3s {
struct kvmppc_bat dbat[8];
u64 hid[6];
u64 gqr[8];
int slb_nr;
u64 sdr1;
u64 hior;
u64 msr_mask;
......@@ -93,7 +89,13 @@ struct kvmppc_vcpu_book3s {
u64 vsid_max;
#endif
int context_id[SID_CONTEXTS];
ulong prog_flags; /* flags to inject when giving a 700 trap */
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG];
struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE];
struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG];
int hpte_cache_count;
spinlock_t mmu_lock;
};
#define CONTEXT_HOST 0
......@@ -110,8 +112,10 @@ extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask)
extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end);
extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
extern void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr);
extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
......@@ -123,19 +127,22 @@ extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
extern int kvmppc_mmu_hpte_sysinit(void);
extern void kvmppc_mmu_hpte_sysexit(void);
extern int kvmppc_mmu_hv_init(void);
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
extern void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags);
extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
bool upper, u32 val);
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
extern ulong kvmppc_trampoline_lowmem;
extern ulong kvmppc_trampoline_enter;
extern void kvmppc_handler_lowmem_trampoline(void);
extern void kvmppc_handler_trampoline_enter(void);
extern void kvmppc_rmcall(ulong srr0, ulong srr1);
extern void kvmppc_hv_entry_trampoline(void);
extern void kvmppc_load_up_fpu(void);
extern void kvmppc_load_up_altivec(void);
extern void kvmppc_load_up_vsx(void);
......@@ -147,15 +154,32 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
}
static inline ulong dsisr(void)
extern void kvm_return_point(void);
/* Also add subarch specific defines */
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
#include <asm/kvm_book3s_32.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#include <asm/kvm_book3s_64.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_PR
static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
{
ulong r;
asm ( "mfdsisr %0 " : "=r" (r) );
return r;
return to_book3s(vcpu)->hior;
}
extern void kvm_return_point(void);
static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu);
static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
unsigned long pending_now, unsigned long old_pending)
{
if (pending_now)
vcpu->arch.shared->int_pending = 1;
else if (old_pending)
vcpu->arch.shared->int_pending = 0;
}
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
......@@ -244,6 +268,120 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return to_svcpu(vcpu)->fault_dar;
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
ulong crit_raw = vcpu->arch.shared->critical;
ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
bool crit;
/* Truncate crit indicators in 32 bit mode */
if (!(vcpu->arch.shared->msr & MSR_SF)) {
crit_raw &= 0xffffffff;
crit_r1 &= 0xffffffff;
}
/* Critical section when crit == r1 */
crit = (crit_raw == crit_r1);
/* ... and we're in supervisor mode */
crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
return crit;
}
#else /* CONFIG_KVM_BOOK3S_PR */
static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
{
return 0;
}
static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
unsigned long pending_now, unsigned long old_pending)
{
}
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
vcpu->arch.gpr[num] = val;
}
static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
{
return vcpu->arch.gpr[num];
}
static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
{
vcpu->arch.cr = val;
}
static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.cr;
}
static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
{
vcpu->arch.xer = val;
}
static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
{
return vcpu->arch.xer;
}
static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.ctr = val;
}
static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.ctr;
}
static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.lr = val;
}
static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.lr;
}
static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.pc = val;
}
static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
{
return vcpu->arch.pc;
}
static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
{
ulong pc = kvmppc_get_pc(vcpu);
/* Load the instruction manually if it failed to do so in the
* exit path */
if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
return vcpu->arch.last_inst;
}
static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault_dar;
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
return false;
}
#endif
/* Magic register values loaded into r3 and r4 before the 'sc' assembly
* instruction for the OSI hypercalls */
#define OSI_SC_MAGIC_R3 0x113724FA
......@@ -251,12 +389,4 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
#define INS_DCBZ 0x7c0007ec
/* Also add subarch specific defines */
#ifdef CONFIG_PPC_BOOK3S_32
#include <asm/kvm_book3s_32.h>
#else
#include <asm/kvm_book3s_64.h>
#endif
#endif /* __ASM_KVM_BOOK3S_H__ */
......@@ -20,9 +20,13 @@
#ifndef __ASM_KVM_BOOK3S_64_H__
#define __ASM_KVM_BOOK3S_64_H__
#ifdef CONFIG_KVM_BOOK3S_PR
static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
{
return &get_paca()->shadow_vcpu;
}
#endif
#define SPAPR_TCE_SHIFT 12
#endif /* __ASM_KVM_BOOK3S_64_H__ */
......@@ -60,6 +60,36 @@ kvmppc_resume_\intno:
#else /*__ASSEMBLY__ */
/*
* This struct goes in the PACA on 64-bit processors. It is used
* to store host state that needs to be saved when we enter a guest
* and restored when we exit, but isn't specific to any particular
* guest or vcpu. It also has some scratch fields used by the guest
* exit code.
*/
struct kvmppc_host_state {
ulong host_r1;
ulong host_r2;
ulong host_msr;
ulong vmhandler;
ulong scratch0;
ulong scratch1;
u8 in_guest;
#ifdef CONFIG_KVM_BOOK3S_64_HV
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
unsigned long xics_phys;
u64 dabr;
u64 host_mmcr[3];
u32 host_pmc[8];
u64 host_purr;
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
#endif
};
struct kvmppc_book3s_shadow_vcpu {
ulong gpr[14];
u32 cr;
......@@ -73,17 +103,12 @@ struct kvmppc_book3s_shadow_vcpu {
ulong shadow_srr1;
ulong fault_dar;
ulong host_r1;
ulong host_r2;
ulong handler;
ulong scratch0;
ulong scratch1;
ulong vmhandler;
u8 in_guest;
#ifdef CONFIG_PPC_BOOK3S_32
u32 sr[16]; /* Guest SRs */
struct kvmppc_host_state hstate;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
u8 slb_max; /* highest used guest slb entry */
struct {
......
......@@ -93,4 +93,8 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return vcpu->arch.fault_dear;
}
static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.shared->msr;
}
#endif /* __ASM_KVM_BOOKE_H__ */
/*
* Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
* Copyright (C) 2008-2011 Freescale Semiconductor, Inc. All rights reserved.
*
* Author: Yu Liu, <yu.liu@freescale.com>
*
......@@ -29,17 +29,25 @@ struct tlbe{
u32 mas7;
};
#define E500_TLB_VALID 1
#define E500_TLB_DIRTY 2
struct tlbe_priv {
pfn_t pfn;
unsigned int flags; /* E500_TLB_* */
};
struct vcpu_id_table;
struct kvmppc_vcpu_e500 {
/* Unmodified copy of the guest's TLB. */
struct tlbe *guest_tlb[E500_TLB_NUM];
/* TLB that's actually used when the guest is running. */
struct tlbe *shadow_tlb[E500_TLB_NUM];
/* Pages which are referenced in the shadow TLB. */
struct page **shadow_pages[E500_TLB_NUM];
struct tlbe *gtlb_arch[E500_TLB_NUM];
unsigned int guest_tlb_size[E500_TLB_NUM];
unsigned int shadow_tlb_size[E500_TLB_NUM];
unsigned int guest_tlb_nv[E500_TLB_NUM];
/* KVM internal information associated with each guest TLB entry */
struct tlbe_priv *gtlb_priv[E500_TLB_NUM];
unsigned int gtlb_size[E500_TLB_NUM];
unsigned int gtlb_nv[E500_TLB_NUM];
u32 host_pid[E500_PID_NUM];
u32 pid[E500_PID_NUM];
......@@ -53,6 +61,10 @@ struct kvmppc_vcpu_e500 {
u32 mas5;
u32 mas6;
u32 mas7;
/* vcpu id table */
struct vcpu_id_table *idt;
u32 l1csr0;
u32 l1csr1;
u32 hid0;
......
......@@ -25,15 +25,23 @@
#include <linux/interrupt.h>
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <linux/threads.h>
#include <linux/spinlock.h>
#include <linux/kvm_para.h>
#include <linux/list.h>
#include <linux/atomic.h>
#include <asm/kvm_asm.h>
#include <asm/processor.h>
#define KVM_MAX_VCPUS 1
#define KVM_MAX_VCPUS NR_CPUS
#define KVM_MAX_VCORES NR_CPUS
#define KVM_MEMORY_SLOTS 32
/* memory slots that does not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
#ifdef CONFIG_KVM_MMIO
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#endif
/* We don't currently support large pages. */
#define KVM_HPAGE_GFN_SHIFT(x) 0
......@@ -57,6 +65,10 @@ struct kvm;
struct kvm_run;
struct kvm_vcpu;
struct lppaca;
struct slb_shadow;
struct dtl;
struct kvm_vm_stat {
u32 remote_tlb_flush;
};
......@@ -133,9 +145,74 @@ struct kvmppc_exit_timing {
};
};
struct kvmppc_pginfo {
unsigned long pfn;
atomic_t refcnt;
};
struct kvmppc_spapr_tce_table {
struct list_head list;
struct kvm *kvm;
u64 liobn;
u32 window_size;
struct page *pages[0];
};
struct kvmppc_rma_info {
void *base_virt;
unsigned long base_pfn;
unsigned long npages;
struct list_head list;
atomic_t use_count;
};
struct kvm_arch {
#ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
struct kvmppc_pginfo *ram_pginfo;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
unsigned long sdr1;
unsigned long host_sdr1;
int tlbie_lock;
int n_rma_pages;
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
struct list_head spapr_tce_tables;
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
#endif /* CONFIG_KVM_BOOK3S_64_HV */
};
/*
* Struct for a virtual core.
* Note: entry_exit_count combines an entry count in the bottom 8 bits
* and an exit count in the next 8 bits. This is so that we can
* atomically increment the entry count iff the exit count is 0
* without taking the lock.
*/
struct kvmppc_vcore {
int n_runnable;
int n_blocked;
int num_threads;
int entry_exit_count;
int n_woken;
int nap_count;
u16 pcpu;
u8 vcore_running;
u8 in_guest;
struct list_head runnable_threads;
spinlock_t lock;
};
#define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff)
#define VCORE_EXIT_COUNT(vc) ((vc)->entry_exit_count >> 8)
struct kvmppc_pte {
ulong eaddr;
u64 vpage;
......@@ -163,16 +240,18 @@ struct kvmppc_mmu {
bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
};
struct hpte_cache {
struct hlist_node list_pte;
struct hlist_node list_pte_long;
struct hlist_node list_vpte;
struct hlist_node list_vpte_long;
struct rcu_head rcu_head;
u64 host_va;
u64 pfn;
ulong slot;
struct kvmppc_pte pte;
struct kvmppc_slb {
u64 esid;
u64 vsid;
u64 orige;
u64 origv;
bool valid : 1;
bool Ks : 1;
bool Kp : 1;
bool nx : 1;
bool large : 1; /* PTEs are 16MB */
bool tb : 1; /* 1TB segment */
bool class : 1;
};
struct kvm_vcpu_arch {
......@@ -187,6 +266,9 @@ struct kvm_vcpu_arch {
ulong highmem_handler;
ulong rmcall;
ulong host_paca_phys;
struct kvmppc_slb slb[64];
int slb_max; /* 1 + index of last valid entry in slb[] */
int slb_nr; /* total number of entries in SLB */
struct kvmppc_mmu mmu;
#endif
......@@ -195,13 +277,19 @@ struct kvm_vcpu_arch {
u64 fpr[32];
u64 fpscr;
#ifdef CONFIG_SPE
ulong evr[32];
ulong spefscr;
ulong host_spefscr;
u64 acc;
#endif
#ifdef CONFIG_ALTIVEC
vector128 vr[32];
vector128 vscr;
#endif
#ifdef CONFIG_VSX
u64 vsr[32];
u64 vsr[64];
#endif
#ifdef CONFIG_PPC_BOOK3S
......@@ -209,22 +297,27 @@ struct kvm_vcpu_arch {
u32 qpr[32];
#endif
#ifdef CONFIG_BOOKE
ulong pc;
ulong ctr;
ulong lr;
ulong xer;
u32 cr;
#endif
#ifdef CONFIG_PPC_BOOK3S
ulong shadow_msr;
ulong hflags;
ulong guest_owned_ext;
ulong purr;
ulong spurr;
ulong dscr;
ulong amr;
ulong uamor;
u32 ctrl;
ulong dabr;
#endif
u32 vrsave; /* also USPRG0 */
u32 mmucr;
ulong shadow_msr;
ulong sprg4;
ulong sprg5;
ulong sprg6;
......@@ -249,6 +342,7 @@ struct kvm_vcpu_arch {
u32 pvr;
u32 shadow_pid;
u32 shadow_pid1;
u32 pid;
u32 swap_pid;
......@@ -258,6 +352,9 @@ struct kvm_vcpu_arch {
u32 dbcr1;
u32 dbsr;
u64 mmcr[3];
u32 pmc[8];
#ifdef CONFIG_KVM_EXIT_TIMING
struct mutex exit_timing_lock;
struct kvmppc_exit_timing timing_exit;
......@@ -272,8 +369,12 @@ struct kvm_vcpu_arch {
struct dentry *debugfs_exit_timing;
#endif
#ifdef CONFIG_PPC_BOOK3S
ulong fault_dar;
u32 fault_dsisr;
#endif
#ifdef CONFIG_BOOKE
u32 last_inst;
ulong fault_dear;
ulong fault_esr;
ulong queued_dear;
......@@ -288,25 +389,47 @@ struct kvm_vcpu_arch {
u8 dcr_is_write;
u8 osi_needed;
u8 osi_enabled;
u8 hcall_needed;
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
struct hrtimer dec_timer;
struct tasklet_struct tasklet;
u64 dec_jiffies;
u64 dec_expires;
unsigned long pending_exceptions;
u16 last_cpu;
u8 ceded;
u8 prodded;
u32 last_inst;
struct lppaca *vpa;
struct slb_shadow *slb_shadow;
struct dtl *dtl;
struct dtl *dtl_end;
struct kvmppc_vcore *vcore;
int ret;
int trap;
int state;
int ptid;
wait_queue_head_t cpu_run;
struct kvm_vcpu_arch_shared *shared;
unsigned long magic_page_pa; /* phys addr to map the magic page to */
unsigned long magic_page_ea; /* effect. addr to map the magic page to */
#ifdef CONFIG_PPC_BOOK3S
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG];
struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE];
struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG];
int hpte_cache_count;
spinlock_t mmu_lock;
#ifdef CONFIG_KVM_BOOK3S_64_HV
struct kvm_vcpu_arch_shared shregs;
struct list_head run_list;
struct task_struct *run_task;
struct kvm_run *kvm_run;
#endif
};
#define KVMPPC_VCPU_BUSY_IN_HOST 0
#define KVMPPC_VCPU_BLOCKED 1
#define KVMPPC_VCPU_RUNNABLE 2
#endif /* __POWERPC_KVM_HOST_H__ */
......@@ -33,6 +33,9 @@
#else
#include <asm/kvm_booke.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#include <asm/paca.h>
#endif
enum emulation_result {
EMULATE_DONE, /* no further processing */
......@@ -42,6 +45,7 @@ enum emulation_result {
EMULATE_AGAIN, /* something went wrong. go again */
};
extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
extern char kvmppc_handlers_start[];
extern unsigned long kvmppc_handler_len;
......@@ -109,6 +113,27 @@ extern void kvmppc_booke_exit(void);
extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
extern long kvmppc_alloc_hpt(struct kvm *kvm);
extern void kvmppc_free_hpt(struct kvm *kvm);
extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern void kvmppc_map_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
struct kvm_allocate_rma *rma);
extern struct kvmppc_rma_info *kvm_alloc_rma(void);
extern void kvm_release_rma(struct kvmppc_rma_info *ri);
extern int kvmppc_core_init_vm(struct kvm *kvm);
extern void kvmppc_core_destroy_vm(struct kvm *kvm);
extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern void kvmppc_core_commit_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
/*
* Cuts out inst bits with ordering according to spec.
......@@ -151,4 +176,20 @@ int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
#ifdef CONFIG_KVM_BOOK3S_64_HV
static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
{
paca[cpu].kvm_hstate.xics_phys = addr;
}
extern void kvm_rma_init(void);
#else
static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
{}
static inline void kvm_rma_init(void)
{}
#endif
#endif /* __POWERPC_KVM_PPC_H__ */
......@@ -90,13 +90,19 @@ extern char initial_stab[];
#define HPTE_R_PP0 ASM_CONST(0x8000000000000000)
#define HPTE_R_TS ASM_CONST(0x4000000000000000)
#define HPTE_R_KEY_HI ASM_CONST(0x3000000000000000)
#define HPTE_R_RPN_SHIFT 12
#define HPTE_R_RPN ASM_CONST(0x3ffffffffffff000)
#define HPTE_R_FLAGS ASM_CONST(0x00000000000003ff)
#define HPTE_R_RPN ASM_CONST(0x0ffffffffffff000)
#define HPTE_R_PP ASM_CONST(0x0000000000000003)
#define HPTE_R_N ASM_CONST(0x0000000000000004)
#define HPTE_R_G ASM_CONST(0x0000000000000008)
#define HPTE_R_M ASM_CONST(0x0000000000000010)
#define HPTE_R_I ASM_CONST(0x0000000000000020)
#define HPTE_R_W ASM_CONST(0x0000000000000040)
#define HPTE_R_WIMG ASM_CONST(0x0000000000000078)
#define HPTE_R_C ASM_CONST(0x0000000000000080)
#define HPTE_R_R ASM_CONST(0x0000000000000100)
#define HPTE_R_KEY_LO ASM_CONST(0x0000000000000e00)
#define HPTE_V_1TB_SEG ASM_CONST(0x4000000000000000)
#define HPTE_V_VRMA_MASK ASM_CONST(0x4001ffffff000000)
......
......@@ -147,8 +147,11 @@ struct paca_struct {
struct dtl_entry *dtl_curr; /* pointer corresponding to dtl_ridx */
#ifdef CONFIG_KVM_BOOK3S_HANDLER
#ifdef CONFIG_KVM_BOOK3S_PR
/* We use this to store guest state in */
struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
#endif
struct kvmppc_host_state kvm_hstate;
#endif
};
......
......@@ -150,18 +150,22 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
#define SAVE_8EVRS(n,s,base) SAVE_4EVRS(n,s,base); SAVE_4EVRS(n+4,s,base)
#define SAVE_16EVRS(n,s,base) SAVE_8EVRS(n,s,base); SAVE_8EVRS(n+8,s,base)
#define SAVE_32EVRS(n,s,base) SAVE_16EVRS(n,s,base); SAVE_16EVRS(n+16,s,base)
#define REST_EVR(n,s,base) lwz s,THREAD_EVR0+4*(n)(base); evmergelo n,s,n
#define REST_2EVRS(n,s,base) REST_EVR(n,s,base); REST_EVR(n+1,s,base)
#define REST_4EVRS(n,s,base) REST_2EVRS(n,s,base); REST_2EVRS(n+2,s,base)
#define REST_8EVRS(n,s,base) REST_4EVRS(n,s,base); REST_4EVRS(n+4,s,base)
#define REST_16EVRS(n,s,base) REST_8EVRS(n,s,base); REST_8EVRS(n+8,s,base)
#define REST_32EVRS(n,s,base) REST_16EVRS(n,s,base); REST_16EVRS(n+16,s,base)
/*
* b = base register for addressing, o = base offset from register of 1st EVR
* n = first EVR, s = scratch
*/
#define SAVE_EVR(n,s,b,o) evmergehi s,s,n; stw s,o+4*(n)(b)
#define SAVE_2EVRS(n,s,b,o) SAVE_EVR(n,s,b,o); SAVE_EVR(n+1,s,b,o)
#define SAVE_4EVRS(n,s,b,o) SAVE_2EVRS(n,s,b,o); SAVE_2EVRS(n+2,s,b,o)
#define SAVE_8EVRS(n,s,b,o) SAVE_4EVRS(n,s,b,o); SAVE_4EVRS(n+4,s,b,o)
#define SAVE_16EVRS(n,s,b,o) SAVE_8EVRS(n,s,b,o); SAVE_8EVRS(n+8,s,b,o)
#define SAVE_32EVRS(n,s,b,o) SAVE_16EVRS(n,s,b,o); SAVE_16EVRS(n+16,s,b,o)
#define REST_EVR(n,s,b,o) lwz s,o+4*(n)(b); evmergelo n,s,n
#define REST_2EVRS(n,s,b,o) REST_EVR(n,s,b,o); REST_EVR(n+1,s,b,o)
#define REST_4EVRS(n,s,b,o) REST_2EVRS(n,s,b,o); REST_2EVRS(n+2,s,b,o)
#define REST_8EVRS(n,s,b,o) REST_4EVRS(n,s,b,o); REST_4EVRS(n+4,s,b,o)
#define REST_16EVRS(n,s,b,o) REST_8EVRS(n,s,b,o); REST_8EVRS(n+8,s,b,o)
#define REST_32EVRS(n,s,b,o) REST_16EVRS(n,s,b,o); REST_16EVRS(n+16,s,b,o)
/* Macros to adjust thread priority for hardware multithreading */
#define HMT_VERY_LOW or 31,31,31 # very low priority
......
......@@ -189,6 +189,9 @@
#define SPRN_CTR 0x009 /* Count Register */
#define SPRN_DSCR 0x11
#define SPRN_CFAR 0x1c /* Come From Address Register */
#define SPRN_AMR 0x1d /* Authority Mask Register */
#define SPRN_UAMOR 0x9d /* User Authority Mask Override Register */
#define SPRN_AMOR 0x15d /* Authority Mask Override Register */
#define SPRN_ACOP 0x1F /* Available Coprocessor Register */
#define SPRN_CTRLF 0x088
#define SPRN_CTRLT 0x098
......@@ -232,22 +235,28 @@
#define LPCR_VPM0 (1ul << (63-0))
#define LPCR_VPM1 (1ul << (63-1))
#define LPCR_ISL (1ul << (63-2))
#define LPCR_VC_SH (63-2)
#define LPCR_DPFD_SH (63-11)
#define LPCR_VRMA_L (1ul << (63-12))
#define LPCR_VRMA_LP0 (1ul << (63-15))
#define LPCR_VRMA_LP1 (1ul << (63-16))
#define LPCR_VRMASD_SH (63-16)
#define LPCR_RMLS 0x1C000000 /* impl dependent rmo limit sel */
#define LPCR_RMLS_SH (63-37)
#define LPCR_ILE 0x02000000 /* !HV irqs set MSR:LE */
#define LPCR_PECE 0x00007000 /* powersave exit cause enable */
#define LPCR_PECE0 0x00004000 /* ext. exceptions can cause exit */
#define LPCR_PECE1 0x00002000 /* decrementer can cause exit */
#define LPCR_PECE2 0x00001000 /* machine check etc can cause exit */
#define LPCR_MER 0x00000800 /* Mediated External Exception */
#define LPCR_LPES 0x0000000c
#define LPCR_LPES0 0x00000008 /* LPAR Env selector 0 */
#define LPCR_LPES1 0x00000004 /* LPAR Env selector 1 */
#define LPCR_LPES_SH 2
#define LPCR_RMI 0x00000002 /* real mode is cache inhibit */
#define LPCR_HDICE 0x00000001 /* Hyp Decr enable (HV,PR,EE) */
#define SPRN_LPID 0x13F /* Logical Partition Identifier */
#define LPID_RSVD 0x3ff /* Reserved LPID for partn switching */
#define SPRN_HMER 0x150 /* Hardware m? error recovery */
#define SPRN_HMEER 0x151 /* Hardware m? enable error recovery */
#define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */
......@@ -298,6 +307,7 @@
#define SPRN_HASH1 0x3D2 /* Primary Hash Address Register */
#define SPRN_HASH2 0x3D3 /* Secondary Hash Address Resgister */
#define SPRN_HID0 0x3F0 /* Hardware Implementation Register 0 */
#define HID0_HDICE_SH (63 - 23) /* 970 HDEC interrupt enable */
#define HID0_EMCP (1<<31) /* Enable Machine Check pin */
#define HID0_EBA (1<<29) /* Enable Bus Address Parity */
#define HID0_EBD (1<<28) /* Enable Bus Data Parity */
......@@ -353,6 +363,13 @@
#define SPRN_IABR2 0x3FA /* 83xx */
#define SPRN_IBCR 0x135 /* 83xx Insn Breakpoint Control Reg */
#define SPRN_HID4 0x3F4 /* 970 HID4 */
#define HID4_LPES0 (1ul << (63-0)) /* LPAR env. sel. bit 0 */
#define HID4_RMLS2_SH (63 - 2) /* Real mode limit bottom 2 bits */
#define HID4_LPID5_SH (63 - 6) /* partition ID bottom 4 bits */
#define HID4_RMOR_SH (63 - 22) /* real mode offset (16 bits) */
#define HID4_LPES1 (1 << (63-57)) /* LPAR env. sel. bit 1 */
#define HID4_RMLS0_SH (63 - 58) /* Real mode limit top bit */
#define HID4_LPID1_SH 0 /* partition ID top 2 bits */
#define SPRN_HID4_GEKKO 0x3F3 /* Gekko HID4 */
#define SPRN_HID5 0x3F6 /* 970 HID5 */
#define SPRN_HID6 0x3F9 /* BE HID 6 */
......@@ -802,28 +819,28 @@
mfspr rX,SPRN_SPRG_PACA; \
FTR_SECTION_ELSE_NESTED(66); \
mfspr rX,SPRN_SPRG_HPACA; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define SET_PACA(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mtspr SPRN_SPRG_PACA,rX; \
FTR_SECTION_ELSE_NESTED(66); \
mtspr SPRN_SPRG_HPACA,rX; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define GET_SCRATCH0(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mfspr rX,SPRN_SPRG_SCRATCH0; \
FTR_SECTION_ELSE_NESTED(66); \
mfspr rX,SPRN_SPRG_HSCRATCH0; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define SET_SCRATCH0(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mtspr SPRN_SPRG_SCRATCH0,rX; \
FTR_SECTION_ELSE_NESTED(66); \
mtspr SPRN_SPRG_HSCRATCH0,rX; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#else /* CONFIG_PPC_BOOK3S_64 */
#define GET_SCRATCH0(rX) mfspr rX,SPRN_SPRG_SCRATCH0
......
......@@ -318,6 +318,7 @@
#define ESR_ILK 0x00100000 /* Instr. Cache Locking */
#define ESR_PUO 0x00040000 /* Unimplemented Operation exception */
#define ESR_BO 0x00020000 /* Byte Ordering */
#define ESR_SPV 0x00000080 /* Signal Processing operation */
/* Bit definitions related to the DBCR0. */
#if defined(CONFIG_40x)
......
This diff is collapsed.
......@@ -45,12 +45,12 @@ _GLOBAL(__restore_cpu_power7)
blr
__init_hvmode_206:
/* Disable CPU_FTR_HVMODE_206 and exit if MSR:HV is not set */
/* Disable CPU_FTR_HVMODE and exit if MSR:HV is not set */
mfmsr r3
rldicl. r0,r3,4,63
bnelr
ld r5,CPU_SPEC_FEATURES(r4)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE_206)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE)
xor r5,r5,r6
std r5,CPU_SPEC_FEATURES(r4)
blr
......@@ -61,19 +61,23 @@ __init_LPCR:
* LPES = 0b01 (HSRR0/1 used for 0x500)
* PECE = 0b111
* DPFD = 4
* HDICE = 0
* VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
* VRMASD = 0b10000 (L=1, LP=00)
*
* Other bits untouched for now
*/
mfspr r3,SPRN_LPCR
ori r3,r3,(LPCR_LPES0|LPCR_LPES1)
xori r3,r3, LPCR_LPES0
li r5,1
rldimi r3,r5, LPCR_LPES_SH, 64-LPCR_LPES_SH-2
ori r3,r3,(LPCR_PECE0|LPCR_PECE1|LPCR_PECE2)
li r5,7
sldi r5,r5,LPCR_DPFD_SH
andc r3,r3,r5
li r5,4
sldi r5,r5,LPCR_DPFD_SH
or r3,r3,r5
rldimi r3,r5, LPCR_DPFD_SH, 64-LPCR_DPFD_SH-3
clrrdi r3,r3,1 /* clear HDICE */
li r5,4
rldimi r3,r5, LPCR_VC_SH, 0
li r5,0x10
rldimi r3,r5, LPCR_VRMASD_SH, 64-LPCR_VRMASD_SH-5
mtspr SPRN_LPCR,r3
isync
blr
......
......@@ -76,7 +76,7 @@ _GLOBAL(__setup_cpu_ppc970)
/* Do nothing if not running in HV mode */
mfmsr r0
rldicl. r0,r0,4,63
beqlr
beq no_hv_mode
mfspr r0,SPRN_HID0
li r11,5 /* clear DOZE and SLEEP */
......@@ -90,7 +90,7 @@ _GLOBAL(__setup_cpu_ppc970MP)
/* Do nothing if not running in HV mode */
mfmsr r0
rldicl. r0,r0,4,63
beqlr
beq no_hv_mode
mfspr r0,SPRN_HID0
li r11,0x15 /* clear DOZE and SLEEP */
......@@ -109,6 +109,14 @@ load_hids:
sync
isync
/* Try to set LPES = 01 in HID4 */
mfspr r0,SPRN_HID4
clrldi r0,r0,1 /* clear LPES0 */
ori r0,r0,HID4_LPES1 /* set LPES1 */
sync
mtspr SPRN_HID4,r0
isync
/* Save away cpu state */
LOAD_REG_ADDR(r5,cpu_state_storage)
......@@ -117,11 +125,21 @@ load_hids:
std r3,CS_HID0(r5)
mfspr r3,SPRN_HID1
std r3,CS_HID1(r5)
mfspr r3,SPRN_HID4
std r3,CS_HID4(r5)
mfspr r4,SPRN_HID4
std r4,CS_HID4(r5)
mfspr r3,SPRN_HID5
std r3,CS_HID5(r5)
/* See if we successfully set LPES1 to 1; if not we are in Apple mode */
andi. r4,r4,HID4_LPES1
bnelr
no_hv_mode:
/* Disable CPU_FTR_HVMODE and exit, since we don't have HV mode */
ld r5,CPU_SPEC_FEATURES(r4)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE)
andc r5,r5,r6
std r5,CPU_SPEC_FEATURES(r4)
blr
/* Called with no MMU context (typically MSR:IR/DR off) to
......
This diff is collapsed.
......@@ -656,7 +656,7 @@ load_up_spe:
cmpi 0,r4,0
beq 1f
addi r4,r4,THREAD /* want THREAD of last_task_used_spe */
SAVE_32EVRS(0,r10,r4)
SAVE_32EVRS(0,r10,r4,THREAD_EVR0)
evxor evr10, evr10, evr10 /* clear out evr10 */
evmwumiaa evr10, evr10, evr10 /* evr10 <- ACC = 0 * 0 + ACC */
li r5,THREAD_ACC
......@@ -676,7 +676,7 @@ load_up_spe:
stw r4,THREAD_USED_SPE(r5)
evlddx evr4,r10,r5
evmra evr4,evr4
REST_32EVRS(0,r10,r5)
REST_32EVRS(0,r10,r5,THREAD_EVR0)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
stw r4,last_task_used_spe@l(r3)
......@@ -787,13 +787,11 @@ _GLOBAL(giveup_spe)
addi r3,r3,THREAD /* want THREAD of task */
lwz r5,PT_REGS(r3)
cmpi 0,r5,0
SAVE_32EVRS(0, r4, r3)
SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
evxor evr6, evr6, evr6 /* clear out evr6 */
evmwumiaa evr6, evr6, evr6 /* evr6 <- ACC = 0 * 0 + ACC */
li r4,THREAD_ACC
evstddx evr6, r4, r3 /* save off accumulator */
mfspr r6,SPRN_SPEFSCR
stw r6,THREAD_SPEFSCR(r3) /* save spefscr register value */
beq 1f
lwz r4,_MSR-STACK_FRAME_OVERHEAD(r5)
lis r3,MSR_SPE@h
......
......@@ -73,7 +73,6 @@ _GLOBAL(power7_idle)
b .
_GLOBAL(power7_wakeup_loss)
GET_PACA(r13)
ld r1,PACAR1(r13)
REST_NVGPRS(r1)
REST_GPR(2, r1)
......@@ -87,7 +86,6 @@ _GLOBAL(power7_wakeup_loss)
rfid
_GLOBAL(power7_wakeup_noloss)
GET_PACA(r13)
ld r1,PACAR1(r13)
ld r4,_MSR(r1)
ld r5,_NIP(r1)
......
......@@ -167,7 +167,7 @@ void setup_paca(struct paca_struct *new_paca)
* if we do a GET_PACA() before the feature fixups have been
* applied
*/
if (cpu_has_feature(CPU_FTR_HVMODE_206))
if (cpu_has_feature(CPU_FTR_HVMODE))
mtspr(SPRN_SPRG_HPACA, local_paca);
#endif
mtspr(SPRN_SPRG_PACA, local_paca);
......
......@@ -96,6 +96,7 @@ void flush_fp_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_fp_to_thread);
void enable_kernel_fp(void)
{
......@@ -145,6 +146,7 @@ void flush_altivec_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
......@@ -186,6 +188,7 @@ void flush_vsx_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_vsx_to_thread);
#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
......@@ -213,6 +216,7 @@ void flush_spe_to_thread(struct task_struct *tsk)
#ifdef CONFIG_SMP
BUG_ON(tsk != current);
#endif
tsk->thread.spefscr = mfspr(SPRN_SPEFSCR);
giveup_spe(tsk);
}
preempt_enable();
......
......@@ -375,6 +375,9 @@ void __init check_for_initrd(void)
int threads_per_core, threads_shift;
cpumask_t threads_core_mask;
EXPORT_SYMBOL_GPL(threads_per_core);
EXPORT_SYMBOL_GPL(threads_shift);
EXPORT_SYMBOL_GPL(threads_core_mask);
static void __init cpu_init_thread_core_maps(int tpc)
{
......
......@@ -63,6 +63,7 @@
#include <asm/kexec.h>
#include <asm/mmu_context.h>
#include <asm/code-patching.h>
#include <asm/kvm_ppc.h>
#include "setup.h"
......@@ -580,6 +581,8 @@ void __init setup_arch(char **cmdline_p)
/* Initialize the MMU context management stuff */
mmu_context_init();
kvm_rma_init();
ppc64_boot_msg(0x15, "Setup Done");
}
......
......@@ -243,6 +243,7 @@ void smp_send_reschedule(int cpu)
if (likely(smp_ops))
smp_ops->message_pass(cpu, PPC_MSG_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(smp_send_reschedule);
void arch_send_call_function_single_ipi(int cpu)
{
......
......@@ -1387,10 +1387,7 @@ void SPEFloatingPointException(struct pt_regs *regs)
int code = 0;
int err;
preempt_disable();
if (regs->msr & MSR_SPE)
giveup_spe(current);
preempt_enable();
flush_spe_to_thread(current);
spefscr = current->thread.spefscr;
fpexc_mode = current->thread.fpexc_mode;
......
......@@ -387,8 +387,10 @@ static void kvmppc_44x_invalidate(struct kvm_vcpu *vcpu,
}
}
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr)
{
int usermode = vcpu->arch.shared->msr & MSR_PR;
vcpu->arch.shadow_pid = !usermode;
}
......
......@@ -20,7 +20,6 @@ config KVM
bool
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
config KVM_BOOK3S_HANDLER
bool
......@@ -28,16 +27,22 @@ config KVM_BOOK3S_HANDLER
config KVM_BOOK3S_32_HANDLER
bool
select KVM_BOOK3S_HANDLER
select KVM_MMIO
config KVM_BOOK3S_64_HANDLER
bool
select KVM_BOOK3S_HANDLER
config KVM_BOOK3S_PR
bool
select KVM_MMIO
config KVM_BOOK3S_32
tristate "KVM support for PowerPC book3s_32 processors"
depends on EXPERIMENTAL && PPC_BOOK3S_32 && !SMP && !PTE_64BIT
select KVM
select KVM_BOOK3S_32_HANDLER
select KVM_BOOK3S_PR
---help---
Support running unmodified book3s_32 guest kernels
in virtual machines on book3s_32 host processors.
......@@ -50,8 +55,8 @@ config KVM_BOOK3S_32
config KVM_BOOK3S_64
tristate "KVM support for PowerPC book3s_64 processors"
depends on EXPERIMENTAL && PPC_BOOK3S_64
select KVM
select KVM_BOOK3S_64_HANDLER
select KVM
---help---
Support running unmodified book3s_64 and book3s_32 guest kernels
in virtual machines on book3s_64 host processors.
......@@ -61,10 +66,34 @@ config KVM_BOOK3S_64
If unsure, say N.
config KVM_BOOK3S_64_HV
bool "KVM support for POWER7 and PPC970 using hypervisor mode in host"
depends on KVM_BOOK3S_64
---help---
Support running unmodified book3s_64 guest kernels in
virtual machines on POWER7 and PPC970 processors that have
hypervisor mode available to the host.
If you say Y here, KVM will use the hardware virtualization
facilities of POWER7 (and later) processors, meaning that
guest operating systems will run at full hardware speed
using supervisor and user modes. However, this also means
that KVM is not usable under PowerVM (pHyp), is only usable
on POWER7 (or later) processors and PPC970-family processors,
and cannot emulate a different processor from the host processor.
If unsure, say N.
config KVM_BOOK3S_64_PR
def_bool y
depends on KVM_BOOK3S_64 && !KVM_BOOK3S_64_HV
select KVM_BOOK3S_PR
config KVM_440
bool "KVM support for PowerPC 440 processors"
depends on EXPERIMENTAL && 44x
select KVM
select KVM_MMIO
---help---
Support running unmodified 440 guest kernels in virtual machines on
440 host processors.
......@@ -89,6 +118,7 @@ config KVM_E500
bool "KVM support for PowerPC E500 processors"
depends on EXPERIMENTAL && E500
select KVM
select KVM_MMIO
---help---
Support running unmodified E500 guest kernels in virtual machines on
E500 host processors.
......
......@@ -38,24 +38,42 @@ kvm-e500-objs := \
e500_emulate.o
kvm-objs-$(CONFIG_KVM_E500) := $(kvm-e500-objs)
kvm-book3s_64-objs := \
$(common-objs-y) \
kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
../../../virt/kvm/coalesced_mmio.o \
fpu.o \
book3s_paired_singles.o \
book3s.o \
book3s_pr.o \
book3s_emulate.o \
book3s_interrupts.o \
book3s_mmu_hpte.o \
book3s_64_mmu_host.o \
book3s_64_mmu.o \
book3s_32_mmu.o
kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-objs)
kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv.o \
book3s_hv_interrupts.o \
book3s_64_mmu_hv.o
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv_rm_mmu.o \
book3s_64_vio_hv.o \
book3s_hv_builtin.o
kvm-book3s_64-module-objs := \
../../../virt/kvm/kvm_main.o \
powerpc.o \
emulate.o \
book3s.o \
$(kvm-book3s_64-objs-y)
kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
kvm-book3s_32-objs := \
$(common-objs-y) \
fpu.o \
book3s_paired_singles.o \
book3s.o \
book3s_pr.o \
book3s_emulate.o \
book3s_interrupts.o \
book3s_mmu_hpte.o \
......@@ -70,3 +88,4 @@ obj-$(CONFIG_KVM_E500) += kvm.o
obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
obj-$(CONFIG_KVM_BOOK3S_32) += kvm.o
obj-y += $(kvm-book3s_64-builtin-objs-y)
This diff is collapsed.
......@@ -41,36 +41,36 @@ static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
}
static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
struct kvmppc_vcpu_book3s *vcpu_book3s,
struct kvm_vcpu *vcpu,
gva_t eaddr)
{
int i;
u64 esid = GET_ESID(eaddr);
u64 esid_1t = GET_ESID_1T(eaddr);
for (i = 0; i < vcpu_book3s->slb_nr; i++) {
for (i = 0; i < vcpu->arch.slb_nr; i++) {
u64 cmp_esid = esid;
if (!vcpu_book3s->slb[i].valid)
if (!vcpu->arch.slb[i].valid)
continue;
if (vcpu_book3s->slb[i].tb)
if (vcpu->arch.slb[i].tb)
cmp_esid = esid_1t;
if (vcpu_book3s->slb[i].esid == cmp_esid)
return &vcpu_book3s->slb[i];
if (vcpu->arch.slb[i].esid == cmp_esid)
return &vcpu->arch.slb[i];
}
dprintk("KVM: No SLB entry found for 0x%lx [%llx | %llx]\n",
eaddr, esid, esid_1t);
for (i = 0; i < vcpu_book3s->slb_nr; i++) {
if (vcpu_book3s->slb[i].vsid)
for (i = 0; i < vcpu->arch.slb_nr; i++) {
if (vcpu->arch.slb[i].vsid)
dprintk(" %d: %c%c%c %llx %llx\n", i,
vcpu_book3s->slb[i].valid ? 'v' : ' ',
vcpu_book3s->slb[i].large ? 'l' : ' ',
vcpu_book3s->slb[i].tb ? 't' : ' ',
vcpu_book3s->slb[i].esid,
vcpu_book3s->slb[i].vsid);
vcpu->arch.slb[i].valid ? 'v' : ' ',
vcpu->arch.slb[i].large ? 'l' : ' ',
vcpu->arch.slb[i].tb ? 't' : ' ',
vcpu->arch.slb[i].esid,
vcpu->arch.slb[i].vsid);
}
return NULL;
......@@ -81,7 +81,7 @@ static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
{
struct kvmppc_slb *slb;
slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, eaddr);
if (!slb)
return 0;
......@@ -180,7 +180,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return 0;
}
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu, eaddr);
if (!slbe)
goto no_seg_found;
......@@ -320,10 +320,10 @@ static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
esid_1t = GET_ESID_1T(rb);
slb_nr = rb & 0xfff;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
slbe->tb = (rs & SLB_VSID_B_1T) ? 1 : 0;
......@@ -344,38 +344,35 @@ static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return 0;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
return slbe->orige;
}
static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return 0;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
return slbe->origv;
}
static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
dprintk("KVM MMU: slbie(0x%llx)\n", ea);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (!slbe)
return;
......@@ -389,13 +386,12 @@ static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
int i;
dprintk("KVM MMU: slbia()\n");
for (i = 1; i < vcpu_book3s->slb_nr; i++)
vcpu_book3s->slb[i].valid = false;
for (i = 1; i < vcpu->arch.slb_nr; i++)
vcpu->arch.slb[i].valid = false;
if (vcpu->arch.shared->msr & MSR_IR) {
kvmppc_mmu_flush_segments(vcpu);
......@@ -464,7 +460,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
ulong mp_ea = vcpu->arch.magic_page_ea;
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (slb)
gvsid = slb->vsid;
}
......
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#include <asm/mmu-hash64.h>
#include <asm/hvcall.h>
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
#include <asm/cputable.h>
/* For now use fixed-size 16MB page table */
#define HPT_ORDER 24
#define HPT_NPTEG (1ul << (HPT_ORDER - 7)) /* 128B per pteg */
#define HPT_HASH_MASK (HPT_NPTEG - 1)
/* Pages in the VRMA are 16MB pages */
#define VRMA_PAGE_ORDER 24
#define VRMA_VSID 0x1ffffffUL /* 1TB VSID reserved for VRMA */
/* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
#define MAX_LPID_970 63
#define NR_LPIDS (LPID_RSVD + 1)
unsigned long lpid_inuse[BITS_TO_LONGS(NR_LPIDS)];
long kvmppc_alloc_hpt(struct kvm *kvm)
{
unsigned long hpt;
unsigned long lpid;
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
HPT_ORDER - PAGE_SHIFT);
if (!hpt) {
pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
return -ENOMEM;
}
kvm->arch.hpt_virt = hpt;
do {
lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
if (lpid >= NR_LPIDS) {
pr_err("kvm_alloc_hpt: No LPIDs free\n");
free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
return -ENOMEM;
}
} while (test_and_set_bit(lpid, lpid_inuse));
kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
kvm->arch.lpid = lpid;
pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
return 0;
}
void kvmppc_free_hpt(struct kvm *kvm)
{
clear_bit(kvm->arch.lpid, lpid_inuse);
free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
}
void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
{
unsigned long i;
unsigned long npages = kvm->arch.ram_npages;
unsigned long pfn;
unsigned long *hpte;
unsigned long hash;
struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
if (!pginfo)
return;
/* VRMA can't be > 1TB */
if (npages > 1ul << (40 - kvm->arch.ram_porder))
npages = 1ul << (40 - kvm->arch.ram_porder);
/* Can't use more than 1 HPTE per HPTEG */
if (npages > HPT_NPTEG)
npages = HPT_NPTEG;
for (i = 0; i < npages; ++i) {
pfn = pginfo[i].pfn;
if (!pfn)
break;
/* can't use hpt_hash since va > 64 bits */
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
/*
* We assume that the hash table is empty and no
* vcpus are using it at this stage. Since we create
* at most one HPTE per HPTEG, we just assume entry 7
* is available and use it.
*/
hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 7));
hpte += 7 * 2;
/* HPTE low word - RPN, protection, etc. */
hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C |
HPTE_R_M | PP_RWXX;
wmb();
hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
(i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED |
HPTE_V_LARGE | HPTE_V_VALID;
}
}
int kvmppc_mmu_hv_init(void)
{
unsigned long host_lpid, rsvd_lpid;
if (!cpu_has_feature(CPU_FTR_HVMODE))
return -EINVAL;
memset(lpid_inuse, 0, sizeof(lpid_inuse));
if (cpu_has_feature(CPU_FTR_ARCH_206)) {
host_lpid = mfspr(SPRN_LPID); /* POWER7 */
rsvd_lpid = LPID_RSVD;
} else {
host_lpid = 0; /* PPC970 */
rsvd_lpid = MAX_LPID_970;
}
set_bit(host_lpid, lpid_inuse);
/* rsvd_lpid is reserved for use in partition switching */
set_bit(rsvd_lpid, lpid_inuse);
return 0;
}
void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
{
}
static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
{
kvmppc_set_msr(vcpu, MSR_SF | MSR_ME);
}
static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data)
{
return -ENOENT;
}
void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
{
struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
if (cpu_has_feature(CPU_FTR_ARCH_206))
vcpu->arch.slb_nr = 32; /* POWER7 */
else
vcpu->arch.slb_nr = 64;
mmu->xlate = kvmppc_mmu_book3s_64_hv_xlate;
mmu->reset_msr = kvmppc_mmu_book3s_64_hv_reset_msr;
vcpu->arch.hflags |= BOOK3S_HFLAG_SLB;
}
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
* Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/hugetlb.h>
#include <linux/list.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#include <asm/mmu-hash64.h>
#include <asm/hvcall.h>
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
#include <asm/kvm_host.h>
#include <asm/udbg.h>
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce)
{
struct kvm *kvm = vcpu->kvm;
struct kvmppc_spapr_tce_table *stt;
/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
/* liobn, ioba, tce); */
list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
if (stt->liobn == liobn) {
unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
struct page *page;
u64 *tbl;
/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p window_size=0x%x\n", */
/* liobn, stt, stt->window_size); */
if (ioba >= stt->window_size)
return H_PARAMETER;
page = stt->pages[idx / TCES_PER_PAGE];
tbl = (u64 *)page_address(page);
/* FIXME: Need to validate the TCE itself */
/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
tbl[idx % TCES_PER_PAGE] = tce;
return H_SUCCESS;
}
}
/* Didn't find the liobn, punt it to userspace */
return H_TOO_HARD;
}
......@@ -20,8 +20,11 @@
#include <linux/module.h>
#include <asm/kvm_book3s.h>
EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
#ifdef CONFIG_KVM_BOOK3S_64_HV
EXPORT_SYMBOL_GPL(kvmppc_hv_entry_trampoline);
#else
EXPORT_SYMBOL_GPL(kvmppc_handler_trampoline_enter);
EXPORT_SYMBOL_GPL(kvmppc_handler_lowmem_trampoline);
EXPORT_SYMBOL_GPL(kvmppc_rmcall);
EXPORT_SYMBOL_GPL(kvmppc_load_up_fpu);
#ifdef CONFIG_ALTIVEC
......@@ -30,3 +33,5 @@ EXPORT_SYMBOL_GPL(kvmppc_load_up_altivec);
#ifdef CONFIG_VSX
EXPORT_SYMBOL_GPL(kvmppc_load_up_vsx);
#endif
#endif
This diff is collapsed.
/*
* Copyright 2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*/
#include <linux/kvm_host.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/bootmem.h>
#include <linux/init.h>
#include <asm/cputable.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
/*
* This maintains a list of RMAs (real mode areas) for KVM guests to use.
* Each RMA has to be physically contiguous and of a size that the
* hardware supports. PPC970 and POWER7 support 64MB, 128MB and 256MB,
* and other larger sizes. Since we are unlikely to be allocate that
* much physically contiguous memory after the system is up and running,
* we preallocate a set of RMAs in early boot for KVM to use.
*/
static unsigned long kvm_rma_size = 64 << 20; /* 64MB */
static unsigned long kvm_rma_count;
static int __init early_parse_rma_size(char *p)
{
if (!p)
return 1;
kvm_rma_size = memparse(p, &p);
return 0;
}
early_param("kvm_rma_size", early_parse_rma_size);
static int __init early_parse_rma_count(char *p)
{
if (!p)
return 1;
kvm_rma_count = simple_strtoul(p, NULL, 0);
return 0;
}
early_param("kvm_rma_count", early_parse_rma_count);
static struct kvmppc_rma_info *rma_info;
static LIST_HEAD(free_rmas);
static DEFINE_SPINLOCK(rma_lock);
/* Work out RMLS (real mode limit selector) field value for a given RMA size.
Assumes POWER7 or PPC970. */
static inline int lpcr_rmls(unsigned long rma_size)
{
switch (rma_size) {
case 32ul << 20: /* 32 MB */
if (cpu_has_feature(CPU_FTR_ARCH_206))
return 8; /* only supported on POWER7 */
return -1;
case 64ul << 20: /* 64 MB */
return 3;
case 128ul << 20: /* 128 MB */
return 7;
case 256ul << 20: /* 256 MB */
return 4;
case 1ul << 30: /* 1 GB */
return 2;
case 16ul << 30: /* 16 GB */
return 1;
case 256ul << 30: /* 256 GB */
return 0;
default:
return -1;
}
}
/*
* Called at boot time while the bootmem allocator is active,
* to allocate contiguous physical memory for the real memory
* areas for guests.
*/
void kvm_rma_init(void)
{
unsigned long i;
unsigned long j, npages;
void *rma;
struct page *pg;
/* Only do this on PPC970 in HV mode */
if (!cpu_has_feature(CPU_FTR_HVMODE) ||
!cpu_has_feature(CPU_FTR_ARCH_201))
return;
if (!kvm_rma_size || !kvm_rma_count)
return;
/* Check that the requested size is one supported in hardware */
if (lpcr_rmls(kvm_rma_size) < 0) {
pr_err("RMA size of 0x%lx not supported\n", kvm_rma_size);
return;
}
npages = kvm_rma_size >> PAGE_SHIFT;
rma_info = alloc_bootmem(kvm_rma_count * sizeof(struct kvmppc_rma_info));
for (i = 0; i < kvm_rma_count; ++i) {
rma = alloc_bootmem_align(kvm_rma_size, kvm_rma_size);
pr_info("Allocated KVM RMA at %p (%ld MB)\n", rma,
kvm_rma_size >> 20);
rma_info[i].base_virt = rma;
rma_info[i].base_pfn = __pa(rma) >> PAGE_SHIFT;
rma_info[i].npages = npages;
list_add_tail(&rma_info[i].list, &free_rmas);
atomic_set(&rma_info[i].use_count, 0);
pg = pfn_to_page(rma_info[i].base_pfn);
for (j = 0; j < npages; ++j) {
atomic_inc(&pg->_count);
++pg;
}
}
}
struct kvmppc_rma_info *kvm_alloc_rma(void)
{
struct kvmppc_rma_info *ri;
ri = NULL;
spin_lock(&rma_lock);
if (!list_empty(&free_rmas)) {
ri = list_first_entry(&free_rmas, struct kvmppc_rma_info, list);
list_del(&ri->list);
atomic_inc(&ri->use_count);
}
spin_unlock(&rma_lock);
return ri;
}
EXPORT_SYMBOL_GPL(kvm_alloc_rma);
void kvm_release_rma(struct kvmppc_rma_info *ri)
{
if (atomic_dec_and_test(&ri->use_count)) {
spin_lock(&rma_lock);
list_add_tail(&ri->list, &free_rmas);
spin_unlock(&rma_lock);
}
}
EXPORT_SYMBOL_GPL(kvm_release_rma);
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*
* Derived from book3s_interrupts.S, which is:
* Copyright SUSE Linux Products GmbH 2009
*
* Authors: Alexander Graf <agraf@suse.de>
*/
#include <asm/ppc_asm.h>
#include <asm/kvm_asm.h>
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/asm-offsets.h>
#include <asm/exception-64s.h>
#include <asm/ppc-opcode.h>
/*****************************************************************************
* *
* Guest entry / exit code that is in kernel module memory (vmalloc) *
* *
****************************************************************************/
/* Registers:
* r4: vcpu pointer
*/
_GLOBAL(__kvmppc_vcore_entry)
/* Write correct stack frame */
mflr r0
std r0,PPC_LR_STKOFF(r1)
/* Save host state to the stack */
stdu r1, -SWITCH_FRAME_SIZE(r1)
/* Save non-volatile registers (r14 - r31) */
SAVE_NVGPRS(r1)
/* Save host DSCR */
BEGIN_FTR_SECTION
mfspr r3, SPRN_DSCR
std r3, HSTATE_DSCR(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
/* Save host DABR */
mfspr r3, SPRN_DABR
std r3, HSTATE_DABR(r13)
/* Hard-disable interrupts */
mfmsr r10
std r10, HSTATE_HOST_MSR(r13)
rldicl r10,r10,48,1
rotldi r10,r10,16
mtmsrd r10,1
/* Save host PMU registers and load guest PMU registers */
/* R4 is live here (vcpu pointer) but not r3 or r5 */
li r3, 1
sldi r3, r3, 31 /* MMCR0_FC (freeze counters) bit */
mfspr r7, SPRN_MMCR0 /* save MMCR0 */
mtspr SPRN_MMCR0, r3 /* freeze all counters, disable interrupts */
isync
ld r3, PACALPPACAPTR(r13) /* is the host using the PMU? */
lbz r5, LPPACA_PMCINUSE(r3)
cmpwi r5, 0
beq 31f /* skip if not */
mfspr r5, SPRN_MMCR1
mfspr r6, SPRN_MMCRA
std r7, HSTATE_MMCR(r13)
std r5, HSTATE_MMCR + 8(r13)
std r6, HSTATE_MMCR + 16(r13)
mfspr r3, SPRN_PMC1
mfspr r5, SPRN_PMC2
mfspr r6, SPRN_PMC3
mfspr r7, SPRN_PMC4
mfspr r8, SPRN_PMC5
mfspr r9, SPRN_PMC6
BEGIN_FTR_SECTION
mfspr r10, SPRN_PMC7
mfspr r11, SPRN_PMC8
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
stw r3, HSTATE_PMC(r13)
stw r5, HSTATE_PMC + 4(r13)
stw r6, HSTATE_PMC + 8(r13)
stw r7, HSTATE_PMC + 12(r13)
stw r8, HSTATE_PMC + 16(r13)
stw r9, HSTATE_PMC + 20(r13)
BEGIN_FTR_SECTION
stw r10, HSTATE_PMC + 24(r13)
stw r11, HSTATE_PMC + 28(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
31:
/*
* Put whatever is in the decrementer into the
* hypervisor decrementer.
*/
mfspr r8,SPRN_DEC
mftb r7
mtspr SPRN_HDEC,r8
extsw r8,r8
add r8,r8,r7
std r8,HSTATE_DECEXP(r13)
/*
* On PPC970, if the guest vcpu has an external interrupt pending,
* send ourselves an IPI so as to interrupt the guest once it
* enables interrupts. (It must have interrupts disabled,
* otherwise we would already have delivered the interrupt.)
*/
BEGIN_FTR_SECTION
ld r0, VCPU_PENDING_EXC(r4)
li r7, (1 << BOOK3S_IRQPRIO_EXTERNAL)
oris r7, r7, (1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
and. r0, r0, r7
beq 32f
mr r31, r4
lhz r3, PACAPACAINDEX(r13)
bl smp_send_reschedule
nop
mr r4, r31
32:
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
/* Jump to partition switch code */
bl .kvmppc_hv_entry_trampoline
nop
/*
* We return here in virtual mode after the guest exits
* with something that we can't handle in real mode.
* Interrupts are enabled again at this point.
*/
.global kvmppc_handler_highmem
kvmppc_handler_highmem:
/*
* Register usage at this point:
*
* R1 = host R1
* R2 = host R2
* R12 = exit handler id
* R13 = PACA
*/
/* Restore non-volatile host registers (r14 - r31) */
REST_NVGPRS(r1)
addi r1, r1, SWITCH_FRAME_SIZE
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr
This diff is collapsed.
This diff is collapsed.
......@@ -29,8 +29,7 @@
#define ULONG_SIZE 8
#define FUNC(name) GLUE(.,name)
#define GET_SHADOW_VCPU(reg) \
addi reg, r13, PACA_KVM_SVCPU
#define GET_SHADOW_VCPU_R13
#define DISABLE_INTERRUPTS \
mfmsr r0; \
......@@ -43,8 +42,8 @@
#define ULONG_SIZE 4
#define FUNC(name) name
#define GET_SHADOW_VCPU(reg) \
lwz reg, (THREAD + THREAD_KVM_SVCPU)(r2)
#define GET_SHADOW_VCPU_R13 \
lwz r13, (THREAD + THREAD_KVM_SVCPU)(r2)
#define DISABLE_INTERRUPTS \
mfmsr r0; \
......@@ -85,7 +84,7 @@
* r3: kvm_run pointer
* r4: vcpu pointer
*/
_GLOBAL(__kvmppc_vcpu_entry)
_GLOBAL(__kvmppc_vcpu_run)
kvm_start_entry:
/* Write correct stack frame */
......@@ -107,17 +106,11 @@ kvm_start_entry:
/* Load non-volatile guest state from the vcpu */
VCPU_LOAD_NVGPRS(r4)
GET_SHADOW_VCPU(r5)
/* Save R1/R2 in the PACA */
PPC_STL r1, SVCPU_HOST_R1(r5)
PPC_STL r2, SVCPU_HOST_R2(r5)
kvm_start_lightweight:
/* XXX swap in/out on load? */
GET_SHADOW_VCPU_R13
PPC_LL r3, VCPU_HIGHMEM_HANDLER(r4)
PPC_STL r3, SVCPU_VMHANDLER(r5)
kvm_start_lightweight:
PPC_STL r3, HSTATE_VMHANDLER(r13)
PPC_LL r10, VCPU_SHADOW_MSR(r4) /* r10 = vcpu->arch.shadow_msr */
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -52,24 +52,19 @@
extern unsigned long kvmppc_booke_handlers;
/* Helper function for "full" MSR writes. No need to call this if only EE is
* changing. */
static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
{
if ((new_msr & MSR_PR) != (vcpu->arch.shared->msr & MSR_PR))
kvmppc_mmu_priv_switch(vcpu, new_msr & MSR_PR);
vcpu->arch.shared->msr = new_msr;
if (vcpu->arch.shared->msr & MSR_WE) {
kvm_vcpu_block(vcpu);
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
};
}
void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr);
void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr);
int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance);
int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt);
int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs);
/* low-level asm code to transfer guest state */
void kvmppc_load_guest_spe(struct kvm_vcpu *vcpu);
void kvmppc_save_guest_spe(struct kvm_vcpu *vcpu);
/* high-level function, manages flags, host state */
void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
#endif /* __KVM_BOOKE_H__ */
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment