Commit b05d59df authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
parents daf342af 820b3fcd
......@@ -21,7 +21,15 @@ to #0.
Main node required properties:
- compatible : Must be "arm,psci"
- compatible : should contain at least one of:
* "arm,psci" : for implementations complying to PSCI versions prior to
0.2. For these cases function IDs must be provided.
* "arm,psci-0.2" : for implementations complying to PSCI 0.2. Function
IDs are not required and should be ignored by an OS with PSCI 0.2
support, but are permitted to be present for compatibility with
existing software when "arm,psci" is later in the compatible list.
- method : The method of calling the PSCI firmware. Permitted
values are:
......@@ -45,6 +53,8 @@ Main node optional properties:
Example:
Case 1: PSCI v0.1 only.
psci {
compatible = "arm,psci";
method = "smc";
......@@ -53,3 +63,28 @@ Example:
cpu_on = <0x95c10002>;
migrate = <0x95c10003>;
};
Case 2: PSCI v0.2 only
psci {
compatible = "arm,psci-0.2";
method = "smc";
};
Case 3: PSCI v0.2 and PSCI v0.1.
A DTB may provide IDs for use by kernels without PSCI 0.2 support,
enabling firmware and hypervisors to support existing and new kernels.
These IDs will be ignored by kernels with PSCI 0.2 support, which will
use the standard PSCI 0.2 IDs exclusively.
psci {
compatible = "arm,psci-0.2", "arm,psci";
method = "hvc";
cpu_on = < arbitrary value >;
cpu_off = < arbitrary value >;
...
};
......@@ -1794,6 +1794,11 @@ registers, find a list below:
PPC | KVM_REG_PPC_MMCR0 | 64
PPC | KVM_REG_PPC_MMCR1 | 64
PPC | KVM_REG_PPC_MMCRA | 64
PPC | KVM_REG_PPC_MMCR2 | 64
PPC | KVM_REG_PPC_MMCRS | 64
PPC | KVM_REG_PPC_SIAR | 64
PPC | KVM_REG_PPC_SDAR | 64
PPC | KVM_REG_PPC_SIER | 64
PPC | KVM_REG_PPC_PMC1 | 32
PPC | KVM_REG_PPC_PMC2 | 32
PPC | KVM_REG_PPC_PMC3 | 32
......@@ -1868,6 +1873,7 @@ registers, find a list below:
PPC | KVM_REG_PPC_PPR | 64
PPC | KVM_REG_PPC_ARCH_COMPAT 32
PPC | KVM_REG_PPC_DABRX | 32
PPC | KVM_REG_PPC_WORT | 64
PPC | KVM_REG_PPC_TM_GPR0 | 64
...
PPC | KVM_REG_PPC_TM_GPR31 | 64
......@@ -2211,6 +2217,8 @@ KVM_S390_SIGP_STOP (vcpu) - sigp restart
KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
KVM_S390_RESTART (vcpu) - restart
KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
parameters in parm and parm64
KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
......@@ -2314,8 +2322,8 @@ struct kvm_create_device {
4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
Capability: KVM_CAP_DEVICE_CTRL
Type: device ioctl
Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
Type: device ioctl, vm ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
......@@ -2340,8 +2348,8 @@ struct kvm_device_attr {
4.81 KVM_HAS_DEVICE_ATTR
Capability: KVM_CAP_DEVICE_CTRL
Type: device ioctl
Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
Type: device ioctl, vm ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
......@@ -2376,6 +2384,8 @@ Possible features:
Depends on KVM_CAP_ARM_PSCI.
- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
Depends on KVM_CAP_ARM_PSCI_0_2.
4.83 KVM_ARM_PREFERRED_TARGET
......@@ -2738,6 +2748,21 @@ It gets triggered whenever both KVM_CAP_PPC_EPR are enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.
/* KVM_EXIT_SYSTEM_EVENT */
struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET 2
__u32 type;
__u64 flags;
} system_event;
If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
a system-level event using some architecture specific mechanism (hypercall
or some special instruction). In case of ARM/ARM64, this is triggered using
HVC instruction based PSCI call from the vcpu. The 'type' field describes
the system-level event type. The 'flags' field describes architecture
specific flags for the system-level event.
/* Fix the size of the union. */
char padding[256];
};
......
Generic vm interface
====================================
The virtual machine "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same
struct kvm_device_attr as other devices, but targets VM-wide settings
and controls.
The groups and attributes per virtual machine, if any, are architecture
specific.
1. GROUP: KVM_S390_VM_MEM_CTRL
Architectures: s390
1.1. ATTRIBUTE: KVM_S390_VM_MEM_CTRL
Parameters: none
Returns: -EBUSY if already a vcpus is defined, otherwise 0
Enables CMMA for the virtual machine
1.2. ATTRIBUTE: KVM_S390_VM_CLR_CMMA
Parameteres: none
Returns: 0
Clear the CMMA status for all guest pages, so any pages the guest marked
as unused are again used any may not be reclaimed by the host.
......@@ -94,10 +94,24 @@ a bitmap of available features inside the magic page.
The following enhancements to the magic page are currently available:
KVM_MAGIC_FEAT_SR Maps SR registers r/w in the magic page
KVM_MAGIC_FEAT_MAS0_TO_SPRG7 Maps MASn, ESR, PIR and high SPRGs
For enhanced features in the magic page, please check for the existence of the
feature before using them!
Magic page flags
================
In addition to features that indicate whether a host is capable of a particular
feature we also have a channel for a guest to tell the guest whether it's capable
of something. This is what we call "flags".
Flags are passed to the host in the low 12 bits of the Effective Address.
The following flags are currently available for a guest to expose:
MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correclty wrt magic page
MSR bits
========
......
......@@ -78,3 +78,5 @@ DIAGNOSE function code 'X'501 - KVM breakpoint
If the function code specifies 0x501, breakpoint functions may be performed.
This function code is handled by userspace.
This diagnose function code has no subfunctions and uses no parameters.
......@@ -36,7 +36,7 @@
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#define KVM_HAVE_ONE_REG
#define KVM_VCPU_MAX_FEATURES 1
#define KVM_VCPU_MAX_FEATURES 2
#include <kvm/arm_vgic.h>
......
......@@ -18,6 +18,10 @@
#ifndef __ARM_KVM_PSCI_H__
#define __ARM_KVM_PSCI_H__
bool kvm_psci_call(struct kvm_vcpu *vcpu);
#define KVM_ARM_PSCI_0_1 1
#define KVM_ARM_PSCI_0_2 2
int kvm_psci_version(struct kvm_vcpu *vcpu);
int kvm_psci_call(struct kvm_vcpu *vcpu);
#endif /* __ARM_KVM_PSCI_H__ */
......@@ -29,16 +29,19 @@ struct psci_operations {
int (*cpu_off)(struct psci_power_state state);
int (*cpu_on)(unsigned long cpuid, unsigned long entry_point);
int (*migrate)(unsigned long cpuid);
int (*affinity_info)(unsigned long target_affinity,
unsigned long lowest_affinity_level);
int (*migrate_info_type)(void);
};
extern struct psci_operations psci_ops;
extern struct smp_operations psci_smp_ops;
#ifdef CONFIG_ARM_PSCI
void psci_init(void);
int psci_init(void);
bool psci_smp_available(void);
#else
static inline void psci_init(void) { }
static inline int psci_init(void) { return 0; }
static inline bool psci_smp_available(void) { return false; }
#endif
......
......@@ -20,6 +20,7 @@
#define __ARM_KVM_H__
#include <linux/types.h>
#include <linux/psci.h>
#include <asm/ptrace.h>
#define __KVM_HAVE_GUEST_DEBUG
......@@ -83,6 +84,7 @@ struct kvm_regs {
#define KVM_VGIC_V2_CPU_SIZE 0x2000
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
#define KVM_ARM_VCPU_PSCI_0_2 1 /* CPU uses PSCI v0.2 */
struct kvm_vcpu_init {
__u32 target;
......@@ -201,9 +203,9 @@ struct kvm_arch_memory_slot {
#define KVM_PSCI_FN_CPU_ON KVM_PSCI_FN(2)
#define KVM_PSCI_FN_MIGRATE KVM_PSCI_FN(3)
#define KVM_PSCI_RET_SUCCESS 0
#define KVM_PSCI_RET_NI ((unsigned long)-1)
#define KVM_PSCI_RET_INVAL ((unsigned long)-2)
#define KVM_PSCI_RET_DENIED ((unsigned long)-3)
#define KVM_PSCI_RET_SUCCESS PSCI_RET_SUCCESS
#define KVM_PSCI_RET_NI PSCI_RET_NOT_SUPPORTED
#define KVM_PSCI_RET_INVAL PSCI_RET_INVALID_PARAMS
#define KVM_PSCI_RET_DENIED PSCI_RET_DENIED
#endif /* __ARM_KVM_H__ */
......@@ -17,63 +17,58 @@
#include <linux/init.h>
#include <linux/of.h>
#include <linux/reboot.h>
#include <linux/pm.h>
#include <uapi/linux/psci.h>
#include <asm/compiler.h>
#include <asm/errno.h>
#include <asm/opcodes-sec.h>
#include <asm/opcodes-virt.h>
#include <asm/psci.h>
#include <asm/system_misc.h>
struct psci_operations psci_ops;
static int (*invoke_psci_fn)(u32, u32, u32, u32);
typedef int (*psci_initcall_t)(const struct device_node *);
enum psci_function {
PSCI_FN_CPU_SUSPEND,
PSCI_FN_CPU_ON,
PSCI_FN_CPU_OFF,
PSCI_FN_MIGRATE,
PSCI_FN_AFFINITY_INFO,
PSCI_FN_MIGRATE_INFO_TYPE,
PSCI_FN_MAX,
};
static u32 psci_function_id[PSCI_FN_MAX];
#define PSCI_RET_SUCCESS 0
#define PSCI_RET_EOPNOTSUPP -1
#define PSCI_RET_EINVAL -2
#define PSCI_RET_EPERM -3
static int psci_to_linux_errno(int errno)
{
switch (errno) {
case PSCI_RET_SUCCESS:
return 0;
case PSCI_RET_EOPNOTSUPP:
case PSCI_RET_NOT_SUPPORTED:
return -EOPNOTSUPP;
case PSCI_RET_EINVAL:
case PSCI_RET_INVALID_PARAMS:
return -EINVAL;
case PSCI_RET_EPERM:
case PSCI_RET_DENIED:
return -EPERM;
};
return -EINVAL;
}
#define PSCI_POWER_STATE_ID_MASK 0xffff
#define PSCI_POWER_STATE_ID_SHIFT 0
#define PSCI_POWER_STATE_TYPE_MASK 0x1
#define PSCI_POWER_STATE_TYPE_SHIFT 16
#define PSCI_POWER_STATE_AFFL_MASK 0x3
#define PSCI_POWER_STATE_AFFL_SHIFT 24
static u32 psci_power_state_pack(struct psci_power_state state)
{
return ((state.id & PSCI_POWER_STATE_ID_MASK)
<< PSCI_POWER_STATE_ID_SHIFT) |
((state.type & PSCI_POWER_STATE_TYPE_MASK)
<< PSCI_POWER_STATE_TYPE_SHIFT) |
((state.affinity_level & PSCI_POWER_STATE_AFFL_MASK)
<< PSCI_POWER_STATE_AFFL_SHIFT);
return ((state.id << PSCI_0_2_POWER_STATE_ID_SHIFT)
& PSCI_0_2_POWER_STATE_ID_MASK) |
((state.type << PSCI_0_2_POWER_STATE_TYPE_SHIFT)
& PSCI_0_2_POWER_STATE_TYPE_MASK) |
((state.affinity_level << PSCI_0_2_POWER_STATE_AFFL_SHIFT)
& PSCI_0_2_POWER_STATE_AFFL_MASK);
}
/*
......@@ -110,6 +105,14 @@ static noinline int __invoke_psci_fn_smc(u32 function_id, u32 arg0, u32 arg1,
return function_id;
}
static int psci_get_version(void)
{
int err;
err = invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
return err;
}
static int psci_cpu_suspend(struct psci_power_state state,
unsigned long entry_point)
{
......@@ -153,26 +156,36 @@ static int psci_migrate(unsigned long cpuid)
return psci_to_linux_errno(err);
}
static const struct of_device_id psci_of_match[] __initconst = {
{ .compatible = "arm,psci", },
{},
};
static int psci_affinity_info(unsigned long target_affinity,
unsigned long lowest_affinity_level)
{
int err;
u32 fn;
fn = psci_function_id[PSCI_FN_AFFINITY_INFO];
err = invoke_psci_fn(fn, target_affinity, lowest_affinity_level, 0);
return err;
}
void __init psci_init(void)
static int psci_migrate_info_type(void)
{
struct device_node *np;
const char *method;
u32 id;
int err;
u32 fn;
np = of_find_matching_node(NULL, psci_of_match);
if (!np)
return;
fn = psci_function_id[PSCI_FN_MIGRATE_INFO_TYPE];
err = invoke_psci_fn(fn, 0, 0, 0);
return err;
}
static int get_set_conduit_method(struct device_node *np)
{
const char *method;
pr_info("probing function IDs from device-tree\n");
pr_info("probing for conduit method from DT.\n");
if (of_property_read_string(np, "method", &method)) {
pr_warning("missing \"method\" property\n");
goto out_put_node;
pr_warn("missing \"method\" property\n");
return -ENXIO;
}
if (!strcmp("hvc", method)) {
......@@ -180,10 +193,99 @@ void __init psci_init(void)
} else if (!strcmp("smc", method)) {
invoke_psci_fn = __invoke_psci_fn_smc;
} else {
pr_warning("invalid \"method\" property: %s\n", method);
pr_warn("invalid \"method\" property: %s\n", method);
return -EINVAL;
}
return 0;
}
static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd)
{
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
}
static void psci_sys_poweroff(void)
{
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
}
/*
* PSCI Function IDs for v0.2+ are well defined so use
* standard values.
*/
static int psci_0_2_init(struct device_node *np)
{
int err, ver;
err = get_set_conduit_method(np);
if (err)
goto out_put_node;
ver = psci_get_version();
if (ver == PSCI_RET_NOT_SUPPORTED) {
/* PSCI v0.2 mandates implementation of PSCI_ID_VERSION. */
pr_err("PSCI firmware does not comply with the v0.2 spec.\n");
err = -EOPNOTSUPP;
goto out_put_node;
} else {
pr_info("PSCIv%d.%d detected in firmware.\n",
PSCI_VERSION_MAJOR(ver),
PSCI_VERSION_MINOR(ver));
if (PSCI_VERSION_MAJOR(ver) == 0 &&
PSCI_VERSION_MINOR(ver) < 2) {
err = -EINVAL;
pr_err("Conflicting PSCI version detected.\n");
goto out_put_node;
}
}
pr_info("Using standard PSCI v0.2 function IDs\n");
psci_function_id[PSCI_FN_CPU_SUSPEND] = PSCI_0_2_FN_CPU_SUSPEND;
psci_ops.cpu_suspend = psci_cpu_suspend;
psci_function_id[PSCI_FN_CPU_OFF] = PSCI_0_2_FN_CPU_OFF;
psci_ops.cpu_off = psci_cpu_off;
psci_function_id[PSCI_FN_CPU_ON] = PSCI_0_2_FN_CPU_ON;
psci_ops.cpu_on = psci_cpu_on;
psci_function_id[PSCI_FN_MIGRATE] = PSCI_0_2_FN_MIGRATE;
psci_ops.migrate = psci_migrate;
psci_function_id[PSCI_FN_AFFINITY_INFO] = PSCI_0_2_FN_AFFINITY_INFO;
psci_ops.affinity_info = psci_affinity_info;
psci_function_id[PSCI_FN_MIGRATE_INFO_TYPE] =
PSCI_0_2_FN_MIGRATE_INFO_TYPE;
psci_ops.migrate_info_type = psci_migrate_info_type;
arm_pm_restart = psci_sys_reset;
pm_power_off = psci_sys_poweroff;
out_put_node:
of_node_put(np);
return err;
}
/*
* PSCI < v0.2 get PSCI Function IDs via DT.
*/
static int psci_0_1_init(struct device_node *np)
{
u32 id;
int err;
err = get_set_conduit_method(np);
if (err)
goto out_put_node;
pr_info("Using PSCI v0.1 Function IDs from DT\n");
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
......@@ -206,5 +308,25 @@ void __init psci_init(void)
out_put_node:
of_node_put(np);
return;
return err;
}
static const struct of_device_id psci_of_match[] __initconst = {
{ .compatible = "arm,psci", .data = psci_0_1_init},
{ .compatible = "arm,psci-0.2", .data = psci_0_2_init},
{},
};
int __init psci_init(void)
{
struct device_node *np;
const struct of_device_id *matched_np;
psci_initcall_t init_fn;
np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np);
if (!np)
return -ENODEV;
init_fn = (psci_initcall_t)matched_np->data;
return init_fn(np);
}
......@@ -16,6 +16,8 @@
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/of.h>
#include <linux/delay.h>
#include <uapi/linux/psci.h>
#include <asm/psci.h>
#include <asm/smp_plat.h>
......@@ -66,6 +68,36 @@ void __ref psci_cpu_die(unsigned int cpu)
/* We should never return */
panic("psci: cpu %d failed to shutdown\n", cpu);
}
int __ref psci_cpu_kill(unsigned int cpu)
{
int err, i;
if (!psci_ops.affinity_info)
return 1;
/*
* cpu_kill could race with cpu_die and we can
* potentially end up declaring this cpu undead
* while it is dying. So, try again a few times.
*/
for (i = 0; i < 10; i++) {
err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
pr_info("CPU%d killed.\n", cpu);
return 1;
}
msleep(10);
pr_info("Retrying again to check for CPU kill\n");
}
pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
cpu, err);
/* Make platform_cpu_kill() fail. */
return 0;
}
#endif
bool __init psci_smp_available(void)
......@@ -78,5 +110,6 @@ struct smp_operations __initdata psci_smp_ops = {
.smp_boot_secondary = psci_boot_secondary,
#ifdef CONFIG_HOTPLUG_CPU
.cpu_die = psci_cpu_die,
.cpu_kill = psci_cpu_kill,
#endif
};
......@@ -197,6 +197,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_ONE_REG:
case KVM_CAP_ARM_PSCI:
case KVM_CAP_ARM_PSCI_0_2:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
......
......@@ -38,14 +38,18 @@ static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
int ret;
trace_kvm_hvc(*vcpu_pc(vcpu), *vcpu_reg(vcpu, 0),
kvm_vcpu_hvc_get_imm(vcpu));
if (kvm_psci_call(vcpu))
ret = kvm_psci_call(vcpu);
if (ret < 0) {
kvm_inject_undefined(vcpu);
return 1;
}
kvm_inject_undefined(vcpu);
return 1;
return ret;
}
static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
......
......@@ -27,6 +27,36 @@
* as described in ARM document number ARM DEN 0022A.
*/
#define AFFINITY_MASK(level) ~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
static unsigned long psci_affinity_mask(unsigned long affinity_level)
{
if (affinity_level <= 3)
return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
return 0;
}
static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
{
/*
* NOTE: For simplicity, we make VCPU suspend emulation to be
* same-as WFI (Wait-for-interrupt) emulation.
*
* This means for KVM the wakeup events are interrupts and
* this is consistent with intended use of StateID as described
* in section 5.4.1 of PSCI v0.2 specification (ARM DEN 0022A).
*
* Further, we also treat power-down request to be same as
* stand-by request as-per section 5.4.2 clause 3 of PSCI v0.2
* specification (ARM DEN 0022A). This means all suspend states
* for KVM will preserve the register state.
*/
kvm_vcpu_block(vcpu);
return PSCI_RET_SUCCESS;
}
static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
{
vcpu->arch.pause = true;
......@@ -38,6 +68,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
struct kvm_vcpu *vcpu = NULL, *tmp;
wait_queue_head_t *wq;
unsigned long cpu_id;
unsigned long context_id;
unsigned long mpidr;
phys_addr_t target_pc;
int i;
......@@ -58,10 +89,17 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
* Make sure the caller requested a valid CPU and that the CPU is
* turned off.
*/
if (!vcpu || !vcpu->arch.pause)
return KVM_PSCI_RET_INVAL;
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
if (!vcpu->arch.pause) {
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
return PSCI_RET_INVALID_PARAMS;
}
target_pc = *vcpu_reg(source_vcpu, 2);
context_id = *vcpu_reg(source_vcpu, 3);
kvm_reset_vcpu(vcpu);
......@@ -76,26 +114,160 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
kvm_vcpu_set_be(vcpu);
*vcpu_pc(vcpu) = target_pc;
/*
* NOTE: We always update r0 (or x0) because for PSCI v0.1
* the general puspose registers are undefined upon CPU_ON.
*/
*vcpu_reg(vcpu, 0) = context_id;
vcpu->arch.pause = false;
smp_mb(); /* Make sure the above is visible */
wq = kvm_arch_vcpu_wq(vcpu);
wake_up_interruptible(wq);
return KVM_PSCI_RET_SUCCESS;
return PSCI_RET_SUCCESS;
}
/**
* kvm_psci_call - handle PSCI call if r0 value is in range
* @vcpu: Pointer to the VCPU struct
*
* Handle PSCI calls from guests through traps from HVC instructions.
* The calling convention is similar to SMC calls to the secure world where
* the function number is placed in r0 and this function returns true if the
* function number specified in r0 is withing the PSCI range, and false
* otherwise.
*/
bool kvm_psci_call(struct kvm_vcpu *vcpu)
static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
{
int i;
unsigned long mpidr;
unsigned long target_affinity;
unsigned long target_affinity_mask;
unsigned long lowest_affinity_level;
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tmp;
target_affinity = *vcpu_reg(vcpu, 1);
lowest_affinity_level = *vcpu_reg(vcpu, 2);
/* Determine target affinity mask */
target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
if (!target_affinity_mask)
return PSCI_RET_INVALID_PARAMS;
/* Ignore other bits of target affinity */
target_affinity &= target_affinity_mask;
/*
* If one or more VCPU matching target affinity are running
* then ON else OFF
*/
kvm_for_each_vcpu(i, tmp, kvm) {
mpidr = kvm_vcpu_get_mpidr(tmp);
if (((mpidr & target_affinity_mask) == target_affinity) &&
!tmp->arch.pause) {
return PSCI_0_2_AFFINITY_LEVEL_ON;
}
}
return PSCI_0_2_AFFINITY_LEVEL_OFF;
}
static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
{
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
vcpu->run->system_event.type = type;
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}
static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
}
static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
}
int kvm_psci_version(struct kvm_vcpu *vcpu)
{
if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features))
return KVM_ARM_PSCI_0_2;
return KVM_ARM_PSCI_0_1;
}
static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
{
int ret = 1;
unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
switch (psci_fn) {
case PSCI_0_2_FN_PSCI_VERSION:
/*
* Bits[31:16] = Major Version = 0
* Bits[15:0] = Minor Version = 2
*/
val = 2;
break;
case PSCI_0_2_FN_CPU_SUSPEND:
case PSCI_0_2_FN64_CPU_SUSPEND:
val = kvm_psci_vcpu_suspend(vcpu);
break;
case PSCI_0_2_FN_CPU_OFF:
kvm_psci_vcpu_off(vcpu);
val = PSCI_RET_SUCCESS;
break;
case PSCI_0_2_FN_CPU_ON:
case PSCI_0_2_FN64_CPU_ON:
val = kvm_psci_vcpu_on(vcpu);
break;
case PSCI_0_2_FN_AFFINITY_INFO:
case PSCI_0_2_FN64_AFFINITY_INFO:
val = kvm_psci_vcpu_affinity_info(vcpu);
break;
case PSCI_0_2_FN_MIGRATE:
case PSCI_0_2_FN64_MIGRATE:
val = PSCI_RET_NOT_SUPPORTED;
break;
case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
/*
* Trusted OS is MP hence does not require migration
* or
* Trusted OS is not present
*/
val = PSCI_0_2_TOS_MP;
break;
case PSCI_0_2_FN_MIGRATE_INFO_UP_CPU:
case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
val = PSCI_RET_NOT_SUPPORTED;
break;
case PSCI_0_2_FN_SYSTEM_OFF:
kvm_psci_system_off(vcpu);
/*
* We should'nt be going back to guest VCPU after
* receiving SYSTEM_OFF request.
*
* If user space accidently/deliberately resumes
* guest VCPU after SYSTEM_OFF request then guest
* VCPU should see internal failure from PSCI return
* value. To achieve this, we preload r0 (or x0) with
* PSCI return value INTERNAL_FAILURE.
*/
val = PSCI_RET_INTERNAL_FAILURE;
ret = 0;
break;
case PSCI_0_2_FN_SYSTEM_RESET:
kvm_psci_system_reset(vcpu);
/*
* Same reason as SYSTEM_OFF for preloading r0 (or x0)
* with PSCI return value INTERNAL_FAILURE.
*/
val = PSCI_RET_INTERNAL_FAILURE;
ret = 0;
break;
default:
return -EINVAL;
}
*vcpu_reg(vcpu, 0) = val;
return ret;
}
static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
{
unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
......@@ -103,20 +275,45 @@ bool kvm_psci_call(struct kvm_vcpu *vcpu)
switch (psci_fn) {
case KVM_PSCI_FN_CPU_OFF:
kvm_psci_vcpu_off(vcpu);
val = KVM_PSCI_RET_SUCCESS;
val = PSCI_RET_SUCCESS;
break;
case KVM_PSCI_FN_CPU_ON:
val = kvm_psci_vcpu_on(vcpu);
break;
case KVM_PSCI_FN_CPU_SUSPEND:
case KVM_PSCI_FN_MIGRATE:
val = KVM_PSCI_RET_NI;
val = PSCI_RET_NOT_SUPPORTED;
break;
default:
return false;
return -EINVAL;
}
*vcpu_reg(vcpu, 0) = val;
return true;
return 1;
}
/**
* kvm_psci_call - handle PSCI call if r0 value is in range
* @vcpu: Pointer to the VCPU struct
*
* Handle PSCI calls from guests through traps from HVC instructions.
* The calling convention is similar to SMC calls to the secure world
* where the function number is placed in r0.
*
* This function returns: > 0 (success), 0 (success but exit to user
* space), and < 0 (errors)
*
* Errors:
* -EINVAL: Unrecognized PSCI function
*/
int kvm_psci_call(struct kvm_vcpu *vcpu)
{
switch (kvm_psci_version(vcpu)) {
case KVM_ARM_PSCI_0_2:
return kvm_psci_0_2_call(vcpu);
case KVM_ARM_PSCI_0_1:
return kvm_psci_0_1_call(vcpu);
default:
return -EINVAL;
};
}
......@@ -39,6 +39,7 @@ struct device_node;
* from the cpu to be killed.
* @cpu_die: Makes a cpu leave the kernel. Must not fail. Called from the
* cpu being killed.
* @cpu_kill: Ensures a cpu has left the kernel. Called from another cpu.
* @cpu_suspend: Suspends a cpu and saves the required context. May fail owing
* to wrong parameters or error conditions. Called from the
* CPU being suspended. Must be called with IRQs disabled.
......@@ -52,6 +53,7 @@ struct cpu_operations {
#ifdef CONFIG_HOTPLUG_CPU
int (*cpu_disable)(unsigned int cpu);
void (*cpu_die)(unsigned int cpu);
int (*cpu_kill)(unsigned int cpu);
#endif
#ifdef CONFIG_ARM64_CPU_SUSPEND
int (*cpu_suspend)(unsigned long);
......
......@@ -41,6 +41,7 @@
#define ARM_CPU_PART_AEM_V8 0xD0F0
#define ARM_CPU_PART_FOUNDATION 0xD000
#define ARM_CPU_PART_CORTEX_A53 0xD030
#define ARM_CPU_PART_CORTEX_A57 0xD070
#define APM_CPU_PART_POTENZA 0x0000
......
......@@ -39,7 +39,7 @@
#include <kvm/arm_vgic.h>
#include <kvm/arm_arch_timer.h>
#define KVM_VCPU_MAX_FEATURES 2
#define KVM_VCPU_MAX_FEATURES 3
struct kvm_vcpu;
int kvm_target_cpu(void);
......
......@@ -18,6 +18,10 @@
#ifndef __ARM64_KVM_PSCI_H__
#define __ARM64_KVM_PSCI_H__
bool kvm_psci_call(struct kvm_vcpu *vcpu);
#define KVM_ARM_PSCI_0_1 1
#define KVM_ARM_PSCI_0_2 2
int kvm_psci_version(struct kvm_vcpu *vcpu);
int kvm_psci_call(struct kvm_vcpu *vcpu);
#endif /* __ARM64_KVM_PSCI_H__ */
......@@ -14,6 +14,6 @@
#ifndef __ASM_PSCI_H
#define __ASM_PSCI_H
void psci_init(void);
int psci_init(void);
#endif /* __ASM_PSCI_H */
......@@ -31,6 +31,7 @@
#define KVM_NR_SPSR 5
#ifndef __ASSEMBLY__
#include <linux/psci.h>
#include <asm/types.h>
#include <asm/ptrace.h>
......@@ -56,8 +57,9 @@ struct kvm_regs {
#define KVM_ARM_TARGET_FOUNDATION_V8 1
#define KVM_ARM_TARGET_CORTEX_A57 2
#define KVM_ARM_TARGET_XGENE_POTENZA 3
#define KVM_ARM_TARGET_CORTEX_A53 4
#define KVM_ARM_NUM_TARGETS 4
#define KVM_ARM_NUM_TARGETS 5
/* KVM_ARM_SET_DEVICE_ADDR ioctl id encoding */
#define KVM_ARM_DEVICE_TYPE_SHIFT 0
......@@ -77,6 +79,7 @@ struct kvm_regs {
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
struct kvm_vcpu_init {
__u32 target;
......@@ -186,10 +189,10 @@ struct kvm_arch_memory_slot {
#define KVM_PSCI_FN_CPU_ON KVM_PSCI_FN(2)
#define KVM_PSCI_FN_MIGRATE KVM_PSCI_FN(3)
#define KVM_PSCI_RET_SUCCESS 0
#define KVM_PSCI_RET_NI ((unsigned long)-1)
#define KVM_PSCI_RET_INVAL ((unsigned long)-2)
#define KVM_PSCI_RET_DENIED ((unsigned long)-3)
#define KVM_PSCI_RET_SUCCESS PSCI_RET_SUCCESS
#define KVM_PSCI_RET_NI PSCI_RET_NOT_SUPPORTED
#define KVM_PSCI_RET_INVAL PSCI_RET_INVALID_PARAMS
#define KVM_PSCI_RET_DENIED PSCI_RET_DENIED
#endif
......
......@@ -18,12 +18,17 @@
#include <linux/init.h>
#include <linux/of.h>
#include <linux/smp.h>
#include <linux/reboot.h>
#include <linux/pm.h>
#include <linux/delay.h>
#include <uapi/linux/psci.h>
#include <asm/compiler.h>
#include <asm/cpu_ops.h>
#include <asm/errno.h>
#include <asm/psci.h>
#include <asm/smp_plat.h>
#include <asm/system_misc.h>
#define PSCI_POWER_STATE_TYPE_STANDBY 0
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1
......@@ -40,58 +45,52 @@ struct psci_operations {
int (*cpu_off)(struct psci_power_state state);
int (*cpu_on)(unsigned long cpuid, unsigned long entry_point);
int (*migrate)(unsigned long cpuid);
int (*affinity_info)(unsigned long target_affinity,
unsigned long lowest_affinity_level);
int (*migrate_info_type)(void);
};
static struct psci_operations psci_ops;
static int (*invoke_psci_fn)(u64, u64, u64, u64);
typedef int (*psci_initcall_t)(const struct device_node *);
enum psci_function {
PSCI_FN_CPU_SUSPEND,
PSCI_FN_CPU_ON,
PSCI_FN_CPU_OFF,
PSCI_FN_MIGRATE,
PSCI_FN_AFFINITY_INFO,
PSCI_FN_MIGRATE_INFO_TYPE,
PSCI_FN_MAX,
};
static u32 psci_function_id[PSCI_FN_MAX];
#define PSCI_RET_SUCCESS 0
#define PSCI_RET_EOPNOTSUPP -1
#define PSCI_RET_EINVAL -2
#define PSCI_RET_EPERM -3
static int psci_to_linux_errno(int errno)
{
switch (errno) {
case PSCI_RET_SUCCESS:
return 0;
case PSCI_RET_EOPNOTSUPP:
case PSCI_RET_NOT_SUPPORTED:
return -EOPNOTSUPP;
case PSCI_RET_EINVAL:
case PSCI_RET_INVALID_PARAMS:
return -EINVAL;
case PSCI_RET_EPERM:
case PSCI_RET_DENIED:
return -EPERM;
};
return -EINVAL;
}
#define PSCI_POWER_STATE_ID_MASK 0xffff
#define PSCI_POWER_STATE_ID_SHIFT 0
#define PSCI_POWER_STATE_TYPE_MASK 0x1
#define PSCI_POWER_STATE_TYPE_SHIFT 16
#define PSCI_POWER_STATE_AFFL_MASK 0x3
#define PSCI_POWER_STATE_AFFL_SHIFT 24
static u32 psci_power_state_pack(struct psci_power_state state)
{
return ((state.id & PSCI_POWER_STATE_ID_MASK)
<< PSCI_POWER_STATE_ID_SHIFT) |
((state.type & PSCI_POWER_STATE_TYPE_MASK)
<< PSCI_POWER_STATE_TYPE_SHIFT) |
((state.affinity_level & PSCI_POWER_STATE_AFFL_MASK)
<< PSCI_POWER_STATE_AFFL_SHIFT);
return ((state.id << PSCI_0_2_POWER_STATE_ID_SHIFT)
& PSCI_0_2_POWER_STATE_ID_MASK) |
((state.type << PSCI_0_2_POWER_STATE_TYPE_SHIFT)
& PSCI_0_2_POWER_STATE_TYPE_MASK) |
((state.affinity_level << PSCI_0_2_POWER_STATE_AFFL_SHIFT)
& PSCI_0_2_POWER_STATE_AFFL_MASK);
}
/*
......@@ -128,6 +127,14 @@ static noinline int __invoke_psci_fn_smc(u64 function_id, u64 arg0, u64 arg1,
return function_id;
}
static int psci_get_version(void)
{
int err;
err = invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
return err;
}
static int psci_cpu_suspend(struct psci_power_state state,
unsigned long entry_point)
{
......@@ -171,26 +178,36 @@ static int psci_migrate(unsigned long cpuid)
return psci_to_linux_errno(err);
}
static const struct of_device_id psci_of_match[] __initconst = {
{ .compatible = "arm,psci", },
{},
};
static int psci_affinity_info(unsigned long target_affinity,
unsigned long lowest_affinity_level)
{
int err;
u32 fn;
fn = psci_function_id[PSCI_FN_AFFINITY_INFO];
err = invoke_psci_fn(fn, target_affinity, lowest_affinity_level, 0);
return err;
}
void __init psci_init(void)
static int psci_migrate_info_type(void)
{
struct device_node *np;
const char *method;
u32 id;
int err;
u32 fn;
np = of_find_matching_node(NULL, psci_of_match);
if (!np)
return;
fn = psci_function_id[PSCI_FN_MIGRATE_INFO_TYPE];
err = invoke_psci_fn(fn, 0, 0, 0);
return err;
}
pr_info("probing function IDs from device-tree\n");
static int get_set_conduit_method(struct device_node *np)
{
const char *method;
pr_info("probing for conduit method from DT.\n");
if (of_property_read_string(np, "method", &method)) {
pr_warning("missing \"method\" property\n");
goto out_put_node;
pr_warn("missing \"method\" property\n");
return -ENXIO;
}
if (!strcmp("hvc", method)) {
......@@ -198,10 +215,99 @@ void __init psci_init(void)
} else if (!strcmp("smc", method)) {
invoke_psci_fn = __invoke_psci_fn_smc;
} else {
pr_warning("invalid \"method\" property: %s\n", method);
pr_warn("invalid \"method\" property: %s\n", method);
return -EINVAL;
}
return 0;
}
static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd)
{
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
}
static void psci_sys_poweroff(void)
{
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
}
/*
* PSCI Function IDs for v0.2+ are well defined so use
* standard values.
*/
static int psci_0_2_init(struct device_node *np)
{
int err, ver;
err = get_set_conduit_method(np);
if (err)
goto out_put_node;
ver = psci_get_version();
if (ver == PSCI_RET_NOT_SUPPORTED) {
/* PSCI v0.2 mandates implementation of PSCI_ID_VERSION. */
pr_err("PSCI firmware does not comply with the v0.2 spec.\n");
err = -EOPNOTSUPP;
goto out_put_node;
} else {
pr_info("PSCIv%d.%d detected in firmware.\n",
PSCI_VERSION_MAJOR(ver),
PSCI_VERSION_MINOR(ver));
if (PSCI_VERSION_MAJOR(ver) == 0 &&
PSCI_VERSION_MINOR(ver) < 2) {
err = -EINVAL;
pr_err("Conflicting PSCI version detected.\n");
goto out_put_node;
}
}
pr_info("Using standard PSCI v0.2 function IDs\n");
psci_function_id[PSCI_FN_CPU_SUSPEND] = PSCI_0_2_FN64_CPU_SUSPEND;
psci_ops.cpu_suspend = psci_cpu_suspend;
psci_function_id[PSCI_FN_CPU_OFF] = PSCI_0_2_FN_CPU_OFF;
psci_ops.cpu_off = psci_cpu_off;
psci_function_id[PSCI_FN_CPU_ON] = PSCI_0_2_FN64_CPU_ON;
psci_ops.cpu_on = psci_cpu_on;
psci_function_id[PSCI_FN_MIGRATE] = PSCI_0_2_FN64_MIGRATE;
psci_ops.migrate = psci_migrate;
psci_function_id[PSCI_FN_AFFINITY_INFO] = PSCI_0_2_FN64_AFFINITY_INFO;
psci_ops.affinity_info = psci_affinity_info;
psci_function_id[PSCI_FN_MIGRATE_INFO_TYPE] =
PSCI_0_2_FN_MIGRATE_INFO_TYPE;
psci_ops.migrate_info_type = psci_migrate_info_type;
arm_pm_restart = psci_sys_reset;
pm_power_off = psci_sys_poweroff;
out_put_node:
of_node_put(np);
return err;
}
/*
* PSCI < v0.2 get PSCI Function IDs via DT.
*/
static int psci_0_1_init(struct device_node *np)
{
u32 id;
int err;
err = get_set_conduit_method(np);
if (err)
goto out_put_node;
pr_info("Using PSCI v0.1 Function IDs from DT\n");
if (!of_property_read_u32(np, "cpu_suspend", &id)) {
psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
psci_ops.cpu_suspend = psci_cpu_suspend;
......@@ -224,7 +330,28 @@ void __init psci_init(void)
out_put_node:
of_node_put(np);
return;
return err;
}
static const struct of_device_id psci_of_match[] __initconst = {
{ .compatible = "arm,psci", .data = psci_0_1_init},
{ .compatible = "arm,psci-0.2", .data = psci_0_2_init},
{},
};
int __init psci_init(void)
{
struct device_node *np;
const struct of_device_id *matched_np;
psci_initcall_t init_fn;
np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np);
if (!np)
return -ENODEV;
init_fn = (psci_initcall_t)matched_np->data;
return init_fn(np);
}
#ifdef CONFIG_SMP
......@@ -277,6 +404,35 @@ static void cpu_psci_cpu_die(unsigned int cpu)
pr_crit("unable to power off CPU%u (%d)\n", cpu, ret);
}
static int cpu_psci_cpu_kill(unsigned int cpu)
{
int err, i;
if (!psci_ops.affinity_info)
return 1;
/*
* cpu_kill could race with cpu_die and we can
* potentially end up declaring this cpu undead
* while it is dying. So, try again a few times.
*/
for (i = 0; i < 10; i++) {
err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
pr_info("CPU%d killed.\n", cpu);
return 1;
}
msleep(10);
pr_info("Retrying again to check for CPU kill\n");
}
pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
cpu, err);
/* Make op_cpu_kill() fail. */
return 0;
}
#endif
const struct cpu_operations cpu_psci_ops = {
......@@ -287,6 +443,7 @@ const struct cpu_operations cpu_psci_ops = {
#ifdef CONFIG_HOTPLUG_CPU
.cpu_disable = cpu_psci_cpu_disable,
.cpu_die = cpu_psci_cpu_die,
.cpu_kill = cpu_psci_cpu_kill,
#endif
};
......
......@@ -228,6 +228,19 @@ int __cpu_disable(void)
return 0;
}
static int op_cpu_kill(unsigned int cpu)
{
/*
* If we have no means of synchronising with the dying CPU, then assume
* that it is really dead. We can only wait for an arbitrary length of
* time and hope that it's dead, so let's skip the wait and just hope.
*/
if (!cpu_ops[cpu]->cpu_kill)
return 1;
return cpu_ops[cpu]->cpu_kill(cpu);
}
static DECLARE_COMPLETION(cpu_died);
/*
......@@ -241,6 +254,15 @@ void __cpu_die(unsigned int cpu)
return;
}
pr_notice("CPU%u: shutdown\n", cpu);
/*
* Now that the dying CPU is beyond the point of no return w.r.t.
* in-kernel synchronisation, try to get the firwmare to help us to
* verify that it has really left the kernel before we consider
* clobbering anything it might still be using.
*/
if (!op_cpu_kill(cpu))
pr_warn("CPU%d may not have shut down cleanly\n", cpu);
}
/*
......
......@@ -214,6 +214,8 @@ int __attribute_const__ kvm_target_cpu(void)
return KVM_ARM_TARGET_AEM_V8;
case ARM_CPU_PART_FOUNDATION:
return KVM_ARM_TARGET_FOUNDATION_V8;
case ARM_CPU_PART_CORTEX_A53:
return KVM_ARM_TARGET_CORTEX_A53;
case ARM_CPU_PART_CORTEX_A57:
return KVM_ARM_TARGET_CORTEX_A57;
};
......
......@@ -30,11 +30,15 @@ typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);
static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
if (kvm_psci_call(vcpu))
int ret;
ret = kvm_psci_call(vcpu);
if (ret < 0) {
kvm_inject_undefined(vcpu);
return 1;
}
kvm_inject_undefined(vcpu);
return 1;
return ret;
}
static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
......
......@@ -88,6 +88,8 @@ static int __init sys_reg_genericv8_init(void)
&genericv8_target_table);
kvm_register_target_sys_reg_table(KVM_ARM_TARGET_FOUNDATION_V8,
&genericv8_target_table);
kvm_register_target_sys_reg_table(KVM_ARM_TARGET_CORTEX_A53,
&genericv8_target_table);
kvm_register_target_sys_reg_table(KVM_ARM_TARGET_CORTEX_A57,
&genericv8_target_table);
kvm_register_target_sys_reg_table(KVM_ARM_TARGET_XGENE_POTENZA,
......
......@@ -1756,14 +1756,14 @@ config KVM_GUEST
help
Select this option if building a guest kernel for KVM (Trap & Emulate) mode
config KVM_HOST_FREQ
int "KVM Host Processor Frequency (MHz)"
config KVM_GUEST_TIMER_FREQ
int "Count/Compare Timer Frequency (MHz)"
depends on KVM_GUEST
default 500
default 100
help
Select this option if building a guest kernel for KVM to skip
RTC emulation when determining guest CPU Frequency. Instead, the guest
processor frequency is automatically derived from the host frequency.
Set this to non-zero if building a guest kernel for KVM to skip RTC
emulation when determining guest CPU Frequency. Instead, the guest's
timer frequency is specified directly.
choice
prompt "Kernel page size"
......
......@@ -19,6 +19,38 @@
#include <linux/threads.h>
#include <linux/spinlock.h>
/* MIPS KVM register ids */
#define MIPS_CP0_32(_R, _S) \
(KVM_REG_MIPS | KVM_REG_SIZE_U32 | 0x10000 | (8 * (_R) + (_S)))
#define MIPS_CP0_64(_R, _S) \
(KVM_REG_MIPS | KVM_REG_SIZE_U64 | 0x10000 | (8 * (_R) + (_S)))
#define KVM_REG_MIPS_CP0_INDEX MIPS_CP0_32(0, 0)
#define KVM_REG_MIPS_CP0_ENTRYLO0 MIPS_CP0_64(2, 0)
#define KVM_REG_MIPS_CP0_ENTRYLO1 MIPS_CP0_64(3, 0)
#define KVM_REG_MIPS_CP0_CONTEXT MIPS_CP0_64(4, 0)
#define KVM_REG_MIPS_CP0_USERLOCAL MIPS_CP0_64(4, 2)
#define KVM_REG_MIPS_CP0_PAGEMASK MIPS_CP0_32(5, 0)
#define KVM_REG_MIPS_CP0_PAGEGRAIN MIPS_CP0_32(5, 1)
#define KVM_REG_MIPS_CP0_WIRED MIPS_CP0_32(6, 0)
#define KVM_REG_MIPS_CP0_HWRENA MIPS_CP0_32(7, 0)
#define KVM_REG_MIPS_CP0_BADVADDR MIPS_CP0_64(8, 0)
#define KVM_REG_MIPS_CP0_COUNT MIPS_CP0_32(9, 0)
#define KVM_REG_MIPS_CP0_ENTRYHI MIPS_CP0_64(10, 0)
#define KVM_REG_MIPS_CP0_COMPARE MIPS_CP0_32(11, 0)
#define KVM_REG_MIPS_CP0_STATUS MIPS_CP0_32(12, 0)
#define KVM_REG_MIPS_CP0_CAUSE MIPS_CP0_32(13, 0)
#define KVM_REG_MIPS_CP0_EPC MIPS_CP0_64(14, 0)
#define KVM_REG_MIPS_CP0_EBASE MIPS_CP0_64(15, 1)
#define KVM_REG_MIPS_CP0_CONFIG MIPS_CP0_32(16, 0)
#define KVM_REG_MIPS_CP0_CONFIG1 MIPS_CP0_32(16, 1)
#define KVM_REG_MIPS_CP0_CONFIG2 MIPS_CP0_32(16, 2)
#define KVM_REG_MIPS_CP0_CONFIG3 MIPS_CP0_32(16, 3)
#define KVM_REG_MIPS_CP0_CONFIG7 MIPS_CP0_32(16, 7)
#define KVM_REG_MIPS_CP0_XCONTEXT MIPS_CP0_64(20, 0)
#define KVM_REG_MIPS_CP0_ERROREPC MIPS_CP0_64(30, 0)
#define KVM_MAX_VCPUS 1
#define KVM_USER_MEM_SLOTS 8
......@@ -372,8 +404,19 @@ struct kvm_vcpu_arch {
u32 io_gpr; /* GPR used as IO source/target */
/* Used to calibrate the virutal count register for the guest */
int32_t host_cp0_count;
struct hrtimer comparecount_timer;
/* Count timer control KVM register */
uint32_t count_ctl;
/* Count bias from the raw time */
uint32_t count_bias;
/* Frequency of timer in Hz */
uint32_t count_hz;
/* Dynamic nanosecond bias (multiple of count_period) to avoid overflow */
s64 count_dyn_bias;
/* Resume time */
ktime_t count_resume;
/* Period of timer tick in ns */
u64 count_period;
/* Bitmask of exceptions that are pending */
unsigned long pending_exceptions;
......@@ -394,8 +437,6 @@ struct kvm_vcpu_arch {
uint32_t guest_kernel_asid[NR_CPUS];
struct mm_struct guest_kernel_mm, guest_user_mm;
struct hrtimer comparecount_timer;
int last_sched_cpu;
/* WAIT executed */
......@@ -410,6 +451,7 @@ struct kvm_vcpu_arch {
#define kvm_read_c0_guest_context(cop0) (cop0->reg[MIPS_CP0_TLB_CONTEXT][0])
#define kvm_write_c0_guest_context(cop0, val) (cop0->reg[MIPS_CP0_TLB_CONTEXT][0] = (val))
#define kvm_read_c0_guest_userlocal(cop0) (cop0->reg[MIPS_CP0_TLB_CONTEXT][2])
#define kvm_write_c0_guest_userlocal(cop0, val) (cop0->reg[MIPS_CP0_TLB_CONTEXT][2] = (val))
#define kvm_read_c0_guest_pagemask(cop0) (cop0->reg[MIPS_CP0_TLB_PG_MASK][0])
#define kvm_write_c0_guest_pagemask(cop0, val) (cop0->reg[MIPS_CP0_TLB_PG_MASK][0] = (val))
#define kvm_read_c0_guest_wired(cop0) (cop0->reg[MIPS_CP0_TLB_WIRED][0])
......@@ -449,15 +491,74 @@ struct kvm_vcpu_arch {
#define kvm_read_c0_guest_errorepc(cop0) (cop0->reg[MIPS_CP0_ERROR_PC][0])
#define kvm_write_c0_guest_errorepc(cop0, val) (cop0->reg[MIPS_CP0_ERROR_PC][0] = (val))
/*
* Some of the guest registers may be modified asynchronously (e.g. from a
* hrtimer callback in hard irq context) and therefore need stronger atomicity
* guarantees than other registers.
*/
static inline void _kvm_atomic_set_c0_guest_reg(unsigned long *reg,
unsigned long val)
{
unsigned long temp;
do {
__asm__ __volatile__(
" .set mips3 \n"
" " __LL "%0, %1 \n"
" or %0, %2 \n"
" " __SC "%0, %1 \n"
" .set mips0 \n"
: "=&r" (temp), "+m" (*reg)
: "r" (val));
} while (unlikely(!temp));
}
static inline void _kvm_atomic_clear_c0_guest_reg(unsigned long *reg,
unsigned long val)
{
unsigned long temp;
do {
__asm__ __volatile__(
" .set mips3 \n"
" " __LL "%0, %1 \n"
" and %0, %2 \n"
" " __SC "%0, %1 \n"
" .set mips0 \n"
: "=&r" (temp), "+m" (*reg)
: "r" (~val));
} while (unlikely(!temp));
}
static inline void _kvm_atomic_change_c0_guest_reg(unsigned long *reg,
unsigned long change,
unsigned long val)
{
unsigned long temp;
do {
__asm__ __volatile__(
" .set mips3 \n"
" " __LL "%0, %1 \n"
" and %0, %2 \n"
" or %0, %3 \n"
" " __SC "%0, %1 \n"
" .set mips0 \n"
: "=&r" (temp), "+m" (*reg)
: "r" (~change), "r" (val & change));
} while (unlikely(!temp));
}
#define kvm_set_c0_guest_status(cop0, val) (cop0->reg[MIPS_CP0_STATUS][0] |= (val))
#define kvm_clear_c0_guest_status(cop0, val) (cop0->reg[MIPS_CP0_STATUS][0] &= ~(val))
#define kvm_set_c0_guest_cause(cop0, val) (cop0->reg[MIPS_CP0_CAUSE][0] |= (val))
#define kvm_clear_c0_guest_cause(cop0, val) (cop0->reg[MIPS_CP0_CAUSE][0] &= ~(val))
/* Cause can be modified asynchronously from hardirq hrtimer callback */
#define kvm_set_c0_guest_cause(cop0, val) \
_kvm_atomic_set_c0_guest_reg(&cop0->reg[MIPS_CP0_CAUSE][0], val)
#define kvm_clear_c0_guest_cause(cop0, val) \
_kvm_atomic_clear_c0_guest_reg(&cop0->reg[MIPS_CP0_CAUSE][0], val)
#define kvm_change_c0_guest_cause(cop0, change, val) \
{ \
kvm_clear_c0_guest_cause(cop0, change); \
kvm_set_c0_guest_cause(cop0, ((val) & (change))); \
}
_kvm_atomic_change_c0_guest_reg(&cop0->reg[MIPS_CP0_CAUSE][0], \
change, val)
#define kvm_set_c0_guest_ebase(cop0, val) (cop0->reg[MIPS_CP0_PRID][1] |= (val))
#define kvm_clear_c0_guest_ebase(cop0, val) (cop0->reg[MIPS_CP0_PRID][1] &= ~(val))
#define kvm_change_c0_guest_ebase(cop0, change, val) \
......@@ -468,29 +569,33 @@ struct kvm_vcpu_arch {
struct kvm_mips_callbacks {
int (*handle_cop_unusable) (struct kvm_vcpu *vcpu);
int (*handle_tlb_mod) (struct kvm_vcpu *vcpu);
int (*handle_tlb_ld_miss) (struct kvm_vcpu *vcpu);
int (*handle_tlb_st_miss) (struct kvm_vcpu *vcpu);
int (*handle_addr_err_st) (struct kvm_vcpu *vcpu);
int (*handle_addr_err_ld) (struct kvm_vcpu *vcpu);
int (*handle_syscall) (struct kvm_vcpu *vcpu);
int (*handle_res_inst) (struct kvm_vcpu *vcpu);
int (*handle_break) (struct kvm_vcpu *vcpu);
int (*vm_init) (struct kvm *kvm);
int (*vcpu_init) (struct kvm_vcpu *vcpu);
int (*vcpu_setup) (struct kvm_vcpu *vcpu);
gpa_t(*gva_to_gpa) (gva_t gva);
void (*queue_timer_int) (struct kvm_vcpu *vcpu);
void (*dequeue_timer_int) (struct kvm_vcpu *vcpu);
void (*queue_io_int) (struct kvm_vcpu *vcpu,
struct kvm_mips_interrupt *irq);
void (*dequeue_io_int) (struct kvm_vcpu *vcpu,
struct kvm_mips_interrupt *irq);
int (*irq_deliver) (struct kvm_vcpu *vcpu, unsigned int priority,
uint32_t cause);
int (*irq_clear) (struct kvm_vcpu *vcpu, unsigned int priority,
uint32_t cause);
int (*handle_cop_unusable)(struct kvm_vcpu *vcpu);
int (*handle_tlb_mod)(struct kvm_vcpu *vcpu);
int (*handle_tlb_ld_miss)(struct kvm_vcpu *vcpu);
int (*handle_tlb_st_miss)(struct kvm_vcpu *vcpu);
int (*handle_addr_err_st)(struct kvm_vcpu *vcpu);
int (*handle_addr_err_ld)(struct kvm_vcpu *vcpu);
int (*handle_syscall)(struct kvm_vcpu *vcpu);
int (*handle_res_inst)(struct kvm_vcpu *vcpu);
int (*handle_break)(struct kvm_vcpu *vcpu);
int (*vm_init)(struct kvm *kvm);
int (*vcpu_init)(struct kvm_vcpu *vcpu);
int (*vcpu_setup)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(gva_t gva);
void (*queue_timer_int)(struct kvm_vcpu *vcpu);
void (*dequeue_timer_int)(struct kvm_vcpu *vcpu);
void (*queue_io_int)(struct kvm_vcpu *vcpu,
struct kvm_mips_interrupt *irq);
void (*dequeue_io_int)(struct kvm_vcpu *vcpu,
struct kvm_mips_interrupt *irq);
int (*irq_deliver)(struct kvm_vcpu *vcpu, unsigned int priority,
uint32_t cause);
int (*irq_clear)(struct kvm_vcpu *vcpu, unsigned int priority,
uint32_t cause);
int (*get_one_reg)(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg, s64 *v);
int (*set_one_reg)(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg, s64 v);
};
extern struct kvm_mips_callbacks *kvm_mips_callbacks;
int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks);
......@@ -609,7 +714,16 @@ extern enum emulation_result kvm_mips_emulate_bp_exc(unsigned long cause,
extern enum emulation_result kvm_mips_complete_mmio_load(struct kvm_vcpu *vcpu,
struct kvm_run *run);
enum emulation_result kvm_mips_emulate_count(struct kvm_vcpu *vcpu);
uint32_t kvm_mips_read_count(struct kvm_vcpu *vcpu);
void kvm_mips_write_count(struct kvm_vcpu *vcpu, uint32_t count);
void kvm_mips_write_compare(struct kvm_vcpu *vcpu, uint32_t compare);
void kvm_mips_init_count(struct kvm_vcpu *vcpu);
int kvm_mips_set_count_ctl(struct kvm_vcpu *vcpu, s64 count_ctl);
int kvm_mips_set_count_resume(struct kvm_vcpu *vcpu, s64 count_resume);
int kvm_mips_set_count_hz(struct kvm_vcpu *vcpu, s64 count_hz);
void kvm_mips_count_enable_cause(struct kvm_vcpu *vcpu);
void kvm_mips_count_disable_cause(struct kvm_vcpu *vcpu);
enum hrtimer_restart kvm_mips_count_timeout(struct kvm_vcpu *vcpu);
enum emulation_result kvm_mips_check_privilege(unsigned long cause,
uint32_t *opc,
......@@ -646,7 +760,6 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
struct kvm_vcpu *vcpu);
/* Misc */
extern void mips32_SyncICache(unsigned long addr, unsigned long size);
extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
......
......@@ -106,6 +106,41 @@ struct kvm_fpu {
#define KVM_REG_MIPS_LO (KVM_REG_MIPS | KVM_REG_SIZE_U64 | 33)
#define KVM_REG_MIPS_PC (KVM_REG_MIPS | KVM_REG_SIZE_U64 | 34)
/* KVM specific control registers */
/*
* CP0_Count control
* DC: Set 0: Master disable CP0_Count and set COUNT_RESUME to now
* Set 1: Master re-enable CP0_Count with unchanged bias, handling timer
* interrupts since COUNT_RESUME
* This can be used to freeze the timer to get a consistent snapshot of
* the CP0_Count and timer interrupt pending state, while also resuming
* safely without losing time or guest timer interrupts.
* Other: Reserved, do not change.
*/
#define KVM_REG_MIPS_COUNT_CTL (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
0x20000 | 0)
#define KVM_REG_MIPS_COUNT_CTL_DC 0x00000001
/*
* CP0_Count resume monotonic nanoseconds
* The monotonic nanosecond time of the last set of COUNT_CTL.DC (master
* disable). Any reads and writes of Count related registers while
* COUNT_CTL.DC=1 will appear to occur at this time. When COUNT_CTL.DC is
* cleared again (master enable) any timer interrupts since this time will be
* emulated.
* Modifications to times in the future are rejected.
*/
#define KVM_REG_MIPS_COUNT_RESUME (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
0x20000 | 1)
/*
* CP0_Count rate in Hz
* Specifies the rate of the CP0_Count timer in Hz. Modifications occur without
* discontinuities in CP0_Count.
*/
#define KVM_REG_MIPS_COUNT_HZ (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
0x20000 | 2)
/*
* KVM MIPS specific structures and definitions
*
......
......@@ -611,35 +611,3 @@ MIPSX(exceptions):
.word _C_LABEL(MIPSX(GuestException)) # 29
.word _C_LABEL(MIPSX(GuestException)) # 30
.word _C_LABEL(MIPSX(GuestException)) # 31
/* This routine makes changes to the instruction stream effective to the hardware.
* It should be called after the instruction stream is written.
* On return, the new instructions are effective.
* Inputs:
* a0 = Start address of new instruction stream
* a1 = Size, in bytes, of new instruction stream
*/
#define HW_SYNCI_Step $1
LEAF(MIPSX(SyncICache))
.set push
.set mips32r2
beq a1, zero, 20f
nop
REG_ADDU a1, a0, a1
rdhwr v0, HW_SYNCI_Step
beq v0, zero, 20f
nop
10:
synci 0(a0)
REG_ADDU a0, a0, v0
sltu v1, a0, a1
bne v1, zero, 10b
nop
sync
20:
jr.hb ra
nop
.set pop
END(MIPSX(SyncICache))
This diff is collapsed.
......@@ -16,6 +16,7 @@
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <linux/bootmem.h>
#include <asm/cacheflush.h>
#include "kvm_mips_comm.h"
......@@ -40,7 +41,7 @@ kvm_mips_trans_cache_index(uint32_t inst, uint32_t *opc,
CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa
(vcpu, (unsigned long) opc));
memcpy((void *)kseg0_opc, (void *)&synci_inst, sizeof(uint32_t));
mips32_SyncICache(kseg0_opc, 32);
local_flush_icache_range(kseg0_opc, kseg0_opc + 32);
return result;
}
......@@ -66,7 +67,7 @@ kvm_mips_trans_cache_va(uint32_t inst, uint32_t *opc,
CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa
(vcpu, (unsigned long) opc));
memcpy((void *)kseg0_opc, (void *)&synci_inst, sizeof(uint32_t));
mips32_SyncICache(kseg0_opc, 32);
local_flush_icache_range(kseg0_opc, kseg0_opc + 32);
return result;
}
......@@ -99,11 +100,12 @@ kvm_mips_trans_mfc0(uint32_t inst, uint32_t *opc, struct kvm_vcpu *vcpu)
CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa
(vcpu, (unsigned long) opc));
memcpy((void *)kseg0_opc, (void *)&mfc0_inst, sizeof(uint32_t));
mips32_SyncICache(kseg0_opc, 32);
local_flush_icache_range(kseg0_opc, kseg0_opc + 32);
} else if (KVM_GUEST_KSEGX((unsigned long) opc) == KVM_GUEST_KSEG23) {
local_irq_save(flags);
memcpy((void *)opc, (void *)&mfc0_inst, sizeof(uint32_t));
mips32_SyncICache((unsigned long) opc, 32);
local_flush_icache_range((unsigned long)opc,
(unsigned long)opc + 32);
local_irq_restore(flags);
} else {
kvm_err("%s: Invalid address: %p\n", __func__, opc);
......@@ -134,11 +136,12 @@ kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc, struct kvm_vcpu *vcpu)
CKSEG0ADDR(kvm_mips_translate_guest_kseg0_to_hpa
(vcpu, (unsigned long) opc));
memcpy((void *)kseg0_opc, (void *)&mtc0_inst, sizeof(uint32_t));
mips32_SyncICache(kseg0_opc, 32);
local_flush_icache_range(kseg0_opc, kseg0_opc + 32);
} else if (KVM_GUEST_KSEGX((unsigned long) opc) == KVM_GUEST_KSEG23) {
local_irq_save(flags);
memcpy((void *)opc, (void *)&mtc0_inst, sizeof(uint32_t));
mips32_SyncICache((unsigned long) opc, 32);
local_flush_icache_range((unsigned long)opc,
(unsigned long)opc + 32);
local_irq_restore(flags);
} else {
kvm_err("%s: Invalid address: %p\n", __func__, opc);
......
This diff is collapsed.
......@@ -222,26 +222,19 @@ kvm_mips_host_tlb_write(struct kvm_vcpu *vcpu, unsigned long entryhi,
return -1;
}
if (idx < 0) {
idx = read_c0_random() % current_cpu_data.tlbsize;
write_c0_index(idx);
mtc0_tlbw_hazard();
}
write_c0_entrylo0(entrylo0);
write_c0_entrylo1(entrylo1);
mtc0_tlbw_hazard();
tlb_write_indexed();
if (idx < 0)
tlb_write_random();
else
tlb_write_indexed();
tlbw_use_hazard();
#ifdef DEBUG
if (debug) {
kvm_debug("@ %#lx idx: %2d [entryhi(R): %#lx] "
"entrylo0(R): 0x%08lx, entrylo1(R): 0x%08lx\n",
vcpu->arch.pc, idx, read_c0_entryhi(),
read_c0_entrylo0(), read_c0_entrylo1());
}
#endif
kvm_debug("@ %#lx idx: %2d [entryhi(R): %#lx] entrylo0(R): 0x%08lx, entrylo1(R): 0x%08lx\n",
vcpu->arch.pc, idx, read_c0_entryhi(),
read_c0_entrylo0(), read_c0_entrylo1());
/* Flush D-cache */
if (flush_dcache_mask) {
......@@ -348,11 +341,9 @@ int kvm_mips_handle_commpage_tlb_fault(unsigned long badvaddr,
mtc0_tlbw_hazard();
tlbw_use_hazard();
#ifdef DEBUG
kvm_debug ("@ %#lx idx: %2d [entryhi(R): %#lx] entrylo0 (R): 0x%08lx, entrylo1(R): 0x%08lx\n",
vcpu->arch.pc, read_c0_index(), read_c0_entryhi(),
read_c0_entrylo0(), read_c0_entrylo1());
#endif
/* Restore old ASID */
write_c0_entryhi(old_entryhi);
......@@ -400,10 +391,8 @@ kvm_mips_handle_mapped_seg_tlb_fault(struct kvm_vcpu *vcpu,
entrylo1 = mips3_paddr_to_tlbpfn(pfn1 << PAGE_SHIFT) | (0x3 << 3) |
(tlb->tlb_lo1 & MIPS3_PG_D) | (tlb->tlb_lo1 & MIPS3_PG_V);
#ifdef DEBUG
kvm_debug("@ %#lx tlb_lo0: 0x%08lx tlb_lo1: 0x%08lx\n", vcpu->arch.pc,
tlb->tlb_lo0, tlb->tlb_lo1);
#endif
return kvm_mips_host_tlb_write(vcpu, entryhi, entrylo0, entrylo1,
tlb->tlb_mask);
......@@ -424,10 +413,8 @@ int kvm_mips_guest_tlb_lookup(struct kvm_vcpu *vcpu, unsigned long entryhi)
}
}
#ifdef DEBUG
kvm_debug("%s: entryhi: %#lx, index: %d lo0: %#lx, lo1: %#lx\n",
__func__, entryhi, index, tlb[i].tlb_lo0, tlb[i].tlb_lo1);
#endif
return index;
}
......@@ -461,9 +448,7 @@ int kvm_mips_host_tlb_lookup(struct kvm_vcpu *vcpu, unsigned long vaddr)
local_irq_restore(flags);
#ifdef DEBUG
kvm_debug("Host TLB lookup, %#lx, idx: %2d\n", vaddr, idx);
#endif
return idx;
}
......@@ -508,12 +493,9 @@ int kvm_mips_host_tlb_inv(struct kvm_vcpu *vcpu, unsigned long va)
local_irq_restore(flags);
#ifdef DEBUG
if (idx > 0) {
if (idx > 0)
kvm_debug("%s: Invalidated entryhi %#lx @ idx %d\n", __func__,
(va & VPN2_MASK) | (vcpu->arch.asid_map[va & ASID_MASK] & ASID_MASK), idx);
}
#endif
(va & VPN2_MASK) | kvm_mips_get_user_asid(vcpu), idx);
return 0;
}
......@@ -658,15 +640,30 @@ void kvm_local_flush_tlb_all(void)
local_irq_restore(flags);
}
/**
* kvm_mips_migrate_count() - Migrate timer.
* @vcpu: Virtual CPU.
*
* Migrate CP0_Count hrtimer to the current CPU by cancelling and restarting it
* if it was running prior to being cancelled.
*
* Must be called when the VCPU is migrated to a different CPU to ensure that
* timer expiry during guest execution interrupts the guest and causes the
* interrupt to be delivered in a timely manner.
*/
static void kvm_mips_migrate_count(struct kvm_vcpu *vcpu)
{
if (hrtimer_cancel(&vcpu->arch.comparecount_timer))
hrtimer_restart(&vcpu->arch.comparecount_timer);
}
/* Restore ASID once we are scheduled back after preemption */
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
unsigned long flags;
int newasid = 0;
#ifdef DEBUG
kvm_debug("%s: vcpu %p, cpu: %d\n", __func__, vcpu, cpu);
#endif
/* Alocate new kernel and user ASIDs if needed */
......@@ -682,17 +679,23 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu->arch.guest_user_mm.context.asid[cpu];
newasid++;
kvm_info("[%d]: cpu_context: %#lx\n", cpu,
cpu_context(cpu, current->mm));
kvm_info("[%d]: Allocated new ASID for Guest Kernel: %#x\n",
cpu, vcpu->arch.guest_kernel_asid[cpu]);
kvm_info("[%d]: Allocated new ASID for Guest User: %#x\n", cpu,
vcpu->arch.guest_user_asid[cpu]);
kvm_debug("[%d]: cpu_context: %#lx\n", cpu,
cpu_context(cpu, current->mm));
kvm_debug("[%d]: Allocated new ASID for Guest Kernel: %#x\n",
cpu, vcpu->arch.guest_kernel_asid[cpu]);
kvm_debug("[%d]: Allocated new ASID for Guest User: %#x\n", cpu,
vcpu->arch.guest_user_asid[cpu]);
}
if (vcpu->arch.last_sched_cpu != cpu) {
kvm_info("[%d->%d]KVM VCPU[%d] switch\n",
vcpu->arch.last_sched_cpu, cpu, vcpu->vcpu_id);
kvm_debug("[%d->%d]KVM VCPU[%d] switch\n",
vcpu->arch.last_sched_cpu, cpu, vcpu->vcpu_id);
/*
* Migrate the timer interrupt to the current CPU so that it
* always interrupts the guest and synchronously triggers a
* guest timer interrupt.
*/
kvm_mips_migrate_count(vcpu);
}
if (!newasid) {
......
......@@ -32,9 +32,7 @@ static gpa_t kvm_trap_emul_gva_to_gpa_cb(gva_t gva)
gpa = KVM_INVALID_ADDR;
}
#ifdef DEBUG
kvm_debug("%s: gva %#lx, gpa: %#llx\n", __func__, gva, gpa);
#endif
return gpa;
}
......@@ -85,11 +83,9 @@ static int kvm_trap_emul_handle_tlb_mod(struct kvm_vcpu *vcpu)
if (KVM_GUEST_KSEGX(badvaddr) < KVM_GUEST_KSEG0
|| KVM_GUEST_KSEGX(badvaddr) == KVM_GUEST_KSEG23) {
#ifdef DEBUG
kvm_debug
("USER/KSEG23 ADDR TLB MOD fault: cause %#lx, PC: %p, BadVaddr: %#lx\n",
cause, opc, badvaddr);
#endif
er = kvm_mips_handle_tlbmod(cause, opc, run, vcpu);
if (er == EMULATE_DONE)
......@@ -138,11 +134,9 @@ static int kvm_trap_emul_handle_tlb_st_miss(struct kvm_vcpu *vcpu)
}
} else if (KVM_GUEST_KSEGX(badvaddr) < KVM_GUEST_KSEG0
|| KVM_GUEST_KSEGX(badvaddr) == KVM_GUEST_KSEG23) {
#ifdef DEBUG
kvm_debug
("USER ADDR TLB LD fault: cause %#lx, PC: %p, BadVaddr: %#lx\n",
cause, opc, badvaddr);
#endif
er = kvm_mips_handle_tlbmiss(cause, opc, run, vcpu);
if (er == EMULATE_DONE)
ret = RESUME_GUEST;
......@@ -188,10 +182,8 @@ static int kvm_trap_emul_handle_tlb_ld_miss(struct kvm_vcpu *vcpu)
}
} else if (KVM_GUEST_KSEGX(badvaddr) < KVM_GUEST_KSEG0
|| KVM_GUEST_KSEGX(badvaddr) == KVM_GUEST_KSEG23) {
#ifdef DEBUG
kvm_debug("USER ADDR TLB ST fault: PC: %#lx, BadVaddr: %#lx\n",
vcpu->arch.pc, badvaddr);
#endif
/* User Address (UA) fault, this could happen if
* (1) TLB entry not present/valid in both Guest and shadow host TLBs, in this
......@@ -236,9 +228,7 @@ static int kvm_trap_emul_handle_addr_err_st(struct kvm_vcpu *vcpu)
if (KVM_GUEST_KERNEL_MODE(vcpu)
&& (KSEGX(badvaddr) == CKSEG0 || KSEGX(badvaddr) == CKSEG1)) {
#ifdef DEBUG
kvm_debug("Emulate Store to MMIO space\n");
#endif
er = kvm_mips_emulate_inst(cause, opc, run, vcpu);
if (er == EMULATE_FAIL) {
printk("Emulate Store to MMIO space failed\n");
......@@ -268,9 +258,7 @@ static int kvm_trap_emul_handle_addr_err_ld(struct kvm_vcpu *vcpu)
int ret = RESUME_GUEST;
if (KSEGX(badvaddr) == CKSEG0 || KSEGX(badvaddr) == CKSEG1) {
#ifdef DEBUG
kvm_debug("Emulate Load from MMIO space @ %#lx\n", badvaddr);
#endif
er = kvm_mips_emulate_inst(cause, opc, run, vcpu);
if (er == EMULATE_FAIL) {
printk("Emulate Load from MMIO space failed\n");
......@@ -401,6 +389,78 @@ static int kvm_trap_emul_vcpu_setup(struct kvm_vcpu *vcpu)
return 0;
}
static int kvm_trap_emul_get_one_reg(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg,
s64 *v)
{
switch (reg->id) {
case KVM_REG_MIPS_CP0_COUNT:
*v = kvm_mips_read_count(vcpu);
break;
case KVM_REG_MIPS_COUNT_CTL:
*v = vcpu->arch.count_ctl;
break;
case KVM_REG_MIPS_COUNT_RESUME:
*v = ktime_to_ns(vcpu->arch.count_resume);
break;
case KVM_REG_MIPS_COUNT_HZ:
*v = vcpu->arch.count_hz;
break;
default:
return -EINVAL;
}
return 0;
}
static int kvm_trap_emul_set_one_reg(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg,
s64 v)
{
struct mips_coproc *cop0 = vcpu->arch.cop0;
int ret = 0;
switch (reg->id) {
case KVM_REG_MIPS_CP0_COUNT:
kvm_mips_write_count(vcpu, v);
break;
case KVM_REG_MIPS_CP0_COMPARE:
kvm_mips_write_compare(vcpu, v);
break;
case KVM_REG_MIPS_CP0_CAUSE:
/*
* If the timer is stopped or started (DC bit) it must look
* atomic with changes to the interrupt pending bits (TI, IRQ5).
* A timer interrupt should not happen in between.
*/
if ((kvm_read_c0_guest_cause(cop0) ^ v) & CAUSEF_DC) {
if (v & CAUSEF_DC) {
/* disable timer first */
kvm_mips_count_disable_cause(vcpu);
kvm_change_c0_guest_cause(cop0, ~CAUSEF_DC, v);
} else {
/* enable timer last */
kvm_change_c0_guest_cause(cop0, ~CAUSEF_DC, v);
kvm_mips_count_enable_cause(vcpu);
}
} else {
kvm_write_c0_guest_cause(cop0, v);
}
break;
case KVM_REG_MIPS_COUNT_CTL:
ret = kvm_mips_set_count_ctl(vcpu, v);
break;
case KVM_REG_MIPS_COUNT_RESUME:
ret = kvm_mips_set_count_resume(vcpu, v);
break;
case KVM_REG_MIPS_COUNT_HZ:
ret = kvm_mips_set_count_hz(vcpu, v);
break;
default:
return -EINVAL;
}
return ret;
}
static struct kvm_mips_callbacks kvm_trap_emul_callbacks = {
/* exit handlers */
.handle_cop_unusable = kvm_trap_emul_handle_cop_unusable,
......@@ -423,6 +483,8 @@ static struct kvm_mips_callbacks kvm_trap_emul_callbacks = {
.dequeue_io_int = kvm_mips_dequeue_io_int_cb,
.irq_deliver = kvm_mips_irq_deliver_cb,
.irq_clear = kvm_mips_irq_clear_cb,
.get_one_reg = kvm_trap_emul_get_one_reg,
.set_one_reg = kvm_trap_emul_set_one_reg,
};
int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks)
......
......@@ -31,6 +31,7 @@ void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page,
void (*flush_icache_range)(unsigned long start, unsigned long end);
EXPORT_SYMBOL_GPL(flush_icache_range);
void (*local_flush_icache_range)(unsigned long start, unsigned long end);
EXPORT_SYMBOL_GPL(local_flush_icache_range);
void (*__flush_cache_vmap)(void);
void (*__flush_cache_vunmap)(void);
......
......@@ -74,18 +74,8 @@ static void __init estimate_frequencies(void)
unsigned int giccount = 0, gicstart = 0;
#endif
#if defined (CONFIG_KVM_GUEST) && defined (CONFIG_KVM_HOST_FREQ)
unsigned int prid = read_c0_prid() & (PRID_COMP_MASK | PRID_IMP_MASK);
/*
* XXXKYMA: hardwire the CPU frequency to Host Freq/4
*/
count = (CONFIG_KVM_HOST_FREQ * 1000000) >> 3;
if ((prid != (PRID_COMP_MIPS | PRID_IMP_20KC)) &&
(prid != (PRID_COMP_MIPS | PRID_IMP_25KF)))
count *= 2;
mips_hpt_frequency = count;
#if defined(CONFIG_KVM_GUEST) && CONFIG_KVM_GUEST_TIMER_FREQ
mips_hpt_frequency = CONFIG_KVM_GUEST_TIMER_FREQ * 1000000;
return;
#endif
......
......@@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst)
{
return (inst >> 11) & 0x7fff;
}
#define IS_XFORM(inst) (get_op(inst) == 31)
#define IS_DSFORM(inst) (get_op(inst) >= 56)
/*
* Create a DSISR value from the instruction
*/
static inline unsigned make_dsisr(unsigned instr)
{
unsigned dsisr;
/* bits 6:15 --> 22:31 */
dsisr = (instr & 0x03ff0000) >> 16;
if (IS_XFORM(instr)) {
/* bits 29:30 --> 15:16 */
dsisr |= (instr & 0x00000006) << 14;
/* bit 25 --> 17 */
dsisr |= (instr & 0x00000040) << 8;
/* bits 21:24 --> 18:21 */
dsisr |= (instr & 0x00000780) << 3;
} else {
/* bit 5 --> 17 */
dsisr |= (instr & 0x04000000) >> 12;
/* bits 1: 4 --> 18:21 */
dsisr |= (instr & 0x78000000) >> 17;
/* bits 30:31 --> 12:13 */
if (IS_DSFORM(instr))
dsisr |= (instr & 0x00000003) << 18;
}
return dsisr;
}
#endif /* __ASM_PPC_DISASSEMBLE_H__ */
......@@ -102,6 +102,7 @@
#define BOOK3S_INTERRUPT_PERFMON 0xf00
#define BOOK3S_INTERRUPT_ALTIVEC 0xf20
#define BOOK3S_INTERRUPT_VSX 0xf40
#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
#define BOOK3S_INTERRUPT_H_FAC_UNAVAIL 0xf80
#define BOOK3S_IRQPRIO_SYSTEM_RESET 0
......@@ -114,14 +115,15 @@
#define BOOK3S_IRQPRIO_FP_UNAVAIL 7
#define BOOK3S_IRQPRIO_ALTIVEC 8
#define BOOK3S_IRQPRIO_VSX 9
#define BOOK3S_IRQPRIO_SYSCALL 10
#define BOOK3S_IRQPRIO_MACHINE_CHECK 11
#define BOOK3S_IRQPRIO_DEBUG 12
#define BOOK3S_IRQPRIO_EXTERNAL 13
#define BOOK3S_IRQPRIO_DECREMENTER 14
#define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15
#define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16
#define BOOK3S_IRQPRIO_MAX 17
#define BOOK3S_IRQPRIO_FAC_UNAVAIL 10
#define BOOK3S_IRQPRIO_SYSCALL 11
#define BOOK3S_IRQPRIO_MACHINE_CHECK 12
#define BOOK3S_IRQPRIO_DEBUG 13
#define BOOK3S_IRQPRIO_EXTERNAL 14
#define BOOK3S_IRQPRIO_DECREMENTER 15
#define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 16
#define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 17
#define BOOK3S_IRQPRIO_MAX 18
#define BOOK3S_HFLAG_DCBZ32 0x1
#define BOOK3S_HFLAG_SLB 0x2
......
......@@ -268,9 +268,10 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
return vcpu->arch.pc;
}
static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
{
return (vcpu->arch.shared->msr & MSR_LE) != (MSR_KERNEL & MSR_LE);
return (kvmppc_get_msr(vcpu) & MSR_LE) != (MSR_KERNEL & MSR_LE);
}
static inline u32 kvmppc_get_last_inst_internal(struct kvm_vcpu *vcpu, ulong pc)
......
......@@ -77,34 +77,122 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
return old == 0;
}
static inline int __hpte_actual_psize(unsigned int lp, int psize)
{
int i, shift;
unsigned int mask;
/* start from 1 ignoring MMU_PAGE_4K */
for (i = 1; i < MMU_PAGE_COUNT; i++) {
/* invalid penc */
if (mmu_psize_defs[psize].penc[i] == -1)
continue;
/*
* encoding bits per actual page size
* PTE LP actual page size
* rrrr rrrz >=8KB
* rrrr rrzz >=16KB
* rrrr rzzz >=32KB
* rrrr zzzz >=64KB
* .......
*/
shift = mmu_psize_defs[i].shift - LP_SHIFT;
if (shift > LP_BITS)
shift = LP_BITS;
mask = (1 << shift) - 1;
if ((lp & mask) == mmu_psize_defs[psize].penc[i])
return i;
}
return -1;
}
static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
unsigned long pte_index)
{
unsigned long rb, va_low;
int b_psize, a_psize;
unsigned int penc;
unsigned long rb = 0, va_low, sllp;
unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
if (!(v & HPTE_V_LARGE)) {
/* both base and actual psize is 4k */
b_psize = MMU_PAGE_4K;
a_psize = MMU_PAGE_4K;
} else {
for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
/* valid entries have a shift value */
if (!mmu_psize_defs[b_psize].shift)
continue;
a_psize = __hpte_actual_psize(lp, b_psize);
if (a_psize != -1)
break;
}
}
/*
* Ignore the top 14 bits of va
* v have top two bits covering segment size, hence move
* by 16 bits, Also clear the lower HPTE_V_AVPN_SHIFT (7) bits.
* AVA field in v also have the lower 23 bits ignored.
* For base page size 4K we need 14 .. 65 bits (so need to
* collect extra 11 bits)
* For others we need 14..14+i
*/
/* This covers 14..54 bits of va*/
rb = (v & ~0x7fUL) << 16; /* AVA field */
/*
* AVA in v had cleared lower 23 bits. We need to derive
* that from pteg index
*/
va_low = pte_index >> 3;
if (v & HPTE_V_SECONDARY)
va_low = ~va_low;
/* xor vsid from AVA */
/*
* get the vpn bits from va_low using reverse of hashing.
* In v we have va with 23 bits dropped and then left shifted
* HPTE_V_AVPN_SHIFT (7) bits. Now to find vsid we need
* right shift it with (SID_SHIFT - (23 - 7))
*/
if (!(v & HPTE_V_1TB_SEG))
va_low ^= v >> 12;
va_low ^= v >> (SID_SHIFT - 16);
else
va_low ^= v >> 24;
va_low ^= v >> (SID_SHIFT_1T - 16);
va_low &= 0x7ff;
if (v & HPTE_V_LARGE) {
rb |= 1; /* L field */
if (cpu_has_feature(CPU_FTR_ARCH_206) &&
(r & 0xff000)) {
/* non-16MB large page, must be 64k */
/* (masks depend on page size) */
rb |= 0x1000; /* page encoding in LP field */
rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
rb |= ((va_low << 4) & 0xf0); /* AVAL field (P7 doesn't seem to care) */
}
} else {
/* 4kB page */
rb |= (va_low & 0x7ff) << 12; /* remaining 11b of VA */
switch (b_psize) {
case MMU_PAGE_4K:
sllp = ((mmu_psize_defs[a_psize].sllp & SLB_VSID_L) >> 6) |
((mmu_psize_defs[a_psize].sllp & SLB_VSID_LP) >> 4);
rb |= sllp << 5; /* AP field */
rb |= (va_low & 0x7ff) << 12; /* remaining 11 bits of AVA */
break;
default:
{
int aval_shift;
/*
* remaining 7bits of AVA/LP fields
* Also contain the rr bits of LP
*/
rb |= (va_low & 0x7f) << 16;
/*
* Now clear not needed LP bits based on actual psize
*/
rb &= ~((1ul << mmu_psize_defs[a_psize].shift) - 1);
/*
* AVAL field 58..77 - base_page_shift bits of va
* we have space for 58..64 bits, Missing bits should
* be zero filled. +1 is to take care of L bit shift
*/
aval_shift = 64 - (77 - mmu_psize_defs[b_psize].shift) + 1;
rb |= ((va_low << aval_shift) & 0xfe);
rb |= 1; /* L field */
penc = mmu_psize_defs[b_psize].penc[a_psize];
rb |= penc << 12; /* LP field */
break;
}
}
rb |= (v >> 54) & 0x300; /* B field */
return rb;
......@@ -112,14 +200,26 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
{
int size, a_psize;
/* Look at the 8 bit LP value */
unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
/* only handle 4k, 64k and 16M pages for now */
if (!(h & HPTE_V_LARGE))
return 1ul << 12; /* 4k page */
if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206))
return 1ul << 16; /* 64k page */
if ((l & 0xff000) == 0)
return 1ul << 24; /* 16M page */
return 0; /* error */
return 1ul << 12;
else {
for (size = 0; size < MMU_PAGE_COUNT; size++) {
/* valid entries have a shift value */
if (!mmu_psize_defs[size].shift)
continue;
a_psize = __hpte_actual_psize(lp, size);
if (a_psize != -1)
return 1ul << mmu_psize_defs[a_psize].shift;
}
}
return 0;
}
static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
......
......@@ -104,6 +104,7 @@ struct kvmppc_host_state {
#ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
u64 ppr;
u64 host_fscr;
#endif
};
......@@ -133,6 +134,7 @@ struct kvmppc_book3s_shadow_vcpu {
u64 esid;
u64 vsid;
} slb[64]; /* guest SLB */
u64 shadow_fscr;
#endif
};
......
......@@ -108,9 +108,4 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault_dear;
}
static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.shared->msr;
}
#endif /* __ASM_KVM_BOOKE_H__ */
......@@ -449,7 +449,9 @@ struct kvm_vcpu_arch {
ulong pc;
ulong ctr;
ulong lr;
#ifdef CONFIG_PPC_BOOK3S
ulong tar;
#endif
ulong xer;
u32 cr;
......@@ -475,6 +477,7 @@ struct kvm_vcpu_arch {
ulong ppr;
ulong pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
ulong ebbrr;
ulong bescr;
......@@ -562,6 +565,7 @@ struct kvm_vcpu_arch {
#ifdef CONFIG_PPC_BOOK3S
ulong fault_dar;
u32 fault_dsisr;
unsigned long intr_msr;
#endif
#ifdef CONFIG_BOOKE
......@@ -622,8 +626,12 @@ struct kvm_vcpu_arch {
wait_queue_head_t cpu_run;
struct kvm_vcpu_arch_shared *shared;
#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
bool shared_big_endian;
#endif
unsigned long magic_page_pa; /* phys addr to map the magic page to */
unsigned long magic_page_ea; /* effect. addr to map the magic page to */
bool disable_kernel_nx;
int irq_type; /* one of KVM_IRQ_* */
int irq_cpu_id;
......@@ -654,7 +662,6 @@ struct kvm_vcpu_arch {
spinlock_t tbacct_lock;
u64 busy_stolen;
u64 busy_preempt;
unsigned long intr_msr;
#endif
};
......
......@@ -448,6 +448,84 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
}
}
/*
* Shared struct helpers. The shared struct can be little or big endian,
* depending on the guest endianness. So expose helpers to all of them.
*/
static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu)
{
#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
/* Only Book3S_64 PR supports bi-endian for now */
return vcpu->arch.shared_big_endian;
#elif defined(CONFIG_PPC_BOOK3S_64) && defined(__LITTLE_ENDIAN__)
/* Book3s_64 HV on little endian is always little endian */
return false;
#else
return true;
#endif
}
#define SHARED_WRAPPER_GET(reg, size) \
static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu) \
{ \
if (kvmppc_shared_big_endian(vcpu)) \
return be##size##_to_cpu(vcpu->arch.shared->reg); \
else \
return le##size##_to_cpu(vcpu->arch.shared->reg); \
} \
#define SHARED_WRAPPER_SET(reg, size) \
static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val) \
{ \
if (kvmppc_shared_big_endian(vcpu)) \
vcpu->arch.shared->reg = cpu_to_be##size(val); \
else \
vcpu->arch.shared->reg = cpu_to_le##size(val); \
} \
#define SHARED_WRAPPER(reg, size) \
SHARED_WRAPPER_GET(reg, size) \
SHARED_WRAPPER_SET(reg, size) \
SHARED_WRAPPER(critical, 64)
SHARED_WRAPPER(sprg0, 64)
SHARED_WRAPPER(sprg1, 64)
SHARED_WRAPPER(sprg2, 64)
SHARED_WRAPPER(sprg3, 64)
SHARED_WRAPPER(srr0, 64)
SHARED_WRAPPER(srr1, 64)
SHARED_WRAPPER(dar, 64)
SHARED_WRAPPER_GET(msr, 64)
static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
{
if (kvmppc_shared_big_endian(vcpu))
vcpu->arch.shared->msr = cpu_to_be64(val);
else
vcpu->arch.shared->msr = cpu_to_le64(val);
}
SHARED_WRAPPER(dsisr, 32)
SHARED_WRAPPER(int_pending, 32)
SHARED_WRAPPER(sprg4, 64)
SHARED_WRAPPER(sprg5, 64)
SHARED_WRAPPER(sprg6, 64)
SHARED_WRAPPER(sprg7, 64)
static inline u32 kvmppc_get_sr(struct kvm_vcpu *vcpu, int nr)
{
if (kvmppc_shared_big_endian(vcpu))
return be32_to_cpu(vcpu->arch.shared->sr[nr]);
else
return le32_to_cpu(vcpu->arch.shared->sr[nr]);
}
static inline void kvmppc_set_sr(struct kvm_vcpu *vcpu, int nr, u32 val)
{
if (kvmppc_shared_big_endian(vcpu))
vcpu->arch.shared->sr[nr] = cpu_to_be32(val);
else
vcpu->arch.shared->sr[nr] = cpu_to_le32(val);
}
/*
* Please call after prepare_to_enter. This function puts the lazy ee and irq
* disabled tracking state back to normal mode, without actually enabling
......@@ -485,7 +563,7 @@ static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu *vcpu, int ra, int rb)
msr_64bit = MSR_SF;
#endif
if (!(vcpu->arch.shared->msr & msr_64bit))
if (!(kvmppc_get_msr(vcpu) & msr_64bit))
ea = (uint32_t)ea;
return ea;
......
......@@ -670,18 +670,20 @@
#define MMCR0_PROBLEM_DISABLE MMCR0_FCP
#define MMCR0_FCM1 0x10000000UL /* freeze counters while MSR mark = 1 */
#define MMCR0_FCM0 0x08000000UL /* freeze counters while MSR mark = 0 */
#define MMCR0_PMXE 0x04000000UL /* performance monitor exception enable */
#define MMCR0_FCECE 0x02000000UL /* freeze ctrs on enabled cond or event */
#define MMCR0_PMXE ASM_CONST(0x04000000) /* perf mon exception enable */
#define MMCR0_FCECE ASM_CONST(0x02000000) /* freeze ctrs on enabled cond or event */
#define MMCR0_TBEE 0x00400000UL /* time base exception enable */
#define MMCR0_BHRBA 0x00200000UL /* BHRB Access allowed in userspace */
#define MMCR0_EBE 0x00100000UL /* Event based branch enable */
#define MMCR0_PMCC 0x000c0000UL /* PMC control */
#define MMCR0_PMCC_U6 0x00080000UL /* PMC1-6 are R/W by user (PR) */
#define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/
#define MMCR0_PMCjCE 0x00004000UL /* PMCj count enable*/
#define MMCR0_PMCjCE ASM_CONST(0x00004000) /* PMCj count enable*/
#define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */
#define MMCR0_PMAO_SYNC 0x00000800UL /* PMU interrupt is synchronous */
#define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */
#define MMCR0_PMAO_SYNC ASM_CONST(0x00000800) /* PMU intr is synchronous */
#define MMCR0_C56RUN ASM_CONST(0x00000100) /* PMC5/6 count when RUN=0 */
/* performance monitor alert has occurred, set to 0 after handling exception */
#define MMCR0_PMAO ASM_CONST(0x00000080)
#define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */
#define MMCR0_FC56 0x00000010UL /* freeze counters 5 and 6 */
#define MMCR0_FCTI 0x00000008UL /* freeze counters in tags inactive mode */
......
......@@ -583,6 +583,7 @@
/* Bit definitions for L1CSR0. */
#define L1CSR0_CPE 0x00010000 /* Data Cache Parity Enable */
#define L1CSR0_CUL 0x00000400 /* Data Cache Unable to Lock */
#define L1CSR0_CLFC 0x00000100 /* Cache Lock Bits Flash Clear */
#define L1CSR0_DCFI 0x00000002 /* Data Cache Flash Invalidate */
#define L1CSR0_CFI 0x00000002 /* Cache Flash Invalidate */
......
......@@ -545,7 +545,6 @@ struct kvm_get_htab_header {
#define KVM_REG_PPC_TCSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb1)
#define KVM_REG_PPC_PID (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb2)
#define KVM_REG_PPC_ACOP (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb3)
#define KVM_REG_PPC_WORT (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb4)
#define KVM_REG_PPC_VRSAVE (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb4)
#define KVM_REG_PPC_LPCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb5)
......@@ -555,6 +554,7 @@ struct kvm_get_htab_header {
#define KVM_REG_PPC_ARCH_COMPAT (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb7)
#define KVM_REG_PPC_DABRX (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb8)
#define KVM_REG_PPC_WORT (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb9)
/* Transactional Memory checkpointed state:
* This is all GPRs, all VSX regs and a subset of SPRs
......
......@@ -82,10 +82,16 @@ struct kvm_vcpu_arch_shared {
#define KVM_FEATURE_MAGIC_PAGE 1
/* Magic page flags from host to guest */
#define KVM_MAGIC_FEAT_SR (1 << 0)
/* MASn, ESR, PIR, and high SPRGs */
#define KVM_MAGIC_FEAT_MAS0_TO_SPRG7 (1 << 1)
/* Magic page flags from guest to host */
#define MAGIC_PAGE_FLAG_NOT_MAPPED_NX (1 << 0)
#endif /* _UAPI__POWERPC_KVM_PARA_H__ */
......@@ -25,14 +25,13 @@
#include <asm/cputable.h>
#include <asm/emulated_ops.h>
#include <asm/switch_to.h>
#include <asm/disassemble.h>
struct aligninfo {
unsigned char len;
unsigned char flags;
};
#define IS_XFORM(inst) (((inst) >> 26) == 31)
#define IS_DSFORM(inst) (((inst) >> 26) >= 56)
#define INVALID { 0, 0 }
......@@ -191,37 +190,6 @@ static struct aligninfo aligninfo[128] = {
INVALID, /* 11 1 1111 */
};
/*
* Create a DSISR value from the instruction
*/
static inline unsigned make_dsisr(unsigned instr)
{
unsigned dsisr;
/* bits 6:15 --> 22:31 */
dsisr = (instr & 0x03ff0000) >> 16;
if (IS_XFORM(instr)) {
/* bits 29:30 --> 15:16 */
dsisr |= (instr & 0x00000006) << 14;
/* bit 25 --> 17 */
dsisr |= (instr & 0x00000040) << 8;
/* bits 21:24 --> 18:21 */
dsisr |= (instr & 0x00000780) << 3;
} else {
/* bit 5 --> 17 */
dsisr |= (instr & 0x04000000) >> 12;
/* bits 1: 4 --> 18:21 */
dsisr |= (instr & 0x78000000) >> 17;
/* bits 30:31 --> 12:13 */
if (IS_DSFORM(instr))
dsisr |= (instr & 0x00000003) << 18;
}
return dsisr;
}
/*
* The dcbz (data cache block zero) instruction
* gives an alignment fault if used on non-cacheable
......
......@@ -54,6 +54,7 @@
#endif
#if defined(CONFIG_KVM) && defined(CONFIG_PPC_BOOK3S)
#include <asm/kvm_book3s.h>
#include <asm/kvm_ppc.h>
#endif
#ifdef CONFIG_PPC32
......@@ -445,7 +446,9 @@ int main(void)
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
#ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_TAR, offsetof(struct kvm_vcpu, arch.tar));
#endif
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
......@@ -467,6 +470,9 @@ int main(void)
DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));
DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
DEFINE(VCPU_SHAREDBE, offsetof(struct kvm_vcpu, arch.shared_big_endian));
#endif
DEFINE(VCPU_SHARED_MAS0, offsetof(struct kvm_vcpu_arch_shared, mas0));
DEFINE(VCPU_SHARED_MAS1, offsetof(struct kvm_vcpu_arch_shared, mas1));
......@@ -493,7 +499,6 @@ int main(void)
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_INTR_MSR, offsetof(struct kvm_vcpu, arch.intr_msr));
#endif
#ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
......@@ -528,11 +533,13 @@ int main(void)
DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
DEFINE(VCPU_FAULT_DAR, offsetof(struct kvm_vcpu, arch.fault_dar));
DEFINE(VCPU_INTR_MSR, offsetof(struct kvm_vcpu, arch.intr_msr));
DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar));
DEFINE(VCPU_PPR, offsetof(struct kvm_vcpu, arch.ppr));
DEFINE(VCPU_FSCR, offsetof(struct kvm_vcpu, arch.fscr));
DEFINE(VCPU_SHADOW_FSCR, offsetof(struct kvm_vcpu, arch.shadow_fscr));
DEFINE(VCPU_PSPB, offsetof(struct kvm_vcpu, arch.pspb));
DEFINE(VCPU_EBBHR, offsetof(struct kvm_vcpu, arch.ebbhr));
DEFINE(VCPU_EBBRR, offsetof(struct kvm_vcpu, arch.ebbrr));
......@@ -614,6 +621,7 @@ int main(void)
#ifdef CONFIG_PPC64
SVCPU_FIELD(SVCPU_SLB, slb);
SVCPU_FIELD(SVCPU_SLB_MAX, slb_max);
SVCPU_FIELD(SVCPU_SHADOW_FSCR, shadow_fscr);
#endif
HSTATE_FIELD(HSTATE_HOST_R1, host_r1);
......@@ -649,6 +657,7 @@ int main(void)
#ifdef CONFIG_PPC_BOOK3S_64
HSTATE_FIELD(HSTATE_CFAR, cfar);
HSTATE_FIELD(HSTATE_PPR, ppr);
HSTATE_FIELD(HSTATE_HOST_FSCR, host_fscr);
#endif /* CONFIG_PPC_BOOK3S_64 */
#else /* CONFIG_PPC_BOOK3S */
......
......@@ -47,9 +47,10 @@ static int __init early_init_dt_scan_epapr(unsigned long node,
return -1;
for (i = 0; i < (len / 4); i++) {
patch_instruction(epapr_hypercall_start + i, insts[i]);
u32 inst = be32_to_cpu(insts[i]);
patch_instruction(epapr_hypercall_start + i, inst);
#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
patch_instruction(epapr_ev_idle_start + i, insts[i]);
patch_instruction(epapr_ev_idle_start + i, inst);
#endif
}
......
......@@ -417,7 +417,7 @@ static void kvm_map_magic_page(void *data)
ulong out[8];
in[0] = KVM_MAGIC_PAGE;
in[1] = KVM_MAGIC_PAGE;
in[1] = KVM_MAGIC_PAGE | MAGIC_PAGE_FLAG_NOT_MAPPED_NX;
epapr_hypercall(in, out, KVM_HCALL_TOKEN(KVM_HC_PPC_MAP_MAGIC_PAGE));
......
......@@ -98,6 +98,9 @@ static inline void free_lppacas(void) { }
/*
* 3 persistent SLBs are registered here. The buffer will be zero
* initially, hence will all be invaild until we actually write them.
*
* If you make the number of persistent SLB entries dynamic, please also
* update PR KVM to flush and restore them accordingly.
*/
static struct slb_shadow *slb_shadow;
......
......@@ -6,7 +6,6 @@ source "virt/kvm/Kconfig"
menuconfig VIRTUALIZATION
bool "Virtualization"
depends on !CPU_LITTLE_ENDIAN
---help---
Say Y here to get to see options for using your Linux host to run
other operating systems inside virtual machines (guests).
......@@ -76,6 +75,7 @@ config KVM_BOOK3S_64
config KVM_BOOK3S_64_HV
tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
depends on KVM_BOOK3S_64
depends on !CPU_LITTLE_ENDIAN
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
......
......@@ -85,9 +85,9 @@ static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
if (is_kvmppc_hv_enabled(vcpu->kvm))
return;
if (pending_now)
vcpu->arch.shared->int_pending = 1;
kvmppc_set_int_pending(vcpu, 1);
else if (old_pending)
vcpu->arch.shared->int_pending = 0;
kvmppc_set_int_pending(vcpu, 0);
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
......@@ -99,11 +99,11 @@ static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
if (is_kvmppc_hv_enabled(vcpu->kvm))
return false;
crit_raw = vcpu->arch.shared->critical;
crit_raw = kvmppc_get_critical(vcpu);
crit_r1 = kvmppc_get_gpr(vcpu, 1);
/* Truncate crit indicators in 32 bit mode */
if (!(vcpu->arch.shared->msr & MSR_SF)) {
if (!(kvmppc_get_msr(vcpu) & MSR_SF)) {
crit_raw &= 0xffffffff;
crit_r1 &= 0xffffffff;
}
......@@ -111,15 +111,15 @@ static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
/* Critical section when crit == r1 */
crit = (crit_raw == crit_r1);
/* ... and we're in supervisor mode */
crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
crit = crit && !(kvmppc_get_msr(vcpu) & MSR_PR);
return crit;
}
void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
{
vcpu->arch.shared->srr0 = kvmppc_get_pc(vcpu);
vcpu->arch.shared->srr1 = vcpu->arch.shared->msr | flags;
kvmppc_set_srr0(vcpu, kvmppc_get_pc(vcpu));
kvmppc_set_srr1(vcpu, kvmppc_get_msr(vcpu) | flags);
kvmppc_set_pc(vcpu, kvmppc_interrupt_offset(vcpu) + vec);
vcpu->arch.mmu.reset_msr(vcpu);
}
......@@ -145,6 +145,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break;
case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break;
case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break;
case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break;
default: prio = BOOK3S_IRQPRIO_MAX; break;
}
......@@ -225,12 +226,12 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
switch (priority) {
case BOOK3S_IRQPRIO_DECREMENTER:
deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_DECREMENTER;
break;
case BOOK3S_IRQPRIO_EXTERNAL:
case BOOK3S_IRQPRIO_EXTERNAL_LEVEL:
deliver = (vcpu->arch.shared->msr & MSR_EE) && !crit;
deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_EXTERNAL;
break;
case BOOK3S_IRQPRIO_SYSTEM_RESET:
......@@ -275,6 +276,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
vec = BOOK3S_INTERRUPT_PERFMON;
break;
case BOOK3S_IRQPRIO_FAC_UNAVAIL:
vec = BOOK3S_INTERRUPT_FAC_UNAVAIL;
break;
default:
deliver = 0;
printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
......@@ -343,7 +347,7 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
{
ulong mp_pa = vcpu->arch.magic_page_pa;
if (!(vcpu->arch.shared->msr & MSR_SF))
if (!(kvmppc_get_msr(vcpu) & MSR_SF))
mp_pa = (uint32_t)mp_pa;
/* Magic page override */
......@@ -367,7 +371,7 @@ EXPORT_SYMBOL_GPL(kvmppc_gfn_to_pfn);
static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
bool iswrite, struct kvmppc_pte *pte)
{
int relocated = (vcpu->arch.shared->msr & (data ? MSR_DR : MSR_IR));
int relocated = (kvmppc_get_msr(vcpu) & (data ? MSR_DR : MSR_IR));
int r;
if (relocated) {
......@@ -498,18 +502,18 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
regs->ctr = kvmppc_get_ctr(vcpu);
regs->lr = kvmppc_get_lr(vcpu);
regs->xer = kvmppc_get_xer(vcpu);
regs->msr = vcpu->arch.shared->msr;
regs->srr0 = vcpu->arch.shared->srr0;
regs->srr1 = vcpu->arch.shared->srr1;
regs->msr = kvmppc_get_msr(vcpu);
regs->srr0 = kvmppc_get_srr0(vcpu);
regs->srr1 = kvmppc_get_srr1(vcpu);
regs->pid = vcpu->arch.pid;
regs->sprg0 = vcpu->arch.shared->sprg0;
regs->sprg1 = vcpu->arch.shared->sprg1;
regs->sprg2 = vcpu->arch.shared->sprg2;
regs->sprg3 = vcpu->arch.shared->sprg3;
regs->sprg4 = vcpu->arch.shared->sprg4;
regs->sprg5 = vcpu->arch.shared->sprg5;
regs->sprg6 = vcpu->arch.shared->sprg6;
regs->sprg7 = vcpu->arch.shared->sprg7;
regs->sprg0 = kvmppc_get_sprg0(vcpu);
regs->sprg1 = kvmppc_get_sprg1(vcpu);
regs->sprg2 = kvmppc_get_sprg2(vcpu);
regs->sprg3 = kvmppc_get_sprg3(vcpu);
regs->sprg4 = kvmppc_get_sprg4(vcpu);
regs->sprg5 = kvmppc_get_sprg5(vcpu);
regs->sprg6 = kvmppc_get_sprg6(vcpu);
regs->sprg7 = kvmppc_get_sprg7(vcpu);
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
regs->gpr[i] = kvmppc_get_gpr(vcpu, i);
......@@ -527,16 +531,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
kvmppc_set_lr(vcpu, regs->lr);
kvmppc_set_xer(vcpu, regs->xer);
kvmppc_set_msr(vcpu, regs->msr);
vcpu->arch.shared->srr0 = regs->srr0;
vcpu->arch.shared->srr1 = regs->srr1;
vcpu->arch.shared->sprg0 = regs->sprg0;
vcpu->arch.shared->sprg1 = regs->sprg1;
vcpu->arch.shared->sprg2 = regs->sprg2;
vcpu->arch.shared->sprg3 = regs->sprg3;
vcpu->arch.shared->sprg4 = regs->sprg4;
vcpu->arch.shared->sprg5 = regs->sprg5;
vcpu->arch.shared->sprg6 = regs->sprg6;
vcpu->arch.shared->sprg7 = regs->sprg7;
kvmppc_set_srr0(vcpu, regs->srr0);
kvmppc_set_srr1(vcpu, regs->srr1);
kvmppc_set_sprg0(vcpu, regs->sprg0);
kvmppc_set_sprg1(vcpu, regs->sprg1);
kvmppc_set_sprg2(vcpu, regs->sprg2);
kvmppc_set_sprg3(vcpu, regs->sprg3);
kvmppc_set_sprg4(vcpu, regs->sprg4);
kvmppc_set_sprg5(vcpu, regs->sprg5);
kvmppc_set_sprg6(vcpu, regs->sprg6);
kvmppc_set_sprg7(vcpu, regs->sprg7);
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
kvmppc_set_gpr(vcpu, i, regs->gpr[i]);
......@@ -570,10 +574,10 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
r = 0;
switch (reg->id) {
case KVM_REG_PPC_DAR:
val = get_reg_val(reg->id, vcpu->arch.shared->dar);
val = get_reg_val(reg->id, kvmppc_get_dar(vcpu));
break;
case KVM_REG_PPC_DSISR:
val = get_reg_val(reg->id, vcpu->arch.shared->dsisr);
val = get_reg_val(reg->id, kvmppc_get_dsisr(vcpu));
break;
case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
i = reg->id - KVM_REG_PPC_FPR0;
......@@ -627,6 +631,21 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
val = get_reg_val(reg->id, kvmppc_xics_get_icp(vcpu));
break;
#endif /* CONFIG_KVM_XICS */
case KVM_REG_PPC_FSCR:
val = get_reg_val(reg->id, vcpu->arch.fscr);
break;
case KVM_REG_PPC_TAR:
val = get_reg_val(reg->id, vcpu->arch.tar);
break;
case KVM_REG_PPC_EBBHR:
val = get_reg_val(reg->id, vcpu->arch.ebbhr);
break;
case KVM_REG_PPC_EBBRR:
val = get_reg_val(reg->id, vcpu->arch.ebbrr);
break;
case KVM_REG_PPC_BESCR:
val = get_reg_val(reg->id, vcpu->arch.bescr);
break;
default:
r = -EINVAL;
break;
......@@ -660,10 +679,10 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
r = 0;
switch (reg->id) {
case KVM_REG_PPC_DAR:
vcpu->arch.shared->dar = set_reg_val(reg->id, val);
kvmppc_set_dar(vcpu, set_reg_val(reg->id, val));
break;
case KVM_REG_PPC_DSISR:
vcpu->arch.shared->dsisr = set_reg_val(reg->id, val);
kvmppc_set_dsisr(vcpu, set_reg_val(reg->id, val));
break;
case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
i = reg->id - KVM_REG_PPC_FPR0;
......@@ -716,6 +735,21 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
set_reg_val(reg->id, val));
break;
#endif /* CONFIG_KVM_XICS */
case KVM_REG_PPC_FSCR:
vcpu->arch.fscr = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_TAR:
vcpu->arch.tar = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_EBBHR:
vcpu->arch.ebbhr = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_EBBRR:
vcpu->arch.ebbrr = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_BESCR:
vcpu->arch.bescr = set_reg_val(reg->id, val);
break;
default:
r = -EINVAL;
break;
......
......@@ -91,7 +91,7 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
static u32 find_sr(struct kvm_vcpu *vcpu, gva_t eaddr)
{
return vcpu->arch.shared->sr[(eaddr >> 28) & 0xf];
return kvmppc_get_sr(vcpu, (eaddr >> 28) & 0xf);
}
static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
......@@ -131,7 +131,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvm_vcpu *vcpu,
pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg,
kvmppc_get_pc(vcpu), eaddr, vcpu_book3s->sdr1, pteg,
sr_vsid(sre));
r = gfn_to_hva(vcpu->kvm, pteg >> PAGE_SHIFT);
......@@ -160,7 +160,7 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
else
bat = &vcpu_book3s->ibat[i];
if (vcpu->arch.shared->msr & MSR_PR) {
if (kvmppc_get_msr(vcpu) & MSR_PR) {
if (!bat->vp)
continue;
} else {
......@@ -208,6 +208,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
u32 sre;
hva_t ptegp;
u32 pteg[16];
u32 pte0, pte1;
u32 ptem = 0;
int i;
int found = 0;
......@@ -233,14 +234,16 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
}
for (i=0; i<16; i+=2) {
if (ptem == pteg[i]) {
pte0 = be32_to_cpu(pteg[i]);
pte1 = be32_to_cpu(pteg[i + 1]);
if (ptem == pte0) {
u8 pp;
pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
pp = pteg[i+1] & 3;
pte->raddr = (pte1 & ~(0xFFFULL)) | (eaddr & 0xFFF);
pp = pte1 & 3;
if ((sr_kp(sre) && (vcpu->arch.shared->msr & MSR_PR)) ||
(sr_ks(sre) && !(vcpu->arch.shared->msr & MSR_PR)))
if ((sr_kp(sre) && (kvmppc_get_msr(vcpu) & MSR_PR)) ||
(sr_ks(sre) && !(kvmppc_get_msr(vcpu) & MSR_PR)))
pp |= 4;
pte->may_write = false;
......@@ -260,7 +263,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
}
dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
pteg[i], pteg[i+1], pp);
pte0, pte1, pp);
found = 1;
break;
}
......@@ -269,8 +272,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Update PTE C and A bits, so the guest's swapper knows we used the
page */
if (found) {
u32 pte_r = pteg[i+1];
char __user *addr = (char __user *) &pteg[i+1];
u32 pte_r = pte1;
char __user *addr = (char __user *) (ptegp + (i+1) * sizeof(u32));
/*
* Use single-byte writes to update the HPTE, to
......@@ -296,7 +299,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
to_book3s(vcpu)->sdr1, ptegp);
for (i=0; i<16; i+=2) {
dprintk_pte(" %02d: 0x%x - 0x%x (0x%x)\n",
i, pteg[i], pteg[i+1], ptem);
i, be32_to_cpu(pteg[i]),
be32_to_cpu(pteg[i+1]), ptem);
}
}
......@@ -316,7 +320,7 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Magic page override */
if (unlikely(mp_ea) &&
unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) &&
!(vcpu->arch.shared->msr & MSR_PR)) {
!(kvmppc_get_msr(vcpu) & MSR_PR)) {
pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
pte->raddr = vcpu->arch.magic_page_pa | (pte->raddr & 0xfff);
pte->raddr &= KVM_PAM;
......@@ -341,13 +345,13 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
{
return vcpu->arch.shared->sr[srnum];
return kvmppc_get_sr(vcpu, srnum);
}
static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
ulong value)
{
vcpu->arch.shared->sr[srnum] = value;
kvmppc_set_sr(vcpu, srnum, value);
kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
}
......@@ -367,8 +371,9 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
ulong ea = esid << SID_SHIFT;
u32 sr;
u64 gvsid = esid;
u64 msr = kvmppc_get_msr(vcpu);
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
if (msr & (MSR_DR|MSR_IR)) {
sr = find_sr(vcpu, ea);
if (sr_valid(sr))
gvsid = sr_vsid(sr);
......@@ -377,7 +382,7 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
/* In case we only have one of MSR_IR or MSR_DR set, let's put
that in the real-mode context (and hope RM doesn't access
high memory) */
switch (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
switch (msr & (MSR_DR|MSR_IR)) {
case 0:
*vsid = VSID_REAL | esid;
break;
......@@ -397,7 +402,7 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
BUG();
}
if (vcpu->arch.shared->msr & MSR_PR)
if (msr & MSR_PR)
*vsid |= VSID_PR;
return 0;
......
......@@ -92,7 +92,7 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
struct kvmppc_sid_map *map;
u16 sid_map_mask;
if (vcpu->arch.shared->msr & MSR_PR)
if (kvmppc_get_msr(vcpu) & MSR_PR)
gvsid |= VSID_PR;
sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
......@@ -279,7 +279,7 @@ static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
u16 sid_map_mask;
static int backwards_map = 0;
if (vcpu->arch.shared->msr & MSR_PR)
if (kvmppc_get_msr(vcpu) & MSR_PR)
gvsid |= VSID_PR;
/* We might get collisions that trap in preceding order, so let's
......
......@@ -38,7 +38,7 @@
static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
{
kvmppc_set_msr(vcpu, MSR_SF);
kvmppc_set_msr(vcpu, vcpu->arch.intr_msr);
}
static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
......@@ -226,7 +226,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Magic page override */
if (unlikely(mp_ea) &&
unlikely((eaddr & ~0xfffULL) == (mp_ea & ~0xfffULL)) &&
!(vcpu->arch.shared->msr & MSR_PR)) {
!(kvmppc_get_msr(vcpu) & MSR_PR)) {
gpte->eaddr = eaddr;
gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
gpte->raddr = vcpu->arch.magic_page_pa | (gpte->raddr & 0xfff);
......@@ -269,18 +269,21 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
goto no_page_found;
}
if ((vcpu->arch.shared->msr & MSR_PR) && slbe->Kp)
if ((kvmppc_get_msr(vcpu) & MSR_PR) && slbe->Kp)
key = 4;
else if (!(vcpu->arch.shared->msr & MSR_PR) && slbe->Ks)
else if (!(kvmppc_get_msr(vcpu) & MSR_PR) && slbe->Ks)
key = 4;
for (i=0; i<16; i+=2) {
u64 pte0 = be64_to_cpu(pteg[i]);
u64 pte1 = be64_to_cpu(pteg[i + 1]);
/* Check all relevant fields of 1st dword */
if ((pteg[i] & v_mask) == v_val) {
if ((pte0 & v_mask) == v_val) {
/* If large page bit is set, check pgsize encoding */
if (slbe->large &&
(vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE)) {
pgsize = decode_pagesize(slbe, pteg[i+1]);
pgsize = decode_pagesize(slbe, pte1);
if (pgsize < 0)
continue;
}
......@@ -297,8 +300,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
goto do_second;
}
v = pteg[i];
r = pteg[i+1];
v = be64_to_cpu(pteg[i]);
r = be64_to_cpu(pteg[i+1]);
pp = (r & HPTE_R_PP) | key;
if (r & HPTE_R_PP0)
pp |= 8;
......@@ -310,6 +313,9 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
gpte->raddr = (r & HPTE_R_RPN & ~eaddr_mask) | (eaddr & eaddr_mask);
gpte->page_size = pgsize;
gpte->may_execute = ((r & HPTE_R_N) ? false : true);
if (unlikely(vcpu->arch.disable_kernel_nx) &&
!(kvmppc_get_msr(vcpu) & MSR_PR))
gpte->may_execute = true;
gpte->may_read = false;
gpte->may_write = false;
......@@ -342,14 +348,14 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
* non-PAPR platforms such as mac99, and this is
* what real hardware does.
*/
char __user *addr = (char __user *) &pteg[i+1];
char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64));
r |= HPTE_R_R;
put_user(r >> 8, addr + 6);
}
if (iswrite && gpte->may_write && !(r & HPTE_R_C)) {
/* Set the dirty flag */
/* Use a single byte write */
char __user *addr = (char __user *) &pteg[i+1];
char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64));
r |= HPTE_R_C;
put_user(r, addr + 7);
}
......@@ -479,7 +485,7 @@ static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
vcpu->arch.slb[i].origv = 0;
}
if (vcpu->arch.shared->msr & MSR_IR) {
if (kvmppc_get_msr(vcpu) & MSR_IR) {
kvmppc_mmu_flush_segments(vcpu);
kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
}
......@@ -563,7 +569,7 @@ static int segment_contains_magic_page(struct kvm_vcpu *vcpu, ulong esid)
{
ulong mp_ea = vcpu->arch.magic_page_ea;
return mp_ea && !(vcpu->arch.shared->msr & MSR_PR) &&
return mp_ea && !(kvmppc_get_msr(vcpu) & MSR_PR) &&
(mp_ea >> SID_SHIFT) == esid;
}
#endif
......@@ -576,8 +582,9 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
u64 gvsid = esid;
ulong mp_ea = vcpu->arch.magic_page_ea;
int pagesize = MMU_PAGE_64K;
u64 msr = kvmppc_get_msr(vcpu);
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
if (msr & (MSR_DR|MSR_IR)) {
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (slb) {
gvsid = slb->vsid;
......@@ -590,7 +597,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
}
}
switch (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
switch (msr & (MSR_DR|MSR_IR)) {
case 0:
gvsid = VSID_REAL | esid;
break;
......@@ -623,7 +630,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
gvsid |= VSID_64K;
#endif
if (vcpu->arch.shared->msr & MSR_PR)
if (kvmppc_get_msr(vcpu) & MSR_PR)
gvsid |= VSID_PR;
*vsid = gvsid;
......@@ -633,7 +640,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
/* Catch magic page case */
if (unlikely(mp_ea) &&
unlikely(esid == (mp_ea >> SID_SHIFT)) &&
!(vcpu->arch.shared->msr & MSR_PR)) {
!(kvmppc_get_msr(vcpu) & MSR_PR)) {
*vsid = VSID_REAL | esid;
return 0;
}
......
......@@ -58,7 +58,7 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
struct kvmppc_sid_map *map;
u16 sid_map_mask;
if (vcpu->arch.shared->msr & MSR_PR)
if (kvmppc_get_msr(vcpu) & MSR_PR)
gvsid |= VSID_PR;
sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
......@@ -230,7 +230,7 @@ static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
u16 sid_map_mask;
static int backwards_map = 0;
if (vcpu->arch.shared->msr & MSR_PR)
if (kvmppc_get_msr(vcpu) & MSR_PR)
gvsid |= VSID_PR;
/* We might get collisions that trap in preceding order, so let's
......@@ -271,11 +271,8 @@ static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
int found_inval = -1;
int r;
if (!svcpu->slb_max)
svcpu->slb_max = 1;
/* Are we overwriting? */
for (i = 1; i < svcpu->slb_max; i++) {
for (i = 0; i < svcpu->slb_max; i++) {
if (!(svcpu->slb[i].esid & SLB_ESID_V))
found_inval = i;
else if ((svcpu->slb[i].esid & ESID_MASK) == esid) {
......@@ -285,7 +282,7 @@ static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
}
/* Found a spare entry that was invalidated before */
if (found_inval > 0) {
if (found_inval >= 0) {
r = found_inval;
goto out;
}
......@@ -359,7 +356,7 @@ void kvmppc_mmu_flush_segment(struct kvm_vcpu *vcpu, ulong ea, ulong seg_size)
ulong seg_mask = -seg_size;
int i;
for (i = 1; i < svcpu->slb_max; i++) {
for (i = 0; i < svcpu->slb_max; i++) {
if ((svcpu->slb[i].esid & SLB_ESID_V) &&
(svcpu->slb[i].esid & seg_mask) == ea) {
/* Invalidate this entry */
......@@ -373,7 +370,7 @@ void kvmppc_mmu_flush_segment(struct kvm_vcpu *vcpu, ulong ea, ulong seg_size)
void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
{
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
svcpu->slb_max = 1;
svcpu->slb_max = 0;
svcpu->slb[0].esid = 0;
svcpu_put(svcpu);
}
......
......@@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm);
long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
{
unsigned long hpt;
unsigned long hpt = 0;
struct revmap_entry *rev;
struct page *page = NULL;
long order = KVM_DEFAULT_HPT_ORDER;
......@@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}
kvm->arch.hpt_cma_alloc = 0;
/*
* try first to allocate it from the kernel page allocator.
* We keep the CMA reserved for failed allocation.
*/
hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
__GFP_NOWARN, order - PAGE_SHIFT);
/* Next try to allocate from the preallocated pool */
if (!hpt) {
VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
kvm->arch.hpt_cma_alloc = 1;
} else
--order;
VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
kvm->arch.hpt_cma_alloc = 1;
}
/* Lastly try successively smaller sizes from the page allocator */
......@@ -596,6 +585,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct kvm *kvm = vcpu->kvm;
unsigned long *hptep, hpte[3], r;
unsigned long mmu_seq, psize, pte_size;
unsigned long gpa_base, gfn_base;
unsigned long gpa, gfn, hva, pfn;
struct kvm_memory_slot *memslot;
unsigned long *rmap;
......@@ -634,7 +624,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* Translate the logical address and get the page */
psize = hpte_page_size(hpte[0], r);
gpa = (r & HPTE_R_RPN & ~(psize - 1)) | (ea & (psize - 1));
gpa_base = r & HPTE_R_RPN & ~(psize - 1);
gfn_base = gpa_base >> PAGE_SHIFT;
gpa = gpa_base | (ea & (psize - 1));
gfn = gpa >> PAGE_SHIFT;
memslot = gfn_to_memslot(kvm, gfn);
......@@ -646,6 +638,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
if (!kvm->arch.using_mmu_notifiers)
return -EFAULT; /* should never get here */
/*
* This should never happen, because of the slot_is_aligned()
* check in kvmppc_do_h_enter().
*/
if (gfn_base < memslot->base_gfn)
return -EFAULT;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
......@@ -738,7 +737,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
goto out_unlock;
hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
rmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
/* Always put the HPTE in the rmap chain for the page base address */
rmap = &memslot->arch.rmap[gfn_base - memslot->base_gfn];
lock_rmap(rmap);
/* Check if we might have been invalidated; let the guest retry if so */
......@@ -1060,22 +1060,33 @@ void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
}
static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
static int vcpus_running(struct kvm *kvm)
{
return atomic_read(&kvm->arch.vcpus_running) != 0;
}
/*
* Returns the number of system pages that are dirty.
* This can be more than 1 if we find a huge-page HPTE.
*/
static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
{
struct revmap_entry *rev = kvm->arch.revmap;
unsigned long head, i, j;
unsigned long n;
unsigned long v, r;
unsigned long *hptep;
int ret = 0;
int npages_dirty = 0;
retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_CHANGED) {
*rmapp &= ~KVMPPC_RMAP_CHANGED;
ret = 1;
npages_dirty = 1;
}
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
unlock_rmap(rmapp);
return ret;
return npages_dirty;
}
i = head = *rmapp & KVMPPC_RMAP_INDEX;
......@@ -1083,7 +1094,22 @@ static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
j = rev[i].forw;
if (!(hptep[1] & HPTE_R_C))
/*
* Checking the C (changed) bit here is racy since there
* is no guarantee about when the hardware writes it back.
* If the HPTE is not writable then it is stable since the
* page can't be written to, and we would have done a tlbie
* (which forces the hardware to complete any writeback)
* when making the HPTE read-only.
* If vcpus are running then this call is racy anyway
* since the page could get dirtied subsequently, so we
* expect there to be a further call which would pick up
* any delayed C bit writeback.
* Otherwise we need to do the tlbie even if C==0 in
* order to pick up any delayed writeback of C.
*/
if (!(hptep[1] & HPTE_R_C) &&
(!hpte_is_writable(hptep[1]) || vcpus_running(kvm)))
continue;
if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
......@@ -1095,24 +1121,33 @@ static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
}
/* Now check and modify the HPTE */
if ((hptep[0] & HPTE_V_VALID) && (hptep[1] & HPTE_R_C)) {
/* need to make it temporarily absent to clear C */
hptep[0] |= HPTE_V_ABSENT;
kvmppc_invalidate_hpte(kvm, hptep, i);
hptep[1] &= ~HPTE_R_C;
eieio();
hptep[0] = (hptep[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
if (!(hptep[0] & HPTE_V_VALID))
continue;
/* need to make it temporarily absent so C is stable */
hptep[0] |= HPTE_V_ABSENT;
kvmppc_invalidate_hpte(kvm, hptep, i);
v = hptep[0];
r = hptep[1];
if (r & HPTE_R_C) {
hptep[1] = r & ~HPTE_R_C;
if (!(rev[i].guest_rpte & HPTE_R_C)) {
rev[i].guest_rpte |= HPTE_R_C;
note_hpte_modification(kvm, &rev[i]);
}
ret = 1;
n = hpte_page_size(v, r);
n = (n + PAGE_SIZE - 1) >> PAGE_SHIFT;
if (n > npages_dirty)
npages_dirty = n;
eieio();
}
hptep[0] &= ~HPTE_V_HVLOCK;
v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
v |= HPTE_V_VALID;
hptep[0] = v;
} while ((i = j) != head);
unlock_rmap(rmapp);
return ret;
return npages_dirty;
}
static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
......@@ -1136,15 +1171,22 @@ static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot,
unsigned long *map)
{
unsigned long i;
unsigned long i, j;
unsigned long *rmapp;
struct kvm_vcpu *vcpu;
preempt_disable();
rmapp = memslot->arch.rmap;
for (i = 0; i < memslot->npages; ++i) {
if (kvm_test_clear_dirty(kvm, rmapp) && map)
__set_bit_le(i, map);
int npages = kvm_test_clear_dirty_npages(kvm, rmapp);
/*
* Note that if npages > 0 then i must be a multiple of npages,
* since we always put huge-page HPTEs in the rmap chain
* corresponding to their page base address.
*/
if (npages && map)
for (j = i; npages; ++j, --npages)
__set_bit_le(j, map);
++rmapp;
}
......
......@@ -17,30 +17,9 @@
* Authors: Alexander Graf <agraf@suse.de>
*/
#ifdef __LITTLE_ENDIAN__
#error Need to fix SLB shadow accesses in little endian mode
#endif
#define SHADOW_SLB_ESID(num) (SLBSHADOW_SAVEAREA + (num * 0x10))
#define SHADOW_SLB_VSID(num) (SLBSHADOW_SAVEAREA + (num * 0x10) + 0x8)
#define UNBOLT_SLB_ENTRY(num) \
ld r9, SHADOW_SLB_ESID(num)(r12); \
/* Invalid? Skip. */; \
rldicl. r0, r9, 37, 63; \
beq slb_entry_skip_ ## num; \
xoris r9, r9, SLB_ESID_V@h; \
std r9, SHADOW_SLB_ESID(num)(r12); \
slb_entry_skip_ ## num:
#define REBOLT_SLB_ENTRY(num) \
ld r10, SHADOW_SLB_ESID(num)(r11); \
cmpdi r10, 0; \
beq slb_exit_skip_ ## num; \
oris r10, r10, SLB_ESID_V@h; \
ld r9, SHADOW_SLB_VSID(num)(r11); \
slbmte r9, r10; \
std r10, SHADOW_SLB_ESID(num)(r11); \
slb_exit_skip_ ## num:
#define SHADOW_SLB_ENTRY_LEN 0x10
#define OFFSET_ESID(x) (SHADOW_SLB_ENTRY_LEN * x)
#define OFFSET_VSID(x) ((SHADOW_SLB_ENTRY_LEN * x) + 8)
/******************************************************************************
* *
......@@ -64,20 +43,15 @@ slb_exit_skip_ ## num:
* SVCPU[LR] = guest LR
*/
/* Remove LPAR shadow entries */
BEGIN_FW_FTR_SECTION
#if SLB_NUM_BOLTED == 3
/* Declare SLB shadow as 0 entries big */
ld r12, PACA_SLBSHADOWPTR(r13)
ld r11, PACA_SLBSHADOWPTR(r13)
li r8, 0
stb r8, 3(r11)
/* Remove bolted entries */
UNBOLT_SLB_ENTRY(0)
UNBOLT_SLB_ENTRY(1)
UNBOLT_SLB_ENTRY(2)
#else
#error unknown number of bolted entries
#endif
END_FW_FTR_SECTION_IFSET(FW_FEATURE_LPAR)
/* Flush SLB */
......@@ -100,7 +74,7 @@ slb_loop_enter:
ld r10, 0(r11)
rldicl. r0, r10, 37, 63
andis. r9, r10, SLB_ESID_V@h
beq slb_loop_enter_skip
ld r9, 8(r11)
......@@ -137,23 +111,42 @@ slb_do_enter:
*
*/
/* Restore bolted entries from the shadow and fix it along the way */
/* Remove all SLB entries that are in use. */
/* We don't store anything in entry 0, so we don't need to take care of it */
li r0, r0
slbmte r0, r0
slbia
isync
#if SLB_NUM_BOLTED == 3
/* Restore bolted entries from the shadow */
ld r11, PACA_SLBSHADOWPTR(r13)
REBOLT_SLB_ENTRY(0)
REBOLT_SLB_ENTRY(1)
REBOLT_SLB_ENTRY(2)
#else
#error unknown number of bolted entries
#endif
BEGIN_FW_FTR_SECTION
/* Declare SLB shadow as SLB_NUM_BOLTED entries big */
li r8, SLB_NUM_BOLTED
stb r8, 3(r11)
END_FW_FTR_SECTION_IFSET(FW_FEATURE_LPAR)
/* Manually load all entries from shadow SLB */
li r8, SLBSHADOW_SAVEAREA
li r7, SLBSHADOW_SAVEAREA + 8
.rept SLB_NUM_BOLTED
LDX_BE r10, r11, r8
cmpdi r10, 0
beq 1f
LDX_BE r9, r11, r7
slbmte r9, r10
1: addi r7, r7, SHADOW_SLB_ENTRY_LEN
addi r8, r8, SHADOW_SLB_ENTRY_LEN
.endr
isync
sync
slb_do_exit:
......
......@@ -80,7 +80,7 @@ static bool spr_allowed(struct kvm_vcpu *vcpu, enum priv_level level)
return false;
/* Limit user space to its own small SPR set */
if ((vcpu->arch.shared->msr & MSR_PR) && level > PRIV_PROBLEM)
if ((kvmppc_get_msr(vcpu) & MSR_PR) && level > PRIV_PROBLEM)
return false;
return true;
......@@ -94,14 +94,31 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
int rs = get_rs(inst);
int ra = get_ra(inst);
int rb = get_rb(inst);
u32 inst_sc = 0x44000002;
switch (get_op(inst)) {
case 0:
emulated = EMULATE_FAIL;
if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
(inst == swab32(inst_sc))) {
/*
* This is the byte reversed syscall instruction of our
* hypercall handler. Early versions of LE Linux didn't
* swap the instructions correctly and ended up in
* illegal instructions.
* Just always fail hypercalls on these broken systems.
*/
kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
emulated = EMULATE_DONE;
}
break;
case 19:
switch (get_xop(inst)) {
case OP_19_XOP_RFID:
case OP_19_XOP_RFI:
kvmppc_set_pc(vcpu, vcpu->arch.shared->srr0);
kvmppc_set_msr(vcpu, vcpu->arch.shared->srr1);
kvmppc_set_pc(vcpu, kvmppc_get_srr0(vcpu));
kvmppc_set_msr(vcpu, kvmppc_get_srr1(vcpu));
*advance = 0;
break;
......@@ -113,16 +130,16 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
case 31:
switch (get_xop(inst)) {
case OP_31_XOP_MFMSR:
kvmppc_set_gpr(vcpu, rt, vcpu->arch.shared->msr);
kvmppc_set_gpr(vcpu, rt, kvmppc_get_msr(vcpu));
break;
case OP_31_XOP_MTMSRD:
{
ulong rs_val = kvmppc_get_gpr(vcpu, rs);
if (inst & 0x10000) {
ulong new_msr = vcpu->arch.shared->msr;
ulong new_msr = kvmppc_get_msr(vcpu);
new_msr &= ~(MSR_RI | MSR_EE);
new_msr |= rs_val & (MSR_RI | MSR_EE);
vcpu->arch.shared->msr = new_msr;
kvmppc_set_msr_fast(vcpu, new_msr);
} else
kvmppc_set_msr(vcpu, rs_val);
break;
......@@ -179,7 +196,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
ulong cmd = kvmppc_get_gpr(vcpu, 3);
int i;
if ((vcpu->arch.shared->msr & MSR_PR) ||
if ((kvmppc_get_msr(vcpu) & MSR_PR) ||
!vcpu->arch.papr_enabled) {
emulated = EMULATE_FAIL;
break;
......@@ -261,14 +278,14 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
ra_val = kvmppc_get_gpr(vcpu, ra);
addr = (ra_val + rb_val) & ~31ULL;
if (!(vcpu->arch.shared->msr & MSR_SF))
if (!(kvmppc_get_msr(vcpu) & MSR_SF))
addr &= 0xffffffff;
vaddr = addr;
r = kvmppc_st(vcpu, &addr, 32, zeros, true);
if ((r == -ENOENT) || (r == -EPERM)) {
*advance = 0;
vcpu->arch.shared->dar = vaddr;
kvmppc_set_dar(vcpu, vaddr);
vcpu->arch.fault_dar = vaddr;
dsisr = DSISR_ISSTORE;
......@@ -277,7 +294,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
else if (r == -EPERM)
dsisr |= DSISR_PROTFAULT;
vcpu->arch.shared->dsisr = dsisr;
kvmppc_set_dsisr(vcpu, dsisr);
vcpu->arch.fault_dsisr = dsisr;
kvmppc_book3s_queue_irqprio(vcpu,
......@@ -356,10 +373,10 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
to_book3s(vcpu)->sdr1 = spr_val;
break;
case SPRN_DSISR:
vcpu->arch.shared->dsisr = spr_val;
kvmppc_set_dsisr(vcpu, spr_val);
break;
case SPRN_DAR:
vcpu->arch.shared->dar = spr_val;
kvmppc_set_dar(vcpu, spr_val);
break;
case SPRN_HIOR:
to_book3s(vcpu)->hior = spr_val;
......@@ -438,6 +455,31 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
case SPRN_GQR7:
to_book3s(vcpu)->gqr[sprn - SPRN_GQR0] = spr_val;
break;
case SPRN_FSCR:
vcpu->arch.fscr = spr_val;
break;
#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_BESCR:
vcpu->arch.bescr = spr_val;
break;
case SPRN_EBBHR:
vcpu->arch.ebbhr = spr_val;
break;
case SPRN_EBBRR:
vcpu->arch.ebbrr = spr_val;
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case SPRN_TFHAR:
vcpu->arch.tfhar = spr_val;
break;
case SPRN_TEXASR:
vcpu->arch.texasr = spr_val;
break;
case SPRN_TFIAR:
vcpu->arch.tfiar = spr_val;
break;
#endif
#endif
case SPRN_ICTC:
case SPRN_THRM1:
case SPRN_THRM2:
......@@ -455,6 +497,13 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
case SPRN_WPAR_GEKKO:
case SPRN_MSSSR0:
case SPRN_DABR:
#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_MMCRS:
case SPRN_MMCRA:
case SPRN_MMCR0:
case SPRN_MMCR1:
case SPRN_MMCR2:
#endif
break;
unprivileged:
default:
......@@ -493,10 +542,10 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
*spr_val = to_book3s(vcpu)->sdr1;
break;
case SPRN_DSISR:
*spr_val = vcpu->arch.shared->dsisr;
*spr_val = kvmppc_get_dsisr(vcpu);
break;
case SPRN_DAR:
*spr_val = vcpu->arch.shared->dar;
*spr_val = kvmppc_get_dar(vcpu);
break;
case SPRN_HIOR:
*spr_val = to_book3s(vcpu)->hior;
......@@ -538,6 +587,31 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
case SPRN_GQR7:
*spr_val = to_book3s(vcpu)->gqr[sprn - SPRN_GQR0];
break;
case SPRN_FSCR:
*spr_val = vcpu->arch.fscr;
break;
#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_BESCR:
*spr_val = vcpu->arch.bescr;
break;
case SPRN_EBBHR:
*spr_val = vcpu->arch.ebbhr;
break;
case SPRN_EBBRR:
*spr_val = vcpu->arch.ebbrr;
break;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case SPRN_TFHAR:
*spr_val = vcpu->arch.tfhar;
break;
case SPRN_TEXASR:
*spr_val = vcpu->arch.texasr;
break;
case SPRN_TFIAR:
*spr_val = vcpu->arch.tfiar;
break;
#endif
#endif
case SPRN_THRM1:
case SPRN_THRM2:
case SPRN_THRM3:
......@@ -553,6 +627,14 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
case SPRN_WPAR_GEKKO:
case SPRN_MSSSR0:
case SPRN_DABR:
#ifdef CONFIG_PPC_BOOK3S_64
case SPRN_MMCRS:
case SPRN_MMCRA:
case SPRN_MMCR0:
case SPRN_MMCR1:
case SPRN_MMCR2:
case SPRN_TIR:
#endif
*spr_val = 0;
break;
default:
......@@ -569,48 +651,17 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
{
u32 dsisr = 0;
/*
* This is what the spec says about DSISR bits (not mentioned = 0):
*
* 12:13 [DS] Set to bits 30:31
* 15:16 [X] Set to bits 29:30
* 17 [X] Set to bit 25
* [D/DS] Set to bit 5
* 18:21 [X] Set to bits 21:24
* [D/DS] Set to bits 1:4
* 22:26 Set to bits 6:10 (RT/RS/FRT/FRS)
* 27:31 Set to bits 11:15 (RA)
*/
switch (get_op(inst)) {
/* D-form */
case OP_LFS:
case OP_LFD:
case OP_STFD:
case OP_STFS:
dsisr |= (inst >> 12) & 0x4000; /* bit 17 */
dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
break;
/* X-form */
case 31:
dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
dsisr |= (inst << 8) & 0x04000; /* bit 17 */
dsisr |= (inst << 3) & 0x03c00; /* bits 18:21 */
break;
default:
printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
break;
}
dsisr |= (inst >> 16) & 0x03ff; /* bits 22:31 */
return dsisr;
return make_dsisr(inst);
}
ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
{
#ifdef CONFIG_PPC_BOOK3S_64
/*
* Linux's fix_alignment() assumes that DAR is valid, so can we
*/
return vcpu->arch.fault_dar;
#else
ulong dar = 0;
ulong ra = get_ra(inst);
ulong rb = get_rb(inst);
......@@ -635,4 +686,5 @@ ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
}
return dar;
#endif
}
......@@ -18,6 +18,7 @@
*/
#include <linux/export.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
......
......@@ -879,24 +879,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_IAMR:
*val = get_reg_val(id, vcpu->arch.iamr);
break;
case KVM_REG_PPC_FSCR:
*val = get_reg_val(id, vcpu->arch.fscr);
break;
case KVM_REG_PPC_PSPB:
*val = get_reg_val(id, vcpu->arch.pspb);
break;
case KVM_REG_PPC_EBBHR:
*val = get_reg_val(id, vcpu->arch.ebbhr);
break;
case KVM_REG_PPC_EBBRR:
*val = get_reg_val(id, vcpu->arch.ebbrr);
break;
case KVM_REG_PPC_BESCR:
*val = get_reg_val(id, vcpu->arch.bescr);
break;
case KVM_REG_PPC_TAR:
*val = get_reg_val(id, vcpu->arch.tar);
break;
case KVM_REG_PPC_DPDES:
*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
break;
......@@ -1091,24 +1076,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_IAMR:
vcpu->arch.iamr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_FSCR:
vcpu->arch.fscr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_PSPB:
vcpu->arch.pspb = set_reg_val(id, *val);
break;
case KVM_REG_PPC_EBBHR:
vcpu->arch.ebbhr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_EBBRR:
vcpu->arch.ebbrr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_BESCR:
vcpu->arch.bescr = set_reg_val(id, *val);
break;
case KVM_REG_PPC_TAR:
vcpu->arch.tar = set_reg_val(id, *val);
break;
case KVM_REG_PPC_DPDES:
vcpu->arch.vcore->dpdes = set_reg_val(id, *val);
break;
......@@ -1280,6 +1250,17 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
goto free_vcpu;
vcpu->arch.shared = &vcpu->arch.shregs;
#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
/*
* The shared struct is never shared on HV,
* so we can always use host endianness
*/
#ifdef __BIG_ENDIAN__
vcpu->arch.shared_big_endian = true;
#else
vcpu->arch.shared_big_endian = false;
#endif
#endif
vcpu->arch.mmcr[0] = MMCR0_FC;
vcpu->arch.ctrl = CTRL_RUNLATCH;
/* default to host PVR, since we can't spoof it */
......@@ -1949,6 +1930,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
* support pte_enc here
*/
(*sps)->enc[0].pte_enc = def->penc[linux_psize];
/*
* Add 16MB MPSS support if host supports it
*/
if (linux_psize != MMU_PAGE_16M && def->penc[MMU_PAGE_16M] != -1) {
(*sps)->enc[1].page_shift = 24;
(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
}
(*sps)++;
}
......
......@@ -42,13 +42,14 @@ static int global_invalidates(struct kvm *kvm, unsigned long flags)
/*
* If there is only one vcore, and it's currently running,
* as indicated by local_paca->kvm_hstate.kvm_vcpu being set,
* we can use tlbiel as long as we mark all other physical
* cores as potentially having stale TLB entries for this lpid.
* If we're not using MMU notifiers, we never take pages away
* from the guest, so we can use tlbiel if requested.
* Otherwise, don't use tlbiel.
*/
if (kvm->arch.online_vcores == 1 && local_paca->kvm_hstate.kvm_vcore)
if (kvm->arch.online_vcores == 1 && local_paca->kvm_hstate.kvm_vcpu)
global = 0;
else if (kvm->arch.using_mmu_notifiers)
global = 1;
......
......@@ -86,6 +86,12 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
lbz r4, LPPACA_PMCINUSE(r3)
cmpwi r4, 0
beq 23f /* skip if not */
BEGIN_FTR_SECTION
ld r3, HSTATE_MMCR(r13)
andi. r4, r3, MMCR0_PMAO_SYNC | MMCR0_PMAO
cmpwi r4, MMCR0_PMAO
beql kvmppc_fix_pmao
END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
lwz r3, HSTATE_PMC(r13)
lwz r4, HSTATE_PMC + 4(r13)
lwz r5, HSTATE_PMC + 8(r13)
......@@ -737,6 +743,12 @@ skip_tm:
sldi r3, r3, 31 /* MMCR0_FC (freeze counters) bit */
mtspr SPRN_MMCR0, r3 /* freeze all counters, disable ints */
isync
BEGIN_FTR_SECTION
ld r3, VCPU_MMCR(r4)
andi. r5, r3, MMCR0_PMAO_SYNC | MMCR0_PMAO
cmpwi r5, MMCR0_PMAO
beql kvmppc_fix_pmao
END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
lwz r3, VCPU_PMC(r4) /* always load up guest PMU registers */
lwz r5, VCPU_PMC + 4(r4) /* to prevent information leak */
lwz r6, VCPU_PMC + 8(r4)
......@@ -1439,6 +1451,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
25:
/* Save PMU registers if requested */
/* r8 and cr0.eq are live here */
BEGIN_FTR_SECTION
/*
* POWER8 seems to have a hardware bug where setting
* MMCR0[PMAE] along with MMCR0[PMC1CE] and/or MMCR0[PMCjCE]
* when some counters are already negative doesn't seem
* to cause a performance monitor alert (and hence interrupt).
* The effect of this is that when saving the PMU state,
* if there is no PMU alert pending when we read MMCR0
* before freezing the counters, but one becomes pending
* before we read the counters, we lose it.
* To work around this, we need a way to freeze the counters
* before reading MMCR0. Normally, freezing the counters
* is done by writing MMCR0 (to set MMCR0[FC]) which
* unavoidably writes MMCR0[PMA0] as well. On POWER8,
* we can also freeze the counters using MMCR2, by writing
* 1s to all the counter freeze condition bits (there are
* 9 bits each for 6 counters).
*/
li r3, -1 /* set all freeze bits */
clrrdi r3, r3, 10
mfspr r10, SPRN_MMCR2
mtspr SPRN_MMCR2, r3
isync
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
li r3, 1
sldi r3, r3, 31 /* MMCR0_FC (freeze counters) bit */
mfspr r4, SPRN_MMCR0 /* save MMCR0 */
......@@ -1462,6 +1498,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
std r4, VCPU_MMCR(r9)
std r5, VCPU_MMCR + 8(r9)
std r6, VCPU_MMCR + 16(r9)
BEGIN_FTR_SECTION
std r10, VCPU_MMCR + 24(r9)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
std r7, VCPU_SIAR(r9)
std r8, VCPU_SDAR(r9)
mfspr r3, SPRN_PMC1
......@@ -1485,12 +1524,10 @@ BEGIN_FTR_SECTION
stw r11, VCPU_PMC + 28(r9)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
BEGIN_FTR_SECTION
mfspr r4, SPRN_MMCR2
mfspr r5, SPRN_SIER
mfspr r6, SPRN_SPMC1
mfspr r7, SPRN_SPMC2
mfspr r8, SPRN_MMCRS
std r4, VCPU_MMCR + 24(r9)
std r5, VCPU_SIER(r9)
stw r6, VCPU_PMC + 24(r9)
stw r7, VCPU_PMC + 28(r9)
......@@ -2227,6 +2264,7 @@ machine_check_realmode:
beq mc_cont
/* If not, deliver a machine check. SRR0/1 are already set */
li r10, BOOK3S_INTERRUPT_MACHINE_CHECK
ld r11, VCPU_MSR(r9)
bl kvmppc_msr_interrupt
b fast_interrupt_c_return
......@@ -2431,3 +2469,21 @@ kvmppc_msr_interrupt:
li r0, 1
1: rldimi r11, r0, MSR_TS_S_LG, 63 - MSR_TS_T_LG
blr
/*
* This works around a hardware bug on POWER8E processors, where
* writing a 1 to the MMCR0[PMAO] bit doesn't generate a
* performance monitor interrupt. Instead, when we need to have
* an interrupt pending, we have to arrange for a counter to overflow.
*/
kvmppc_fix_pmao:
li r3, 0
mtspr SPRN_MMCR2, r3
lis r3, (MMCR0_PMXE | MMCR0_FCECE)@h
ori r3, r3, MMCR0_PMCjCE | MMCR0_C56RUN
mtspr SPRN_MMCR0, r3
lis r3, 0x7fff
ori r3, r3, 0xffff
mtspr SPRN_PMC6, r3
isync
blr
......@@ -104,8 +104,27 @@ kvm_start_lightweight:
stb r3, HSTATE_RESTORE_HID5(r13)
/* Load up guest SPRG3 value, since it's user readable */
ld r3, VCPU_SHARED(r4)
ld r3, VCPU_SHARED_SPRG3(r3)
lwz r3, VCPU_SHAREDBE(r4)
cmpwi r3, 0
ld r5, VCPU_SHARED(r4)
beq sprg3_little_endian
sprg3_big_endian:
#ifdef __BIG_ENDIAN__
ld r3, VCPU_SHARED_SPRG3(r5)
#else
addi r5, r5, VCPU_SHARED_SPRG3
ldbrx r3, 0, r5
#endif
b after_sprg3_load
sprg3_little_endian:
#ifdef __LITTLE_ENDIAN__
ld r3, VCPU_SHARED_SPRG3(r5)
#else
addi r5, r5, VCPU_SHARED_SPRG3
ldbrx r3, 0, r5
#endif
after_sprg3_load:
mtspr SPRN_SPRG3, r3
#endif /* CONFIG_PPC_BOOK3S_64 */
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment