Commit e5f62e27 authored by Paolo Bonzini's avatar Paolo Bonzini

Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.10

- Move a lot of state that was previously stored on a per vcpu
  basis into a per-CPU area, because it is only pertinent to the
  host while the vcpu is loaded. This results in better state
  tracking, and a smaller vcpu structure.

- Add full handling of the ERET/ERETAA/ERETAB instructions in
  nested virtualisation. The last two instructions also require
  emulating part of the pointer authentication extension.
  As a result, the trap handling of pointer authentication has
  been greattly simplified.

- Turn the global (and not very scalable) LPI translation cache
  into a per-ITS, scalable cache, making non directly injected
  LPIs much cheaper to make visible to the vcpu.

- A batch of pKVM patches, mostly fixes and cleanups, as the
  upstreaming process seems to be resuming. Fingers crossed!

- Allocate PPIs and SGIs outside of the vcpu structure, allowing
  for smaller EL2 mapping and some flexibility in implementing
  more or less than 32 private IRQs.

- Purge stale mpidr_data if a vcpu is created after the MPIDR
  map has been created.

- Preserve vcpu-specific ID registers across a vcpu reset.

- Various minor cleanups and improvements.
parents 4232da23 eaa46a28
......@@ -6894,6 +6894,13 @@ Note that KVM does not skip the faulting instruction as it does for
KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
if it decides to decode and emulate the instruction.
This feature isn't available to protected VMs, as userspace does not
have access to the state that is required to perform the emulation.
Instead, a data abort exception is directly injected in the guest.
Note that although KVM_CAP_ARM_NISV_TO_USER will be reported if
queried outside of a protected VM context, the feature will not be
exposed if queried on a protected VM file descriptor.
::
/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
......
.. SPDX-License-Identifier: GPL-2.0
=======================================
ARM firmware pseudo-registers interface
=======================================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1].
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers exposes the
hypercall services in the form of a feature-bitmap to the userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner and can be accessed via
GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. The user-space can write-back the
desired bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.
Note that KVM will not allow the userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return a -EBUSY.
The pseudo-firmware bitmap register are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
.. SPDX-License-Identifier: GPL-2.0
=======================
ARM Hypercall Interface
=======================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1].
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers exposes the
hypercall services in the form of a feature-bitmap to the userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner and can be accessed via
GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. The user-space can write-back the
desired bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.
Note that KVM will not allow the userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return a -EBUSY.
The pseudo-firmware bitmap register are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
===============================================
KVM/arm64-specific hypercalls exposed to guests
===============================================
This file documents the KVM/arm64-specific hypercalls which may be
exposed by KVM/arm64 to guest operating systems. These hypercalls are
issued using the HVC instruction according to version 1.1 of the Arm SMC
Calling Convention (DEN0028/C):
https://developer.arm.com/docs/den0028/c
All KVM/arm64-specific hypercalls are allocated within the "Vendor
Specific Hypervisor Service Call" range with a UID of
``28b46fb6-2ec5-11e9-a9ca-4b564d003a74``. This UID should be queried by the
guest using the standard "Call UID" function for the service range in
order to determine that the KVM/arm64-specific hypercalls are available.
``ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID``
---------------------------------------------
Provides a discovery mechanism for other KVM/arm64 hypercalls.
+---------------------+-------------------------------------------------------------+
| Presence: | Mandatory for the KVM/arm64 UID |
+---------------------+-------------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------------+
| Function ID: | (uint32) | 0x86000000 |
+---------------------+----------+--------------------------------------------------+
| Arguments: | None |
+---------------------+----------+----+---------------------------------------------+
| Return Values: | (uint32) | R0 | Bitmap of available function numbers 0-31 |
| +----------+----+---------------------------------------------+
| | (uint32) | R1 | Bitmap of available function numbers 32-63 |
| +----------+----+---------------------------------------------+
| | (uint32) | R2 | Bitmap of available function numbers 64-95 |
| +----------+----+---------------------------------------------+
| | (uint32) | R3 | Bitmap of available function numbers 96-127 |
+---------------------+----------+----+---------------------------------------------+
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
See ptp_kvm.rst
......@@ -7,6 +7,7 @@ ARM
.. toctree::
:maxdepth: 2
fw-pseudo-registers
hyp-abi
hypercalls
pvtime
......
......@@ -7,19 +7,29 @@ PTP_KVM is used for high precision time sync between host and guests.
It relies on transferring the wall clock and counter value from the
host to the guest using a KVM-specific hypercall.
* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
This hypercall uses the SMC32/HVC32 calling convention:
Retrieve current time information for the specific counter. There are no
endianness restrictions.
ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
============== ======== =====================================
Function ID: (uint32) 0x86000001
Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0)
KVM_PTP_PHYS_COUNTER(1)
Return Values: (int32) NOT_SUPPORTED(-1) on error, or
(uint32) Upper 32 bits of wall clock time (r0)
(uint32) Lower 32 bits of wall clock time (r1)
(uint32) Upper 32 bits of counter (r2)
(uint32) Lower 32 bits of counter (r3)
Endianness: No Restrictions.
============== ======== =====================================
+---------------------+-------------------------------------------------------+
| Presence: | Optional |
+---------------------+-------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------+
| Function ID: | (uint32) | 0x86000001 |
+---------------------+----------+----+---------------------------------------+
| Arguments: | (uint32) | R1 | ``KVM_PTP_VIRT_COUNTER (0)`` |
| | | +---------------------------------------+
| | | | ``KVM_PTP_PHYS_COUNTER (1)`` |
+---------------------+----------+----+---------------------------------------+
| Return Values: | (int32) | R0 | ``NOT_SUPPORTED (-1)`` on error, else |
| | | | upper 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R1 | Lower 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R2 | Upper 32 bits of counter |
| +----------+----+---------------------------------------+
| | (uint32) | R3 | Lower 32 bits of counter |
+---------------------+----------+----+---------------------------------------+
......@@ -404,6 +404,18 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_ACCESS;
}
/* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
static inline bool esr_iss_is_eretax(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERET;
}
/* Indicate which key is used for ERETAx (false: A-Key, true: B-Key) */
static inline bool esr_iss_is_eretab(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERETA;
}
const char *esr_get_class_string(unsigned long esr);
#endif /* __ASSEMBLY */
......
......@@ -73,10 +73,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
......@@ -241,8 +239,6 @@ extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
extern u64 __vgic_v3_get_gic_config(void);
extern u64 __vgic_v3_read_vmcr(void);
extern void __vgic_v3_write_vmcr(u32 vmcr);
extern void __vgic_v3_init_lrs(void);
extern u64 __kvm_get_mdcr_el2(void);
......
......@@ -125,16 +125,6 @@ static inline void vcpu_set_wfx_traps(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_TWI;
}
static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
}
static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.vsesr_el2;
......@@ -587,16 +577,14 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
} else if (has_hvhe()) {
val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
if (!vcpu_has_sve(vcpu) ||
(vcpu->arch.fp_state != FP_STATE_GUEST_OWNED))
if (!vcpu_has_sve(vcpu) || !guest_owns_fp_regs())
val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
if (cpus_have_final_cap(ARM64_SME))
val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
} else {
val = CPTR_NVHE_EL2_RES1;
if (vcpu_has_sve(vcpu) &&
(vcpu->arch.fp_state == FP_STATE_GUEST_OWNED))
if (vcpu_has_sve(vcpu) && guest_owns_fp_regs())
val |= CPTR_EL2_TZ;
if (cpus_have_final_cap(ARM64_SME))
val &= ~CPTR_EL2_TSM;
......
......@@ -211,6 +211,7 @@ typedef unsigned int pkvm_handle_t;
struct kvm_protected_vm {
pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc;
bool enabled;
};
struct kvm_mpidr_data {
......@@ -220,20 +221,10 @@ struct kvm_mpidr_data {
static inline u16 kvm_mpidr_index(struct kvm_mpidr_data *data, u64 mpidr)
{
unsigned long mask = data->mpidr_mask;
u64 aff = mpidr & MPIDR_HWID_BITMASK;
int nbits, bit, bit_idx = 0;
u16 index = 0;
unsigned long index = 0, mask = data->mpidr_mask;
unsigned long aff = mpidr & MPIDR_HWID_BITMASK;
/*
* If this looks like RISC-V's BEXT or x86's PEXT
* instructions, it isn't by accident.
*/
nbits = fls(mask);
for_each_set_bit(bit, &mask, nbits) {
index |= (aff & BIT(bit)) >> (bit - bit_idx);
bit_idx++;
}
bitmap_gather(&index, &aff, &mask, fls(mask));
return index;
}
......@@ -530,8 +521,42 @@ struct kvm_cpu_context {
u64 *vncr_array;
};
/*
* This structure is instantiated on a per-CPU basis, and contains
* data that is:
*
* - tied to a single physical CPU, and
* - either have a lifetime that does not extend past vcpu_put()
* - or is an invariant for the lifetime of the system
*
* Use host_data_ptr(field) as a way to access a pointer to such a
* field.
*/
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
struct user_fpsimd_state *fpsimd_state; /* hyp VA */
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_owner;
/*
* host_debug_state contains the host registers which are
* saved and restored during world switches.
*/
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2;
} host_debug_state;
};
struct kvm_host_psci_config {
......@@ -592,19 +617,9 @@ struct kvm_vcpu_arch {
u64 mdcr_el2;
u64 cptr_el2;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2_host;
/* Exception Information */
struct kvm_vcpu_fault_info fault;
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_state;
/* Configuration flags, set once and for all before the vcpu can run */
u8 cflags;
......@@ -627,11 +642,10 @@ struct kvm_vcpu_arch {
* We maintain more than a single set of debug registers to support
* debugging the guest from the host and to maintain separate host and
* guest state during world switches. vcpu_debug_state are the debug
* registers of the vcpu as the guest sees them. host_debug_state are
* the host registers which are saved and restored during
* world switches. external_debug_state contains the debug
* values we want to debug the guest. This is set via the
* KVM_SET_GUEST_DEBUG ioctl.
* registers of the vcpu as the guest sees them.
*
* external_debug_state contains the debug values we want to debug the
* guest. This is set via the KVM_SET_GUEST_DEBUG ioctl.
*
* debug_ptr points to the set of debug registers that should be loaded
* onto the hardware when running the guest.
......@@ -640,18 +654,6 @@ struct kvm_vcpu_arch {
struct kvm_guest_debug_arch vcpu_debug_state;
struct kvm_guest_debug_arch external_debug_state;
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
struct task_struct *parent_task;
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
} host_debug_state;
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
......@@ -817,8 +819,6 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
/* Save TRBE context if active */
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
/* vcpu running in HYP context */
#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
......@@ -896,7 +896,7 @@ struct kvm_vcpu_arch {
* Don't bother with VNCR-based accesses in the nVHE code, it has no
* business dealing with NV.
*/
static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
static inline u64 *___ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
{
#if !defined (__KVM_NVHE_HYPERVISOR__)
if (unlikely(cpus_have_final_cap(ARM64_HAS_NESTED_VIRT) &&
......@@ -906,6 +906,13 @@ static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
return (u64 *)&ctxt->sys_regs[r];
}
#define __ctxt_sys_reg(c,r) \
({ \
BUILD_BUG_ON(__builtin_constant_p(r) && \
(r) >= NR_SYS_REGS); \
___ctxt_sys_reg(c, r); \
})
#define ctxt_sys_reg(c,r) (*__ctxt_sys_reg(c,r))
u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *, enum vcpu_sysreg);
......@@ -1168,6 +1175,44 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
/*
* How we access per-CPU host data depends on the where we access it from,
* and the mode we're in:
*
* - VHE and nVHE hypervisor bits use their locally defined instance
*
* - the rest of the kernel use either the VHE or nVHE one, depending on
* the mode we're running in.
*
* Unless we're in protected mode, fully deprivileged, and the nVHE
* per-CPU stuff is exclusively accessible to the protected EL2 code.
* In this case, the EL1 code uses the *VHE* data as its private state
* (which makes sense in a way as there shouldn't be any shared state
* between the host and the hypervisor).
*
* Yes, this is all totally trivial. Shoot me now.
*/
#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
#define host_data_ptr(f) (&this_cpu_ptr(&kvm_host_data)->f)
#else
#define host_data_ptr(f) \
(static_branch_unlikely(&kvm_protected_mode_initialized) ? \
&this_cpu_ptr(&kvm_host_data)->f : \
&this_cpu_ptr_hyp_sym(kvm_host_data)->f)
#endif
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED;
}
/* Check whether the FP regs are owned by the host */
static inline bool host_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED;
}
static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
{
/* The host's MPIDR is immutable, so let's set it up at boot time */
......@@ -1211,7 +1256,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
......@@ -1247,10 +1291,9 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
}
#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.enabled)
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
......@@ -1275,6 +1318,8 @@ static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
#define vcpu_has_feature(v, f) __vcpu_has_feature(&(v)->kvm->arch, (f))
#define kvm_vcpu_initialized(v) vcpu_get_flag(vcpu, VCPU_INITIALIZED)
int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
extern phys_addr_t hyp_mem_base;
......@@ -1331,4 +1376,19 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
(get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \
get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max))
/* Check for a given level of PAuth support */
#define kvm_has_pauth(k, l) \
({ \
bool pa, pi, pa3; \
\
pa = kvm_has_feat((k), ID_AA64ISAR1_EL1, APA, l); \
pa &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPA, IMP); \
pi = kvm_has_feat((k), ID_AA64ISAR1_EL1, API, l); \
pi &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPI, IMP); \
pa3 = kvm_has_feat((k), ID_AA64ISAR2_EL1, APA3, l); \
pa3 &= kvm_has_feat((k), ID_AA64ISAR2_EL1, GPA3, IMP); \
\
(pa + pi + pa3) == 1; \
})
#endif /* __ARM64_KVM_HOST_H__ */
......@@ -80,8 +80,8 @@ void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__
......
......@@ -60,7 +60,20 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
return ttbr0 & ~GENMASK_ULL(63, 48);
}
extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
int kvm_init_nv_sysregs(struct kvm *kvm);
#ifdef CONFIG_ARM64_PTR_AUTH
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr);
#else
static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
/* We really should never execute this... */
WARN_ON_ONCE(1);
*elr = 0xbad9acc0debadbad;
return false;
}
#endif
#endif /* __ARM64_KVM_NESTED_H */
......@@ -99,5 +99,26 @@ alternative_else_nop_endif
.macro ptrauth_switch_to_hyp g_ctxt, h_ctxt, reg1, reg2, reg3
.endm
#endif /* CONFIG_ARM64_PTR_AUTH */
#else /* !__ASSEMBLY */
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
#define ptrauth_save_keys(ctxt) \
do { \
__ptrauth_save_key(ctxt, APIA); \
__ptrauth_save_key(ctxt, APIB); \
__ptrauth_save_key(ctxt, APDA); \
__ptrauth_save_key(ctxt, APDB); \
__ptrauth_save_key(ctxt, APGA); \
} while(0)
#endif /* __ASSEMBLY__ */
#endif /* __ASM_KVM_PTRAUTH_H */
......@@ -297,6 +297,7 @@
#define TCR_TBI1 (UL(1) << 38)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)
#define TCR_TBID0 (UL(1) << 51)
#define TCR_TBID1 (UL(1) << 52)
#define TCR_NFD0 (UL(1) << 53)
#define TCR_NFD1 (UL(1) << 54)
......
......@@ -82,6 +82,12 @@ bool is_kvm_arm_initialised(void);
DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
static inline bool is_pkvm_initialized(void)
{
return IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized);
}
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
......@@ -89,8 +95,7 @@ static inline bool is_hyp_mode_available(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return true;
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
......@@ -104,8 +109,7 @@ static inline bool is_hyp_mode_mismatched(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return false;
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
......
......@@ -209,8 +209,8 @@ static const struct {
char alias[FTR_ALIAS_NAME_LEN];
char feature[FTR_ALIAS_OPTION_LEN];
} aliases[] __initconst = {
{ "kvm_arm.mode=nvhe", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=nvhe", "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "arm64_sw.hvhe=1" },
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },
......
......@@ -23,6 +23,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
always-y := hyp_constants.h hyp-constants.s
......
This diff is collapsed.
......@@ -2117,6 +2117,26 @@ bool triage_sysreg_trap(struct kvm_vcpu *vcpu, int *sr_index)
return true;
}
static bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
{
bool control_bit_set;
if (!vcpu_has_nv(vcpu))
return false;
control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
if (!is_hyp_ctxt(vcpu) && control_bit_set) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return true;
}
return false;
}
bool forward_smc_trap(struct kvm_vcpu *vcpu)
{
return forward_traps(vcpu, HCR_TSC);
}
static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
{
u64 mode = spsr & PSR_MODE_MASK;
......@@ -2152,37 +2172,39 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
{
u64 spsr, elr, mode;
bool direct_eret;
u64 spsr, elr, esr;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
* Forward this trap to the virtual EL2 if the virtual
* HCR_EL2.NV bit is set and this is coming from !EL2.
*/
spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
direct_eret = (mode == PSR_MODE_EL0t &&
vcpu_el2_e2h_is_set(vcpu) &&
vcpu_el2_tge_is_set(vcpu));
direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
if (direct_eret) {
*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
*vcpu_cpsr(vcpu) = spsr;
trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
if (forward_traps(vcpu, HCR_NV))
return;
/* Check for an ERETAx */
esr = kvm_vcpu_get_esr(vcpu);
if (esr_iss_is_eretax(esr) && !kvm_auth_eretax(vcpu, &elr)) {
/*
* Oh no, ERETAx failed to authenticate. If we have
* FPACCOMBINE, deliver an exception right away. If we
* don't, then let the mangled ELR value trickle down the
* ERET handling, and the guest will have a little surprise.
*/
if (kvm_has_pauth(vcpu->kvm, FPACCOMBINE)) {
esr &= ESR_ELx_ERET_ISS_ERETA;
esr |= FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_FPAC);
kvm_inject_nested_sync(vcpu, esr);
return;
}
}
preempt_disable();
kvm_arch_vcpu_put(vcpu);
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
if (!esr_iss_is_eretax(esr))
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
trace_kvm_nested_eret(vcpu, elr, spsr);
......
......@@ -14,19 +14,6 @@
#include <asm/kvm_mmu.h>
#include <asm/sysreg.h>
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
{
struct task_struct *p = vcpu->arch.parent_task;
struct user_fpsimd_state *fpsimd;
if (!is_protected_kvm_enabled() || !p)
return;
fpsimd = &p->thread.uw.fpsimd_state;
kvm_unshare_hyp(fpsimd, fpsimd + 1);
put_task_struct(p);
}
/*
* Called on entry to KVM_RUN unless this vcpu previously ran at least
* once and the most recent prior KVM_RUN for this vcpu was called from
......@@ -38,30 +25,18 @@ void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
*/
int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
{
int ret;
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
int ret;
kvm_vcpu_unshare_task_fp(vcpu);
/* pKVM has its own tracking of the host fpsimd state. */
if (is_protected_kvm_enabled())
return 0;
/* Make sure the host task fpsimd state is visible to hyp: */
ret = kvm_share_hyp(fpsimd, fpsimd + 1);
if (ret)
return ret;
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
/*
* We need to keep current's task_struct pinned until its data has been
* unshared with the hypervisor to make sure it is not re-used by the
* kernel and donated to someone else while already shared -- see
* kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
*/
if (is_protected_kvm_enabled()) {
get_task_struct(current);
vcpu->arch.parent_task = current;
}
return 0;
}
......@@ -86,7 +61,8 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* guest in kvm_arch_vcpu_ctxflush_fp() and override this to
* FP_STATE_FREE if the flag set.
*/
vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
......@@ -110,7 +86,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* been saved, this is very unlikely to happen.
*/
if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
fpsimd_save_and_flush_cpu_state();
}
}
......@@ -126,7 +102,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
{
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
}
/*
......@@ -142,8 +118,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
/*
* Currently we do not support SME guests so SVCR is
* always 0 and we just need a variable to point to.
......@@ -196,16 +171,38 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
isb();
}
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu)) {
__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
/* Restore the VL that was saved when bound to the CPU */
/*
* Restore the VL that was saved when bound to the CPU,
* which is the maximum VL for the guest. Because the
* layout of the data when saving the sve state depends
* on the VL, we need to use a consistent (i.e., the
* maximum) VL.
* Note that this means that at guest exit ZCR_EL1 is
* not necessarily the same as on guest entry.
*
* Restoring the VL isn't needed in VHE mode since
* ZCR_EL2 (accessed via ZCR_EL1) would fulfill the same
* role when doing the save from EL2.
*/
if (!has_vhe())
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1,
SYS_ZCR_EL1);
}
/*
* Flush (save and invalidate) the fpsimd/sve state so that if
* the host tries to use fpsimd/sve, it's not using stale data
* from the guest.
*
* Flushing the state sets the TIF_FOREIGN_FPSTATE bit for the
* context unconditionally, in both nVHE and VHE. This allows
* the kernel to restore the fpsimd/sve state, including ZCR_EL1
* when needed.
*/
fpsimd_save_and_flush_cpu_state();
} else if (has_vhe() && system_supports_sve()) {
/*
......
......@@ -55,6 +55,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
static int handle_smc(struct kvm_vcpu *vcpu)
{
/*
* Forward this trapped smc instruction to the virtual EL2 if
* the guest has asked for it.
*/
if (forward_smc_trap(vcpu))
return 1;
/*
* "If an SMC instruction executed at Non-secure EL1 is
* trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
......@@ -207,19 +214,40 @@ static int handle_sve(struct kvm_vcpu *vcpu)
}
/*
* Guest usage of a ptrauth instruction (which the guest EL1 did not turn into
* a NOP). If we get here, it is that we didn't fixup ptrauth on exit, and all
* that we can do is give the guest an UNDEF.
* Two possibilities to handle a trapping ptrauth instruction:
*
* - Guest usage of a ptrauth instruction (which the guest EL1 did not
* turn into a NOP). If we get here, it is because we didn't enable
* ptrauth for the guest. This results in an UNDEF, as it isn't
* supposed to use ptrauth without being told it could.
*
* - Running an L2 NV guest while L1 has left HCR_EL2.API==0, and for
* which we reinject the exception into L1.
*
* Anything else is an emulation bug (hence the WARN_ON + UNDEF).
*/
static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu)
{
if (!vcpu_has_ptrauth(vcpu)) {
kvm_inject_undefined(vcpu);
return 1;
}
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return 1;
}
/* Really shouldn't be here! */
WARN_ON_ONCE(1);
kvm_inject_undefined(vcpu);
return 1;
}
static int kvm_handle_eret(struct kvm_vcpu *vcpu)
{
if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
if (esr_iss_is_eretax(kvm_vcpu_get_esr(vcpu)) &&
!vcpu_has_ptrauth(vcpu))
return kvm_handle_ptrauth(vcpu);
/*
......
......@@ -135,9 +135,9 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(host_dbg, host_ctxt);
......@@ -154,9 +154,9 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(guest_dbg, guest_ctxt);
......
......@@ -27,6 +27,7 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_ptrauth.h>
#include <asm/fpsimd.h>
#include <asm/debug-monitors.h>
#include <asm/processor.h>
......@@ -39,12 +40,6 @@ struct kvm_exception_table_entry {
extern struct kvm_exception_table_entry __start___kvm_ex_table;
extern struct kvm_exception_table_entry __stop___kvm_ex_table;
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED;
}
/* Save the 32-bit only FPSIMD system register state */
static inline void __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
{
......@@ -155,7 +150,7 @@ static inline bool cpu_has_amu(void)
static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
CHECK_FGT_MASKS(HFGRTR_EL2);
......@@ -191,7 +186,7 @@ static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (!cpus_have_final_cap(ARM64_HAS_FGT))
......@@ -226,13 +221,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
}
vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
if (cpus_have_final_cap(ARM64_HAS_HCX)) {
......@@ -254,13 +249,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
{
write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
write_sysreg(*host_data_ptr(host_debug_state.mdcr_el2), mdcr_el2);
write_sysreg(0, hstr_el2);
if (kvm_arm_support_pmu_v3()) {
struct kvm_cpu_context *hctxt;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
}
......@@ -271,10 +266,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
__deactivate_traps_hfgxtr(vcpu);
}
static inline void ___activate_traps(struct kvm_vcpu *vcpu)
static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
{
u64 hcr = vcpu->arch.hcr_el2;
if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
hcr |= HCR_TVM;
......@@ -376,8 +369,8 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
isb();
/* Write out the host state if it's in the registers */
if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
if (host_owns_fp_regs())
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
/* Restore the guest state */
if (sve_guest)
......@@ -389,7 +382,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
if (!(read_sysreg(hcr_el2) & HCR_RW))
write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
vcpu->arch.fp_state = FP_STATE_GUEST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED;
return true;
}
......@@ -449,60 +442,6 @@ static inline bool handle_tx2_tvm(struct kvm_vcpu *vcpu)
return true;
}
static inline bool esr_is_ptrauth_trap(u64 esr)
{
switch (esr_sys64_to_sysreg(esr)) {
case SYS_APIAKEYLO_EL1:
case SYS_APIAKEYHI_EL1:
case SYS_APIBKEYLO_EL1:
case SYS_APIBKEYHI_EL1:
case SYS_APDAKEYLO_EL1:
case SYS_APDAKEYHI_EL1:
case SYS_APDBKEYLO_EL1:
case SYS_APDBKEYHI_EL1:
case SYS_APGAKEYLO_EL1:
case SYS_APGAKEYHI_EL1:
return true;
}
return false;
}
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
static bool kvm_hyp_handle_ptrauth(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm_cpu_context *ctxt;
u64 val;
if (!vcpu_has_ptrauth(vcpu))
return false;
ctxt = this_cpu_ptr(&kvm_hyp_ctxt);
__ptrauth_save_key(ctxt, APIA);
__ptrauth_save_key(ctxt, APIB);
__ptrauth_save_key(ctxt, APDA);
__ptrauth_save_key(ctxt, APDB);
__ptrauth_save_key(ctxt, APGA);
vcpu_ptrauth_enable(vcpu);
val = read_sysreg(hcr_el2);
val |= (HCR_API | HCR_APK);
write_sysreg(val, hcr_el2);
return true;
}
static bool kvm_hyp_handle_cntpct(struct kvm_vcpu *vcpu)
{
struct arch_timer_context *ctxt;
......@@ -590,9 +529,6 @@ static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
__vgic_v3_perform_cpuif_access(vcpu) == 1)
return true;
if (esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
return kvm_hyp_handle_ptrauth(vcpu, exit_code);
if (kvm_hyp_handle_cntpct(vcpu))
return true;
......
......@@ -53,7 +53,13 @@ pkvm_hyp_vcpu_to_hyp_vm(struct pkvm_hyp_vcpu *hyp_vcpu)
return container_of(hyp_vcpu->vcpu.kvm, struct pkvm_hyp_vm, kvm);
}
static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
{
return vcpu_is_protected(&hyp_vcpu->vcpu);
}
void pkvm_hyp_vm_table_init(void *tbl);
void pkvm_host_fpsimd_state_init(void);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);
......
......@@ -83,10 +83,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
/* Disable and flush Self-Hosted Trace generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
__debug_save_trace(host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
......@@ -97,9 +97,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
__debug_restore_trace(*host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_host(struct kvm_vcpu *vcpu)
......
......@@ -600,7 +600,6 @@ static bool ffa_call_supported(u64 func_id)
case FFA_MSG_POLL:
case FFA_MSG_WAIT:
/* 32-bit variants of 64-bit calls */
case FFA_MSG_SEND_DIRECT_REQ:
case FFA_MSG_SEND_DIRECT_RESP:
case FFA_RXTX_MAP:
case FFA_MEM_DONATE:
......
......@@ -39,10 +39,8 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.cptr_el2 = host_vcpu->arch.cptr_el2;
hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags;
hyp_vcpu->vcpu.arch.fp_state = host_vcpu->arch.fp_state;
hyp_vcpu->vcpu.arch.debug_ptr = kern_hyp_va(host_vcpu->arch.debug_ptr);
hyp_vcpu->vcpu.arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
......@@ -64,7 +62,6 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags;
host_vcpu->arch.fp_state = hyp_vcpu->vcpu.arch.fp_state;
host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
......@@ -178,16 +175,6 @@ static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
}
static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
}
static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
}
static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_init_lrs();
......@@ -198,18 +185,18 @@ static void handle___kvm_get_mdcr_el2(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
}
static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
__vgic_v3_save_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
__vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
......@@ -340,10 +327,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_aprs),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__pkvm_vcpu_init_traps),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
......
......@@ -533,7 +533,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
int ret = 0;
esr = read_sysreg_el2(SYS_ESR);
BUG_ON(!__get_fault_info(esr, &fault));
if (!__get_fault_info(esr, &fault)) {
/*
* We've presumably raced with a page-table change which caused
* AT to fail, try again.
*/
return;
}
addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
ret = host_stage2_idmap(addr);
......
......@@ -200,7 +200,7 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
}
/*
* Initialize trap register values for protected VMs.
* Initialize trap register values in protected mode.
*/
void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
{
......@@ -247,6 +247,17 @@ void pkvm_hyp_vm_table_init(void *tbl)
vm_table = tbl;
}
void pkvm_host_fpsimd_state_init(void)
{
unsigned long i;
for (i = 0; i < hyp_nr_cpus; i++) {
struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
host_data->fpsimd_state = &host_data->host_ctxt.fp_regs;
}
}
/*
* Return the hyp vm structure corresponding to the handle.
*/
......@@ -430,6 +441,7 @@ static void *map_donated_memory(unsigned long host_va, size_t size)
static void __unmap_donated_memory(void *va, size_t size)
{
kvm_flush_dcache_to_poc(va, size);
WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
PAGE_ALIGN(size) >> PAGE_SHIFT));
}
......
......@@ -205,7 +205,7 @@ asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
struct psci_boot_args *boot_args;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
if (is_cpu_on)
boot_args = this_cpu_ptr(&cpu_on_args);
......
......@@ -257,8 +257,7 @@ static int fix_hyp_pgtable_refcnt(void)
void __noreturn __pkvm_init_finalise(void)
{
struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
struct kvm_cpu_context *host_ctxt = host_data_ptr(host_ctxt);
unsigned long nr_pages, reserved_pages, pfn;
int ret;
......@@ -301,6 +300,7 @@ void __noreturn __pkvm_init_finalise(void)
goto out;
pkvm_hyp_vm_table_init(vm_table_base);
pkvm_host_fpsimd_state_init();
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,
......
......@@ -40,7 +40,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, vcpu->arch.hcr_el2);
__activate_traps_common(vcpu);
val = vcpu->arch.cptr_el2;
......@@ -53,7 +53,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TSM;
}
if (!guest_owns_fp_regs(vcpu)) {
if (!guest_owns_fp_regs()) {
if (has_hvhe())
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |
CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN);
......@@ -191,7 +191,6 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
......@@ -203,13 +202,12 @@ static const exit_handler_fn pvm_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
{
if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm))))
if (unlikely(vcpu_is_protected(vcpu)))
return pvm_exit_handlers;
return hyp_exit_handlers;
......@@ -228,9 +226,7 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
*/
static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) {
if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) {
/*
* As we have caught the guest red-handed, decide that it isn't
* fit for purpose anymore by making the vcpu invalid. The VMM
......@@ -264,7 +260,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmr_sync();
}
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
......@@ -337,7 +333,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
__sysreg_restore_state_nvhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
......@@ -367,7 +363,7 @@ asmlinkage void __noreturn hyp_panic(void)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
if (vcpu) {
......
......@@ -11,13 +11,23 @@
#include <nvhe/mem_protect.h>
struct tlb_inv_context {
u64 tcr;
struct kvm_s2_mmu *mmu;
u64 tcr;
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
{
struct kvm_s2_mmu *host_s2_mmu = &host_mmu.arch.mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
cxt->mmu = NULL;
/*
* We have two requirements:
*
......@@ -40,20 +50,55 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
else
dsb(ish);
/*
* If we're already in the desired context, then there's nothing to do.
*/
if (vcpu) {
/*
* We're in guest context. However, for this to work, this needs
* to be called from within __kvm_vcpu_run(), which ensures that
* __hyp_running_vcpu is set to the current guest vcpu.
*/
if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu))
return;
cxt->mmu = vcpu->arch.hw_mmu;
} else {
/* We're in host context. */
if (mmu == host_s2_mmu)
return;
cxt->mmu = host_s2_mmu;
}
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
u64 val;
/*
* For CPUs that are affected by ARM 1319367, we need to
* avoid a host Stage-1 walk while we have the guest's
* VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the S1 MMU is enabled, so we can
* simply set the EPD bits to avoid any further TLB fill.
* avoid a Stage-1 walk with the old VMID while we have
* the new VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the host S1 MMU is enabled, so
* we can simply set the EPD bits to avoid any further
* TLB fill. For guests, we ensure that the S1 MMU is
* temporarily enabled in the next context.
*/
val = cxt->tcr = read_sysreg_el1(SYS_TCR);
val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
write_sysreg_el1(val, SYS_TCR);
isb();
if (vcpu) {
val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
if (!(val & SCTLR_ELx_M)) {
val |= SCTLR_ELx_M;
write_sysreg_el1(val, SYS_SCTLR);
isb();
}
} else {
/* The host S1 MMU is always enabled. */
cxt->sctlr = SCTLR_ELx_M;
}
}
/*
......@@ -62,18 +107,40 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
* ensuring that we always have an ISB, but not two ISBs back
* to back.
*/
__load_stage2(mmu, kern_hyp_va(mmu->arch));
if (vcpu)
__load_host_stage2();
else
__load_stage2(mmu, kern_hyp_va(mmu->arch));
asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
__load_host_stage2();
struct kvm_s2_mmu *mmu = cxt->mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
if (!mmu)
return;
if (vcpu)
__load_stage2(mmu, kern_hyp_va(mmu->arch));
else
__load_host_stage2();
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/* Ensure write of the host VMID */
/* Ensure write of the old VMID */
isb();
/* Restore the host's TCR_EL1 */
if (!(cxt->sctlr & SCTLR_ELx_M)) {
write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
isb();
}
write_sysreg_el1(cxt->tcr, SYS_TCR);
}
}
......@@ -84,7 +151,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
/*
* We could do so much better if we had the VA as well.
......@@ -105,7 +172,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
......@@ -114,7 +181,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, true);
enter_vmid_context(mmu, &cxt, true);
/*
* We could do so much better if we had the VA as well.
......@@ -135,7 +202,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
......@@ -152,7 +219,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
start = round_down(start, stride);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
......@@ -162,7 +229,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
......@@ -170,13 +237,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
......@@ -184,19 +251,19 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)
{
/* Same remark as in __tlb_switch_to_guest() */
/* Same remark as in enter_vmid_context() */
dsb(ish);
__tlbi(alle1is);
dsb(ish);
......
......@@ -914,12 +914,12 @@ static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
{
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}
static bool stage2_pte_executable(kvm_pte_t pte)
{
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
......@@ -979,6 +979,21 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
if (!stage2_pte_needs_update(ctx->old, new))
return -EAGAIN;
/* If we're only changing software bits, then store them and go! */
if (!kvm_pgtable_walk_shared(ctx) &&
!((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW)) {
bool old_is_counted = stage2_pte_is_counted(ctx->old);
if (old_is_counted != stage2_pte_is_counted(new)) {
if (old_is_counted)
mm_ops->put_page(ctx->ptep);
else
mm_ops->get_page(ctx->ptep);
}
WARN_ON_ONCE(!stage2_try_set_pte(ctx, new));
return 0;
}
if (!stage2_try_break_pte(ctx, data->mmu))
return -EAGAIN;
......@@ -1370,7 +1385,7 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
struct kvm_pgtable *pgt = ctx->arg;
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
if (!stage2_pte_cacheable(pgt, ctx->old))
return 0;
if (mm_ops->dcache_clean_inval_poc)
......
......@@ -330,7 +330,7 @@ void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
write_gicreg(0, ICH_HCR_EL2);
}
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
......@@ -363,7 +363,7 @@ void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
}
}
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
......@@ -455,16 +455,35 @@ u64 __vgic_v3_get_gic_config(void)
return val;
}
u64 __vgic_v3_read_vmcr(void)
static u64 __vgic_v3_read_vmcr(void)
{
return read_gicreg(ICH_VMCR_EL2);
}
void __vgic_v3_write_vmcr(u32 vmcr)
static void __vgic_v3_write_vmcr(u32 vmcr)
{
write_gicreg(vmcr, ICH_VMCR_EL2);
}
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
__vgic_v3_save_aprs(cpu_if);
if (cpu_if->vgic_sre)
cpu_if->vgic_vmcr = __vgic_v3_read_vmcr();
}
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (cpu_if->vgic_sre)
__vgic_v3_write_vmcr(cpu_if->vgic_vmcr);
__vgic_v3_restore_aprs(cpu_if);
}
static int __vgic_v3_bpr_min(void)
{
/* See Pseudocode for VPriorityGroup */
......
......@@ -33,11 +33,43 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
/*
* HCR_EL2 bits that the NV guest can freely change (no RES0/RES1
* semantics, irrespective of the configuration), but that cannot be
* applied to the actual HW as things would otherwise break badly.
*
* - TGE: we want the guest to use EL1, which is incompatible with
* this bit being set
*
* - API/APK: they are already accounted for by vcpu_load(), and can
* only take effect across a load/put cycle (such as ERET)
*/
#define NV_HCR_GUEST_EXCLUDE (HCR_TGE | HCR_API | HCR_APK)
static u64 __compute_hcr(struct kvm_vcpu *vcpu)
{
u64 hcr = vcpu->arch.hcr_el2;
if (!vcpu_has_nv(vcpu))
return hcr;
if (is_hyp_ctxt(vcpu)) {
hcr |= HCR_NV | HCR_NV2 | HCR_AT | HCR_TTLB;
if (!vcpu_el2_e2h_is_set(vcpu))
hcr |= HCR_NV1;
write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
}
return hcr | (__vcpu_sys_reg(vcpu, HCR_EL2) & ~NV_HCR_GUEST_EXCLUDE);
}
static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, __compute_hcr(vcpu));
if (has_cntpoff()) {
struct timer_map map;
......@@ -75,7 +107,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TAM;
if (guest_owns_fp_regs(vcpu)) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
} else {
......@@ -162,6 +194,8 @@ static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
{
host_data_ptr(host_ctxt)->__hyp_running_vcpu = vcpu;
__vcpu_load_switch_sysregs(vcpu);
__vcpu_load_activate_traps(vcpu);
__load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
......@@ -171,6 +205,61 @@ void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
{
__vcpu_put_deactivate_traps(vcpu);
__vcpu_put_switch_sysregs(vcpu);
host_data_ptr(host_ctxt)->__hyp_running_vcpu = NULL;
}
static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
{
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 spsr, elr, mode;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
*
* Unless the trap has to be forwarded further down the line,
* of course...
*/
if ((__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV) ||
(__vcpu_sys_reg(vcpu, HFGITR_EL2) & HFGITR_EL2_ERET))
return false;
spsr = read_sysreg_el1(SYS_SPSR);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
case PSR_MODE_EL0t:
if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
return false;
break;
case PSR_MODE_EL2t:
mode = PSR_MODE_EL1t;
break;
case PSR_MODE_EL2h:
mode = PSR_MODE_EL1h;
break;
default:
return false;
}
/* If ERETAx fails, take the slow path */
if (esr_iss_is_eretax(esr)) {
if (!(vcpu_has_ptrauth(vcpu) && kvm_auth_eretax(vcpu, &elr)))
return false;
} else {
elr = read_sysreg_el1(SYS_ELR);
}
spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
write_sysreg_el2(spsr, SYS_SPSR);
write_sysreg_el2(elr, SYS_ELR);
return true;
}
static const exit_handler_fn hyp_exit_handlers[] = {
......@@ -182,7 +271,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_ERET] = kvm_hyp_handle_eret,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
......@@ -197,7 +286,7 @@ static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
* If we were in HYP context on entry, adjust the PSTATE view
* so that the usual helpers work correctly.
*/
if (unlikely(vcpu_get_flag(vcpu, VCPU_HYP_CONTEXT))) {
if (vcpu_has_nv(vcpu) && (read_sysreg(hcr_el2) & HCR_NV)) {
u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
......@@ -221,8 +310,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt;
u64 exit_code;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt->__hyp_running_vcpu = vcpu;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
sysreg_save_host_state_vhe(host_ctxt);
......@@ -240,11 +328,6 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
if (is_hyp_ctxt(vcpu))
vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
else
vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu);
......@@ -258,7 +341,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_host_state_vhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
......@@ -306,7 +389,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
__deactivate_traps(vcpu);
......
......@@ -67,7 +67,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_user_state(host_ctxt);
/*
......@@ -110,7 +110,7 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);
......
......@@ -17,8 +17,8 @@ struct tlb_inv_context {
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
{
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
u64 val;
......@@ -67,7 +67,7 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
isb();
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
/*
* We're done with the TLB operation, let's restore the host's
......@@ -97,7 +97,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
......@@ -118,7 +118,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
......@@ -129,7 +129,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nshst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
......@@ -150,7 +150,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
......@@ -169,7 +169,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
......@@ -179,7 +179,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
......@@ -189,13 +189,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
......@@ -203,14 +203,14 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)
......
......@@ -86,7 +86,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
/* Detect an already handled MMIO return */
if (unlikely(!vcpu->mmio_needed))
return 0;
return 1;
vcpu->mmio_needed = 0;
......@@ -117,7 +117,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
*/
kvm_incr_pc(vcpu);
return 0;
return 1;
}
int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
......@@ -133,11 +133,19 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
/*
* No valid syndrome? Ask userspace for help if it has
* volunteered to do so, and bail out otherwise.
*
* In the protected VM case, there isn't much userspace can do
* though, so directly deliver an exception to the guest.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
trace_kvm_mmio_nisv(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
kvm_vcpu_get_hfar(vcpu), fault_ipa);
if (vcpu_is_protected(vcpu)) {
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
}
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;
......
......@@ -1522,8 +1522,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
read_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_invalidate_retry(kvm, mmu_seq))
if (mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
goto out_unlock;
}
/*
* If we are not forced to use page mapping, check if we are
......@@ -1581,6 +1583,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
memcache,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
out_unlock:
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
......@@ -1588,8 +1592,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mark_page_dirty_in_slot(kvm, memslot, gfn);
}
out_unlock:
read_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}
......
......@@ -35,13 +35,9 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
break;
case SYS_ID_AA64ISAR1_EL1:
/* Support everything but PtrAuth and Spec Invalidation */
/* Support everything but Spec Invalidation */
val &= ~(GENMASK_ULL(63, 56) |
NV_FTR(ISAR1, SPECRES) |
NV_FTR(ISAR1, GPI) |
NV_FTR(ISAR1, GPA) |
NV_FTR(ISAR1, API) |
NV_FTR(ISAR1, APA));
NV_FTR(ISAR1, SPECRES));
break;
case SYS_ID_AA64PFR0_EL1:
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2024 - Google LLC
* Author: Marc Zyngier <maz@kernel.org>
*
* Primitive PAuth emulation for ERETAA/ERETAB.
*
* This code assumes that is is run from EL2, and that it is part of
* the emulation of ERETAx for a guest hypervisor. That's a lot of
* baked-in assumptions and shortcuts.
*
* Do no reuse for anything else!
*/
#include <linux/kvm_host.h>
#include <asm/gpr-num.h>
#include <asm/kvm_emulate.h>
#include <asm/pointer_auth.h>
/* PACGA Xd, Xn, Xm */
#define PACGA(d,n,m) \
asm volatile(__DEFINE_ASM_GPR_NUMS \
".inst 0x9AC03000 |" \
"(.L__gpr_num_%[Rd] << 0) |" \
"(.L__gpr_num_%[Rn] << 5) |" \
"(.L__gpr_num_%[Rm] << 16)\n" \
: [Rd] "=r" ((d)) \
: [Rn] "r" ((n)), [Rm] "r" ((m)))
static u64 compute_pac(struct kvm_vcpu *vcpu, u64 ptr,
struct ptrauth_key ikey)
{
struct ptrauth_key gkey;
u64 mod, pac = 0;
preempt_disable();
if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
mod = __vcpu_sys_reg(vcpu, SP_EL2);
else
mod = read_sysreg(sp_el1);
gkey.lo = read_sysreg_s(SYS_APGAKEYLO_EL1);
gkey.hi = read_sysreg_s(SYS_APGAKEYHI_EL1);
__ptrauth_key_install_nosync(APGA, ikey);
isb();
PACGA(pac, ptr, mod);
isb();
__ptrauth_key_install_nosync(APGA, gkey);
preempt_enable();
/* PAC in the top 32bits */
return pac;
}
static bool effective_tbi(struct kvm_vcpu *vcpu, bool bit55)
{
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
bool tbi, tbid;
/*
* Since we are authenticating an instruction address, we have
* to take TBID into account. If E2H==0, ignore VA[55], as
* TCR_EL2 only has a single TBI/TBID. If VA[55] was set in
* this case, this is likely a guest bug...
*/
if (!vcpu_el2_e2h_is_set(vcpu)) {
tbi = tcr & BIT(20);
tbid = tcr & BIT(29);
} else if (bit55) {
tbi = tcr & TCR_TBI1;
tbid = tcr & TCR_TBID1;
} else {
tbi = tcr & TCR_TBI0;
tbid = tcr & TCR_TBID0;
}
return tbi && !tbid;
}
static int compute_bottom_pac(struct kvm_vcpu *vcpu, bool bit55)
{
static const int maxtxsz = 39; // Revisit these two values once
static const int mintxsz = 16; // (if) we support TTST/LVA/LVA2
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
int txsz;
if (!vcpu_el2_e2h_is_set(vcpu) || !bit55)
txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
else
txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
return 64 - clamp(txsz, mintxsz, maxtxsz);
}
static u64 compute_pac_mask(struct kvm_vcpu *vcpu, bool bit55)
{
int bottom_pac;
u64 mask;
bottom_pac = compute_bottom_pac(vcpu, bit55);
mask = GENMASK(54, bottom_pac);
if (!effective_tbi(vcpu, bit55))
mask |= GENMASK(63, 56);
return mask;
}
static u64 to_canonical_addr(struct kvm_vcpu *vcpu, u64 ptr, u64 mask)
{
bool bit55 = !!(ptr & BIT(55));
if (bit55)
return ptr | mask;
return ptr & ~mask;
}
static u64 corrupt_addr(struct kvm_vcpu *vcpu, u64 ptr)
{
bool bit55 = !!(ptr & BIT(55));
u64 mask, error_code;
int shift;
if (effective_tbi(vcpu, bit55)) {
mask = GENMASK(54, 53);
shift = 53;
} else {
mask = GENMASK(62, 61);
shift = 61;
}
if (esr_iss_is_eretab(kvm_vcpu_get_esr(vcpu)))
error_code = 2 << shift;
else
error_code = 1 << shift;
ptr &= ~mask;
ptr |= error_code;
return ptr;
}
/*
* Authenticate an ERETAA/ERETAB instruction, returning true if the
* authentication succeeded and false otherwise. In all cases, *elr
* contains the VA to ERET to. Potential exception injection is left
* to the caller.
*/
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL2);
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 ptr, cptr, pac, mask;
struct ptrauth_key ikey;
*elr = ptr = vcpu_read_sys_reg(vcpu, ELR_EL2);
/* We assume we're already in the context of an ERETAx */
if (esr_iss_is_eretab(esr)) {
if (!(sctlr & SCTLR_EL1_EnIB))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIBKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIBKEYHI_EL1);
} else {
if (!(sctlr & SCTLR_EL1_EnIA))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIAKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIAKEYHI_EL1);
}
mask = compute_pac_mask(vcpu, !!(ptr & BIT(55)));
cptr = to_canonical_addr(vcpu, ptr, mask);
pac = compute_pac(vcpu, cptr, ikey);
/*
* Slightly deviate from the pseudocode: if we have a PAC
* match with the signed pointer, then it must be good.
* Anything after this point is pure error handling.
*/
if ((pac & mask) == (ptr & mask)) {
*elr = cptr;
return true;
}
/*
* Authentication failed, corrupt the canonical address if
* PAuth2 isn't implemented, or some XORing if it is.
*/
if (!kvm_has_pauth(vcpu->kvm, PAuth2))
cptr = corrupt_addr(vcpu, cptr);
else
cptr = ptr ^ (pac & mask);
*elr = cptr;
return false;
}
......@@ -222,7 +222,6 @@ void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
int pkvm_init_host_vm(struct kvm *host_kvm)
{
mutex_init(&host_kvm->lock);
return 0;
}
......@@ -259,6 +258,7 @@ static int __init finalize_pkvm(void)
* at, which would end badly once inaccessible.
*/
kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
kmemleak_free_part(__hyp_rodata_start, __hyp_rodata_end - __hyp_rodata_start);
kmemleak_free_part_phys(hyp_mem_base, hyp_mem_size);
ret = pkvm_drop_host_privileges();
......
......@@ -232,7 +232,7 @@ bool kvm_set_pmuserenr(u64 val)
if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
return false;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
return true;
}
......
......@@ -151,7 +151,6 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
void *sve_state = vcpu->arch.sve_state;
kvm_vcpu_unshare_task_fp(vcpu);
kvm_unshare_hyp(vcpu, vcpu + 1);
if (sve_state)
kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
......
......@@ -1568,17 +1568,31 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
return IDREG(vcpu->kvm, reg_to_encoding(r));
}
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
/*
* Return true if the register's (Op0, Op1, CRn, CRm, Op2) is
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8.
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8, which is the range of ID
* registers KVM maintains on a per-VM basis.
*/
static inline bool is_id_reg(u32 id)
static inline bool is_vm_ftr_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
sys_reg_CRn(id) == 0 && sys_reg_CRm(id) >= 1 &&
sys_reg_CRm(id) < 8);
}
static inline bool is_vcpu_ftr_id_reg(u32 id)
{
return is_feature_id_reg(id) && !is_vm_ftr_id_reg(id);
}
static inline bool is_aa32_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
......@@ -2338,7 +2352,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64MMFR0_EL1_TGRAN16_2)),
ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 |
ID_AA64MMFR1_EL1_HCX |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_TWED |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_VH |
......@@ -3069,12 +3082,14 @@ static bool check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
for (i = 0; i < n; i++) {
if (!is_32 && table[i].reg && !table[i].reset) {
kvm_err("sys_reg table %pS entry %d lacks reset\n", &table[i], i);
kvm_err("sys_reg table %pS entry %d (%s) lacks reset\n",
&table[i], i, table[i].name);
return false;
}
if (i && cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
kvm_err("sys_reg table %pS entry %d out of order\n", &table[i - 1], i - 1);
kvm_err("sys_reg table %pS entry %d (%s -> %s) out of order\n",
&table[i], i, table[i - 1].name, table[i].name);
return false;
}
}
......@@ -3509,26 +3524,25 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm)
&idregs_debug_fops);
}
static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
static void reset_vm_ftr_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *reg)
{
const struct sys_reg_desc *idreg = first_idreg;
u32 id = reg_to_encoding(idreg);
u32 id = reg_to_encoding(reg);
struct kvm *kvm = vcpu->kvm;
if (test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags))
return;
lockdep_assert_held(&kvm->arch.config_lock);
IDREG(kvm, id) = reg->reset(vcpu, reg);
}
/* Initialize all idregs */
while (is_id_reg(id)) {
IDREG(kvm, id) = idreg->reset(vcpu, idreg);
idreg++;
id = reg_to_encoding(idreg);
}
static void reset_vcpu_ftr_id_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *reg)
{
if (kvm_vcpu_initialized(vcpu))
return;
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
reg->reset(vcpu, reg);
}
/**
......@@ -3540,19 +3554,24 @@ static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
*/
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
unsigned long i;
kvm_reset_id_regs(vcpu);
for (i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
const struct sys_reg_desc *r = &sys_reg_descs[i];
if (is_id_reg(reg_to_encoding(r)))
if (!r->reset)
continue;
if (r->reset)
if (is_vm_ftr_id_reg(reg_to_encoding(r)))
reset_vm_ftr_id_reg(vcpu, r);
else if (is_vcpu_ftr_id_reg(reg_to_encoding(r)))
reset_vcpu_ftr_id_reg(vcpu, r);
else
r->reset(vcpu, r);
}
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
}
/**
......@@ -3978,14 +3997,6 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
sys_reg_CRm(r), \
sys_reg_Op2(r))
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *range)
{
const void *zero_page = page_to_virt(ZERO_PAGE(0));
......@@ -4014,7 +4025,7 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
* compliant with a given revision of the architecture, but the
* RES0/RES1 definitions allow us to do that.
*/
if (is_id_reg(encoding)) {
if (is_vm_ftr_id_reg(encoding)) {
if (!reg->val ||
(is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0()))
continue;
......
......@@ -28,27 +28,65 @@ struct vgic_state_iter {
int nr_lpis;
int dist_id;
int vcpu_id;
int intid;
unsigned long intid;
int lpi_idx;
u32 *lpi_array;
};
static void iter_next(struct vgic_state_iter *iter)
static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
{
struct vgic_dist *dist = &kvm->arch.vgic;
if (iter->dist_id == 0) {
iter->dist_id++;
return;
}
/*
* Let the xarray drive the iterator after the last SPI, as the iterator
* has exhausted the sequentially-allocated INTID space.
*/
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS - 1)) {
if (iter->lpi_idx < iter->nr_lpis)
xa_find_after(&dist->lpi_xa, &iter->intid,
VGIC_LPI_MAX_INTID,
LPI_XA_MARK_DEBUG_ITER);
iter->lpi_idx++;
return;
}
iter->intid++;
if (iter->intid == VGIC_NR_PRIVATE_IRQS &&
++iter->vcpu_id < iter->nr_cpus)
iter->intid = 0;
}
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS)) {
if (iter->lpi_idx < iter->nr_lpis)
iter->intid = iter->lpi_array[iter->lpi_idx];
iter->lpi_idx++;
static int iter_mark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
int nr_lpis = 0;
xa_for_each(&dist->lpi_xa, intid, irq) {
if (!vgic_try_get_irq_kref(irq))
continue;
xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
nr_lpis++;
}
return nr_lpis;
}
static void iter_unmark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
xa_for_each(&dist->lpi_xa, intid, irq) {
xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
vgic_put_irq(kvm, irq);
}
}
......@@ -61,15 +99,12 @@ static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
iter->nr_cpus = nr_cpus;
iter->nr_spis = kvm->arch.vgic.nr_spis;
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
iter->nr_lpis = vgic_copy_lpi_list(kvm, NULL, &iter->lpi_array);
if (iter->nr_lpis < 0)
iter->nr_lpis = 0;
}
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
iter->nr_lpis = iter_mark_lpis(kvm);
/* Fast forward to the right position if needed */
while (pos--)
iter_next(iter);
iter_next(kvm, iter);
}
static bool end_of_vgic(struct vgic_state_iter *iter)
......@@ -114,7 +149,7 @@ static void *vgic_debug_next(struct seq_file *s, void *v, loff_t *pos)
struct vgic_state_iter *iter = kvm->arch.vgic.iter;
++*pos;
iter_next(iter);
iter_next(kvm, iter);
if (end_of_vgic(iter))
iter = NULL;
return iter;
......@@ -134,13 +169,14 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
mutex_lock(&kvm->arch.config_lock);
iter = kvm->arch.vgic.iter;
kfree(iter->lpi_array);
iter_unmark_lpis(kvm);
kfree(iter);
kvm->arch.vgic.iter = NULL;
mutex_unlock(&kvm->arch.config_lock);
}
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist,
struct vgic_state_iter *iter)
{
bool v3 = dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3;
......@@ -149,7 +185,7 @@ static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
if (v3)
seq_printf(s, "nr_lpis:\t%d\n", atomic_read(&dist->lpi_count));
seq_printf(s, "nr_lpis:\t%d\n", iter->nr_lpis);
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");
......@@ -236,7 +272,7 @@ static int vgic_debug_show(struct seq_file *s, void *v)
unsigned long flags;
if (iter->dist_id == 0) {
print_dist_state(s, &kvm->arch.vgic);
print_dist_state(s, &kvm->arch.vgic, iter);
return 0;
}
......@@ -246,11 +282,13 @@ static int vgic_debug_show(struct seq_file *s, void *v)
if (iter->vcpu_id < iter->nr_cpus)
vcpu = kvm_get_vcpu(kvm, iter->vcpu_id);
/*
* Expect this to succeed, as iter_mark_lpis() takes a reference on
* every LPI to be visited.
*/
irq = vgic_get_irq(kvm, vcpu, iter->intid);
if (!irq) {
seq_printf(s, " LPI %4d freed\n", iter->intid);
return 0;
}
if (WARN_ON_ONCE(!irq))
return -EINVAL;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);
......
......@@ -53,8 +53,6 @@ void kvm_vgic_early_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
INIT_LIST_HEAD(&dist->lpi_translation_cache);
raw_spin_lock_init(&dist->lpi_list_lock);
xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
}
......@@ -182,27 +180,22 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
return 0;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
int i;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
lockdep_assert_held(&vcpu->kvm->arch.config_lock);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (vgic_cpu->private_irqs)
return 0;
vgic_cpu->private_irqs = kcalloc(VGIC_NR_PRIVATE_IRQS,
sizeof(struct vgic_irq),
GFP_KERNEL_ACCOUNT);
if (!vgic_cpu->private_irqs)
return -ENOMEM;
/*
* Enable and configure all SGIs to be edge-triggered and
......@@ -227,9 +220,48 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
}
}
return 0;
}
static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu)
{
int ret;
mutex_lock(&vcpu->kvm->arch.config_lock);
ret = vgic_allocate_private_irqs_locked(vcpu);
mutex_unlock(&vcpu->kvm->arch.config_lock);
return ret;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (!irqchip_in_kernel(vcpu->kvm))
return 0;
ret = vgic_allocate_private_irqs(vcpu);
if (ret)
return ret;
/*
* If we are creating a VCPU with a GICv3 we must also register the
* KVM io device for the redistributor that belongs to this VCPU.
......@@ -285,10 +317,13 @@ int vgic_init(struct kvm *kvm)
/* Initialize groups on CPUs created before the VGIC type was known */
kvm_for_each_vcpu(idx, vcpu, kvm) {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
ret = vgic_allocate_private_irqs_locked(vcpu);
if (ret)
goto out;
for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, i);
switch (dist->vgic_model) {
case KVM_DEV_TYPE_ARM_VGIC_V3:
irq->group = 1;
......@@ -300,14 +335,15 @@ int vgic_init(struct kvm *kvm)
break;
default:
ret = -EINVAL;
goto out;
}
vgic_put_irq(kvm, irq);
if (ret)
goto out;
}
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_init(kvm);
/*
* If we have GICv4.1 enabled, unconditionally request enable the
* v4 support so that we get HW-accelerated vSGIs. Otherwise, only
......@@ -361,9 +397,6 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_destroy(kvm);
if (vgic_supports_direct_msis(kvm))
vgic_v4_teardown(kvm);
......@@ -381,6 +414,9 @@ static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
vgic_flush_pending_lpis(vcpu);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL;
if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
vgic_unregister_redist_iodev(vcpu);
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
......
This diff is collapsed.
......@@ -277,7 +277,7 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
return;
vgic_flush_pending_lpis(vcpu);
vgic_its_invalidate_cache(vcpu->kvm);
vgic_its_invalidate_all_caches(vcpu->kvm);
atomic_set_release(&vgic_cpu->ctlr, 0);
} else {
ctlr = atomic_cmpxchg_acquire(&vgic_cpu->ctlr, 0,
......
......@@ -464,17 +464,10 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
kvm_vgic_global_state.vctrl_base + GICH_APR);
}
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
}
void vgic_v2_put(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
vgic_v2_vmcr_sync(vcpu);
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
}
......@@ -722,15 +722,7 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (likely(cpu_if->vgic_sre))
kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
kvm_call_hyp(__vgic_v3_restore_aprs, cpu_if);
kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
if (has_vhe())
__vgic_v3_activate_traps(cpu_if);
......@@ -738,24 +730,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
WARN_ON(vgic_v4_load(vcpu));
}
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
if (likely(cpu_if->vgic_sre))
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
}
void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
if (has_vhe())
__vgic_v3_deactivate_traps(cpu_if);
}
......@@ -29,9 +29,8 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* its->cmd_lock (mutex)
* its->its_lock (mutex)
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* kvm->lpi_list_lock must be taken with IRQs disabled
* vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
* vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
* As the ap_list_lock might be taken from the timer interrupt handler,
* we have to disable IRQs before taking this lock and everything lower
......@@ -126,7 +125,6 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
__xa_erase(&dist->lpi_xa, irq->intid);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
atomic_dec(&dist->lpi_count);
kfree_rcu(irq, rcu);
}
......@@ -939,17 +937,6 @@ void kvm_vgic_put(struct kvm_vcpu *vcpu)
vgic_v3_put(vcpu);
}
void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
{
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
return;
if (kvm_vgic_global_state.type == VGIC_V2)
vgic_v2_vmcr_sync(vcpu);
else
vgic_v3_vmcr_sync(vcpu);
}
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
......
......@@ -16,6 +16,7 @@
#define INTERRUPT_ID_BITS_SPIS 10
#define INTERRUPT_ID_BITS_ITS 16
#define VGIC_LPI_MAX_INTID ((1 << INTERRUPT_ID_BITS_ITS) - 1)
#define VGIC_PRI_BITS 5
#define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
......@@ -214,7 +215,6 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
void vgic_v2_init_lrs(void);
void vgic_v2_load(struct kvm_vcpu *vcpu);
void vgic_v2_put(struct kvm_vcpu *vcpu);
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
......@@ -253,7 +253,6 @@ bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
void vgic_v3_put(struct kvm_vcpu *vcpu);
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
bool vgic_has_its(struct kvm *kvm);
int kvm_vgic_register_its_device(void);
......@@ -330,14 +329,11 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size)
}
bool vgic_lpis_enabled(struct kvm_vcpu *vcpu);
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr);
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid, struct vgic_irq **irq);
struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi);
int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi);
void vgic_lpi_translation_cache_init(struct kvm *kvm);
void vgic_lpi_translation_cache_destroy(struct kvm *kvm);
void vgic_its_invalidate_cache(struct kvm *kvm);
void vgic_its_invalidate_all_caches(struct kvm *kvm);
/* GICv4.1 MMIO interface */
int vgic_its_inv_lpi(struct kvm *kvm, struct vgic_irq *irq);
......
......@@ -210,6 +210,12 @@ struct vgic_its {
struct mutex its_lock;
struct list_head device_list;
struct list_head collection_list;
/*
* Caches the (device_id, event_id) -> vgic_irq translation for
* LPIs that are mapped and enabled.
*/
struct xarray translation_cache;
};
struct vgic_state_iter;
......@@ -274,13 +280,8 @@ struct vgic_dist {
*/
u64 propbaser;
/* Protects the lpi_list. */
raw_spinlock_t lpi_list_lock;
#define LPI_XA_MARK_DEBUG_ITER XA_MARK_0
struct xarray lpi_xa;
atomic_t lpi_count;
/* LPI translation cache */
struct list_head lpi_translation_cache;
/* used by vgic-debug */
struct vgic_state_iter *iter;
......@@ -330,7 +331,7 @@ struct vgic_cpu {
struct vgic_v3_cpu_if vgic_v3;
};
struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
struct vgic_irq *private_irqs;
raw_spinlock_t ap_list_lock; /* Protects the ap_list */
......@@ -388,7 +389,6 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
void kvm_vgic_load(struct kvm_vcpu *vcpu);
void kvm_vgic_put(struct kvm_vcpu *vcpu);
void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
#define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
#define vgic_initialized(k) ((k)->arch.vgic.initialized)
......
......@@ -45,6 +45,7 @@ LIBKVM_x86_64 += lib/x86_64/vmx.c
LIBKVM_aarch64 += lib/aarch64/gic.c
LIBKVM_aarch64 += lib/aarch64/gic_v3.c
LIBKVM_aarch64 += lib/aarch64/gic_v3_its.c
LIBKVM_aarch64 += lib/aarch64/handlers.S
LIBKVM_aarch64 += lib/aarch64/processor.c
LIBKVM_aarch64 += lib/aarch64/spinlock.c
......@@ -158,6 +159,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/smccc_filter
TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
TEST_GEN_PROGS_aarch64 += aarch64/vgic_lpi_stress
TEST_GEN_PROGS_aarch64 += aarch64/vpmu_counter_access
TEST_GEN_PROGS_aarch64 += access_tracking_perf_test
TEST_GEN_PROGS_aarch64 += arch_timer
......
......@@ -14,9 +14,6 @@
#include "timer_test.h"
#include "vgic.h"
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
enum guest_stage {
GUEST_STAGE_VTIMER_CVAL = 1,
GUEST_STAGE_VTIMER_TVAL,
......@@ -149,8 +146,7 @@ static void guest_code(void)
local_irq_disable();
gic_init(GIC_V3, test_args.nr_vcpus,
(void *)GICD_BASE_GPA, (void *)GICR_BASE_GPA);
gic_init(GIC_V3, test_args.nr_vcpus);
timer_set_ctl(VIRTUAL, CTL_IMASK);
timer_set_ctl(PHYSICAL, CTL_IMASK);
......@@ -209,7 +205,7 @@ struct kvm_vm *test_vm_create(void)
vcpu_init_descriptor_tables(vcpus[i]);
test_init_timer_irq(vm);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3");
/* Make all the test's cmdline args visible to the guest */
......
......@@ -13,7 +13,9 @@
#define _GNU_SOURCE
#include <linux/kernel.h>
#include <linux/psci.h>
#include <asm/cputype.h>
#include "kvm_util.h"
#include "processor.h"
......
......@@ -327,8 +327,8 @@ uint64_t get_invalid_value(const struct reg_ftr_bits *ftr_bits, uint64_t ftr)
return ftr;
}
static void test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
const struct reg_ftr_bits *ftr_bits)
static uint64_t test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
const struct reg_ftr_bits *ftr_bits)
{
uint8_t shift = ftr_bits->shift;
uint64_t mask = ftr_bits->mask;
......@@ -346,6 +346,8 @@ static void test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
vcpu_set_reg(vcpu, reg, val);
vcpu_get_reg(vcpu, reg, &new_val);
TEST_ASSERT_EQ(new_val, val);
return new_val;
}
static void test_reg_set_fail(struct kvm_vcpu *vcpu, uint64_t reg,
......@@ -374,7 +376,15 @@ static void test_reg_set_fail(struct kvm_vcpu *vcpu, uint64_t reg,
TEST_ASSERT_EQ(val, old_val);
}
static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
static uint64_t test_reg_vals[KVM_ARM_FEATURE_ID_RANGE_SIZE];
#define encoding_to_range_idx(encoding) \
KVM_ARM_FEATURE_ID_RANGE_IDX(sys_reg_Op0(encoding), sys_reg_Op1(encoding), \
sys_reg_CRn(encoding), sys_reg_CRm(encoding), \
sys_reg_Op2(encoding))
static void test_vm_ftr_id_regs(struct kvm_vcpu *vcpu, bool aarch64_only)
{
uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE];
struct reg_mask_range range = {
......@@ -398,9 +408,7 @@ static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
int idx;
/* Get the index to masks array for the idreg */
idx = KVM_ARM_FEATURE_ID_RANGE_IDX(sys_reg_Op0(reg_id), sys_reg_Op1(reg_id),
sys_reg_CRn(reg_id), sys_reg_CRm(reg_id),
sys_reg_Op2(reg_id));
idx = encoding_to_range_idx(reg_id);
for (int j = 0; ftr_bits[j].type != FTR_END; j++) {
/* Skip aarch32 reg on aarch64 only system, since they are RAZ/WI. */
......@@ -414,7 +422,9 @@ static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
TEST_ASSERT_EQ(masks[idx] & ftr_bits[j].mask, ftr_bits[j].mask);
test_reg_set_fail(vcpu, reg, &ftr_bits[j]);
test_reg_set_success(vcpu, reg, &ftr_bits[j]);
test_reg_vals[idx] = test_reg_set_success(vcpu, reg,
&ftr_bits[j]);
ksft_test_result_pass("%s\n", ftr_bits[j].name);
}
......@@ -425,7 +435,6 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
{
bool done = false;
struct ucall uc;
uint64_t val;
while (!done) {
vcpu_run(vcpu);
......@@ -436,8 +445,8 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
break;
case UCALL_SYNC:
/* Make sure the written values are seen by guest */
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(uc.args[2]), &val);
TEST_ASSERT_EQ(val, uc.args[3]);
TEST_ASSERT_EQ(test_reg_vals[encoding_to_range_idx(uc.args[2])],
uc.args[3]);
break;
case UCALL_DONE:
done = true;
......@@ -448,13 +457,85 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
}
}
/* Politely lifted from arch/arm64/include/asm/cache.h */
/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
#define CLIDR_CTYPE(clidr, level) \
(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
static void test_clidr(struct kvm_vcpu *vcpu)
{
uint64_t clidr;
int level;
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_CLIDR_EL1), &clidr);
/* find the first empty level in the cache hierarchy */
for (level = 1; level < 7; level++) {
if (!CLIDR_CTYPE(clidr, level))
break;
}
/*
* If you have a mind-boggling 7 levels of cache, congratulations, you
* get to fix this.
*/
TEST_ASSERT(level <= 7, "can't find an empty level in cache hierarchy");
/* stick in a unified cache level */
clidr |= BIT(2) << CLIDR_CTYPE_SHIFT(level);
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_CLIDR_EL1), clidr);
test_reg_vals[encoding_to_range_idx(SYS_CLIDR_EL1)] = clidr;
}
static void test_vcpu_ftr_id_regs(struct kvm_vcpu *vcpu)
{
u64 val;
test_clidr(vcpu);
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &val);
val++;
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), val);
test_reg_vals[encoding_to_range_idx(SYS_MPIDR_EL1)] = val;
ksft_test_result_pass("%s\n", __func__);
}
static void test_assert_id_reg_unchanged(struct kvm_vcpu *vcpu, uint32_t encoding)
{
size_t idx = encoding_to_range_idx(encoding);
uint64_t observed;
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(encoding), &observed);
TEST_ASSERT_EQ(test_reg_vals[idx], observed);
}
static void test_reset_preserves_id_regs(struct kvm_vcpu *vcpu)
{
/*
* Calls KVM_ARM_VCPU_INIT behind the scenes, which will do an
* architectural reset of the vCPU.
*/
aarch64_vcpu_setup(vcpu, NULL);
for (int i = 0; i < ARRAY_SIZE(test_regs); i++)
test_assert_id_reg_unchanged(vcpu, test_regs[i].reg);
test_assert_id_reg_unchanged(vcpu, SYS_CLIDR_EL1);
ksft_test_result_pass("%s\n", __func__);
}
int main(void)
{
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
bool aarch64_only;
uint64_t val, el0;
int ftr_cnt;
int test_cnt;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES));
......@@ -467,18 +548,22 @@ int main(void)
ksft_print_header();
ftr_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) +
ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) +
ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + ARRAY_SIZE(ftr_id_aa64mmfr1_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + ARRAY_SIZE(ftr_id_aa64zfr0_el1) -
ARRAY_SIZE(test_regs);
test_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) +
ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) +
ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + ARRAY_SIZE(ftr_id_aa64mmfr1_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + ARRAY_SIZE(ftr_id_aa64zfr0_el1) -
ARRAY_SIZE(test_regs) + 2;
ksft_set_plan(ftr_cnt);
ksft_set_plan(test_cnt);
test_vm_ftr_id_regs(vcpu, aarch64_only);
test_vcpu_ftr_id_regs(vcpu);
test_user_set_reg(vcpu, aarch64_only);
test_guest_reg_read(vcpu);
test_reset_preserves_id_regs(vcpu);
kvm_vm_free(vm);
ksft_finished();
......
......@@ -19,9 +19,6 @@
#include "gic_v3.h"
#include "vgic.h"
#define GICD_BASE_GPA 0x08000000ULL
#define GICR_BASE_GPA 0x080A0000ULL
/*
* Stores the user specified args; it's passed to the guest and to every test
* function.
......@@ -49,9 +46,6 @@ struct test_args {
#define IRQ_DEFAULT_PRIO (LOWEST_PRIO - 1)
#define IRQ_DEFAULT_PRIO_REG (IRQ_DEFAULT_PRIO << KVM_PRIO_SHIFT) /* 0xf0 */
static void *dist = (void *)GICD_BASE_GPA;
static void *redist = (void *)GICR_BASE_GPA;
/*
* The kvm_inject_* utilities are used by the guest to ask the host to inject
* interrupts (e.g., using the KVM_IRQ_LINE ioctl).
......@@ -152,7 +146,7 @@ static void reset_stats(void)
static uint64_t gic_read_ap1r0(void)
{
uint64_t reg = read_sysreg_s(SYS_ICV_AP1R0_EL1);
uint64_t reg = read_sysreg_s(SYS_ICC_AP1R0_EL1);
dsb(sy);
return reg;
......@@ -160,7 +154,7 @@ static uint64_t gic_read_ap1r0(void)
static void gic_write_ap1r0(uint64_t val)
{
write_sysreg_s(val, SYS_ICV_AP1R0_EL1);
write_sysreg_s(val, SYS_ICC_AP1R0_EL1);
isb();
}
......@@ -478,7 +472,7 @@ static void guest_code(struct test_args *args)
bool level_sensitive = args->level_sensitive;
struct kvm_inject_desc *f, *inject_fns;
gic_init(GIC_V3, 1, dist, redist);
gic_init(GIC_V3, 1);
for (i = 0; i < nr_irqs; i++)
gic_irq_enable(i);
......@@ -764,8 +758,7 @@ static void test_vgic(uint32_t nr_irqs, bool level_sensitive, bool eoi_split)
memcpy(addr_gva2hva(vm, args_gva), &args, sizeof(args));
vcpu_args_set(vcpu, 1, args_gva);
gic_fd = vgic_v3_setup(vm, 1, nr_irqs,
GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, 1, nr_irqs);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3, skipping");
vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT,
......
// SPDX-License-Identifier: GPL-2.0
/*
* vgic_lpi_stress - Stress test for KVM's ITS emulation
*
* Copyright (c) 2024 Google LLC
*/
#include <linux/sizes.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/sysinfo.h>
#include "kvm_util.h"
#include "gic.h"
#include "gic_v3.h"
#include "gic_v3_its.h"
#include "processor.h"
#include "ucall.h"
#include "vgic.h"
#define TEST_MEMSLOT_INDEX 1
#define GIC_LPI_OFFSET 8192
static size_t nr_iterations = 1000;
static vm_paddr_t gpa_base;
static struct kvm_vm *vm;
static struct kvm_vcpu **vcpus;
static int gic_fd, its_fd;
static struct test_data {
bool request_vcpus_stop;
u32 nr_cpus;
u32 nr_devices;
u32 nr_event_ids;
vm_paddr_t device_table;
vm_paddr_t collection_table;
vm_paddr_t cmdq_base;
void *cmdq_base_va;
vm_paddr_t itt_tables;
vm_paddr_t lpi_prop_table;
vm_paddr_t lpi_pend_tables;
} test_data = {
.nr_cpus = 1,
.nr_devices = 1,
.nr_event_ids = 16,
};
static void guest_irq_handler(struct ex_regs *regs)
{
u32 intid = gic_get_and_ack_irq();
if (intid == IAR_SPURIOUS)
return;
GUEST_ASSERT(intid >= GIC_LPI_OFFSET);
gic_set_eoi(intid);
}
static void guest_setup_its_mappings(void)
{
u32 coll_id, device_id, event_id, intid = GIC_LPI_OFFSET;
u32 nr_events = test_data.nr_event_ids;
u32 nr_devices = test_data.nr_devices;
u32 nr_cpus = test_data.nr_cpus;
for (coll_id = 0; coll_id < nr_cpus; coll_id++)
its_send_mapc_cmd(test_data.cmdq_base_va, coll_id, coll_id, true);
/* Round-robin the LPIs to all of the vCPUs in the VM */
coll_id = 0;
for (device_id = 0; device_id < nr_devices; device_id++) {
vm_paddr_t itt_base = test_data.itt_tables + (device_id * SZ_64K);
its_send_mapd_cmd(test_data.cmdq_base_va, device_id,
itt_base, SZ_64K, true);
for (event_id = 0; event_id < nr_events; event_id++) {
its_send_mapti_cmd(test_data.cmdq_base_va, device_id,
event_id, coll_id, intid++);
coll_id = (coll_id + 1) % test_data.nr_cpus;
}
}
}
static void guest_invalidate_all_rdists(void)
{
int i;
for (i = 0; i < test_data.nr_cpus; i++)
its_send_invall_cmd(test_data.cmdq_base_va, i);
}
static void guest_setup_gic(void)
{
static atomic_int nr_cpus_ready = 0;
u32 cpuid = guest_get_vcpuid();
gic_init(GIC_V3, test_data.nr_cpus);
gic_rdist_enable_lpis(test_data.lpi_prop_table, SZ_64K,
test_data.lpi_pend_tables + (cpuid * SZ_64K));
atomic_fetch_add(&nr_cpus_ready, 1);
if (cpuid > 0)
return;
while (atomic_load(&nr_cpus_ready) < test_data.nr_cpus)
cpu_relax();
its_init(test_data.collection_table, SZ_64K,
test_data.device_table, SZ_64K,
test_data.cmdq_base, SZ_64K);
guest_setup_its_mappings();
guest_invalidate_all_rdists();
}
static void guest_code(size_t nr_lpis)
{
guest_setup_gic();
GUEST_SYNC(0);
/*
* Don't use WFI here to avoid blocking the vCPU thread indefinitely and
* never getting the stop signal.
*/
while (!READ_ONCE(test_data.request_vcpus_stop))
cpu_relax();
GUEST_DONE();
}
static void setup_memslot(void)
{
size_t pages;
size_t sz;
/*
* For the ITS:
* - A single level device table
* - A single level collection table
* - The command queue
* - An ITT for each device
*/
sz = (3 + test_data.nr_devices) * SZ_64K;
/*
* For the redistributors:
* - A shared LPI configuration table
* - An LPI pending table for each vCPU
*/
sz += (1 + test_data.nr_cpus) * SZ_64K;
pages = sz / vm->page_size;
gpa_base = ((vm_compute_max_gfn(vm) + 1) * vm->page_size) - sz;
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, gpa_base,
TEST_MEMSLOT_INDEX, pages, 0);
}
#define LPI_PROP_DEFAULT_PRIO 0xa0
static void configure_lpis(void)
{
size_t nr_lpis = test_data.nr_devices * test_data.nr_event_ids;
u8 *tbl = addr_gpa2hva(vm, test_data.lpi_prop_table);
size_t i;
for (i = 0; i < nr_lpis; i++) {
tbl[i] = LPI_PROP_DEFAULT_PRIO |
LPI_PROP_GROUP1 |
LPI_PROP_ENABLED;
}
}
static void setup_test_data(void)
{
size_t pages_per_64k = vm_calc_num_guest_pages(vm->mode, SZ_64K);
u32 nr_devices = test_data.nr_devices;
u32 nr_cpus = test_data.nr_cpus;
vm_paddr_t cmdq_base;
test_data.device_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base,
TEST_MEMSLOT_INDEX);
test_data.collection_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base,
TEST_MEMSLOT_INDEX);
cmdq_base = vm_phy_pages_alloc(vm, pages_per_64k, gpa_base,
TEST_MEMSLOT_INDEX);
virt_map(vm, cmdq_base, cmdq_base, pages_per_64k);
test_data.cmdq_base = cmdq_base;
test_data.cmdq_base_va = (void *)cmdq_base;
test_data.itt_tables = vm_phy_pages_alloc(vm, pages_per_64k * nr_devices,
gpa_base, TEST_MEMSLOT_INDEX);
test_data.lpi_prop_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base, TEST_MEMSLOT_INDEX);
configure_lpis();
test_data.lpi_pend_tables = vm_phy_pages_alloc(vm, pages_per_64k * nr_cpus,
gpa_base, TEST_MEMSLOT_INDEX);
sync_global_to_guest(vm, test_data);
}
static void setup_gic(void)
{
gic_fd = vgic_v3_setup(vm, test_data.nr_cpus, 64);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create GICv3");
its_fd = vgic_its_setup(vm);
}
static void signal_lpi(u32 device_id, u32 event_id)
{
vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER;
struct kvm_msi msi = {
.address_lo = db_addr,
.address_hi = db_addr >> 32,
.data = event_id,
.devid = device_id,
.flags = KVM_MSI_VALID_DEVID,
};
/*
* KVM_SIGNAL_MSI returns 1 if the MSI wasn't 'blocked' by the VM,
* which for arm64 implies having a valid translation in the ITS.
*/
TEST_ASSERT(__vm_ioctl(vm, KVM_SIGNAL_MSI, &msi) == 1,
"KVM_SIGNAL_MSI ioctl failed");
}
static pthread_barrier_t test_setup_barrier;
static void *lpi_worker_thread(void *data)
{
u32 device_id = (size_t)data;
u32 event_id;
size_t i;
pthread_barrier_wait(&test_setup_barrier);
for (i = 0; i < nr_iterations; i++)
for (event_id = 0; event_id < test_data.nr_event_ids; event_id++)
signal_lpi(device_id, event_id);
return NULL;
}
static void *vcpu_worker_thread(void *data)
{
struct kvm_vcpu *vcpu = data;
struct ucall uc;
while (true) {
vcpu_run(vcpu);
switch (get_ucall(vcpu, &uc)) {
case UCALL_SYNC:
pthread_barrier_wait(&test_setup_barrier);
continue;
case UCALL_DONE:
return NULL;
case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
break;
default:
TEST_FAIL("Unknown ucall: %lu", uc.cmd);
}
}
return NULL;
}
static void report_stats(struct timespec delta)
{
double nr_lpis;
double time;
nr_lpis = test_data.nr_devices * test_data.nr_event_ids * nr_iterations;
time = delta.tv_sec;
time += ((double)delta.tv_nsec) / NSEC_PER_SEC;
pr_info("Rate: %.2f LPIs/sec\n", nr_lpis / time);
}
static void run_test(void)
{
u32 nr_devices = test_data.nr_devices;
u32 nr_vcpus = test_data.nr_cpus;
pthread_t *lpi_threads = malloc(nr_devices * sizeof(pthread_t));
pthread_t *vcpu_threads = malloc(nr_vcpus * sizeof(pthread_t));
struct timespec start, delta;
size_t i;
TEST_ASSERT(lpi_threads && vcpu_threads, "Failed to allocate pthread arrays");
pthread_barrier_init(&test_setup_barrier, NULL, nr_vcpus + nr_devices + 1);
for (i = 0; i < nr_vcpus; i++)
pthread_create(&vcpu_threads[i], NULL, vcpu_worker_thread, vcpus[i]);
for (i = 0; i < nr_devices; i++)
pthread_create(&lpi_threads[i], NULL, lpi_worker_thread, (void *)i);
pthread_barrier_wait(&test_setup_barrier);
clock_gettime(CLOCK_MONOTONIC, &start);
for (i = 0; i < nr_devices; i++)
pthread_join(lpi_threads[i], NULL);
delta = timespec_elapsed(start);
write_guest_global(vm, test_data.request_vcpus_stop, true);
for (i = 0; i < nr_vcpus; i++)
pthread_join(vcpu_threads[i], NULL);
report_stats(delta);
}
static void setup_vm(void)
{
int i;
vcpus = malloc(test_data.nr_cpus * sizeof(struct kvm_vcpu));
TEST_ASSERT(vcpus, "Failed to allocate vCPU array");
vm = vm_create_with_vcpus(test_data.nr_cpus, guest_code, vcpus);
vm_init_descriptor_tables(vm);
for (i = 0; i < test_data.nr_cpus; i++)
vcpu_init_descriptor_tables(vcpus[i]);
vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
setup_memslot();
setup_gic();
setup_test_data();
}
static void destroy_vm(void)
{
close(its_fd);
close(gic_fd);
kvm_vm_free(vm);
free(vcpus);
}
static void pr_usage(const char *name)
{
pr_info("%s [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] -h\n", name);
pr_info(" -v:\tnumber of vCPUs (default: %u)\n", test_data.nr_cpus);
pr_info(" -d:\tnumber of devices (default: %u)\n", test_data.nr_devices);
pr_info(" -e:\tnumber of event IDs per device (default: %u)\n", test_data.nr_event_ids);
pr_info(" -i:\tnumber of iterations (default: %lu)\n", nr_iterations);
}
int main(int argc, char **argv)
{
u32 nr_threads;
int c;
while ((c = getopt(argc, argv, "hv:d:e:i:")) != -1) {
switch (c) {
case 'v':
test_data.nr_cpus = atoi(optarg);
break;
case 'd':
test_data.nr_devices = atoi(optarg);
break;
case 'e':
test_data.nr_event_ids = atoi(optarg);
break;
case 'i':
nr_iterations = strtoul(optarg, NULL, 0);
break;
case 'h':
default:
pr_usage(argv[0]);
return 1;
}
}
nr_threads = test_data.nr_cpus + test_data.nr_devices;
if (nr_threads > get_nprocs())
pr_info("WARNING: running %u threads on %d CPUs; performance is degraded.\n",
nr_threads, get_nprocs());
setup_vm();
run_test();
destroy_vm();
return 0;
}
......@@ -404,9 +404,6 @@ static void guest_code(uint64_t expected_pmcr_n)
GUEST_DONE();
}
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
/* Create a VM that has one vCPU with PMUv3 configured. */
static void create_vpmu_vm(void *guest_code)
{
......@@ -438,8 +435,7 @@ static void create_vpmu_vm(void *guest_code)
init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3);
vpmu_vm.vcpu = aarch64_vcpu_add(vpmu_vm.vm, 0, &init, guest_code);
vcpu_init_descriptor_tables(vpmu_vm.vcpu);
vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64,
GICD_BASE_GPA, GICR_BASE_GPA);
vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64);
__TEST_REQUIRE(vpmu_vm.gic_fd >= 0,
"Failed to create vgic-v3, skipping");
......
......@@ -22,9 +22,6 @@
#ifdef __aarch64__
#include "aarch64/vgic.h"
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
static int gic_fd;
static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
......@@ -33,7 +30,7 @@ static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
* The test can still run even if hardware does not support GICv3, as it
* is only an optimization to reduce guest exits.
*/
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
}
static void arch_cleanup_vm(struct kvm_vm *vm)
......
......@@ -6,11 +6,26 @@
#ifndef SELFTEST_KVM_GIC_H
#define SELFTEST_KVM_GIC_H
#include <asm/kvm.h>
enum gic_type {
GIC_V3,
GIC_TYPE_MAX,
};
/*
* Note that the redistributor frames are at the end, as the range scales
* with the number of vCPUs in the VM.
*/
#define GITS_BASE_GPA 0x8000000ULL
#define GICD_BASE_GPA (GITS_BASE_GPA + KVM_VGIC_V3_ITS_SIZE)
#define GICR_BASE_GPA (GICD_BASE_GPA + KVM_VGIC_V3_DIST_SIZE)
/* The GIC is identity-mapped into the guest at the time of setup. */
#define GITS_BASE_GVA ((volatile void *)GITS_BASE_GPA)
#define GICD_BASE_GVA ((volatile void *)GICD_BASE_GPA)
#define GICR_BASE_GVA ((volatile void *)GICR_BASE_GPA)
#define MIN_SGI 0
#define MIN_PPI 16
#define MIN_SPI 32
......@@ -21,8 +36,7 @@ enum gic_type {
#define INTID_IS_PPI(intid) (MIN_PPI <= (intid) && (intid) < MIN_SPI)
#define INTID_IS_SPI(intid) (MIN_SPI <= (intid) && (intid) <= MAX_SPI)
void gic_init(enum gic_type type, unsigned int nr_cpus,
void *dist_base, void *redist_base);
void gic_init(enum gic_type type, unsigned int nr_cpus);
void gic_irq_enable(unsigned int intid);
void gic_irq_disable(unsigned int intid);
unsigned int gic_get_and_ack_irq(void);
......@@ -44,4 +58,7 @@ void gic_irq_clear_pending(unsigned int intid);
bool gic_irq_get_pending(unsigned int intid);
void gic_irq_set_config(unsigned int intid, bool is_edge);
void gic_rdist_enable_lpis(vm_paddr_t cfg_table, size_t cfg_table_size,
vm_paddr_t pend_table);
#endif /* SELFTEST_KVM_GIC_H */
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __SELFTESTS_GIC_V3_ITS_H__
#define __SELFTESTS_GIC_V3_ITS_H__
#include <linux/sizes.h>
void its_init(vm_paddr_t coll_tbl, size_t coll_tbl_sz,
vm_paddr_t device_tbl, size_t device_tbl_sz,
vm_paddr_t cmdq, size_t cmdq_size);
void its_send_mapd_cmd(void *cmdq_base, u32 device_id, vm_paddr_t itt_base,
size_t itt_size, bool valid);
void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool valid);
void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id,
u32 collection_id, u32 intid);
void its_send_invall_cmd(void *cmdq_base, u32 collection_id);
#endif // __SELFTESTS_GIC_V3_ITS_H__
......@@ -58,8 +58,6 @@
MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
#define MPIDR_HWID_BITMASK (0xff00fffffful)
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
struct kvm_vcpu *aarch64_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_vcpu_init *init, void *guest_code);
......@@ -177,11 +175,28 @@ static __always_inline u32 __raw_readl(const volatile void *addr)
return val;
}
static __always_inline void __raw_writeq(u64 val, volatile void *addr)
{
asm volatile("str %0, [%1]" : : "rZ" (val), "r" (addr));
}
static __always_inline u64 __raw_readq(const volatile void *addr)
{
u64 val;
asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr));
return val;
}
#define writel_relaxed(v,c) ((void)__raw_writel((__force u32)cpu_to_le32(v),(c)))
#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
#define writeq_relaxed(v,c) ((void)__raw_writeq((__force u64)cpu_to_le64(v),(c)))
#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64)__raw_readq(c)); __r; })
#define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c));})
#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(__v); __v; })
#define writeq(v,c) ({ __iowmb(); writeq_relaxed((v),(c));})
#define readq(c) ({ u64 __v = readq_relaxed(c); __iormb(__v); __v; })
static inline void local_irq_enable(void)
{
......
......@@ -16,8 +16,7 @@
((uint64_t)(flags) << 12) | \
index)
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
uint64_t gicd_base_gpa, uint64_t gicr_base_gpa);
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs);
#define VGIC_MAX_RESERVED 1023
......@@ -33,4 +32,6 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu);
#define KVM_IRQCHIP_NUM_PINS (1020 - 32)
int vgic_its_setup(struct kvm_vm *vm);
#endif // SELFTEST_KVM_VGIC_H
......@@ -17,13 +17,12 @@
static const struct gic_common_ops *gic_common_ops;
static struct spinlock gic_lock;
static void gic_cpu_init(unsigned int cpu, void *redist_base)
static void gic_cpu_init(unsigned int cpu)
{
gic_common_ops->gic_cpu_init(cpu, redist_base);
gic_common_ops->gic_cpu_init(cpu);
}
static void
gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
static void gic_dist_init(enum gic_type type, unsigned int nr_cpus)
{
const struct gic_common_ops *gic_ops = NULL;
......@@ -40,7 +39,7 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
GUEST_ASSERT(gic_ops);
gic_ops->gic_init(nr_cpus, dist_base);
gic_ops->gic_init(nr_cpus);
gic_common_ops = gic_ops;
/* Make sure that the initialized data is visible to all the vCPUs */
......@@ -49,18 +48,15 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
spin_unlock(&gic_lock);
}
void gic_init(enum gic_type type, unsigned int nr_cpus,
void *dist_base, void *redist_base)
void gic_init(enum gic_type type, unsigned int nr_cpus)
{
uint32_t cpu = guest_get_vcpuid();
GUEST_ASSERT(type < GIC_TYPE_MAX);
GUEST_ASSERT(dist_base);
GUEST_ASSERT(redist_base);
GUEST_ASSERT(nr_cpus);
gic_dist_init(type, nr_cpus, dist_base);
gic_cpu_init(cpu, redist_base);
gic_dist_init(type, nr_cpus);
gic_cpu_init(cpu);
}
void gic_irq_enable(unsigned int intid)
......
......@@ -8,8 +8,8 @@
#define SELFTEST_KVM_GIC_PRIVATE_H
struct gic_common_ops {
void (*gic_init)(unsigned int nr_cpus, void *dist_base);
void (*gic_cpu_init)(unsigned int cpu, void *redist_base);
void (*gic_init)(unsigned int nr_cpus);
void (*gic_cpu_init)(unsigned int cpu);
void (*gic_irq_enable)(unsigned int intid);
void (*gic_irq_disable)(unsigned int intid);
uint64_t (*gic_read_iar)(void);
......
This diff is collapsed.
......@@ -3,8 +3,10 @@
* ARM Generic Interrupt Controller (GIC) v3 host support
*/
#include <linux/kernel.h>
#include <linux/kvm.h>
#include <linux/sizes.h>
#include <asm/cputype.h>
#include <asm/kvm_para.h>
#include <asm/kvm.h>
......@@ -19,8 +21,6 @@
* Input args:
* vm - KVM VM
* nr_vcpus - Number of vCPUs supported by this VM
* gicd_base_gpa - Guest Physical Address of the Distributor region
* gicr_base_gpa - Guest Physical Address of the Redistributor region
*
* Output args: None
*
......@@ -30,11 +30,10 @@
* redistributor regions of the guest. Since it depends on the number of
* vCPUs for the VM, it must be called after all the vCPUs have been created.
*/
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
uint64_t gicd_base_gpa, uint64_t gicr_base_gpa)
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs)
{
int gic_fd;
uint64_t redist_attr;
uint64_t attr;
struct list_head *iter;
unsigned int nr_gic_pages, nr_vcpus_created = 0;
......@@ -60,18 +59,19 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
attr = GICD_BASE_GPA;
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_V3_ADDR_TYPE_DIST, &gicd_base_gpa);
KVM_VGIC_V3_ADDR_TYPE_DIST, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_DIST_SIZE);
virt_map(vm, gicd_base_gpa, gicd_base_gpa, nr_gic_pages);
virt_map(vm, GICD_BASE_GPA, GICD_BASE_GPA, nr_gic_pages);
/* Redistributor setup */
redist_attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, gicr_base_gpa, 0, 0);
attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, GICR_BASE_GPA, 0, 0);
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &redist_attr);
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode,
KVM_VGIC_V3_REDIST_SIZE * nr_vcpus);
virt_map(vm, gicr_base_gpa, gicr_base_gpa, nr_gic_pages);
virt_map(vm, GICR_BASE_GPA, GICR_BASE_GPA, nr_gic_pages);
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
......@@ -168,3 +168,21 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu)
{
vgic_poke_irq(gic_fd, intid, vcpu, GICD_ISACTIVER);
}
int vgic_its_setup(struct kvm_vm *vm)
{
int its_fd = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_ITS);
u64 attr;
attr = GITS_BASE_GPA;
kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_ITS_ADDR_TYPE, &attr);
kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
virt_map(vm, GITS_BASE_GPA, GITS_BASE_GPA,
vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_ITS_SIZE));
return its_fd;
}
This diff is collapsed.
......@@ -366,6 +366,8 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
struct kvm_device *tmp;
struct kvm_vfio *kv;
lockdep_assert_held(&dev->kvm->lock);
/* Only one VFIO "device" per VM */
list_for_each_entry(tmp, &dev->kvm->devices, vm_node)
if (tmp->ops == &kvm_vfio_ops)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment