Commit c9b012e5 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The big highlight is support for the Scalable Vector Extension (SVE)
  which required extensive ABI work to ensure we don't break existing
  applications by blowing away their signal stack with the rather large
  new vector context (<= 2 kbit per vector register). There's further
  work to be done optimising things like exception return, but the ABI
  is solid now.

  Much of the line count comes from some new PMU drivers we have, but
  they're pretty self-contained and I suspect we'll have more of them in
  future.

  Plenty of acronym soup here:

   - initial support for the Scalable Vector Extension (SVE)

   - improved handling for SError interrupts (required to handle RAS
     events)

   - enable GCC support for 128-bit integer types

   - remove kernel text addresses from backtraces and register dumps

   - use of WFE to implement long delay()s

   - ACPI IORT updates from Lorenzo Pieralisi

   - perf PMU driver for the Statistical Profiling Extension (SPE)

   - perf PMU driver for Hisilicon's system PMUs

   - misc cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
  arm64: Make ARMV8_DEPRECATED depend on SYSCTL
  arm64: Implement __lshrti3 library function
  arm64: support __int128 on gcc 5+
  arm64/sve: Add documentation
  arm64/sve: Detect SVE and activate runtime support
  arm64/sve: KVM: Hide SVE from CPU features exposed to guests
  arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
  arm64/sve: KVM: Prevent guests from using SVE
  arm64/sve: Add sysctl to set the default vector length for new processes
  arm64/sve: Add prctl controls for userspace vector length management
  arm64/sve: ptrace and ELF coredump support
  arm64/sve: Preserve SVE registers around EFI runtime service calls
  arm64/sve: Preserve SVE registers around kernel-mode NEON use
  arm64/sve: Probe SVE capabilities and usable vector lengths
  arm64: cpufeature: Move sys_caps_initialised declarations
  arm64/sve: Backend logic for setting the vector length
  arm64/sve: Signal handling support
  arm64/sve: Support vector length resetting for new processes
  arm64/sve: Core task context handling
  arm64/sve: Low-level CPU setup
  ...
parents b293fca4 6cfa7cc4
......@@ -110,10 +110,20 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
| RES0 | [63-32] | n |
| RES0 | [63-48] | n |
|--------------------------------------------------|
| DP | [47-44] | y |
|--------------------------------------------------|
| SM4 | [43-40] | y |
|--------------------------------------------------|
| SM3 | [39-36] | y |
|--------------------------------------------------|
| SHA3 | [35-32] | y |
|--------------------------------------------------|
| RDM | [31-28] | y |
|--------------------------------------------------|
| RES0 | [27-24] | n |
|--------------------------------------------------|
| ATOMICS | [23-20] | y |
|--------------------------------------------------|
| CRC32 | [19-16] | y |
......@@ -132,7 +142,11 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
| RES0 | [63-28] | n |
| RES0 | [63-36] | n |
|--------------------------------------------------|
| SVE | [35-32] | y |
|--------------------------------------------------|
| RES0 | [31-28] | n |
|--------------------------------------------------|
| GIC | [27-24] | n |
|--------------------------------------------------|
......
ARM64 ELF hwcaps
================
This document describes the usage and semantics of the arm64 ELF hwcaps.
1. Introduction
---------------
Some hardware or software features are only available on some CPU
implementations, and/or with certain kernel configurations, but have no
architected discovery mechanism available to userspace code at EL0. The
kernel exposes the presence of these features to userspace through a set
of flags called hwcaps, exposed in the auxilliary vector.
Userspace software can test for features by acquiring the AT_HWCAP entry
of the auxilliary vector, and testing whether the relevant flags are
set, e.g.
bool floating_point_is_present(void)
{
unsigned long hwcaps = getauxval(AT_HWCAP);
if (hwcaps & HWCAP_FP)
return true;
return false;
}
Where software relies on a feature described by a hwcap, it should check
the relevant hwcap flag to verify that the feature is present before
attempting to make use of the feature.
Features cannot be probed reliably through other means. When a feature
is not available, attempting to use it may result in unpredictable
behaviour, and is not guaranteed to result in any reliable indication
that the feature is unavailable, such as a SIGILL.
2. Interpretation of hwcaps
---------------------------
The majority of hwcaps are intended to indicate the presence of features
which are described by architected ID registers inaccessible to
userspace code at EL0. These hwcaps are defined in terms of ID register
fields, and should be interpreted with reference to the definition of
these fields in the ARM Architecture Reference Manual (ARM ARM).
Such hwcaps are described below in the form:
Functionality implied by idreg.field == val.
Such hwcaps indicate the availability of functionality that the ARM ARM
defines as being present when idreg.field has value val, but do not
indicate that idreg.field is precisely equal to val, nor do they
indicate the absence of functionality implied by other values of
idreg.field.
Other hwcaps may indicate the presence of features which cannot be
described by ID registers alone. These may be described without
reference to ID registers, and may refer to other documentation.
3. The hwcaps exposed in AT_HWCAP
---------------------------------
HWCAP_FP
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
HWCAP_ASIMD
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
HWCAP_EVTSTRM
The generic timer is configured to generate events at a frequency of
approximately 100KHz.
HWCAP_AES
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0001.
HWCAP_PMULL
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0010.
HWCAP_SHA1
Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
HWCAP_SHA2
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
HWCAP_CRC32
Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
HWCAP_ATOMICS
Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
HWCAP_FPHP
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
HWCAP_ASIMDHP
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
HWCAP_CPUID
EL0 access to certain ID registers is available, to the extent
described by Documentation/arm64/cpu-feature-registers.txt.
These ID registers may imply the availability of features.
HWCAP_ASIMDRDM
Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
HWCAP_JSCVT
Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
HWCAP_FCMA
Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
HWCAP_LRCPC
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
HWCAP_DCPOP
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
HWCAP_SHA3
Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
HWCAP_SM3
Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
HWCAP_SM4
Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
HWCAP_ASIMDDP
Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
HWCAP_SHA512
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0002.
HWCAP_SVE
Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.
......@@ -86,9 +86,9 @@ Translation table lookup with 64KB pages:
+-------------------------------------------------> [63] TTBR0/1
When using KVM, the hypervisor maps kernel pages in EL2, at a fixed
offset from the kernel VA (top 24bits of the kernel VA set to zero):
When using KVM without the Virtualization Host Extensions, the hypervisor
maps kernel pages in EL2 at a fixed offset from the kernel VA. See the
kern_hyp_va macro for more details.
Start End Size Use
-----------------------------------------------------------------------
0000004000000000 0000007fffffffff 256GB kernel objects mapped in HYP
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
This diff is collapsed.
* ARMv8.2 Statistical Profiling Extension (SPE) Performance Monitor Units (PMU)
ARMv8.2 introduces the optional Statistical Profiling Extension for collecting
performance sample data using an in-memory trace buffer.
** SPE Required properties:
- compatible : should be one of:
"arm,statistical-profiling-extension-v1"
- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
SPE is only supported on a subset of the CPUs, please consult
the arm,gic-v3 binding for details on describing a PPI partition.
** Example:
spe-pmu {
compatible = "arm,statistical-profiling-extension-v1";
interrupts = <GIC_PPI 05 IRQ_TYPE_LEVEL_HIGH &part1>;
};
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================
The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.
The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 cpu cores sharing one L3 cache; each CPU die is
called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3), respectively.
HiSilicon SoC uncore PMU driver
---------------------------------------
Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
HHA and DDRC etc. The available events and configuration options shall
be described in the sysfs, see :
/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command shall list the available events from sysfs.
Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>.
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
module.
e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
SCCL ID #3.
e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
SCCL ID #1.
The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.
Example usage of perf:
$# perf list
hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
------------------------------------------
hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
------------------------------------------
hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
------------------------------------------
hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
------------------------------------------
$# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
The current driver does not support sampling. So "perf record" is unsupported.
Also attach to a task is unsupported as the events are all uncore.
Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.
......@@ -6259,6 +6259,13 @@ S: Maintained
F: drivers/net/ethernet/hisilicon/
F: Documentation/devicetree/bindings/net/hisilicon*.txt
HISILICON PMU DRIVER
M: Shaokun Zhang <zhangshaokun@hisilicon.com>
W: http://www.hisilicon.com
S: Supported
F: drivers/perf/hisilicon
F: Documentation/perf/hisi-pmu.txt
HISILICON ROCE DRIVER
M: Lijun Ou <oulijun@huawei.com>
M: Wei Hu(Xavier) <xavier.huwei@huawei.com>
......
......@@ -107,6 +107,7 @@ static inline u32 arch_timer_get_cntkctl(void)
static inline void arch_timer_set_cntkctl(u32 cntkctl)
{
asm volatile("mcr p15, 0, %0, c14, c1, 0" : : "r" (cntkctl));
isb();
}
#endif
......
......@@ -293,4 +293,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
static inline void kvm_fpsimd_flush_cpu_state(void) {}
#endif /* __ARM_KVM_HOST_H__ */
......@@ -21,7 +21,7 @@ config ARM64
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK if !PREEMPT
select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
......@@ -115,7 +115,7 @@ config ARM64
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI if ACPI_APEI_SEA
select HAVE_NMI
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
......@@ -136,6 +136,7 @@ config ARM64
select PCI_ECAM if ACPI
select POWER_RESET
select POWER_SUPPLY
select REFCOUNT_FULL
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
......@@ -842,6 +843,7 @@ config FORCE_MAX_ZONEORDER
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
depends on COMPAT
depends on SYSCTL
help
Legacy software support may require certain instructions
that have been deprecated or obsoleted in the architecture.
......@@ -1011,6 +1013,17 @@ config ARM64_PMEM
endmenu
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
help
The Scalable Vector Extension (SVE) is an extension to the AArch64
execution state which complements and extends the SIMD functionality
of the base architecture to support much larger vectors and to enable
additional vectorisation opportunities.
To enable use of this extension on CPUs that implement it, say Y.
config ARM64_MODULE_CMODEL_LARGE
bool
......@@ -1099,6 +1112,7 @@ config EFI_STUB
config EFI
bool "UEFI runtime support"
depends on OF && !CPU_BIG_ENDIAN
depends on KERNEL_MODE_NEON
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
......
......@@ -14,8 +14,12 @@ LDFLAGS_vmlinux :=-p --no-undefined -X
CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
GZFLAGS :=-9
ifneq ($(CONFIG_RELOCATABLE),)
LDFLAGS_vmlinux += -pie -shared -Bsymbolic
ifeq ($(CONFIG_RELOCATABLE), y)
# Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour
# for relative relocs, since this leads to better Image compression
# with the relocation offsets always being zero.
LDFLAGS_vmlinux += -pie -shared -Bsymbolic \
$(call ld-option, --no-apply-dynamic-relocs)
endif
ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
......@@ -53,6 +57,8 @@ KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
KBUILD_CFLAGS += $(call cc-option,-mabi=lp64)
KBUILD_AFLAGS += $(call cc-option,-mabi=lp64)
KBUILD_CFLAGS += $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128)
ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
KBUILD_CPPFLAGS += -mbig-endian
CHECKFLAGS += -D__AARCH64EB__
......
......@@ -144,6 +144,7 @@ static inline u32 arch_timer_get_cntkctl(void)
static inline void arch_timer_set_cntkctl(u32 cntkctl)
{
write_sysreg(cntkctl, cntkctl_el1);
isb();
}
static inline u64 arch_counter_get_cntpct(void)
......
......@@ -22,10 +22,10 @@
#define _BUGVERBOSE_LOCATION(file, line) __BUGVERBOSE_LOCATION(file, line)
#define __BUGVERBOSE_LOCATION(file, line) \
.pushsection .rodata.str,"aMS",@progbits,1; \
2: .string file; \
14472: .string file; \
.popsection; \
\
.long 2b - 0b; \
.long 14472b - 14470b; \
.short line;
#else
#define _BUGVERBOSE_LOCATION(file, line)
......@@ -36,11 +36,11 @@
#define __BUG_ENTRY(flags) \
.pushsection __bug_table,"aw"; \
.align 2; \
0: .long 1f - 0b; \
14470: .long 14471f - 14470b; \
_BUGVERBOSE_LOCATION(__FILE__, __LINE__) \
.short flags; \
.popsection; \
1:
14471:
#else
#define __BUG_ENTRY(flags)
#endif
......
......@@ -25,12 +25,41 @@
#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
#include <asm/debug-monitors.h>
#include <asm/mmu_context.h>
#include <asm/page.h>
#include <asm/pgtable-hwdef.h>
#include <asm/ptrace.h>
#include <asm/thread_info.h>
.macro save_and_disable_daif, flags
mrs \flags, daif
msr daifset, #0xf
.endm
.macro disable_daif
msr daifset, #0xf
.endm
.macro enable_daif
msr daifclr, #0xf
.endm
.macro restore_daif, flags:req
msr daif, \flags
.endm
/* Only on aarch64 pstate, PSR_D_BIT is different for aarch32 */
.macro inherit_daif, pstate:req, tmp:req
and \tmp, \pstate, #(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
msr daif, \tmp
.endm
/* IRQ is the lowest priority flag, unconditionally unmask the rest. */
.macro enable_da_f
msr daifclr, #(8 | 4 | 1)
.endm
/*
* Enable and disable interrupts.
*/
......@@ -51,13 +80,6 @@
msr daif, \flags
.endm
/*
* Enable and disable debug exceptions.
*/
.macro disable_dbg
msr daifset, #8
.endm
.macro enable_dbg
msr daifclr, #8
.endm
......@@ -65,30 +87,21 @@
.macro disable_step_tsk, flgs, tmp
tbz \flgs, #TIF_SINGLESTEP, 9990f
mrs \tmp, mdscr_el1
bic \tmp, \tmp, #1
bic \tmp, \tmp, #DBG_MDSCR_SS
msr mdscr_el1, \tmp
isb // Synchronise with enable_dbg
9990:
.endm
/* call with daif masked */
.macro enable_step_tsk, flgs, tmp
tbz \flgs, #TIF_SINGLESTEP, 9990f
disable_dbg
mrs \tmp, mdscr_el1
orr \tmp, \tmp, #1
orr \tmp, \tmp, #DBG_MDSCR_SS
msr mdscr_el1, \tmp
9990:
.endm
/*
* Enable both debug exceptions and interrupts. This is likely to be
* faster than two daifclr operations, since writes to this register
* are self-synchronising.
*/
.macro enable_dbg_and_irq
msr daifclr, #(8 | 2)
.endm
/*
* SMP data memory barrier
*/
......
......@@ -31,6 +31,8 @@
#define dmb(opt) asm volatile("dmb " #opt : : : "memory")
#define dsb(opt) asm volatile("dsb " #opt : : : "memory")
#define psb_csync() asm volatile("hint #17" : : : "memory")
#define mb() dsb(sy)
#define rmb() dsb(ld)
#define wmb() dsb(st)
......
......@@ -41,6 +41,7 @@ struct cpuinfo_arm64 {
u64 reg_id_aa64mmfr2;
u64 reg_id_aa64pfr0;
u64 reg_id_aa64pfr1;
u64 reg_id_aa64zfr0;
u32 reg_id_dfr0;
u32 reg_id_isar0;
......@@ -59,6 +60,9 @@ struct cpuinfo_arm64 {
u32 reg_mvfr0;
u32 reg_mvfr1;
u32 reg_mvfr2;
/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
u64 reg_zcr;
};
DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data);
......
......@@ -40,7 +40,8 @@
#define ARM64_WORKAROUND_858921 19
#define ARM64_WORKAROUND_CAVIUM_30115 20
#define ARM64_HAS_DCPOP 21
#define ARM64_SVE 22
#define ARM64_NCAPS 22
#define ARM64_NCAPS 23
#endif /* __ASM_CPUCAPS_H */
......@@ -10,7 +10,9 @@
#define __ASM_CPUFEATURE_H
#include <asm/cpucaps.h>
#include <asm/fpsimd.h>
#include <asm/hwcap.h>
#include <asm/sigcontext.h>
#include <asm/sysreg.h>
/*
......@@ -223,6 +225,13 @@ static inline bool id_aa64pfr0_32bit_el0(u64 pfr0)
return val == ID_AA64PFR0_EL0_32BIT_64BIT;
}
static inline bool id_aa64pfr0_sve(u64 pfr0)
{
u32 val = cpuid_feature_extract_unsigned_field(pfr0, ID_AA64PFR0_SVE_SHIFT);
return val > 0;
}
void __init setup_cpu_features(void);
void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
......@@ -262,6 +271,39 @@ static inline bool system_uses_ttbr0_pan(void)
!cpus_have_const_cap(ARM64_HAS_PAN);
}
static inline bool system_supports_sve(void)
{
return IS_ENABLED(CONFIG_ARM64_SVE) &&
cpus_have_const_cap(ARM64_SVE);
}
/*
* Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
* vector length.
*
* Use only if SVE is present.
* This function clobbers the SVE vector length.
*/
static inline u64 read_zcr_features(void)
{
u64 zcr;
unsigned int vq_max;
/*
* Set the maximum possible VL, and write zeroes to all other
* bits to see if they stick.
*/
sve_kernel_enable(NULL);
write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
zcr = read_sysreg_s(SYS_ZCR_EL1);
zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
vq_max = sve_vq_from_vl(sve_get_vl());
zcr |= vq_max - 1; /* set LEN field to maximum effective value */
return zcr;
}
#endif /* __ASSEMBLY__ */
#endif
/*
* Copyright (C) 2017 ARM Ltd.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef __ASM_DAIFFLAGS_H
#define __ASM_DAIFFLAGS_H
#include <linux/irqflags.h>
#define DAIF_PROCCTX 0
#define DAIF_PROCCTX_NOIRQ PSR_I_BIT
/* mask/save/unmask/restore all exceptions, including interrupts. */
static inline void local_daif_mask(void)
{
asm volatile(
"msr daifset, #0xf // local_daif_mask\n"
:
:
: "memory");
trace_hardirqs_off();
}
static inline unsigned long local_daif_save(void)
{
unsigned long flags;
asm volatile(
"mrs %0, daif // local_daif_save\n"
: "=r" (flags)
:
: "memory");
local_daif_mask();
return flags;
}
static inline void local_daif_unmask(void)
{
trace_hardirqs_on();
asm volatile(
"msr daifclr, #0xf // local_daif_unmask"
:
:
: "memory");
}
static inline void local_daif_restore(unsigned long flags)
{
if (!arch_irqs_disabled_flags(flags))
trace_hardirqs_on();
asm volatile(
"msr daif, %0 // local_daif_restore"
:
: "r" (flags)
: "memory");
if (arch_irqs_disabled_flags(flags))
trace_hardirqs_off();
}
#endif
......@@ -188,8 +188,8 @@ typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];
#define compat_start_thread compat_start_thread
/*
* Unlike the native SET_PERSONALITY macro, the compat version inherits
* READ_IMPLIES_EXEC across a fork() since this is the behaviour on
* Unlike the native SET_PERSONALITY macro, the compat version maintains
* READ_IMPLIES_EXEC across an execve() since this is the behaviour on
* arch/arm/.
*/
#define COMPAT_SET_PERSONALITY(ex) \
......
......@@ -43,7 +43,8 @@
#define ESR_ELx_EC_HVC64 (0x16)
#define ESR_ELx_EC_SMC64 (0x17)
#define ESR_ELx_EC_SYS64 (0x18)
/* Unallocated EC: 0x19 - 0x1E */
#define ESR_ELx_EC_SVE (0x19)
/* Unallocated EC: 0x1A - 0x1E */
#define ESR_ELx_EC_IMP_DEF (0x1f)
#define ESR_ELx_EC_IABT_LOW (0x20)
#define ESR_ELx_EC_IABT_CUR (0x21)
......
......@@ -17,9 +17,13 @@
#define __ASM_FP_H
#include <asm/ptrace.h>
#include <asm/errno.h>
#ifndef __ASSEMBLY__
#include <linux/cache.h>
#include <linux/stddef.h>
/*
* FP/SIMD storage area has:
* - FPSR and FPCR
......@@ -35,13 +39,16 @@ struct fpsimd_state {
__uint128_t vregs[32];
u32 fpsr;
u32 fpcr;
/*
* For ptrace compatibility, pad to next 128-bit
* boundary here if extending this struct.
*/
};
};
/* the id of the last cpu to have restored this state */
unsigned int cpu;
};
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/* Masks for extracting the FPSR and FPCR from the FPSCR */
#define VFP_FPSCR_STAT_MASK 0xf800009f
......@@ -61,11 +68,73 @@ extern void fpsimd_load_state(struct fpsimd_state *state);
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
extern void fpsimd_signal_preserve_current_state(void);
extern void fpsimd_preserve_current_state(void);
extern void fpsimd_restore_current_state(void);
extern void fpsimd_update_current_state(struct fpsimd_state *state);
extern void fpsimd_flush_task_state(struct task_struct *target);
extern void sve_flush_cpu_state(void);
/* Maximum VL that SVE VL-agnostic software can transparently support */
#define SVE_VL_ARCH_MAX 0x100
extern void sve_save_state(void *state, u32 *pfpsr);
extern void sve_load_state(void const *state, u32 const *pfpsr,
unsigned long vq_minus_1);
extern unsigned int sve_get_vl(void);
extern int sve_kernel_enable(void *);
extern int __ro_after_init sve_max_vl;
#ifdef CONFIG_ARM64_SVE
extern size_t sve_state_size(struct task_struct const *task);
extern void sve_alloc(struct task_struct *task);
extern void fpsimd_release_task(struct task_struct *task);
extern void fpsimd_sync_to_sve(struct task_struct *task);
extern void sve_sync_to_fpsimd(struct task_struct *task);
extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
extern int sve_set_vector_length(struct task_struct *task,
unsigned long vl, unsigned long flags);
extern int sve_set_current_vl(unsigned long arg);
extern int sve_get_current_vl(void);
/*
* Probing and setup functions.
* Calls to these functions must be serialised with one another.
*/
extern void __init sve_init_vq_map(void);
extern void sve_update_vq_map(void);
extern int sve_verify_vq_map(void);
extern void __init sve_setup(void);
#else /* ! CONFIG_ARM64_SVE */
static inline void sve_alloc(struct task_struct *task) { }
static inline void fpsimd_release_task(struct task_struct *task) { }
static inline void sve_sync_to_fpsimd(struct task_struct *task) { }
static inline void sve_sync_from_fpsimd_zeropad(struct task_struct *task) { }
static inline int sve_set_current_vl(unsigned long arg)
{
return -EINVAL;
}
static inline int sve_get_current_vl(void)
{
return -EINVAL;
}
static inline void sve_init_vq_map(void) { }
static inline void sve_update_vq_map(void) { }
static inline int sve_verify_vq_map(void) { return 0; }
static inline void sve_setup(void) { }
#endif /* ! CONFIG_ARM64_SVE */
/* For use by EFI runtime services calls only */
extern void __efi_fpsimd_begin(void);
......
......@@ -75,3 +75,151 @@
ldr w\tmpnr, [\state, #16 * 2 + 4]
fpsimd_restore_fpcr x\tmpnr, \state
.endm
/* Sanity-check macros to help avoid encoding garbage instructions */
.macro _check_general_reg nr
.if (\nr) < 0 || (\nr) > 30
.error "Bad register number \nr."
.endif
.endm
.macro _sve_check_zreg znr
.if (\znr) < 0 || (\znr) > 31
.error "Bad Scalable Vector Extension vector register number \znr."
.endif
.endm
.macro _sve_check_preg pnr
.if (\pnr) < 0 || (\pnr) > 15
.error "Bad Scalable Vector Extension predicate register number \pnr."
.endif
.endm
.macro _check_num n, min, max
.if (\n) < (\min) || (\n) > (\max)
.error "Number \n out of range [\min,\max]"
.endif
.endm
/* SVE instruction encodings for non-SVE-capable assemblers */
/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */
.macro _sve_str_v nz, nxbase, offset=0
_sve_check_zreg \nz
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe5804000 \
| (\nz) \
| ((\nxbase) << 5) \
| (((\offset) & 7) << 10) \
| (((\offset) & 0x1f8) << 13)
.endm
/* LDR (vector): LDR Z\nz, [X\nxbase, #\offset, MUL VL] */
.macro _sve_ldr_v nz, nxbase, offset=0
_sve_check_zreg \nz
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0x85804000 \
| (\nz) \
| ((\nxbase) << 5) \
| (((\offset) & 7) << 10) \
| (((\offset) & 0x1f8) << 13)
.endm
/* STR (predicate): STR P\np, [X\nxbase, #\offset, MUL VL] */
.macro _sve_str_p np, nxbase, offset=0
_sve_check_preg \np
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe5800000 \
| (\np) \
| ((\nxbase) << 5) \
| (((\offset) & 7) << 10) \
| (((\offset) & 0x1f8) << 13)
.endm
/* LDR (predicate): LDR P\np, [X\nxbase, #\offset, MUL VL] */
.macro _sve_ldr_p np, nxbase, offset=0
_sve_check_preg \np
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0x85800000 \
| (\np) \
| ((\nxbase) << 5) \
| (((\offset) & 7) << 10) \
| (((\offset) & 0x1f8) << 13)
.endm
/* RDVL X\nx, #\imm */
.macro _sve_rdvl nx, imm
_check_general_reg \nx
_check_num (\imm), -0x20, 0x1f
.inst 0x04bf5000 \
| (\nx) \
| (((\imm) & 0x3f) << 5)
.endm
/* RDFFR (unpredicated): RDFFR P\np.B */
.macro _sve_rdffr np
_sve_check_preg \np
.inst 0x2519f000 \
| (\np)
.endm
/* WRFFR P\np.B */
.macro _sve_wrffr np
_sve_check_preg \np
.inst 0x25289000 \
| ((\np) << 5)
.endm
.macro __for from:req, to:req
.if (\from) == (\to)
_for__body \from
.else
__for \from, (\from) + ((\to) - (\from)) / 2
__for (\from) + ((\to) - (\from)) / 2 + 1, \to
.endif
.endm
.macro _for var:req, from:req, to:req, insn:vararg
.macro _for__body \var:req
\insn
.endm
__for \from, \to
.purgem _for__body
.endm
.macro sve_save nxbase, xpfpsr, nxtmp
_for n, 0, 31, _sve_str_v \n, \nxbase, \n - 34
_for n, 0, 15, _sve_str_p \n, \nxbase, \n - 16
_sve_rdffr 0
_sve_str_p 0, \nxbase
_sve_ldr_p 0, \nxbase, -16
mrs x\nxtmp, fpsr
str w\nxtmp, [\xpfpsr]
mrs x\nxtmp, fpcr
str w\nxtmp, [\xpfpsr, #4]
.endm
.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp
mrs_s x\nxtmp, SYS_ZCR_EL1
bic x\nxtmp, x\nxtmp, ZCR_ELx_LEN_MASK
orr x\nxtmp, x\nxtmp, \xvqminus1
msr_s SYS_ZCR_EL1, x\nxtmp // self-synchronising
_for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34
_sve_ldr_p 0, \nxbase
_sve_wrffr 0
_for n, 0, 15, _sve_ldr_p \n, \nxbase, \n - 16
ldr w\nxtmp, [\xpfpsr]
msr fpsr, x\nxtmp
ldr w\nxtmp, [\xpfpsr, #4]
msr fpcr, x\nxtmp
.endm
......@@ -20,6 +20,19 @@
#include <asm/ptrace.h>
/*
* Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
* FIQ exceptions, in the 'daif' register. We mask and unmask them in 'dai'
* order:
* Masking debug exceptions causes all other exceptions to be masked too/
* Masking SError masks irq, but not debug exceptions. Masking irqs has no
* side effects for other flags. Keeping to this order makes it easier for
* entry.S to know which exceptions should be unmasked.
*
* FIQ is never expected, but we mask it when we disable debug exceptions, and
* unmask it at all other times.
*/
/*
* CPU interrupt mask handling.
*/
......@@ -53,12 +66,6 @@ static inline void arch_local_irq_disable(void)
: "memory");
}
#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
#define local_async_enable() asm("msr daifclr, #4" : : : "memory")
#define local_async_disable() asm("msr daifset, #4" : : : "memory")
/*
* Save the current interrupt enable state.
*/
......@@ -89,26 +96,5 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
{
return flags & PSR_I_BIT;
}
/*
* save and restore debug state
*/
#define local_dbg_save(flags) \
do { \
typecheck(unsigned long, flags); \
asm volatile( \
"mrs %0, daif // local_dbg_save\n" \
"msr daifset, #8" \
: "=r" (flags) : : "memory"); \
} while (0)
#define local_dbg_restore(flags) \
do { \
typecheck(unsigned long, flags); \
asm volatile( \
"msr daif, %0 // local_dbg_restore\n" \
: : "r" (flags) : "memory"); \
} while (0)
#endif
#endif
......@@ -185,7 +185,9 @@
#define CPTR_EL2_TCPAC (1 << 31)
#define CPTR_EL2_TTA (1 << 20)
#define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT)
#define CPTR_EL2_DEFAULT 0x000033ff
#define CPTR_EL2_TZ (1 << 8)
#define CPTR_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 */
#define CPTR_EL2_DEFAULT CPTR_EL2_RES1
/* Hyp Debug Configuration Register bits */
#define MDCR_EL2_TPMS (1 << 14)
......@@ -236,5 +238,6 @@
#define CPACR_EL1_FPEN (3 << 20)
#define CPACR_EL1_TTA (1 << 28)
#define CPACR_EL1_DEFAULT (CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
#endif /* __ARM64_KVM_ARM_H__ */
......@@ -25,6 +25,7 @@
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <asm/cpufeature.h>
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmio.h>
......@@ -384,4 +385,14 @@ static inline void __cpu_init_stage2(void)
"PARange is %d bits, unsupported configuration!", parange);
}
/*
* All host FP/SIMD state is restored on guest exit, so nothing needs
* doing here except in the SVE case:
*/
static inline void kvm_fpsimd_flush_cpu_state(void)
{
if (system_supports_sve())
sve_flush_cpu_state();
}
#endif /* __ARM64_KVM_HOST_H__ */
......@@ -61,8 +61,6 @@
* KIMAGE_VADDR - the virtual address of the start of the kernel image
* VA_BITS - the maximum number of bits for virtual addresses.
* VA_START - the first kernel virtual address.
* TASK_SIZE - the maximum size of a user space task.
* TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
*/
#define VA_BITS (CONFIG_ARM64_VA_BITS)
#define VA_START (UL(0xffffffffffffffff) - \
......@@ -77,19 +75,6 @@
#define PCI_IO_END (VMEMMAP_START - SZ_2M)
#define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE)
#define FIXADDR_TOP (PCI_IO_START - SZ_2M)
#define TASK_SIZE_64 (UL(1) << VA_BITS)
#ifdef CONFIG_COMPAT
#define TASK_SIZE_32 UL(0x100000000)
#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
#else
#define TASK_SIZE TASK_SIZE_64
#endif /* CONFIG_COMPAT */
#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4))
#define KERNEL_START _text
#define KERNEL_END _end
......
......@@ -98,6 +98,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
#define pte_valid_young(pte) \
((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF))
#define pte_valid_user(pte) \
((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
/*
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
......@@ -107,6 +109,18 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
#define pte_accessible(mm, pte) \
(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
/*
* p??_access_permitted() is true for valid user mappings (subject to the
* write permission check) other than user execute-only which do not have the
* PTE_USER bit set. PROT_NONE mappings do not have the PTE_VALID bit set.
*/
#define pte_access_permitted(pte, write) \
(pte_valid_user(pte) && (!(write) || pte_write(pte)))
#define pmd_access_permitted(pmd, write) \
(pte_access_permitted(pmd_pte(pmd), (write)))
#define pud_access_permitted(pud, write) \
(pte_access_permitted(pud_pte(pud), (write)))
static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
{
pte_val(pte) &= ~pgprot_val(prot);
......
......@@ -19,6 +19,10 @@
#ifndef __ASM_PROCESSOR_H
#define __ASM_PROCESSOR_H
#define TASK_SIZE_64 (UL(1) << VA_BITS)
#ifndef __ASSEMBLY__
/*
* Default implementation of macro that returns current
* instruction pointer ("program counter").
......@@ -37,6 +41,22 @@
#include <asm/ptrace.h>
#include <asm/types.h>
/*
* TASK_SIZE - the maximum size of a user space task.
* TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
*/
#ifdef CONFIG_COMPAT
#define TASK_SIZE_32 UL(0x100000000)
#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
TASK_SIZE_32 : TASK_SIZE_64)
#else
#define TASK_SIZE TASK_SIZE_64
#endif /* CONFIG_COMPAT */
#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4))
#define STACK_TOP_MAX TASK_SIZE_64
#ifdef CONFIG_COMPAT
#define AARCH32_VECTORS_BASE 0xffff0000
......@@ -85,6 +105,9 @@ struct thread_struct {
unsigned long tp2_value;
#endif
struct fpsimd_state fpsimd_state;
void *sve_state; /* SVE registers, if any */
unsigned int sve_vl; /* SVE vector length */
unsigned int sve_vl_onexec; /* SVE vl after next exec */
unsigned long fault_address; /* fault info */
unsigned long fault_code; /* ESR_EL1 value */
struct debug_info debug; /* debugging */
......@@ -194,4 +217,9 @@ static inline void spin_lock_prefetch(const void *ptr)
int cpu_enable_pan(void *__unused);
int cpu_enable_cache_maint_trap(void *__unused);
/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
#define SVE_SET_VL(arg) sve_set_current_vl(arg)
#define SVE_GET_VL() sve_get_current_vl()
#endif /* __ASSEMBLY__ */
#endif /* __ASM_PROCESSOR_H */
......@@ -145,10 +145,14 @@
#define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0)
#define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1)
#define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4)
#define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0)
#define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1)
#define SYS_ID_AA64AFR0_EL1 sys_reg(3, 0, 0, 5, 4)
#define SYS_ID_AA64AFR1_EL1 sys_reg(3, 0, 0, 5, 5)
#define SYS_ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
#define SYS_ID_AA64ISAR1_EL1 sys_reg(3, 0, 0, 6, 1)
......@@ -160,6 +164,8 @@
#define SYS_ACTLR_EL1 sys_reg(3, 0, 1, 0, 1)
#define SYS_CPACR_EL1 sys_reg(3, 0, 1, 0, 2)
#define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0)
#define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0)
#define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1)
#define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2)
......@@ -172,6 +178,99 @@
#define SYS_FAR_EL1 sys_reg(3, 0, 6, 0, 0)
#define SYS_PAR_EL1 sys_reg(3, 0, 7, 4, 0)
/*** Statistical Profiling Extension ***/
/* ID registers */
#define SYS_PMSIDR_EL1 sys_reg(3, 0, 9, 9, 7)
#define SYS_PMSIDR_EL1_FE_SHIFT 0
#define SYS_PMSIDR_EL1_FT_SHIFT 1
#define SYS_PMSIDR_EL1_FL_SHIFT 2
#define SYS_PMSIDR_EL1_ARCHINST_SHIFT 3
#define SYS_PMSIDR_EL1_LDS_SHIFT 4
#define SYS_PMSIDR_EL1_ERND_SHIFT 5
#define SYS_PMSIDR_EL1_INTERVAL_SHIFT 8
#define SYS_PMSIDR_EL1_INTERVAL_MASK 0xfUL
#define SYS_PMSIDR_EL1_MAXSIZE_SHIFT 12
#define SYS_PMSIDR_EL1_MAXSIZE_MASK 0xfUL
#define SYS_PMSIDR_EL1_COUNTSIZE_SHIFT 16
#define SYS_PMSIDR_EL1_COUNTSIZE_MASK 0xfUL
#define SYS_PMBIDR_EL1 sys_reg(3, 0, 9, 10, 7)
#define SYS_PMBIDR_EL1_ALIGN_SHIFT 0
#define SYS_PMBIDR_EL1_ALIGN_MASK 0xfU
#define SYS_PMBIDR_EL1_P_SHIFT 4
#define SYS_PMBIDR_EL1_F_SHIFT 5
/* Sampling controls */
#define SYS_PMSCR_EL1 sys_reg(3, 0, 9, 9, 0)
#define SYS_PMSCR_EL1_E0SPE_SHIFT 0
#define SYS_PMSCR_EL1_E1SPE_SHIFT 1
#define SYS_PMSCR_EL1_CX_SHIFT 3
#define SYS_PMSCR_EL1_PA_SHIFT 4
#define SYS_PMSCR_EL1_TS_SHIFT 5
#define SYS_PMSCR_EL1_PCT_SHIFT 6
#define SYS_PMSCR_EL2 sys_reg(3, 4, 9, 9, 0)
#define SYS_PMSCR_EL2_E0HSPE_SHIFT 0
#define SYS_PMSCR_EL2_E2SPE_SHIFT 1
#define SYS_PMSCR_EL2_CX_SHIFT 3
#define SYS_PMSCR_EL2_PA_SHIFT 4
#define SYS_PMSCR_EL2_TS_SHIFT 5
#define SYS_PMSCR_EL2_PCT_SHIFT 6
#define SYS_PMSICR_EL1 sys_reg(3, 0, 9, 9, 2)
#define SYS_PMSIRR_EL1 sys_reg(3, 0, 9, 9, 3)
#define SYS_PMSIRR_EL1_RND_SHIFT 0
#define SYS_PMSIRR_EL1_INTERVAL_SHIFT 8
#define SYS_PMSIRR_EL1_INTERVAL_MASK 0xffffffUL
/* Filtering controls */
#define SYS_PMSFCR_EL1 sys_reg(3, 0, 9, 9, 4)
#define SYS_PMSFCR_EL1_FE_SHIFT 0
#define SYS_PMSFCR_EL1_FT_SHIFT 1
#define SYS_PMSFCR_EL1_FL_SHIFT 2
#define SYS_PMSFCR_EL1_B_SHIFT 16
#define SYS_PMSFCR_EL1_LD_SHIFT 17
#define SYS_PMSFCR_EL1_ST_SHIFT 18
#define SYS_PMSEVFR_EL1 sys_reg(3, 0, 9, 9, 5)
#define SYS_PMSEVFR_EL1_RES0 0x0000ffff00ff0f55UL
#define SYS_PMSLATFR_EL1 sys_reg(3, 0, 9, 9, 6)
#define SYS_PMSLATFR_EL1_MINLAT_SHIFT 0
/* Buffer controls */
#define SYS_PMBLIMITR_EL1 sys_reg(3, 0, 9, 10, 0)
#define SYS_PMBLIMITR_EL1_E_SHIFT 0
#define SYS_PMBLIMITR_EL1_FM_SHIFT 1
#define SYS_PMBLIMITR_EL1_FM_MASK 0x3UL
#define SYS_PMBLIMITR_EL1_FM_STOP_IRQ (0 << SYS_PMBLIMITR_EL1_FM_SHIFT)
#define SYS_PMBPTR_EL1 sys_reg(3, 0, 9, 10, 1)
/* Buffer error reporting */
#define SYS_PMBSR_EL1 sys_reg(3, 0, 9, 10, 3)
#define SYS_PMBSR_EL1_COLL_SHIFT 16
#define SYS_PMBSR_EL1_S_SHIFT 17
#define SYS_PMBSR_EL1_EA_SHIFT 18
#define SYS_PMBSR_EL1_DL_SHIFT 19
#define SYS_PMBSR_EL1_EC_SHIFT 26
#define SYS_PMBSR_EL1_EC_MASK 0x3fUL
#define SYS_PMBSR_EL1_EC_BUF (0x0UL << SYS_PMBSR_EL1_EC_SHIFT)
#define SYS_PMBSR_EL1_EC_FAULT_S1 (0x24UL << SYS_PMBSR_EL1_EC_SHIFT)
#define SYS_PMBSR_EL1_EC_FAULT_S2 (0x25UL << SYS_PMBSR_EL1_EC_SHIFT)
#define SYS_PMBSR_EL1_FAULT_FSC_SHIFT 0
#define SYS_PMBSR_EL1_FAULT_FSC_MASK 0x3fUL
#define SYS_PMBSR_EL1_BUF_BSC_SHIFT 0
#define SYS_PMBSR_EL1_BUF_BSC_MASK 0x3fUL
#define SYS_PMBSR_EL1_BUF_BSC_FULL (0x1UL << SYS_PMBSR_EL1_BUF_BSC_SHIFT)
/*** End of Statistical Profiling Extension ***/
#define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1)
#define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2)
......@@ -250,6 +349,8 @@
#define SYS_PMCCFILTR_EL0 sys_reg (3, 3, 14, 15, 7)
#define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0)
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
#define SYS_FPEXC32_EL2 sys_reg(3, 4, 5, 3, 0)
......@@ -318,6 +419,10 @@
#define SCTLR_EL1_CP15BEN (1 << 5)
/* id_aa64isar0 */
#define ID_AA64ISAR0_DP_SHIFT 44
#define ID_AA64ISAR0_SM4_SHIFT 40
#define ID_AA64ISAR0_SM3_SHIFT 36
#define ID_AA64ISAR0_SHA3_SHIFT 32
#define ID_AA64ISAR0_RDM_SHIFT 28
#define ID_AA64ISAR0_ATOMICS_SHIFT 20
#define ID_AA64ISAR0_CRC32_SHIFT 16
......@@ -332,6 +437,7 @@
#define ID_AA64ISAR1_DPB_SHIFT 0
/* id_aa64pfr0 */
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR0_ASIMD_SHIFT 20
#define ID_AA64PFR0_FP_SHIFT 16
......@@ -340,6 +446,7 @@
#define ID_AA64PFR0_EL1_SHIFT 4
#define ID_AA64PFR0_EL0_SHIFT 0
#define ID_AA64PFR0_SVE 0x1
#define ID_AA64PFR0_FP_NI 0xf
#define ID_AA64PFR0_FP_SUPPORTED 0x0
#define ID_AA64PFR0_ASIMD_NI 0xf
......@@ -441,6 +548,20 @@
#endif
/*
* The ZCR_ELx_LEN_* definitions intentionally include bits [8:4] which
* are reserved by the SVE architecture for future expansion of the LEN
* field, with compatible semantics.
*/
#define ZCR_ELx_LEN_SHIFT 0
#define ZCR_ELx_LEN_SIZE 9
#define ZCR_ELx_LEN_MASK 0x1ff
#define CPACR_EL1_ZEN_EL1EN (1 << 16) /* enable EL1 access */
#define CPACR_EL1_ZEN_EL0EN (1 << 17) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN (CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL (1UL << 31)
......
......@@ -63,6 +63,8 @@ struct thread_info {
void arch_setup_new_exec(void);
#define arch_setup_new_exec arch_setup_new_exec
void arch_release_task_struct(struct task_struct *tsk);
#endif
/*
......@@ -92,6 +94,8 @@ void arch_setup_new_exec(void);
#define TIF_RESTORE_SIGMASK 20
#define TIF_SINGLESTEP 21
#define TIF_32BIT 22 /* 32bit process */
#define TIF_SVE 23 /* Scalable Vector Extension in use */
#define TIF_SVE_VL_INHERIT 24 /* Inherit sve_vl_onexec across exec */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
......@@ -105,6 +109,7 @@ void arch_setup_new_exec(void);
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_FSCHECK (1 << TIF_FSCHECK)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_SVE (1 << TIF_SVE)
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
_TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
......
......@@ -34,9 +34,17 @@ struct undef_hook {
void register_undef_hook(struct undef_hook *hook);
void unregister_undef_hook(struct undef_hook *hook);
void force_signal_inject(int signal, int code, struct pt_regs *regs,
unsigned long address);
void arm64_notify_segfault(struct pt_regs *regs, unsigned long addr);
/*
* Move regs->pc to next instruction and do necessary setup before it
* is executed.
*/
void arm64_skip_faulting_instruction(struct pt_regs *regs, unsigned long size);
static inline int __in_irqentry_text(unsigned long ptr)
{
return ptr >= (unsigned long)&__irqentry_text_start &&
......
......@@ -37,5 +37,11 @@
#define HWCAP_FCMA (1 << 14)
#define HWCAP_LRCPC (1 << 15)
#define HWCAP_DCPOP (1 << 16)
#define HWCAP_SHA3 (1 << 17)
#define HWCAP_SM3 (1 << 18)
#define HWCAP_SM4 (1 << 19)
#define HWCAP_ASIMDDP (1 << 20)
#define HWCAP_SHA512 (1 << 21)
#define HWCAP_SVE (1 << 22)
#endif /* _UAPI__ASM_HWCAP_H */
......@@ -23,6 +23,7 @@
#include <linux/types.h>
#include <asm/hwcap.h>
#include <asm/sigcontext.h>
/*
......@@ -47,7 +48,6 @@
#define PSR_D_BIT 0x00000200
#define PSR_PAN_BIT 0x00400000
#define PSR_UAO_BIT 0x00800000
#define PSR_Q_BIT 0x08000000
#define PSR_V_BIT 0x10000000
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
......@@ -64,6 +64,8 @@
#ifndef __ASSEMBLY__
#include <linux/prctl.h>
/*
* User structures for general purpose, floating point and debug registers.
*/
......@@ -91,6 +93,141 @@ struct user_hwdebug_state {
} dbg_regs[16];
};
/* SVE/FP/SIMD state (NT_ARM_SVE) */
struct user_sve_header {
__u32 size; /* total meaningful regset content in bytes */
__u32 max_size; /* maxmium possible size for this thread */
__u16 vl; /* current vector length */
__u16 max_vl; /* maximum possible vector length */
__u16 flags;
__u16 __reserved;
};
/* Definitions for user_sve_header.flags: */
#define SVE_PT_REGS_MASK (1 << 0)
#define SVE_PT_REGS_FPSIMD 0
#define SVE_PT_REGS_SVE SVE_PT_REGS_MASK
/*
* Common SVE_PT_* flags:
* These must be kept in sync with prctl interface in <linux/ptrace.h>
*/
#define SVE_PT_VL_INHERIT (PR_SVE_VL_INHERIT >> 16)
#define SVE_PT_VL_ONEXEC (PR_SVE_SET_VL_ONEXEC >> 16)
/*
* The remainder of the SVE state follows struct user_sve_header. The
* total size of the SVE state (including header) depends on the
* metadata in the header: SVE_PT_SIZE(vq, flags) gives the total size
* of the state in bytes, including the header.
*
* Refer to <asm/sigcontext.h> for details of how to pass the correct
* "vq" argument to these macros.
*/
/* Offset from the start of struct user_sve_header to the register data */
#define SVE_PT_REGS_OFFSET \
((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \
/ SVE_VQ_BYTES * SVE_VQ_BYTES)
/*
* The register data content and layout depends on the value of the
* flags field.
*/
/*
* (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case:
*
* The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type
* struct user_fpsimd_state. Additional data might be appended in the
* future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size.
* SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than
* sizeof(struct user_fpsimd_state).
*/
#define SVE_PT_FPSIMD_OFFSET SVE_PT_REGS_OFFSET
#define SVE_PT_FPSIMD_SIZE(vq, flags) (sizeof(struct user_fpsimd_state))
/*
* (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case:
*
* The payload starts at offset SVE_PT_SVE_OFFSET, and is of size
* SVE_PT_SVE_SIZE(vq, flags).
*
* Additional macros describe the contents and layout of the payload.
* For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to
* the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is
* the size in bytes:
*
* x type description
* - ---- -----------
* ZREGS \
* ZREG |
* PREGS | refer to <asm/sigcontext.h>
* PREG |
* FFR /
*
* FPSR uint32_t FPSR
* FPCR uint32_t FPCR
*
* Additional data might be appended in the future.
*/
#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq)
#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq)
#define SVE_PT_SVE_FFR_SIZE(vq) SVE_SIG_FFR_SIZE(vq)
#define SVE_PT_SVE_FPSR_SIZE sizeof(__u32)
#define SVE_PT_SVE_FPCR_SIZE sizeof(__u32)
#define __SVE_SIG_TO_PT(offset) \
((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET)
#define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET
#define SVE_PT_SVE_ZREGS_OFFSET \
__SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET)
#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \
__SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n))
#define SVE_PT_SVE_ZREGS_SIZE(vq) \
(SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET)
#define SVE_PT_SVE_PREGS_OFFSET(vq) \
__SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq))
#define SVE_PT_SVE_PREG_OFFSET(vq, n) \
__SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n))
#define SVE_PT_SVE_PREGS_SIZE(vq) \
(SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \
SVE_PT_SVE_PREGS_OFFSET(vq))
#define SVE_PT_SVE_FFR_OFFSET(vq) \
__SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq))
#define SVE_PT_SVE_FPSR_OFFSET(vq) \
((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \
(SVE_VQ_BYTES - 1)) \
/ SVE_VQ_BYTES * SVE_VQ_BYTES)
#define SVE_PT_SVE_FPCR_OFFSET(vq) \
(SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE)
/*
* Any future extension appended after FPCR must be aligned to the next
* 128-bit boundary.
*/
#define SVE_PT_SVE_SIZE(vq, flags) \
((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE \
- SVE_PT_SVE_OFFSET + (SVE_VQ_BYTES - 1)) \
/ SVE_VQ_BYTES * SVE_VQ_BYTES)
#define SVE_PT_SIZE(vq, flags) \
(((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \
SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \
: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags))
#endif /* __ASSEMBLY__ */
#endif /* _UAPI__ASM_PTRACE_H */
......@@ -17,6 +17,8 @@
#ifndef _UAPI__ASM_SIGCONTEXT_H
#define _UAPI__ASM_SIGCONTEXT_H
#ifndef __ASSEMBLY__
#include <linux/types.h>
/*
......@@ -42,10 +44,11 @@ struct sigcontext {
*
* 0x210 fpsimd_context
* 0x10 esr_context
* 0x8a0 sve_context (vl <= 64) (optional)
* 0x20 extra_context (optional)
* 0x10 terminator (null _aarch64_ctx)
*
* 0xdb0 (reserved for future allocation)
* 0x510 (reserved for future allocation)
*
* New records that can exceed this space need to be opt-in for userspace, so
* that an expanded signal frame is not generated unexpectedly. The mechanism
......@@ -117,4 +120,119 @@ struct extra_context {
__u32 __reserved[3];
};
#define SVE_MAGIC 0x53564501
struct sve_context {
struct _aarch64_ctx head;
__u16 vl;
__u16 __reserved[3];
};
#endif /* !__ASSEMBLY__ */
/*
* The SVE architecture leaves space for future expansion of the
* vector length beyond its initial architectural limit of 2048 bits
* (16 quadwords).
*
* See linux/Documentation/arm64/sve.txt for a description of the VL/VQ
* terminology.
*/
#define SVE_VQ_BYTES 16 /* number of bytes per quadword */
#define SVE_VQ_MIN 1
#define SVE_VQ_MAX 512
#define SVE_VL_MIN (SVE_VQ_MIN * SVE_VQ_BYTES)
#define SVE_VL_MAX (SVE_VQ_MAX * SVE_VQ_BYTES)
#define SVE_NUM_ZREGS 32
#define SVE_NUM_PREGS 16
#define sve_vl_valid(vl) \
((vl) % SVE_VQ_BYTES == 0 && (vl) >= SVE_VL_MIN && (vl) <= SVE_VL_MAX)
#define sve_vq_from_vl(vl) ((vl) / SVE_VQ_BYTES)
#define sve_vl_from_vq(vq) ((vq) * SVE_VQ_BYTES)
/*
* If the SVE registers are currently live for the thread at signal delivery,
* sve_context.head.size >=
* SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl))
* and the register data may be accessed using the SVE_SIG_*() macros.
*
* If sve_context.head.size <
* SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)),
* the SVE registers were not live for the thread and no register data
* is included: in this case, the SVE_SIG_*() macros should not be
* used except for this check.
*
* The same convention applies when returning from a signal: a caller
* will need to remove or resize the sve_context block if it wants to
* make the SVE registers live when they were previously non-live or
* vice-versa. This may require the the caller to allocate fresh
* memory and/or move other context blocks in the signal frame.
*
* Changing the vector length during signal return is not permitted:
* sve_context.vl must equal the thread's current vector length when
* doing a sigreturn.
*
*
* Note: for all these macros, the "vq" argument denotes the SVE
* vector length in quadwords (i.e., units of 128 bits).
*
* The correct way to obtain vq is to use sve_vq_from_vl(vl). The
* result is valid if and only if sve_vl_valid(vl) is true. This is
* guaranteed for a struct sve_context written by the kernel.
*
*
* Additional macros describe the contents and layout of the payload.
* For each, SVE_SIG_x_OFFSET(args) is the start offset relative to
* the start of struct sve_context, and SVE_SIG_x_SIZE(args) is the
* size in bytes:
*
* x type description
* - ---- -----------
* REGS the entire SVE context
*
* ZREGS __uint128_t[SVE_NUM_ZREGS][vq] all Z-registers
* ZREG __uint128_t[vq] individual Z-register Zn
*
* PREGS uint16_t[SVE_NUM_PREGS][vq] all P-registers
* PREG uint16_t[vq] individual P-register Pn
*
* FFR uint16_t[vq] first-fault status register
*
* Additional data might be appended in the future.
*/
#define SVE_SIG_ZREG_SIZE(vq) ((__u32)(vq) * SVE_VQ_BYTES)
#define SVE_SIG_PREG_SIZE(vq) ((__u32)(vq) * (SVE_VQ_BYTES / 8))
#define SVE_SIG_FFR_SIZE(vq) SVE_SIG_PREG_SIZE(vq)
#define SVE_SIG_REGS_OFFSET \
((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \
/ SVE_VQ_BYTES * SVE_VQ_BYTES)
#define SVE_SIG_ZREGS_OFFSET SVE_SIG_REGS_OFFSET
#define SVE_SIG_ZREG_OFFSET(vq, n) \
(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREG_SIZE(vq) * (n))
#define SVE_SIG_ZREGS_SIZE(vq) \
(SVE_SIG_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_SIG_ZREGS_OFFSET)
#define SVE_SIG_PREGS_OFFSET(vq) \
(SVE_SIG_ZREGS_OFFSET + SVE_SIG_ZREGS_SIZE(vq))
#define SVE_SIG_PREG_OFFSET(vq, n) \
(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREG_SIZE(vq) * (n))
#define SVE_SIG_PREGS_SIZE(vq) \
(SVE_SIG_PREG_OFFSET(vq, SVE_NUM_PREGS) - SVE_SIG_PREGS_OFFSET(vq))
#define SVE_SIG_FFR_OFFSET(vq) \
(SVE_SIG_PREGS_OFFSET(vq) + SVE_SIG_PREGS_SIZE(vq))
#define SVE_SIG_REGS_SIZE(vq) \
(SVE_SIG_FFR_OFFSET(vq) + SVE_SIG_FFR_SIZE(vq) - SVE_SIG_REGS_OFFSET)
#define SVE_SIG_CONTEXT_SIZE(vq) (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
#endif /* _UAPI__ASM_SIGCONTEXT_H */
......@@ -11,8 +11,6 @@ CFLAGS_REMOVE_ftrace.o = -pg
CFLAGS_REMOVE_insn.o = -pg
CFLAGS_REMOVE_return_address.o = -pg
CFLAGS_setup.o = -DUTS_MACHINE='"$(UTS_MACHINE)"'
# Object file lists.
arm64-obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
entry-fpsimd.o process.o ptrace.o setup.o signal.o \
......
......@@ -228,15 +228,7 @@ static int emulation_proc_handler(struct ctl_table *table, int write,
return ret;
}
static struct ctl_table ctl_abi[] = {
{
.procname = "abi",
.mode = 0555,
},
{ }
};
static void __init register_insn_emulation_sysctl(struct ctl_table *table)
static void __init register_insn_emulation_sysctl(void)
{
unsigned long flags;
int i = 0;
......@@ -262,8 +254,7 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table)
}
raw_spin_unlock_irqrestore(&insn_emulation_lock, flags);
table->child = insns_sysctl;
register_sysctl_table(table);
register_sysctl("abi", insns_sysctl);
}
/*
......@@ -431,7 +422,7 @@ static int swp_handler(struct pt_regs *regs, u32 instr)
pr_warn_ratelimited("\"%s\" (%ld) uses obsolete SWP{B} instruction at 0x%llx\n",
current->comm, (unsigned long)current->pid, regs->pc);
regs->pc += 4;
arm64_skip_faulting_instruction(regs, 4);
return 0;
fault:
......@@ -512,7 +503,7 @@ static int cp15barrier_handler(struct pt_regs *regs, u32 instr)
pr_warn_ratelimited("\"%s\" (%ld) uses deprecated CP15 Barrier instruction at 0x%llx\n",
current->comm, (unsigned long)current->pid, regs->pc);
regs->pc += 4;
arm64_skip_faulting_instruction(regs, 4);
return 0;
}
......@@ -586,14 +577,14 @@ static int compat_setend_handler(struct pt_regs *regs, u32 big_endian)
static int a32_setend_handler(struct pt_regs *regs, u32 instr)
{
int rc = compat_setend_handler(regs, (instr >> 9) & 1);
regs->pc += 4;
arm64_skip_faulting_instruction(regs, 4);
return rc;
}
static int t16_setend_handler(struct pt_regs *regs, u32 instr)
{
int rc = compat_setend_handler(regs, (instr >> 3) & 1);
regs->pc += 2;
arm64_skip_faulting_instruction(regs, 2);
return rc;
}
......@@ -644,7 +635,7 @@ static int __init armv8_deprecated_init(void)
cpuhp_setup_state_nocalls(CPUHP_AP_ARM64_ISNDEP_STARTING,
"arm64/isndep:starting",
run_all_insn_set_hw_mode, NULL);
register_insn_emulation_sysctl(ctl_abi);
register_insn_emulation_sysctl();
return 0;
}
......
This diff is collapsed.
......@@ -19,6 +19,7 @@
#include <asm/cpu.h>
#include <asm/cputype.h>
#include <asm/cpufeature.h>
#include <asm/fpsimd.h>
#include <linux/bitops.h>
#include <linux/bug.h>
......@@ -69,6 +70,12 @@ static const char *const hwcap_str[] = {
"fcma",
"lrcpc",
"dcpop",
"sha3",
"sm3",
"sm4",
"asimddp",
"sha512",
"sve",
NULL
};
......@@ -326,6 +333,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_id_aa64mmfr2 = read_cpuid(ID_AA64MMFR2_EL1);
info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
/* Update the 32bit ID registers only if AArch32 is implemented */
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
......@@ -348,6 +356,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
}
if (IS_ENABLED(CONFIG_ARM64_SVE) &&
id_aa64pfr0_sve(info->reg_id_aa64pfr0))
info->reg_zcr = read_zcr_features();
cpuinfo_detect_icache_policy(info);
}
......
......@@ -30,6 +30,7 @@
#include <asm/cpufeature.h>
#include <asm/cputype.h>
#include <asm/daifflags.h>
#include <asm/debug-monitors.h>
#include <asm/system_misc.h>
......@@ -46,9 +47,9 @@ u8 debug_monitors_arch(void)
static void mdscr_write(u32 mdscr)
{
unsigned long flags;
local_dbg_save(flags);
flags = local_daif_save();
write_sysreg(mdscr, mdscr_el1);
local_dbg_restore(flags);
local_daif_restore(flags);
}
NOKPROBE_SYMBOL(mdscr_write);
......
......@@ -41,3 +41,20 @@ ENTRY(fpsimd_load_state)
fpsimd_restore x0, 8
ret
ENDPROC(fpsimd_load_state)
#ifdef CONFIG_ARM64_SVE
ENTRY(sve_save_state)
sve_save 0, x1, 2
ret
ENDPROC(sve_save_state)
ENTRY(sve_load_state)
sve_load 0, x1, x2, 3
ret
ENDPROC(sve_load_state)
ENTRY(sve_get_vl)
_sve_rdvl 0, 1
ret
ENDPROC(sve_get_vl)
#endif /* CONFIG_ARM64_SVE */
......@@ -108,13 +108,8 @@ ENTRY(_mcount)
mcount_get_lr x1 // function's lr (= parent's pc)
blr x2 // (*ftrace_trace_function)(pc, lr);
#ifndef CONFIG_FUNCTION_GRAPH_TRACER
skip_ftrace_call: // return;
mcount_exit // }
#else
mcount_exit // return;
// }
skip_ftrace_call:
skip_ftrace_call: // }
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
ldr_l x2, ftrace_graph_return
cmp x0, x2 // if ((ftrace_graph_return
b.ne ftrace_graph_caller // != ftrace_stub)
......@@ -123,9 +118,8 @@ skip_ftrace_call:
adr_l x0, ftrace_graph_entry_stub // != ftrace_graph_entry_stub))
cmp x0, x2
b.ne ftrace_graph_caller // ftrace_graph_caller();
mcount_exit
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
mcount_exit
ENDPROC(_mcount)
#else /* CONFIG_DYNAMIC_FTRACE */
......
......@@ -28,7 +28,7 @@
#include <asm/errno.h>
#include <asm/esr.h>
#include <asm/irq.h>
#include <asm/memory.h>
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/thread_info.h>
#include <asm/asm-uaccess.h>
......@@ -221,6 +221,8 @@ alternative_else_nop_endif
.macro kernel_exit, el
.if \el != 0
disable_daif
/* Restore the task's original addr_limit. */
ldr x20, [sp, #S_ORIG_ADDR_LIMIT]
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
......@@ -373,18 +375,18 @@ ENTRY(vectors)
kernel_ventry el1_sync // Synchronous EL1h
kernel_ventry el1_irq // IRQ EL1h
kernel_ventry el1_fiq_invalid // FIQ EL1h
kernel_ventry el1_error_invalid // Error EL1h
kernel_ventry el1_error // Error EL1h
kernel_ventry el0_sync // Synchronous 64-bit EL0
kernel_ventry el0_irq // IRQ 64-bit EL0
kernel_ventry el0_fiq_invalid // FIQ 64-bit EL0
kernel_ventry el0_error_invalid // Error 64-bit EL0
kernel_ventry el0_error // Error 64-bit EL0
#ifdef CONFIG_COMPAT
kernel_ventry el0_sync_compat // Synchronous 32-bit EL0
kernel_ventry el0_irq_compat // IRQ 32-bit EL0
kernel_ventry el0_fiq_invalid_compat // FIQ 32-bit EL0
kernel_ventry el0_error_invalid_compat // Error 32-bit EL0
kernel_ventry el0_error_compat // Error 32-bit EL0
#else
kernel_ventry el0_sync_invalid // Synchronous 32-bit EL0
kernel_ventry el0_irq_invalid // IRQ 32-bit EL0
......@@ -453,10 +455,6 @@ ENDPROC(el0_error_invalid)
el0_fiq_invalid_compat:
inv_entry 0, BAD_FIQ, 32
ENDPROC(el0_fiq_invalid_compat)
el0_error_invalid_compat:
inv_entry 0, BAD_ERROR, 32
ENDPROC(el0_error_invalid_compat)
#endif
el1_sync_invalid:
......@@ -508,24 +506,18 @@ el1_da:
* Data abort handling
*/
mrs x3, far_el1
enable_dbg
// re-enable interrupts if they were enabled in the aborted context
tbnz x23, #7, 1f // PSR_I_BIT
enable_irq
1:
inherit_daif pstate=x23, tmp=x2
clear_address_tag x0, x3
mov x2, sp // struct pt_regs
bl do_mem_abort
// disable interrupts before pulling preserved data off the stack
disable_irq
kernel_exit 1
el1_sp_pc:
/*
* Stack or PC alignment exception handling
*/
mrs x0, far_el1
enable_dbg
inherit_daif pstate=x23, tmp=x2
mov x2, sp
bl do_sp_pc_abort
ASM_BUG()
......@@ -533,7 +525,7 @@ el1_undef:
/*
* Undefined instruction
*/
enable_dbg
inherit_daif pstate=x23, tmp=x2
mov x0, sp
bl do_undefinstr
ASM_BUG()
......@@ -550,7 +542,7 @@ el1_dbg:
kernel_exit 1
el1_inv:
// TODO: add support for undefined instructions in kernel mode
enable_dbg
inherit_daif pstate=x23, tmp=x2
mov x0, sp
mov x2, x1
mov x1, #BAD_SYNC
......@@ -561,7 +553,7 @@ ENDPROC(el1_sync)
.align 6
el1_irq:
kernel_entry 1
enable_dbg
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
......@@ -607,6 +599,8 @@ el0_sync:
b.eq el0_ia
cmp x24, #ESR_ELx_EC_FP_ASIMD // FP/ASIMD access
b.eq el0_fpsimd_acc
cmp x24, #ESR_ELx_EC_SVE // SVE access
b.eq el0_sve_acc
cmp x24, #ESR_ELx_EC_FP_EXC64 // FP/ASIMD exception
b.eq el0_fpsimd_exc
cmp x24, #ESR_ELx_EC_SYS64 // configurable trap
......@@ -658,6 +652,7 @@ el0_svc_compat:
/*
* AArch32 syscall handling
*/
ldr x16, [tsk, #TSK_TI_FLAGS] // load thread flags
adrp stbl, compat_sys_call_table // load compat syscall table pointer
mov wscno, w7 // syscall number in w7 (r7)
mov wsc_nr, #__NR_compat_syscalls
......@@ -667,6 +662,10 @@ el0_svc_compat:
el0_irq_compat:
kernel_entry 0, 32
b el0_irq_naked
el0_error_compat:
kernel_entry 0, 32
b el0_error_naked
#endif
el0_da:
......@@ -674,8 +673,7 @@ el0_da:
* Data abort handling
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
enable_dbg_and_irq
enable_daif
ct_user_exit
clear_address_tag x0, x26
mov x1, x25
......@@ -687,8 +685,7 @@ el0_ia:
* Instruction abort handling
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
enable_dbg_and_irq
enable_daif
ct_user_exit
mov x0, x26
mov x1, x25
......@@ -699,17 +696,27 @@ el0_fpsimd_acc:
/*
* Floating Point or Advanced SIMD access
*/
enable_dbg
enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
bl do_fpsimd_acc
b ret_to_user
el0_sve_acc:
/*
* Scalable Vector Extension access
*/
enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
bl do_sve_acc
b ret_to_user
el0_fpsimd_exc:
/*
* Floating Point or Advanced SIMD exception
* Floating Point, Advanced SIMD or SVE exception
*/
enable_dbg
enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
......@@ -720,8 +727,7 @@ el0_sp_pc:
* Stack or PC alignment exception handling
*/
mrs x26, far_el1
// enable interrupts before calling the main handler
enable_dbg_and_irq
enable_daif
ct_user_exit
mov x0, x26
mov x1, x25
......@@ -732,8 +738,7 @@ el0_undef:
/*
* Undefined instruction
*/
// enable interrupts before calling the main handler
enable_dbg_and_irq
enable_daif
ct_user_exit
mov x0, sp
bl do_undefinstr
......@@ -742,7 +747,7 @@ el0_sys:
/*
* System instructions, for trapped cache maintenance instructions
*/
enable_dbg_and_irq
enable_daif
ct_user_exit
mov x0, x25
mov x1, sp
......@@ -757,11 +762,11 @@ el0_dbg:
mov x1, x25
mov x2, sp
bl do_debug_exception
enable_dbg
enable_daif
ct_user_exit
b ret_to_user
el0_inv:
enable_dbg
enable_daif
ct_user_exit
mov x0, sp
mov x1, #BAD_SYNC
......@@ -774,7 +779,7 @@ ENDPROC(el0_sync)
el0_irq:
kernel_entry 0
el0_irq_naked:
enable_dbg
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
......@@ -788,12 +793,34 @@ el0_irq_naked:
b ret_to_user
ENDPROC(el0_irq)
el1_error:
kernel_entry 1
mrs x1, esr_el1
enable_dbg
mov x0, sp
bl do_serror
kernel_exit 1
ENDPROC(el1_error)
el0_error:
kernel_entry 0
el0_error_naked:
mrs x1, esr_el1
enable_dbg
mov x0, sp
bl do_serror
enable_daif
ct_user_exit
b ret_to_user
ENDPROC(el0_error)
/*
* This is the fast syscall return path. We do as little as possible here,
* and this includes saving x0 back into the kernel stack.
*/
ret_fast_syscall:
disable_irq // disable interrupts
disable_daif
str x0, [sp, #S_X0] // returned x0
ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for syscall tracing
and x2, x1, #_TIF_SYSCALL_WORK
......@@ -803,7 +830,7 @@ ret_fast_syscall:
enable_step_tsk x1, x2
kernel_exit 0
ret_fast_syscall_trace:
enable_irq // enable interrupts
enable_daif
b __sys_trace_return_skipped // we already saved x0
/*
......@@ -821,7 +848,7 @@ work_pending:
* "slow" syscall return path.
*/
ret_to_user:
disable_irq // disable interrupts
disable_daif
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
......@@ -835,16 +862,37 @@ ENDPROC(ret_to_user)
*/
.align 6
el0_svc:
ldr x16, [tsk, #TSK_TI_FLAGS] // load thread flags
adrp stbl, sys_call_table // load syscall table pointer
mov wscno, w8 // syscall number in w8
mov wsc_nr, #__NR_syscalls
#ifdef CONFIG_ARM64_SVE
alternative_if_not ARM64_SVE
b el0_svc_naked
alternative_else_nop_endif
tbz x16, #TIF_SVE, el0_svc_naked // Skip unless TIF_SVE set:
bic x16, x16, #_TIF_SVE // discard SVE state
str x16, [tsk, #TSK_TI_FLAGS]
/*
* task_fpsimd_load() won't be called to update CPACR_EL1 in
* ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
* happens if a context switch or kernel_neon_begin() or context
* modification (sigreturn, ptrace) intervenes.
* So, ensure that CPACR_EL1 is already correct for the fast-path case:
*/
mrs x9, cpacr_el1
bic x9, x9, #CPACR_EL1_ZEN_EL0EN // disable SVE for el0
msr cpacr_el1, x9 // synchronised by eret to el0
#endif
el0_svc_naked: // compat entry point
stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
enable_dbg_and_irq
enable_daif
ct_user_exit 1
ldr x16, [tsk, #TSK_TI_FLAGS] // check for syscall hooks
tst x16, #_TIF_SYSCALL_WORK
tst x16, #_TIF_SYSCALL_WORK // check for syscall hooks
b.ne __sys_trace
cmp wscno, wsc_nr // check upper syscall limit
b.hs ni_sys
......
This diff is collapsed.
......@@ -480,14 +480,21 @@ set_hcr:
/* Statistical profiling */
ubfx x0, x1, #32, #4 // Check ID_AA64DFR0_EL1 PMSVer
cbz x0, 6f // Skip if SPE not present
cbnz x2, 5f // VHE?
cbz x0, 7f // Skip if SPE not present
cbnz x2, 6f // VHE?
mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
cbnz x4, 5f // then permit sampling of physical
mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
1 << SYS_PMSCR_EL2_PA_SHIFT)
msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
5:
mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x3, x3, x1 // If we don't have VHE, then
b 6f // use EL1&0 translation.
5: // For VHE, use EL2 translation
b 7f // use EL1&0 translation.
6: // For VHE, use EL2 translation
orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
6:
7:
msr mdcr_el2, x3 // Configure debug traps
/* Stage-2 translation */
......@@ -517,8 +524,19 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
mov x0, #0x33ff
msr cptr_el2, x0 // Disable copro. traps to EL2
/* SVE register access */
mrs x1, id_aa64pfr0_el1
ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
cbz x1, 7f
bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
msr cptr_el2, x0 // Disable copro. traps to EL2
isb
mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
msr_s SYS_ZCR_EL2, x1 // length for EL1.
/* Hypervisor stub */
adr_l x0, __hyp_stub_vectors
7: adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0
/* spsr */
......
......@@ -27,6 +27,7 @@
#include <asm/barrier.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/daifflags.h>
#include <asm/irqflags.h>
#include <asm/kexec.h>
#include <asm/memory.h>
......@@ -285,7 +286,7 @@ int swsusp_arch_suspend(void)
return -EBUSY;
}
local_dbg_save(flags);
flags = local_daif_save();
if (__cpu_suspend_enter(&state)) {
/* make the crash dump kernel image visible/saveable */
......@@ -315,7 +316,7 @@ int swsusp_arch_suspend(void)
__cpu_suspend_exit();
}
local_dbg_restore(flags);
local_daif_restore(flags);
return ret;
}
......
......@@ -25,8 +25,7 @@
*/
void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
{
while (count && (!IS_ALIGNED((unsigned long)from, 8) ||
!IS_ALIGNED((unsigned long)to, 8))) {
while (count && !IS_ALIGNED((unsigned long)from, 8)) {
*(u8 *)to = __raw_readb(from);
from++;
to++;
......@@ -54,23 +53,22 @@ EXPORT_SYMBOL(__memcpy_fromio);
*/
void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
{
while (count && (!IS_ALIGNED((unsigned long)to, 8) ||
!IS_ALIGNED((unsigned long)from, 8))) {
__raw_writeb(*(volatile u8 *)from, to);
while (count && !IS_ALIGNED((unsigned long)to, 8)) {
__raw_writeb(*(u8 *)from, to);
from++;
to++;
count--;
}
while (count >= 8) {
__raw_writeq(*(volatile u64 *)from, to);
__raw_writeq(*(u64 *)from, to);
from += 8;
to += 8;
count -= 8;
}
while (count) {
__raw_writeb(*(volatile u8 *)from, to);
__raw_writeb(*(u8 *)from, to);
from++;
to++;
count--;
......
......@@ -18,6 +18,7 @@
#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
#include <asm/daifflags.h>
#include <asm/memory.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
......@@ -195,8 +196,7 @@ void machine_kexec(struct kimage *kimage)
pr_info("Bye!\n");
/* Disable all DAIF exceptions. */
asm volatile ("msr daifset, #0xf" : : : "memory");
local_daif_mask();
/*
* cpu_soft_restart will shutdown the MMU, disable data caches, then
......
......@@ -49,6 +49,7 @@
#include <linux/notifier.h>
#include <trace/events/power.h>
#include <linux/percpu.h>
#include <linux/thread_info.h>
#include <asm/alternative.h>
#include <asm/compat.h>
......@@ -170,6 +171,39 @@ void machine_restart(char *cmd)
while (1);
}
static void print_pstate(struct pt_regs *regs)
{
u64 pstate = regs->pstate;
if (compat_user_mode(regs)) {
printk("pstate: %08llx (%c%c%c%c %c %s %s %c%c%c)\n",
pstate,
pstate & COMPAT_PSR_N_BIT ? 'N' : 'n',
pstate & COMPAT_PSR_Z_BIT ? 'Z' : 'z',
pstate & COMPAT_PSR_C_BIT ? 'C' : 'c',
pstate & COMPAT_PSR_V_BIT ? 'V' : 'v',
pstate & COMPAT_PSR_Q_BIT ? 'Q' : 'q',
pstate & COMPAT_PSR_T_BIT ? "T32" : "A32",
pstate & COMPAT_PSR_E_BIT ? "BE" : "LE",
pstate & COMPAT_PSR_A_BIT ? 'A' : 'a',
pstate & COMPAT_PSR_I_BIT ? 'I' : 'i',
pstate & COMPAT_PSR_F_BIT ? 'F' : 'f');
} else {
printk("pstate: %08llx (%c%c%c%c %c%c%c%c %cPAN %cUAO)\n",
pstate,
pstate & PSR_N_BIT ? 'N' : 'n',
pstate & PSR_Z_BIT ? 'Z' : 'z',
pstate & PSR_C_BIT ? 'C' : 'c',
pstate & PSR_V_BIT ? 'V' : 'v',
pstate & PSR_D_BIT ? 'D' : 'd',
pstate & PSR_A_BIT ? 'A' : 'a',
pstate & PSR_I_BIT ? 'I' : 'i',
pstate & PSR_F_BIT ? 'F' : 'f',
pstate & PSR_PAN_BIT ? '+' : '-',
pstate & PSR_UAO_BIT ? '+' : '-');
}
}
void __show_regs(struct pt_regs *regs)
{
int i, top_reg;
......@@ -186,10 +220,9 @@ void __show_regs(struct pt_regs *regs)
}
show_regs_print_info(KERN_DEFAULT);
print_symbol("PC is at %s\n", instruction_pointer(regs));
print_symbol("LR is at %s\n", lr);
printk("pc : [<%016llx>] lr : [<%016llx>] pstate: %08llx\n",
regs->pc, lr, regs->pstate);
print_pstate(regs);
print_symbol("pc : %s\n", regs->pc);
print_symbol("lr : %s\n", lr);
printk("sp : %016llx\n", sp);
i = top_reg;
......@@ -241,11 +274,27 @@ void release_thread(struct task_struct *dead_task)
{
}
void arch_release_task_struct(struct task_struct *tsk)
{
fpsimd_release_task(tsk);
}
/*
* src and dst may temporarily have aliased sve_state after task_struct
* is copied. We cannot fix this properly here, because src may have
* live SVE state and dst's thread_info may not exist yet, so tweaking
* either src's or dst's TIF_SVE is not safe.
*
* The unaliasing is done in copy_thread() instead. This works because
* dst is not schedulable or traceable until both of these functions
* have been called.
*/
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
if (current->mm)
fpsimd_preserve_current_state();
*dst = *src;
return 0;
}
......@@ -258,6 +307,13 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
/*
* Unalias p->thread.sve_state (if any) from the parent task
* and disable discard SVE state for p:
*/
clear_tsk_thread_flag(p, TIF_SVE);
p->thread.sve_state = NULL;
if (likely(!(p->flags & PF_KTHREAD))) {
*childregs = *current_pt_regs();
childregs->regs[0] = 0;
......
......@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/init.h>
#include <linux/signal.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
......@@ -40,6 +41,7 @@
#include <linux/elf.h>
#include <asm/compat.h>
#include <asm/cpufeature.h>
#include <asm/debug-monitors.h>
#include <asm/pgtable.h>
#include <asm/stacktrace.h>
......@@ -618,17 +620,56 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
/*
* TODO: update fp accessors for lazy context switching (sync/flush hwstate)
*/
static int fpr_get(struct task_struct *target, const struct user_regset *regset,
static int __fpr_get(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
void *kbuf, void __user *ubuf, unsigned int start_pos)
{
struct user_fpsimd_state *uregs;
sve_sync_to_fpsimd(target);
uregs = &target->thread.fpsimd_state.user_fpsimd;
return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs,
start_pos, start_pos + sizeof(*uregs));
}
static int fpr_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
if (target == current)
fpsimd_preserve_current_state();
return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0);
}
static int __fpr_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf,
unsigned int start_pos)
{
int ret;
struct user_fpsimd_state newstate;
/*
* Ensure target->thread.fpsimd_state is up to date, so that a
* short copyin can't resurrect stale data.
*/
sve_sync_to_fpsimd(target);
newstate = target->thread.fpsimd_state.user_fpsimd;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate,
start_pos, start_pos + sizeof(newstate));
if (ret)
return ret;
target->thread.fpsimd_state.user_fpsimd = newstate;
return ret;
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
......@@ -636,15 +677,14 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset,
const void *kbuf, const void __user *ubuf)
{
int ret;
struct user_fpsimd_state newstate =
target->thread.fpsimd_state.user_fpsimd;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1);
ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0);
if (ret)
return ret;
target->thread.fpsimd_state.user_fpsimd = newstate;
sve_sync_from_fpsimd_zeropad(target);
fpsimd_flush_task_state(target);
return ret;
}
......@@ -702,6 +742,215 @@ static int system_call_set(struct task_struct *target,
return ret;
}
#ifdef CONFIG_ARM64_SVE
static void sve_init_header_from_task(struct user_sve_header *header,
struct task_struct *target)
{
unsigned int vq;
memset(header, 0, sizeof(*header));
header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
header->flags |= SVE_PT_VL_INHERIT;
header->vl = target->thread.sve_vl;
vq = sve_vq_from_vl(header->vl);
header->max_vl = sve_max_vl;
if (WARN_ON(!sve_vl_valid(sve_max_vl)))
header->max_vl = header->vl;
header->size = SVE_PT_SIZE(vq, header->flags);
header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
SVE_PT_REGS_SVE);
}
static unsigned int sve_size_from_header(struct user_sve_header const *header)
{
return ALIGN(header->size, SVE_VQ_BYTES);
}
static unsigned int sve_get_size(struct task_struct *target,
const struct user_regset *regset)
{
struct user_sve_header header;
if (!system_supports_sve())
return 0;
sve_init_header_from_task(&header, target);
return sve_size_from_header(&header);
}
static int sve_get(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
int ret;
struct user_sve_header header;
unsigned int vq;
unsigned long start, end;
if (!system_supports_sve())
return -EINVAL;
/* Header */
sve_init_header_from_task(&header, target);
vq = sve_vq_from_vl(header.vl);
ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header,
0, sizeof(header));
if (ret)
return ret;
if (target == current)
fpsimd_preserve_current_state();
/* Registers: FPSIMD-only case */
BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD)
return __fpr_get(target, regset, pos, count, kbuf, ubuf,
SVE_PT_FPSIMD_OFFSET);
/* Otherwise: full SVE case */
BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
start = SVE_PT_SVE_OFFSET;
end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
target->thread.sve_state,
start, end);
if (ret)
return ret;
start = end;
end = SVE_PT_SVE_FPSR_OFFSET(vq);
ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
start, end);
if (ret)
return ret;
/*
* Copy fpsr, and fpcr which must follow contiguously in
* struct fpsimd_state:
*/
start = end;
end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpsimd_state.fpsr,
start, end);
if (ret)
return ret;
start = end;
end = sve_size_from_header(&header);
return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
start, end);
}
static int sve_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
int ret;
struct user_sve_header header;
unsigned int vq;
unsigned long start, end;
if (!system_supports_sve())
return -EINVAL;
/* Header */
if (count < sizeof(header))
return -EINVAL;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
0, sizeof(header));
if (ret)
goto out;
/*
* Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by
* sve_set_vector_length(), which will also validate them for us:
*/
ret = sve_set_vector_length(target, header.vl,
((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
if (ret)
goto out;
/* Actual VL set may be less than the user asked for: */
vq = sve_vq_from_vl(target->thread.sve_vl);
/* Registers: FPSIMD-only case */
BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) {
ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
SVE_PT_FPSIMD_OFFSET);
clear_tsk_thread_flag(target, TIF_SVE);
goto out;
}
/* Otherwise: full SVE case */
/*
* If setting a different VL from the requested VL and there is
* register data, the data layout will be wrong: don't even
* try to set the registers in this case.
*/
if (count && vq != sve_vq_from_vl(header.vl)) {
ret = -EIO;
goto out;
}
sve_alloc(target);
/*
* Ensure target->thread.sve_state is up to date with target's
* FPSIMD regs, so that a short copyin leaves trailing registers
* unmodified.
*/
fpsimd_sync_to_sve(target);
set_tsk_thread_flag(target, TIF_SVE);
BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
start = SVE_PT_SVE_OFFSET;
end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
target->thread.sve_state,
start, end);
if (ret)
goto out;
start = end;
end = SVE_PT_SVE_FPSR_OFFSET(vq);
ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf,
start, end);
if (ret)
goto out;
/*
* Copy fpsr, and fpcr which must follow contiguously in
* struct fpsimd_state:
*/
start = end;
end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpsimd_state.fpsr,
start, end);
out:
fpsimd_flush_task_state(target);
return ret;
}
#endif /* CONFIG_ARM64_SVE */
enum aarch64_regset {
REGSET_GPR,
REGSET_FPR,
......@@ -711,6 +960,9 @@ enum aarch64_regset {
REGSET_HW_WATCH,
#endif
REGSET_SYSTEM_CALL,
#ifdef CONFIG_ARM64_SVE
REGSET_SVE,
#endif
};
static const struct user_regset aarch64_regsets[] = {
......@@ -768,6 +1020,18 @@ static const struct user_regset aarch64_regsets[] = {
.get = system_call_get,
.set = system_call_set,
},
#ifdef CONFIG_ARM64_SVE
[REGSET_SVE] = { /* Scalable Vector Extension */
.core_note_type = NT_ARM_SVE,
.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
SVE_VQ_BYTES),
.size = SVE_VQ_BYTES,
.align = SVE_VQ_BYTES,
.get = sve_get,
.set = sve_set,
.get_size = sve_get_size,
},
#endif
};
static const struct user_regset_view user_aarch64_view = {
......
......@@ -23,7 +23,6 @@
#include <linux/stddef.h>
#include <linux/ioport.h>
#include <linux/delay.h>
#include <linux/utsname.h>
#include <linux/initrd.h>
#include <linux/console.h>
#include <linux/cache.h>
......@@ -48,6 +47,7 @@
#include <asm/fixmap.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
#include <asm/daifflags.h>
#include <asm/elf.h>
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
......@@ -103,7 +103,8 @@ void __init smp_setup_processor_id(void)
* access percpu variable inside lock_release
*/
set_my_cpu_offset(0);
pr_info("Booting Linux on physical CPU 0x%lx\n", (unsigned long)mpidr);
pr_info("Booting Linux on physical CPU 0x%010lx [0x%08x]\n",
(unsigned long)mpidr, read_cpuid_id());
}
bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
......@@ -244,9 +245,6 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
void __init setup_arch(char **cmdline_p)
{
pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
sprintf(init_utsname()->machine, UTS_MACHINE);
init_mm.start_code = (unsigned long) _text;
init_mm.end_code = (unsigned long) _etext;
init_mm.end_data = (unsigned long) _edata;
......@@ -262,10 +260,11 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
/*
* Unmask asynchronous aborts after bringing up possible earlycon.
* (Report possible System Errors once we can report this occurred)
* Unmask asynchronous aborts and fiq after bringing up possible
* earlycon. (Report possible System Errors once we can report this
* occurred).
*/
local_async_enable();
local_daif_restore(DAIF_PROCCTX_NOIRQ);
/*
* TTBR0 is only used for the identity mapping at this stage. Make it
......
......@@ -31,6 +31,7 @@
#include <linux/ratelimit.h>
#include <linux/syscalls.h>
#include <asm/daifflags.h>
#include <asm/debug-monitors.h>
#include <asm/elf.h>
#include <asm/cacheflush.h>
......@@ -63,6 +64,7 @@ struct rt_sigframe_user_layout {
unsigned long fpsimd_offset;
unsigned long esr_offset;
unsigned long sve_offset;
unsigned long extra_offset;
unsigned long end_offset;
};
......@@ -179,9 +181,6 @@ static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
int err;
/* dump the hardware registers to the fpsimd_state structure */
fpsimd_preserve_current_state();
/* copy the FP and status/control registers */
err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
__put_user_error(fpsimd->fpsr, &ctx->fpsr, err);
......@@ -214,6 +213,8 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
__get_user_error(fpsimd.fpsr, &ctx->fpsr, err);
__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
clear_thread_flag(TIF_SVE);
/* load the hardware registers from the fpsimd_state structure */
if (!err)
fpsimd_update_current_state(&fpsimd);
......@@ -221,10 +222,118 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
return err ? -EFAULT : 0;
}
struct user_ctxs {
struct fpsimd_context __user *fpsimd;
struct sve_context __user *sve;
};
#ifdef CONFIG_ARM64_SVE
static int preserve_sve_context(struct sve_context __user *ctx)
{
int err = 0;
u16 reserved[ARRAY_SIZE(ctx->__reserved)];
unsigned int vl = current->thread.sve_vl;
unsigned int vq = 0;
if (test_thread_flag(TIF_SVE))
vq = sve_vq_from_vl(vl);
memset(reserved, 0, sizeof(reserved));
__put_user_error(SVE_MAGIC, &ctx->head.magic, err);
__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
&ctx->head.size, err);
__put_user_error(vl, &ctx->vl, err);
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
if (vq) {
/*
* This assumes that the SVE state has already been saved to
* the task struct by calling preserve_fpsimd_context().
*/
err |= __copy_to_user((char __user *)ctx + SVE_SIG_REGS_OFFSET,
current->thread.sve_state,
SVE_SIG_REGS_SIZE(vq));
}
return err ? -EFAULT : 0;
}
static int restore_sve_fpsimd_context(struct user_ctxs *user)
{
int err;
unsigned int vq;
struct fpsimd_state fpsimd;
struct sve_context sve;
if (__copy_from_user(&sve, user->sve, sizeof(sve)))
return -EFAULT;
if (sve.vl != current->thread.sve_vl)
return -EINVAL;
if (sve.head.size <= sizeof(*user->sve)) {
clear_thread_flag(TIF_SVE);
goto fpsimd_only;
}
vq = sve_vq_from_vl(sve.vl);
if (sve.head.size < SVE_SIG_CONTEXT_SIZE(vq))
return -EINVAL;
/*
* Careful: we are about __copy_from_user() directly into
* thread.sve_state with preemption enabled, so protection is
* needed to prevent a racing context switch from writing stale
* registers back over the new data.
*/
fpsimd_flush_task_state(current);
barrier();
/* From now, fpsimd_thread_switch() won't clear TIF_FOREIGN_FPSTATE */
set_thread_flag(TIF_FOREIGN_FPSTATE);
barrier();
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sve_alloc(current);
err = __copy_from_user(current->thread.sve_state,
(char __user const *)user->sve +
SVE_SIG_REGS_OFFSET,
SVE_SIG_REGS_SIZE(vq));
if (err)
return -EFAULT;
set_thread_flag(TIF_SVE);
fpsimd_only:
/* copy the FP and status/control registers */
/* restore_sigframe() already checked that user->fpsimd != NULL. */
err = __copy_from_user(fpsimd.vregs, user->fpsimd->vregs,
sizeof(fpsimd.vregs));
__get_user_error(fpsimd.fpsr, &user->fpsimd->fpsr, err);
__get_user_error(fpsimd.fpcr, &user->fpsimd->fpcr, err);
/* load the hardware registers from the fpsimd_state structure */
if (!err)
fpsimd_update_current_state(&fpsimd);
return err ? -EFAULT : 0;
}
#else /* ! CONFIG_ARM64_SVE */
/* Turn any non-optimised out attempts to use these into a link error: */
extern int preserve_sve_context(void __user *ctx);
extern int restore_sve_fpsimd_context(struct user_ctxs *user);
#endif /* ! CONFIG_ARM64_SVE */
static int parse_user_sigframe(struct user_ctxs *user,
struct rt_sigframe __user *sf)
{
......@@ -237,6 +346,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
char const __user *const sfp = (char const __user *)sf;
user->fpsimd = NULL;
user->sve = NULL;
if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
......@@ -287,6 +397,19 @@ static int parse_user_sigframe(struct user_ctxs *user,
/* ignore */
break;
case SVE_MAGIC:
if (!system_supports_sve())
goto invalid;
if (user->sve)
goto invalid;
if (size < sizeof(*user->sve))
goto invalid;
user->sve = (struct sve_context __user *)head;
break;
case EXTRA_MAGIC:
if (have_extra_context)
goto invalid;
......@@ -343,6 +466,10 @@ static int parse_user_sigframe(struct user_ctxs *user,
*/
offset = 0;
limit = extra_size;
if (!access_ok(VERIFY_READ, base, limit))
goto invalid;
continue;
default:
......@@ -359,9 +486,6 @@ static int parse_user_sigframe(struct user_ctxs *user,
}
done:
if (!user->fpsimd)
goto invalid;
return 0;
invalid:
......@@ -395,8 +519,19 @@ static int restore_sigframe(struct pt_regs *regs,
if (err == 0)
err = parse_user_sigframe(&user, sf);
if (err == 0)
if (err == 0) {
if (!user.fpsimd)
return -EINVAL;
if (user.sve) {
if (!system_supports_sve())
return -EINVAL;
err = restore_sve_fpsimd_context(&user);
} else {
err = restore_fpsimd_context(user.fpsimd);
}
}
return err;
}
......@@ -455,6 +590,18 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
return err;
}
if (system_supports_sve()) {
unsigned int vq = 0;
if (test_thread_flag(TIF_SVE))
vq = sve_vq_from_vl(current->thread.sve_vl);
err = sigframe_alloc(user, &user->sve_offset,
SVE_SIG_CONTEXT_SIZE(vq));
if (err)
return err;
}
return sigframe_alloc_end(user);
}
......@@ -496,6 +643,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
}
/* Scalable Vector Extension state, if present */
if (system_supports_sve() && err == 0 && user->sve_offset) {
struct sve_context __user *sve_ctx =
apply_user_offset(user, user->sve_offset);
err |= preserve_sve_context(sve_ctx);
}
if (err == 0 && user->extra_offset) {
char __user *sfp = (char __user *)user->sigframe;
char __user *userp =
......@@ -595,6 +749,8 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
struct rt_sigframe __user *frame;
int err = 0;
fpsimd_signal_preserve_current_state();
if (get_sigframe(&user, ksig, regs))
return 1;
......@@ -756,9 +912,12 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
addr_limit_user_check();
if (thread_flags & _TIF_NEED_RESCHED) {
/* Unmask Debug and SError for the next task */
local_daif_restore(DAIF_PROCCTX_NOIRQ);
schedule();
} else {
local_irq_enable();
local_daif_restore(DAIF_PROCCTX);
if (thread_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
......@@ -775,7 +934,7 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
fpsimd_restore_current_state();
}
local_irq_disable();
local_daif_mask();
thread_flags = READ_ONCE(current_thread_info()->flags);
} while (thread_flags & _TIF_WORK_MASK);
}
......@@ -239,7 +239,7 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
* Note that this also saves V16-31, which aren't visible
* in AArch32.
*/
fpsimd_preserve_current_state();
fpsimd_signal_preserve_current_state();
/* Place structure header on the stack */
__put_user_error(magic, &frame->magic, err);
......
......@@ -47,6 +47,7 @@
#include <asm/cpu.h>
#include <asm/cputype.h>
#include <asm/cpu_ops.h>
#include <asm/daifflags.h>
#include <asm/mmu_context.h>
#include <asm/numa.h>
#include <asm/pgtable.h>
......@@ -216,6 +217,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
*/
asmlinkage void secondary_start_kernel(void)
{
u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
struct mm_struct *mm = &init_mm;
unsigned int cpu;
......@@ -265,14 +267,14 @@ asmlinkage void secondary_start_kernel(void)
* the CPU migration code to notice that the CPU is online
* before we continue.
*/
pr_info("CPU%u: Booted secondary processor [%08x]\n",
cpu, read_cpuid_id());
pr_info("CPU%u: Booted secondary processor 0x%010lx [0x%08x]\n",
cpu, (unsigned long)mpidr,
read_cpuid_id());
update_cpu_boot_status(CPU_BOOT_SUCCESS);
set_cpu_online(cpu, true);
complete(&cpu_running);
local_irq_enable();
local_async_enable();
local_daif_restore(DAIF_PROCCTX);
/*
* OK, it's off to the idle thread for us
......@@ -368,10 +370,6 @@ void __cpu_die(unsigned int cpu)
/*
* Called from the idle thread for the CPU which has been shutdown.
*
* Note that we disable IRQs here, but do not re-enable them
* before returning to the caller. This is also the behaviour
* of the other hotplug-cpu capable cores, so presumably coming
* out of idle fixes this.
*/
void cpu_die(void)
{
......@@ -379,7 +377,7 @@ void cpu_die(void)
idle_task_exit();
local_irq_disable();
local_daif_mask();
/* Tell __cpu_die() that this CPU is now safe to dispose of */
(void)cpu_report_death();
......@@ -837,7 +835,7 @@ static void ipi_cpu_stop(unsigned int cpu)
{
set_cpu_online(cpu, false);
local_irq_disable();
local_daif_mask();
while (1)
cpu_relax();
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment