- 10 Dec, 2018 17 commits
-
-
Steve Capper authored
For cases where there is a mismatch in ARMv8.2-LVA support between CPUs we have to be careful in allowing secondary CPUs to boot if 52-bit virtual addresses have already been enabled on the boot CPU. This patch adds code to the secondary startup path. If the boot CPU has enabled 52-bit VAs then ID_AA64MMFR2_EL1 is checked to see if the secondary can also enable 52-bit support. If not, the secondary is prevented from booting and an error message is displayed indicating why. Technically this patch could be implemented using the cpufeature code when considering 52-bit userspace support. However, we employ low level checks here as the cpufeature code won't be able to run if we have mismatched 52-bit kernel va support. Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Steve Capper authored
Enabling 52-bit VAs on arm64 requires that the PGD table expands from 64 entries (for the 48-bit case) to 1024 entries. This quantity, PTRS_PER_PGD is used as follows to compute which PGD entry corresponds to a given virtual address, addr: pgd_index(addr) -> (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1) Userspace addresses are prefixed by 0's, so for a 48-bit userspace address, uva, the following is true: (uva >> PGDIR_SHIFT) & (1024 - 1) == (uva >> PGDIR_SHIFT) & (64 - 1) In other words, a 48-bit userspace address will have the same pgd_index when using PTRS_PER_PGD = 64 and 1024. Kernel addresses are prefixed by 1's so, given a 48-bit kernel address, kva, we have the following inequality: (kva >> PGDIR_SHIFT) & (1024 - 1) != (kva >> PGDIR_SHIFT) & (64 - 1) In other words a 48-bit kernel virtual address will have a different pgd_index when using PTRS_PER_PGD = 64 and 1024. If, however, we note that: kva = 0xFFFF << 48 + lower (where lower[63:48] == 0b) and, PGDIR_SHIFT = 42 (as we are dealing with 64KB PAGE_SIZE) We can consider: (kva >> PGDIR_SHIFT) & (1024 - 1) - (kva >> PGDIR_SHIFT) & (64 - 1) = (0xFFFF << 6) & 0x3FF - (0xFFFF << 6) & 0x3F // "lower" cancels out = 0x3C0 In other words, one can switch PTRS_PER_PGD to the 52-bit value globally provided that they increment ttbr1_el1 by 0x3C0 * 8 = 0x1E00 bytes when running with 48-bit kernel VAs (TCR_EL1.T1SZ = 16). For kernel configuration where 52-bit userspace VAs are possible, this patch offsets ttbr1_el1 and sets PTRS_PER_PGD corresponding to the 52-bit value. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> [will: added comment to TTBR1_BADDR_4852_OFFSET calculation] Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Steve Capper authored
Now that we have DEFAULT_MAP_WINDOW defined, we can arch_get_mmap_end and arch_get_mmap_base helpers to allow for high addresses in mmap. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Steve Capper authored
We wish to introduce a 52-bit virtual address space for userspace but maintain compatibility with software that assumes the maximum VA space size is 48 bit. In order to achieve this, on 52-bit VA systems, we make mmap behave as if it were running on a 48-bit VA system (unless userspace explicitly requests a VA where addr[51:48] != 0). On a system running a 52-bit userspace we need TASK_SIZE to represent the 52-bit limit as it is used in various places to distinguish between kernelspace and userspace addresses. Thus we need a new limit for mmap, stack, ELF loader and EFI (which uses TTBR0) to represent the non-extended VA space. This patch introduces DEFAULT_MAP_WINDOW and DEFAULT_MAP_WINDOW_64 and switches the appropriate logic to use that instead of TASK_SIZE. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Steve Capper authored
This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of arch_get_unmapped_* functions. However, on arm64 we use the generic versions of these functions. Rather than duplicate the generic arch_get_unmapped_* implementations for arm64, this patch instead introduces two architectural helper macros and applies them to arch_get_unmapped_*: arch_get_mmap_end(addr) - get mmap upper limit depending on addr hint arch_get_mmap_base(addr, base) - get mmap_base depending on addr hint If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Qian Cai authored
If the kernel is configured with KASAN_EXTRA, the stack size is increased significantly due to setting the GCC -fstack-reuse option to "none" [1]. As a result, it can trigger a stack overrun quite often with 32k stack size compiled using GCC 8. For example, this reproducer https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c can trigger a "corrupted stack end detected inside scheduler" very reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are other reports at: https://lore.kernel.org/lkml/1542144497.12945.29.camel@gmx.us/ https://lore.kernel.org/lkml/721E7B42-2D55-4866-9C1A-3E8D64F33F9C@gmx.us/ There are just too many functions that could have a large stack with KASAN_EXTRA due to large local variables that have been called over and over again without being able to reuse the stacks. Some noticiable ones are, size 7536 shrink_inactive_list 7440 shrink_page_list 6560 fscache_stats_show 3920 jbd2_journal_commit_transaction 3216 try_to_unmap_one 3072 migrate_page_move_mapping 3584 migrate_misplaced_transhuge_page 3920 ip_vs_lblcr_schedule 4304 lpfc_nvme_info_show 3888 lpfc_debugfs_nvmestat_data.constprop There are other 49 functions over 2k in size while compiling kernel with "-Wframe-larger-than=" on this machine. Hence, it is too much work to change Makefiles for each object to compile without -fsanitize-address-use-after-scope individually. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
The dcache_by_line_op macro suffers from a couple of small problems: First, the GAS directives that are currently being used rely on assembler behavior that is not documented, and probably not guaranteed to produce the correct behavior going forward. As a result, we end up with some undefined symbols in cache.o: $ nm arch/arm64/mm/cache.o ... U civac ... U cvac U cvap U cvau This is due to the fact that the comparisons used to select the operation type in the dcache_by_line_op macro are comparing symbols not strings, and even though it seems that GAS is doing the right thing here (undefined symbols by the same name are equal to each other), it seems unwise to rely on this. Second, when patching in a DC CVAP instruction on CPUs that support it, the fallback path consists of a DC CVAU instruction which may be affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE. Solve these issues by unrolling the various maintenance routines and using the conditional directives that are documented as operating on strings. To avoid the complexity of nested alternatives, we move the DC CVAP patching to __clean_dcache_area_pop, falling back to a branch to __clean_dcache_area_poc if DCPOP is not supported by the CPU. Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
Now that arm64ksyms.c has been reduced to a stub, let's remove it entirely. New exports should be associated with their function definition. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the ftrace exports to the assembly files the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the string routine exports to the assembly files the functions are defined in. Routines which should only be exported for !KASAN builds are exported using the EXPORT_SYMBOL_NOKASAN() helper. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the uaccess exports to the assembly files the functions are defined in. As we have to include <asm/assembler.h>, the existing includes are fixed to follow the usual ordering conventions. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the copy_page and clear_page exports to the assembly files the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the SMCCC exports to the assembly file the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the tishift exports to the assembly file the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
So that we can export symbols directly from assembly files, let's make use of the generic <asm/export.h>. We have a few symbols that we'll want to conditionally export for !KASAN kernel builds, so we add a helper for that in <asm/assembler.h>. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
Since we define memstart_addr in a C file, we can have the export immediately after the definition of the symbol, as we do elsewhere. As a step towards removing arm64ksyms.c, move the export of memstart_addr to init.c, where the symbol is defined. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
Now that the arm64 bitops are inlines built atop of the regular atomics, we don't need to export anything. Remove the redundant exports. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 07 Dec, 2018 6 commits
-
-
Will Deacon authored
The "L" AArch64 machine constraint, which we use for the "old" value in an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit logical instruction. However, for cmpxchg() operations on types smaller than 64 bits, this constraint can result in an invalid instruction which is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff. Whilst we could special-case the constraint based on the cmpxchg size, it's far easier to change the constraint to "K" and put up with using a register for large 64-bit immediates. For out-of-line LL/SC atomics, this is all moot anyway. Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
Our percpu code is a bit of an inconsistent mess: * It rolls its own xchg(), but reuses cmpxchg_local() * It uses various different flavours of preempt_{enable,disable}() * It returns values even for the non-returning RmW operations * It makes no use of LSE atomics outside of the cmpxchg() ops * There are individual macros for different sizes of access, but these are all funneled through a switch statement rather than dispatched directly to the relevant case This patch rewrites the per-cpu operations to address these shortcomings. Whilst the new code is a lot cleaner, the big advantage is that we can use the non-returning ST- atomic instructions when we have LSE. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
The CAS instructions implicitly access only the relevant bits of the "old" argument, so there is no need for explicit masking via type-casting as there is in the LL/SC implementation. Move the casting into the LL/SC code and remove it altogether for the LSE implementation. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
Our atomic instructions (either LSE atomics of LDXR/STXR sequences) natively support byte, half-word, word and double-word memory accesses so there is no need to mask the data register prior to being stored. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
The asm-generic/preempt.h implementation doesn't make use of the PREEMPT_NEED_RESCHED flag, since this can interact badly with load/store architectures which rely on the preempt_count word being unchanged across an interrupt. However, since we're a 64-bit architecture and the preempt count is only 32 bits wide, we can simply pack it next to the resched flag and load the whole thing in one go, so that a dec-and-test operation doesn't need to load twice. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch code where it can potentially be implemented using either a different bit in the preempt count or as an entirely separate entity. Cc: Robert Love <rml@tech9.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 06 Dec, 2018 13 commits
-
-
Allen Pais authored
Add hstate for each supported hugepage size using arch initcall. * no hugepage parameters Without hugepage parameters, only a default hugepage size is available for dynamic allocation. It's different, for example, from x86_64 and sparc64 where all supported hugepage sizes are available. * only default_hugepagesz= is specified and set not to HPAGE_SIZE In spite of the fact that default_hugepagesz= is set to a valid hugepage size, it's treated as unsupported and reverted to HPAGE_SIZE. Such behaviour is also different from x86_64 and sparc64. Acked-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Dmitry Klochkov <dmitry.klochkov@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Jackie Liu authored
This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from the centos 7.5 on Huawei's HISI1616 chip: [ 93.837726] xor: measuring software checksum speed [ 93.874039] 8regs : 7123.200 MB/sec [ 93.914038] 32regs : 7180.300 MB/sec [ 93.954043] arm64_neon: 9856.000 MB/sec [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec) I believe this code can bring some optimization for all arm64 platform. thanks for Ard Biesheuvel's suggestions. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Jackie Liu authored
In a way similar to ARM commit 09096f6a ("ARM: 7822/1: add workaround for ambiguous C99 stdint.h types"), this patch redefines the macros that are used in stdint.h so its definitions of uint64_t and int64_t are compatible with those of the kernel. This patch comes from: https://patchwork.kernel.org/patch/3540001/ Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org> We mark this file as a private file and don't have to override asm/types.h Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being context-synchronizing is confusing and misplaced with kpti. Given that this is already documented under Documentation/ (see arch-support.txt for membarrier), remove the comment altogether. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
Some CPUs can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. This patch emits an SB sequence after each ERET so that speculation is held up on exception return. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Will Deacon authored
We currently use a DSB; ISB sequence to inhibit speculation in set_fs(). Whilst this works for current CPUs, future CPUs may implement a new SB barrier instruction which acts as an architected speculation barrier. On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB sequence and advertise the presence of the new instruction to userspace. Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
We use a stop_machine call for each available capability to enable it on all the CPUs available at boot time. Instead we could batch the cpu_enable callbacks to a single stop_machine() call to save us some time. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
Use the sorted list of capability entries for the detection and verification. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
Make use of the sorted capability list to access the capability entry in this_cpu_has_cap() to avoid iterating over the two tables. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
We maintain two separate tables of capabilities, errata and features, which decide the system capabilities. We iterate over each of these tables for various operations (e.g, detection, verification etc.). We do not have a way to map a system "capability" to its entry, (i.e, cap -> struct arm64_cpu_capabilities) which is needed for this_cpu_has_cap(). So we iterate over the table one by one to find the entry and then do the operation. Also, this prevents us from optimizing the way we "enable" the capabilities on the CPUs, where we now issue a stop_machine() for each available capability. One solution is to merge the two tables into a single table, sorted by the capability. But this is has the following disadvantages: - We loose the "classification" of an errata vs. feature - It is quite easy to make a mistake when adding an entry, unless we sort the table at runtime. So we maintain a list of pointers to the capability entry, sorted by the "cap number" in a separate array, initialized at boot time. The only restriction is that we can have one "entry" per capability. While at it, remove the duplicate declaration of arm64_errata table. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
Remove duplicate entries for Qualcomm erratum 1003. Since the entries are not purely based on generic MIDR checks, use the multi_cap_entry type to merge the entries. Cc: Christopher Covington <cov@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
Merge duplicate entries for a single capability using the midr range list for Cavium errata 30115 and 27456. Cc: Andrew Pinski <apinski@cavium.com> Cc: David Daney <david.daney@cavium.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Suzuki K Poulose authored
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability : 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012] 2) ARM Errata 819472 on A53 r0p[01] Both have the same work around. Merge these entries to avoid duplicate entries for a single capability. Add a new Kconfig entry to control the "capability" entry to make it easier to handle combinations of the CONFIGs. Cc: Will Deacon <will.deacon@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 04 Dec, 2018 1 commit
-
-
Ard Biesheuvel authored
readelf complains about the section layout of vmlinux when building with CONFIG_RELOCATABLE=y (for KASLR): readelf: Warning: [21]: Link field (0) should index a symtab section. readelf: Warning: [21]: Info field (0) should index a relocatable section. Also, it seems that our use of '-pie -shared' is contradictory, and thus ambiguous. In general, the way KASLR is wired up at the moment is highly tailored to how ld.bfd happens to implement (and conflate) PIE executables and shared libraries, so given the current effort to support other toolchains, let's fix some of these issues as well. - Drop the -pie linker argument and just leave -shared. In ld.bfd, the differences between them are unclear (except for the ELF type of the produced image [0]) but lld chokes on seeing both at the same time. - Rename the .rela output section to .rela.dyn, as is customary for shared libraries and PIE executables, so that it is not misidentified by readelf as a static relocation section (producing the warnings above). - Pass the -z notext and -z norelro options to explicitly instruct the linker to permit text relocations, and to omit the RELRO program header (which requires a certain section layout that we don't adhere to in the kernel). These are the defaults for current versions of ld.bfd. - Discard .eh_frame and .gnu.hash sections to avoid them from being emitted between .head.text and .text, screwing up the section layout. These changes only affect the ELF image, and produce the same binary image. [0] b9dce7f1 ("arm64: kernel: force ET_DYN ELF type for ...") Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Smith <peter.smith@linaro.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 30 Nov, 2018 3 commits
-
-
Ard Biesheuvel authored
Improve the performance of the crc32() asm routines by getting rid of most of the branches and small sized loads on the common path. Instead, use a branchless code path involving overlapping 16 byte loads to process the first (length % 32) bytes, and process the remainder using a loop that processes 32 bytes at a time. Tested using the following test program: #include <stdlib.h> extern void crc32_le(unsigned short, char const*, int); int main(void) { static const char buf[4096]; srand(20181126); for (int i = 0; i < 100 * 1000 * 1000; i++) crc32_le(0, buf, rand() % 1024); return 0; } On Cortex-A53 and Cortex-A57, the performance regresses but only very slightly. On Cortex-A72 however, the performance improves from $ time ./crc32 real 0m10.149s user 0m10.149s sys 0m0.000s to $ time ./crc32 real 0m7.915s user 0m7.915s sys 0m0.000s Cc: Rui Sun <sunrui26@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
The core ftrace hooks take the instrumented PC in x0, but for some reason arm64's prepare_ftrace_return() takes this in x1. For consistency, let's flip the argument order and always pass the instrumented PC in x0. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Mark Rutland authored
The save_return_regs and restore_return_regs macros are only used by return_to_handler, and having them defined out-of-line only serves to obscure the logic. Before we complicate, let's clean this up and fold the logic directly into return_to_handler, saving a few lines of macro boilerplate in the process. At the same time, a missing trailing space is added to the comments, fixing a code style violation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-