- 12 Mar, 2022 1 commit
-
Arnd Bergmann authored
The removal of the old-style irq entry broke obscure NOMMU configurations on machines that have an MMU:

    ld.lld: error: undefined symbol: generic_handle_arch_irq
    referenced by kernel/entry-armv.o:(__irq_svc) in archive arch/arm/built-in.a

A follow-up patch converting nvic to generic_handle_arch_irq() could have fixed this by removing the Kconfig conditional, but did it differently. Change the Kconfig logic so ARM machines now unconditionally enable the feature. I have also submitted a patch to remove support for the configurations that broke, but fixing the regression first is a trivial and correct change.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 54f481a2 ("ARM: remove old-style irq entry")
Fixes: 52d24087 ("irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
-
- 11 Mar, 2022 4 commits
-
Ard Biesheuvel authored
Commit b6506981 ("ARM: unwind: support unwinding across multiple stacks") updated the logic in the ARM unwinder to widen the bounds within which SP is assumed to be valid, in order to allow the unwind to traverse from the IRQ stack to the task stack. This is necessary, as otherwise, unwinds started from the IRQ stack would terminate in the IRQ exception handler, making stacktraces substantially less useful.

This turns out to be a mistake, as it breaks asynchronous unwinding across exceptions, when the exception is taken before the stack frame is consistent with the unwind info. For instance, in the following backtrace:

    ...
    generic_handle_arch_irq from call_with_stack+0x18/0x20
    call_with_stack from __irq_svc+0x80/0x98
    Exception stack(0xc7093e20 to 0xc7093e68)
    3e20: b6a94a88 c7093ea0 00000008 00000000 c7093ea0 b7e127d0 00000051 c9220000
    3e40: b6a94a88 b6a94a88 00000004 0002b000 0036b570 c7093e70 c040ca2c c0994a90
    3e60: 20070013 ffffffff
    __irq_svc from __copy_to_user_std+0x20/0x378
    ...

we need to apply the following unwind directives:

    0xc099720c <__copy_to_user_std+0x1c>: @0xc295d1d4
      Compact model index: 1
      0x9b      vsp = r11
      0xb1 0x0d pop {r0, r2, r3}
      0x84 0x81 pop {r4, r11, r14}
      0xb0      finish

which tell us to switch to the frame pointer register R11 and proceed with the unwind from that. However, having been interrupted 0x20 bytes into the function:

    c09971f0 <__copy_to_user_std>:
    c09971f0: e59f3350    ldr     r3, [pc, #848]
    c09971f4: e243c001    sub     ip, r3, #1
    c09971f8: e05cc000    subs    ip, ip, r0
    c09971fc: 228cc001    addcs   ip, ip, #1
    c0997200: 205cc002    subscs  ip, ip, r2
    c0997204: 33a00000    movcc   r0, #0
    c0997208: e320f014    csdb
    c099720c: e3a03000    mov     r3, #0
    c0997210: e92d481d    push    {r0, r2, r3, r4, fp, lr}
    c0997214: e1a0b00d    mov     fp, sp
    c0997218: e2522004    subs    r2, r2, #4

the value for R11 recovered from the previous frame (__irq_svc) will be a snapshot of its value before the exception was taken (0x0002b000), which occurred at address __copy_to_user_std+0x20 (0xc0997210), when R11 had not been assigned its value yet.

This means we can never assume that the SP values recovered from the stack or from the frame pointer are ever safe to use, given the need to do asynchronous unwinding, and the only robust approach is to revert to the previous approach, which is to derive bounds for SP based on the initial value, and never update them.

We can make an exception, though: now that the IRQ stack switch is guaranteed to occur in call_with_stack(), we can implement a special case for this function, and use a different set of bounds based on the knowledge that it will always unwind from R11 rather than SP. As call_with_stack() is a hand-rolled assembly routine, this is guaranteed to remain that way.

So let's do a partial revert of b6506981, and drop all manipulations for sp_low and sp_high based on the information collected during the unwind itself. To support call_with_stack(), set sp_low and sp_high explicitly to values derived from R11 when we unwind that function.

The only downside is that, while unwinding an overflow of the vmap'ed stack will work fine as before, we will no longer be able to produce a backtrace that unwinds the overflow stack itself across the exception that was raised due to the faulting access to the guard region. However, this only affects exceptions caused by problems in the stack overflow handling code itself, in which case the remaining backtrace is not that relevant.

Fixes: b6506981 ("ARM: unwind: support unwinding across multiple stacks")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
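As an illustration only (the actual unwinder code differs; the helper pc_is_within_call_with_stack() and the field layout are hypothetical), the idea is to keep the SP bounds fixed for the whole unwind and rebase them on R11 only for the one hand-rolled asm routine:

    /* hedged sketch; field and helper names are hypothetical */
    struct unwind_ctrl_block {
            unsigned long vrs[16];     /* virtual register set: r0..r15 */
            unsigned long sp_low;      /* fixed lower bound for a valid SP */
            unsigned long sp_high;     /* fixed upper bound for a valid SP */
    };

    #define FP 11
    #define PC 15

    static void set_sp_bounds(struct unwind_ctrl_block *ctrl)
    {
            if (pc_is_within_call_with_stack(ctrl->vrs[PC])) {
                    /*
                     * call_with_stack() is hand-rolled asm and always
                     * unwinds via R11, so rebase the valid SP window on
                     * the recovered frame pointer instead.
                     */
                    ctrl->sp_low  = ctrl->vrs[FP];
                    ctrl->sp_high = ALIGN(ctrl->vrs[FP], THREAD_SIZE);
            }
            /* otherwise: bounds stay as derived from the initial SP */
    }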
-
Ard Biesheuvel authored
After simplifying the stack switch code in the IRQ exception handler by deferring the actual stack switch to call_with_stack(), we no longer need to special case the way we dump the exception stack, since it will always be at the top of whichever stack was active when the exception was taken. So revert this special handling for the ARM unwinder.

This reverts commit 4ab68270.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
-
Ard Biesheuvel authored
The IRQ stacks series made some changes to the unwinder, to permit unwinding across different stacks. This is needed because otherwise, the call stack would terminate at the point where the stack switch between the task stack and the IRQ stack occurs, which would defeat any diagnostics that rely on timer interrupts, such as RCU stall detection.

Unfortunately, getting the unwind annotations correct turns out to be difficult, given that this now involves a frame pointer which needs to point into the right location in the task stack when unwinding from the IRQ stack. Getting this wrong for an exception handling routine results in the stack pointer being unwound from the wrong location, causing any subsequent unwind attempts to cause all kinds of issues, as reported by Naresh here [0].

So let's simplify this, by deferring the stack switch to call_with_stack(), which already has the correct unwind annotations, and removing all the complicated handling of the stack frame from the IRQ exception entrypoint itself.

[0] https://lore.kernel.org/all/CA+G9fYtpy8VgK+ag6OsA9TDrwi5YGU4hu7GM8xwpO7v6LrCD4Q@mail.gmail.com/

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
-
Russell King (Oracle) authored
When e.g. a WARN_ON() is encountered, we attempt to unwind the current thread. To do this, we set frame.pc to unwind_backtrace, which means it points at the beginning of the function. However, the rest of the state is initialised from within the function, which means the function prologue has already been run.

This can be confusing, and with a recent patch from Ard, can result in the unwinder misbehaving if we want to be strict about the PC value. If we correctly initialise the state so it is self-consistent (in other words, set frame.pc to the location where we are initialising it) then we eliminate this confusion, and avoid possible future issues.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
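One way to make the initial state self-consistent is to take the PC from a label placed right at the initialisation site, using GCC's address-of-label extension. A hedged sketch of the shape of the fix (the upstream code may differ in detail):

    void unwind_current_sketch(void)
    {
            struct stackframe frame;

            frame.fp = (unsigned long)__builtin_frame_address(0);
            frame.sp = current_stack_pointer;
            frame.lr = (unsigned long)__builtin_return_address(0);
            /*
             * Take PC from a label at this exact spot, so PC, SP, FP
             * and LR all describe the same point of execution, after
             * the prologue has already run.
             */
    here:
            frame.pc = (unsigned long)&&here;

            /* ... unwind from 'frame' ... */
    }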
-
- 07 Mar, 2022 2 commits
-
Ard Biesheuvel authored
Commit 41918ec8 ("ARM: ftrace: enable the graph tracer with the EABI unwinder") removed the dummy version of return_address() that was provided for the CONFIG_ARM_UNWIND=y case, on the assumption that the removal of the kernel_text_address() call from unwind_frame() in the preceding patch made it safe to do so.

However, this turns out not to be the case: Corentin reports warnings about suspicious RCU usage and other strange behavior that seems to originate in the stack unwinding that occurs in return_address(). Given that the function graph tracer (which is what these changes were enabling for CONFIG_ARM_UNWIND=y builds) does not appear to care about this distinction, let's revert return_address() to the old state.

Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Fixes: 41918ec8 ("ARM: ftrace: enable the graph tracer with the EABI unwinder")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
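The "old state" being restored amounts to stubbing out return_address() when the EABI unwinder is in use; a rough sketch under that assumption:

    #ifdef CONFIG_ARM_UNWIND
    /*
     * Sketch of the restored dummy: with the EABI unwinder, don't
     * attempt a full unwind from return_address(), as it may run in
     * contexts (e.g. under tracing hooks) where that is not safe.
     */
    void *return_address(unsigned int level)
    {
            return NULL;
    }
    #endif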
-
Ard Biesheuvel authored
Corentin reports that since commit 538b9265 ("ARM: unwind: track location of LR value in stack frame"), numerous spurious warnings are emitted into the kernel log:

    [    0.000000] unwind: Index not found c0f0c440
    [    0.000000] unwind: Index not found 00000000
    [    0.000000] unwind: Index not found c0f0c440
    [    0.000000] unwind: Index not found 00000000

This is due to the fact that the commit in question removes a check whether the PC value in the unwound frame is actually a kernel text address, on the assumption that such an address will not be associated with valid unwind data to begin with, which is checked right after. The reason for removing this check was that unwind_frame() will be called by the ftrace graph tracer code, which means that it can no longer be safely instrumented itself, or any code that it calls, as it could cause infinite recursion.

In order to prevent the spurious diagnostics, let's add back the call to kernel_text_address(), but this time, only call it if no unwind data could be found for the address in question. This is more efficient for the common successful case, and should avoid any unintended recursion, considering that kernel_text_address() will only be called if no unwind data was found.

Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Fixes: 538b9265 ("ARM: unwind: track location of LR value in stack frame")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
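The shape of the fix, as a hedged sketch (the real unwind_frame() differs in detail):

    int unwind_frame(struct stackframe *frame)
    {
            const struct unwind_idx *idx;

            idx = unwind_find_idx(frame->pc);
            if (!idx) {
                    /*
                     * Only consult kernel_text_address() once the index
                     * lookup has already failed: the hot, successful
                     * path never calls it, which also avoids recursing
                     * into instrumented code from the graph tracer.
                     */
                    if (frame->pc && kernel_text_address(frame->pc))
                            pr_warn("unwind: Index not found %08lx\n",
                                    frame->pc);
                    return -URC_FAILURE;
            }
            /* ... execute the unwind instructions for this frame ... */
            return URC_OK;
    }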
-
- 28 Feb, 2022 1 commit
-
Russell King (Oracle) authored
Merge tag 'arm-ftrace-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable

ARM: ftrace fixes and cleanups

Make all flavors of ftrace available on all builds, regardless of ISA choice, unwinder choice or compiler:
- use ADD not POP where possible
- fix a couple of Thumb2 related issues
- enable HAVE_FUNCTION_GRAPH_FP_TEST for robustness
- enable the graph tracer with the EABI unwinder
- avoid clobbering frame pointer registers to make Clang happy

Link: https://lore.kernel.org/linux-arm-kernel/20220203082204.1176734-1-ardb@kernel.org/
-
- 10 Feb, 2022 2 commits
-
Ard Biesheuvel authored
This reverts commit ecb108e3. Clang + Thumb2 with ftrace is now supported.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ard Biesheuvel authored
The SMC calling convention uses R7 as an argument register, which conflicts with its use as a frame pointer when building in Thumb2 mode. Given that Clang with ftrace does not permit frame pointers to be disabled, let's omit this compilation unit from ftrace instrumentation.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
-
- 09 Feb, 2022 10 commits
-
Ard Biesheuvel authored
Thumb2 uses R7 rather than R11 as the frame pointer, and even though we rarely use a frame pointer to begin with when building in Thumb2 mode, there are cases where it is required by the compiler (Clang when inserting profiling hooks via -pg).

However, preserving and restoring the frame pointer is risky, as any unhandled exceptions raised in the meantime will produce a bogus backtrace, and it would be better not to touch the frame pointer at all. This is the case even when CONFIG_FRAME_POINTER is not set, as the unwind directive used by the unwinder may also use R7 or R11 as the unwind anchor, even if the frame pointer is not managed strictly according to the frame pointer ABI.

So let's tweak the cacheflush asm code not to clobber R7 or R11 at all, so that we can drop R7 from the clobber lists of the inline asm blocks that call these routines, and remove the code that preserves/restores R11.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
-
Ard Biesheuvel authored
Thumb2 code uses R7 as the frame pointer rather than R11, because the opcodes to access it are generally shorter. This means that there are cases where we cannot simply add it to the clobber list of an asm() block, but need to preserve/restore it explicitly, or the compiler may complain in some cases (e.g., Clang builds with ftrace enabled).

Since R11 is not special in that case, clobber it instead, and use it to preserve/restore the value of R7.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
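A schematic example of the pattern (not the exact code touched by this patch): stash the Thumb2 frame pointer in R11, which can safely appear in the clobber list, instead of clobbering R7 directly.

    /* sketch only; the payload using r7 is hypothetical */
    static inline void use_r7_temporarily(unsigned long val)
    {
            asm volatile(
                    "mov    r11, r7         @ preserve Thumb2 frame pointer\n\t"
                    "mov    r7, %0          @ hypothetical payload using r7\n\t"
                    "mov    r7, r11         @ restore frame pointer\n\t"
                    : : "r" (val) : "r11");
    }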
-
Ard Biesheuvel authored
Enable the function graph tracer in combination with the EABI unwinder, so that Thumb2 builds or Clang ARM builds can make use of it.

This involves using the unwinder to locate the return address of an instrumented function on the stack, so that it can be overridden and made to refer to the ftrace handling routines that need to be called at function return.

Given that for these builds, it is not guaranteed that the value of the link register is stored on the stack, fall back to the stack slot that will be used by the ftrace exit code to restore LR in the instrumented function's execution context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ard Biesheuvel authored
The ftrace graph tracer needs to override the return address of an instrumented function, in order to install a hook that gets invoked when the function returns again.

Currently, we only support this when building for ARM using GCC with frame pointers, as in this case, it is guaranteed that the function will reload LR from [FP, #-4] in all cases, and we can simply pass that address to the ftrace code. In order to support this for configurations that rely on the EABI unwinder, such as Thumb2 builds, make the unwinder keep track of the address from which LR was unwound, permitting ftrace to make use of this in a subsequent patch.

Drop the call to kernel_text_address(), which is problematic in terms of ftrace recursion, given that it may be instrumented itself. The call is redundant anyway, as no unwind directives will be found unless the PC points to memory that is known to contain executable code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
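A hedged sketch of the tracking, with hypothetical naming: whenever the unwind instructions pop LR (r14) off the virtual stack, remember the slot it came from.

    struct unwind_ctrl_block {
            unsigned long vrs[16];          /* virtual register set: r0..r15 */
            const unsigned long *insn;      /* pointer into the unwind table */
            unsigned long *lr_addr;         /* hypothetical: slot LR came from */
    };

    /* inside the handler that pops a register off the virtual SP */
    static void unwind_pop_register(struct unwind_ctrl_block *ctrl,
                                    unsigned long **vsp, unsigned int reg)
    {
            if (reg == 14)                  /* record where LR was restored from */
                    ctrl->lr_addr = *vsp;
            ctrl->vrs[reg] = *(*vsp)++;
    }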
-
Ard Biesheuvel authored
Fix the frame pointer handling in the function graph tracer entry and exit code so we can enable HAVE_FUNCTION_GRAPH_FP_TEST. Instead of using FP directly (which will have different values between the entry and exit pieces of the function graph tracer), use the value of SP at entry and exit, as we can derive the former value from the frame pointer.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ard Biesheuvel authored
Avoid explicit literal loads and instead, use accessor macros that generate the optimal sequence depending on the architecture revision being targeted.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ard Biesheuvel authored
Tweak the ftrace return paths to avoid redundant loads of SP, as well as unnecessary clobbering of IP.

This also fixes the inconsistency of using MOV to perform a function return, which is sub-optimal on recent micro-architectures but more importantly, does not perform an interworking return, unlike compiler generated function returns in Thumb2 builds. Let's fix this by popping PC from the stack like most ordinary code does.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ard Biesheuvel authored
Kernel images that are large in comparison to the range of a direct branch may fail to work as expected with ftrace, as patching a direct branch to one of the core ftrace routines may not be possible from the .init.text section, if it is emitted too far away from the normal .text section. This is more likely to affect Thumb2 builds, given that its range is only -/+ 16 MiB (as opposed to ARM which has -/+ 32 MiB), but may occur in either ISA.

To work around this, add a couple of trampolines to .init.text and swap these in when the ftrace patching code is operating on callers in .init.text.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
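Roughly, the patching code just needs to pick a different branch target for init-section callers; a hedged sketch with a hypothetical trampoline symbol:

    extern void ftrace_caller(void);
    extern void ftrace_caller_from_init(void);  /* hypothetical .init.text trampoline */

    static unsigned long ftrace_call_target(unsigned long caller_pc)
    {
            /*
             * Callers living in .init.text may be out of direct-branch
             * range of the core ftrace routines, so route them through
             * a trampoline that is itself emitted into .init.text.
             */
            if (is_kernel_inittext(caller_pc))
                    return (unsigned long)&ftrace_caller_from_init;

            return (unsigned long)&ftrace_caller;
    }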
-
Ard Biesheuvel authored
The compiler emitted hook used for ftrace consists of a PUSH {LR} to preserve the link register, followed by a branch-and-link (BL) to __gnu_mcount_nc. Dynamic ftrace patches away the latter to turn the combined sequence into a NOP, using a POP {LR} instruction.

This is not necessary, since the link register does not get clobbered in this case, and simply adding #4 to the stack pointer is sufficient, and avoids a memory access that may take a few cycles to resolve depending on the micro-architecture.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
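For illustration (schematic only, not the patching code itself): the two-instruction hook and its NOPed form. The saved LR stays on the stack either way, so discarding it with an ADD avoids the load that a POP implies.

    void example_traced_function(void)
    {
            /* as emitted by the compiler with -pg (__gnu_mcount_nc ABI): */
            asm volatile(
                    "push   {lr}                    \n\t"
                    "bl     __gnu_mcount_nc         \n\t");

            /* after dynamic ftrace patches the BL into a 'NOP': */
            asm volatile(
                    "push   {lr}                    \n\t"
                    "add    sp, sp, #4              \n\t"); /* was: pop {lr} */
    }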
-
Ard Biesheuvel authored
Using ADR to take the address of 'ftrace_stub' via a local label produces an address that has the Thumb bit cleared, which means the subsequent comparison is guaranteed to fail. Instead, use the badr macro, which forces the Thumb bit to be set.

Fixes: a3ba87a6 ("ARM: 6316/1: ftrace: add Thumb-2 support")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
-
- 31 Jan, 2022 2 commits
-
Russell King (Oracle) authored
Merge tag 'arm-vmap-stacks-v6' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable

ARM: support for IRQ and vmap'ed stacks [v6]

This tag covers the changes between the version of vmap'ed + IRQ stacks support pulled into rmk/devel-stable [0] (which was dropped from v5.17 due to issues discovered too late in the cycle), and my v5 proposed for the v5.18 cycle [1].

[0] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git arm-irq-and-vmap-stacks-for-rmk
[1] https://lore.kernel.org/linux-arm-kernel/20220124174744.1054712-1-ardb@kernel.org/
-
Ard Biesheuvel authored
The get_current() and __my_cpu_offset() accessors evaluate to only a single instruction emitted inline, but due to the size of the asm string that is created for SMP+v6 configurations, the compiler assumes otherwise, and may emit the functions out of line instead. So use __always_inline to avoid this.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
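A simplified sketch of the pattern (the real accessor carries runtime-patching alternatives that inflate the asm string, of which only one instruction survives; here only the plain TPIDRURO read is shown):

    /* forced inline: the long asm string makes the inliner over-estimate size */
    static __always_inline struct task_struct *get_current(void)
    {
            struct task_struct *cur;

            /* read the task pointer from TPIDRURO */
            asm("mrc p15, 0, %0, c13, c0, 3" : "=r" (cur));
            return cur;
    }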
-
- 25 Jan, 2022 5 commits
-
Ard Biesheuvel authored
Only SMP systems use the secondary startup path by definition, so there is no need for SMP conditionals there.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
Ard Biesheuvel authored
The build bots complain about iop_handle_irq() not being declared, so let's make it static instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
Ard Biesheuvel authored
Rework the vmalloc_seq handling so it can be used safely under SMP, as we started using it to ensure that vmap'ed stacks are guaranteed to be mapped by the active mm before switching to a task. Here, we need to ensure that changes to the page tables are visible to other CPUs when they observe a change in the sequence count.

Since LPAE needs none of this, fold a check against it into the vmalloc_seq counter check after breaking it out into a separate static inline helper.

Given that vmap'ed stacks are now also supported on !SMP configurations, let's drop the WARN() that could potentially now fire spuriously.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
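A hedged sketch of such a helper, assuming the sequence counter lives in mm->context and is an atomic_t (names approximate):

    static inline void check_vmalloc_seq(struct mm_struct *mm)
    {
            /* LPAE shares the kernel page tables, so no sync is needed */
            if (!IS_ENABLED(CONFIG_ARM_LPAE) &&
                unlikely(atomic_read(&mm->context.vmalloc_seq) !=
                         atomic_read(&init_mm.context.vmalloc_seq)))
                    __check_vmalloc_seq(mm);  /* copy over kernel mappings */
    }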
-
Ard Biesheuvel authored
Avoid using R9 in the IRQ handler code, as the entry code uses it for tsk, and expects it to remain untouched between the IRQ entry and exit code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
Ard Biesheuvel authored
Use the SMP_ON_UP patching framework to elide HWCAP_TLS tests from the context switch and return to userspace code paths, as SMP systems are guaranteed to have this h/w capability.

At the same time, omit the update of __entry_task if the system is detected to be UP at runtime, as in that case, the value is never used.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
- 24 Jan, 2022 2 commits
-
Ard Biesheuvel authored
Nathan reports that the group relocations go out of range in pathological cases such as allyesconfig kernels, which have little chance of actually booting but are still used in validation. So add a Kconfig symbol for this feature, and make it depend on !COMPILE_TEST.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
Ard Biesheuvel authored
When onlining a CPU, switch to swapper_pg_dir as soon as possible so that it is guaranteed that the vmap'ed stack is mapped before it is used.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-
- 06 Jan, 2022 1 commit
-
Ard Biesheuvel authored
Nathan reports that the new get_current() and per-CPU offset accessors may cause problems at build time due to the use of a literal to hold the address of the respective variables. This is due to the fact that LLD before v14 does not support the PC-relative group relocations that are normally used for this, and the fallback relies on literals but does not emit the literal pools explicitly using the .ltorg directive.

    ./arch/arm/include/asm/current.h:53:6: error: out of range pc-relative fixup value
      asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur));
          ^
    ./arch/arm/include/asm/insn.h:25:2: note: expanded from macro 'LOAD_SYM_ARMV6'
      " ldr " #reg ", =" #sym "\n\t"
      ^
    <inline asm>:1:3: note: instantiated into assembly here
      ldr r0, =__current
      ^

Since emitting a literal pool in this particular case is not possible, let's avoid the LOAD_SYM_ARMV6() entirely, and use the ordinary C assignment instead.

As it turns out, there are other such cases, and here, using .ltorg to emit the literal pool within range of the LDR instruction would be possible due to the presence of an unconditional branch right after it. Unfortunately, putting .ltorg directives in subsections appears to confuse the Clang inline assembler, resulting in similar errors even though the .ltorg is most definitely within range.

So let's fix this by emitting the literal explicitly, and not rely on the assembler to figure this out. This means we have to move the fallback out of the LOAD_SYM_ARMV6() macro and into the callers.

Link: https://github.com/ClangBuiltLinux/linux/issues/1551
Fixes: 9c46929e ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
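The "emit the literal explicitly" trick can be sketched as follows (hedged: simplified relative to the real macros, using __current from this series as the example symbol): place the word holding the symbol's address right after the LDR, and branch over it.

    static inline struct task_struct *get_current_fallback(void)
    {
            struct task_struct *cur;

            asm(
            "       ldr     %0, 1f          \n\t"   /* load &__current from the literal */
            "       ldr     %0, [%0]        \n\t"   /* then load the task pointer */
            "       b       2f              \n\t"   /* skip over the literal word */
            "1:     .long   __current       \n\t"   /* explicitly emitted literal */
            "2:                             \n\t"
            : "=r" (cur));
            return cur;
    }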
-
- 05 Jan, 2022 1 commit
-
Ard Biesheuvel authored
There are several reports about the new vmap'ed stacks code breaking suspend/resume on Exynos, Renesas and Tegra SMP platforms. While this is under investigation, let's disable the vmap'ed stacks feature for the time being for SMP configurations that have suspend/resume enabled.

[0] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-8-ardb@kernel.org/

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
-
- 17 Dec, 2021 1 commit
-
Russell King (Oracle) authored
Merge tag 'arm-irq-and-vmap-stacks-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable

ARM: support for IRQ and vmap'ed stacks

This PR covers all the work related to implementing IRQ stacks and vmap'ed stacks for all 32-bit ARM systems that are currently supported by the Linux kernel, including RiscPC and Footbridge. It has been submitted for review in three different waves:
- IRQ stacks support for v7 SMP systems [0],
- vmap'ed stacks support for v7 SMP systems [1],
- extending support for both IRQ stacks and vmap'ed stacks for all remaining configurations, including v6/v7 SMP multiplatform kernels and uniprocessor configurations including v7-M [2]

[0] https://lore.kernel.org/linux-arm-kernel/20211115084732.3704393-1-ardb@kernel.org/
[1] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-1-ardb@kernel.org/
[2] https://lore.kernel.org/linux-arm-kernel/20211206164659.1495084-1-ardb@kernel.org/
-
- 06 Dec, 2021 8 commits
-
Ard Biesheuvel authored
Enable support for IRQ stacks on !MMU, and add the code to the IRQ entry path to switch to the IRQ stack if not running from it already.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
On UP systems, only a single task can be 'current' at the same time, which means we can use a global variable to track it. This means we can also enable THREAD_INFO_IN_TASK for those systems, as in that case, thread_info is accessed via current rather than the other way around, removing the need to store thread_info at the base of the task stack. This, in turn, permits us to enable IRQ stacks and vmap'ed stacks on UP systems as well.

To partially mitigate the performance overhead of this arrangement, use an ADD/ADD/LDR sequence with the appropriate PC-relative group relocations to load the value of current when needed. This means that accessing current will still only require a single load as before, avoiding the need for a literal to carry the address of the global variable in each function. However, accessing thread_info will now require this load as well.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
Defer TPIDRURO updates for user space until exit also for CPU_V6+SMP configurations, so that we can decide at runtime whether to use it to carry the current pointer, provided that we are running on a CPU that actually implements this register. This is needed for THREAD_INFO_IN_TASK support for UP systems, which requires that all SMP capable systems use the TPIDRURO based access to 'current', as the only remaining alternative will be a global variable which only works on UP.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
Enable the use of the TLS register to hold the 'current' pointer also on non-SMP configurations that target v6k or later CPUs. This will permit the use of THREAD_INFO_IN_TASK as well as IRQ stacks and vmap'ed stacks for such configurations.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
Permit the use of the TPIDRPRW system register for carrying the per-CPU offset in generic SMP configurations that also target non-SMP capable ARMv6 cores. This uses the SMP_ON_UP code patching framework to turn all TPIDRPRW accesses into reads/writes of entry #0 in the __per_cpu_offset array.

While at it, switch over some existing direct TPIDRPRW accesses in asm code to invocations of a new helper that is patched in the same way when necessary.

Note that CPU_V6+SMP without SMP_ON_UP results in a kernel that does not boot on v6 CPUs without SMP extensions, so add this dependency to Kconfig as well.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
We will be adding variable loads to various hot paths, so it makes sense to add a helper macro that can load variables from asm code without the use of literal pool entries. On v7 or later, we can simply use MOVW/MOVT pairs, but on earlier cores, this requires a bit of hackery to emit an instruction sequence that implements this using a sequence of ADD/LDR instructions.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
Add support for the R_ARM_ALU_PC_Gn_NC and R_ARM_LDR_PC_G2 group relocations [0] so we can use them in modules. These will be used to load the current task pointer from a global variable without having to rely on a literal pool entry to carry the address of this variable, which may have a significant negative impact on cache utilization for variables that are used often and in many different places, as each occurrence will result in a literal pool entry and therefore a line in the D-cache.

[0] 'ELF for the ARM architecture'
    https://github.com/ARM-software/abi-aa/releases

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-
Ard Biesheuvel authored
Tweak the UP stack protector handling code so that the thread info pointer is preserved in R7 until set_current is called. This is needed for a subsequent patch that implements THREAD_INFO_IN_TASK and set_current for UP as well.

This also means we will prefer the per-task protector on UP systems that implement the thread ID registers, so tweak the preprocessor conditionals to reflect this.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
-