arm64: percpu: Fix LSE implementation of value-returning pcpu atomics

Commit 959bf2fd ("arm64: percpu: Rewrite per-cpu ops to allow use of LSE
atomics") introduced alternative code sequences for the arm64 percpu
atomics, so that the LSE instructions can be patched in at runtime if
they are supported by the CPU.

Unfortunately, when patching in the LSE sequence for a value-returning
pcpu atomic, the argument registers are the wrong way round. The
implementation of this_cpu_add_return() therefore ends up adding
uninitialised stack to the percpu variable and returning garbage.

As it turns out, there aren't very many users of the value-returning
percpu atomics in mainline and we only spotted this due to a failure in
the kprobes selftests. In this case, when attempting to single-step over
the out-of-line instruction slot, the debug monitors would not be
enabled because calling this_cpu_inc_return() on the kernel debug
monitor refcount would fail to detect the transition from 0. We would
consequently execute past the slot and take an undefined instruction
exception from the kernel, resulting in a BUG:

 | kernel BUG at arch/arm64/kernel/traps.c:421!
 | PREEMPT SMP
 | pc : do_undefinstr+0x268/0x278
 | lr : do_undefinstr+0x124/0x278
 | Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
 | Call trace:
 |  do_undefinstr+0x268/0x278
 |  el1_undef+0x10/0x78
 |  0xffff00000803c004
 |  init_kprobes+0x150/0x180
 |  do_one_initcall+0x74/0x178
 |  kernel_init_freeable+0x188/0x224
 |  kernel_init+0x10/0x100
 |  ret_from_fork+0x10/0x1c

Fix the argument order to get the value-returning pcpu atomics working
correctly when implemented using the LSE instructions.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
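
For illustration only, the failure mode described above can be modelled in
portable C11. This is a minimal sketch, not the arm64 inline assembly: the
names monitor_refcount, add_return(), add_return_swapped() and
monitors_get() are hypothetical, and the "junk" parameter stands in for the
unrelated value that ends up being added when the operands are swapped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel debug monitor refcount. */
static atomic_int monitor_refcount;

/* Intended semantics: add 'val' to '*ptr' and return the new value. */
static int add_return(atomic_int *ptr, int val)
{
	return atomic_fetch_add(ptr, val) + val;
}

/*
 * Model of the swapped-operand bug (assumption, simplified): the addend
 * is taken from an unrelated value 'junk' rather than from 'val'.
 */
static int add_return_swapped(atomic_int *ptr, int val, int junk)
{
	(void)val;	/* the requested addend is never used */
	return atomic_fetch_add(ptr, junk) + junk;
}

/*
 * Returns true when the caller took the first reference, i.e. the
 * refcount moved from 0 to 1 and the monitors must be enabled.
 */
static bool monitors_get(bool buggy, int junk)
{
	int count = buggy ? add_return_swapped(&monitor_refcount, 1, junk)
			  : add_return(&monitor_refcount, 1);

	return count == 1;
}
```

With the correct helper, the first monitors_get() call on a zero refcount
reports the 0 to 1 transition. With the swapped model and any junk value
other than 1, the transition is missed, which corresponds to the debug
monitors never being enabled.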