    arm64: atomics: lse: define RETURN ops in terms of FETCH ops · 053f58ba
    Mark Rutland authored
    The FEAT_LSE atomic instructions include LD* instructions which return
    the original value of a memory location, and these can be used to
    directly implement FETCH operations. Each RETURN op is implemented as
    a copy of the corresponding FETCH op with a trailing instruction to
    generate the new value of the memory location. We only directly
    implement *_fetch_add*(), for which we have a trailing `add`
    instruction.
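
    For reference, the two shapes look roughly as below. This is a
    simplified, standalone sketch, not the kernel's macro-generated code:
    the sketch_* names are invented, a plain `int *` stands in for
    `atomic_t *`, and only the fully-ordered `ldaddal` form is shown.

    	/* FETCH op: LDADDAL atomically adds and returns the old value. */
    	static inline int sketch_fetch_add(int i, int *p)
    	{
    		int old;

    		asm volatile("ldaddal	%w[i], %w[old], %[v]"
    			     : [old] "=r" (old), [v] "+Q" (*p)
    			     : [i] "r" (i)
    			     : "memory");
    		return old;
    	}

    	/* RETURN op, old style: the trailing `add` lives inside the asm
    	 * block, where the compiler cannot reason about it. */
    	static inline int sketch_add_return_old(int i, int *p)
    	{
    		int tmp;

    		asm volatile("ldaddal	%w[i], %w[tmp], %[v]\n"
    			     "	add	%w[i], %w[i], %w[tmp]"
    			     : [i] "+r" (i), [v] "+Q" (*p), [tmp] "=&r" (tmp)
    			     :
    			     : "memory");
    		return i;
    	}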
    
    As the compiler has no visibility of the `add`, this leads to
    sub-optimal code generation when consuming the result.
    
    For example, the compiler cannot constant-fold the addition into later
    operations, and currently GCC 11.1.0 will compile:
    
           return __lse_atomic_sub_return(1, v) == 0;
    
    As:
    
    	mov     w1, #0xffffffff
    	ldaddal w1, w2, [x0]
    	add     w1, w1, w2
    	cmp     w1, #0x0
    	cset    w0, eq  // eq = none
    	ret
    
    This patch improves this by replacing the `add` with C addition after
    the inline assembly block, e.g.
    
    	ret += i;
    
    This gives the compiler visibility of `i`, and permits it to merge
    the `add` and `cmp` for the above, e.g.
    
    	mov     w1, #0xffffffff
    	ldaddal w1, w1, [x0]
    	cmp     w1, #0x1
    	cset    w0, eq  // eq = none
    	ret
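
    Once the FETCH op is inlined, the resulting shape of a RETURN op is
    roughly as follows (again an invented standalone sketch, not the
    kernel's actual code):

    	static inline int sketch_add_return_new(int i, int *p)
    	{
    		int ret;

    		asm volatile("ldaddal	%w[i], %w[ret], %[v]"
    			     : [ret] "=r" (ret), [v] "+Q" (*p)
    			     : [i] "r" (i)
    			     : "memory");

    		ret += i;	/* now visible to the compiler */
    		return ret;
    	}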
    
    With this change the assembly for each RETURN op is identical to the
    corresponding FETCH op (including barriers and clobbers), so I've
    removed the inline assembly and rewritten each RETURN op in terms of
    the corresponding FETCH op, e.g.
    
    | static inline int __lse_atomic_add_return(int i, atomic_t *v)
    | {
    |       return __lse_atomic_fetch_add(i, v) + i;
    | }
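
    To see the whole construction end to end in a portable form, it can
    be modelled with the compiler's __atomic builtins in place of the
    kernel's LSE assembly. The model_* names below are invented for
    illustration; on arm64 with LSE enabled, __atomic_fetch_add compiles
    down to an ldadd* instruction:

    	#include <stdio.h>

    	static inline int model_fetch_add(int i, int *v)
    	{
    		return __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
    	}

    	/* RETURN op in terms of the FETCH op: the compiler sees '+ i'. */
    	static inline int model_add_return(int i, int *v)
    	{
    		return model_fetch_add(i, v) + i;
    	}

    	static inline int model_sub_return(int i, int *v)
    	{
    		return model_add_return(-i, v);
    	}

    	int main(void)
    	{
    		int v = 1;

    		/* As in the motivating example: the compiler is free to
    		 * fold the trailing addition into the comparison. */
    		printf("%d\n", model_sub_return(1, &v) == 0);	/* prints 1 */
    		return 0;
    	}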
    
    The new construction does not adversely affect the common case, and
    before and after this patch GCC 11.1.0 can compile:
    
    	__lse_atomic_add_return(i, v)
    
    As:
    
    	ldaddal w0, w2, [x1]
    	add     w0, w0, w2
    
    ... while having the freedom to do better elsewhere.
    
    This is intended as an optimization and cleanup.
    There should be no functional change as a result of this patch.
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Will Deacon <will@kernel.org>
    Acked-by: Will Deacon <will@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>