    cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x · b3885dbc
    Michael Munday authored
    Intrinsify these functions to match other platforms. Update the
    sequence of instructions used in the assembly implementations to
    match the intrinsics.
    
    Also, add a micro benchmark so we can more easily measure the
    performance of these two functions:
    
    name            old time/op  new time/op  delta
    And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
    And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
    Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
    Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)
    
    By using a 'rotate then xor selected bits' instruction combined with
    either a 'load and and' or a 'load and or' instruction we can
    implement And8 and Or8 with far fewer instructions. Replacing
    'compare and swap' with atomic instructions may also improve
    performance when there is contention.
    
    Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
    Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
    Run-TryBot: Michael Munday <mike.munday@ibm.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
S390XOps.go 49.7 KB