    cmd/compile: optimize math/bits Len32 intrinsic on arm64 · dd91269b
    erifan01 authored
    Arm64 has a 32-bit count-leading-zeros instruction, CLZW, which can be used
    for the Len32 intrinsic. LeadingZeros32 calls Len32, so with this change the
    assembly generated for LeadingZeros32 is shorter: the SUB that followed the
    64-bit CLZ is no longer needed.
    
    Go code:
    
    func f32(x uint32) { z = bits.LeadingZeros32(x) }
    
    Before:
    
    "".f32 STEXT size=32 args=0x8 locals=0x0 leaf
            0x0000 00000 (test.go:7)        TEXT    "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
            0x0004 00004 (test.go:7)        MOVWU   "".x(FP), R0
            0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ     R0, R0
            0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB     $32, R0, R0
            0x0010 00016 (test.go:7)        MOVD    R0, "".z(SB)
            0x001c 00028 (test.go:7)        RET     (R30)
    
    After:
    
    "".f32 STEXT size=32 args=0x8 locals=0x0 leaf
            0x0000 00000 (test.go:7)        TEXT    "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
            0x0004 00004 (test.go:7)        MOVWU   "".x(FP), R0
            0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW    R0, R0
            0x000c 00012 (test.go:7)        MOVD    R0, "".z(SB)
            0x0018 00024 (test.go:7)        RET     (R30)
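
    One way to read the two listings, as a sketch rather than a description
    of the compiler's rewrite rules: the old sequence zero-extends x to 64
    bits, counts leading zeros with the 64-bit CLZ, and subtracts the 32
    extra zeros introduced by the extension, while CLZW counts within 32
    bits directly. The function names clz32Via64 and clz32Direct are
    hypothetical and model the two sequences in plain Go.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clz32Via64 models the "Before" sequence (hypothetical name):
// zero-extend to 64 bits (MOVWU), count leading zeros in 64 bits (CLZ),
// then subtract the 32 zeros added by the extension (SUB $32).
func clz32Via64(x uint32) int {
	return bits.LeadingZeros64(uint64(x)) - 32
}

// clz32Direct models the "After" sequence (hypothetical name):
// a single 32-bit count of leading zeros (CLZW).
func clz32Direct(x uint32) int {
	return bits.LeadingZeros32(x)
}

func main() {
	// Both models should agree for every input, including zero.
	for _, x := range []uint32{0, 1, 0x00FF0000, 0x80000000} {
		fmt.Printf("%#08x via64=%d direct=%d\n", x, clz32Via64(x), clz32Direct(x))
	}
}
```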
    
    Benchmarks:
    name              old time/op  new time/op  delta
    LeadingZeros-8    2.53ns ± 0%  2.55ns ± 0%   +0.67%  (p=0.000 n=10+10)
    LeadingZeros8-8   3.56ns ± 0%  3.56ns ± 0%     ~     (all equal)
    LeadingZeros16-8  3.55ns ± 0%  3.56ns ± 0%     ~     (p=0.465 n=10+10)
    LeadingZeros32-8  3.55ns ± 0%  2.96ns ± 0%  -16.71%  (p=0.000 n=10+7)
    LeadingZeros64-8  2.53ns ± 0%  2.54ns ± 0%     ~     (p=0.059 n=8+10)
    
    Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a
    Reviewed-on: https://go-review.googlesource.com/c/157000
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>