    compile: prefer an AND instead of SHR+SHL instructions · 9ec7074a
    Martin Möhrmann authored
    On modern 64-bit CPUs, a SHR, SHL or AND instruction takes 1 cycle to execute.
    A pair of shifts that operate on the same register takes 2 cycles: the first
    shift has to wait for the input register value to be available, and the
    second has to wait for the result of the first.
    
    Large constants used to mask the high bits of a register with an AND
    instruction cannot be encoded as an immediate in the AND instruction
    on amd64 (the immediate of a 64-bit AND is a 32-bit value sign-extended
    to 64 bits) and therefore need to be loaded into a register with a MOV
    instruction.
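    As a rough illustration, assuming only the encoding rule that the immediate
    of a 64-bit AND on amd64 is a 32-bit value sign-extended to 64 bits, the
    following Go sketch checks which masks could be used directly as an
    immediate. The function name fitsAndImm32 is hypothetical and not part of
    the Go compiler.

```go
package main

import "fmt"

// fitsAndImm32 (hypothetical helper) reports whether c could be used directly
// as the immediate of a 64-bit AND on amd64. The 32-bit immediate is
// sign-extended to 64 bits, so c is encodable only if sign-extending its low
// 32 bits reproduces all 64 bits.
func fitsAndImm32(c uint64) bool {
	return int64(c) == int64(int32(c))
}

func main() {
	masks := []uint64{
		0xFF,               // small low-bit mask: fits
		0xFFFFFFFF80000000, // sign extension of 0x80000000: fits
		1 << 63,            // the mask from the example in this message: does not fit
		0xFFFFFFFF,         // zero-extended 32-bit mask: does not fit
	}
	for _, m := range masks {
		fmt.Printf("%#x encodable as imm32: %v\n", m, fitsAndImm32(m))
	}
}
```

    Under that rule, 1 << 63 does not fit, which is why the AND variant in the
    example later in this message first materializes the mask with a MOVQ.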
    
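    However, that MOV instruction does not read any register, neither the
    output register nor the value being masked, so it can execute before the
    value to be masked is available, and on many CPUs it does not compete with
    the AND or shift instructions for execution ports.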
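    The latency argument can be made concrete with a toy critical-path model.
    It is a sketch under simplifying assumptions, not a model of any real CPU:
    every instruction has a 1-cycle latency, execution ports are unlimited, and
    the value to be masked is produced by earlier work and becomes ready at a
    chosen cycle. The type instr and the function finish are hypothetical.

```go
package main

import "fmt"

// instr is a hypothetical, simplified instruction: a name, a latency in
// cycles, and its inputs. A dependency of -1 means "the value being masked";
// any other value is the index of an earlier instruction in the sequence.
type instr struct {
	name    string
	latency int
	deps    []int
}

// finish returns the cycle at which the last instruction's result is ready,
// assuming unlimited execution ports so each instruction starts as soon as
// all of its inputs are ready.
func finish(prog []instr, inputReady int) int {
	done := make([]int, len(prog))
	for i, in := range prog {
		start := 0 // an instruction with no inputs can start immediately
		for _, d := range in.deps {
			ready := inputReady // d == -1: the value being masked
			if d >= 0 {
				ready = done[d]
			}
			if ready > start {
				start = ready
			}
		}
		done[i] = start + in.latency
	}
	return done[len(prog)-1]
}

func main() {
	const inputReady = 5 // hypothetical: x is produced by earlier work at cycle 5
	shifts := []instr{
		{"SHRQ $0x3f, DX", 1, []int{-1}},
		{"SHLQ $0x3f, DX", 1, []int{0}},
	}
	movAnd := []instr{
		{"MOVQ $0x8000000000000000, AX", 1, nil}, // reads no register
		{"ANDQ DX, AX", 1, []int{-1, 0}},
	}
	fmt.Println("SHR+SHL result ready at cycle", finish(shifts, inputReady)) // 7
	fmt.Println("MOV+AND result ready at cycle", finish(movAnd, inputReady)) // 6
}
```

    With the input ready at cycle 5, the shift pair finishes at cycle 7 while
    MOV+AND finishes at cycle 6, because the MOV has no inputs and completes
    long before the AND needs it. If the input were ready at cycle 0, both would
    finish at cycle 2 in this model, so the gain comes from the MOV executing
    off the critical path.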
    
    Masking the high bits of a register with a pair of shifts instead of an
    AND has a shorter encoding and uses one fewer general-purpose register,
    but it is slower because it takes one clock cycle longer, unless register
    pressure would force the AND variant to generate a spill.
    
    For example, the instructions emitted for (x & 1 << 63) before this CL are:
    48c1ea3f                SHRQ $0x3f, DX
    48c1e23f                SHLQ $0x3f, DX
    
    After this CL, the instructions are the same as those emitted by GCC and LLVM:
    48b80000000000000080    MOVQ $0x8000000000000000, AX
    4821d0                  ANDQ DX, AX
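    The difference in encoding size mentioned earlier can be checked by counting
    the bytes in this listing. This small Go sketch (the helper encodedSize is
    hypothetical) decodes the hex columns with encoding/hex: the shift pair takes
    8 bytes and uses only DX, while MOVQ+ANDQ takes 13 bytes and also needs AX.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodedSize (hypothetical helper) returns the number of bytes in a
// hex-encoded instruction as printed in the disassembly listing.
func encodedSize(hexBytes string) int {
	b, err := hex.DecodeString(hexBytes)
	if err != nil {
		panic(err)
	}
	return len(b)
}

// totalSize sums the encoded sizes of an instruction sequence.
func totalSize(seq []string) int {
	n := 0
	for _, s := range seq {
		n += encodedSize(s)
	}
	return n
}

func main() {
	before := []string{"48c1ea3f", "48c1e23f"}           // SHRQ, SHLQ
	after := []string{"48b80000000000000080", "4821d0"} // MOVQ, ANDQ
	fmt.Println("SHR+SHL bytes:", totalSize(before))    // 8
	fmt.Println("MOV+AND bytes:", totalSize(after))     // 13
}
```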
    
    Some platforms, such as arm64, already have SSA optimization rules that fuse
    two shift instructions back into an AND.
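    The arm64 rules themselves are not reproduced here. As a sketch of the
    identity such a fusion relies on: for an unsigned 64-bit x and a shift
    count c from 0 to 63, (x >> c) << c equals x & (^uint64(0) << c). The
    following Go code checks this on pseudo-random inputs; the names
    shiftPairMask and equivalent are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shiftPairMask (hypothetical) returns the AND mask equivalent to
// (x >> c) << c for an unsigned 64-bit x: the shifts clear the low c bits
// and keep all higher bits.
func shiftPairMask(c uint) uint64 {
	return ^uint64(0) << c
}

// equivalent checks the identity on n pseudo-random inputs for every
// shift count from 0 to 63, using a fixed seed for reproducibility.
func equivalent(n int) bool {
	r := rand.New(rand.NewSource(1))
	for c := uint(0); c < 64; c++ {
		m := shiftPairMask(c)
		for i := 0; i < n; i++ {
			x := r.Uint64()
			if (x>>c)<<c != x&m {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Printf("mask for c=63: %#x\n", shiftPairMask(63)) // 0x8000000000000000
	fmt.Println("identity holds:", equivalent(1000))
}
```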
    
    Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
    
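    package bench // hypothetical package name; the benchmark belongs in a _test.go file
    
    import "testing"
    
    var GlobalU uint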
    
    func BenchmarkAndHighBits(b *testing.B) {
    	x := uint(0)
    	for i := 0; i < b.N; i++ {
    		x &= 1 << 63
    	}
    	GlobalU = x
    }
    
    amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
    name           old time/op  new time/op  delta
    AndHighBits-4  0.61ns ± 6%  0.42ns ± 6%  -31.42%  (p=0.000 n=25+25)
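    As a quick arithmetic check, the relative change can be recomputed from the
    rounded means in the table. This sketch assumes the delta is
    (new - old) / old; the hypothetical helper percentDelta gives about
    -31.15%, slightly different from the reported -31.42%, presumably because
    the reported figure was computed from unrounded means.

```go
package main

import "fmt"

// percentDelta (hypothetical helper) returns the relative change from oldV
// to newV, in percent.
func percentDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

func main() {
	// Rounded means from the benchmark table (ns/op).
	fmt.Printf("%.2f%%\n", percentDelta(0.61, 0.42)) // -31.15%
}
```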
    
    Updates #33826
    Updates #32781
    
    Change-Id: I862d3587446410c447b9a7265196b57f85358633
    Reviewed-on: https://go-review.googlesource.com/c/go/+/191780
    Run-TryBot: Martin Möhrmann <moehrmann@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>