1. 09 Sep, 2019 2 commits
    • compile: prefer an AND instead of SHR+SHL instructions · 9ec7074a
      Martin Möhrmann authored
      On modern 64-bit CPUs, a SHR, SHL or AND instruction takes 1 cycle to execute.
      A pair of shifts that operate on the same register takes 2 cycles, because
      the second shift has to wait for the result of the first to become available.
      
      Large constants used to mask the high bits of a register with an AND
      instruction cannot be encoded as an immediate in the AND instruction
      on amd64, which accepts at most a sign-extended 32-bit immediate, and
      therefore need to be loaded into a register with a MOV instruction.
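As a rough illustration of which masks need the extra MOV, the check below decides whether a 64-bit mask could be used directly as an ANDQ immediate. It assumes the standard amd64 rule that a 64-bit AND takes at most a 32-bit immediate that is sign-extended to 64 bits; the helper name fitsSignExtendedImm32 is hypothetical and not part of the Go compiler.

```go
package main

import "fmt"

// fitsSignExtendedImm32 reports whether a 64-bit AND mask can be encoded
// directly in an amd64 ANDQ instruction, assuming the immediate operand is
// at most 32 bits and is sign-extended to 64 bits at execution time.
func fitsSignExtendedImm32(mask uint64) bool {
	// Truncate to 32 bits, sign-extend back to 64, and compare.
	return int64(mask) == int64(int32(mask))
}

func main() {
	masks := []uint64{
		0x00000000000000ff, // low byte: fits
		0xffffffffffffff00, // clear low byte: fits (sign-extended from 0xffffff00)
		0x00000000ffffffff, // low 32 bits: does not fit (would sign-extend to all ones)
		0x8000000000000000, // top bit only: does not fit, needs a MOVQ first
	}
	for _, m := range masks {
		fmt.Printf("%#016x encodable as imm32: %v\n", m, fitsSignExtendedImm32(m))
	}
}
```

The 1 << 63 mask from the example falls in the last category, which is why the new code sequence loads it with a 10-byte MOVQ.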
      
      However, that MOV instruction does not depend on the value being masked
      and, on many CPUs, does not compete with the AND or shift instructions
      for execution ports.
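The latency argument can be sketched with a toy dataflow model. It assumes every instruction has a 1-cycle latency and that execution ports are unlimited, which is a deliberate simplification of a real CPU; the types and function names are hypothetical. The model measures how many cycles after the masked value x becomes available the final result is ready. Because the MOV of the constant reads no register, it can complete before x arrives.

```go
package main

import "fmt"

// instr is one node of a toy dataflow graph. deps lists the indices of
// earlier instructions whose results it reads; inputX reports whether it
// also reads the masked value x directly.
type instr struct {
	name   string
	deps   []int
	inputX bool
}

// latencyFromX returns how many cycles after x becomes available the last
// instruction's result is ready, assuming a 1-cycle latency per instruction
// and unlimited execution ports.
func latencyFromX(prog []instr, xReady int) int {
	ready := make([]int, len(prog))
	for i, in := range prog {
		start := 0 // immediate operands are available from cycle 0
		if in.inputX && xReady > start {
			start = xReady
		}
		for _, d := range in.deps {
			if ready[d] > start {
				start = ready[d]
			}
		}
		ready[i] = start + 1
	}
	return ready[len(prog)-1] - xReady
}

func main() {
	shifts := []instr{
		{name: "SHRQ $0x3f, DX", inputX: true},
		{name: "SHLQ $0x3f, DX", deps: []int{0}},
	}
	movAnd := []instr{
		{name: "MOVQ $0x8000000000000000, AX"},
		{name: "ANDQ DX, AX", deps: []int{0}, inputX: true},
	}
	const xReady = 10 // x arrives late, e.g. from the previous loop iteration
	fmt.Println("shift pair:", latencyFromX(shifts, xReady)) // 2
	fmt.Println("MOVQ+ANDQ: ", latencyFromX(movAnd, xReady)) // 1
}
```

In a loop where x is carried from one iteration to the next, this per-iteration difference of one cycle is what the benchmark below exercises.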
      
      Masking the high bits of a register with a pair of shifts instead of an
      AND has a shorter encoding and uses one fewer general-purpose register.
      Unless register pressure would force the AND variant to spill, however,
      the shift pair is slower because it takes one clock cycle longer.
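Both forms compute the same value; only their cost differs. The check below, using hypothetical helper names, generalizes the x & (1 << 63) case to keeping the top k bits of a 64-bit value and compares the shift-pair form against the single AND on random inputs.

```go
package main

import (
	"fmt"
	"math/rand"
)

// keepHighBitsShift keeps the top k bits of x (1 <= k <= 64) using the
// shift-pair form: shift the low bits out, then shift back into place.
func keepHighBitsShift(x uint64, k uint) uint64 {
	s := 64 - k
	return x >> s << s
}

// keepHighBitsAnd keeps the top k bits of x using a single AND with a mask
// whose low 64-k bits are zero.
func keepHighBitsAnd(x uint64, k uint) uint64 {
	mask := ^uint64(0) << (64 - k)
	return x & mask
}

func main() {
	r := rand.New(rand.NewSource(1))
	for i := 0; i < 1000; i++ {
		x := r.Uint64()
		for k := uint(1); k <= 64; k++ {
			if keepHighBitsShift(x, k) != keepHighBitsAnd(x, k) {
				fmt.Println("mismatch", x, k)
				return
			}
		}
	}
	// k = 1 corresponds to x & (1 << 63).
	fmt.Println(keepHighBitsAnd(^uint64(0), 1) == 1<<63) // true
}
```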
      
      For example, the instructions emitted for x & (1 << 63) before this CL are:
      48c1ea3f                SHRQ $0x3f, DX
      48c1e23f                SHLQ $0x3f, DX
      
      After this CL, the instructions are the same as those GCC and LLVM emit:
      48b80000000000000080    MOVQ $0x8000000000000000, AX
      4821d0                  ANDQ DX, AX
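The hex columns in these listings are the raw machine-code bytes. The sketch below (helper names are hypothetical) decodes them to compare code size and to show where the 64-bit constant sits in the MOVQ encoding: one REX.W prefix byte, one B8+r opcode byte, then the immediate in little-endian order.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodedSize returns the total number of bytes in a sequence of
// hex-encoded machine instructions.
func encodedSize(instrs []string) int {
	n := 0
	for _, s := range instrs {
		b, err := hex.DecodeString(s)
		if err != nil {
			panic(err)
		}
		n += len(b)
	}
	return n
}

// movImm64 extracts the immediate from a 10-byte "MOVQ $imm64, reg"
// encoding (REX.W prefix, B8+r opcode, 8 little-endian immediate bytes).
func movImm64(instr string) uint64 {
	b, err := hex.DecodeString(instr)
	if err != nil || len(b) != 10 {
		panic("not a 10-byte MOVQ $imm64 encoding")
	}
	return binary.LittleEndian.Uint64(b[2:])
}

func main() {
	before := []string{"48c1ea3f", "48c1e23f"}          // SHRQ $0x3f, DX; SHLQ $0x3f, DX
	after := []string{"48b80000000000000080", "4821d0"} // MOVQ $0x8000000000000000, AX; ANDQ DX, AX

	fmt.Println("shift pair bytes:", encodedSize(before))   // 8
	fmt.Println("MOVQ+ANDQ bytes:", encodedSize(after))     // 13
	fmt.Printf("MOVQ immediate: %#x\n", movImm64(after[0])) // 0x8000000000000000
}
```

The 8-byte versus 13-byte difference is the "shorter encoding" that the shift pair trades for its extra cycle of latency.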
      
      Some platforms such as arm64 already have SSA optimization rules to fuse
      two shift instructions back into an AND.
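The idea of fusing two shifts back into an AND can be sketched as a tiny peephole pass over a hypothetical, simplified instruction list. This is an illustration of the rewrite only; it is not the Go compiler's actual SSA rule syntax or code, and the op type and fuseShiftPairs function are invented for the example.

```go
package main

import "fmt"

// op is a hypothetical, simplified machine operation on a named register.
type op struct {
	kind string // "SHR", "SHL" or "AND"
	reg  string
	imm  uint64 // shift count for SHR/SHL, mask for AND
}

// fuseShiftPairs rewrites each adjacent "SHR c, r; SHL c, r" pair into a
// single "AND mask, r" that clears the low c bits of r.
func fuseShiftPairs(ops []op) []op {
	var out []op
	for i := 0; i < len(ops); i++ {
		if i+1 < len(ops) &&
			ops[i].kind == "SHR" && ops[i+1].kind == "SHL" &&
			ops[i].reg == ops[i+1].reg && ops[i].imm == ops[i+1].imm &&
			ops[i].imm < 64 {
			out = append(out, op{kind: "AND", reg: ops[i].reg, imm: ^uint64(0) << ops[i].imm})
			i++ // skip the SHL that was folded into the AND
			continue
		}
		out = append(out, ops[i])
	}
	return out
}

func main() {
	prog := []op{{"SHR", "DX", 63}, {"SHL", "DX", 63}}
	for _, o := range fuseShiftPairs(prog) {
		fmt.Printf("%s $%#x, %s\n", o.kind, o.imm, o.reg) // AND $0x8000000000000000, DX
	}
}
```

Pairs whose registers or shift counts differ are left untouched, since they do not form a pure mask.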
      
      Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
      
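      package bench_test // package name is illustrative
      
      import "testing"
      
      // GlobalU is a package-level sink so the loop result is not optimized away.
      var GlobalU uint
      
      func BenchmarkAndHighBits(b *testing.B) {
      	x := uint(0)
      	for i := 0; i < b.N; i++ {
      		x &= 1 << 63 // keep only the top bit of x
      	}
      	GlobalU = x
      }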
      
      amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
      name           old time/op  new time/op  delta
      AndHighBits-4  0.61ns ± 6%  0.42ns ± 6%  -31.42%  (p=0.000 n=25+25)
      
      Updates #33826
      Updates #32781
      
      Change-Id: I862d3587446410c447b9a7265196b57f85358633
      Reviewed-on: https://go-review.googlesource.com/c/go/+/191780
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • syscall: minor cleanup of duplicated code · 844e6423
      Keisuke Kishimoto authored
      Call the Nano methods of Timespec and Timeval in TimespecToNsec and
      TimevalToNsec respectively, instead of duplicating the implementation.
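A sketch of the pattern this cleanup applies, using simplified local types rather than the real syscall package: the conversion functions delegate to the Nano methods so the arithmetic lives in one place. The int64 field types are an assumption for illustration; in the real package they vary by platform.

```go
package main

import "fmt"

// Timespec is a simplified stand-in for syscall.Timespec.
type Timespec struct {
	Sec  int64
	Nsec int64
}

// Timeval is a simplified stand-in for syscall.Timeval.
type Timeval struct {
	Sec  int64
	Usec int64
}

// Nano returns the value of ts in nanoseconds.
func (ts *Timespec) Nano() int64 { return ts.Sec*1e9 + ts.Nsec }

// Nano returns the value of tv in nanoseconds.
func (tv *Timeval) Nano() int64 { return tv.Sec*1e9 + tv.Usec*1000 }

// TimespecToNsec delegates to Nano instead of repeating the arithmetic.
func TimespecToNsec(ts Timespec) int64 { return ts.Nano() }

// TimevalToNsec delegates to Nano instead of repeating the arithmetic.
func TimevalToNsec(tv Timeval) int64 { return tv.Nano() }

func main() {
	fmt.Println(TimespecToNsec(Timespec{Sec: 1, Nsec: 5})) // 1000000005
	fmt.Println(TimevalToNsec(Timeval{Sec: 2, Usec: 3}))   // 2000003000
}
```

With the delegation in place, a fix to the conversion only has to be made in the Nano methods.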
      
      Change-Id: I17551ea54c59c1e45ce472e029c625093a67251a
      GitHub-Last-Rev: fecf43d163f4ebe72e8bb1d3854d4ad962c08b03
      GitHub-Pull-Request: golang/go#33390
      Reviewed-on: https://go-review.googlesource.com/c/go/+/188397
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  2. 08 Sep, 2019 2 commits
  3. 07 Sep, 2019 7 commits
  4. 06 Sep, 2019 18 commits
  5. 05 Sep, 2019 2 commits
  6. 04 Sep, 2019 9 commits