1. 21 Sep, 2019 3 commits
    • compile: prefer an AND instead of SHR+SHL instructions · 4e2b84ff
      Martin Möhrmann authored
      On modern 64-bit CPUs a SHR, SHL or AND instruction takes 1 cycle to execute.
      A pair of shifts that operate on the same register will take 2 cycles
      and needs to wait for the input register value to be available.
      
      Large constants used to mask the high bits of a register with an AND
      instruction cannot be encoded as an immediate in the AND instruction
      on amd64 and therefore need to be loaded into a register with a MOV
      instruction.
      
      However that MOV instruction is not dependent on the output register and
      on many CPUs does not compete with the AND or shift instructions for
      execution ports.
      
      Masking the high bits of a register with a pair of shifts instead of
      an AND has a shorter encoding and uses one fewer general purpose
      register, but it takes one clock cycle longer unless register pressure
      would force the AND variant to generate a spill.
      
      For example the instructions emitted for (x & (1 << 63)) before this CL are:
      48c1ea3f                SHRQ $0x3f, DX
      48c1e23f                SHLQ $0x3f, DX
      
      after this CL the instructions are the same as GCC and LLVM use:
      48b80000000000000080    MOVQ $0x8000000000000000, AX
      4821d0                  ANDQ DX, AX
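The equivalence that the old rewrite relied on can be sketched in Go as follows. This is only an illustration: the function names are hypothetical and not part of this CL, and which instructions the compiler emits for either form depends on the toolchain version and target.

```go
package main

import "fmt"

// maskHighShifts keeps only bit 63 by shifting right and then left,
// the source-level shape of the SHR+SHL pair. (Hypothetical name.)
func maskHighShifts(x uint64) uint64 {
	return (x >> 63) << 63
}

// maskHighAnd keeps only bit 63 with a single AND against a constant.
// Note the parentheses: in Go, & and << share the same precedence,
// so x & 1 << 63 would parse as (x & 1) << 63. (Hypothetical name.)
func maskHighAnd(x uint64) uint64 {
	return x & (1 << 63)
}

func main() {
	inputs := []uint64{0, 1, 1 << 63, ^uint64(0), 0x8000000000000001}
	for _, x := range inputs {
		fmt.Printf("%#016x: shifts=%#x and=%#x\n", x, maskHighShifts(x), maskHighAnd(x))
	}
}
```

Both functions return the same value for every input, which is why either instruction sequence is a valid lowering of the mask.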
      
      Some platforms such as arm64 already have SSA optimization rules to fuse
      two shift instructions back into an AND.
      
      Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
      
          var GlobalU uint
      
          func BenchmarkAndHighBits(b *testing.B) {
              x := uint(0)
              for i := 0; i < b.N; i++ {
                      x &= 1 << 63
              }
              GlobalU = x
          }
      
      amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
      name           old time/op  new time/op  delta
      AndHighBits-4  0.61ns ± 6%  0.42ns ± 6%  -31.42%  (p=0.000 n=25+25)
      
      'go run run.go -all_codegen -v codegen' passes with the following adjustments:
      
      ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
             since ORshiftRL generation fusing '>> rc' and '|' interferes
             with matching ((x << lc) >> rc) to generate UBFX. Previously
             ORshiftLL was created first using the shifts generated for (y & ac).
      
      S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.
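The source-level shapes behind the two adjustments above can be sketched in Go. Everything here is an assumption made for illustration: the function names are hypothetical, the shift amounts lc = 12 and rc = 24 are chosen arbitrarily, and the sketch only checks that the shift and mask forms compute the same values. It does not show which instructions any backend emits.

```go
package main

import (
	"fmt"
	"math"
)

// insertLowShifts has the shape ((x << lc) >> rc) | (y & ac) with
// lc = 12 and rc = 24: bits 12..51 of x land in the low 40 bits of
// the result and the high 24 bits come from y. (Hypothetical name.)
func insertLowShifts(x, y uint64) uint64 {
	return (x << 12 >> 24) | (y & 0xffffff0000000000)
}

// insertLowMasks computes the same result with an explicit field mask.
func insertLowMasks(x, y uint64) uint64 {
	const lowMask = 1<<40 - 1
	return ((x >> 12) & lowMask) | (y &^ lowMask)
}

// absBits clears the sign bit of a float64 with an AND-style mask.
func absBits(f float64) float64 {
	return math.Float64frombits(math.Float64bits(f) &^ (1 << 63))
}

// copysignBits combines the magnitude of x with the sign bit of y,
// selecting each part with a mask rather than with shifts.
func copysignBits(x, y float64) float64 {
	const sign = 1 << 63
	return math.Float64frombits(math.Float64bits(x)&^sign | math.Float64bits(y)&sign)
}

func main() {
	x, y := uint64(0x0123456789abcdef), uint64(0xfedcba9876543210)
	fmt.Printf("shifts=%#x masks=%#x\n", insertLowShifts(x, y), insertLowMasks(x, y))
	fmt.Println(absBits(-1.5), copysignBits(2, -1))
}
```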
      
      Updates #33826
      Updates #32781
      
      Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb
      Reviewed-on: https://go-review.googlesource.com/c/go/+/194297
      Run-TryBot: Martin Möhrmann <martisch@uos.de>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • test/codegen: fix wasm codegen breakage · ecc7dd54
      Agniva De Sarker authored
      i32.eqz instructions don't appear unless needed in if conditions anymore
      after CL 195204. I forgot to run the codegen tests while submitting the CL.
      
      Thanks to @martisch for catching it.
      
      Fixes #34442
      
      Change-Id: I177b064b389be48e39d564849714d7a8839be13e
      Reviewed-on: https://go-review.googlesource.com/c/go/+/196580
      Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Martin Möhrmann <moehrmann@google.com>
    • cmd/compile: optimize ssa if blocks for wasm architecture · 9c384cc5
      Agniva De Sarker authored
      Check which block comes next and place the successor blocks accordingly.
      This saves an additional jump instruction if the next block is one of
      the successor blocks.
      
      While at it, inline the logic of goToBlock.
      
      Reduces the size of pkg/js_wasm by 264 bytes.
      
      Change-Id: I671ac4322e6edcb0d7e590dcca27e074268068d5
      Reviewed-on: https://go-review.googlesource.com/c/go/+/195204
      Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Richard Musiol <neelance@gmail.com>
  2. 20 Sep, 2019 6 commits
  3. 19 Sep, 2019 12 commits
  4. 18 Sep, 2019 19 commits