  1. 03 Apr, 2017 1 commit
    • cmd/compile: automatically handle commuting ops in rewrite rules · 53f8a6ae
      Keith Randall authored
      Note that this is a redo of an undo of the original buggy CL 38666.
      
      We have lots of rewrite rules that vary only in the fact that
      we have 2 versions for the 2 different orderings of various
      commuting ops. For example:
      
      (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
      (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
      
      It can get unwieldy quickly, especially when there is more than
      one commuting op in a rule.
      
      Our existing "fix" for this problem is to have rules that
      canonicalize the operations first. For example:
      
      (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
      
      Subsequent rules can then assume if there is a constant arg to Eq64,
      it will be the first one. This fix kinda works, but it is fragile and
      only works when we remember to include the required extra rules.
      
      The fundamental problem is that the rule matcher doesn't
      know anything about commuting ops. This CL fixes that fact.
      
      We already have information about which ops commute. (The register
      allocator takes advantage of commutativity.) The rule generator now
      automatically generates multiple rules for a single source rule when
      there are commutative ops in the rule. We can now drop all of our
      almost-duplicate source-level rules and the canonicalization rules.
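      The expansion the generator performs can be sketched as follows. This is a simplified, hypothetical illustration, not the actual cmd/compile/internal/ssa/gen code; the pattern type and expand function are invented for the sketch:

      ```go
      package main

      import "fmt"

      // pattern is a hypothetical, simplified stand-in for a rewrite-rule
      // pattern: an op with either zero or two argument patterns.
      type pattern struct {
      	op          string
      	commutative bool
      	args        []pattern
      }

      func (p pattern) String() string {
      	if len(p.args) == 0 {
      		return p.op
      	}
      	return fmt.Sprintf("(%s %s %s)", p.op, p.args[0], p.args[1])
      }

      // expand returns every variant of p reachable by swapping the two
      // arguments of each commutative op, recursively.
      func expand(p pattern) []pattern {
      	if len(p.args) != 2 {
      		return []pattern{p}
      	}
      	var out []pattern
      	for _, l := range expand(p.args[0]) {
      		for _, r := range expand(p.args[1]) {
      			out = append(out, pattern{p.op, p.commutative, []pattern{l, r}})
      			if p.commutative {
      				out = append(out, pattern{p.op, p.commutative, []pattern{r, l}})
      			}
      		}
      	}
      	return out
      }

      func main() {
      	rule := pattern{"ADDL", true, []pattern{
      		{op: "x"},
      		{op: "(MOVLconst [c])"},
      	}}
      	for _, v := range expand(rule) {
      		fmt.Println(v)
      	}
      }
      ```

      Because each commutative op doubles the variant count, a rule with several commutative ops multiplies quickly, which is how the 8-way OR rules below ended up producing 128 orderings.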
      
      I have some CLs in progress that will be a lot less verbose when
      the rule generator handles commutativity for me.
      
      I had to reorganize the load-combining rules a bit. The 8-way OR rules
      generated 128 different reorderings, which was causing the generator
      to put too much code in the rewrite*.go files (the big ones were going
      from 25K lines to 132K lines). Instead I reorganized the rules to
      combine pairs of loads at a time. The generated rule files are now
      actually a bit (5%) smaller.
      
      Make.bash times are ~unchanged.
      
      Compiler benchmarks are not observably different. Probably because
      we don't spend much compiler time in rule matching anyway.
      
      I've also done a pass over all of our ops adding commutative markings
      for ops which hadn't had them previously.
      
      Fixes #18292
      
      Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
      Reviewed-on: https://go-review.googlesource.com/38801
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  2. 30 Mar, 2017 1 commit
  3. 29 Mar, 2017 2 commits
    • Revert "cmd/compile: automatically handle commuting ops in rewrite rules" · 68da265c
      Keith Randall authored
      This reverts commit 041ecb69.
      
      Reason for revert: Not working on S390x and some 386 archs.
      I have a guess why the S390x is failing.  No clue on the 386 yet.
      Revert until I can figure it out.
      
      Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
      Reviewed-on: https://go-review.googlesource.com/38790
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: automatically handle commuting ops in rewrite rules · 041ecb69
      Keith Randall authored
      We have lots of rewrite rules that vary only in the fact that
      we have 2 versions for the 2 different orderings of various
      commuting ops. For example:
      
      (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
      (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
      
      It can get unwieldy quickly, especially when there is more than
      one commuting op in a rule.
      
      Our existing "fix" for this problem is to have rules that
      canonicalize the operations first. For example:
      
      (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
      
      Subsequent rules can then assume if there is a constant arg to Eq64,
      it will be the first one. This fix kinda works, but it is fragile and
      only works when we remember to include the required extra rules.
      
      The fundamental problem is that the rule matcher doesn't
      know anything about commuting ops. This CL fixes that fact.
      
      We already have information about which ops commute. (The register
      allocator takes advantage of commutativity.) The rule generator now
      automatically generates multiple rules for a single source rule when
      there are commutative ops in the rule. We can now drop all of our
      almost-duplicate source-level rules and the canonicalization rules.
      
      I have some CLs in progress that will be a lot less verbose when
      the rule generator handles commutativity for me.
      
      I had to reorganize the load-combining rules a bit. The 8-way OR rules
      generated 128 different reorderings, which was causing the generator
      to put too much code in the rewrite*.go files (the big ones were going
      from 25K lines to 132K lines). Instead I reorganized the rules to
      combine pairs of loads at a time. The generated rule files are now
      actually a bit (5%) smaller.
      [Note to reviewers: check these carefully. Most of the other rule
      changes are trivial.]
      
      Make.bash times are ~unchanged.
      
      Compiler benchmarks are not observably different. Probably because
      we don't spend much compiler time in rule matching anyway.
      
      I've also done a pass over all of our ops adding commutative markings
      for ops which hadn't had them previously.
      
      Fixes #18292
      
      Change-Id: I999b1307272e91965b66754576019dedcbe7527a
      Reviewed-on: https://go-review.googlesource.com/38666
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  4. 14 Mar, 2017 1 commit
  5. 13 Mar, 2017 1 commit
  6. 28 Feb, 2017 1 commit
    • cmd/compile: emit fused multiply-{add,subtract} instructions on s390x · bd8a39b6
      Michael Munday authored
      Explicitly block fused multiply-add pattern matching when a cast is used
      after the multiplication, for example:
      
          - (a * b) + c        // can emit fused multiply-add
          - float64(a * b) + c // cannot emit fused multiply-add
      
      float{32,64} and complex{64,128} casts of matching types are now kept
      as OCONV operations rather than being replaced with OCONVNOP operations
      because they now imply a rounding operation (and therefore aren't a
      no-op anymore).
      
      Operations (for example, multiplication) on complex types may utilize
      fused multiply-add and -subtract instructions internally. There is no
      way to disable this behavior at the moment.
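      The rounding distinction can be observed from ordinary Go code. The sketch below uses math.FMA from the standard library (added in a later Go release) to stand in for a fused operation; the values are chosen so that the single-rounding and double-rounding results differ:

      ```go
      package main

      import (
      	"fmt"
      	"math"
      )

      func main() {
      	a := float64(1<<27 + 1) // 134217729, exactly representable
      	b := a
      	c := -float64(1<<54 + 1<<28)

      	// Fused: a*b is computed exactly (2^54 + 2^28 + 1), then rounded
      	// once after adding c, giving exactly 1.
      	fused := math.FMA(a, b, c)

      	// The explicit conversion forces a*b to round to float64 first,
      	// discarding the low bit of the product, so the sum is 0.
      	unfused := float64(a*b) + c

      	fmt.Println(fused, unfused) // 1 0
      }
      ```

      Note that plain `a*b + c` (without the conversion) may legally produce either answer, depending on whether the implementation fuses; the cast is what pins down the rounding.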
      
      Improves the performance of the floating point implementation of
      poly1305:
      
      name         old speed     new speed     delta
      64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
      1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
      64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
      1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)
      
      Updates #17895.
      
      Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
      Reviewed-on: https://go-review.googlesource.com/36963
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  7. 22 Feb, 2017 1 commit
  8. 03 Feb, 2017 1 commit
    • cmd/compile: fix type propagation through s390x SSA rules · ddf807fc
      Michael Munday authored
      This CL fixes two issues:
      
      1. Load ops were initially always lowered to unsigned loads, even
         for signed types. This was fine by itself however LoadReg ops
         (used to re-load spilled values) were lowered to signed loads
         for signed types. This meant that spills could invalidate
         optimizations that assumed the original unsigned load.
      
      2. Types were not always being maintained correctly through rules
         designed to eliminate unnecessary zero and sign extensions.
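      A minimal illustration of the semantic difference the lowering must preserve (a hypothetical example, not the miscompiled program): reloading the same byte with sign extension versus zero extension yields different 64-bit values.

      ```go
      package main

      import "fmt"

      func main() {
      	b := []int8{-1} // stored byte is 0xFF

      	signed := int64(b[0])          // sign-extending load: -1
      	unsigned := int64(uint8(b[0])) // zero-extending load: 255

      	// Mixing the two load kinds for the same value, as the spill
      	// path did, would silently change results like these.
      	fmt.Println(signed, unsigned)
      }
      ```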
      
      Fixes #18906.
      
      Change-Id: I95785dcadba03f7e3e94524677e7d8d3d3b9b737
      Reviewed-on: https://go-review.googlesource.com/36256
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
  9. 02 Feb, 2017 1 commit
  10. 25 Oct, 2016 1 commit
  11. 17 Oct, 2016 1 commit
    • cmd/compile: merge loads into operations on s390x · 1cfb5c3f
      Michael Munday authored
      Adds the new canMergeLoad function which can be used by rules to
      decide whether a load can be merged into an operation. The function
      ensures that the merge will not reorder the load relative to memory
      operations (for example, stores) in such a way that the block can no
      longer be scheduled.
      
      This new function enables transformations such as:
      
      MOVD 0(R1), R2
      ADD  R2, R3
      
      to:
      
      ADD  0(R1), R3
      
      The two-operand form of the following instructions can now read a
      single memory operand:
      
       - ADD
       - ADDC
       - ADDW
       - MULLD
       - MULLW
       - SUB
       - SUBC
       - SUBE
       - SUBW
       - AND
       - ANDW
       - OR
       - ORW
       - XOR
       - XORW
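      A loop like the following (an illustrative example, not taken from the CL) gives the compiler the opportunity described above: the load of b[i] can be folded directly into the XOR instruction instead of going through a separate register load.

      ```go
      package main

      import "fmt"

      // xorSlices XORs b into a element-wise. With load merging, the read
      // of b[i] can become a memory operand of the XOR on s390x.
      func xorSlices(a, b []uint64) {
      	for i := range a {
      		a[i] ^= b[i]
      	}
      }

      func main() {
      	a := []uint64{0xF0F0, 0x1234}
      	b := []uint64{0x0F0F, 0x1234}
      	xorSlices(a, b)
      	fmt.Printf("%#x %#x\n", a[0], a[1]) // 0xffff 0x0
      }
      ```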
      
      Improves SHA3 performance by 6-8%.
      
      Updates #15054.
      
      Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94
      Reviewed-on: https://go-review.googlesource.com/29272
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  12. 11 Oct, 2016 1 commit
    • cmd/compile: make link register allocatable in non-leaf functions · 15817e40
      Michael Munday authored
      We save and restore the link register in non-leaf functions because
      it is clobbered by CALLs. It is therefore available for general
      purpose use.
      
      Only enabled on s390x currently. The RC4 benchmarks in particular
      benefit from the extra register:
      
      name     old speed     new speed     delta
      RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
      RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
      RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)
      
      Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
      Reviewed-on: https://go-review.googlesource.com/30597
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  13. 07 Oct, 2016 1 commit
  14. 06 Oct, 2016 1 commit
  15. 30 Sep, 2016 1 commit
    • cmd/compile: improve load/store merging on s390x · 962dc4b4
      Michael Munday authored
      This commit makes the process of load/store merging more incremental
      for both big and little endian operations. It also adds support for
      32-bit shifts (needed to merge 16- and 32-bit loads/stores).
      
      In addition, the merging of little endian stores is now supported.
      Little endian stores are now up to 30 times faster.
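      The kind of pattern these rules target can be written as byte-wise stores in little-endian order (compare encoding/binary.LittleEndian.PutUint32); with the merging rules, the four single-byte stores can be combined into one 32-bit store. An illustrative sketch:

      ```go
      package main

      import "fmt"

      // putUint32LE writes v into b in little-endian byte order. The four
      // single-byte stores below are the pattern the merging rules can
      // recognize and replace with a single 32-bit store.
      func putUint32LE(b []byte, v uint32) {
      	_ = b[3] // early bounds check to aid the compiler
      	b[0] = byte(v)
      	b[1] = byte(v >> 8)
      	b[2] = byte(v >> 16)
      	b[3] = byte(v >> 24)
      }

      func main() {
      	buf := make([]byte, 4)
      	putUint32LE(buf, 0x11223344)
      	fmt.Printf("% x\n", buf) // 44 33 22 11
      }
      ```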
      
      Change-Id: Iefdd81eda4a65b335f23c3ff222146540083ad9c
      Reviewed-on: https://go-review.googlesource.com/29956
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  16. 27 Sep, 2016 1 commit
  17. 19 Sep, 2016 1 commit
  18. 15 Sep, 2016 1 commit
    • cmd/compile: redo nil checks · 3134ab3c
      Keith Randall authored
      Get rid of BlockCheck. Josh goaded me into it, and I went
      down a rabbit hole making it happen.
      
      NilCheck now panics if the pointer is nil and returns void, as before.
      BlockCheck is gone, and NilCheck is no longer a Control value for
      any block. It just exists (and deadcode knows not to throw it away).
      
      I rewrote the nilcheckelim pass to handle this case.  In particular,
      there can now be multiple NilCheck ops per block.
      
      I moved all of the arch-dependent nil check elimination done as
      part of ssaGenValue into its own proper pass, so we don't have to
      duplicate that code for every architecture.
      
      Making the arch-dependent nil check its own pass means I needed
      to add a bunch of flags to the opcode table so I could write
      the code without arch-dependent ops everywhere.
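      The effect of nil check elimination can be seen in source like this (illustrative only; the pass itself operates on SSA, not Go source):

      ```go
      package main

      import "fmt"

      // f dereferences p twice. Only the first dereference needs a nil
      // check at runtime; the check guarding the second one is redundant,
      // since p was already proven non-nil on this path, and nilcheckelim
      // can remove it.
      func f(p *int) int {
      	a := *p // nil check emitted here
      	b := *p // redundant nil check eliminated
      	return a + b
      }

      func main() {
      	x := 21
      	fmt.Println(f(&x)) // 42
      }
      ```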
      
      Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
      Reviewed-on: https://go-review.googlesource.com/29120
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  19. 13 Sep, 2016 1 commit