An error occurred fetching the project authors.
- 03 Apr, 2017 1 commit
-
-
Keith Randall authored
Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 30 Mar, 2017 1 commit
-
-
Michael Munday authored
During the review of CL 38801 it was noted that it would be nice to have a bit more clarity on how-and-why SB addressing is handled strangely on s390x. This additional comment should hopefully help. In general SB is handled differently because not all instructions have variants that use relative addressing. Change-Id: I3379012ae3f167478c191c435939c3b876c645ed Reviewed-on: https://go-review.googlesource.com/38952Reviewed-by:
Keith Randall <khr@golang.org>
-
- 29 Mar, 2017 2 commits
-
-
Keith Randall authored
This reverts commit 041ecb69. Reason for revert: Not working on S390x and some 386 archs. I have a guess why the S390x is failing. No clue on the 386 yet. Revert until I can figure it out. Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c Reviewed-on: https://go-review.googlesource.com/38790Reviewed-by:
Keith Randall <khr@golang.org>
-
Keith Randall authored
We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. [Note to reviewers: check these carefully. Most of the other rule changes are trivial.] Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: I999b1307272e91965b66754576019dedcbe7527a Reviewed-on: https://go-review.googlesource.com/38666 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 14 Mar, 2017 1 commit
-
-
Matthew Dempsky authored
Changes to ${GOARCH}Ops.go files were mechanically produced using github.com/mdempsky/ssa-symops, a one-off tool that inserts "SymEffect: X" elements by pattern matching against the Op names. Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf Reviewed-on: https://go-review.googlesource.com/38084 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 13 Mar, 2017 1 commit
-
-
Matthew Dempsky authored
Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 28 Feb, 2017 1 commit
-
-
Michael Munday authored
Explcitly block fused multiply-add pattern matching when a cast is used after the multiplication, for example: - (a * b) + c // can emit fused multiply-add - float64(a * b) + c // cannot emit fused multiply-add float{32,64} and complex{64,128} casts of matching types are now kept as OCONV operations rather than being replaced with OCONVNOP operations because they now imply a rounding operation (and therefore aren't a no-op anymore). Operations (for example, multiplication) on complex types may utilize fused multiply-add and -subtract instructions internally. There is no way to disable this behavior at the moment. Improves the performance of the floating point implementation of poly1305: name old speed new speed delta 64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8) 1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10) 64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10) 1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8) Updates #17895. Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16 Reviewed-on: https://go-review.googlesource.com/36963 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 22 Feb, 2017 1 commit
-
-
David Chase authored
Added a flag to generic and various architectures' atomic operations that are judged to have observable side effects and thus cannot be dead-code-eliminated. Test requires GOMAXPROCS > 1 without preemption in loop. Fixes #19182. Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f Reviewed-on: https://go-review.googlesource.com/37333 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 03 Feb, 2017 1 commit
-
-
Michael Munday authored
This CL fixes two issues: 1. Load ops were initially always lowered to unsigned loads, even for signed types. This was fine by itself however LoadReg ops (used to re-load spilled values) were lowered to signed loads for signed types. This meant that spills could invalidate optimizations that assumed the original unsigned load. 2. Types were not always being maintained correctly through rules designed to eliminate unnecessary zero and sign extensions. Fixes #18906. Change-Id: I95785dcadba03f7e3e94524677e7d8d3d3b9b737 Reviewed-on: https://go-review.googlesource.com/36256 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 02 Feb, 2017 1 commit
-
-
Cherry Zhang authored
Fixes #18003. Change-Id: Iadcc5c424c64badecfb5fdbd4dbd9197df56182c Reviewed-on: https://go-review.googlesource.com/33421 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 25 Oct, 2016 1 commit
-
-
Michael Munday authored
Implements the following intrinsics on s390x: - AtomicAdd{32,64} - AtomicCompareAndSwap{32,64} - AtomicExchange{32,64} - AtomicLoad{32,64,Ptr} - AtomicStore{32,64,PtrNoWB} I haven't added rules for And8 or Or8 yet. Change-Id: I647af023a8e513718e90e98a60191e7af6167314 Reviewed-on: https://go-review.googlesource.com/31614 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
- 17 Oct, 2016 1 commit
-
-
Michael Munday authored
Adds the new canMergeLoad function which can be used by rules to decide whether a load can be merged into an operation. The function ensures that the merge will not reorder the load relative to memory operations (for example, stores) in such a way that the block can no longer be scheduled. This new function enables transformations such as: MOVD 0(R1), R2 ADD R2, R3 to: ADD 0(R1), R3 The two-operand form of the following instructions can now read a single memory operand: - ADD - ADDC - ADDW - MULLD - MULLW - SUB - SUBC - SUBE - SUBW - AND - ANDW - OR - ORW - XOR - XORW Improves SHA3 performance by 6-8%. Updates #15054. Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94 Reviewed-on: https://go-review.googlesource.com/29272 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 11 Oct, 2016 1 commit
-
-
Michael Munday authored
We save and restore the link register in non-leaf functions because it is clobbered by CALLs. It is therefore available for general purpose use. Only enabled on s390x currently. The RC4 benchmarks in particular benefit from the extra register: name old speed new speed delta RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5) RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5) RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5) Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f Reviewed-on: https://go-review.googlesource.com/30597Reviewed-by:
Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 07 Oct, 2016 1 commit
-
-
Michael Munday authored
TESTB was implemented as AND $0xff, Rx, REGTMP. Unfortunately there is no 3-operand AND-with-immediate instruction and so it was emulated by the assembler using two instructions. This CL uses CMPW instead of AND and also optimizes CMPW to use the chi instruction where possible. Overall this CL reduces the size of the .text section of the bin/go binary by ~2%. Change-Id: Ic335c29fc1129378fcbb1265bfb10f5b744a0f3f Reviewed-on: https://go-review.googlesource.com/30690 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
- 06 Oct, 2016 1 commit
-
-
Michael Munday authored
Adds the following instructions and uses them in the SSA backend: - ANDW - ORW - XORW The instruction encodings for 32-bit operations are typically shorter, particularly when an immediate is used. For example, XORW $-1, R1 only requires one instruction, whereas XOR requires two. Also removes some unused instructions (that were emulated): - ANDN - NAND - ORN - NOR Change-Id: Iff2a16f52004ba498720034e354be9771b10cac4 Reviewed-on: https://go-review.googlesource.com/30291Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 30 Sep, 2016 1 commit
-
-
Michael Munday authored
This commit makes the process of load/store merging more incremental for both big and little endian operations. It also adds support for 32-bit shifts (needed to merge 16- and 32-bit loads/stores). In addition, the merging of little endian stores is now supported. Little endian stores are now up to 30 times faster. Change-Id: Iefdd81eda4a65b335f23c3ff222146540083ad9c Reviewed-on: https://go-review.googlesource.com/29956 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
- 27 Sep, 2016 1 commit
-
-
Keith Randall authored
Mark nil check operations as faulting if their arg is zero. This lets the late nilcheck pass remove duplicates. Fixes #17242. Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9 Reviewed-on: https://go-review.googlesource.com/29952 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 19 Sep, 2016 1 commit
-
-
Michael Munday authored
Also adds the 'find leftmost one' instruction (FLOGR) and replaces the WORD-encoded use of FLOGR in math/big with it. Change-Id: I18e7cd19e75b8501a6ae8bd925471f7e37ded206 Reviewed-on: https://go-review.googlesource.com/29372Reviewed-by:
Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 15 Sep, 2016 1 commit
-
-
Keith Randall authored
Get rid of BlockCheck. Josh goaded me into it, and I went down a rabbithole making it happen. NilCheck now panics if the pointer is nil and returns void, as before. BlockCheck is gone, and NilCheck is no longer a Control value for any block. It just exists (and deadcode knows not to throw it away). I rewrote the nilcheckelim pass to handle this case. In particular, there can now be multiple NilCheck ops per block. I moved all of the arch-dependent nil check elimination done as part of ssaGenValue into its own proper pass, so we don't have to duplicate that code for every architecture. Making the arch-dependent nil check its own pass means I needed to add a bunch of flags to the opcode table so I could write the code without arch-dependent ops everywhere. Change-Id: I419f891ac9b0de313033ff09115c374163416a9f Reviewed-on: https://go-review.googlesource.com/29120 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 13 Sep, 2016 1 commit
-
-
Michael Munday authored
The new SSA backend modifies the ABI slightly: R0 is now a usable general purpose register. Fixes #16677. Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf Reviewed-on: https://go-review.googlesource.com/28978 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-