  1. 03 Apr, 2017 1 commit
    • cmd/compile: automatically handle commuting ops in rewrite rules · 53f8a6ae
      Keith Randall authored
      Note that this is a redo of an undo of the original buggy CL 38666.
      
      We have lots of rewrite rules that vary only in the fact that
      we have 2 versions for the 2 different orderings of various
      commuting ops. For example:
      
      (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
      (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
      
      It can get unwieldy quickly, especially when there is more than
      one commuting op in a rule.
      
      Our existing "fix" for this problem is to have rules that
      canonicalize the operations first. For example:
      
      (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
      
      Subsequent rules can then assume if there is a constant arg to Eq64,
      it will be the first one. This fix kinda works, but it is fragile and
      only works when we remember to include the required extra rules.
      
      The fundamental problem is that the rule matcher doesn't
      know anything about commuting ops. This CL fixes that fact.
      
      We already have information about which ops commute. (The register
      allocator takes advantage of commutativity.) The rule generator now
      automatically generates multiple rules for a single source rule when
      there are commutative ops in the rule. We can now drop all of our
      almost-duplicate source-level rules and the canonicalization rules.
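      The expansion the generator performs can be sketched as follows. This is a simplified, hypothetical illustration, not the actual cmd/compile/internal/ssa/gen code; the pattern type and expand function are invented for the sketch:

      ```go
      package main

      import "fmt"

      // pattern is a hypothetical, simplified stand-in for a rewrite-rule
      // pattern: an op with either zero or two argument patterns.
      type pattern struct {
      	op          string
      	commutative bool
      	args        []pattern
      }

      func (p pattern) String() string {
      	if len(p.args) == 0 {
      		return p.op
      	}
      	return fmt.Sprintf("(%s %s %s)", p.op, p.args[0], p.args[1])
      }

      // expand returns every variant of p reachable by swapping the two
      // arguments of each commutative op, recursively.
      func expand(p pattern) []pattern {
      	if len(p.args) != 2 {
      		return []pattern{p}
      	}
      	var out []pattern
      	for _, l := range expand(p.args[0]) {
      		for _, r := range expand(p.args[1]) {
      			out = append(out, pattern{p.op, p.commutative, []pattern{l, r}})
      			if p.commutative {
      				out = append(out, pattern{p.op, p.commutative, []pattern{r, l}})
      			}
      		}
      	}
      	return out
      }

      func main() {
      	rule := pattern{"ADDL", true, []pattern{
      		{op: "x"},
      		{op: "(MOVLconst [c])"},
      	}}
      	for _, v := range expand(rule) {
      		fmt.Println(v)
      	}
      }
      ```

      Because each commutative op doubles the variant count, a rule with several commutative ops multiplies quickly, which is how the 8-way OR rules below ended up producing 128 orderings.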
      
      I have some CLs in progress that will be a lot less verbose when
      the rule generator handles commutativity for me.
      
      I had to reorganize the load-combining rules a bit. The 8-way OR rules
      generated 128 different reorderings, which was causing the generator
      to put too much code in the rewrite*.go files (the big ones were going
      from 25K lines to 132K lines). Instead I reorganized the rules to
      combine pairs of loads at a time. The generated rule files are now
      actually a bit (5%) smaller.
      
      Make.bash times are ~unchanged.
      
      Compiler benchmarks are not observably different. Probably because
      we don't spend much compiler time in rule matching anyway.
      
      I've also done a pass over all of our ops adding commutative markings
      for ops which hadn't had them previously.
      
      Fixes #18292
      
      Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
      Reviewed-on: https://go-review.googlesource.com/38801
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  2. 30 Mar, 2017 1 commit
  3. 29 Mar, 2017 2 commits
    • Revert "cmd/compile: automatically handle commuting ops in rewrite rules" · 68da265c
      Keith Randall authored
      This reverts commit 041ecb69.
      
      Reason for revert: Not working on S390x and some 386 archs.
      I have a guess why the S390x is failing.  No clue on the 386 yet.
      Revert until I can figure it out.
      
      Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
      Reviewed-on: https://go-review.googlesource.com/38790
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: automatically handle commuting ops in rewrite rules · 041ecb69
      Keith Randall authored
      We have lots of rewrite rules that vary only in the fact that
      we have 2 versions for the 2 different orderings of various
      commuting ops. For example:
      
      (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
      (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
      
      It can get unwieldy quickly, especially when there is more than
      one commuting op in a rule.
      
      Our existing "fix" for this problem is to have rules that
      canonicalize the operations first. For example:
      
      (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
      
      Subsequent rules can then assume if there is a constant arg to Eq64,
      it will be the first one. This fix kinda works, but it is fragile and
      only works when we remember to include the required extra rules.
      
      The fundamental problem is that the rule matcher doesn't
      know anything about commuting ops. This CL fixes that fact.
      
      We already have information about which ops commute. (The register
      allocator takes advantage of commutativity.) The rule generator now
      automatically generates multiple rules for a single source rule when
      there are commutative ops in the rule. We can now drop all of our
      almost-duplicate source-level rules and the canonicalization rules.
      
      I have some CLs in progress that will be a lot less verbose when
      the rule generator handles commutativity for me.
      
      I had to reorganize the load-combining rules a bit. The 8-way OR rules
      generated 128 different reorderings, which was causing the generator
      to put too much code in the rewrite*.go files (the big ones were going
      from 25K lines to 132K lines). Instead I reorganized the rules to
      combine pairs of loads at a time. The generated rule files are now
      actually a bit (5%) smaller.
      [Note to reviewers: check these carefully. Most of the other rule
      changes are trivial.]
      
      Make.bash times are ~unchanged.
      
      Compiler benchmarks are not observably different. Probably because
      we don't spend much compiler time in rule matching anyway.
      
      I've also done a pass over all of our ops adding commutative markings
      for ops which hadn't had them previously.
      
      Fixes #18292
      
      Change-Id: I999b1307272e91965b66754576019dedcbe7527a
      Reviewed-on: https://go-review.googlesource.com/38666
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  4. 14 Mar, 2017 1 commit
  5. 13 Mar, 2017 1 commit
  6. 28 Feb, 2017 1 commit
    • cmd/compile: emit fused multiply-{add,subtract} instructions on s390x · bd8a39b6
      Michael Munday authored
      Explicitly block fused multiply-add pattern matching when a cast is used
      after the multiplication, for example:
      
          - (a * b) + c        // can emit fused multiply-add
          - float64(a * b) + c // cannot emit fused multiply-add
      
      float{32,64} and complex{64,128} casts of matching types are now kept
      as OCONV operations rather than being replaced with OCONVNOP operations
      because they now imply a rounding operation (and therefore aren't a
      no-op anymore).
      
      Operations (for example, multiplication) on complex types may utilize
      fused multiply-add and -subtract instructions internally. There is no
      way to disable this behavior at the moment.
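      The rounding distinction can be observed from ordinary Go code. The sketch below uses math.FMA from the standard library (added in a later Go release) to stand in for a fused operation; the values are chosen so that the single-rounding and double-rounding results differ:

      ```go
      package main

      import (
      	"fmt"
      	"math"
      )

      func main() {
      	a := float64(1<<27 + 1) // 134217729, exactly representable
      	b := a
      	c := -float64(1<<54 + 1<<28)

      	// Fused: a*b is computed exactly (2^54 + 2^28 + 1), then rounded
      	// once after adding c, giving exactly 1.
      	fused := math.FMA(a, b, c)

      	// The explicit conversion forces a*b to round to float64 first,
      	// discarding the low bit of the product, so the sum is 0.
      	unfused := float64(a*b) + c

      	fmt.Println(fused, unfused) // 1 0
      }
      ```

      Note that plain `a*b + c` (without the conversion) may legally produce either answer, depending on whether the implementation fuses; the cast is what pins down the rounding.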
      
      Improves the performance of the floating point implementation of
      poly1305:
      
      name         old speed     new speed     delta
      64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
      1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
      64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
      1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)
      
      Updates #17895.
      
      Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
      Reviewed-on: https://go-review.googlesource.com/36963
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  7. 22 Feb, 2017 1 commit
  8. 03 Feb, 2017 1 commit
    • cmd/compile: fix type propagation through s390x SSA rules · ddf807fc
      Michael Munday authored
      This CL fixes two issues:
      
      1. Load ops were initially always lowered to unsigned loads, even
         for signed types. This was fine by itself however LoadReg ops
         (used to re-load spilled values) were lowered to signed loads
         for signed types. This meant that spills could invalidate
         optimizations that assumed the original unsigned load.
      
      2. Types were not always being maintained correctly through rules
         designed to eliminate unnecessary zero and sign extensions.
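      A minimal illustration of the semantic difference the lowering must preserve (a hypothetical example, not the miscompiled program): reloading the same byte with sign extension versus zero extension yields different 64-bit values.

      ```go
      package main

      import "fmt"

      func main() {
      	b := []int8{-1} // stored byte is 0xFF

      	signed := int64(b[0])          // sign-extending load: -1
      	unsigned := int64(uint8(b[0])) // zero-extending load: 255

      	// Mixing the two load kinds for the same value, as the spill
      	// path did, would silently change results like these.
      	fmt.Println(signed, unsigned)
      }
      ```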
      
      Fixes #18906.
      
      Change-Id: I95785dcadba03f7e3e94524677e7d8d3d3b9b737
      Reviewed-on: https://go-review.googlesource.com/36256
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
  9. 02 Feb, 2017 1 commit
  10. 25 Oct, 2016 1 commit
  11. 17 Oct, 2016 1 commit
    • cmd/compile: merge loads into operations on s390x · 1cfb5c3f
      Michael Munday authored
      Adds the new canMergeLoad function which can be used by rules to
      decide whether a load can be merged into an operation. The function
      ensures that the merge will not reorder the load relative to memory
      operations (for example, stores) in such a way that the block can no
      longer be scheduled.
      
      This new function enables transformations such as:
      
      MOVD 0(R1), R2
      ADD  R2, R3
      
      to:
      
      ADD  0(R1), R3
      
      The two-operand form of the following instructions can now read a
      single memory operand:
      
       - ADD
       - ADDC
       - ADDW
       - MULLD
       - MULLW
       - SUB
       - SUBC
       - SUBE
       - SUBW
       - AND
       - ANDW
       - OR
       - ORW
       - XOR
       - XORW
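      A loop like the following (an illustrative example, not taken from the CL) gives the compiler the opportunity described above: the load of b[i] can be folded directly into the XOR instruction instead of going through a separate register load.

      ```go
      package main

      import "fmt"

      // xorSlices XORs b into a element-wise. With load merging, the read
      // of b[i] can become a memory operand of the XOR on s390x.
      func xorSlices(a, b []uint64) {
      	for i := range a {
      		a[i] ^= b[i]
      	}
      }

      func main() {
      	a := []uint64{0xF0F0, 0x1234}
      	b := []uint64{0x0F0F, 0x1234}
      	xorSlices(a, b)
      	fmt.Printf("%#x %#x\n", a[0], a[1]) // 0xffff 0x0
      }
      ```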
      
      Improves SHA3 performance by 6-8%.
      
      Updates #15054.
      
      Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94
      Reviewed-on: https://go-review.googlesource.com/29272
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  12. 11 Oct, 2016 1 commit
    • cmd/compile: make link register allocatable in non-leaf functions · 15817e40
      Michael Munday authored
      We save and restore the link register in non-leaf functions because
      it is clobbered by CALLs. It is therefore available for general
      purpose use.
      
      Only enabled on s390x currently. The RC4 benchmarks in particular
      benefit from the extra register:
      
      name     old speed     new speed     delta
      RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
      RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
      RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)
      
      Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
      Reviewed-on: https://go-review.googlesource.com/30597
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  13. 07 Oct, 2016 1 commit
  14. 06 Oct, 2016 1 commit
  15. 30 Sep, 2016 1 commit
    • cmd/compile: improve load/store merging on s390x · 962dc4b4
      Michael Munday authored
      This commit makes the process of load/store merging more incremental
      for both big and little endian operations. It also adds support for
      32-bit shifts (needed to merge 16- and 32-bit loads/stores).
      
      In addition, the merging of little endian stores is now supported.
      Little endian stores are now up to 30 times faster.
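      The kind of pattern these rules target can be written as byte-wise stores in little-endian order (compare encoding/binary.LittleEndian.PutUint32); with the merging rules, the four single-byte stores can be combined into one 32-bit store. An illustrative sketch:

      ```go
      package main

      import "fmt"

      // putUint32LE writes v into b in little-endian byte order. The four
      // single-byte stores below are the pattern the merging rules can
      // recognize and replace with a single 32-bit store.
      func putUint32LE(b []byte, v uint32) {
      	_ = b[3] // early bounds check to aid the compiler
      	b[0] = byte(v)
      	b[1] = byte(v >> 8)
      	b[2] = byte(v >> 16)
      	b[3] = byte(v >> 24)
      }

      func main() {
      	buf := make([]byte, 4)
      	putUint32LE(buf, 0x11223344)
      	fmt.Printf("% x\n", buf) // 44 33 22 11
      }
      ```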
      
      Change-Id: Iefdd81eda4a65b335f23c3ff222146540083ad9c
      Reviewed-on: https://go-review.googlesource.com/29956
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
  16. 27 Sep, 2016 1 commit
  17. 19 Sep, 2016 1 commit
  18. 15 Sep, 2016 1 commit
    • cmd/compile: redo nil checks · 3134ab3c
      Keith Randall authored
      Get rid of BlockCheck. Josh goaded me into it, and I went
      down a rabbit hole making it happen.
      
      NilCheck now panics if the pointer is nil and returns void, as before.
      BlockCheck is gone, and NilCheck is no longer a Control value for
      any block. It just exists (and deadcode knows not to throw it away).
      
      I rewrote the nilcheckelim pass to handle this case.  In particular,
      there can now be multiple NilCheck ops per block.
      
      I moved all of the arch-dependent nil check elimination done as
      part of ssaGenValue into its own proper pass, so we don't have to
      duplicate that code for every architecture.
      
      Making the arch-dependent nil check its own pass means I needed
      to add a bunch of flags to the opcode table so I could write
      the code without arch-dependent ops everywhere.
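      The effect of nil check elimination can be seen in source like this (illustrative only; the pass itself operates on SSA, not Go source):

      ```go
      package main

      import "fmt"

      // f dereferences p twice. Only the first dereference needs a nil
      // check at runtime; the check guarding the second one is redundant,
      // since p was already proven non-nil on this path, and nilcheckelim
      // can remove it.
      func f(p *int) int {
      	a := *p // nil check emitted here
      	b := *p // redundant nil check eliminated
      	return a + b
      }

      func main() {
      	x := 21
      	fmt.Println(f(&x)) // 42
      }
      ```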
      
      Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
      Reviewed-on: https://go-review.googlesource.com/29120
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
  19. 13 Sep, 2016 1 commit