1. 22 Feb, 2018 5 commits
    • Alberto Donizetti's avatar
      cmd/compile: use | in the most repetitive s390x rules · 1e05924c
      Alberto Donizetti authored
      For now, limited to the most repetitive rules that are also short and
      simple, so that we can have a substantial conciseness win without
      compromising rules readability.
      
      Ran rulegen, no changes in the rewrite files.
      
      Change-Id: I8447784895a218c5c1b4dfa1cdb355bd73dabfd1
      Reviewed-on: https://go-review.googlesource.com/95955Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      1e05924c
    • Martin Möhrmann's avatar
      reflect: avoid calling common if type is known to be *rtype · 1dbe4c50
      Martin Möhrmann authored
      If the type of Type is known to be *rtype than the common
      function is a no-op and does not need to be called.
      
      name  old time/op  new time/op  delta
      New   31.0ns ± 5%  30.2ns ± 4%  -2.74%  (p=0.008 n=20+20)
      
      Change-Id: I5d00346dbc782e34c530166d1ee0499b24068b51
      Reviewed-on: https://go-review.googlesource.com/96115Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      1dbe4c50
    • Ben Shi's avatar
      cmd/compile: improve FP performance on ARM64 · f4c3072c
      Ben Shi authored
      FMADD/FMSUB/FNMADD/FNMSUB are efficient FP instructions, which can
      be used by the comiler to improve FP performance. This CL implements
      this optimization.
      
      1. The compilecmp benchmark shows little change.
      name        old time/op       new time/op       delta
      Template          2.35s ± 4%        2.38s ± 4%    ~     (p=0.161 n=15+15)
      Unicode           1.36s ± 5%        1.36s ± 4%    ~     (p=0.685 n=14+13)
      GoTypes           8.11s ± 3%        8.13s ± 2%    ~     (p=0.624 n=15+15)
      Compiler          40.5s ± 2%        40.7s ± 2%    ~     (p=0.137 n=15+15)
      SSA                115s ± 3%         116s ± 1%    ~     (p=0.270 n=15+14)
      Flate             1.46s ± 4%        1.45s ± 5%    ~     (p=0.870 n=15+15)
      GoParser          1.85s ± 2%        1.87s ± 3%    ~     (p=0.477 n=14+15)
      Reflect           5.11s ± 4%        5.10s ± 2%    ~     (p=0.624 n=15+15)
      Tar               2.23s ± 3%        2.23s ± 5%    ~     (p=0.624 n=15+15)
      XML               2.72s ± 5%        2.74s ± 3%    ~     (p=0.290 n=15+14)
      [Geo mean]        5.02s             5.03s       +0.29%
      
      name        old user-time/op  new user-time/op  delta
      Template          2.90s ± 2%        2.90s ± 3%    ~     (p=0.780 n=14+15)
      Unicode           1.71s ± 5%        1.70s ± 3%    ~     (p=0.458 n=14+13)
      GoTypes           9.77s ± 2%        9.76s ± 2%    ~     (p=0.838 n=15+15)
      Compiler          49.1s ± 2%        49.1s ± 2%    ~     (p=0.902 n=15+15)
      SSA                144s ± 1%         144s ± 2%    ~     (p=0.567 n=15+15)
      Flate             1.75s ± 5%        1.74s ± 3%    ~     (p=0.461 n=15+15)
      GoParser          2.22s ± 2%        2.21s ± 3%    ~     (p=0.233 n=15+15)
      Reflect           5.99s ± 2%        5.95s ± 1%    ~     (p=0.093 n=14+15)
      Tar               2.68s ± 2%        2.67s ± 3%    ~     (p=0.310 n=14+15)
      XML               3.22s ± 2%        3.24s ± 3%    ~     (p=0.512 n=15+15)
      [Geo mean]        6.08s             6.07s       -0.19%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
      
      2. The go1 benchmark shows little improvement in total (excluding noise),
      but some improvement in test case Mandelbrot200 and FmtFprintfFloat.
      name                     old time/op    new time/op    delta
      BinaryTree17-4              42.1s ± 2%     42.0s ± 2%    ~     (p=0.453 n=30+28)
      Fannkuch11-4                33.5s ± 3%     33.3s ± 3%  -0.38%  (p=0.045 n=30+30)
      FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
      FmtFprintfString-4         1.09µs ± 0%    1.09µs ± 0%  -0.27%  (p=0.000 n=23+17)
      FmtFprintfInt-4            1.16µs ± 3%    1.16µs ± 3%    ~     (p=0.714 n=30+30)
      FmtFprintfIntInt-4         1.76µs ± 1%    1.77µs ± 0%  +0.15%  (p=0.002 n=23+23)
      FmtFprintfPrefixedInt-4    2.21µs ± 3%    2.20µs ± 3%    ~     (p=0.390 n=30+30)
      FmtFprintfFloat-4          3.28µs ± 0%    3.11µs ± 0%  -5.01%  (p=0.000 n=25+26)
      FmtManyArgs-4              7.18µs ± 0%    7.19µs ± 0%  +0.13%  (p=0.000 n=24+25)
      GobDecode-4                94.9ms ± 0%    95.6ms ± 5%  +0.83%  (p=0.002 n=23+29)
      GobEncode-4                80.7ms ± 4%    79.8ms ± 0%  -1.11%  (p=0.003 n=30+24)
      Gzip-4                      4.58s ± 4%     4.59s ± 3%  +0.26%  (p=0.002 n=30+26)
      Gunzip-4                    449ms ± 4%     443ms ± 0%    ~     (p=0.096 n=30+26)
      HTTPClientServer-4          553µs ± 1%     548µs ± 1%  -0.96%  (p=0.000 n=30+30)
      JSONEncode-4                215ms ± 4%     214ms ± 4%  -0.29%  (p=0.000 n=30+30)
      JSONDecode-4                868ms ± 4%     875ms ± 5%  +0.79%  (p=0.008 n=30+30)
      Mandelbrot200-4            51.4ms ± 0%    46.7ms ± 3%  -9.09%  (p=0.000 n=25+26)
      GoParse-4                  42.1ms ± 0%    41.8ms ± 0%  -0.61%  (p=0.000 n=25+24)
      RegexpMatchEasy0_32-4      1.02µs ± 4%    1.02µs ± 4%  -0.17%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.95µs ± 4%    ~     (p=0.516 n=23+30)
      RegexpMatchEasy1_32-4       970ns ± 3%     973ns ± 3%    ~     (p=0.951 n=30+30)
      RegexpMatchEasy1_1K-4      6.43µs ± 3%    6.33µs ± 0%  -1.62%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4     1.75µs ± 0%    1.75µs ± 0%    ~     (p=0.422 n=25+24)
      RegexpMatchMedium_1K-4      568µs ± 3%     562µs ± 0%    ~     (p=0.079 n=30+24)
      RegexpMatchHard_32-4       30.8µs ± 0%    31.2µs ± 4%  +1.46%  (p=0.018 n=23+30)
      RegexpMatchHard_1K-4        932µs ± 0%     946µs ± 3%  +1.49%  (p=0.000 n=24+30)
      Revcomp-4                   7.69s ± 3%     7.69s ± 2%  +0.04%  (p=0.032 n=24+25)
      Template-4                  893ms ± 5%     880ms ± 6%  -1.53%  (p=0.000 n=30+30)
      TimeParse-4                4.90µs ± 3%    4.84µs ± 0%    ~     (p=0.080 n=30+25)
      TimeFormat-4               4.70µs ± 1%    4.76µs ± 0%  +1.21%  (p=0.000 n=23+26)
      [Geo mean]                  710µs          706µs       -0.63%
      
      name                     old speed      new speed      delta
      GobDecode-4              8.09MB/s ± 0%  8.03MB/s ± 5%  -0.77%  (p=0.002 n=23+29)
      GobEncode-4              9.52MB/s ± 4%  9.62MB/s ± 0%  +1.07%  (p=0.003 n=30+24)
      Gzip-4                   4.24MB/s ± 4%  4.23MB/s ± 3%  -0.35%  (p=0.002 n=30+26)
      Gunzip-4                 43.2MB/s ± 4%  43.8MB/s ± 0%    ~     (p=0.123 n=30+26)
      JSONEncode-4             9.03MB/s ± 4%  9.06MB/s ± 4%  +0.28%  (p=0.000 n=30+30)
      JSONDecode-4             2.24MB/s ± 4%  2.22MB/s ± 5%  -0.79%  (p=0.008 n=30+30)
      GoParse-4                1.38MB/s ± 1%  1.38MB/s ± 0%    ~     (p=0.401 n=25+17)
      RegexpMatchEasy0_32-4    31.4MB/s ± 4%  31.5MB/s ± 3%  +0.16%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4     262MB/s ± 0%   259MB/s ± 4%    ~     (p=0.693 n=23+30)
      RegexpMatchEasy1_32-4    33.0MB/s ± 3%  32.9MB/s ± 3%    ~     (p=0.139 n=30+30)
      RegexpMatchEasy1_1K-4     159MB/s ± 3%   162MB/s ± 0%  +1.60%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4    570kB/s ± 0%   570kB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K-4   1.80MB/s ± 3%  1.82MB/s ± 0%  +1.09%  (p=0.007 n=30+24)
      RegexpMatchHard_32-4     1.04MB/s ± 0%  1.03MB/s ± 3%  -1.38%  (p=0.003 n=23+30)
      RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.08MB/s ± 3%  -1.52%  (p=0.000 n=24+30)
      Revcomp-4                33.0MB/s ± 3%  33.0MB/s ± 2%    ~     (p=0.128 n=24+25)
      Template-4               2.17MB/s ± 5%  2.21MB/s ± 6%  +1.61%  (p=0.000 n=30+30)
      [Geo mean]               7.79MB/s       7.79MB/s       +0.05%
      
      Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
      Reviewed-on: https://go-review.googlesource.com/94901
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      f4c3072c
    • erifan01's avatar
      cmd/asm: add arm64 instructions for math optimization · f5de4200
      erifan01 authored
      Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
      FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.
      
      Add check on register element index and test cases.
      
      Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
      Reviewed-on: https://go-review.googlesource.com/90876Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      f5de4200
    • David Chase's avatar
      cmd/compile: decouple emitted block order from regalloc block order · c18ff184
      David Chase authored
      While tinkering with different block orders for the preemptible
      loop experiment, crashed the register allocator with a "bad"
      one (these exist).  Realized that one knob was controlling
      two things (register allocation and branch patterns) and
      decided that life would be simpler if the two orders were
      independent.
      
      Ran some experiments and determined that we have probably,
      mostly, been optimizing for register allocation effects, not
      branch effects.  Bad block orders for register allocation are
      somewhat costly.
      
      This will also allow separate experimentation with perhaps-
      better block orders for register allocation.
      
      Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
      Reviewed-on: https://go-review.googlesource.com/47314
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      c18ff184
  2. 21 Feb, 2018 29 commits
  3. 20 Feb, 2018 6 commits