    cmd/compile: improve FP performance on ARM64 · f4c3072c
    Ben Shi authored
    FMADD, FMSUB, FNMADD and FNMSUB are fused multiply-add FP
    instructions that the compiler can use to improve FP performance.
    This CL implements this optimization.
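
As an illustration only (this code is not part of the CL), the Go expression shapes below correspond to what the four instructions compute per the ARM64 ISA: FMADD is a + n*m, FMSUB is a - n*m, FNMADD is -a - n*m, and FNMSUB is n*m - a. The function names are hypothetical, and which shapes the rewrite rules actually match is determined by ARM64.rules, not by this sketch. The Go spec permits fusing x*y + z into one operation and says an explicit conversion such as float64(x*y) rounds the product and prevents fusion. math.FMA, used here for comparison, assumes Go 1.14 or later.

```go
package main

import (
	"fmt"
	"math"
)

// Each function is a candidate for a single fused instruction on arm64.
// The names are hypothetical and chosen for this sketch.
func mulAdd(x, y, z float64) float64    { return z + x*y }  // FMADD shape
func subMul(x, y, z float64) float64    { return z - x*y }  // FMSUB shape
func negMulAdd(x, y, z float64) float64 { return -z - x*y } // FNMADD shape
func mulSub(x, y, z float64) float64    { return x*y - z }  // FNMSUB shape

// mulAddUnfused rounds the product explicitly, which the Go spec
// says prevents the compiler from fusing the multiply and the add.
func mulAddUnfused(x, y, z float64) float64 { return z + float64(x*y) }

func main() {
	// x*y is just below 1 and is not exactly representable, so a fused
	// and an unfused evaluation may differ in the final result.
	x := 1 + math.Ldexp(1, -30)
	y := 1 - math.Ldexp(1, -30)
	z := -1.0

	fmt.Println("maybe fused:", mulAdd(x, y, z))
	fmt.Println("unfused:    ", mulAddUnfused(x, y, z))
	fmt.Println("math.FMA:   ", math.FMA(x, y, z))
}
```

Whether the first line matches the second or the third depends on the target architecture and compiler version, so code that needs a specific rounding behavior should use the explicit conversion or math.FMA rather than rely on fusion.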
    
    1. The compilecmp benchmark shows little change.
    name        old time/op       new time/op       delta
    Template          2.35s ± 4%        2.38s ± 4%    ~     (p=0.161 n=15+15)
    Unicode           1.36s ± 5%        1.36s ± 4%    ~     (p=0.685 n=14+13)
    GoTypes           8.11s ± 3%        8.13s ± 2%    ~     (p=0.624 n=15+15)
    Compiler          40.5s ± 2%        40.7s ± 2%    ~     (p=0.137 n=15+15)
    SSA                115s ± 3%         116s ± 1%    ~     (p=0.270 n=15+14)
    Flate             1.46s ± 4%        1.45s ± 5%    ~     (p=0.870 n=15+15)
    GoParser          1.85s ± 2%        1.87s ± 3%    ~     (p=0.477 n=14+15)
    Reflect           5.11s ± 4%        5.10s ± 2%    ~     (p=0.624 n=15+15)
    Tar               2.23s ± 3%        2.23s ± 5%    ~     (p=0.624 n=15+15)
    XML               2.72s ± 5%        2.74s ± 3%    ~     (p=0.290 n=15+14)
    [Geo mean]        5.02s             5.03s       +0.29%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.90s ± 2%        2.90s ± 3%    ~     (p=0.780 n=14+15)
    Unicode           1.71s ± 5%        1.70s ± 3%    ~     (p=0.458 n=14+13)
    GoTypes           9.77s ± 2%        9.76s ± 2%    ~     (p=0.838 n=15+15)
    Compiler          49.1s ± 2%        49.1s ± 2%    ~     (p=0.902 n=15+15)
    SSA                144s ± 1%         144s ± 2%    ~     (p=0.567 n=15+15)
    Flate             1.75s ± 5%        1.74s ± 3%    ~     (p=0.461 n=15+15)
    GoParser          2.22s ± 2%        2.21s ± 3%    ~     (p=0.233 n=15+15)
    Reflect           5.99s ± 2%        5.95s ± 1%    ~     (p=0.093 n=14+15)
    Tar               2.68s ± 2%        2.67s ± 3%    ~     (p=0.310 n=14+15)
    XML               3.22s ± 2%        3.24s ± 3%    ~     (p=0.512 n=15+15)
    [Geo mean]        6.08s             6.07s       -0.19%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
    
    2. The go1 benchmark shows little overall improvement (excluding noise),
    but Mandelbrot200 and FmtFprintfFloat improve by about 9% and 5%.
    name                     old time/op    new time/op    delta
    BinaryTree17-4              42.1s ± 2%     42.0s ± 2%    ~     (p=0.453 n=30+28)
    Fannkuch11-4                33.5s ± 3%     33.3s ± 3%  -0.38%  (p=0.045 n=30+30)
    FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
    FmtFprintfString-4         1.09µs ± 0%    1.09µs ± 0%  -0.27%  (p=0.000 n=23+17)
    FmtFprintfInt-4            1.16µs ± 3%    1.16µs ± 3%    ~     (p=0.714 n=30+30)
    FmtFprintfIntInt-4         1.76µs ± 1%    1.77µs ± 0%  +0.15%  (p=0.002 n=23+23)
    FmtFprintfPrefixedInt-4    2.21µs ± 3%    2.20µs ± 3%    ~     (p=0.390 n=30+30)
    FmtFprintfFloat-4          3.28µs ± 0%    3.11µs ± 0%  -5.01%  (p=0.000 n=25+26)
    FmtManyArgs-4              7.18µs ± 0%    7.19µs ± 0%  +0.13%  (p=0.000 n=24+25)
    GobDecode-4                94.9ms ± 0%    95.6ms ± 5%  +0.83%  (p=0.002 n=23+29)
    GobEncode-4                80.7ms ± 4%    79.8ms ± 0%  -1.11%  (p=0.003 n=30+24)
    Gzip-4                      4.58s ± 4%     4.59s ± 3%  +0.26%  (p=0.002 n=30+26)
    Gunzip-4                    449ms ± 4%     443ms ± 0%    ~     (p=0.096 n=30+26)
    HTTPClientServer-4          553µs ± 1%     548µs ± 1%  -0.96%  (p=0.000 n=30+30)
    JSONEncode-4                215ms ± 4%     214ms ± 4%  -0.29%  (p=0.000 n=30+30)
    JSONDecode-4                868ms ± 4%     875ms ± 5%  +0.79%  (p=0.008 n=30+30)
    Mandelbrot200-4            51.4ms ± 0%    46.7ms ± 3%  -9.09%  (p=0.000 n=25+26)
    GoParse-4                  42.1ms ± 0%    41.8ms ± 0%  -0.61%  (p=0.000 n=25+24)
    RegexpMatchEasy0_32-4      1.02µs ± 4%    1.02µs ± 4%  -0.17%  (p=0.000 n=30+30)
    RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.95µs ± 4%    ~     (p=0.516 n=23+30)
    RegexpMatchEasy1_32-4       970ns ± 3%     973ns ± 3%    ~     (p=0.951 n=30+30)
    RegexpMatchEasy1_1K-4      6.43µs ± 3%    6.33µs ± 0%  -1.62%  (p=0.000 n=30+25)
    RegexpMatchMedium_32-4     1.75µs ± 0%    1.75µs ± 0%    ~     (p=0.422 n=25+24)
    RegexpMatchMedium_1K-4      568µs ± 3%     562µs ± 0%    ~     (p=0.079 n=30+24)
    RegexpMatchHard_32-4       30.8µs ± 0%    31.2µs ± 4%  +1.46%  (p=0.018 n=23+30)
    RegexpMatchHard_1K-4        932µs ± 0%     946µs ± 3%  +1.49%  (p=0.000 n=24+30)
    Revcomp-4                   7.69s ± 3%     7.69s ± 2%  +0.04%  (p=0.032 n=24+25)
    Template-4                  893ms ± 5%     880ms ± 6%  -1.53%  (p=0.000 n=30+30)
    TimeParse-4                4.90µs ± 3%    4.84µs ± 0%    ~     (p=0.080 n=30+25)
    TimeFormat-4               4.70µs ± 1%    4.76µs ± 0%  +1.21%  (p=0.000 n=23+26)
    [Geo mean]                  710µs          706µs       -0.63%
    
    name                     old speed      new speed      delta
    GobDecode-4              8.09MB/s ± 0%  8.03MB/s ± 5%  -0.77%  (p=0.002 n=23+29)
    GobEncode-4              9.52MB/s ± 4%  9.62MB/s ± 0%  +1.07%  (p=0.003 n=30+24)
    Gzip-4                   4.24MB/s ± 4%  4.23MB/s ± 3%  -0.35%  (p=0.002 n=30+26)
    Gunzip-4                 43.2MB/s ± 4%  43.8MB/s ± 0%    ~     (p=0.123 n=30+26)
    JSONEncode-4             9.03MB/s ± 4%  9.06MB/s ± 4%  +0.28%  (p=0.000 n=30+30)
    JSONDecode-4             2.24MB/s ± 4%  2.22MB/s ± 5%  -0.79%  (p=0.008 n=30+30)
    GoParse-4                1.38MB/s ± 1%  1.38MB/s ± 0%    ~     (p=0.401 n=25+17)
    RegexpMatchEasy0_32-4    31.4MB/s ± 4%  31.5MB/s ± 3%  +0.16%  (p=0.000 n=30+30)
    RegexpMatchEasy0_1K-4     262MB/s ± 0%   259MB/s ± 4%    ~     (p=0.693 n=23+30)
    RegexpMatchEasy1_32-4    33.0MB/s ± 3%  32.9MB/s ± 3%    ~     (p=0.139 n=30+30)
    RegexpMatchEasy1_1K-4     159MB/s ± 3%   162MB/s ± 0%  +1.60%  (p=0.000 n=30+25)
    RegexpMatchMedium_32-4    570kB/s ± 0%   570kB/s ± 0%    ~     (all equal)
    RegexpMatchMedium_1K-4   1.80MB/s ± 3%  1.82MB/s ± 0%  +1.09%  (p=0.007 n=30+24)
    RegexpMatchHard_32-4     1.04MB/s ± 0%  1.03MB/s ± 3%  -1.38%  (p=0.003 n=23+30)
    RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.08MB/s ± 3%  -1.52%  (p=0.000 n=24+30)
    Revcomp-4                33.0MB/s ± 3%  33.0MB/s ± 2%    ~     (p=0.128 n=24+25)
    Template-4               2.17MB/s ± 5%  2.21MB/s ± 6%  +1.61%  (p=0.000 n=30+30)
    [Geo mean]               7.79MB/s       7.79MB/s       +0.05%
    
    Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
    Reviewed-on: https://go-review.googlesource.com/94901
    Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
ARM64.rules 70.3 KB