    cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU · a2f22a68
    Ben Shi authored
    Like the indexed MOVW ops (MOVWloadidx/MOVWstoreidx) already used in the
    current ARM backend, indexed forms of MOVB/MOVBU/MOVH/MOVHU can be used
    to generate more efficient ARM code.
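    The idea can be pictured as a peephole rewrite over SSA values: a byte or
    halfword load/store whose address is (ADD ptr idx) with a zero constant
    offset becomes a single indexed op, which maps onto ARM register-offset
    addressing such as LDRB Rd, [Rn, Rm]. The sketch below is a toy model
    under stated assumptions, not cmd/compile code: the Value type and
    rewriteIndexed function are hypothetical, the indexed op names
    (MOVBUloadidx, MOVHstoreidx, ...) are assumed by analogy with
    MOVWloadidx, and the real rewrite rules carry extra conditions omitted
    here.

```go
package main

import "fmt"

// Value is a hypothetical, heavily simplified stand-in for an SSA value.
// It is NOT the real cmd/compile ssa.Value type.
type Value struct {
	Op     string
	AuxInt int64 // constant address offset for loads/stores
	Args   []*Value
}

// idxForm maps a plain load/store op to its assumed indexed (reg+reg) form.
// Stores need no unsigned variants: the stored bits are the same either way.
var idxForm = map[string]string{
	"MOVBload":  "MOVBloadidx",
	"MOVBUload": "MOVBUloadidx",
	"MOVHload":  "MOVHloadidx",
	"MOVHUload": "MOVHUloadidx",
	"MOVBstore": "MOVBstoreidx",
	"MOVHstore": "MOVHstoreidx",
}

// rewriteIndexed folds an (ADD ptr idx) address with a zero offset into an
// indexed load or store. It reports whether v was rewritten.
//
//	loads:  (op [0] (ADD ptr idx) mem)     -> (opidx ptr idx mem)
//	stores: (op [0] (ADD ptr idx) val mem) -> (opidx ptr idx val mem)
//
// The real backend rules have additional conditions this toy omits.
func rewriteIndexed(v *Value) bool {
	newOp, ok := idxForm[v.Op]
	if !ok || v.AuxInt != 0 {
		return false
	}
	addr := v.Args[0]
	if addr.Op != "ADD" {
		return false
	}
	ptr, idx := addr.Args[0], addr.Args[1]
	v.Op = newOp
	// Replace the single address argument with ptr and idx, keeping the rest.
	v.Args = append([]*Value{ptr, idx}, v.Args[1:]...)
	return true
}

func main() {
	ptr := &Value{Op: "Arg"}
	idx := &Value{Op: "Arg"}
	mem := &Value{Op: "InitMem"}
	load := &Value{Op: "MOVBUload", Args: []*Value{{Op: "ADD", Args: []*Value{ptr, idx}}, mem}}
	fmt.Println(rewriteIndexed(load), load.Op, len(load.Args))
}
```

    Running it prints "true MOVBUloadidx 3": the separate ADD is folded
    away and the load takes ptr, idx and mem directly, saving one
    instruction per access.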
    
    This patch implements that optimization. Below are benchmark results
    compared against the unmodified Go compiler.
    
    1. The total size of all .a files in pkg/ shrinks by 0.03%.
    
    2. The compilecmp benchmark shows a slight compile-time slowdown
    (geomean +0.16% wall time, +0.55% user time); HelloSize binary
    sizes are unchanged.
    name        old time/op       new time/op       delta
    Template          2.35s ± 1%        2.37s ± 3%  +0.94%  (p=0.006 n=19+19)
    Unicode           1.33s ± 3%        1.33s ± 2%    ~     (p=0.158 n=20+18)
    GoTypes           7.86s ± 2%        7.84s ± 1%    ~     (p=0.284 n=19+18)
    Compiler          37.5s ± 1%        37.7s ± 2%    ~     (p=0.101 n=20+19)
    SSA               83.4s ± 2%        83.6s ± 2%    ~     (p=0.231 n=20+20)
    Flate             1.46s ± 2%        1.45s ± 1%    ~     (p=0.097 n=20+17)
    GoParser          1.86s ± 2%        1.86s ± 4%    ~     (p=0.738 n=20+20)
    Reflect           5.10s ± 1%        5.11s ± 1%    ~     (p=0.290 n=20+18)
    Tar               1.78s ± 2%        1.77s ± 2%    ~     (p=0.166 n=19+20)
    XML               2.61s ± 2%        2.61s ± 2%    ~     (p=0.665 n=19+19)
    [Geo mean]        4.67s             4.68s       +0.16%
    
    name        old user-time/op  new user-time/op  delta
    Template          2.79s ± 3%        2.80s ± 2%    ~     (p=0.662 n=20+20)
    Unicode           1.62s ± 3%        1.64s ± 4%    ~     (p=0.252 n=20+20)
    GoTypes           9.58s ± 2%        9.62s ± 2%    ~     (p=0.250 n=20+20)
    Compiler          46.2s ± 1%        46.2s ± 1%    ~     (p=0.602 n=20+19)
    SSA                108s ± 1%         108s ± 2%    ~     (p=0.242 n=18+20)
    Flate             1.69s ± 3%        1.69s ± 4%    ~     (p=0.470 n=20+20)
    GoParser          2.16s ± 3%        2.20s ± 4%  +1.70%  (p=0.005 n=19+20)
    Reflect           6.02s ± 2%        6.02s ± 2%    ~     (p=0.700 n=20+17)
    Tar               2.11s ± 2%        2.11s ± 3%    ~     (p=0.480 n=18+20)
    XML               3.07s ± 2%        3.11s ± 4%  +1.50%  (p=0.043 n=20+20)
    [Geo mean]        5.61s             5.64s       +0.55%
    
    name        old text-bytes    new text-bytes    delta
    HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)
    
    name        old data-bytes    new data-bytes    delta
    HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)
    
    name        old bss-bytes     new bss-bytes     delta
    HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)
    
    name        old exe-bytes     new exe-bytes     delta
    HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)
    
    3. The go1 benchmarks improve overall (geomean -2.03% time/op), with
    Revcomp improving by more than 17%; Template regresses by 3.67%.
    name                     old time/op    new time/op    delta
    BinaryTree17-4              42.0s ± 1%     41.5s ± 1%   -1.32%  (p=0.000 n=39+40)
    Fannkuch11-4                24.1s ± 1%     23.6s ± 0%   -2.38%  (p=0.000 n=40+40)
    FmtFprintfEmpty-4           843ns ± 0%     839ns ± 1%   -0.46%  (p=0.000 n=33+40)
    FmtFprintfString-4         1.44µs ± 1%    1.37µs ± 1%   -5.48%  (p=0.000 n=40+35)
    FmtFprintfInt-4            1.44µs ± 1%    1.41µs ± 2%   -1.50%  (p=0.000 n=40+40)
    FmtFprintfIntInt-4         2.07µs ± 1%    2.06µs ± 0%   -0.78%  (p=0.000 n=40+40)
    FmtFprintfPrefixedInt-4    2.50µs ± 1%    2.33µs ± 1%   -6.85%  (p=0.000 n=40+40)
    FmtFprintfFloat-4          4.36µs ± 1%    4.34µs ± 0%   -0.39%  (p=0.017 n=40+40)
    FmtManyArgs-4              8.11µs ± 0%    8.00µs ± 0%   -1.37%  (p=0.000 n=40+40)
    GobDecode-4                 105ms ± 2%     103ms ± 2%   -2.17%  (p=0.000 n=39+39)
    GobEncode-4                90.1ms ± 2%    88.6ms ± 1%   -1.67%  (p=0.000 n=40+39)
    Gzip-4                      4.18s ± 1%     4.09s ± 1%   -2.03%  (p=0.000 n=40+40)
    Gunzip-4                    608ms ± 1%     603ms ± 1%   -0.86%  (p=0.000 n=40+34)
    HTTPClientServer-4          674µs ± 3%     661µs ± 2%   -1.82%  (p=0.000 n=40+39)
    JSONEncode-4                256ms ± 1%     243ms ± 0%   -5.11%  (p=0.000 n=39+31)
    JSONDecode-4                915ms ± 1%     904ms ± 1%   -1.18%  (p=0.000 n=40+36)
    Mandelbrot200-4            49.2ms ± 0%    49.3ms ± 0%     ~     (p=0.254 n=34+40)
    GoParse-4                  46.9ms ± 2%    46.9ms ± 1%     ~     (p=0.737 n=40+39)
    RegexpMatchEasy0_32-4      1.28µs ± 1%    1.27µs ± 1%   -0.71%  (p=0.000 n=40+40)
    RegexpMatchEasy0_1K-4      7.86µs ± 4%    7.67µs ± 4%   -2.46%  (p=0.000 n=38+40)
    RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%   -0.54%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4      10.4µs ± 2%    10.3µs ± 2%   -0.88%  (p=0.003 n=40+39)
    RegexpMatchMedium_32-4     2.05µs ± 0%    2.04µs ± 0%   -0.34%  (p=0.000 n=40+33)
    RegexpMatchMedium_1K-4      541µs ± 1%     535µs ± 1%   -1.02%  (p=0.000 n=40+38)
    RegexpMatchHard_32-4       29.3µs ± 1%    29.1µs ± 1%   -0.51%  (p=0.000 n=40+40)
    RegexpMatchHard_1K-4        881µs ± 1%     871µs ± 1%   -1.15%  (p=0.000 n=40+40)
    Revcomp-4                  81.7ms ± 2%    67.5ms ± 2%  -17.37%  (p=0.000 n=39+39)
    Template-4                  1.05s ± 1%     1.08s ± 2%   +3.67%  (p=0.000 n=40+40)
    TimeParse-4                7.24µs ± 1%    7.09µs ± 1%   -2.13%  (p=0.000 n=40+40)
    TimeFormat-4               13.2µs ± 1%    13.1µs ± 0%   -0.31%  (p=0.007 n=40+31)
    [Geo mean]                  733µs          718µs        -2.03%
    
    name                     old speed      new speed      delta
    GobDecode-4              7.28MB/s ± 2%  7.44MB/s ± 2%   +2.23%  (p=0.000 n=39+39)
    GobEncode-4              8.52MB/s ± 2%  8.67MB/s ± 1%   +1.70%  (p=0.000 n=40+39)
    Gzip-4                   4.65MB/s ± 1%  4.74MB/s ± 1%   +1.94%  (p=0.000 n=37+40)
    Gunzip-4                 31.9MB/s ± 1%  32.2MB/s ± 1%   +0.90%  (p=0.000 n=40+36)
    JSONEncode-4             7.57MB/s ± 1%  7.98MB/s ± 0%   +5.41%  (p=0.000 n=40+31)
    JSONDecode-4             2.12MB/s ± 1%  2.15MB/s ± 1%   +1.23%  (p=0.000 n=40+40)
    GoParse-4                1.23MB/s ± 1%  1.23MB/s ± 1%     ~     (p=0.769 n=39+40)
    RegexpMatchEasy0_32-4    25.0MB/s ± 1%  25.2MB/s ± 1%   +0.71%  (p=0.000 n=40+40)
    RegexpMatchEasy0_1K-4     130MB/s ± 5%   134MB/s ± 4%   +2.53%  (p=0.000 n=38+40)
    RegexpMatchEasy1_32-4    24.9MB/s ± 1%  25.1MB/s ± 1%   +0.55%  (p=0.000 n=40+40)
    RegexpMatchEasy1_1K-4    98.5MB/s ± 2%  99.4MB/s ± 2%   +0.88%  (p=0.003 n=40+39)
    RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%     ~     (all equal)
    RegexpMatchMedium_1K-4   1.89MB/s ± 1%  1.91MB/s ± 1%   +1.02%  (p=0.000 n=40+38)
    RegexpMatchHard_32-4     1.10MB/s ± 1%  1.10MB/s ± 0%   +0.41%  (p=0.000 n=40+33)
    RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.17MB/s ± 1%   +1.21%  (p=0.000 n=40+40)
    Revcomp-4                31.1MB/s ± 2%  37.6MB/s ± 2%  +21.03%  (p=0.000 n=39+39)
    Template-4               1.86MB/s ± 1%  1.79MB/s ± 1%   -3.51%  (p=0.000 n=40+38)
    [Geo mean]               6.66MB/s       6.80MB/s        +2.13%
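    The speed deltas differ from the time/op deltas because throughput is
    bytes processed per unit time: for a fixed workload it scales as
    1/time, so a time change of d gives a speed change of 1/(1+d) - 1. The
    sketch below applies that relation to a few rows of the time/op table
    above; the speedDelta helper is hypothetical and only illustrates the
    arithmetic, it is not part of benchstat.

```go
package main

import "fmt"

// speedDelta converts a relative change in time/op into the matching
// relative change in throughput (MB/s), assuming a fixed amount of work.
func speedDelta(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// Relative time/op deltas copied from the go1 table above.
	rows := []struct {
		name  string
		delta float64
	}{
		{"Revcomp-4", -0.1737},
		{"JSONEncode-4", -0.0511},
		{"Template-4", +0.0367},
	}
	for _, r := range rows {
		fmt.Printf("%-13s time %+6.2f%% -> speed %+6.2f%%\n",
			r.name, 100*r.delta, 100*speedDelta(r.delta))
	}
}
```

    For Revcomp, -17.37% in time/op works out to about +21.0% in speed,
    consistent with the reported +21.03% once rounding of the underlying
    means is taken into account.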
    
    Fixes #21492
    
    Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
    Reviewed-on: https://go-review.googlesource.com/58450
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
    Run-TryBot: Cherry Zhang <cherryyz@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>