• Klaus Post's avatar
    compress/flate: eliminate most common bounds checks · f20b1809
    Klaus Post authored
    This uses the SSA compiler to eliminate various unneeded bounds
    checks in loops and various lookups.
    
    This fixes the low hanging fruit, without any major code changes.
    
    name                       old time/op    new time/op    delta
    EncodeDigitsHuffman1e4-8     49.9µs ± 1%    48.1µs ± 1%  -3.74%   (p=0.000 n=10+9)
    EncodeDigitsHuffman1e5-8      476µs ± 1%     458µs ± 1%  -3.58%  (p=0.000 n=10+10)
    EncodeDigitsHuffman1e6-8     4.80ms ± 2%    4.56ms ± 1%  -5.07%   (p=0.000 n=10+9)
    EncodeDigitsSpeed1e4-8        305µs ± 3%     290µs ± 2%  -5.03%   (p=0.000 n=10+9)
    EncodeDigitsSpeed1e5-8       3.67ms ± 2%    3.49ms ± 2%  -4.78%   (p=0.000 n=9+10)
    EncodeDigitsSpeed1e6-8       38.3ms ± 2%    35.8ms ± 1%  -6.58%   (p=0.000 n=9+10)
    EncodeDigitsDefault1e4-8      361µs ± 2%     346µs ± 3%  -4.12%   (p=0.000 n=10+9)
    EncodeDigitsDefault1e5-8     5.24ms ± 2%    4.96ms ± 3%  -5.38%  (p=0.000 n=10+10)
    EncodeDigitsDefault1e6-8     56.5ms ± 3%    52.2ms ± 2%  -7.68%  (p=0.000 n=10+10)
    EncodeDigitsCompress1e4-8     362µs ± 2%     343µs ± 1%  -5.20%   (p=0.000 n=10+9)
    EncodeDigitsCompress1e5-8    5.26ms ± 3%    4.98ms ± 2%  -5.48%  (p=0.000 n=10+10)
    EncodeDigitsCompress1e6-8    56.0ms ± 4%    52.1ms ± 1%  -7.01%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e4-8      70.9µs ± 3%    64.7µs ± 1%  -8.68%   (p=0.000 n=10+9)
    EncodeTwainHuffman1e5-8       556µs ± 2%     524µs ± 2%  -5.84%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e6-8      5.54ms ± 3%    5.22ms ± 2%  -5.70%  (p=0.000 n=10+10)
    EncodeTwainSpeed1e4-8         294µs ± 3%     284µs ± 1%  -3.71%  (p=0.000 n=10+10)
    EncodeTwainSpeed1e5-8        2.59ms ± 2%    2.48ms ± 1%  -4.14%   (p=0.000 n=10+9)
    EncodeTwainSpeed1e6-8        25.6ms ± 1%    24.3ms ± 1%  -5.28%   (p=0.000 n=9+10)
    EncodeTwainDefault1e4-8       419µs ± 2%     396µs ± 1%  -5.59%   (p=0.000 n=10+9)
    EncodeTwainDefault1e5-8      6.23ms ± 4%    5.75ms ± 1%  -7.83%   (p=0.000 n=10+9)
    EncodeTwainDefault1e6-8      66.2ms ± 2%    61.4ms ± 1%  -7.22%  (p=0.000 n=10+10)
    EncodeTwainCompress1e4-8      426µs ± 1%     405µs ± 1%  -4.97%   (p=0.000 n=9+10)
    EncodeTwainCompress1e5-8     6.80ms ± 1%    6.32ms ± 1%  -6.97%   (p=0.000 n=9+10)
    EncodeTwainCompress1e6-8     74.6ms ± 3%    68.7ms ± 1%  -7.90%   (p=0.000 n=10+9)
    
    name                       old speed      new speed      delta
    EncodeDigitsHuffman1e4-8    200MB/s ± 1%   208MB/s ± 1%  +3.88%   (p=0.000 n=10+9)
    EncodeDigitsHuffman1e5-8    210MB/s ± 1%   218MB/s ± 1%  +3.71%  (p=0.000 n=10+10)
    EncodeDigitsHuffman1e6-8    208MB/s ± 2%   219MB/s ± 1%  +5.32%   (p=0.000 n=10+9)
    EncodeDigitsSpeed1e4-8     32.8MB/s ± 3%  34.5MB/s ± 2%  +5.29%   (p=0.000 n=10+9)
    EncodeDigitsSpeed1e5-8     27.2MB/s ± 2%  28.6MB/s ± 2%  +5.29%  (p=0.000 n=10+10)
    EncodeDigitsSpeed1e6-8     26.1MB/s ± 2%  27.9MB/s ± 1%  +7.02%   (p=0.000 n=9+10)
    EncodeDigitsDefault1e4-8   27.7MB/s ± 2%  28.9MB/s ± 3%  +4.30%   (p=0.000 n=10+9)
    EncodeDigitsDefault1e5-8   19.1MB/s ± 2%  20.2MB/s ± 3%  +5.69%  (p=0.000 n=10+10)
    EncodeDigitsDefault1e6-8   17.7MB/s ± 3%  19.2MB/s ± 2%  +8.31%  (p=0.000 n=10+10)
    EncodeDigitsCompress1e4-8  27.6MB/s ± 2%  29.1MB/s ± 1%  +5.47%   (p=0.000 n=10+9)
    EncodeDigitsCompress1e5-8  19.0MB/s ± 3%  20.1MB/s ± 2%  +5.78%  (p=0.000 n=10+10)
    EncodeDigitsCompress1e6-8  17.9MB/s ± 4%  19.2MB/s ± 1%  +7.50%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e4-8     141MB/s ± 3%   154MB/s ± 1%  +9.46%   (p=0.000 n=10+9)
    EncodeTwainHuffman1e5-8     180MB/s ± 2%   191MB/s ± 2%  +6.19%  (p=0.000 n=10+10)
    EncodeTwainHuffman1e6-8     181MB/s ± 3%   192MB/s ± 2%  +6.02%  (p=0.000 n=10+10)
    EncodeTwainSpeed1e4-8      34.0MB/s ± 3%  35.3MB/s ± 1%  +3.84%  (p=0.000 n=10+10)
    EncodeTwainSpeed1e5-8      38.7MB/s ± 2%  40.3MB/s ± 1%  +4.30%   (p=0.000 n=10+9)
    EncodeTwainSpeed1e6-8      39.1MB/s ± 1%  41.2MB/s ± 1%  +5.57%   (p=0.000 n=9+10)
    EncodeTwainDefault1e4-8    23.9MB/s ± 2%  25.3MB/s ± 1%  +5.91%   (p=0.000 n=10+9)
    EncodeTwainDefault1e5-8    16.0MB/s ± 4%  17.4MB/s ± 1%  +8.47%   (p=0.000 n=10+9)
    EncodeTwainDefault1e6-8    15.1MB/s ± 2%  16.3MB/s ± 1%  +7.76%  (p=0.000 n=10+10)
    EncodeTwainCompress1e4-8   23.5MB/s ± 1%  24.7MB/s ± 1%  +5.24%   (p=0.000 n=9+10)
    EncodeTwainCompress1e5-8   14.7MB/s ± 1%  15.8MB/s ± 1%  +7.50%   (p=0.000 n=9+10)
    EncodeTwainCompress1e6-8   13.4MB/s ± 3%  14.6MB/s ± 1%  +8.57%   (p=0.000 n=10+9)
    
    Change-Id: I5c7e84c2f9ea4d38a2115995705eebb93387e22f
    Reviewed-on: https://go-review.googlesource.com/21759Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
    Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    f20b1809
deflate.go 17.6 KB