• Austin Clements's avatar
    runtime: separate soft and hard heap limits · 03eb9483
    Austin Clements authored
    Currently, GC pacing is based on a single hard heap limit computed
    based on GOGC. In order to achieve this hard limit, assist pacing
    makes the conservative assumption that the entire heap is live.
    However, in the steady state (with GOGC=100), only half of the heap is
    live. As a result, the garbage collector works twice as hard as
    necessary and finishes half way between the trigger and the goal.
    Since this is a stable state for the trigger controller, this repeats
    from cycle to cycle. Matters are even worse if GOGC is higher. For
    example, if GOGC=200, only a third of the heap is live in steady
    state, so the GC will work three times harder than necessary and
    finish only a third of the way between the trigger and the goal.
    
    Since this causes the garbage collector to consume ~50% of the
    available CPU during marking instead of the intended 25%, about 25% of
    the CPU goes to mutator assists. This high mutator assist cost causes
    high mutator latency variability.
    
    This commit improves the situation by separating the heap goal into
    two goals: a soft goal and a hard goal. The soft goal is set based on
    GOGC, just like the current goal is, and the hard goal is set at a 10%
    larger heap than the soft goal. Prior to the soft goal, assist pacing
    assumes the heap is in steady state (e.g., only half of it is live).
    Between the soft goal and the hard goal, assist pacing switches to the
    current conservative assumption that the entire heap is live.
    
    In benchmarks, this nearly eliminates mutator assists. However, since
    background marking is fixed at 25% CPU, this causes the trigger
    controller to saturate, which leads to somewhat higher variability in
    heap size. The next commit will address this.
    
    The lower CPU usage of course leads to longer mark cycles, though
    really it means the mark cycles are as long as they should have been
    in the first place. This does, however, lead to two potential
    down-sides compared to the current pacing policy: 1. the total
    overhead of the write barrier is higher because it's enabled more of
    the time and 2. the heap size may be larger because there's more
    floating garbage. We addressed 1 by significantly improving the
    performance of the write barrier in the preceding commits. 2 can be
    demonstrated in intense GC benchmarks, but doesn't seem to be a
    problem in any real applications.
    
    Updates #14951.
    Updates #14812 (fixes?).
    Fixes #18534.
    
    This has no significant effect on the throughput of the
    github.com/dr2chase/bent benchmarks-50.
    
    This has little overall throughput effect on the go1 benchmarks:
    
    name                      old time/op    new time/op    delta
    BinaryTree17-12              2.41s ± 0%     2.40s ± 0%  -0.22%  (p=0.007 n=20+18)
    Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.07%  (p=0.003 n=17+18)
    FmtFprintfEmpty-12          41.7ns ± 3%    42.2ns ± 0%  +1.17%  (p=0.002 n=20+15)
    FmtFprintfString-12         66.5ns ± 0%    67.9ns ± 2%  +2.16%  (p=0.000 n=16+20)
    FmtFprintfInt-12            77.6ns ± 2%    75.6ns ± 3%  -2.55%  (p=0.000 n=19+19)
    FmtFprintfIntInt-12          124ns ± 1%     123ns ± 1%  -0.98%  (p=0.000 n=18+17)
    FmtFprintfPrefixedInt-12     151ns ± 1%     148ns ± 1%  -1.75%  (p=0.000 n=19+20)
    FmtFprintfFloat-12           210ns ± 1%     212ns ± 0%  +0.75%  (p=0.000 n=19+16)
    FmtManyArgs-12               501ns ± 1%     499ns ± 1%  -0.30%  (p=0.041 n=17+19)
    GobDecode-12                6.50ms ± 1%    6.49ms ± 1%    ~     (p=0.234 n=19+19)
    GobEncode-12                5.43ms ± 0%    5.47ms ± 0%  +0.75%  (p=0.000 n=20+19)
    Gzip-12                      216ms ± 1%     220ms ± 1%  +1.71%  (p=0.000 n=19+20)
    Gunzip-12                   38.6ms ± 0%    38.8ms ± 0%  +0.66%  (p=0.000 n=18+19)
    HTTPClientServer-12         78.1µs ± 1%    78.5µs ± 1%  +0.49%  (p=0.035 n=20+20)
    JSONEncode-12               12.1ms ± 0%    12.2ms ± 0%  +1.05%  (p=0.000 n=18+17)
    JSONDecode-12               53.0ms ± 0%    52.3ms ± 0%  -1.27%  (p=0.000 n=19+19)
    Mandelbrot200-12            3.74ms ± 0%    3.69ms ± 0%  -1.17%  (p=0.000 n=18+19)
    GoParse-12                  3.17ms ± 1%    3.17ms ± 1%    ~     (p=0.569 n=19+20)
    RegexpMatchEasy0_32-12      73.2ns ± 1%    73.7ns ± 0%  +0.76%  (p=0.000 n=18+17)
    RegexpMatchEasy0_1K-12       239ns ± 0%     238ns ± 0%  -0.27%  (p=0.000 n=13+17)
    RegexpMatchEasy1_32-12      69.0ns ± 2%    69.1ns ± 1%    ~     (p=0.404 n=19+19)
    RegexpMatchEasy1_1K-12       367ns ± 1%     365ns ± 1%  -0.60%  (p=0.000 n=19+19)
    RegexpMatchMedium_32-12      105ns ± 1%     104ns ± 1%  -1.24%  (p=0.000 n=19+16)
    RegexpMatchMedium_1K-12     34.1µs ± 2%    33.6µs ± 3%  -1.60%  (p=0.000 n=20+20)
    RegexpMatchHard_32-12       1.62µs ± 1%    1.67µs ± 1%  +2.75%  (p=0.000 n=18+18)
    RegexpMatchHard_1K-12       48.8µs ± 1%    50.3µs ± 2%  +3.07%  (p=0.000 n=20+19)
    Revcomp-12                   386ms ± 0%     384ms ± 0%  -0.57%  (p=0.000 n=20+19)
    Template-12                 59.9ms ± 1%    61.1ms ± 1%  +2.01%  (p=0.000 n=20+19)
    TimeParse-12                 301ns ± 2%     307ns ± 0%  +2.11%  (p=0.000 n=20+19)
    TimeFormat-12                323ns ± 0%     323ns ± 0%    ~     (all samples are equal)
    [Geo mean]                  47.0µs         47.1µs       +0.23%
    
    https://perf.golang.org/search?q=upload:20171030.1
    
    Likewise, the throughput effect on the x/benchmarks is minimal (and
    reasonably positive on the garbage benchmark with a large heap):
    
    name                         old time/op  new time/op  delta
    Garbage/benchmem-MB=1024-12  2.40ms ± 4%  2.29ms ± 3%  -4.57%  (p=0.000 n=19+18)
    Garbage/benchmem-MB=64-12    2.23ms ± 1%  2.24ms ± 2%  +0.59%  (p=0.016 n=19+18)
    HTTP-12                      12.5µs ± 1%  12.6µs ± 1%    ~     (p=0.326 n=20+19)
    JSON-12                      11.1ms ± 1%  11.3ms ± 2%  +2.15%  (p=0.000 n=16+17)
    
    It does increase the heap size of the garbage benchmarks, but seems to
    have relatively little impact on more realistic programs. Also, we'll
    gain some of this back with the next commit.
    
    name                         old peak-RSS-bytes  new peak-RSS-bytes  delta
    Garbage/benchmem-MB=1024-12          1.21G ± 1%          1.88G ± 2%  +55.59%  (p=0.000 n=19+20)
    Garbage/benchmem-MB=64-12             168M ± 3%           248M ± 8%  +48.08%  (p=0.000 n=18+20)
    HTTP-12                              45.6M ± 9%          47.0M ±27%     ~     (p=0.925 n=20+20)
    JSON-12                               193M ±11%           206M ±11%   +7.06%  (p=0.001 n=20+20)
    
    https://perf.golang.org/search?q=upload:20171030.2
    
    Change-Id: Ic78904135f832b4d64056cbe734ab979f5ad9736
    Reviewed-on: https://go-review.googlesource.com/59970
    Run-TryBot: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: default avatarRick Hudson <rlh@golang.org>
    03eb9483
mgc.go 72.2 KB