• Austin Clements's avatar
    runtime: perform concurrent scan in GC workers · 82d14d77
    Austin Clements authored
    Currently the concurrent root scan is performed in its entirety by the
    GC coordinator before entering concurrent mark (which enables GC
    workers). This scan is done sequentially, which can prolong the scan
    phase, delay the mark phase, and means that the scan phase does not
    obey the 25% CPU goal. Furthermore, there's no need to complete the
    root scan before starting marking (in fact, we already allow GC
    assists to happen during the scan phase), so this acts as an
    unnecessary barrier between root scanning and marking.
    
    This change shifts the root scan work out of the GC coordinator and in
    to the GC workers. The coordinator simply sets up the scan state and
    enqueues the right number of root scan jobs. The GC workers then drain
    the root scan jobs prior to draining heap scan jobs.
    
    This parallelizes the root scan process, makes it obey the 25% CPU
    goal, and effectively eliminates root scanning as an isolated phase,
    allowing the system to smoothly transition from root scanning to heap
    marking. This also eliminates a major non-STW responsibility of the GC
    coordinator, which will make it easier to switch to a decentralized
    state machine. Finally, it puts us in a good position to perform root
    scanning in assists as well, which will help satisfy assists at the
    beginning of the GC cycle.
    
    This is mostly straightforward. One tricky aspect is that we have to
    deal with preemption deadlock: where two non-preemptible gorountines
    are trying to preempt each other to perform a stack scan. Given the
    context where this happens, the only instance of this is two
    background workers trying to scan each other. We avoid this by simply
    not scanning the stacks of background workers during the concurrent
    phase; this is safe because we'll scan them during mark termination
    (and their stacks are *very* small and should not contain any new
    pointers).
    
    This change also switches the root marking during mark termination to
    use the same gcDrain-based code path as concurrent mark. This
    shouldn't affect performance because STW root marking was already
    parallel and tasks switched to heap marking immediately when no more
    root marking tasks were available. However, it simplifies the code and
    unifies these code paths.
    
    This has negligible effect on the go1 benchmarks. It slightly slows
    down the garbage benchmark, possibly by making GC run slightly more
    frequently.
    
    name              old time/op  new time/op  delta
    XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)
    
    name                      old time/op    new time/op    delta
    BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
    Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
    FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
    FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
    FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
    FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
    FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
    FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
    FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
    GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
    GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
    Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
    Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
    HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
    JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
    JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
    Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
    GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
    RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
    RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
    RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
    RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
    RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
    RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
    RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
    RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
    Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
    Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
    TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
    TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
    [Geo mean]                  62.1µs         62.3µs       +0.31%
    
    name                      old speed      new speed      delta
    GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
    GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
    Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
    Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
    JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
    JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
    GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
    RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
    RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
    RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
    RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
    RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
    RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
    RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
    RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
    Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
    Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
    [Geo mean]                 100MB/s       99.0MB/s       -0.82%
    
    Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
    Reviewed-on: https://go-review.googlesource.com/16059Reviewed-by: default avatarRick Hudson <rlh@golang.org>
    Run-TryBot: Austin Clements <austin@google.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    82d14d77
mgc.go 58.9 KB