    runtime: move stack scanning into the parallel mark phase · 0368a7ce
    Carl Shapiro authored
    This change reduces the cost of scanning stacks by frames.
    It moves stack scanning from the serial root enumeration
    phase to the parallel tracing phase.  The timings that follow
    are for the issue 6482 benchmark.
    
    Baseline
    
    BenchmarkGoroutineSelect	      50	 108027405 ns/op
    BenchmarkGoroutineBlocking	      50	  89573332 ns/op
    BenchmarkGoroutineForRange	      20	  95614116 ns/op
    BenchmarkGoroutineIdle		      20	 122809512 ns/op
    
    Stack scan by frames, non-parallel
    
    BenchmarkGoroutineSelect	      20	 297138929 ns/op
    BenchmarkGoroutineBlocking	      20	 301137599 ns/op
    BenchmarkGoroutineForRange	      10	 312499469 ns/op
    BenchmarkGoroutineIdle		      10	 209428876 ns/op
    
    Stack scan by frames, parallel
    
    BenchmarkGoroutineSelect	      20	 183938431 ns/op
    BenchmarkGoroutineBlocking	      20	 170109999 ns/op
    BenchmarkGoroutineForRange	      20	 179628882 ns/op
    BenchmarkGoroutineIdle		      20	 157541498 ns/op
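
To make the three sets of timings easier to compare, the ns/op figures can be expressed as a slowdown relative to the baseline. The following is a minimal sketch, not part of the change itself; the `timing` type and the `slowdown` helper are hypothetical names introduced only for this illustration, and the numbers are copied from the runs above.

```go
package main

import "fmt"

// timing holds the ns/op figures for one benchmark, copied from the
// three runs listed in this message. The type is illustrative only.
type timing struct {
	name                       string
	baseline, serial, parallel float64
}

var timings = []timing{
	{"GoroutineSelect", 108027405, 297138929, 183938431},
	{"GoroutineBlocking", 89573332, 301137599, 170109999},
	{"GoroutineForRange", 95614116, 312499469, 179628882},
	{"GoroutineIdle", 122809512, 209428876, 157541498},
}

// slowdown reports how many times slower t is than base.
func slowdown(t, base float64) float64 {
	return t / base
}

func main() {
	for _, t := range timings {
		fmt.Printf("%-18s non-parallel %.2fx  parallel %.2fx\n",
			t.name,
			slowdown(t.serial, t.baseline),
			slowdown(t.parallel, t.baseline))
	}
}
```

In every benchmark the parallel scan is faster than the non-parallel scan by frames but still slower than the baseline, which is the disparity discussed next.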
    
    The remaining performance disparity is due to inefficiencies
    in gentraceback and its callees.  The effect was isolated by
    using a parallel stack scan where scanstack was modified to do
    a conservative scan of the stack segments without gentraceback
    followed by a call to gentraceback with a no-op callback.
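
As a rough illustration of the difference between scanning stacks in a serial phase and handing them to parallel workers, the sketch below models both in Go. It is not the runtime's implementation (which lives in mgc0.c): the `stack` type, `scanStack`, `scanSerial`, and `scanParallel` are hypothetical names, and "scanning" here just sums slot counts per frame.

```go
package main

import (
	"fmt"
	"sync"
)

// stack is a hypothetical stand-in for a goroutine stack: each element
// of frames is the number of pointer slots in one frame.
type stack struct{ frames []int }

// scanStack walks the stack frame by frame and returns the number of
// slots visited.
func scanStack(s stack) int {
	n := 0
	for _, slots := range s.frames {
		n += slots
	}
	return n
}

// scanSerial scans every stack on the calling thread, as a serial
// root enumeration phase would.
func scanSerial(stacks []stack) int {
	total := 0
	for _, s := range stacks {
		total += scanStack(s)
	}
	return total
}

// scanParallel only enqueues the stacks; nworkers goroutines pull them
// from the channel and do the per-frame work concurrently.
func scanParallel(stacks []stack, nworkers int) int {
	work := make(chan stack)
	results := make([]int, nworkers) // one slot per worker, no sharing
	var wg sync.WaitGroup
	for w := 0; w < nworkers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for s := range work {
				results[id] += scanStack(s)
			}
		}(w)
	}
	for _, s := range stacks {
		work <- s
	}
	close(work)
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	stacks := make([]stack, 1000)
	for i := range stacks {
		stacks[i] = stack{frames: []int{i % 7, 3, 1}}
	}
	// Both strategies must visit the same number of slots.
	fmt.Println(scanSerial(stacks) == scanParallel(stacks, 4))
}
```

The per-stack cost is unchanged by parallelizing; only the wall-clock time shrinks, which is consistent with gentraceback remaining visible in the profile.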
    
    The output that follows lists the ten most frequent
    top-of-stack functions, as reported by the Linux perf record
    facility.
    
    Baseline
    
    +  25.19%  gc.test  gc.test            [.] runtime.xchg
    +  19.00%  gc.test  gc.test            [.] scanblock
    +   8.53%  gc.test  gc.test            [.] scanstack
    +   8.46%  gc.test  gc.test            [.] flushptrbuf
    +   5.08%  gc.test  gc.test            [.] procresize
    +   3.57%  gc.test  gc.test            [.] runtime.chanrecv
    +   2.94%  gc.test  gc.test            [.] dequeue
    +   2.74%  gc.test  gc.test            [.] addroots
    +   2.25%  gc.test  gc.test            [.] runtime.ready
    +   1.33%  gc.test  gc.test            [.] runtime.cas64
    
    Gentraceback
    
    +  18.12%  gc.test  gc.test             [.] runtime.xchg
    +  14.68%  gc.test  gc.test             [.] scanblock
    +   8.20%  gc.test  gc.test             [.] runtime.gentraceback
    +   7.38%  gc.test  gc.test             [.] flushptrbuf
    +   6.84%  gc.test  gc.test             [.] scanstack
    +   5.92%  gc.test  gc.test             [.] runtime.findfunc
    +   3.62%  gc.test  gc.test             [.] procresize
    +   3.15%  gc.test  gc.test             [.] readvarint
    +   1.92%  gc.test  gc.test             [.] addroots
    +   1.87%  gc.test  gc.test             [.] runtime.chanrecv
    
    R=golang-dev, dvyukov, rsc
    CC=golang-dev
    https://golang.org/cl/17410043
mgc0.c 65 KB