runtime: start an M when handing off a P when there's GC work
Currently it's possible for the scheduler to deadlock with the right confluence of locked Gs, assists, and scheduling of background mark workers. Broadly, this happens because handoffp is stricter than findrunnable, and if the only work for a P is GC work, handoffp will put the P into idle, rather than starting an M to execute that P. One way this can happen is as follows: 0. There is only one user G, which we'll call G 1. There is more than one P, but they're all idle except the one running G 1. 1. G 1 locks itself to an M using runtime.LockOSThread. 2. GC starts up and enters mark 1. 3. G 1 performs a GC assist, which completes mark 1 without being fully satisfied. Completing mark 1 causes all background mark workers to park. And since the assist isn't fully satisfied, it parks as well, waiting for a background mark worker to satisfy its remaining assist debt. 4. The assist park enters the scheduler. Since G 1 is locked to the M, the scheduler releases the P and calls handoffp to hand the P to another M. 5. handoffp checks the local and global run queues, which are empty, and sees that there are idle Ps, so rather than start an M, it puts the P into idle. At this point, all of the Gs are waiting and all of the Ps are idle. In particular, none of the GC workers are running, so no mark work gets done and the assist on the main G is never satisfied, so the whole process soft locks up. Fix this by making handoffp start an M if there is GC work. This reintroduces a key invariant: that in any situation where findrunnable would return a G to run on a P, handoffp for that P will start an M to run work on that P. Fixes #13645. Tested by running 2,689 iterations of `go tool dist test -no-rebuild runtime:cpu124` across 10 linux-amd64-noopt VMs with no failures. Without this change, the failure rate was somewhere around 1%. Performance change is negligible. name old time/op new time/op delta XBenchGarbage-12 2.48ms ± 2% 2.48ms ± 1% -0.24% (p=0.000 n=92+93) name old time/op new time/op delta BinaryTree17-12 2.86s ± 2% 2.87s ± 2% ~ (p=0.667 n=19+20) Fannkuch11-12 2.52s ± 1% 2.47s ± 1% -2.05% (p=0.000 n=18+20) FmtFprintfEmpty-12 51.7ns ± 1% 51.5ns ± 3% ~ (p=0.931 n=16+20) FmtFprintfString-12 170ns ± 1% 168ns ± 1% -0.65% (p=0.000 n=19+19) FmtFprintfInt-12 160ns ± 0% 160ns ± 0% +0.18% (p=0.033 n=17+19) FmtFprintfIntInt-12 265ns ± 1% 273ns ± 1% +2.98% (p=0.000 n=17+19) FmtFprintfPrefixedInt-12 235ns ± 1% 239ns ± 1% +1.99% (p=0.000 n=16+19) FmtFprintfFloat-12 315ns ± 0% 315ns ± 1% ~ (p=0.250 n=17+19) FmtManyArgs-12 1.04µs ± 1% 1.05µs ± 0% +0.87% (p=0.000 n=17+19) GobDecode-12 7.93ms ± 0% 7.85ms ± 1% -1.03% (p=0.000 n=16+18) GobEncode-12 6.62ms ± 1% 6.58ms ± 1% -0.60% (p=0.000 n=18+19) Gzip-12 322ms ± 1% 320ms ± 1% -0.46% (p=0.009 n=20+20) Gunzip-12 42.5ms ± 1% 42.5ms ± 0% ~ (p=0.751 n=19+19) HTTPClientServer-12 69.7µs ± 1% 70.0µs ± 2% ~ (p=0.056 n=19+19) JSONEncode-12 16.9ms ± 1% 16.7ms ± 1% -1.13% (p=0.000 n=19+19) JSONDecode-12 61.5ms ± 1% 61.3ms ± 1% -0.35% (p=0.001 n=20+17) Mandelbrot200-12 3.94ms ± 0% 3.91ms ± 0% -0.67% (p=0.000 n=20+18) GoParse-12 3.71ms ± 1% 3.70ms ± 1% ~ (p=0.244 n=17+19) RegexpMatchEasy0_32-12 101ns ± 1% 102ns ± 2% +0.54% (p=0.037 n=19+20) RegexpMatchEasy0_1K-12 349ns ± 0% 350ns ± 0% +0.33% (p=0.000 n=17+18) RegexpMatchEasy1_32-12 84.5ns ± 2% 84.2ns ± 1% -0.43% (p=0.048 n=19+20) RegexpMatchEasy1_1K-12 510ns ± 1% 513ns ± 2% +0.58% (p=0.002 n=18+20) RegexpMatchMedium_32-12 132ns ± 1% 134ns ± 1% +0.95% (p=0.000 n=20+20) RegexpMatchMedium_1K-12 40.1µs ± 1% 39.6µs ± 1% -1.39% (p=0.000 n=20+20) RegexpMatchHard_32-12 2.08µs ± 0% 2.06µs ± 1% -0.95% (p=0.000 n=18+18) RegexpMatchHard_1K-12 62.2µs ± 1% 61.9µs ± 1% -0.42% (p=0.001 n=19+20) Revcomp-12 537ms ± 0% 536ms ± 0% ~ (p=0.076 n=20+20) Template-12 71.3ms ± 1% 69.3ms ± 1% -2.75% (p=0.000 n=20+20) TimeParse-12 361ns ± 0% 360ns ± 1% ~ (p=0.056 n=19+19) TimeFormat-12 353ns ± 0% 352ns ± 0% -0.23% (p=0.000 n=17+18) [Geo mean] 62.6µs 62.5µs -0.17% Change-Id: I0fbbbe4d7d99653ba5600ffb4394fa03558bc4e9 Reviewed-on: https://go-review.googlesource.com/19107Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
Showing
Please register or sign in to comment