  1. 19 May, 2016 1 commit
  2. 16 May, 2016 1 commit
  3. 30 Apr, 2016 2 commits
    • runtime: reclaim scan/dead bit in first word · a20fd1f6
      Austin Clements authored
      With the switch to separate mark bitmaps, the scan/dead bit for the
      first word of each object is now unused. Reclaim this bit and use it
      as a scan/dead bit, just like words three and on. The second word is
      still used for checkmark.
      
      This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
      since they no longer need different cases for 1, 2, and 3+ word
      objects. They can instead just manipulate the heap bitmap for the
      first word and be done with it.
      
      In order to enable this, we change heapBitsSetType and runGCProg to
      always set the scan/dead bit to scan for the first word on every code
      path. Since these functions only apply to types that have pointers,
      there's no need to do this conditionally: it's *always* necessary to
      set the scan bit in the first word.
      
      We also change every place that scans an object and checks if there
      are more pointers. Rather than only checking morePointers if the word
      is >= 2, we now check morePointers if word != 1 (since that's the
      checkmark word).
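
      As a concrete illustration of the new rule, here is a minimal sketch
      (toy types and names, not the runtime's heap bitmap code): the
      scan/dead bit is consulted for every word of an object except word 1,
      which still holds the checkmark bit.

      type toyHeapBits struct {
          isPtr  []bool // word holds a pointer
          noScan []bool // scan/dead bit cleared: no more pointers past here
      }

      func scanObjectSketch(hb toyHeapBits, nwords int, scanWord func(i int)) {
          for i := 0; i < nwords; i++ {
              if i != 1 && hb.noScan[i] { // word 1 is the checkmark word
                  break
              }
              if hb.isPtr[i] {
                  scanWord(i)
              }
          }
      }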
      
      Looking forward, we should probably reclaim the checkmark bit, too,
      but that's going to be quite a bit more work.
      
      Tested by setting doubleCheck in heapBitsSetType and running all.bash
      on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
      
      This particularly improves the FmtFprintf* go1 benchmarks, since they
      do a large amount of noscan allocation.
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
      Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
      FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
      FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
      FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
      FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
      FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
      FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
      GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
      GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
      Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
      HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
      JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
      JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
      GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
      RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
      RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
      RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
      RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
      RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
      RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
      RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
      Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
      Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
      TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
      TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
      [Geo mean]                  52.1µs         51.5µs       -1.00%
      
      Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
      Reviewed-on: https://go-review.googlesource.com/22632
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: use morePointers and isPointer in more places · d5e3d08b
      Austin Clements authored
      This makes this code better self-documenting and makes it easier to
      find these places in the future.
      
      Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
      Reviewed-on: https://go-review.googlesource.com/22631
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  4. 29 Apr, 2016 2 commits
    • [dev.garbage] runtime: use s.base() everywhere it makes sense · b7adc41f
      Austin Clements authored
      Currently we have lots of (s.start << _PageShift) and variants. We now
      have an s.base() function that returns this. It's faster and more
      readable, so use it.
      
      Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
      Reviewed-on: https://go-review.googlesource.com/22559
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: restructure alloc and mark bits · 2063d5d9
      Rick Hudson authored
      Two changes are included here that are dependent on each other.
      The first is that allocBits and gcmarkBits are changed to
      a *uint8 which points to the first byte of that span's
      mark and alloc bits. Several places were altered to
      perform pointer arithmetic to locate the byte corresponding
      to an object in the span. The actual bit corresponding
      to an object is indexed within the byte using the lower three
      bits of the object's index.
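
      A minimal sketch of the indexing scheme, using a plain byte slice in
      place of the *uint8 plus pointer arithmetic (names invented for
      illustration): the byte for object n lies n/8 bytes past the start of
      the span's bitmap, and the bit within that byte comes from the lower
      three bits of the index (n & 7).

      type spanBitsSketch struct {
          allocBits  []byte
          gcmarkBits []byte
      }

      func (s *spanBitsSketch) isMarked(objIndex uintptr) bool {
          return s.gcmarkBits[objIndex/8]&(1<<(objIndex&7)) != 0
      }

      func (s *spanBitsSketch) setMarked(objIndex uintptr) {
          s.gcmarkBits[objIndex/8] |= 1 << (objIndex & 7)
      }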
      
      The second change avoids the redundant calculation of an
      object's index. The index is returned from heapBitsForObject
      and then used by the functions indexing allocBits
      and gcmarkBits.
      
      Finally we no longer allocate the gc bits in the span
      structures. Instead we use an arena based allocation scheme
      that allows for a more compact bit map as well as recycling
      and bulk clearing of the mark bits.
      
      Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
      Reviewed-on: https://go-review.googlesource.com/20705
      Reviewed-by: Austin Clements <austin@google.com>
  5. 27 Apr, 2016 4 commits
    • [dev.garbage] runtime: add gc work buffer tryGet and put fast paths · 1354b32c
      Rick Hudson authored
      The complexity of the GC work buffers' put and tryGet
      prevented them from being inlined. This CL simplifies
      the fast path, thus enabling inlining. If the fast
      path does not succeed, the previous put and tryGet
      functions are called.
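
      A hedged sketch of the shape of such a fast path (field and function
      names invented here): it is small enough to inline, and falls back to
      the original, non-inlined path only when the cached buffer is missing
      or full.

      type workbufSketch struct {
          obj  [256]uintptr
          nobj int
      }

      type gcWorkSketch struct {
          wbuf *workbufSketch
      }

      func (w *gcWorkSketch) put(obj uintptr) {
          if wbuf := w.wbuf; wbuf != nil && wbuf.nobj < len(wbuf.obj) {
              wbuf.obj[wbuf.nobj] = obj
              wbuf.nobj++
              return
          }
          w.putSlow(obj) // the previous, more general put path
      }

      func (w *gcWorkSketch) putSlow(obj uintptr) {
          // acquire or flush a buffer, then retry the store (omitted here)
          _ = obj
      }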
      
      Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
      Reviewed-on: https://go-review.googlesource.com/20704
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: cleanup and optimize span.base() · f8d0d4fd
      Rick Hudson authored
      Prior to this CL the base of a span was calculated in various
      places using shifts or calls to base(). This CL now
      always calls base() which has been optimized to calculate the
      base of the span when the span is initialized and store that
      value in the span structure.
      
      Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
      Reviewed-on: https://go-review.googlesource.com/20703
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: allocate directly from GC mark bits · 3479b065
      Rick Hudson authored
      Instead of building a freelist from the mark bits generated
      by the GC this CL allocates directly from the mark bits.
      
      The approach moves the mark bits from the pointer/no pointer
      heap structures into their own per span data structures. The
      mark/allocation vectors consist of a single mark bit per
      object. Two vectors are maintained, one for allocation and
      one for the GC's mark phase. During the GC cycle's sweep
      phase the interpretation of the vectors is swapped. The
      mark vector becomes the allocation vector and the old
      allocation vector is cleared and becomes the mark vector that
      the next GC cycle will use.
      
      Marked entries in the allocation vector indicate that the
      object is not free. Each allocation vector maintains a boundary
      between areas of the span already allocated from and areas
      not yet allocated from. As objects are allocated this boundary
      is moved until it reaches the end of the span. At this point
      further allocations will be done from another span.
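
      A minimal sketch of the allocation side of this scheme (toy
      representation, not the runtime's span layout): freeIndex is the
      boundary, a set bit means "not free", and once the boundary reaches
      the end of the span allocation moves to another span.

      type markBitsSpanSketch struct {
          allocBits []byte // one bit per object
          freeIndex int
          nelems    int
      }

      func (s *markBitsSpanSketch) nextFree() (objIndex int, ok bool) {
          for i := s.freeIndex; i < s.nelems; i++ {
              if s.allocBits[i/8]&(1<<uint(i%8)) == 0 {
                  s.allocBits[i/8] |= 1 << uint(i%8)
                  s.freeIndex = i + 1
                  return i, true
              }
          }
          s.freeIndex = s.nelems
          return 0, false // span exhausted
      }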
      
      Since we no longer sweep a span inspecting each freed object,
      the responsibility for maintaining pointer/scalar bits in
      the heapBitMap now falls to the routines doing the actual
      allocation.
      
      This CL is functionally complete and ready for performance
      tuning.
      
      Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
      Reviewed-on: https://go-review.googlesource.com/19470
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: don't rescan globals · b49b71ae
      Austin Clements authored
      Currently the runtime rescans globals during mark 2 and mark
      termination. This costs as much as 500µs/MB in STW time, which is
      enough to surpass the 10ms STW limit with only 20MB of globals.
      
      It's also basically unnecessary. The compiler already generates write
      barriers for global -> heap pointer updates and the regular write
      barrier doesn't check whether the slot is a global or in the heap.
      However, some less common write barriers do cause problems.
      heapBitsBulkBarrier, which is used by typedmemmove and related
      functions, currently depends on having access to the pointer bitmap
      and as a result ignores writes to globals. Likewise, the
      reflect-related write barriers reflect_typedmemmovepartial and
      callwritebarrier ignore non-heap destinations; though it appears they
      can never be called with global pointers anyway.
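
      For context, the already-covered case is an ordinary pointer store to
      a global, which the compiler instruments with a write barrier; this
      change extends the bulk barriers to globals via the data and BSS
      pointer bitmaps. A small illustrative example:

      type listNode struct {
          next *listNode
      }

      var globalHead *listNode // lives in the BSS segment

      func pushGlobal(n *listNode) {
          n.next = globalHead
          globalHead = n // global -> heap pointer update; write-barriered
      }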
      
      This commit makes heapBitsBulkBarrier issue write barriers for writes
      to global pointers using the data and BSS pointer bitmaps, removes the
      inheap checks from the reflection write barriers, and eliminates the
      rescans during mark 2 and mark termination. It also adds a test that
      writes to globals have write barriers.
      
      Programs with large data+BSS segments (with pointers) aren't common,
      but for programs that do have large data+BSS segments, this
      significantly reduces pause time:
      
      name \ 95%ile-time/markTerm              old         new  delta
      LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)
      
      This very slightly improves the go1 benchmarks:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
      Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
      FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
      FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
      FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
      FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
      FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
      GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
      GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
      Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
      Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
      HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
      JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
      JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
      Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
      GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
      RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
      RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
      RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
      RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
      RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
      RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
      RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
      RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
      Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
      Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
      TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
      TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
      [Geo mean]                  55.3µs         55.0µs       -0.41%
      
      Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
      Reviewed-on: https://go-review.googlesource.com/20829
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  6. 26 Apr, 2016 4 commits
    • runtime: make stack re-scan O(# dirty stacks) · 2a889b9d
      Austin Clements authored
      Currently the stack re-scan during mark termination is O(# stacks)
      because we enqueue a root marking job for every goroutine. It takes
      ~34ns to process this root marking job for a valid (clean) stack, so
      at around 300k goroutines we exceed the 10ms pause goal. A non-trivial
      portion of this time is spent simply taking the cache miss to check
      the gcscanvalid flag, so simply optimizing the path that handles clean
      stacks can only improve this so much.
      
      Fix this by keeping an explicit list of goroutines with dirty stacks
      that need to be rescanned. When a goroutine first transitions to
      running after a stack scan and marks its stack dirty, it adds itself
      to this list. We enqueue root marking jobs only for the goroutines in
      this list, so this improves stack re-scanning asymptotically by
      completely eliminating time spent on clean goroutines.
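
      An illustrative sketch of the bookkeeping (toy types; the runtime's
      actual representation differs): goroutines that dirty their stacks
      after a scan add themselves to a lock-protected list, and rescan jobs
      are enqueued only for list members.

      import "sync"

      type gSketch struct {
          stackDirty bool
      }

      type dirtyStackListSketch struct {
          mu sync.Mutex
          gs []*gSketch
      }

      func (l *dirtyStackListSketch) noteDirty(gp *gSketch) {
          gp.stackDirty = true
          l.mu.Lock()
          l.gs = append(l.gs, gp)
          l.mu.Unlock()
      }

      func (l *dirtyStackListSketch) takeForRescan() []*gSketch {
          l.mu.Lock()
          gs := l.gs
          l.gs = nil
          l.mu.Unlock()
          return gs // O(# dirty stacks) work for the rescan
      }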
      
      This reduces mark termination time for 500k idle goroutines from 15ms
      to 238µs. Overall performance effect is negligible.
      
      name \ 95%ile-time/markTerm     old           new         delta
      IdleGs/gs:500000/gomaxprocs:12  15000µs ± 0%  238µs ± 5%  -98.41% (p=0.000 n=10+10)
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.30ms ± 3%  2.29ms ± 1%  -0.43%  (p=0.049 n=17+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.57s ± 3%     2.59s ± 2%    ~     (p=0.141 n=19+20)
      Fannkuch11-12                2.09s ± 0%     2.10s ± 1%  +0.53%  (p=0.000 n=19+19)
      FmtFprintfEmpty-12          45.3ns ± 3%    45.2ns ± 2%    ~     (p=0.845 n=20+20)
      FmtFprintfString-12          129ns ± 0%     127ns ± 0%  -1.55%  (p=0.000 n=16+16)
      FmtFprintfInt-12             123ns ± 0%     119ns ± 1%  -3.24%  (p=0.000 n=19+19)
      FmtFprintfIntInt-12          195ns ± 1%     189ns ± 1%  -3.11%  (p=0.000 n=17+17)
      FmtFprintfPrefixedInt-12     193ns ± 1%     187ns ± 1%  -3.06%  (p=0.000 n=19+19)
      FmtFprintfFloat-12           254ns ± 0%     255ns ± 1%  +0.35%  (p=0.001 n=14+17)
      FmtManyArgs-12               781ns ± 0%     770ns ± 0%  -1.48%  (p=0.000 n=16+19)
      GobDecode-12                7.00ms ± 1%    6.98ms ± 1%    ~     (p=0.563 n=19+19)
      GobEncode-12                5.91ms ± 1%    5.92ms ± 0%    ~     (p=0.118 n=19+18)
      Gzip-12                      219ms ± 1%     215ms ± 1%  -1.81%  (p=0.000 n=18+18)
      Gunzip-12                   37.2ms ± 0%    37.4ms ± 0%  +0.45%  (p=0.000 n=17+19)
      HTTPClientServer-12         76.9µs ± 3%    77.5µs ± 2%  +0.81%  (p=0.030 n=20+19)
      JSONEncode-12               15.0ms ± 0%    14.8ms ± 1%  -0.88%  (p=0.001 n=15+19)
      JSONDecode-12               50.6ms ± 0%    53.2ms ± 2%  +5.07%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 0%    4.05ms ± 1%    ~     (p=0.581 n=16+17)
      GoParse-12                  3.34ms ± 1%    3.30ms ± 1%  -1.21%  (p=0.000 n=15+20)
      RegexpMatchEasy0_32-12      69.6ns ± 1%    69.8ns ± 2%    ~     (p=0.566 n=19+19)
      RegexpMatchEasy0_1K-12       238ns ± 1%     236ns ± 0%  -0.91%  (p=0.000 n=17+13)
      RegexpMatchEasy1_32-12      69.8ns ± 1%    70.0ns ± 1%  +0.23%  (p=0.026 n=17+16)
      RegexpMatchEasy1_1K-12       371ns ± 1%     363ns ± 1%  -2.07%  (p=0.000 n=19+19)
      RegexpMatchMedium_32-12      107ns ± 2%     106ns ± 1%  -0.51%  (p=0.031 n=18+20)
      RegexpMatchMedium_1K-12     33.0µs ± 0%    32.9µs ± 0%  -0.30%  (p=0.004 n=16+16)
      RegexpMatchHard_32-12       1.70µs ± 0%    1.70µs ± 0%  +0.45%  (p=0.000 n=16+17)
      RegexpMatchHard_1K-12       51.1µs ± 2%    51.4µs ± 1%  +0.53%  (p=0.000 n=17+19)
      Revcomp-12                   378ms ± 1%     385ms ± 1%  +1.92%  (p=0.000 n=19+18)
      Template-12                 64.3ms ± 2%    65.0ms ± 2%  +1.09%  (p=0.001 n=19+19)
      TimeParse-12                 315ns ± 1%     317ns ± 2%    ~     (p=0.108 n=18+20)
      TimeFormat-12                360ns ± 1%     337ns ± 0%  -6.30%  (p=0.000 n=18+13)
      [Geo mean]                  51.8µs         51.6µs       -0.48%
      
      Change-Id: Icf8994671476840e3998236e15407a505d4c760c
      Reviewed-on: https://go-review.googlesource.com/20700
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: remove stack barriers during concurrent mark · 269c969c
      Austin Clements authored
      Currently we remove stack barriers during STW mark termination, which
      has a non-trivial per-goroutine cost and means that we have to touch
      even clean stacks during mark termination. However, there's no problem
      with leaving them in during the sweep phase. They just have to be out
      by the time we install new stack barriers immediately prior to
      scanning the stack such as during the mark phase of the next GC cycle
      or during mark termination in a STW GC.
      
      Hence, move the gcRemoveStackBarriers from STW mark termination to
      just before we install new stack barriers during concurrent mark. This
      removes the cost from STW. Furthermore, this combined with concurrent
      stack shrinking means that the mark termination scan of a clean stack
      is a complete no-op, which will make it possible to skip clean stacks
      entirely during mark termination.
      
      This has the downside that anything outside of Go that tries to walk
      Go stacks will now get messed up all the time instead of just some of
      the time. This includes tools like GDB, perf, and VTune. We'll
      improve the situation shortly.
      
      Change-Id: Ia40baad8f8c16aeefac05425e00b0cf478137097
      Reviewed-on: https://go-review.googlesource.com/20667
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: avoid span root marking entirely during mark termination · efb0c554
      Austin Clements authored
      Currently we enqueue span root mark jobs during both concurrent mark
      and mark termination, but we make the job a no-op during mark
      termination.
      
      This is silly. Instead of queueing them up just to not do them, don't
      queue them up in the first place.
      
      Change-Id: Ie1d36de884abfb17dd0db6f0449a2b7c997affab
      Reviewed-on: https://go-review.googlesource.com/20666
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: free dead G stacks concurrently · e8337491
      Austin Clements authored
      Currently we free cached stacks of dead Gs during STW stack root
      marking. We do this during STW because there's no way to take
      ownership of a particular dead G, so attempting to free a dead G's
      stack during concurrent stack root marking could race with reusing
      that G.
      
      However, we can do this concurrently if we take a completely different
      approach. One way to prevent reuse of a dead G is to remove it from
      the free G list. Hence, this adds a new fixed root marking task that
      simply removes all Gs from the list of dead Gs with cached stacks,
      frees their stacks, and then adds them to the list of dead Gs without
      cached stacks.
      
      This is also a necessary step toward rescanning only dirty stacks,
      since it eliminates another task from STW stack marking.
      
      Change-Id: Iefbad03078b284a2e7bf30fba397da4ca87fe095
      Reviewed-on: https://go-review.googlesource.com/20665
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  7. 21 Apr, 2016 2 commits
    • runtime: simplify/optimize allocate-black a bit · 64a26b79
      Austin Clements authored
      Currently allocating black switches to the system stack (which is
      probably a historical accident) and atomically updates the global
      bytes marked stat. Since we're about to depend on this much more,
      optimize it a bit by putting it back on the regular stack and updating
      the per-P bytes marked stat, which gets lazily folded into the global
      bytes marked stat.
      
      Change-Id: Ibbe16e5382d3fd2256e4381f88af342bf7020b04
      Reviewed-on: https://go-review.googlesource.com/22170
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: count black allocations toward scan work · 479501c1
      Austin Clements authored
      Currently we count black allocations toward the scannable heap size,
      but not toward the scan work we've done so far. This is clearly
      inconsistent (we have, in effect, scanned these allocations and since
      they're already black, we're not going to scan them again). Worse, it
      means we don't count black allocations toward the scannable heap size
      as of the *next* GC because this is based on the amount of scan work
      we did in this cycle.
      
      Fix this by counting black allocations as scan work. Currently the GC
      spends very little time in allocate-black mode, so this probably
      hasn't been a problem, but this will become important when we switch
      to always allocating black.
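
      A sketch of the accounting change (names invented): bytes allocated
      black during the mark phase now count as scan work already performed,
      in addition to counting toward the scannable heap size as before.

      type gcControllerSketch struct {
          scannableHeapBytes int64
          scanWork           int64
      }

      func (c *gcControllerSketch) noteBlackAllocation(scannableBytes int64) {
          c.scannableHeapBytes += scannableBytes // counted before this change
          c.scanWork += scannableBytes           // new: never scanned again this cycle
      }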
      
      Change-Id: If6ff693b070c385b65b6ecbbbbf76283a0f9d990
      Reviewed-on: https://go-review.googlesource.com/22119
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  8. 11 Apr, 2016 1 commit
  9. 16 Mar, 2016 2 commits
    • runtime: shrink stacks during concurrent mark · f11e4eb5
      Austin Clements authored
      Currently we shrink stacks during STW mark termination because it used
      to be unsafe to shrink them concurrently. For some programs, this
      significantly increases pause time: stack shrinking costs ~5ms/MB
      copied plus 2µs/shrink.
      
      Now that we've made it safe to shrink a stack without the world being
      stopped, shrink them during the concurrent mark phase.
      
      This reduces the STW time in the program from issue #12967 by an order
      of magnitude and brings it from over the 10ms goal to well under:
      
      name           old 95%ile-markTerm-time  new 95%ile-markTerm-time  delta
      Stackshrink-4               23.8ms ±60%               1.80ms ±39%  -92.44%  (p=0.008 n=5+5)
      
      Fixes #12967.
      
      This slows down the go1 and garbage benchmarks overall by < 0.5%.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.48ms ± 1%  2.49ms ± 1%  +0.45%  (p=0.005 n=25+21)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.93s ± 2%     2.97s ± 2%  +1.34%  (p=0.002 n=19+20)
      Fannkuch11-12                2.51s ± 1%     2.59s ± 0%  +3.09%  (p=0.000 n=18+18)
      FmtFprintfEmpty-12          51.1ns ± 2%    51.5ns ± 1%    ~     (p=0.280 n=20+17)
      FmtFprintfString-12          175ns ± 1%     169ns ± 1%  -3.01%  (p=0.000 n=20+20)
      FmtFprintfInt-12             160ns ± 1%     160ns ± 0%  +0.53%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          265ns ± 0%     266ns ± 1%  +0.59%  (p=0.000 n=20+20)
      FmtFprintfPrefixedInt-12     237ns ± 1%     238ns ± 1%  +0.44%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           326ns ± 1%     341ns ± 1%  +4.55%  (p=0.000 n=20+19)
      FmtManyArgs-12              1.01µs ± 0%    1.02µs ± 0%  +0.43%  (p=0.000 n=20+19)
      GobDecode-12                8.41ms ± 1%    8.30ms ± 2%  -1.22%  (p=0.000 n=20+19)
      GobEncode-12                6.66ms ± 1%    6.68ms ± 0%  +0.30%  (p=0.000 n=18+19)
      Gzip-12                      322ms ± 1%     322ms ± 1%    ~     (p=1.000 n=20+20)
      Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.174 n=20+20)
      HTTPClientServer-12         69.7µs ± 1%    70.6µs ± 1%  +1.20%  (p=0.000 n=20+20)
      JSONEncode-12               16.8ms ± 0%    16.8ms ± 1%    ~     (p=0.154 n=19+19)
      JSONDecode-12               65.1ms ± 0%    65.3ms ± 1%  +0.34%  (p=0.003 n=20+20)
      Mandelbrot200-12            3.93ms ± 0%    3.92ms ± 0%    ~     (p=0.396 n=19+20)
      GoParse-12                  3.66ms ± 1%    3.65ms ± 1%    ~     (p=0.117 n=16+18)
      RegexpMatchEasy0_32-12      85.0ns ± 2%    85.5ns ± 2%    ~     (p=0.143 n=20+20)
      RegexpMatchEasy0_1K-12       267ns ± 1%     267ns ± 1%    ~     (p=0.867 n=20+17)
      RegexpMatchEasy1_32-12      83.3ns ± 2%    83.8ns ± 1%    ~     (p=0.068 n=20+20)
      RegexpMatchEasy1_1K-12       432ns ± 1%     432ns ± 1%    ~     (p=0.804 n=20+19)
      RegexpMatchMedium_32-12      133ns ± 0%     133ns ± 0%    ~     (p=1.000 n=20+20)
      RegexpMatchMedium_1K-12     40.3µs ± 1%    40.4µs ± 1%    ~     (p=0.319 n=20+19)
      RegexpMatchHard_32-12       2.10µs ± 1%    2.10µs ± 1%    ~     (p=0.723 n=20+18)
      RegexpMatchHard_1K-12       63.0µs ± 0%    63.0µs ± 0%    ~     (p=0.158 n=19+17)
      Revcomp-12                   461ms ± 1%     476ms ± 8%  +3.29%  (p=0.002 n=20+20)
      Template-12                 80.1ms ± 1%    79.3ms ± 1%  -1.00%  (p=0.000 n=20+20)
      TimeParse-12                 360ns ± 0%     360ns ± 0%    ~     (p=0.802 n=18+19)
      TimeFormat-12                374ns ± 1%     372ns ± 0%  -0.77%  (p=0.000 n=20+19)
      [Geo mean]                  61.8µs         62.0µs       +0.40%
      
      Change-Id: Ib60cd46b7a4987e07670eb271d22f6cee5802842
      Reviewed-on: https://go-review.googlesource.com/20044
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: generalize work.finalizersDone to work.markrootDone · c14d25c6
      Austin Clements authored
      We're about to add another root marking job that needs to happen only
      during the first markroot pass (whether that's concurrent or STW),
      just like finalizer scanning. Rather than introducing another flag
      that has the same value as finalizersDone, just rename finalizersDone
      to markrootDone.
      
      Change-Id: I535356c6ea1f3734cb5b6add264cb7bf48de95e8
      Reviewed-on: https://go-review.googlesource.com/20043
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  10. 07 Mar, 2016 1 commit
  11. 02 Mar, 2016 1 commit
    • all: single space after period. · 5fea2ccc
      Brad Fitzpatrick authored
      The tree's pretty inconsistent about single space vs double space
      after a period in documentation. Make it consistently a single space,
      per earlier decisions. This means contributors won't be confused by
      misleading precedence.
      
      This CL doesn't use go/doc to parse. It only addresses // comments.
      It was generated with:
      
      $ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
      $ go test go/doc -update
      
      Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
      Reviewed-on: https://go-review.googlesource.com/20022
      Reviewed-by: Rob Pike <r@golang.org>
      Reviewed-by: Dave Day <djd@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  12. 25 Feb, 2016 1 commit
    • runtime: pass gcWork to markroot · 7b229001
      Austin Clements authored
      Currently markroot uses a gcWork on the stack and disposes of it
      immediately after marking one root. This used to be necessary because
      markroot was called from the depths of parfor, but now that we call it
      directly and have ready access to a gcWork at the call site, pass the
      gcWork in, use it directly in markroot, and share it across calls to
      markroot from the same P.
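
      A sketch of the resulting calling convention (toy signatures, not the
      actual runtime declarations): the caller's gcWork is passed in and
      reused across consecutive markroot calls on the same P.

      type gcWorkRootSketch struct{} // stands in for the caller's gcWork

      func markrootSketch(gcw *gcWorkRootSketch, rootIndex int) {
          // scan root rootIndex, queueing found pointers on gcw (omitted)
          _, _ = gcw, rootIndex
      }

      func drainRootsSketch(gcw *gcWorkRootSketch, nroots int) {
          for i := 0; i < nroots; i++ {
              markrootSketch(gcw, i) // same gcw shared for every root
          }
      }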
      
      Change-Id: Id7c3b811bfb944153760e01873c07c8d18909be1
      Reviewed-on: https://go-review.googlesource.com/19635
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
  13. 16 Jan, 2016 1 commit
    • runtime: fix sleep/wakeup race for GC assists · b05f18e3
      Austin Clements authored
      GC assists check gcBlackenEnabled under the assist queue lock to avoid
      going to sleep after gcWakeAllAssists has already woken all assists.
      However, currently we clear gcBlackenEnabled shortly *after* waking
      all assists, which opens a window where this exact race can happen.
      
      Fix this by clearing gcBlackenEnabled before waking blocked assists.
      However, it's unlikely this actually matters because the world is
      stopped between waking assists and clearing gcBlackenEnabled and there
      aren't any obvious allocations during this window, so I don't think an
      assist could actually slip into this race window.
      
      Updates #13645.
      
      Change-Id: I7571f059530481dc781d8fd96a1a40aadebecb0d
      Reviewed-on: https://go-review.googlesource.com/18682
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  14. 12 Jan, 2016 2 commits
  15. 11 Jan, 2016 1 commit
    • runtime: eagerly share GC work buffers · 9439fa10
      Rick Hudson authored
      Currently, due to an oversight, we only balance work buffers
      in background and idle workers and not in assists. As a
      result, in assist-heavy workloads, assists are likely to tie
      up large work buffers in per-P caches increasing the
      likelihood that the global list will be empty. This increases
      the likelihood that other GC workers will exit and assists
      will block, slowing down the system as a whole. Fix this by
      eagerly balancing work buffers as soon as the assists notice
      that the global buffers are empty. This makes it much more
      likely that work will be immediately available to other
      workers and assists.
      
      This change reduces the garbage benchmark time by 39% and
      fixes the regression seen at CL 15893 (golang.org/cl/15893).
      
      Garbage benchmark times before and after this CL.
      Before GOPERF-METRIC:time=4427020
      After  GOPERF-METRIC:time=2721645
      
      Fixes #13827
      
      Change-Id: I9cb531fb873bab4b69ce9c1617e30df6c49cdcfe
      Reviewed-on: https://go-review.googlesource.com/18341
      Reviewed-by: Austin Clements <austin@google.com>
  16. 24 Nov, 2015 1 commit
  17. 19 Nov, 2015 1 commit
    • runtime: prevent sigprof during all stack barrier ops · 9c9d74ab
      Austin Clements authored
      A sigprof during stack barrier insertion or removal can crash if it
      detects an inconsistency between the stkbar array and the stack
      itself. Currently we protect against this when scanning another G's
      stack using stackLock, but we don't protect against it when unwinding
      stack barriers for a recover or a memmove to the stack.
      
      This commit cleans up and improves the stack locking code. It
      abstracts out the lock and unlock operations. It uses the lock
      consistently everywhere we perform stack operations, and pushes the
      lock/unlock down closer to where the stack barrier operations happen
      to make it more obvious what it's protecting. Finally, it modifies
      sigprof so that instead of spinning until it acquires the lock, it
      simply doesn't perform a traceback if it can't acquire it. This is
      necessary to prevent self-deadlock.
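
      A sketch of the try-lock behaviour (toy lock built on sync/atomic;
      the runtime uses its own primitives): sigprof attempts the stack lock
      once and simply skips the traceback if it cannot get it.

      import "sync/atomic"

      type stackLockSketch struct{ locked uint32 }

      func (l *stackLockSketch) tryLock() bool {
          return atomic.CompareAndSwapUint32(&l.locked, 0, 1)
      }

      func (l *stackLockSketch) unlock() {
          atomic.StoreUint32(&l.locked, 0)
      }

      func sigprofSketch(l *stackLockSketch, traceback func()) {
          if !l.tryLock() {
              return // skip this sample's traceback; avoids self-deadlock
          }
          traceback()
          l.unlock()
      }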
      
      Updates #11863, which introduced stackLock to fix some of these
      issues, but didn't go far enough.
      
      Updates #12528.
      
      Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
      Reviewed-on: https://go-review.googlesource.com/17036
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  18. 16 Nov, 2015 1 commit
    • runtime: add optional expensive check for invalid cgo pointer passing · be1ef467
      Ian Lance Taylor authored
      If you set GODEBUG=cgocheck=2 the runtime package will use the write
      barrier to detect cases where a Go program writes a Go pointer into
      non-Go memory.  In conjunction with the existing cgo checks, and the
      not-yet-implemented cgo check for exported functions, this should
      reliably detect all cases (that do not import the unsafe package) in
      which a Go pointer is incorrectly shared with C code.  This check is
      optional because it turns on the write barrier at all times, which is
      known to be expensive.
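
      An example of the kind of misuse this check is meant to catch
      (hypothetical program; run it with GODEBUG=cgocheck=2 set in the
      environment): a Go pointer stored into C-allocated memory.

      package main

      /*
      #include <stdlib.h>
      */
      import "C"

      import "unsafe"

      func badShare() {
          p := (*unsafe.Pointer)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
          *p = unsafe.Pointer(new(int)) // Go pointer written into non-Go memory
      }

      func main() {
          badShare()
      }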
      
      Update #12416.
      
      Change-Id: I549d8b2956daa76eac853928e9280e615d6365f4
      Reviewed-on: https://go-review.googlesource.com/16899
      Reviewed-by: Russ Cox <rsc@golang.org>
  19. 12 Nov, 2015 1 commit
  20. 11 Nov, 2015 2 commits
  21. 10 Nov, 2015 1 commit
    • runtime: break atomics out into package runtime/internal/atomic · 67faca7d
      Michael Matloob authored
      This change breaks out most of the atomics functions in the runtime
      into package runtime/internal/atomic. It adds some basic support
      in the toolchain for runtime packages, and also modifies linux/arm
      atomics to remove the dependency on the runtime's mutex. The mutexes
      have been replaced with spinlocks.
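
      For illustration, a spinlock of the kind described can be sketched
      with compare-and-swap (using the public sync/atomic package here; the
      runtime uses its new internal atomic package):

      import "sync/atomic"

      type spinlockSketch struct{ v uint32 }

      func (l *spinlockSketch) lock() {
          for !atomic.CompareAndSwapUint32(&l.v, 0, 1) {
              // spin until the lock becomes free
          }
      }

      func (l *spinlockSketch) unlock() {
          atomic.StoreUint32(&l.v, 0)
      }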
      
      all trybots are happy!
      In addition to the trybots, I've tested on the darwin/arm64 builder,
      on the darwin/arm builder, and on a ppc64le machine.
      
      Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
      Reviewed-on: https://go-review.googlesource.com/14204
      Run-TryBot: Michael Matloob <matloob@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
  22. 05 Nov, 2015 2 commits
    • runtime: decentralize mark done and mark termination · c99d7f7f
      Austin Clements authored
      This moves all of the mark 1 to mark 2 transition and mark termination
      to the mark done transition function. This means these transitions are
      now handled on the goroutine that detected mark completion. This also
      means that the GC coordinator and the background completion barriers
      are no longer used and various workarounds to yield to the coordinator
      are no longer necessary. These will be removed in follow-up commits.
      
      One consequence of this is that mark workers now need to be
      preemptible when performing the mark done transition. This allows them
      to stop the world and to perform the final clean-up steps of GC after
      restarting the world. They are only made preemptible while performing
      this transition, so if the worker that findRunnableGCWorker would
      schedule isn't available, we didn't want to schedule it anyway.
      
      Fixes #11970.
      
      Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
      Reviewed-on: https://go-review.googlesource.com/16391
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: factor mark done transition · 171204b5
      Austin Clements authored
      Currently the code for completion of mark 1/mark 2 is duplicated in
      background workers and assists. Factor this into a single function
      that will serve as the transition function for concurrent mark.
      
      Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
      Reviewed-on: https://go-review.googlesource.com/16359
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  23. 04 Nov, 2015 1 commit
    • runtime: eliminate getfull barrier from concurrent mark · 62ba520b
      Austin Clements authored
      Currently dedicated mark workers participate in the getfull barrier
      during concurrent mark. However, the getfull barrier wasn't designed
      for concurrent work and this causes no end of headaches.
      
      In the concurrent setting, participants come and go. This makes mark
      completion susceptible to live-lock: since dedicated workers are only
      periodically polling for completion, it's possible for the program to
      be in some transient worker each time one of the dedicated workers
      wakes up to check if it can exit the getfull barrier. It also
      complicates reasoning about the system because dedicated workers
      participate directly in the getfull barrier, but transient workers
      must instead use trygetfull because they have exit conditions that
      aren't captured by getfull (e.g., fractional workers exit when
      preempted). The complexity of implementing these exit conditions
      contributed to #11677. Furthermore, the getfull barrier is inefficient
      because we could be running user code instead of spinning on a P. In
      effect, we're dedicating 25% of the CPU to marking even if that means
      we have to spin to make that 25%. It also causes issues on Windows
      because we can't actually sleep for 100µs (#8687).
      
      Fix this by making dedicated workers no longer participate in the
      getfull barrier. Instead, dedicated workers simply return to the
      scheduler when they fail to get more work, regardless of what other
      workers are doing, and the scheduler only starts new dedicated workers
      if there's work available. Everything that needs to be handled by this
      barrier is already handled by detection of mark completion.
      
      This makes the system much more symmetric because all workers and
      assists now use trygetfull during concurrent mark. It also loosens the
      25% CPU target so that we can give some of that 25% back to user code
      if there isn't enough work to keep the mark worker busy. And it
      eliminates the problematic 100µs sleep on Windows during concurrent
      mark (though not during mark termination).
      
      The downside of this is that if we hit a bottleneck in the heap graph
      that then expands back out, the system may shut down dedicated workers
      and take a while to start them back up. We'll address this in the next
      commit.
      
      Updates #12041 and #8687.
      
      No effect on the go1 benchmarks. This slows down the garbage benchmark
      by 9%, but we'll more than make it up in the next commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)
      
      Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
      Reviewed-on: https://go-review.googlesource.com/16297
      Reviewed-by: Rick Hudson <rlh@golang.org>
  24. 03 Nov, 2015 2 commits
    • runtime: make assists preemptible · 45652830
      Austin Clements authored
      Currently, assists are non-preemptible, which means a heavily
      assisting G can block other Gs from running. At the beginning of a GC
      cycle, it can also delay scang, which will spin until the assist is
      done. Since scanning is currently done sequentially, this can
      seriously extend the length of the scan phase.
      
      Fix this by making assists preemptible. Since the assist holds work
      buffers and runs on the system stack, this must be done cooperatively:
      we make gcDrainN return on preemption, and make the assist return from
      the system stack and voluntarily Gosched.
      
      This is prerequisite to enlarging the work buffers. Without this
      change, the delays and spinning in scang increase significantly.
      
      This has no effect on the go1 benchmarks.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.72ms ± 4%  5.37ms ± 5%  -6.11%  (p=0.000 n=20+20)
      
      Change-Id: I829e732a0f23b126da633516a1a9ec1a508fdbf1
      Reviewed-on: https://go-review.googlesource.com/15894
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: replace assist sleep loop with park/ready · 15aa6bbd
      Austin Clements authored
      GC assists must block until the assist can be satisfied (either
      through stealing credit or doing work) or the GC cycle ends.
      Currently, this is implemented as a retry loop with a 100 µs delay.
      This obviously isn't ideal, as it wastes CPU and delays mutator
      execution. It also has the somewhat peculiar downside that sleeping a
      G requires allocation, and this requires working around recursive
      allocation.
      
      Replace this timed delay with a proper scheduling queue. When an
      assist can't be satisfied immediately, it adds the allocating G to a
      queue and parks it. Any time background scan credit is flushed, it
      consults this queue, directly satisfies the debt of queued assists,
      and wakes up satisfied assists before flushing any remaining credit to
      the background credit pool.
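
      An illustrative sketch of the scheme (toy types; the runtime parks
      and readies goroutines directly rather than using channels): an
      unsatisfied assist records its remaining debt and parks, and whoever
      flushes background scan credit pays queued assists before topping up
      the background pool.

      import "sync"

      type assistSketch struct {
          debt  int64
          ready chan struct{} // stands in for readying the parked goroutine
      }

      type assistQueueSketch struct {
          mu     sync.Mutex
          queued []*assistSketch
      }

      func (q *assistQueueSketch) flushCredit(credit int64) int64 {
          q.mu.Lock()
          defer q.mu.Unlock()
          for len(q.queued) > 0 && credit > 0 {
              a := q.queued[0]
              if a.debt > credit {
                  a.debt -= credit
                  return 0
              }
              credit -= a.debt
              a.debt = 0
              close(a.ready) // wake the satisfied assist
              q.queued = q.queued[1:]
          }
          return credit // remainder flows to the background credit pool
      }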
      
      No effect on the go1 benchmarks. Slightly speeds up the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)
      
      Updates #12041.
      
      Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
      Reviewed-on: https://go-review.googlesource.com/15890
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  25. 30 Oct, 2015 2 commits
    • runtime: perform concurrent scan in GC workers · 82d14d77
      Austin Clements authored
      Currently the concurrent root scan is performed in its entirety by the
      GC coordinator before entering concurrent mark (which enables GC
      workers). This scan is done sequentially, which can prolong the scan
      phase, delay the mark phase, and means that the scan phase does not
      obey the 25% CPU goal. Furthermore, there's no need to complete the
      root scan before starting marking (in fact, we already allow GC
      assists to happen during the scan phase), so this acts as an
      unnecessary barrier between root scanning and marking.
      
      This change shifts the root scan work out of the GC coordinator and in
      to the GC workers. The coordinator simply sets up the scan state and
      enqueues the right number of root scan jobs. The GC workers then drain
      the root scan jobs prior to draining heap scan jobs.
      
      This parallelizes the root scan process, makes it obey the 25% CPU
      goal, and effectively eliminates root scanning as an isolated phase,
      allowing the system to smoothly transition from root scanning to heap
      marking. This also eliminates a major non-STW responsibility of the GC
      coordinator, which will make it easier to switch to a decentralized
      state machine. Finally, it puts us in a good position to perform root
      scanning in assists as well, which will help satisfy assists at the
      beginning of the GC cycle.
      
      This is mostly straightforward. One tricky aspect is that we have to
      deal with preemption deadlock: where two non-preemptible goroutines
      are trying to preempt each other to perform a stack scan. Given the
      context where this happens, the only instance of this is two
      background workers trying to scan each other. We avoid this by simply
      not scanning the stacks of background workers during the concurrent
      phase; this is safe because we'll scan them during mark termination
      (and their stacks are *very* small and should not contain any new
      pointers).
      
      This change also switches the root marking during mark termination to
      use the same gcDrain-based code path as concurrent mark. This
      shouldn't affect performance because STW root marking was already
      parallel and tasks switched to heap marking immediately when no more
      root marking tasks were available. However, it simplifies the code and
      unifies these code paths.
      
      This has negligible effect on the go1 benchmarks. It slightly slows
      down the garbage benchmark, possibly by making GC run slightly more
      frequently.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
      Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
      FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
      FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
      FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
      FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
      FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
      FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
      FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
      Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
      Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
      HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
      JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
      JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
      Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
      GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
      RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
      RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
      RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
      RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
      RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
      RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
      Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
      Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
      TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
      TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
      [Geo mean]                  62.1µs         62.3µs       +0.31%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
      Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
      Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
      JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
      RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
      RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
      RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
      RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
      Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
      Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
      [Geo mean]                 100MB/s       99.0MB/s       -0.82%
      
      Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
      Reviewed-on: https://go-review.googlesource.com/16059
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: consolidate "out of GC work" checks · 4cca1cc0
      Austin Clements authored
      We already have gcMarkWorkAvailable, but the check for GC mark work is
      open-coded in several places. Generalize gcMarkWorkAvailable slightly
      and replace these open-coded checks with calls to gcMarkWorkAvailable.
      
      In addition to cleaning up the code, this puts us in a better position
      to make this check slightly more complicated.
      
      Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
      Reviewed-on: https://go-review.googlesource.com/16058
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>