runtime: better approximate total cost of scavenging

Currently, the scavenger is paced according to how long it takes to scavenge one runtime page's worth of memory. However, this pacing doesn't take into account the additional cost of actually using a scavenged page. This operation, "sysUsed," is a counterpart to the scavenging operation "sysUnused." On most systems this operation is a no-op, but on some systems like Darwin and Windows we actually make a syscall. Even on systems where it's a no-op, the cost is implicit: a more expensive page fault when re-using the page. On Darwin in particular the cost of "sysUnused" is fairly close to the cost of "sysUsed", which skews the pacing to be too fast. A lot of soon-to-be-allocated memory ends up scavenged, resulting in many more expensive "sysUsed" operations, ultimately slowing down the application. The way to fix this problem is to include the future cost of "sysUsed" on a page in the scavenging cost. However, measuring the "sysUsed" cost directly (like we do with "sysUnused") on most systems is infeasible because we would have to measure the cost of the first access. Instead, this change applies a multiplicative constant to the measured scavenging time which is based on a per-system ratio of "sysUnused" to "sysUsed" costs in the worst case (on systems where it's a no-op, we measure the cost of the first access). This ultimately slows down the scavenger to a more reasonable pace, limiting its impact on performance but still retaining the memory footprint improvements from the previous release. Fixes #36507. Change-Id: I050659cd8cdfa5a32f5cc0b56622716ea0fa5407 Reviewed-on: https://go-review.googlesource.com/c/go/+/214517 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>

runtime: better approximate total cost of scavenging
Currently, the scavenger is paced according to how long it takes to scavenge one runtime page's worth of memory. However, this pacing doesn't take into account the additional cost of actually using a scavenged page. This operation, "sysUsed," is a counterpart to the scavenging operation "sysUnused." On most systems this operation is a no-op, but on some systems like Darwin and Windows we actually make a syscall. Even on systems where it's a no-op, the cost is implicit: a more expensive page fault when re-using the page. On Darwin in particular the cost of "sysUnused" is fairly close to the cost of "sysUsed", which skews the pacing to be too fast. A lot of soon-to-be-allocated memory ends up scavenged, resulting in many more expensive "sysUsed" operations, ultimately slowing down the application. The way to fix this problem is to include the future cost of "sysUsed" on a page in the scavenging cost. However, measuring the "sysUsed" cost directly (like we do with "sysUnused") on most systems is infeasible because we would have to measure the cost of the first access. Instead, this change applies a multiplicative constant to the measured scavenging time which is based on a per-system ratio of "sysUnused" to "sysUsed" costs in the worst case (on systems where it's a no-op, we measure the cost of the first access). This ultimately slows down the scavenger to a more reasonable pace, limiting its impact on performance but still retaining the memory footprint improvements from the previous release. Fixes #36507. Change-Id: I050659cd8cdfa5a32f5cc0b56622716ea0fa5407 Reviewed-on: https://go-review.googlesource.com/c/go/+/214517 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
71154e06 · Michael Anthony Knyszek · Michael Knyszek · 0c65531e · 71154e06
Commit 71154e06 authored Jan 13, 2020 by Michael Anthony Knyszek Committed by Michael Knyszek Jan 14, 2020
Hide whitespace changes
Inline Side-by-side

Showing with 23 additions and 4 deletions

src/runtime/mgcscavenge.go src/runtime/mgcscavenge.go +23 -4

No files found.
--- a/src/runtime/mgcscavenge.go
+++ b/src/runtime/mgcscavenge.go
@@ -80,6 +80,17 @@ const (
 	// maxPagesPerPhysPage is the maximum number of supported runtime pages per
 	// physical page, based on maxPhysPageSize.
 	maxPagesPerPhysPage = maxPhysPageSize / pageSize
+	// scavengeCostRatio is the approximate ratio between the costs of using previously
+	// scavenged memory and scavenging memory.
+	//
+	// For most systems the cost of scavenging greatly outweighs the costs
+	// associated with using scavenged memory, making this constant 0. On other systems
+	// (especially ones where "sysUsed" is not just a no-op) this cost is non-trivial.
+	//
+	// This ratio is used as part of multiplicative factor to help the scavenger account
+	// for the additional costs of using scavenged memory in its pacing.
+	scavengeCostRatio = 0.7 * sys.GoosDarwin
 )
 // heapRetained returns an estimate of the current heap RSS.
@@ -246,7 +257,7 @@ func bgscavenge(c chan int) {
 		released := uintptr(0)
 		// Time in scavenging critical section.
-		crit := int64(0)
+		crit := float64(0)
 		// Run on the system stack since we grab the heap lock,
 		// and a stack growth with the heap lock means a deadlock.
@@ -265,7 +276,7 @@ func bgscavenge(c chan int) {
 			start := nanotime()
 			released = mheap_.pages.scavengeOne(physPageSize, false)
 			atomic.Xadduintptr(&mheap_.pages.scavReleased, released)
-			crit = nanotime() - start
+			crit = float64(nanotime() - start)
 		})
 		if released == 0 {
@@ -275,6 +286,14 @@ func bgscavenge(c chan int) {
 			continue
 		}
+		// Multiply the critical time by 1 + the ratio of the costs of using
+		// scavenged memory vs. scavenging memory. This forces us to pay down
+		// the cost of reusing this memory eagerly by sleeping for a longer period
+		// of time and scavenging less frequently. More concretely, we avoid situations
+		// where we end up scavenging so often that we hurt allocation performance
+		// because of the additional overheads of using scavenged memory.
+		crit *= 1 + scavengeCostRatio
 		// If we spent more than 10 ms (for example, if the OS scheduled us away, or someone
 		// put their machine to sleep) in the critical section, bound the time we use to
 		// calculate at 10 ms to avoid letting the sleep time get arbitrarily high.
@@ -290,13 +309,13 @@ func bgscavenge(c chan int) {
 		// much, then scavengeEMWA < idealFraction, so we'll adjust the sleep time
 		// down.
 		adjust := scavengeEWMA / idealFraction
-		sleepTime := int64(adjust * float64(crit) / (scavengePercent / 100.0))
+		sleepTime := int64(adjust * crit / (scavengePercent / 100.0))
 		// Go to sleep.
 		slept := scavengeSleep(sleepTime)
 		// Compute the new ratio.
-		fraction := float64(crit) / float64(crit+slept)
+		fraction := crit / (crit + float64(slept))
 		// Set a lower bound on the fraction.
 		// Due to OS-related anomalies we may "sleep" for an inordinate amount