  1. 19 May, 2016 1 commit
  2. 16 May, 2016 1 commit
  3. 30 Apr, 2016 2 commits
    • runtime: reclaim scan/dead bit in first word · a20fd1f6
      Austin Clements authored
      With the switch to separate mark bitmaps, the scan/dead bit for the
      first word of each object is now unused. Reclaim this bit and use it
      as a scan/dead bit, just like words three and on. The second word is
      still used for checkmark.
      
      This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
      since they no longer need different cases for 1, 2, and 3+ word
      objects. They can instead just manipulate the heap bitmap for the
      first word and be done with it.
      
      In order to enable this, we change heapBitsSetType and runGCProg to
      always set the scan/dead bit to scan for the first word on every code
      path. Since these functions only apply to types that have pointers,
      there's no need to do this conditionally: it's *always* necessary to
      set the scan bit in the first word.
      
      We also change every place that scans an object and checks if there
      are more pointers. Rather than only checking morePointers if the word
      is >= 2, we now check morePointers if word != 1 (since that's the
      checkmark word).
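
      As a concrete illustration of the new rule, here is a minimal sketch
      (toy types and names, not the runtime's heap bitmap code): the
      scan/dead bit is consulted for every word of an object except word 1,
      which still holds the checkmark bit.

      type toyHeapBits struct {
          isPtr  []bool // word holds a pointer
          noScan []bool // scan/dead bit cleared: no more pointers past here
      }

      func scanObjectSketch(hb toyHeapBits, nwords int, scanWord func(i int)) {
          for i := 0; i < nwords; i++ {
              if i != 1 && hb.noScan[i] { // word 1 is the checkmark word
                  break
              }
              if hb.isPtr[i] {
                  scanWord(i)
              }
          }
      }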
      
      Looking forward, we should probably reclaim the checkmark bit, too,
      but that's going to be quite a bit more work.
      
      Tested by setting doubleCheck in heapBitsSetType and running all.bash
      on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
      
      This particularly improves the FmtFprintf* go1 benchmarks, since they
      do a large amount of noscan allocation.
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
      Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
      FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
      FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
      FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
      FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
      FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
      FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
      GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
      GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
      Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
      HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
      JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
      JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
      GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
      RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
      RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
      RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
      RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
      RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
      RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
      RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
      Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
      Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
      TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
      TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
      [Geo mean]                  52.1µs         51.5µs       -1.00%
      
      Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
      Reviewed-on: https://go-review.googlesource.com/22632
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: use morePointers and isPointer in more places · d5e3d08b
      Austin Clements authored
      This makes this code better self-documenting and makes it easier to
      find these places in the future.
      
      Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
      Reviewed-on: https://go-review.googlesource.com/22631
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  4. 29 Apr, 2016 2 commits
    • [dev.garbage] runtime: use s.base() everywhere it makes sense · b7adc41f
      Austin Clements authored
      Currently we have lots of (s.start << _PageShift) and variants. We now
      have an s.base() function that returns this. It's faster and more
      readable, so use it.
      
      Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
      Reviewed-on: https://go-review.googlesource.com/22559
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: restructure alloc and mark bits · 2063d5d9
      Rick Hudson authored
      Two changes are included here that are dependent on each other.
      The first is that allocBits and gcmarkBits are changed to
      a *uint8 which points to the first byte of that span's
      mark and alloc bits. Several places were altered to
      perform pointer arithmetic to locate the byte corresponding
      to an object in the span. The actual bit corresponding
      to an object is indexed within the byte using the lower three
      bits of the object's index.
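
      A minimal sketch of the indexing scheme, using a plain byte slice in
      place of the *uint8 plus pointer arithmetic (names invented for
      illustration): the byte for object n lies n/8 bytes past the start of
      the span's bitmap, and the bit within that byte comes from the lower
      three bits of the index (n & 7).

      type spanBitsSketch struct {
          allocBits  []byte
          gcmarkBits []byte
      }

      func (s *spanBitsSketch) isMarked(objIndex uintptr) bool {
          return s.gcmarkBits[objIndex/8]&(1<<(objIndex&7)) != 0
      }

      func (s *spanBitsSketch) setMarked(objIndex uintptr) {
          s.gcmarkBits[objIndex/8] |= 1 << (objIndex & 7)
      }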
      
      The second change avoids the redundant calculation of an
      object's index. The index is returned from heapBitsForObject
      and then used by the functions indexing allocBits
      and gcmarkBits.
      
      Finally we no longer allocate the gc bits in the span
      structures. Instead we use an arena based allocation scheme
      that allows for a more compact bit map as well as recycling
      and bulk clearing of the mark bits.
      
      Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
      Reviewed-on: https://go-review.googlesource.com/20705
      Reviewed-by: Austin Clements <austin@google.com>
  5. 27 Apr, 2016 4 commits
    • [dev.garbage] runtime: add gc work buffer tryGet and put fast paths · 1354b32c
      Rick Hudson authored
      The complexity of the GC work buffers' put and tryGet
      prevented them from being inlined. This CL simplifies
      the fast path, thus enabling inlining. If the fast
      path does not succeed, the previous put and tryGet
      functions are called.
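
      A hedged sketch of the shape of such a fast path (field and function
      names invented here): it is small enough to inline, and falls back to
      the original, non-inlined path only when the cached buffer is missing
      or full.

      type workbufSketch struct {
          obj  [256]uintptr
          nobj int
      }

      type gcWorkSketch struct {
          wbuf *workbufSketch
      }

      func (w *gcWorkSketch) put(obj uintptr) {
          if wbuf := w.wbuf; wbuf != nil && wbuf.nobj < len(wbuf.obj) {
              wbuf.obj[wbuf.nobj] = obj
              wbuf.nobj++
              return
          }
          w.putSlow(obj) // the previous, more general put path
      }

      func (w *gcWorkSketch) putSlow(obj uintptr) {
          // acquire or flush a buffer, then retry the store (omitted here)
          _ = obj
      }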
      
      Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
      Reviewed-on: https://go-review.googlesource.com/20704
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: cleanup and optimize span.base() · f8d0d4fd
      Rick Hudson authored
      Prior to this CL the base of a span was calculated in various
      places using shifts or calls to base(). This CL now
      always calls base() which has been optimized to calculate the
      base of the span when the span is initialized and store that
      value in the span structure.
      
      Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
      Reviewed-on: https://go-review.googlesource.com/20703
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: allocate directly from GC mark bits · 3479b065
      Rick Hudson authored
      Instead of building a freelist from the mark bits generated
      by the GC this CL allocates directly from the mark bits.
      
      The approach moves the mark bits from the pointer/no pointer
      heap structures into their own per span data structures. The
      mark/allocation vectors consist of a single mark bit per
      object. Two vectors are maintained, one for allocation and
      one for the GC's mark phase. During the GC cycle's sweep
      phase the interpretation of the vectors is swapped. The
      mark vector becomes the allocation vector and the old
      allocation vector is cleared and becomes the mark vector that
      the next GC cycle will use.
      
      Marked entries in the allocation vector indicate that the
      object is not free. Each allocation vector maintains a boundary
      between areas of the span already allocated from and areas
      not yet allocated from. As objects are allocated this boundary
      is moved until it reaches the end of the span. At this point
      further allocations will be done from another span.
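
      A minimal sketch of the allocation side of this scheme (toy
      representation, not the runtime's span layout): freeIndex is the
      boundary, a set bit means "not free", and once the boundary reaches
      the end of the span allocation moves to another span.

      type markBitsSpanSketch struct {
          allocBits []byte // one bit per object
          freeIndex int
          nelems    int
      }

      func (s *markBitsSpanSketch) nextFree() (objIndex int, ok bool) {
          for i := s.freeIndex; i < s.nelems; i++ {
              if s.allocBits[i/8]&(1<<uint(i%8)) == 0 {
                  s.allocBits[i/8] |= 1 << uint(i%8)
                  s.freeIndex = i + 1
                  return i, true
              }
          }
          s.freeIndex = s.nelems
          return 0, false // span exhausted
      }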
      
      Since we no longer sweep a span inspecting each freed object,
      the responsibility for maintaining pointer/scalar bits in
      the heapBitMap now falls to the routines doing the actual
      allocation.
      
      This CL is functionally complete and ready for performance
      tuning.
      
      Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
      Reviewed-on: https://go-review.googlesource.com/19470
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: don't rescan globals · b49b71ae
      Austin Clements authored
      Currently the runtime rescans globals during mark 2 and mark
      termination. This costs as much as 500µs/MB in STW time, which is
      enough to surpass the 10ms STW limit with only 20MB of globals.
      
      It's also basically unnecessary. The compiler already generates write
      barriers for global -> heap pointer updates and the regular write
      barrier doesn't check whether the slot is a global or in the heap.
      However, some less common write barriers do cause problems.
      heapBitsBulkBarrier, which is used by typedmemmove and related
      functions, currently depends on having access to the pointer bitmap
      and as a result ignores writes to globals. Likewise, the
      reflect-related write barriers reflect_typedmemmovepartial and
      callwritebarrier ignore non-heap destinations; though it appears they
      can never be called with global pointers anyway.
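
      For context, the already-covered case is an ordinary pointer store to
      a global, which the compiler instruments with a write barrier; this
      change extends the bulk barriers to globals via the data and BSS
      pointer bitmaps. A small illustrative example:

      type listNode struct {
          next *listNode
      }

      var globalHead *listNode // lives in the BSS segment

      func pushGlobal(n *listNode) {
          n.next = globalHead
          globalHead = n // global -> heap pointer update; write-barriered
      }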
      
      This commit makes heapBitsBulkBarrier issue write barriers for writes
      to global pointers using the data and BSS pointer bitmaps, removes the
      inheap checks from the reflection write barriers, and eliminates the
      rescans during mark 2 and mark termination. It also adds a test that
      writes to globals have write barriers.
      
      Programs with large data+BSS segments (with pointers) aren't common,
      but for programs that do have large data+BSS segments, this
      significantly reduces pause time:
      
      name \ 95%ile-time/markTerm              old         new  delta
      LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)
      
      This very slightly improves the go1 benchmarks:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
      Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
      FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
      FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
      FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
      FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
      FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
      GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
      GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
      Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
      Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
      HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
      JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
      JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
      Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
      GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
      RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
      RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
      RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
      RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
      RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
      RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
      RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
      RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
      Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
      Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
      TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
      TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
      [Geo mean]                  55.3µs         55.0µs       -0.41%
      
      Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
      Reviewed-on: https://go-review.googlesource.com/20829
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  6. 26 Apr, 2016 4 commits
    • runtime: make stack re-scan O(# dirty stacks) · 2a889b9d
      Austin Clements authored
      Currently the stack re-scan during mark termination is O(# stacks)
      because we enqueue a root marking job for every goroutine. It takes
      ~34ns to process this root marking job for a valid (clean) stack, so
      at around 300k goroutines we exceed the 10ms pause goal. A non-trivial
      portion of this time is spent simply taking the cache miss to check
      the gcscanvalid flag, so simply optimizing the path that handles clean
      stacks can only improve this so much.
      
      Fix this by keeping an explicit list of goroutines with dirty stacks
      that need to be rescanned. When a goroutine first transitions to
      running after a stack scan and marks its stack dirty, it adds itself
      to this list. We enqueue root marking jobs only for the goroutines in
      this list, so this improves stack re-scanning asymptotically by
      completely eliminating time spent on clean goroutines.
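
      An illustrative sketch of the bookkeeping (toy types; the runtime's
      actual representation differs): goroutines that dirty their stacks
      after a scan add themselves to a lock-protected list, and rescan jobs
      are enqueued only for list members.

      import "sync"

      type gSketch struct {
          stackDirty bool
      }

      type dirtyStackListSketch struct {
          mu sync.Mutex
          gs []*gSketch
      }

      func (l *dirtyStackListSketch) noteDirty(gp *gSketch) {
          gp.stackDirty = true
          l.mu.Lock()
          l.gs = append(l.gs, gp)
          l.mu.Unlock()
      }

      func (l *dirtyStackListSketch) takeForRescan() []*gSketch {
          l.mu.Lock()
          gs := l.gs
          l.gs = nil
          l.mu.Unlock()
          return gs // O(# dirty stacks) work for the rescan
      }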
      
      This reduces mark termination time for 500k idle goroutines from 15ms
      to 238µs. Overall performance effect is negligible.
      
      name \ 95%ile-time/markTerm     old           new         delta
      IdleGs/gs:500000/gomaxprocs:12  15000µs ± 0%  238µs ± 5%  -98.41% (p=0.000 n=10+10)
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.30ms ± 3%  2.29ms ± 1%  -0.43%  (p=0.049 n=17+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.57s ± 3%     2.59s ± 2%    ~     (p=0.141 n=19+20)
      Fannkuch11-12                2.09s ± 0%     2.10s ± 1%  +0.53%  (p=0.000 n=19+19)
      FmtFprintfEmpty-12          45.3ns ± 3%    45.2ns ± 2%    ~     (p=0.845 n=20+20)
      FmtFprintfString-12          129ns ± 0%     127ns ± 0%  -1.55%  (p=0.000 n=16+16)
      FmtFprintfInt-12             123ns ± 0%     119ns ± 1%  -3.24%  (p=0.000 n=19+19)
      FmtFprintfIntInt-12          195ns ± 1%     189ns ± 1%  -3.11%  (p=0.000 n=17+17)
      FmtFprintfPrefixedInt-12     193ns ± 1%     187ns ± 1%  -3.06%  (p=0.000 n=19+19)
      FmtFprintfFloat-12           254ns ± 0%     255ns ± 1%  +0.35%  (p=0.001 n=14+17)
      FmtManyArgs-12               781ns ± 0%     770ns ± 0%  -1.48%  (p=0.000 n=16+19)
      GobDecode-12                7.00ms ± 1%    6.98ms ± 1%    ~     (p=0.563 n=19+19)
      GobEncode-12                5.91ms ± 1%    5.92ms ± 0%    ~     (p=0.118 n=19+18)
      Gzip-12                      219ms ± 1%     215ms ± 1%  -1.81%  (p=0.000 n=18+18)
      Gunzip-12                   37.2ms ± 0%    37.4ms ± 0%  +0.45%  (p=0.000 n=17+19)
      HTTPClientServer-12         76.9µs ± 3%    77.5µs ± 2%  +0.81%  (p=0.030 n=20+19)
      JSONEncode-12               15.0ms ± 0%    14.8ms ± 1%  -0.88%  (p=0.001 n=15+19)
      JSONDecode-12               50.6ms ± 0%    53.2ms ± 2%  +5.07%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 0%    4.05ms ± 1%    ~     (p=0.581 n=16+17)
      GoParse-12                  3.34ms ± 1%    3.30ms ± 1%  -1.21%  (p=0.000 n=15+20)
      RegexpMatchEasy0_32-12      69.6ns ± 1%    69.8ns ± 2%    ~     (p=0.566 n=19+19)
      RegexpMatchEasy0_1K-12       238ns ± 1%     236ns ± 0%  -0.91%  (p=0.000 n=17+13)
      RegexpMatchEasy1_32-12      69.8ns ± 1%    70.0ns ± 1%  +0.23%  (p=0.026 n=17+16)
      RegexpMatchEasy1_1K-12       371ns ± 1%     363ns ± 1%  -2.07%  (p=0.000 n=19+19)
      RegexpMatchMedium_32-12      107ns ± 2%     106ns ± 1%  -0.51%  (p=0.031 n=18+20)
      RegexpMatchMedium_1K-12     33.0µs ± 0%    32.9µs ± 0%  -0.30%  (p=0.004 n=16+16)
      RegexpMatchHard_32-12       1.70µs ± 0%    1.70µs ± 0%  +0.45%  (p=0.000 n=16+17)
      RegexpMatchHard_1K-12       51.1µs ± 2%    51.4µs ± 1%  +0.53%  (p=0.000 n=17+19)
      Revcomp-12                   378ms ± 1%     385ms ± 1%  +1.92%  (p=0.000 n=19+18)
      Template-12                 64.3ms ± 2%    65.0ms ± 2%  +1.09%  (p=0.001 n=19+19)
      TimeParse-12                 315ns ± 1%     317ns ± 2%    ~     (p=0.108 n=18+20)
      TimeFormat-12                360ns ± 1%     337ns ± 0%  -6.30%  (p=0.000 n=18+13)
      [Geo mean]                  51.8µs         51.6µs       -0.48%
      
      Change-Id: Icf8994671476840e3998236e15407a505d4c760c
      Reviewed-on: https://go-review.googlesource.com/20700
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: remove stack barriers during concurrent mark · 269c969c
      Austin Clements authored
      Currently we remove stack barriers during STW mark termination, which
      has a non-trivial per-goroutine cost and means that we have to touch
      even clean stacks during mark termination. However, there's no problem
      with leaving them in during the sweep phase. They just have to be out
      by the time we install new stack barriers immediately prior to
      scanning the stack such as during the mark phase of the next GC cycle
      or during mark termination in a STW GC.
      
      Hence, move the gcRemoveStackBarriers from STW mark termination to
      just before we install new stack barriers during concurrent mark. This
      removes the cost from STW. Furthermore, this combined with concurrent
      stack shrinking means that the mark termination scan of a clean stack
      is a complete no-op, which will make it possible to skip clean stacks
      entirely during mark termination.
      
      This has the downside that anything outside of Go that tries to walk
      Go stacks will now get messed up all the time instead of just some of
      the time. This includes tools like GDB, perf, and VTune. We'll
      improve the situation shortly.
      
      Change-Id: Ia40baad8f8c16aeefac05425e00b0cf478137097
      Reviewed-on: https://go-review.googlesource.com/20667
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: avoid span root marking entirely during mark termination · efb0c554
      Austin Clements authored
      Currently we enqueue span root mark jobs during both concurrent mark
      and mark termination, but we make the job a no-op during mark
      termination.
      
      This is silly. Instead of queueing them up just to not do them, don't
      queue them up in the first place.
      
      Change-Id: Ie1d36de884abfb17dd0db6f0449a2b7c997affab
      Reviewed-on: https://go-review.googlesource.com/20666
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: free dead G stacks concurrently · e8337491
      Austin Clements authored
      Currently we free cached stacks of dead Gs during STW stack root
      marking. We do this during STW because there's no way to take
      ownership of a particular dead G, so attempting to free a dead G's
      stack during concurrent stack root marking could race with reusing
      that G.
      
      However, we can do this concurrently if we take a completely different
      approach. One way to prevent reuse of a dead G is to remove it from
      the free G list. Hence, this adds a new fixed root marking task that
      simply removes all Gs from the list of dead Gs with cached stacks,
      frees their stacks, and then adds them to the list of dead Gs without
      cached stacks.
      
      This is also a necessary step toward rescanning only dirty stacks,
      since it eliminates another task from STW stack marking.
      
      Change-Id: Iefbad03078b284a2e7bf30fba397da4ca87fe095
      Reviewed-on: https://go-review.googlesource.com/20665
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  7. 21 Apr, 2016 2 commits
    • runtime: simplify/optimize allocate-black a bit · 64a26b79
      Austin Clements authored
      Currently allocating black switches to the system stack (which is
      probably a historical accident) and atomically updates the global
      bytes marked stat. Since we're about to depend on this much more,
      optimize it a bit by putting it back on the regular stack and updating
      the per-P bytes marked stat, which gets lazily folded into the global
      bytes marked stat.
      
      Change-Id: Ibbe16e5382d3fd2256e4381f88af342bf7020b04
      Reviewed-on: https://go-review.googlesource.com/22170
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: count black allocations toward scan work · 479501c1
      Austin Clements authored
      Currently we count black allocations toward the scannable heap size,
      but not toward the scan work we've done so far. This is clearly
      inconsistent (we have, in effect, scanned these allocations and since
      they're already black, we're not going to scan them again). Worse, it
      means we don't count black allocations toward the scannable heap size
      as of the *next* GC because this is based on the amount of scan work
      we did in this cycle.
      
      Fix this by counting black allocations as scan work. Currently the GC
      spends very little time in allocate-black mode, so this probably
      hasn't been a problem, but this will become important when we switch
      to always allocating black.
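
      A sketch of the accounting change (names invented): bytes allocated
      black during the mark phase now count as scan work already performed,
      in addition to counting toward the scannable heap size as before.

      type gcControllerSketch struct {
          scannableHeapBytes int64
          scanWork           int64
      }

      func (c *gcControllerSketch) noteBlackAllocation(scannableBytes int64) {
          c.scannableHeapBytes += scannableBytes // counted before this change
          c.scanWork += scannableBytes           // new: never scanned again this cycle
      }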
      
      Change-Id: If6ff693b070c385b65b6ecbbbbf76283a0f9d990
      Reviewed-on: https://go-review.googlesource.com/22119
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  8. 11 Apr, 2016 1 commit
  9. 16 Mar, 2016 2 commits
    • runtime: shrink stacks during concurrent mark · f11e4eb5
      Austin Clements authored
      Currently we shrink stacks during STW mark termination because it used
      to be unsafe to shrink them concurrently. For some programs, this
      significantly increases pause time: stack shrinking costs ~5ms/MB
      copied plus 2µs/shrink.
      
      Now that we've made it safe to shrink a stack without the world being
      stopped, shrink them during the concurrent mark phase.
      
      This reduces the STW time in the program from issue #12967 by an order
      of magnitude and brings it from over the 10ms goal to well under:
      
      name           old 95%ile-markTerm-time  new 95%ile-markTerm-time  delta
      Stackshrink-4               23.8ms ±60%               1.80ms ±39%  -92.44%  (p=0.008 n=5+5)
      
      Fixes #12967.
      
      This slows down the go1 and garbage benchmarks overall by < 0.5%.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.48ms ± 1%  2.49ms ± 1%  +0.45%  (p=0.005 n=25+21)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.93s ± 2%     2.97s ± 2%  +1.34%  (p=0.002 n=19+20)
      Fannkuch11-12                2.51s ± 1%     2.59s ± 0%  +3.09%  (p=0.000 n=18+18)
      FmtFprintfEmpty-12          51.1ns ± 2%    51.5ns ± 1%    ~     (p=0.280 n=20+17)
      FmtFprintfString-12          175ns ± 1%     169ns ± 1%  -3.01%  (p=0.000 n=20+20)
      FmtFprintfInt-12             160ns ± 1%     160ns ± 0%  +0.53%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          265ns ± 0%     266ns ± 1%  +0.59%  (p=0.000 n=20+20)
      FmtFprintfPrefixedInt-12     237ns ± 1%     238ns ± 1%  +0.44%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           326ns ± 1%     341ns ± 1%  +4.55%  (p=0.000 n=20+19)
      FmtManyArgs-12              1.01µs ± 0%    1.02µs ± 0%  +0.43%  (p=0.000 n=20+19)
      GobDecode-12                8.41ms ± 1%    8.30ms ± 2%  -1.22%  (p=0.000 n=20+19)
      GobEncode-12                6.66ms ± 1%    6.68ms ± 0%  +0.30%  (p=0.000 n=18+19)
      Gzip-12                      322ms ± 1%     322ms ± 1%    ~     (p=1.000 n=20+20)
      Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.174 n=20+20)
      HTTPClientServer-12         69.7µs ± 1%    70.6µs ± 1%  +1.20%  (p=0.000 n=20+20)
      JSONEncode-12               16.8ms ± 0%    16.8ms ± 1%    ~     (p=0.154 n=19+19)
      JSONDecode-12               65.1ms ± 0%    65.3ms ± 1%  +0.34%  (p=0.003 n=20+20)
      Mandelbrot200-12            3.93ms ± 0%    3.92ms ± 0%    ~     (p=0.396 n=19+20)
      GoParse-12                  3.66ms ± 1%    3.65ms ± 1%    ~     (p=0.117 n=16+18)
      RegexpMatchEasy0_32-12      85.0ns ± 2%    85.5ns ± 2%    ~     (p=0.143 n=20+20)
      RegexpMatchEasy0_1K-12       267ns ± 1%     267ns ± 1%    ~     (p=0.867 n=20+17)
      RegexpMatchEasy1_32-12      83.3ns ± 2%    83.8ns ± 1%    ~     (p=0.068 n=20+20)
      RegexpMatchEasy1_1K-12       432ns ± 1%     432ns ± 1%    ~     (p=0.804 n=20+19)
      RegexpMatchMedium_32-12      133ns ± 0%     133ns ± 0%    ~     (p=1.000 n=20+20)
      RegexpMatchMedium_1K-12     40.3µs ± 1%    40.4µs ± 1%    ~     (p=0.319 n=20+19)
      RegexpMatchHard_32-12       2.10µs ± 1%    2.10µs ± 1%    ~     (p=0.723 n=20+18)
      RegexpMatchHard_1K-12       63.0µs ± 0%    63.0µs ± 0%    ~     (p=0.158 n=19+17)
      Revcomp-12                   461ms ± 1%     476ms ± 8%  +3.29%  (p=0.002 n=20+20)
      Template-12                 80.1ms ± 1%    79.3ms ± 1%  -1.00%  (p=0.000 n=20+20)
      TimeParse-12                 360ns ± 0%     360ns ± 0%    ~     (p=0.802 n=18+19)
      TimeFormat-12                374ns ± 1%     372ns ± 0%  -0.77%  (p=0.000 n=20+19)
      [Geo mean]                  61.8µs         62.0µs       +0.40%
      
      Change-Id: Ib60cd46b7a4987e07670eb271d22f6cee5802842
      Reviewed-on: https://go-review.googlesource.com/20044
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: generalize work.finalizersDone to work.markrootDone · c14d25c6
      Austin Clements authored
      We're about to add another root marking job that needs to happen only
      during the first markroot pass (whether that's concurrent or STW),
      just like finalizer scanning. Rather than introducing another flag
      that has the same value as finalizersDone, just rename finalizersDone
      to markrootDone.
      
      Change-Id: I535356c6ea1f3734cb5b6add264cb7bf48de95e8
      Reviewed-on: https://go-review.googlesource.com/20043
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  10. 07 Mar, 2016 1 commit
  11. 02 Mar, 2016 1 commit
    • all: single space after period. · 5fea2ccc
      Brad Fitzpatrick authored
      The tree's pretty inconsistent about single space vs double space
      after a period in documentation. Make it consistently a single space,
      per earlier decisions. This means contributors won't be confused by
      misleading precedence.
      
      This CL doesn't use go/doc to parse. It only addresses // comments.
      It was generated with:
      
      $ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
      $ go test go/doc -update
      
      Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
      Reviewed-on: https://go-review.googlesource.com/20022
      Reviewed-by: Rob Pike <r@golang.org>
      Reviewed-by: Dave Day <djd@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  12. 25 Feb, 2016 1 commit
    • runtime: pass gcWork to markroot · 7b229001
      Austin Clements authored
      Currently markroot uses a gcWork on the stack and disposes of it
      immediately after marking one root. This used to be necessary because
      markroot was called from the depths of parfor, but now that we call it
      directly and have ready access to a gcWork at the call site, pass the
      gcWork in, use it directly in markroot, and share it across calls to
      markroot from the same P.
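
      A sketch of the resulting calling convention (toy signatures, not the
      actual runtime declarations): the caller's gcWork is passed in and
      reused across consecutive markroot calls on the same P.

      type gcWorkRootSketch struct{} // stands in for the caller's gcWork

      func markrootSketch(gcw *gcWorkRootSketch, rootIndex int) {
          // scan root rootIndex, queueing found pointers on gcw (omitted)
          _, _ = gcw, rootIndex
      }

      func drainRootsSketch(gcw *gcWorkRootSketch, nroots int) {
          for i := 0; i < nroots; i++ {
              markrootSketch(gcw, i) // same gcw shared for every root
          }
      }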
      
      Change-Id: Id7c3b811bfb944153760e01873c07c8d18909be1
      Reviewed-on: https://go-review.googlesource.com/19635
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
  13. 16 Jan, 2016 1 commit
    • runtime: fix sleep/wakeup race for GC assists · b05f18e3
      Austin Clements authored
      GC assists check gcBlackenEnabled under the assist queue lock to avoid
      going to sleep after gcWakeAllAssists has already woken all assists.
      However, currently we clear gcBlackenEnabled shortly *after* waking
      all assists, which opens a window where this exact race can happen.
      
      Fix this by clearing gcBlackenEnabled before waking blocked assists.
      However, it's unlikely this actually matters because the world is
      stopped between waking assists and clearing gcBlackenEnabled and there
      aren't any obvious allocations during this window, so I don't think an
      assist could actually slip into this race window.
      
      Updates #13645.
      
      Change-Id: I7571f059530481dc781d8fd96a1a40aadebecb0d
      Reviewed-on: https://go-review.googlesource.com/18682
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
  14. 12 Jan, 2016 2 commits
  15. 11 Jan, 2016 1 commit
    • runtime: eagerly share GC work buffers · 9439fa10
      Rick Hudson authored
      Currently, due to an oversight, we only balance work buffers
      in background and idle workers and not in assists. As a
      result, in assist-heavy workloads, assists are likely to tie
      up large work buffers in per-P caches increasing the
      likelihood that the global list will be empty. This increases
      the likelihood that other GC workers will exit and assists
      will block, slowing down the system as a whole. Fix this by
      eagerly balancing work buffers as soon as the assists notice
      that the global buffers are empty. This makes it much more
      likely that work will be immediately available to other
      workers and assists.
      
      This change reduces the garbage benchmark time by 39% and
      fixes the regression seen at CL 15893 (golang.org/cl/15893).
      
      Garbage benchmark times before and after this CL.
      Before GOPERF-METRIC:time=4427020
      After  GOPERF-METRIC:time=2721645
      
      Fixes #13827
      
      Change-Id: I9cb531fb873bab4b69ce9c1617e30df6c49cdcfe
      Reviewed-on: https://go-review.googlesource.com/18341
      Reviewed-by: Austin Clements <austin@google.com>
  16. 24 Nov, 2015 1 commit
  17. 19 Nov, 2015 1 commit
    • runtime: prevent sigprof during all stack barrier ops · 9c9d74ab
      Austin Clements authored
      A sigprof during stack barrier insertion or removal can crash if it
      detects an inconsistency between the stkbar array and the stack
      itself. Currently we protect against this when scanning another G's
      stack using stackLock, but we don't protect against it when unwinding
      stack barriers for a recover or a memmove to the stack.
      
      This commit cleans up and improves the stack locking code. It
      abstracts out the lock and unlock operations. It uses the lock
      consistently everywhere we perform stack operations, and pushes the
      lock/unlock down closer to where the stack barrier operations happen
      to make it more obvious what it's protecting. Finally, it modifies
      sigprof so that instead of spinning until it acquires the lock, it
      simply doesn't perform a traceback if it can't acquire it. This is
      necessary to prevent self-deadlock.
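
      A sketch of the try-lock behaviour (toy lock built on sync/atomic;
      the runtime uses its own primitives): sigprof attempts the stack lock
      once and simply skips the traceback if it cannot get it.

      import "sync/atomic"

      type stackLockSketch struct{ locked uint32 }

      func (l *stackLockSketch) tryLock() bool {
          return atomic.CompareAndSwapUint32(&l.locked, 0, 1)
      }

      func (l *stackLockSketch) unlock() {
          atomic.StoreUint32(&l.locked, 0)
      }

      func sigprofSketch(l *stackLockSketch, traceback func()) {
          if !l.tryLock() {
              return // skip this sample's traceback; avoids self-deadlock
          }
          traceback()
          l.unlock()
      }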
      
      Updates #11863, which introduced stackLock to fix some of these
      issues, but didn't go far enough.
      
      Updates #12528.
      
      Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
      Reviewed-on: https://go-review.googlesource.com/17036
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  18. 16 Nov, 2015 1 commit
    • runtime: add optional expensive check for invalid cgo pointer passing · be1ef467
      Ian Lance Taylor authored
      If you set GODEBUG=cgocheck=2 the runtime package will use the write
      barrier to detect cases where a Go program writes a Go pointer into
      non-Go memory.  In conjunction with the existing cgo checks, and the
      not-yet-implemented cgo check for exported functions, this should
      reliably detect all cases (that do not import the unsafe package) in
      which a Go pointer is incorrectly shared with C code.  This check is
      optional because it turns on the write barrier at all times, which is
      known to be expensive.
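
      An example of the kind of misuse this check is meant to catch
      (hypothetical program; run it with GODEBUG=cgocheck=2 set in the
      environment): a Go pointer stored into C-allocated memory.

      package main

      /*
      #include <stdlib.h>
      */
      import "C"

      import "unsafe"

      func badShare() {
          p := (*unsafe.Pointer)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
          *p = unsafe.Pointer(new(int)) // Go pointer written into non-Go memory
      }

      func main() {
          badShare()
      }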
      
      Update #12416.
      
      Change-Id: I549d8b2956daa76eac853928e9280e615d6365f4
      Reviewed-on: https://go-review.googlesource.com/16899
      Reviewed-by: Russ Cox <rsc@golang.org>
  19. 12 Nov, 2015 1 commit
  20. 11 Nov, 2015 2 commits
  21. 10 Nov, 2015 1 commit
    • runtime: break atomics out into package runtime/internal/atomic · 67faca7d
      Michael Matloob authored
      This change breaks out most of the atomics functions in the runtime
      into package runtime/internal/atomic. It adds some basic support
      in the toolchain for runtime packages, and also modifies linux/arm
      atomics to remove the dependency on the runtime's mutex. The mutexes
      have been replaced with spinlocks.
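
      For illustration, a spinlock of the kind described can be sketched
      with compare-and-swap (using the public sync/atomic package here; the
      runtime uses its new internal atomic package):

      import "sync/atomic"

      type spinlockSketch struct{ v uint32 }

      func (l *spinlockSketch) lock() {
          for !atomic.CompareAndSwapUint32(&l.v, 0, 1) {
              // spin until the lock becomes free
          }
      }

      func (l *spinlockSketch) unlock() {
          atomic.StoreUint32(&l.v, 0)
      }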
      
      all trybots are happy!
      In addition to the trybots, I've tested on the darwin/arm64 builder,
      on the darwin/arm builder, and on a ppc64le machine.
      
      Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
      Reviewed-on: https://go-review.googlesource.com/14204
      Run-TryBot: Michael Matloob <matloob@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
  22. 05 Nov, 2015 2 commits
    • runtime: decentralize mark done and mark termination · c99d7f7f
      Austin Clements authored
      This moves all of the mark 1 to mark 2 transition and mark termination
      to the mark done transition function. This means these transitions are
      now handled on the goroutine that detected mark completion. This also
      means that the GC coordinator and the background completion barriers
      are no longer used and various workarounds to yield to the coordinator
      are no longer necessary. These will be removed in follow-up commits.
      
      One consequence of this is that mark workers now need to be
      preemptible when performing the mark done transition. This allows them
      to stop the world and to perform the final clean-up steps of GC after
      restarting the world. They are only made preemptible while performing
      this transition, so if the worker that findRunnableGCWorker would
      schedule isn't available, we didn't want to schedule it anyway.
      
      Fixes #11970.
      
      Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
      Reviewed-on: https://go-review.googlesource.com/16391
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: factor mark done transition · 171204b5
      Austin Clements authored
      Currently the code for completion of mark 1/mark 2 is duplicated in
      background workers and assists. Factor this into a single function
      that will serve as the transition function for concurrent mark.
      
      Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
      Reviewed-on: https://go-review.googlesource.com/16359
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  23. 04 Nov, 2015 1 commit
    • runtime: eliminate getfull barrier from concurrent mark · 62ba520b
      Austin Clements authored
      Currently dedicated mark workers participate in the getfull barrier
      during concurrent mark. However, the getfull barrier wasn't designed
      for concurrent work and this causes no end of headaches.
      
      In the concurrent setting, participants come and go. This makes mark
      completion susceptible to live-lock: since dedicated workers are only
      periodically polling for completion, it's possible for the program to
      be in some transient worker each time one of the dedicated workers
      wakes up to check if it can exit the getfull barrier. It also
      complicates reasoning about the system because dedicated workers
      participate directly in the getfull barrier, but transient workers
      must instead use trygetfull because they have exit conditions that
      aren't captured by getfull (e.g., fractional workers exit when
      preempted). The complexity of implementing these exit conditions
      contributed to #11677. Furthermore, the getfull barrier is inefficient
      because we could be running user code instead of spinning on a P. In
      effect, we're dedicating 25% of the CPU to marking even if that means
      we have to spin to make that 25%. It also causes issues on Windows
      because we can't actually sleep for 100µs (#8687).
      
      Fix this by making dedicated workers no longer participate in the
      getfull barrier. Instead, dedicated workers simply return to the
      scheduler when they fail to get more work, regardless of what other
      workers are doing, and the scheduler only starts new dedicated workers
      if there's work available. Everything that needs to be handled by this
      barrier is already handled by detection of mark completion.
      
      This makes the system much more symmetric because all workers and
      assists now use trygetfull during concurrent mark. It also loosens the
      25% CPU target so that we can give some of that 25% back to user code
      if there isn't enough work to keep the mark worker busy. And it
      eliminates the problematic 100µs sleep on Windows during concurrent
      mark (though not during mark termination).
      
      The downside of this is that if we hit a bottleneck in the heap graph
      that then expands back out, the system may shut down dedicated workers
      and take a while to start them back up. We'll address this in the next
      commit.
      
      Updates #12041 and #8687.
      
      No effect on the go1 benchmarks. This slows down the garbage benchmark
      by 9%, but we'll more than make it up in the next commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)
      
      Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
      Reviewed-on: https://go-review.googlesource.com/16297
      Reviewed-by: Rick Hudson <rlh@golang.org>
  24. 03 Nov, 2015 2 commits
    • runtime: make assists preemptible · 45652830
      Austin Clements authored
      Currently, assists are non-preemptible, which means a heavily
      assisting G can block other Gs from running. At the beginning of a GC
      cycle, it can also delay scang, which will spin until the assist is
      done. Since scanning is currently done sequentially, this can
      seriously extend the length of the scan phase.
      
      Fix this by making assists preemptible. Since the assist holds work
      buffers and runs on the system stack, this must be done cooperatively:
      we make gcDrainN return on preemption, and make the assist return from
      the system stack and voluntarily Gosched.
      
      This is prerequisite to enlarging the work buffers. Without this
      change, the delays and spinning in scang increase significantly.
      
      This has no effect on the go1 benchmarks.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.72ms ± 4%  5.37ms ± 5%  -6.11%  (p=0.000 n=20+20)
      
      Change-Id: I829e732a0f23b126da633516a1a9ec1a508fdbf1
      Reviewed-on: https://go-review.googlesource.com/15894
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: replace assist sleep loop with park/ready · 15aa6bbd
      Austin Clements authored
      GC assists must block until the assist can be satisfied (either
      through stealing credit or doing work) or the GC cycle ends.
      Currently, this is implemented as a retry loop with a 100 µs delay.
      This obviously isn't ideal, as it wastes CPU and delays mutator
      execution. It also has the somewhat peculiar downside that sleeping a
      G requires allocation, and this requires working around recursive
      allocation.
      
      Replace this timed delay with a proper scheduling queue. When an
      assist can't be satisfied immediately, it adds the allocating G to a
      queue and parks it. Any time background scan credit is flushed, it
      consults this queue, directly satisfies the debt of queued assists,
      and wakes up satisfied assists before flushing any remaining credit to
      the background credit pool.
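
      An illustrative sketch of the scheme (toy types; the runtime parks
      and readies goroutines directly rather than using channels): an
      unsatisfied assist records its remaining debt and parks, and whoever
      flushes background scan credit pays queued assists before topping up
      the background pool.

      import "sync"

      type assistSketch struct {
          debt  int64
          ready chan struct{} // stands in for readying the parked goroutine
      }

      type assistQueueSketch struct {
          mu     sync.Mutex
          queued []*assistSketch
      }

      func (q *assistQueueSketch) flushCredit(credit int64) int64 {
          q.mu.Lock()
          defer q.mu.Unlock()
          for len(q.queued) > 0 && credit > 0 {
              a := q.queued[0]
              if a.debt > credit {
                  a.debt -= credit
                  return 0
              }
              credit -= a.debt
              a.debt = 0
              close(a.ready) // wake the satisfied assist
              q.queued = q.queued[1:]
          }
          return credit // remainder flows to the background credit pool
      }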
      
      No effect on the go1 benchmarks. Slightly speeds up the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)
      
      Updates #12041.
      
      Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
      Reviewed-on: https://go-review.googlesource.com/15890
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  25. 30 Oct, 2015 2 commits
    • runtime: perform concurrent scan in GC workers · 82d14d77
      Austin Clements authored
      Currently the concurrent root scan is performed in its entirety by the
      GC coordinator before entering concurrent mark (which enables GC
      workers). This scan is done sequentially, which can prolong the scan
      phase, delay the mark phase, and means that the scan phase does not
      obey the 25% CPU goal. Furthermore, there's no need to complete the
      root scan before starting marking (in fact, we already allow GC
      assists to happen during the scan phase), so this acts as an
      unnecessary barrier between root scanning and marking.
      
      This change shifts the root scan work out of the GC coordinator and in
      to the GC workers. The coordinator simply sets up the scan state and
      enqueues the right number of root scan jobs. The GC workers then drain
      the root scan jobs prior to draining heap scan jobs.
      
      This parallelizes the root scan process, makes it obey the 25% CPU
      goal, and effectively eliminates root scanning as an isolated phase,
      allowing the system to smoothly transition from root scanning to heap
      marking. This also eliminates a major non-STW responsibility of the GC
      coordinator, which will make it easier to switch to a decentralized
      state machine. Finally, it puts us in a good position to perform root
      scanning in assists as well, which will help satisfy assists at the
      beginning of the GC cycle.
      
      This is mostly straightforward. One tricky aspect is that we have to
      deal with preemption deadlock: where two non-preemptible goroutines
      are trying to preempt each other to perform a stack scan. Given the
      context where this happens, the only instance of this is two
      background workers trying to scan each other. We avoid this by simply
      not scanning the stacks of background workers during the concurrent
      phase; this is safe because we'll scan them during mark termination
      (and their stacks are *very* small and should not contain any new
      pointers).
      
      This change also switches the root marking during mark termination to
      use the same gcDrain-based code path as concurrent mark. This
      shouldn't affect performance because STW root marking was already
      parallel and tasks switched to heap marking immediately when no more
      root marking tasks were available. However, it simplifies the code and
      unifies these code paths.
      
      This has negligible effect on the go1 benchmarks. It slightly slows
      down the garbage benchmark, possibly by making GC run slightly more
      frequently.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
      Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
      FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
      FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
      FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
      FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
      FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
      FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
      FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
      Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
      Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
      HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
      JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
      JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
      Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
      GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
      RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
      RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
      RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
      RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
      RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
      RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
      Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
      Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
      TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
      TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
      [Geo mean]                  62.1µs         62.3µs       +0.31%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
      Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
      Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
      JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
      RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
      RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
      RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
      RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
      Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
      Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
      [Geo mean]                 100MB/s       99.0MB/s       -0.82%
      
      Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
      Reviewed-on: https://go-review.googlesource.com/16059
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: consolidate "out of GC work" checks · 4cca1cc0
      Austin Clements authored
      We already have gcMarkWorkAvailable, but the check for GC mark work is
      open-coded in several places. Generalize gcMarkWorkAvailable slightly
      and replace these open-coded checks with calls to gcMarkWorkAvailable.
      
      In addition to cleaning up the code, this puts us in a better position
      to make this check slightly more complicated.
      
      Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
      Reviewed-on: https://go-review.googlesource.com/16058
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>