05 Apr, 2019 10 commits
    • encoding/json: use SetBytes in UnmarshalReuse benchmark · cb664623
      Daniel Martí authored
      This was the only benchmark missing the SetBytes call, as spotted
      earlier by Bryan.
      
      It's not required to make the benchmark useful, but it can still be a
      good way to see how its speed is affected by the reduced allocations:
      
      name                  time/op
      CodeUnmarshal-8        12.1ms ± 1%
      CodeUnmarshalReuse-8   11.4ms ± 1%
      
      name                  speed
      CodeUnmarshal-8       161MB/s ± 1%
      CodeUnmarshalReuse-8  171MB/s ± 1%
      
      name                  alloc/op
      CodeUnmarshal-8        3.28MB ± 0%
      CodeUnmarshalReuse-8   1.94MB ± 0%
      
      name                  allocs/op
      CodeUnmarshal-8         92.7k ± 0%
      CodeUnmarshalReuse-8    77.6k ± 0%
      
      While at it, remove some unnecessary empty lines.
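      
      For illustration, a minimal self-contained sketch of how SetBytes is
      typically wired into a decode benchmark (the names and data below
      are illustrative, not this CL's code). SetBytes records how many
      bytes one iteration processes, which is what produces the MB/s
      column above:
      
      package jsonbench
      
      import (
          "encoding/json"
          "testing"
      )
      
      var sample = []byte(`{"name":"gopher","age":10}`)
      
      type record struct {
          Name string `json:"name"`
          Age  int    `json:"age"`
      }
      
      func BenchmarkUnmarshalReuse(b *testing.B) {
          b.SetBytes(int64(len(sample))) // enables the MB/s column
          var r record // reused across iterations, so fewer allocations
          for i := 0; i < b.N; i++ {
              if err := json.Unmarshal(sample, &r); err != nil {
                  b.Fatal(err)
              }
          }
      }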
      
      Change-Id: Ib2bd92d5b3237b8f3092e8c6f863dab548fee2f5
      Reviewed-on: https://go-review.googlesource.com/c/go/+/170938
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • syscall: use openat instead of dup to make a really new file descriptor · 06cff114
      Keith Randall authored
      Update #31269
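      
      For context, a Linux-only sketch (not this CL's code) of the
      distinction the title refers to: Dup shares the underlying open
      file description, so the offset is common to both descriptors,
      while Openat creates a genuinely new file description:
      
      package main
      
      import (
          "fmt"
          "os"
          "syscall"
      )
      
      func main() {
          path := "/tmp/dup-vs-openat.txt"
          if err := os.WriteFile(path, []byte("abcdefgh"), 0644); err != nil {
              panic(err)
          }
          defer os.Remove(path)
      
          fd, _ := syscall.Open(path, syscall.O_RDONLY, 0)
          defer syscall.Close(fd)
      
          dupFd, _ := syscall.Dup(fd) // shares the file offset with fd
          defer syscall.Close(dupFd)
      
          newFd, _ := syscall.Openat(syscall.AT_FDCWD, path, syscall.O_RDONLY, 0)
          defer syscall.Close(newFd) // independent file description
      
          buf := make([]byte, 4)
          syscall.Read(fd, buf) // advances the offset shared with dupFd
      
          syscall.Read(dupFd, buf)
          fmt.Printf("dup: %q (continues at the shared offset)\n", buf)
      
          syscall.Read(newFd, buf)
          fmt.Printf("openat: %q (own offset, starts at zero)\n", buf)
      }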
      
      Change-Id: I0e7184420055b8dfd23688dab9f9d8cba1fa2485
      Reviewed-on: https://go-review.googlesource.com/c/go/+/170892
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: separate stack freeing from stack shrinking · 68d89bb8
      Austin Clements authored
      Currently, shrinkstack will free the stack if the goroutine is dead.
      There are only two places that call shrinkstack: scanstack, which will
      never call it if the goroutine is dead; and markrootFreeGStacks, which
      only calls it on dead goroutines.
      
      Clean this up by separating stack freeing out of shrinkstack.
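      
      Schematically, the shape of the refactor (stub types, not the
      actual runtime source):
      
      package sketch
      
      type stack struct{ lo, hi uintptr }
      
      type g struct {
          stack stack
          dead  bool
      }
      
      // stackfree returns the stack memory to the allocator.
      func stackfree(s stack) {}
      
      // shrinkstack now only shrinks; callers must not pass dead
      // goroutines.
      func shrinkstack(gp *g) {
          if gp.dead {
              panic("shrinkstack on dead goroutine")
          }
          // ...allocate a smaller stack, copy the contents, free the
          // old one...
      }
      
      // markrootFreeGStacks frees dead goroutines' stacks directly,
      // without going through shrinkstack.
      func markrootFreeGStacks(deadGs []*g) {
          for _, gp := range deadGs {
              stackfree(gp.stack)
              gp.stack = stack{}
          }
      }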
      
      Change-Id: I7d7891e620550c32a2220833923a025704986681
      Reviewed-on: https://go-review.googlesource.com/c/go/+/170890
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
    • runtime: use acquirem/releasem more widely · ea9859f8
      Austin Clements authored
We've copy-pasted the pattern of releasem in many places. This CL
replaces almost every place that manipulates g.m.locks and g.preempt
with calls to acquirem/releasem. A few places do something more
complicated, like exitsyscall, which has to restore the stack bound
differently depending on the preempt flag; this CL leaves those
alone.
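      
      A self-contained analogue of the consolidated pattern (stub
      types, not the actual runtime source): acquirem bumps a lock
      count to disable preemption, and releasem restores the
      preemption request that was suppressed while locked:
      
      package sketch
      
      type m struct{ locks int32 }
      
      type g struct {
          m            *m
          preempt      bool
          stackPreempt bool // stand-in for the poisoned stack guard
      }
      
      func acquirem(gp *g) *m {
          gp.m.locks++
          return gp.m
      }
      
      func releasem(gp *g, mp *m) {
          mp.locks--
          if mp.locks == 0 && gp.preempt {
              // Restore the preemption request lost while locked.
              gp.stackPreempt = true
          }
      }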
      
      Change-Id: Ia7a46c261daea6e7802b80e7eb9227499f460433
      Reviewed-on: https://go-review.googlesource.com/c/go/+/170064
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: fix sanity check in notetsleep · ac40a7fb
      Austin Clements authored
      CL 3660 replaced m.gcing with m.preemptoff, but unintentionally
      reversed the sense of part of a sanity check in notetsleep.
      Originally, notetsleep required that it be called from g0 or with
      preemption disabled (specifically from within the garbage collector).
      CL 3660 made it require that it be called from g0 or that preemption
      be *enabled*.
      
      I'm not sure why it had the original exception for being called from a
      user g within the garbage collector, but the current garbage collector
      certainly doesn't need that, and the new condition is completely wrong.
      
      Make the sanity check just require that it's called on g0.
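      
      Schematically, the evolution of the check as described above
      (stub types, not the actual runtime source):
      
      package sketch
      
      type m struct {
          g0         *g
          gcing      int32  // field the original check consulted
          preemptoff string // field CL 3660 switched to
      }
      
      type g struct{ m *m }
      
      func throw(s string) { panic(s) }
      
      // The corrected form: notetsleep must run on g0.
      func sanityCheck(gp *g) {
          if gp != gp.m.g0 {
              throw("notetsleep not on g0")
          }
          // Per the message, earlier forms also admitted calls when:
          //   original:   gp.m.gcing != 0       (preemption off, in GC)
          //   after 3660: gp.m.preemptoff == "" (preemption ON; reversed)
      }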
      
      Change-Id: I6980d44f5a4461935e10b1b33a981e32b1b7b0c9
      Reviewed-on: https://go-review.googlesource.com/c/go/+/170063
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • sync: smooth out Pool behavior over GC with a victim cache · 2dcbf8b3
      Austin Clements authored
      Currently, every Pool is cleared completely at the start of each GC.
      This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are cleared, which impacts both
      throughput and latency.
      
      This CL fixes this by introducing a victim cache mechanism. Instead of
      clearing Pools, the victim cache is dropped and the primary cache is
      moved to the victim cache. As a result, in steady-state, there are
      (roughly) no new allocations, but if Pool usage drops, objects will
      still be collected within two GCs (as opposed to one).
      
      This victim cache approach also improves Pool's impact on GC dynamics.
      The current approach causes all objects in Pools to be short lived.
      However, if an application is in steady state and is just going to
      repopulate its Pools, then these objects impact the live heap size *as
      if* they were long lived. Since Pooled objects count as short lived
      when computing the GC trigger and goal, but act as long lived objects
      in the live heap, this causes GC to trigger too frequently. If Pooled
      objects are a non-trivial portion of an application's heap, this
      increases the CPU overhead of GC. The victim cache lets Pooled objects
      affect the GC trigger and goal as long-lived objects.
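      
      A minimal sketch of the mechanism described above (assumed
      shapes, not the real sync.Pool internals): on each GC, the old
      victim caches are dropped and the primary caches are demoted,
      so an unused object survives at most two GCs:
      
      package pool
      
      type Pool struct {
          local  []interface{} // primary cache (per-P in the real thing)
          victim []interface{} // survivors of the previous GC
          New    func() interface{}
      }
      
      var (
          allPools []*Pool // pools with a primary cache
          oldPools []*Pool // pools with a victim cache
      )
      
      // poolCleanup runs (conceptually) at the start of each GC.
      func poolCleanup() {
          // Victim caches went one full GC unused: drop them.
          for _, p := range oldPools {
              p.victim = nil
          }
          // Demote primary caches; a Get that misses the primary
          // falls back to the victim before calling New.
          for _, p := range allPools {
              p.victim, p.local = p.local, nil
          }
          oldPools, allPools = allPools, nil
      }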
      
This has no impact on Get/Put performance, but substantially reduces
the impact on the Pool user when a GC happens. PoolExpensiveNew
demonstrates this with the substantial reduction in the rate at which
the "New" function is called.
      
      name                 old time/op     new time/op     delta
      Pool-12                 2.21ns ±36%     2.00ns ± 0%     ~     (p=0.070 n=19+16)
      PoolOverflow-12          587ns ± 1%      583ns ± 1%   -0.77%  (p=0.000 n=18+18)
      PoolSTW-12              5.57µs ± 3%     4.52µs ± 4%  -18.82%  (p=0.000 n=20+19)
      PoolExpensiveNew-12     3.69ms ± 7%     1.25ms ± 5%  -66.25%  (p=0.000 n=20+19)
      
      name                 old p50-ns/STW  new p50-ns/STW  delta
      PoolSTW-12               5.48k ± 2%      4.53k ± 2%  -17.32%  (p=0.000 n=20+20)
      
      name                 old p95-ns/STW  new p95-ns/STW  delta
      PoolSTW-12               6.69k ± 4%      5.13k ± 3%  -23.31%  (p=0.000 n=19+18)
      
      name                 old GCs/op      new GCs/op      delta
      PoolExpensiveNew-12       0.39 ± 1%       0.32 ± 2%  -17.95%  (p=0.000 n=18+20)
      
      name                 old New/op      new New/op      delta
      PoolExpensiveNew-12       40.0 ± 6%       12.4 ± 6%  -68.91%  (p=0.000 n=20+19)
      
      (https://perf.golang.org/search?q=upload:20190311.2)
      
      Fixes #22950.
      
      Change-Id: If2e183d948c650417283076aacc20739682cdd70
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166961
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
    • sync: use lock-free structure for Pool stealing · d5fd2dd6
      Austin Clements authored
      Currently, Pool stores each per-P shard's overflow in a slice
      protected by a Mutex. In order to store to the overflow or steal from
      another shard, a P must lock that shard's Mutex. This allows for
      simple synchronization between Put and Get, but has unfortunate
      consequences for clearing pools.
      
      Pools are cleared during STW sweep termination, and hence rely on
      pinning a goroutine to its P to synchronize between Get/Put and
      clearing. This makes the Get/Put fast path extremely fast because it
      can rely on quiescence-style coordination, which doesn't even require
      atomic writes, much less locking.
      
      The catch is that a goroutine cannot acquire a Mutex while pinned to
      its P (as this could deadlock). Hence, it must drop the pin on the
      slow path. But this means the slow path is not synchronized with
      clearing. As a result,
      
      1) It's difficult to reason about races between clearing and the slow
      path. Furthermore, this reasoning often depends on unspecified nuances
      of where preemption points can occur.
      
      2) Clearing must zero out the pointer to every object in every Pool to
      prevent a concurrent slow path from causing all objects to be
      retained. Since this happens during STW, this has an O(# objects in
      Pools) effect on STW time.
      
      3) We can't implement a victim cache without making clearing even
      slower.
      
      This CL solves these problems by replacing the locked overflow slice
      with a lock-free structure. This allows Gets and Puts to be pinned the
      whole time they're manipulating the shards slice (Pool.local), which
      eliminates the races between Get/Put and clearing. This, in turn,
      eliminates the need to zero all object pointers, reducing clearing to
      O(# of Pools) during STW.
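      
      Schematically, the shape of the new slow path (assumed names,
      not the actual implementation): each shard's overflow is a
      lock-free dequeue, so a Get that misses its own shard steals
      from other shards' tails while remaining pinned:
      
      package pool
      
      type dequeue interface {
          popHead() (interface{}, bool) // owner (producer) only
          popTail() (interface{}, bool) // any stealing consumer
      }
      
      type shard struct {
          private interface{} // fastest path: no atomics at all
          shared  dequeue     // lock-free overflow
      }
      
      // getSlow steals from the tail of every other shard's queue;
      // no Mutex is taken, so the caller never has to unpin.
      func getSlow(shards []shard, pid int) interface{} {
          n := len(shards)
          for i := 1; i < n; i++ {
              if x, ok := shards[(pid+i)%n].shared.popTail(); ok {
                  return x
              }
          }
          return nil
      }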
      
      In addition to significantly reducing STW impact, this also happens to
      speed up the Get/Put fast-path and the slow path. It somewhat
      increases the cost of PoolExpensiveNew, but we'll fix that in the next
      CL.
      
      name                 old time/op     new time/op     delta
      Pool-12                 3.00ns ± 0%     2.21ns ±36%  -26.32%  (p=0.000 n=18+19)
      PoolOverflow-12          600ns ± 1%      587ns ± 1%   -2.21%  (p=0.000 n=16+18)
      PoolSTW-12              71.0µs ± 2%      5.6µs ± 3%  -92.15%  (p=0.000 n=20+20)
      PoolExpensiveNew-12     3.14ms ± 5%     3.69ms ± 7%  +17.67%  (p=0.000 n=19+20)
      
      name                 old p50-ns/STW  new p50-ns/STW  delta
      PoolSTW-12               70.7k ± 1%       5.5k ± 2%  -92.25%  (p=0.000 n=20+20)
      
      name                 old p95-ns/STW  new p95-ns/STW  delta
      PoolSTW-12               73.1k ± 2%       6.7k ± 4%  -90.86%  (p=0.000 n=18+19)
      
      name                 old GCs/op      new GCs/op      delta
      PoolExpensiveNew-12       0.38 ± 1%       0.39 ± 1%   +2.07%  (p=0.000 n=20+18)
      
      name                 old New/op      new New/op      delta
      PoolExpensiveNew-12       33.9 ± 6%       40.0 ± 6%  +17.97%  (p=0.000 n=19+20)
      
      (https://perf.golang.org/search?q=upload:20190311.1)
      
      Fixes #22331.
      For #22950.
      
      Change-Id: Ic5cd826e25e218f3f8256dbc4d22835c1fecb391
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166960
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
    • sync: add Pool benchmarks to stress STW and reuse · 59f2704d
      Austin Clements authored
      This adds two benchmarks that will highlight two problems in Pool that
      we're about to address.
      
      The first benchmark measures the impact of large Pools on GC STW time.
      Currently, STW time is O(# of items in Pools), and this benchmark
      demonstrates 70µs STW times.
      
      The second benchmark measures the impact of fully clearing all Pools
      on each GC. Typically this is a problem in heavily-loaded systems
      because it causes a spike in allocation. This benchmark stresses this
      by simulating an expensive "New" function, so the cost of creating new
      objects is reflected in the ns/op of the benchmark.
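      
      A sketch of the second benchmark's idea (not the CL's exact
      code): make New slow enough that every Pool miss dominates
      ns/op, so full clearing on each GC shows up directly in the
      results:
      
      package pool_test
      
      import (
          "sync"
          "testing"
          "time"
      )
      
      func BenchmarkPoolExpensiveNew(b *testing.B) {
          p := sync.Pool{New: func() interface{} {
              time.Sleep(100 * time.Microsecond) // stand-in for costly work
              return new([1 << 12]byte)
          }}
          b.RunParallel(func(pb *testing.PB) {
              for pb.Next() {
                  x := p.Get()
                  p.Put(x)
              }
          })
      }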
      
      For #22950, #22331.
      
      Change-Id: I0c8853190d23144026fa11837b6bf42adc461722
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166959
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
    • sync: internal dynamically sized lock-free queue for sync.Pool · 57bb7be4
      Austin Clements authored
      This adds a dynamically sized, lock-free, single-producer,
      multi-consumer queue that will be used in the new Pool stealing
      implementation. It's built on top of the fixed-size queue added in the
      previous CL.
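      
      A single-threaded sketch of the chain-of-rings idea (assumed
      names, eliding all of the lock-free subtleties): fixed rings
      are linked together, each new ring twice the size of the last,
      so the queue grows without moving existing elements:
      
      package main
      
      import "fmt"
      
      type ring struct {
          buf        []interface{}
          head, tail int
          next       *ring
      }
      
      func (r *ring) full() bool  { return r.head-r.tail == len(r.buf) }
      func (r *ring) empty() bool { return r.head == r.tail }
      
      type chain struct {
          headRing *ring // producer pushes here
          tailRing *ring // consumers pop here
      }
      
      func (c *chain) push(x interface{}) {
          r := c.headRing
          if r.full() {
              // Grow by linking a ring of twice the size; existing
              // elements stay where they are.
              nr := &ring{buf: make([]interface{}, 2*len(r.buf))}
              r.next = nr
              c.headRing = nr
              r = nr
          }
          r.buf[r.head%len(r.buf)] = x
          r.head++
      }
      
      func (c *chain) pop() (interface{}, bool) {
          r := c.tailRing
          for r.empty() {
              if r.next == nil {
                  return nil, false
              }
              c.tailRing = r.next // oldest ring drained: unlink it
              r = r.next
          }
          x := r.buf[r.tail%len(r.buf)]
          r.tail++
          return x, true
      }
      
      func main() {
          first := &ring{buf: make([]interface{}, 2)}
          c := &chain{headRing: first, tailRing: first}
          for i := 0; i < 5; i++ {
              c.push(i)
          }
          for {
              x, ok := c.pop()
              if !ok {
                  break
              }
              fmt.Println(x) // 0 1 2 3 4
          }
      }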
      
      For #22950, #22331.
      
      Change-Id: Ifc0ca3895bec7e7f9289ba9fb7dd0332bf96ba5a
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166958
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
    • sync: internal fixed size lock-free queue for sync.Pool · 2b605670
      Austin Clements authored
      This is the first step toward fixing multiple issues with sync.Pool.
      This adds a fixed size, lock-free, single-producer, multi-consumer
      queue that will be used in the new Pool stealing implementation.
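      
      One trick such a fixed-size queue can rely on, shown in a
      self-contained sketch (not the actual implementation): head and
      tail indices are packed into a single uint64 so that both can be
      read or CAS'd in one atomic operation, which is what lets
      multiple consumers claim slots safely:
      
      package main
      
      import (
          "fmt"
          "sync/atomic"
      )
      
      type deque struct {
          headTail uint64 // head in the high 32 bits, tail in the low 32
      }
      
      func pack(head, tail uint32) uint64 {
          return uint64(head)<<32 | uint64(tail)
      }
      
      func unpack(ht uint64) (head, tail uint32) {
          return uint32(ht >> 32), uint32(ht)
      }
      
      // tryPopTail claims the slot at tail on behalf of a consumer.
      // Losing the CAS race just means another consumer got it first.
      func (d *deque) tryPopTail() (uint32, bool) {
          for {
              ht := atomic.LoadUint64(&d.headTail)
              head, tail := unpack(ht)
              if head == tail {
                  return 0, false // queue is empty
              }
              if atomic.CompareAndSwapUint64(&d.headTail, ht, pack(head, tail+1)) {
                  return tail, true // caller now owns index tail % size
              }
          }
      }
      
      func main() {
          d := &deque{headTail: pack(3, 0)} // pretend 3 items were pushed
          for {
              idx, ok := d.tryPopTail()
              if !ok {
                  break
              }
              fmt.Println("claimed slot", idx)
          }
      }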
      
      For #22950, #22331.
      
      Change-Id: I50e85e3cb83a2ee71f611ada88e7f55996504bb5
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166957
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>