- 27 Apr, 2016 10 commits
-
-
Rick Hudson authored
Prior to this CL the base of a span was calculated in various places using shifts or calls to base(). This CL now always calls base() which has been optimized to calculate the base of the span when the span is initialized and store that value in the span structure. Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100 Reviewed-on: https://go-review.googlesource.com/20703Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
Prior to this CL the sweep phase was responsible for locating all objects that were about to be freed and calling a function to process the object. This was done by the function heapBitsSweepSpan. Part of processing included calls to tracefree and msanfree as well as counting how many objects were freed. The calls to tracefree and msanfree have been moved into the gcmalloc routine and called when the object is about to be reallocated. The counting of free objects has been optimized using an array based popcnt algorithm and if all the objects in a span are free then span is freed. Similarly the code to locate the next free object has been optimized to use an array based ctz (count trailing zero). Various hot paths in the allocation logic have been optimized. At this point the garbage benchmark is within 3% of the 1.6 release. Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa Reviewed-on: https://go-review.googlesource.com/20201Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
Add to each span a 64 bit cache (allocCache) of the allocBits at freeindex. allocCache is shifted such that the lowest bit corresponds to the bit freeindex. allocBits uses a 0 to indicate an object is free, on the other hand allocCache uses a 1 to indicate an object is free. This facilitates ctz64 (count trailing zero) which counts the number of 0s trailing the least significant 1. This is also the index of the least significant 1. Each span maintains a freeindex indicating the boundary between allocated objects and unallocated objects. allocCache is shifted as freeindex is incremented such that the low bit in allocCache corresponds to the bit a freeindex in the allocBits array. Currently ctz64 is written in Go using a for loop so it is not very efficient. Use of the hardware instruction will follow. With this in mind comparisons of the garbage benchmark are as follows. 1.6 release 2.8 seconds dev:garbage branch 3.1 seconds. Profiling shows the go implementation of ctz64 takes up 1% of the total time. Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f Reviewed-on: https://go-review.googlesource.com/20200Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
Most (all?) processors that Go supports supply a hardware instruction that takes a byte and returns the number of zeros trailing the first 1 encountered, or 8 if no ones are found. This is the index within the byte of the first 1 encountered. CTZ should improve the performance of the nextFreeIndex function. Since nextFreeIndex wants the next unmarked (0) bit a bit-wise complement is needed before calling ctz. Furthermore unmarked bits associated with previously allocated objects need to be ignored. Instead of writing a 1 as we allocate the code masks all bits less than the freeindex after loading the byte. While this CL does not actual execute a CTZ instruction it supplies a ctz function with the appropiate signature along with the logic to execute it. Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7 Reviewed-on: https://go-review.googlesource.com/20169Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
This is a renaming of the field ref to the more appropriate allocCount. The field holds the number of objects in the span that are currently allocated. Some throws strings were adjusted to more accurately convey the meaning of allocCount. Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe Reviewed-on: https://go-review.googlesource.com/19518Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
Instead of building a freelist from the mark bits generated by the GC this CL allocates directly from the mark bits. The approach moves the mark bits from the pointer/no pointer heap structures into their own per span data structures. The mark/allocation vectors consist of a single mark bit per object. Two vectors are maintained, one for allocation and one for the GC's mark phase. During the GC cycle's sweep phase the interpretation of the vectors is swapped. The mark vector becomes the allocation vector and the old allocation vector is cleared and becomes the mark vector that the next GC cycle will use. Marked entries in the allocation vector indicate that the object is not free. Each allocation vector maintains a boundary between areas of the span already allocated from and areas not yet allocated from. As objects are allocated this boundary is moved until it reaches the end of the span. At this point further allocations will be done from another span. Since we no longer sweep a span inspecting each freed object the responsibility for maintaining pointer/scalar bits in the heapBitMap containing is now the responsibility of the the routines doing the actual allocation. This CL is functionally complete and ready for performance tuning. Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04 Reviewed-on: https://go-review.googlesource.com/19470Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
The gcmarkBits is a bit vector used by the GC to mark reachable objects. Once a GC cycle is complete the gcmarkBits swap places with the allocBits. allocBits is then used directly by malloc to locate free objects, thus avoiding the construction of a linked free list. This CL introduces a set of helper functions for manipulating gcmarkBits and allocBits that will be used by later CLs to realize the actual algorithm. Minimal attempts have been made to optimize these helper routines. Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34 Reviewed-on: https://go-review.googlesource.com/19420Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Rick Hudson <rlh@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Rick Hudson authored
In preparation for changing how the next free object is chosen refactor and consolidate code into a single function. Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8 Reviewed-on: https://go-review.googlesource.com/19317Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
The freelist for normal objects and the freelist for stacks share the same mspan field for holding the list head but are operated on by different code sequences. This overloading complicates the use of bit vectors for allocation of normal objects. This change refactors the use of the stackfreelist out from the use of freelist. Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0 Reviewed-on: https://go-review.googlesource.com/19315Reviewed-by: Austin Clements <austin@google.com>
-
Rick Hudson authored
The bitmap allocation data structure prototypes. Before this is released these underlying data structures need to be more performant but the signatures of helper functions utilizing these structures will remain stable. Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe Reviewed-on: https://go-review.googlesource.com/19221Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 05 Apr, 2016 25 commits
-
-
Austin Clements authored
Change-Id: I47ac4112befc07d3674d7a88827227199edd93b4
-
Josh Bleecher Snyder authored
Pure code movement. Change-Id: Ia07ee0b0041c931b08adf090f262a6f74a6fdb01 Reviewed-on: https://go-review.googlesource.com/21546 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
Before: ... 0x00d0 ff ff ff e8 00 00 00 00 e9 23 ff ff ff cc cc cc .........#...... rel 5+4 t=14 +0 rel 82+4 t=13 runtime.writeBarrier+0 ... After: ... 0x00d0 ff ff ff e8 00 00 00 00 e9 23 ff ff ff cc cc cc .........#...... rel 5+4 t=14 TLS+0 rel 82+4 t=13 runtime.writeBarrier+0 ... Change-Id: Ibdaf694581b5fd5fb87fa8ce6a792f3eb4493622 Reviewed-on: https://go-review.googlesource.com/21545Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Joe Tsai authored
CL/19862 introduced the same set of constants to the io package. We should steer users away from the os.SEEK* versions and towards the io.Seek* versions. Updates #6885 Change-Id: I96ec5be3ec3439e1295c937159dadaf1ebfb2737 Reviewed-on: https://go-review.googlesource.com/21540Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Robert Griesemer authored
I think we had this code before but it may have gone lost somehow. Change-Id: Ifde490e686de0d2bfe907cbe19c9197f24f5fa8e Reviewed-on: https://go-review.googlesource.com/21537Reviewed-by: Matthew Dempsky <mdempsky@google.com>
-
David Chase authored
Missed a case for closure calls (OCALLFUNC && indirect) in esc.go:esccall. Cleanup to runtime code for windows to more thoroughly hide a technical escape. Also made code pickier about failing to late non-optional kernel32.dll. Fixes #14409. Change-Id: Ie75486a2c8626c4583224e02e4872c2875f7bca5 Reviewed-on: https://go-review.googlesource.com/20102 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Robert Griesemer authored
For PublicKey.P == 0, Verify will fail. Don't even try. Change-Id: I1009f2b3dead8d0041626c946633acb10086d8c8 Reviewed-on: https://go-review.googlesource.com/21533Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Brad Fitzpatrick authored
Fixes #14928 Change-Id: Id772eb623815cb2bb3e49de68a916762345a9dc1 Reviewed-on: https://go-review.googlesource.com/21531Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Dmitry Vyukov authored
Two GC-related functions, scang and casgstatus, wait in an active spin loop. Active spinning is never a good idea in user-space. Once we wait several times more than the expected wait time, something unexpected is happenning (e.g. the thread we are waiting for is descheduled or handling a page fault) and we need to yield to OS scheduler. Moreover, the expected wait time is very high for these functions: scang wait time can be tens of milliseconds, casgstatus can be hundreds of microseconds. It does not make sense to spin even for that time. go install -a std profile on a 4-core machine shows that 11% of time is spent in the active spin in scang: 6.12% compile compile [.] runtime.scang 3.27% compile compile [.] runtime.readgstatus 1.72% compile compile [.] runtime/internal/atomic.Load The active spin also increases tail latency in the case of the slightest oversubscription: GC goroutines spend whole quantum in the loop instead of executing user code. Here is scang wait time histogram during go install -a std: 13707.0000 - 1815442.7667 [ 118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎... 1815442.7667 - 3617178.5333 [ 9]: ∎∎∎∎∎∎∎∎∎ 3617178.5333 - 5418914.3000 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ 5418914.3000 - 7220650.0667 [ 5]: ∎∎∎∎∎ 7220650.0667 - 9022385.8333 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎ 9022385.8333 - 10824121.6000 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎ 10824121.6000 - 12625857.3667 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 12625857.3667 - 14427593.1333 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 14427593.1333 - 16229328.9000 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 16229328.9000 - 18031064.6667 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 18031064.6667 - 19832800.4333 [ 28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 19832800.4333 - 21634536.2000 [ 6]: ∎∎∎∎∎∎ 21634536.2000 - 23436271.9667 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 23436271.9667 - 25238007.7333 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ 25238007.7333 - 27039743.5000 [ 27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 27039743.5000 - 28841479.2667 [ 20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 28841479.2667 - 30643215.0333 [ 10]: ∎∎∎∎∎∎∎∎∎∎ 30643215.0333 - 32444950.8000 [ 7]: ∎∎∎∎∎∎∎ 32444950.8000 - 34246686.5667 [ 4]: ∎∎∎∎ 34246686.5667 - 36048422.3333 [ 4]: ∎∎∎∎ 36048422.3333 - 37850158.1000 [ 1]: ∎ 37850158.1000 - 39651893.8667 [ 5]: ∎∎∎∎∎ 39651893.8667 - 41453629.6333 [ 2]: ∎∎ 41453629.6333 - 43255365.4000 [ 2]: ∎∎ 43255365.4000 - 45057101.1667 [ 2]: ∎∎ 45057101.1667 - 46858836.9333 [ 1]: ∎ 46858836.9333 - 48660572.7000 [ 2]: ∎∎ 48660572.7000 - 50462308.4667 [ 3]: ∎∎∎ 50462308.4667 - 52264044.2333 [ 2]: ∎∎ 52264044.2333 - 54065780.0000 [ 2]: ∎∎ and the zoomed-in first part: 13707.0000 - 19916.7667 [ 2]: ∎∎ 19916.7667 - 26126.5333 [ 2]: ∎∎ 26126.5333 - 32336.3000 [ 9]: ∎∎∎∎∎∎∎∎∎ 32336.3000 - 38546.0667 [ 8]: ∎∎∎∎∎∎∎∎ 38546.0667 - 44755.8333 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎ 44755.8333 - 50965.6000 [ 10]: ∎∎∎∎∎∎∎∎∎∎ 50965.6000 - 57175.3667 [ 5]: ∎∎∎∎∎ 57175.3667 - 63385.1333 [ 6]: ∎∎∎∎∎∎ 63385.1333 - 69594.9000 [ 5]: ∎∎∎∎∎ 69594.9000 - 75804.6667 [ 6]: ∎∎∎∎∎∎ 75804.6667 - 82014.4333 [ 6]: ∎∎∎∎∎∎ 82014.4333 - 88224.2000 [ 4]: ∎∎∎∎ 88224.2000 - 94433.9667 [ 1]: ∎ 94433.9667 - 100643.7333 [ 1]: ∎ 100643.7333 - 106853.5000 [ 2]: ∎∎ 106853.5000 - 113063.2667 [ 0]: 113063.2667 - 119273.0333 [ 2]: ∎∎ 119273.0333 - 125482.8000 [ 2]: ∎∎ 125482.8000 - 131692.5667 [ 1]: ∎ 131692.5667 - 137902.3333 [ 1]: ∎ 137902.3333 - 144112.1000 [ 0]: 144112.1000 - 150321.8667 [ 2]: ∎∎ 150321.8667 - 156531.6333 [ 1]: ∎ 156531.6333 - 162741.4000 [ 1]: ∎ 162741.4000 - 168951.1667 [ 0]: 168951.1667 - 175160.9333 [ 0]: 175160.9333 - 181370.7000 [ 1]: ∎ 181370.7000 - 187580.4667 [ 1]: ∎ 187580.4667 - 193790.2333 [ 2]: ∎∎ 193790.2333 - 200000.0000 [ 0]: Here is casgstatus wait time histogram: 631.0000 - 5276.6333 [ 3]: ∎∎∎ 5276.6333 - 9922.2667 [ 5]: ∎∎∎∎∎ 9922.2667 - 14567.9000 [ 2]: ∎∎ 14567.9000 - 19213.5333 [ 6]: ∎∎∎∎∎∎ 19213.5333 - 23859.1667 [ 5]: ∎∎∎∎∎ 23859.1667 - 28504.8000 [ 6]: ∎∎∎∎∎∎ 28504.8000 - 33150.4333 [ 6]: ∎∎∎∎∎∎ 33150.4333 - 37796.0667 [ 2]: ∎∎ 37796.0667 - 42441.7000 [ 1]: ∎ 42441.7000 - 47087.3333 [ 3]: ∎∎∎ 47087.3333 - 51732.9667 [ 0]: 51732.9667 - 56378.6000 [ 1]: ∎ 56378.6000 - 61024.2333 [ 0]: 61024.2333 - 65669.8667 [ 0]: 65669.8667 - 70315.5000 [ 0]: 70315.5000 - 74961.1333 [ 1]: ∎ 74961.1333 - 79606.7667 [ 0]: 79606.7667 - 84252.4000 [ 0]: 84252.4000 - 88898.0333 [ 0]: 88898.0333 - 93543.6667 [ 0]: 93543.6667 - 98189.3000 [ 0]: 98189.3000 - 102834.9333 [ 0]: 102834.9333 - 107480.5667 [ 1]: ∎ 107480.5667 - 112126.2000 [ 0]: 112126.2000 - 116771.8333 [ 0]: 116771.8333 - 121417.4667 [ 0]: 121417.4667 - 126063.1000 [ 0]: 126063.1000 - 130708.7333 [ 0]: 130708.7333 - 135354.3667 [ 0]: 135354.3667 - 140000.0000 [ 1]: ∎ Ideally we eliminate the waiting by switching to async state machine for GC, but for now just yield to OS scheduler after a reasonable wait time. To choose yielding parameters I've measured golang.org/x/benchmarks/http tail latencies with different yield delays and oversubscription levels. With no oversubscription (to the degree possible): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 1.41ms ±15% 1.41ms ± 5% ~ (p=0.611 n=13+12) Latency-95 5.21ms ± 2% 5.15ms ± 2% -1.15% (p=0.012 n=13+13) Latency-99 7.16ms ± 2% 7.05ms ± 2% -1.54% (p=0.002 n=13+13) Latency-999 10.7ms ± 9% 10.2ms ±10% -5.46% (p=0.004 n=12+13) scang yield delay = 5000, casgstatus yield delay = 3000 Latency-50 1.41ms ±15% 1.41ms ± 8% ~ (p=0.511 n=13+13) Latency-95 5.21ms ± 2% 5.14ms ± 2% -1.23% (p=0.006 n=13+13) Latency-99 7.16ms ± 2% 7.02ms ± 2% -1.94% (p=0.000 n=13+13) Latency-999 10.7ms ± 9% 10.1ms ± 8% -6.14% (p=0.000 n=12+13) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 1.41ms ±15% 1.45ms ± 6% ~ (p=0.724 n=13+13) Latency-95 5.21ms ± 2% 5.18ms ± 1% ~ (p=0.287 n=13+13) Latency-99 7.16ms ± 2% 7.05ms ± 2% -1.64% (p=0.002 n=13+13) Latency-999 10.7ms ± 9% 10.0ms ± 5% -6.72% (p=0.000 n=12+13) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 1.41ms ±15% 1.51ms ± 7% +6.57% (p=0.002 n=13+13) Latency-95 5.21ms ± 2% 5.21ms ± 2% ~ (p=0.960 n=13+13) Latency-99 7.16ms ± 2% 7.06ms ± 2% -1.50% (p=0.012 n=13+13) Latency-999 10.7ms ± 9% 10.0ms ± 6% -6.49% (p=0.000 n=12+13) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 1.41ms ±15% 1.53ms ± 6% +8.48% (p=0.000 n=13+12) Latency-95 5.21ms ± 2% 5.23ms ± 2% ~ (p=0.287 n=13+13) Latency-99 7.16ms ± 2% 7.08ms ± 2% -1.21% (p=0.004 n=13+13) Latency-999 10.7ms ± 9% 9.9ms ± 3% -7.99% (p=0.000 n=12+12) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 1.41ms ±15% 1.47ms ± 5% ~ (p=0.072 n=13+13) Latency-95 5.21ms ± 2% 5.17ms ± 2% ~ (p=0.091 n=13+13) Latency-99 7.16ms ± 2% 7.02ms ± 2% -1.99% (p=0.000 n=13+13) Latency-999 10.7ms ± 9% 9.9ms ± 5% -7.86% (p=0.000 n=12+13) With slight oversubscription (another instance of http benchmark was running in background with reduced GOMAXPROCS): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 840µs ± 3% 804µs ± 3% -4.37% (p=0.000 n=15+18) Latency-95 6.52ms ± 4% 6.03ms ± 4% -7.51% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.0ms ± 4% -7.33% (p=0.000 n=18+14) Latency-999 18.0ms ± 9% 16.8ms ± 7% -6.84% (p=0.000 n=18+18) scang yield delay = 5000, casgstatus yield delay = 3000 Latency-50 840µs ± 3% 809µs ± 3% -3.71% (p=0.000 n=15+17) Latency-95 6.52ms ± 4% 6.11ms ± 4% -6.29% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 9.9ms ± 6% -7.55% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.5ms ±11% -8.49% (p=0.000 n=18+18) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 840µs ± 3% 823µs ± 5% -2.06% (p=0.002 n=15+18) Latency-95 6.52ms ± 4% 6.32ms ± 3% -3.05% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 4% -5.22% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.7ms ±10% -7.09% (p=0.000 n=18+18) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 840µs ± 3% 836µs ± 5% ~ (p=0.442 n=15+18) Latency-95 6.52ms ± 4% 6.39ms ± 3% -2.00% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 6% -5.15% (p=0.000 n=18+17) Latency-999 18.0ms ± 9% 16.6ms ± 8% -7.48% (p=0.000 n=18+18) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 840µs ± 3% 836µs ± 6% ~ (p=0.401 n=15+18) Latency-95 6.52ms ± 4% 6.40ms ± 4% -1.79% (p=0.010 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 5% -4.95% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.5ms ±14% -8.17% (p=0.000 n=18+18) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 840µs ± 3% 828µs ± 2% -1.49% (p=0.001 n=15+17) Latency-95 6.52ms ± 4% 6.38ms ± 4% -2.04% (p=0.001 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 4% -4.77% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.9ms ± 9% -6.23% (p=0.000 n=18+18) With significant oversubscription (background http benchmark was running with full GOMAXPROCS): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 1.32ms ±12% 1.30ms ±13% ~ (p=0.454 n=14+14) Latency-95 16.3ms ±10% 15.3ms ± 7% -6.29% (p=0.001 n=14+14) Latency-99 29.4ms ±10% 27.9ms ± 5% -5.04% (p=0.001 n=14+12) Latency-999 49.9ms ±19% 45.9ms ± 5% -8.00% (p=0.008 n=14+13) scang yield delay = 5000, casgstatus yield delay = 3000 Latency-50 1.32ms ±12% 1.29ms ± 9% ~ (p=0.227 n=14+14) Latency-95 16.3ms ±10% 15.4ms ± 5% -5.27% (p=0.002 n=14+14) Latency-99 29.4ms ±10% 27.9ms ± 6% -5.16% (p=0.001 n=14+14) Latency-999 49.9ms ±19% 46.8ms ± 8% -6.21% (p=0.050 n=14+14) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 1.32ms ±12% 1.35ms ± 9% ~ (p=0.401 n=14+14) Latency-95 16.3ms ±10% 15.0ms ± 4% -7.67% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.4ms ± 5% -6.98% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.7ms ± 5% -10.56% (p=0.000 n=14+11) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 1.32ms ±12% 1.36ms ±10% ~ (p=0.246 n=14+14) Latency-95 16.3ms ±10% 14.9ms ± 5% -8.31% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.4ms ± 7% -6.70% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.9ms ±15% -10.13% (p=0.003 n=14+14) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 1.32ms ±12% 1.41ms ± 9% +6.37% (p=0.008 n=14+13) Latency-95 16.3ms ±10% 15.1ms ± 8% -7.45% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.5ms ±12% -6.67% (p=0.002 n=14+14) Latency-999 49.9ms ±19% 45.9ms ±16% -8.06% (p=0.019 n=14+14) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 1.32ms ±12% 1.42ms ±10% +7.21% (p=0.003 n=14+14) Latency-95 16.3ms ±10% 15.0ms ± 7% -7.59% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.3ms ± 8% -7.20% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.8ms ± 8% -10.21% (p=0.001 n=14+13) All numbers are on 8 cores and with GOGC=10 (http benchmark has tiny heap, few goroutines and low allocation rate, so by default GC barely affects tail latency). 10us/5us yield delays seem to provide a reasonable compromise and give 5-10% tail latency reduction. That's what used in this change. go install -a std results on 4 core machine: name old time/op new time/op delta Time 8.39s ± 2% 7.94s ± 2% -5.34% (p=0.000 n=47+49) UserTime 24.6s ± 2% 22.9s ± 2% -6.76% (p=0.000 n=49+49) SysTime 1.77s ± 9% 1.89s ±11% +7.00% (p=0.000 n=49+49) CpuLoad 315ns ± 2% 313ns ± 1% -0.59% (p=0.000 n=49+48) # %CPU MaxRSS 97.1ms ± 4% 97.5ms ± 9% ~ (p=0.838 n=46+49) # bytes Update #14396 Update #14189 Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2 Reviewed-on: https://go-review.googlesource.com/21503 Run-TryBot: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>
-
Dmitry Vyukov authored
Usleep(100) in runqgrab negatively affects latency and throughput of parallel application. We are sleeping instead of doing useful work. This is effect is particularly visible on windows where minimal sleep duration is 1-15ms. Reduce sleep from 100us to 3us and use osyield on windows. Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot. benchmark old ns/op new ns/op delta BenchmarkChanSync-12 216 217 +0.46% BenchmarkChanSyncWork-12 27213 25816 -5.13% CPU consumption goes up from 106% to 108% in the first case, and from 107% to 125% in the second case. Test case from #14790 on windows: BenchmarkDefaultResolution-8 4583372 29720 -99.35% Benchmark1ms-8 992056 30701 -96.91% 99-th latency percentile for HTTP request serving is improved by up to 15% (see http://golang.org/cl/20835 for details). The following benchmarks are from the change that originally added this sleep (see https://golang.org/s/go15gomaxprocs): name old time/op new time/op delta Chain 22.6µs ± 2% 22.7µs ± 6% ~ (p=0.905 n=9+10) ChainBuf 22.4µs ± 3% 22.5µs ± 4% ~ (p=0.780 n=9+10) Chain-2 23.5µs ± 4% 24.9µs ± 1% +5.66% (p=0.000 n=10+9) ChainBuf-2 23.7µs ± 1% 24.4µs ± 1% +3.31% (p=0.000 n=9+10) Chain-4 24.2µs ± 2% 25.1µs ± 3% +3.70% (p=0.000 n=9+10) ChainBuf-4 24.4µs ± 5% 25.0µs ± 2% +2.37% (p=0.023 n=10+10) Powser 2.37s ± 1% 2.37s ± 1% ~ (p=0.423 n=8+9) Powser-2 2.48s ± 2% 2.57s ± 2% +3.74% (p=0.000 n=10+9) Powser-4 2.66s ± 1% 2.75s ± 1% +3.40% (p=0.000 n=10+10) Sieve 13.3s ± 2% 13.3s ± 2% ~ (p=1.000 n=10+9) Sieve-2 7.00s ± 2% 7.44s ±16% ~ (p=0.408 n=8+10) Sieve-4 4.13s ±21% 3.85s ±22% ~ (p=0.113 n=9+9) Fixes #14790 Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d Reviewed-on: https://go-review.googlesource.com/20835Reviewed-by: Austin Clements <austin@google.com>
-
Ilya Tocar authored
We already generate ADDL for byte operations, reflect this in code. This also allows inc/dec for +-1 operation, which are 1-byte shorter, and enables lea for 3-operand addition/subtraction. Change-Id: Ibfdfee50667ca4cd3c28f72e3dece0c6d114d3ae Reviewed-on: https://go-review.googlesource.com/21251Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Augusto Roman authored
This CL allows JSON-encoding & -decoding maps whose keys are types that implement encoding.TextMarshaler / TextUnmarshaler. During encode, the map keys are marshaled upfront so that they can be sorted. Fixes #12146 Change-Id: I43809750a7ad82a3603662f095c7baf75fd172da Reviewed-on: https://go-review.googlesource.com/20356 Run-TryBot: Caleb Spare <cespare@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Eric Lagergren authored
Fixes #6885 Change-Id: I6907958186f6a2427da1ad2f6c20bd5d7bf7a3f9 Reviewed-on: https://go-review.googlesource.com/19862Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Alexandru Moșoi authored
Since BCE happens over several passes (opt, loopbce, prove) it's easy to regress especially with rewriting. The pass is only activated with special debug flag. Change-Id: I46205982e7a2751156db8e875d69af6138068f59 Reviewed-on: https://go-review.googlesource.com/21510 Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Brad Fitzpatrick authored
Go 1.6's HTTP/1.x Transport started enforcing that responses have 3 status digits, per the spec, but we could still write out invalid status codes ourselves if the called ResponseWriter.WriteHeader(0). That is bogus anyway, since the minimum status code is 1xx, but be a little bit less bogus (and consistent) and zero pad our responses. Change-Id: I6883901fd95073cb72f6b74035cabf1a79c35e1c Reviewed-on: https://go-review.googlesource.com/19130 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org>
-
Dave Cheney authored
Fixes #15122 Change-Id: Ie2c802d78aea731e25bf4b193b3c2e4c884e0573 Reviewed-on: https://go-review.googlesource.com/21524 Run-TryBot: Dave Cheney <dave@cheney.net> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
David Symonds authored
Change-Id: If4e740f3dbef4053355542eebdd899b3099d872c Reviewed-on: https://go-review.googlesource.com/21525Reviewed-by: Andrew Gerrand <adg@golang.org>
-
Paul Marks authored
This fixes a race which made it possible to cancel a connection after returning from net.Dial. Fixes #15035 Fixes #15078 Change-Id: Iec6215009538362f7ad9f408a33549f3e94d1606 Reviewed-on: https://go-review.googlesource.com/21497Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Alex Brainman authored
Fixes #15124 Change-Id: I55fe4c2957370f3fb417c3df54f99fb085a5dada Reviewed-on: https://go-review.googlesource.com/21522Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Alex Brainman authored
Fixes #15121 Change-Id: I651521743c56244c55eda5762905889d7e06887a Reviewed-on: https://go-review.googlesource.com/21521Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Alex Brainman <alex.brainman@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Alex Brainman authored
Fixes #15120 Change-Id: I1d9a192ac163826bad8b46e8c0b0b9e218e69570 Reviewed-on: https://go-review.googlesource.com/21520Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Brad Fitzpatrick authored
Currently only used by the client. The server is not yet wired up. A TODO remains to document how it works server-side, once implemented. Updates #14660 Change-Id: I27c2e74198872b2720995fa8271d91de200e23d5 Reviewed-on: https://go-review.googlesource.com/21496Reviewed-by: Andrew Gerrand <adg@golang.org>
-
Alex Brainman authored
Fixes #15119 Change-Id: I31445bf282a5e2a160ff4e66c5a592b989a5798f Reviewed-on: https://go-review.googlesource.com/21448Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Hiroshi Ioka authored
Current implementation uses GetShortPathName and GetLongPathName to get a normalized path. That approach sometimes fails because user can disable short path name anytime. This CL provides an alternative approach suggested by MSDN. https://msdn.microsoft.com/en-us/library/windows/desktop/aa364989(v=vs.85).aspx Fixes #13980 Change-Id: Icf4afe4c9c4b507fc110c1483bf8db2c3f606b0a Reviewed-on: https://go-review.googlesource.com/20860Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Run-TryBot: Alex Brainman <alex.brainman@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Brad Fitzpatrick authored
This copies the golang.org/x/net/context package to the standard library. It is imported from the x/net repo's git rev 1d9fd3b8333e (the most recent modified to x/net/context as of 2016-03-07). The corresponding change to x/net/context is in https://golang.org/cl/20347 Updates #14660 Change-Id: Ida14b1b7e115194d6218d9ac614548b9f41641cc Reviewed-on: https://go-review.googlesource.com/20346Reviewed-by: Sameer Ajmani <sameer@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 04 Apr, 2016 5 commits
-
-
Matthew Dempsky authored
Briefly document what the importfoo functions do. Get rid of importsym's unused result parameter. Get rid of the redundant calls to importsym(s, OTYPE) after we've already called pkgtype(s). Passes toolstash -cmp. Change-Id: I4c057358144044f5356e4dec68907ec85f1fe806 Reviewed-on: https://go-review.googlesource.com/21498 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Mikio Hara authored
Also updates documentation. Change-Id: Idb0fc0feed61407f7f07eab81ce82b55ffde5040 Reviewed-on: https://go-review.googlesource.com/21446Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Mikio Hara authored
Alos moves TestTCPSelfConnect into tcpsock_test.go Change-Id: I3e1cbd029594ecb36a67f42bc3ecdbc7176a95dc Reviewed-on: https://go-review.googlesource.com/21447 Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Shahar Kohanim authored
Counting the final buffer size usually doesn't result in the buffer growing, so assume that it doesn't need to grow and only grow if necessary. name old secs new secs delta LinkCmdGo 0.49 ± 4% 0.48 ± 3% -1.31% (p=0.000 n=95+95) name old MaxRSS new MaxRSS delta LinkCmdGo 122k ± 4% 121k ± 5% ~ (p=0.065 n=96+100) Change-Id: I85e7f5688a61ef5ef2b1b7afe56507e71c5bd5b1 Reviewed-on: https://go-review.googlesource.com/21509Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Shahar Kohanim <skohanim@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org>
-
Robert Griesemer authored
Completed implementation for exporting inlined functions using the new binary export format. This change passes (export GO_GCFLAGS=-newexport; make all.bash) but for gc's builtin_test.go which we need to adjust before enabling this code by default. For a high-level description of the export format see the comment at the top of bexport.go. Major changes: 1) The export format for the platform independent export data changed: When we export inlined function bodies, additional objects (other functions, types, etc.) that are referred to by the function bodies will need to be exported. While this doesn't affect the platform-independent portion directly, it adds more objects to the exportlist while we are exporting. Instead of trying to sort the objects into groups, just export objects as they appear in the export list. This is slightly less compact (one extra byte per object), but it is simpler and much more flexible. 2) The export format contains now three sections: 1) The plat- form independent objects, 2) the objects pulled in for export via inlined function bodies, and 3) the inlined function bodies. 3) Completed the exporting and importing code for inlined function bodies. The format is completely compiler-specific and easily changeable w/o affecting other tools. There is still quite a bit of room for denser encoding. This can happen at any time in the future. This change contains also the adjustments for go/internal/gcimporter, necessary because of the export format change 1) mentioned above. For #13241. Change-Id: I86bca0bd984b12ccf13d0d30892e6e25f6d04ed5 Reviewed-on: https://go-review.googlesource.com/21172 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-