1. 11 May, 2015 8 commits
    • Russ Cox's avatar
      runtime: optimize heapBitsSetType · feb8a3b6
      Russ Cox authored
      For the conversion of the heap bitmap from 4-bit to 2-bit fields,
      I replaced heapBitsSetType with the dumbest thing that could possibly work:
      two atomic operations (atomicand8+atomicor8) per 2-bit field.
      
      This CL replaces that code with a proper implementation that
      avoids the atomics whenever possible. Benchmarks vs base CL
      (before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.
      
      Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
      is 10ns slower because a race against the concurrent GC requires the
      use of an atomicor8 that used to be an ordinary write. This slowdown
      was present even in the base CL.
      
      Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
      is 10ns slower because it too needs a new atomic, because with the
      denser representation, the byte on the end of the allocation is now shared
      with the object next to it; this was not true with the 4-bit representation.
      
      Excluding these two (fundamental) slowdowns due to the use of atomics,
      the new code is noticeably faster than both Go 1.4 and the base CL.
      
      The next CL will reintroduce the ``typeDead'' optimization.
      
      Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).
      
      Compared to base CL (** = new atomic)
      name                  old mean              new mean              delta
      SetTypePtr            14.1ns × (0.99,1.02)  14.7ns × (0.93,1.10)     ~    (p=0.175)
      SetTypePtr8           18.4ns × (1.00,1.01)  18.6ns × (0.81,1.21)     ~    (p=0.866)
      SetTypePtr16          28.7ns × (1.00,1.00)  22.4ns × (0.90,1.27)  -21.88% (p=0.015)
      SetTypePtr32          52.3ns × (1.00,1.00)  33.8ns × (0.93,1.24)  -35.37% (p=0.001)
      SetTypePtr64          79.2ns × (1.00,1.00)  55.1ns × (1.00,1.01)  -30.43% (p=0.000)
      SetTypePtr126          118ns × (1.00,1.00)   100ns × (1.00,1.00)  -15.97% (p=0.000)
      SetTypePtr128          130ns × (0.92,1.19)    98ns × (1.00,1.00)  -24.36% (p=0.008)
      SetTypePtrSlice        726ns × (0.96,1.08)   760ns × (1.00,1.00)     ~    (p=0.152)
      SetTypeNode1          14.1ns × (0.94,1.15)  12.0ns × (1.00,1.01)  -14.60% (p=0.020)
      SetTypeNode1Slice      135ns × (0.96,1.07)    88ns × (1.00,1.00)  -34.53% (p=0.000)
      SetTypeNode8          20.9ns × (1.00,1.01)  32.6ns × (1.00,1.00)  +55.37% (p=0.000) **
      SetTypeNode8Slice      414ns × (0.99,1.02)   244ns × (1.00,1.00)  -41.09% (p=0.000)
      SetTypeNode64         80.0ns × (1.00,1.00)  57.4ns × (1.00,1.00)  -28.23% (p=0.000)
      SetTypeNode64Slice    2.15µs × (1.00,1.01)  1.56µs × (1.00,1.00)  -27.43% (p=0.000)
      SetTypeNode124         119ns × (0.99,1.00)   100ns × (1.00,1.00)  -16.11% (p=0.000)
      SetTypeNode124Slice   3.40µs × (1.00,1.00)  2.93µs × (1.00,1.00)  -13.80% (p=0.000)
      SetTypeNode126         120ns × (1.00,1.01)    98ns × (1.00,1.00)  -18.19% (p=0.000)
      SetTypeNode126Slice   3.53µs × (0.98,1.08)  3.02µs × (1.00,1.00)  -14.49% (p=0.002)
      SetTypeNode1024        726ns × (0.97,1.09)   740ns × (1.00,1.00)     ~    (p=0.451)
      SetTypeNode1024Slice  24.9µs × (0.89,1.37)  23.1µs × (1.00,1.00)     ~    (p=0.476)
      
      Compared to Go 1.4 (** = new atomic)
      name                  old mean               new mean              delta
      SetTypePtr            5.71ns × (0.89,1.19)  14.68ns × (0.93,1.10)  +157.24% (p=0.000) **
      SetTypePtr8           19.3ns × (0.96,1.10)   18.6ns × (0.81,1.21)      ~    (p=0.638)
      SetTypePtr16          30.7ns × (0.99,1.03)   22.4ns × (0.90,1.27)   -26.88% (p=0.005)
      SetTypePtr32          51.5ns × (1.00,1.00)   33.8ns × (0.93,1.24)   -34.40% (p=0.001)
      SetTypePtr64          83.6ns × (0.94,1.12)   55.1ns × (1.00,1.01)   -34.12% (p=0.001)
      SetTypePtr126          137ns × (0.87,1.26)    100ns × (1.00,1.00)   -27.10% (p=0.028)
      SetTypePtrSlice        865ns × (0.80,1.23)    760ns × (1.00,1.00)      ~    (p=0.243)
      SetTypeNode1          15.2ns × (0.88,1.12)   12.0ns × (1.00,1.01)   -20.89% (p=0.014)
      SetTypeNode1Slice      156ns × (0.93,1.16)     88ns × (1.00,1.00)   -43.57% (p=0.001)
      SetTypeNode8          23.8ns × (0.90,1.18)   32.6ns × (1.00,1.00)   +36.76% (p=0.003) **
      SetTypeNode8Slice      502ns × (0.92,1.10)    244ns × (1.00,1.00)   -51.46% (p=0.000)
      SetTypeNode64         85.6ns × (0.94,1.11)   57.4ns × (1.00,1.00)   -32.89% (p=0.001)
      SetTypeNode64Slice    2.36µs × (0.91,1.14)   1.56µs × (1.00,1.00)   -33.96% (p=0.002)
      SetTypeNode124         130ns × (0.91,1.12)    100ns × (1.00,1.00)   -23.49% (p=0.004)
      SetTypeNode124Slice   3.81µs × (0.90,1.22)   2.93µs × (1.00,1.00)   -23.09% (p=0.025)
      
      There are fewer benchmarks vs Go 1.4 because unrolling directly
      into the heap bitmap is not yet implemented, so those would not
      be meaningful comparisons.
      
      These benchmarks were not present in Go 1.4 as distributed.
      The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
      commit 71d5ee5.
      
      Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
      Reviewed-on: https://go-review.googlesource.com/9704Reviewed-by: default avatarRick Hudson <rlh@golang.org>
      feb8a3b6
    • Russ Cox's avatar
      runtime: use 2-bit heap bitmap (in place of 4-bit) · 0234dfd4
      Russ Cox authored
      Previous CLs changed the representation of the non-heap type bitmaps
      to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
      stored a 2-bit type for each word and a mark bit and checkmark bit
      for the first word of the object. (There used to be additional per-word bits.)
      
      Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
      and the other used for mark, checkmark, and "keep scanning forward
      to find pointers in this object." See comments for details.
      
      This CL replaces heapBitsSetType with very slow but obviously correct code.
      A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)
      
      Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
      Reviewed-on: https://go-review.googlesource.com/9703Reviewed-by: default avatarRick Hudson <rlh@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      0234dfd4
    • Russ Cox's avatar
      runtime: use 1-bit pointer bitmaps in type representation · 6d8a147b
      Russ Cox authored
      The type information in reflect.Type and the GC programs is now
      1 bit per word, down from 2 bits.
      
      The in-memory unrolled type bitmap representation are now
      1 bit per word, down from 4 bits.
      
      The conversion from the unrolled (now 1-bit) bitmap to the
      heap bitmap (still 4-bit) is not optimized. A followup CL will
      work on that, after the heap bitmap has been converted to 2-bit.
      
      The typeDead optimization, in which a special value denotes
      that there are no more pointers anywhere in the object, is lost
      in this CL. A followup CL will bring it back in the final form of
      heapBitsSetType.
      
      Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
      Reviewed-on: https://go-review.googlesource.com/9702Reviewed-by: default avatarAustin Clements <austin@google.com>
      6d8a147b
    • Russ Cox's avatar
      runtime: add benchmark of heapBitsSetType · 7d9e16ab
      Russ Cox authored
      There was an old benchmark that measured this indirectly
      via allocation, but I don't understand how to factor out the
      allocation cost when interpreting the numbers.
      
      Replace with a benchmark that only calls heapBitsSetType,
      that does not allocate. This was not possible when the
      benchmark was first written, because heapBitsSetType had
      not been factored out of mallocgc.
      
      Change-Id: I30f0f02362efab3465a50769398be859832e6640
      Reviewed-on: https://go-review.googlesource.com/9701Reviewed-by: default avatarAustin Clements <austin@google.com>
      7d9e16ab
    • Daniel Morsing's avatar
      runtime: enable profiling on g0 · db6f88a8
      Daniel Morsing authored
      Since we now have stack information for code running on the
      systemstack, we can traceback over it. To make cpu profiles useful,
      add a case in gentraceback to jump over systemstack switches.
      
      Fixes #10609.
      
      Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
      Reviewed-on: https://go-review.googlesource.com/9506Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      db6f88a8
    • Patrick Mezard's avatar
      internal/syscall/windows/registry: handle invalid integer values · 19e81a9b
      Patrick Mezard authored
      I have around twenty of such values on a Windows 7 development machine.
      regedit displays (translated): "invalid 32-bits DWORD value".
      
      Change-Id: Ib37a414ee4c85e891b0a25fed2ddad9e105f5f4e
      Reviewed-on: https://go-review.googlesource.com/9901Reviewed-by: default avatarAlex Brainman <alex.brainman@gmail.com>
      19e81a9b
    • Shenghou Ma's avatar
      misc/trace: add license for the trace-viewer · dce432b3
      Shenghou Ma authored
      The trace-viewer doesn't use the Go license, so it makes sense
      to include the license text into the README.md file.
      
      While we're at here, reformat existing text using real Markdown
      syntax.
      
      Change-Id: I13e42d3cc6a0ca7e64e3d46ad460dc0460f7ed09
      Reviewed-on: https://go-review.googlesource.com/9882Reviewed-by: default avatarRob Pike <r@golang.org>
      dce432b3
    • Mikio Hara's avatar
      net: increase timeout in TestWriteTimeoutFluctuation on darwin/arm · cbcc7584
      Mikio Hara authored
      On darwin/arm, the test sometimes fails with:
      
      Process 557 resuming
      --- FAIL: TestWriteTimeoutFluctuation (1.64s)
      	timeout_test.go:706: Write took over 1s; expected 0.1s
      FAIL
      Process 557 exited with status = 1 (0x00000001)
      go_darwin_arm_exec: timeout running tests
      
      This change increaes timeout on iOS builders from 1s to 3s as a
      temporarily fix.
      
      Updates #10775.
      
      Change-Id: Ifdaf99cf5b8582c1a636a0f7d5cc66bb276efd72
      Reviewed-on: https://go-review.googlesource.com/9915Reviewed-by: default avatarMinux Ma <minux@golang.org>
      cbcc7584
  2. 10 May, 2015 2 commits
  3. 09 May, 2015 3 commits
  4. 08 May, 2015 6 commits
  5. 07 May, 2015 19 commits
  6. 06 May, 2015 2 commits