1. 11 May, 2015 20 commits
    • net/http: silence race detector on client header timeout test · 516f0d1c
      Daniel Morsing authored
      When running the client header timeout test, there is a race between
      us timing out and waiting on the remaining requests to be serviced. If
      the client times out before the server blocks on the channel in the
      handler, we will be simultaneously adding to a waitgroup with the
      value 0 and waiting on it when we call TestServer.Close().
      
      This is largely a theoretical race. We have to time out before we
      enter the handler, and the only reason we would time out is if we're
      blocked on the channel. Nevertheless, make the race detector happy
      by turning the close into a channel send. This turns the defer call
      into a synchronization point and we can be sure that we've entered
      the handler before we close the server.
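The difference between releasing a blocked handler with close and with a send can be sketched in plain Go. This is not the net/http test itself: releaseHandlers, its goroutines, and its WaitGroup are hypothetical stand-ins for the test's handlers and for the server's tracking of in-flight requests.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseHandlers starts n goroutines standing in for blocked HTTP handlers
// and then releases them one send at a time. Because release is unbuffered,
// each send completes only once some handler has started and reached its
// receive, so the sender knows that handler is running. close(release)
// gives no such guarantee: it returns whether or not any handler has started.
func releaseHandlers(n int) int {
	release := make(chan bool)
	var (
		wg     sync.WaitGroup // hypothetical stand-in for the server's active-request tracking
		mu     sync.Mutex
		served int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // the handler blocks here until released
			mu.Lock()
			served++
			mu.Unlock()
		}()
	}
	for i := 0; i < n; i++ {
		release <- true // synchronization point: a handler has been entered
	}
	wg.Wait() // analogous to Close waiting for outstanding requests
	return served
}

func main() {
	fmt.Println(releaseHandlers(3)) // 3
}
```

Because the channel is unbuffered, each send is a point where the sender and a running handler meet, which gives the race detector the ordering it needs.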
      
      Fixes #10780
      
      Change-Id: Id73b017d1eb7503e446aa51538712ef49f2f5c9e
      Reviewed-on: https://go-review.googlesource.com/9905
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: use heap bitmap for typedmemmove · 4212a3c3
      Russ Cox authored
      The current implementation of typedmemmove walks the ptrmask
      in the type to find out where pointers are. This led to turning off
      GC programs for the Go 1.5 dev cycle, so that there would always
      be a ptrmask. Instead of also interpreting the GC programs,
      interpret the heap bitmap, which we know must be available and
      up to date. (There is no point in write barriers when writing outside
      the heap.)
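A minimal sketch of the idea, assuming a simplified model in which memory is a slice of words and the heap bitmap is one bool per word: copy every word, and invoke the write barrier only for words the bitmap marks as pointers. typedMemmoveSketch and its barrier callback are hypothetical; the runtime's typedmemmove works on raw memory with its real write barrier.

```go
package main

import "fmt"

// typedMemmoveSketch copies src into dst one word at a time. isPtr is a
// per-word pointer bitmap standing in for the heap bitmap; barrier stands in
// for the write barrier and is called only for words marked as pointers.
// All names here are hypothetical.
func typedMemmoveSketch(dst, src []uintptr, isPtr []bool, barrier func(word int, val uintptr)) {
	for i := range src {
		if isPtr[i] {
			barrier(i, src[i])
		}
		dst[i] = src[i]
	}
}

func main() {
	src := []uintptr{0x1000, 42, 0x2000}
	dst := make([]uintptr, len(src))
	var barriered []int
	typedMemmoveSketch(dst, src, []bool{true, false, true}, func(w int, _ uintptr) {
		barriered = append(barriered, w)
	})
	fmt.Println(dst, barriered) // [4096 42 8192] [0 2]
}
```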
      
      This CL is only about correctness. The next CL will optimize the code.
      
      Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793
      Reviewed-on: https://go-review.googlesource.com/9886
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: zero entire bitmap for object, even past dead marker · 266a842f
      Russ Cox authored
      We want typedmemmove to use the heap bitmap to determine
      where pointers are, instead of reinterpreting the type information.
      The heap bitmap is simpler to access.
      
      In general, typedmemmove will need to be able to look up the bits
      for any word and find valid pointer information, so fill even after the
      dead marker. Not filling after the dead marker was an optimization
      I introduced only a few days ago, when reintroducing the dead marker
      code. At the time I said it probably wouldn't last, and it didn't.
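To see why the fill must continue past the dead marker, here is a sketch that models the bitmap as one bool per word (a hypothetical simplification of the packed runtime bitmap). The GC stops at the dead marker, so stale bits beyond it do not affect the scan, but a reader such as typedmemmove that looks up an arbitrary word would see them.

```go
package main

import "fmt"

// setObjectBits records an object's pointer mask in bits, which may still
// hold stale entries from whatever object previously used that memory.
// With fillAll false it stops after the last pointer word, as the earlier
// "don't fill past the dead marker" optimization did (simplified here).
func setObjectBits(bits, ptrmask []bool, fillAll bool) {
	last := -1
	for i, p := range ptrmask {
		if p {
			last = i
		}
	}
	for i := range ptrmask {
		if !fillAll && i > last {
			break
		}
		bits[i] = ptrmask[i]
	}
}

func main() {
	mask := []bool{true, false, false, false} // only word 0 holds a pointer
	for _, fillAll := range []bool{false, true} {
		bits := []bool{true, true, true, true} // stale bits from a previous object
		setObjectBits(bits, mask, fillAll)
		// Word 3 wrongly reports a pointer when fillAll is false.
		fmt.Println(fillAll, bits[3])
	}
}
```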
      
      Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e
      Reviewed-on: https://go-review.googlesource.com/9885
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: reorder bits in heap bitmap bytes · e375ca2a
      Russ Cox authored
      The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps
      that have entries for both pointers and mark bits.
      
      Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits).
      Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...).
      This means that when converting from 1-bit to 2-bit, as we do
      during malloc, we have to pick up 4 bits in pppp form and use
      shifts to create the mpmpmpmp form.
      
      This CL changes the 2-bit heap bitmap form to mmmmpppp,
      so that 4 bits picked up in 1-bit form can be used directly in
      the low bits of the heap bitmap byte, without expansion.
      This simplifies the code, and it also happens to be faster.
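The conversion can be illustrated on a single byte. In this sketch the exact bit positions (word i's pointer bit at bit 2*i in the old layout, at bit i in the new one) are assumptions chosen for illustration; the point is that the new layout takes the 4 pointer bits as-is, while the old one must spread them out with shifts.

```go
package main

import "fmt"

// Both functions take the low 4 bits of a 1-bit pointer bitmap (word i at
// bit i) and return a 2-bit heap bitmap byte for those 4 words with all
// mark bits clear. Bit positions are illustrative assumptions.

// interleaved models the old mpmpmpmp layout: word i's pointer bit goes to
// bit 2*i (its mark bit would be 2*i+1), so each bit must be moved.
func interleaved(ptr4 uint8) uint8 {
	var out uint8
	for i := uint(0); i < 4; i++ {
		out |= (ptr4 >> i & 1) << (2 * i)
	}
	return out
}

// split models the new mmmmpppp layout: pointer bits stay in the low
// nibble unchanged and mark bits live in the high nibble.
func split(ptr4 uint8) uint8 {
	return ptr4 & 0x0f
}

func main() {
	fmt.Printf("%08b %08b\n", interleaved(0x0b), split(0x0b)) // 01000101 00001011
}
```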
      
      name                    old mean              new mean              delta
      SetTypePtr              14.0ns × (0.98,1.09)  14.0ns × (0.98,1.08)     ~    (p=0.966)
      SetTypePtr8             16.5ns × (0.99,1.05)  15.3ns × (0.96,1.16)   -6.86% (p=0.012)
      SetTypePtr16            21.3ns × (0.98,1.05)  18.8ns × (0.94,1.14)  -11.49% (p=0.000)
      SetTypePtr32            34.6ns × (0.93,1.22)  27.7ns × (0.91,1.26)  -20.08% (p=0.001)
      SetTypePtr64            55.7ns × (0.97,1.11)  41.6ns × (0.98,1.04)  -25.30% (p=0.000)
      SetTypePtr126           98.0ns × (1.00,1.00)  67.7ns × (0.99,1.05)  -30.88% (p=0.000)
      SetTypePtr128           98.6ns × (1.00,1.01)  68.6ns × (0.99,1.03)  -30.44% (p=0.000)
      SetTypePtrSlice          781ns × (0.99,1.01)   571ns × (0.99,1.04)  -26.93% (p=0.000)
      SetTypeNode1            13.1ns × (0.99,1.01)  12.1ns × (0.99,1.01)   -7.45% (p=0.000)
      SetTypeNode1Slice        113ns × (0.99,1.01)    94ns × (1.00,1.00)  -16.35% (p=0.000)
      SetTypeNode8            32.7ns × (1.00,1.00)  29.8ns × (0.99,1.01)   -8.97% (p=0.000)
      SetTypeNode8Slice        266ns × (1.00,1.00)   204ns × (1.00,1.00)  -23.40% (p=0.000)
      SetTypeNode64           58.0ns × (0.98,1.08)  42.8ns × (1.00,1.01)  -26.24% (p=0.000)
      SetTypeNode64Slice      1.55µs × (0.99,1.02)  0.96µs × (1.00,1.00)  -37.84% (p=0.000)
      SetTypeNode64Dead       13.1ns × (0.99,1.01)  12.1ns × (1.00,1.00)   -7.33% (p=0.000)
      SetTypeNode64DeadSlice  1.52µs × (1.00,1.01)  1.08µs × (1.00,1.01)  -28.95% (p=0.000)
      SetTypeNode124          97.9ns × (1.00,1.00)  67.1ns × (1.00,1.01)  -31.49% (p=0.000)
      SetTypeNode124Slice     2.87µs × (0.99,1.02)  1.75µs × (1.00,1.01)  -39.15% (p=0.000)
      SetTypeNode126          98.4ns × (1.00,1.01)  68.1ns × (1.00,1.01)  -30.79% (p=0.000)
      SetTypeNode126Slice     2.91µs × (0.99,1.01)  1.77µs × (0.99,1.01)  -39.09% (p=0.000)
      SetTypeNode1024          732ns × (1.00,1.00)   511ns × (0.87,1.42)  -30.14% (p=0.000)
      SetTypeNode1024Slice    23.1µs × (1.00,1.00)  13.9µs × (0.99,1.02)  -39.83% (p=0.000)
      
      Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819
      Reviewed-on: https://go-review.googlesource.com/9828
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: move a few atomic fields up · 363fd1dd
      Russ Cox authored
      Moving them up makes them properly aligned on 32-bit systems.
      There are some odd fields above them right now
      (like fixalloc and mutex maybe).
      
      Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5
      Reviewed-on: https://go-review.googlesource.com/9889
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/internal/gc: mark panicindex calls as not returning · fc595b78
      Russ Cox authored
      Most of the calls to panicindex are already
      marked as not returning, but these two were missed
      at some point.
      
      Performance changes below.
      
      name                   old mean              new mean              delta
      BinaryTree17            5.70s × (0.98,1.04)   5.68s × (0.97,1.04)    ~    (p=0.681)
      Fannkuch11              4.32s × (1.00,1.00)   4.41s × (0.98,1.03)  +1.98% (p=0.018)
      FmtFprintfEmpty        92.6ns × (0.91,1.11)  92.7ns × (0.91,1.16)    ~    (p=0.969)
      FmtFprintfString        280ns × (0.97,1.05)   281ns × (0.96,1.08)    ~    (p=0.860)
      FmtFprintfInt           284ns × (0.99,1.02)   288ns × (0.97,1.06)    ~    (p=0.207)
      FmtFprintfIntInt        488ns × (0.98,1.01)   493ns × (0.97,1.04)    ~    (p=0.271)
      FmtFprintfPrefixedInt   418ns × (0.98,1.04)   423ns × (0.97,1.04)    ~    (p=0.311)
      FmtFprintfFloat         597ns × (1.00,1.00)   598ns × (0.99,1.01)    ~    (p=0.789)
      FmtManyArgs            1.87µs × (0.99,1.01)  1.89µs × (0.98,1.05)    ~    (p=0.158)
      GobDecode              14.6ms × (0.99,1.01)  14.8ms × (0.98,1.03)  +1.51% (p=0.015)
      GobEncode              12.3ms × (0.98,1.03)  12.3ms × (0.98,1.01)    ~    (p=0.474)
      Gzip                    647ms × (1.00,1.01)   656ms × (0.99,1.05)    ~    (p=0.104)
      Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)    ~    (p=0.110)
      HTTPClientServer       89.6µs × (0.99,1.03)  91.2µs × (0.97,1.04)    ~    (p=0.061)
      JSONEncode             31.7ms × (0.99,1.01)  32.6ms × (0.97,1.08)  +2.87% (p=0.038)
      JSONDecode              111ms × (1.00,1.01)   114ms × (0.97,1.05)  +2.47% (p=0.040)
      Mandelbrot200          6.01ms × (1.00,1.00)  6.11ms × (0.98,1.04)    ~    (p=0.073)
      GoParse                6.54ms × (0.99,1.02)  6.66ms × (0.97,1.04)    ~    (p=0.064)
      RegexpMatchEasy0_32     159ns × (0.99,1.02)   159ns × (0.99,1.00)    ~    (p=0.693)
      RegexpMatchEasy0_1K     540ns × (0.99,1.03)   538ns × (1.00,1.01)    ~    (p=0.360)
      RegexpMatchEasy1_32     137ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.511)
      RegexpMatchEasy1_1K     867ns × (1.00,1.01)   869ns × (0.99,1.01)    ~    (p=0.193)
      RegexpMatchMedium_32    252ns × (1.00,1.00)   252ns × (0.99,1.01)    ~    (p=0.076)
      RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.963)
      RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.85µs × (1.00,1.00)    ~    (p=0.371)
      RegexpMatchHard_1K      117µs × (1.00,1.01)   118µs × (1.00,1.00)    ~    (p=0.898)
      Revcomp                 909ms × (0.98,1.03)   920ms × (0.97,1.07)    ~    (p=0.368)
      Template                128ms × (0.99,1.01)   129ms × (0.98,1.03)  +1.41% (p=0.042)
      TimeParse               619ns × (0.98,1.01)   619ns × (0.99,1.01)    ~    (p=0.730)
      TimeFormat              651ns × (1.00,1.01)   661ns × (0.98,1.04)    ~    (p=0.097)
      
      Change-Id: I0ec5baff41f5d282307137ce0d927e6301e4fa10
      Reviewed-on: https://go-review.googlesource.com/9811
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/internal/gc: drop unused Reslice field from Node · dcf6e206
      Russ Cox authored
      Dead code.
      
      This field is left over from Go 1.4, when we elided the fake write
      barrier in this case. Today, it's unused (always false).
      The upcoming append/slice changes handle this case again,
      but without needing this field.
      
      Change-Id: Ic6f160b64efdc1bbed02097ee03050f8cd0ab1b8
      Reviewed-on: https://go-review.googlesource.com/9789
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/internal/gc: show register dump before crashing on register left allocated · c70b4b5f
      Russ Cox authored
      If you are using -h to get a stack trace at the site of the failure,
      Yyerror will never return. Dump the register allocation sites
      before calling Yyerror.
      
      Change-Id: I51266c03e06cb5084c2eaa89b367b9ed85ba286a
      Reviewed-on: https://go-review.googlesource.com/9788
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Dave Cheney <dave@cheney.net>
    • runtime: fix TestLFStack on 386 · 8f037fa1
      Russ Cox authored
      The new(uint64) was moving to the stack, which may not be aligned.
      
      Change-Id: Iad070964202001b52029494d43e299fed980f939
      Reviewed-on: https://go-review.googlesource.com/9787
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/internal/gc: emit branches in -g mode · 351897d9
      Russ Cox authored
      The -g mode is a debugging mode that prints instructions
      as they are constructed. Gbranch was just missing the print.
      
      Change-Id: I3fb45fd9bd3996ed96df5be903b9fd6bd97148b0
      Reviewed-on: https://go-review.googlesource.com/9827
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: remove wbshadow mode · 1635ab7d
      Russ Cox authored
      The write barrier shadow heap was very useful for
      developing the write barriers initially, but it's no longer used,
      clunky, and dragging the rest of the implementation down.
      
      The gccheckmark mode will find bugs due to missed barriers
      when they result in missed marks; wbshadow mode found the
      missed barriers more aggressively, but it required an entire
      separate copy of the heap. The gccheckmark mode requires
      no extra memory, making it more useful in practice.
      
      Compared to previous CL:
      name                   old mean              new mean              delta
      BinaryTree17            5.91s × (0.96,1.06)   5.72s × (0.97,1.03)  -3.12% (p=0.000)
      Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.91% (p=0.000)
      FmtFprintfEmpty        89.0ns × (0.93,1.10)  86.6ns × (0.96,1.11)    ~    (p=0.077)
      FmtFprintfString        298ns × (0.98,1.06)   283ns × (0.99,1.04)  -4.90% (p=0.000)
      FmtFprintfInt           286ns × (0.98,1.03)   283ns × (0.98,1.04)  -1.09% (p=0.032)
      FmtFprintfIntInt        498ns × (0.97,1.06)   480ns × (0.99,1.02)  -3.65% (p=0.000)
      FmtFprintfPrefixedInt   408ns × (0.98,1.02)   396ns × (0.99,1.01)  -3.00% (p=0.000)
      FmtFprintfFloat         587ns × (0.98,1.01)   562ns × (0.99,1.01)  -4.34% (p=0.000)
      FmtManyArgs            1.94µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -2.85% (p=0.000)
      GobDecode              15.8ms × (0.98,1.03)  15.7ms × (0.99,1.02)    ~    (p=0.251)
      GobEncode              12.0ms × (0.96,1.09)  11.8ms × (0.98,1.03)  -1.87% (p=0.024)
      Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.688)
      Gunzip                  143ms × (1.00,1.01)   143ms × (1.00,1.01)    ~    (p=0.203)
      HTTPClientServer       90.3µs × (0.98,1.01)  89.1µs × (0.99,1.02)  -1.30% (p=0.000)
      JSONEncode             31.6ms × (0.99,1.01)  31.7ms × (0.98,1.02)    ~    (p=0.219)
      JSONDecode              107ms × (1.00,1.01)   111ms × (0.99,1.01)  +3.58% (p=0.000)
      Mandelbrot200          6.03ms × (1.00,1.01)  6.01ms × (1.00,1.00)    ~    (p=0.077)
      GoParse                6.53ms × (0.99,1.03)  6.54ms × (0.99,1.02)    ~    (p=0.585)
      RegexpMatchEasy0_32     161ns × (1.00,1.01)   161ns × (0.98,1.05)    ~    (p=0.948)
      RegexpMatchEasy0_1K     541ns × (0.99,1.01)   559ns × (0.98,1.01)  +3.32% (p=0.000)
      RegexpMatchEasy1_32     138ns × (1.00,1.00)   137ns × (0.99,1.01)  -0.55% (p=0.001)
      RegexpMatchEasy1_1K     887ns × (0.99,1.01)   878ns × (0.99,1.01)  -0.98% (p=0.000)
      RegexpMatchMedium_32    253ns × (0.99,1.01)   252ns × (0.99,1.01)  -0.39% (p=0.001)
      RegexpMatchMedium_1K   72.8µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.485)
      RegexpMatchHard_32     3.85µs × (1.00,1.01)  3.85µs × (1.00,1.01)    ~    (p=0.283)
      RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)    ~    (p=0.175)
      Revcomp                 922ms × (0.97,1.08)   903ms × (0.98,1.05)  -2.15% (p=0.021)
      Template                126ms × (0.99,1.01)   126ms × (0.99,1.01)    ~    (p=0.943)
      TimeParse               628ns × (0.99,1.01)   634ns × (0.99,1.01)  +0.92% (p=0.000)
      TimeFormat              668ns × (0.99,1.01)   698ns × (0.98,1.03)  +4.53% (p=0.000)
      
      It's nice that the microbenchmarks are the ones helped the most,
      because those were the ones hurt the most by the conversion from
      4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
      process to (compared to CL 9701 patch set 1):
      
      name                   old mean              new mean              delta
      BinaryTree17            5.87s × (0.94,1.09)   5.72s × (0.97,1.03)  -2.57% (p=0.011)
      Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.87% (p=0.000)
      FmtFprintfEmpty        89.1ns × (0.95,1.16)  86.6ns × (0.96,1.11)    ~    (p=0.090)
      FmtFprintfString        283ns × (0.98,1.02)   283ns × (0.99,1.04)    ~    (p=0.681)
      FmtFprintfInt           284ns × (0.98,1.04)   283ns × (0.98,1.04)    ~    (p=0.620)
      FmtFprintfIntInt        486ns × (0.98,1.03)   480ns × (0.99,1.02)  -1.27% (p=0.002)
      FmtFprintfPrefixedInt   400ns × (0.99,1.02)   396ns × (0.99,1.01)  -0.84% (p=0.001)
      FmtFprintfFloat         566ns × (0.99,1.01)   562ns × (0.99,1.01)  -0.80% (p=0.000)
      FmtManyArgs            1.91µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -1.10% (p=0.000)
      GobDecode              15.5ms × (0.98,1.05)  15.7ms × (0.99,1.02)  +1.55% (p=0.005)
      GobEncode              11.9ms × (0.97,1.03)  11.8ms × (0.98,1.03)  -0.97% (p=0.048)
      Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.627)
      Gunzip                  143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.482)
      HTTPClientServer       89.2µs × (0.99,1.02)  89.1µs × (0.99,1.02)    ~    (p=0.740)
      JSONEncode             32.3ms × (0.97,1.06)  31.7ms × (0.98,1.02)  -1.95% (p=0.002)
      JSONDecode              106ms × (0.99,1.01)   111ms × (0.99,1.01)  +4.22% (p=0.000)
      Mandelbrot200          6.02ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.417)
      GoParse                6.57ms × (0.97,1.06)  6.54ms × (0.99,1.02)    ~    (p=0.404)
      RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (0.98,1.05)    ~    (p=0.088)
      RegexpMatchEasy0_1K     561ns × (0.99,1.02)   559ns × (0.98,1.01)  -0.47% (p=0.034)
      RegexpMatchEasy1_32     145ns × (0.95,1.04)   137ns × (0.99,1.01)  -5.56% (p=0.000)
      RegexpMatchEasy1_1K     864ns × (0.99,1.04)   878ns × (0.99,1.01)  +1.57% (p=0.000)
      RegexpMatchMedium_32    255ns × (0.99,1.04)   252ns × (0.99,1.01)  -1.43% (p=0.001)
      RegexpMatchMedium_1K   73.9µs × (0.98,1.04)  72.7µs × (1.00,1.00)  -1.55% (p=0.004)
      RegexpMatchHard_32     3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.80% (p=0.003)
      RegexpMatchHard_1K      120µs × (0.98,1.04)   117µs × (1.00,1.00)  -2.13% (p=0.001)
      Revcomp                 936ms × (0.95,1.08)   903ms × (0.98,1.05)  -3.58% (p=0.002)
      Template                130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.98% (p=0.000)
      TimeParse               638ns × (0.98,1.05)   634ns × (0.99,1.01)    ~    (p=0.198)
      TimeFormat              674ns × (0.99,1.01)   698ns × (0.98,1.03)  +3.69% (p=0.000)
      
      Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c
      Reviewed-on: https://go-review.googlesource.com/9706
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: reintroduce ``dead'' space during GC scan · 54af9a3b
      Russ Cox authored
      Reintroduce an optimization discarded during the initial conversion
      from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
      place in the bitmap where there are no more pointers, mark that position
      for the GC so that it can avoid scanning past that place.
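A simplified sketch of the early stop, assuming a hypothetical representation with one bool per word plus the index from which the object holds no more pointers (the runtime encodes this in the heap bitmap bits themselves):

```go
package main

import "fmt"

// scanObject reports the pointer words of an object and how many words it
// examined. deadAt is the index from which the object holds no more
// pointers (len(ptr) if there is no marker); the scan stops there instead
// of reading the rest of the object's bitmap.
func scanObject(ptr []bool, deadAt int) (pointers []int, examined int) {
	for i := 0; i < len(ptr) && i < deadAt; i++ {
		examined++
		if ptr[i] {
			pointers = append(pointers, i)
		}
	}
	return pointers, examined
}

func main() {
	// A 64-word object whose only pointers are in its first two words.
	ptr := make([]bool, 64)
	ptr[0], ptr[1] = true, true
	fmt.Println(scanObject(ptr, 2))        // [0 1] 2
	fmt.Println(scanObject(ptr, len(ptr))) // [0 1] 64
}
```

Both calls find the same pointers; the marker only saves the work of examining the pointer-free tail.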
      
      During heapBitsSetType we can also avoid initializing heap bitmap
      beyond that location, which gives a bit of a win compared to Go 1.4.
      This particular optimization (not initializing the heap bitmap) may not last:
      we might change typedmemmove to use the heap bitmap, in which
      case it would all need to be initialized. The early stop in the GC scan
      will stay no matter what.
      
      Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
      name                    old mean              new mean              delta
      SetTypeNode64           80.7ns × (1.00,1.01)  57.4ns × (1.00,1.01)  -28.83% (p=0.000)
      SetTypeNode64Dead       80.5ns × (1.00,1.01)  13.1ns × (0.99,1.02)  -83.77% (p=0.000)
      SetTypeNode64Slice      2.16µs × (1.00,1.01)  1.54µs × (1.00,1.01)  -28.75% (p=0.000)
      SetTypeNode64DeadSlice  2.16µs × (1.00,1.01)  1.52µs × (1.00,1.00)  -29.74% (p=0.000)
      
      Compared to previous CL:
      name                    old mean              new mean              delta
      SetTypeNode64           56.7ns × (1.00,1.00)  57.4ns × (1.00,1.01)   +1.19% (p=0.000)
      SetTypeNode64Dead       57.2ns × (1.00,1.00)  13.1ns × (0.99,1.02)  -77.15% (p=0.000)
      SetTypeNode64Slice      1.56µs × (1.00,1.01)  1.54µs × (1.00,1.01)   -0.89% (p=0.000)
      SetTypeNode64DeadSlice  1.55µs × (1.00,1.01)  1.52µs × (1.00,1.00)   -2.23% (p=0.000)
      
      This is the last CL in the sequence converting from the 4-bit heap
      to the 2-bit heap, with all the same optimizations reenabled.
      Compared to before that process began (compared to CL 9701 patch set 1):
      
      name                    old mean              new mean              delta
      BinaryTree17             5.87s × (0.94,1.09)   5.91s × (0.96,1.06)    ~    (p=0.578)
      Fannkuch11               4.32s × (1.00,1.00)   4.32s × (1.00,1.00)    ~    (p=0.474)
      FmtFprintfEmpty         89.1ns × (0.95,1.16)  89.0ns × (0.93,1.10)    ~    (p=0.942)
      FmtFprintfString         283ns × (0.98,1.02)   298ns × (0.98,1.06)  +5.33% (p=0.000)
      FmtFprintfInt            284ns × (0.98,1.04)   286ns × (0.98,1.03)    ~    (p=0.208)
      FmtFprintfIntInt         486ns × (0.98,1.03)   498ns × (0.97,1.06)  +2.48% (p=0.000)
      FmtFprintfPrefixedInt    400ns × (0.99,1.02)   408ns × (0.98,1.02)  +2.23% (p=0.000)
      FmtFprintfFloat          566ns × (0.99,1.01)   587ns × (0.98,1.01)  +3.69% (p=0.000)
      FmtManyArgs             1.91µs × (0.99,1.02)  1.94µs × (0.99,1.02)  +1.81% (p=0.000)
      GobDecode               15.5ms × (0.98,1.05)  15.8ms × (0.98,1.03)  +1.94% (p=0.002)
      GobEncode               11.9ms × (0.97,1.03)  12.0ms × (0.96,1.09)    ~    (p=0.263)
      Gzip                     648ms × (0.99,1.01)   648ms × (0.99,1.01)    ~    (p=0.992)
      Gunzip                   143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.585)
      HTTPClientServer        89.2µs × (0.99,1.02)  90.3µs × (0.98,1.01)  +1.24% (p=0.000)
      JSONEncode              32.3ms × (0.97,1.06)  31.6ms × (0.99,1.01)  -2.29% (p=0.000)
      JSONDecode               106ms × (0.99,1.01)   107ms × (1.00,1.01)  +0.62% (p=0.000)
      Mandelbrot200           6.02ms × (1.00,1.00)  6.03ms × (1.00,1.01)    ~    (p=0.250)
      GoParse                 6.57ms × (0.97,1.06)  6.53ms × (0.99,1.03)    ~    (p=0.243)
      RegexpMatchEasy0_32      162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.80% (p=0.000)
      RegexpMatchEasy0_1K      561ns × (0.99,1.02)   541ns × (0.99,1.01)  -3.67% (p=0.000)
      RegexpMatchEasy1_32      145ns × (0.95,1.04)   138ns × (1.00,1.00)  -5.04% (p=0.000)
      RegexpMatchEasy1_1K      864ns × (0.99,1.04)   887ns × (0.99,1.01)  +2.57% (p=0.000)
      RegexpMatchMedium_32     255ns × (0.99,1.04)   253ns × (0.99,1.01)  -1.05% (p=0.012)
      RegexpMatchMedium_1K    73.9µs × (0.98,1.04)  72.8µs × (1.00,1.00)  -1.51% (p=0.005)
      RegexpMatchHard_32      3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.88% (p=0.002)
      RegexpMatchHard_1K       120µs × (0.98,1.04)   117µs × (1.00,1.01)  -2.02% (p=0.001)
      Revcomp                  936ms × (0.95,1.08)   922ms × (0.97,1.08)    ~    (p=0.234)
      Template                 130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.99% (p=0.000)
      TimeParse                638ns × (0.98,1.05)   628ns × (0.99,1.01)  -1.54% (p=0.004)
      TimeFormat               674ns × (0.99,1.01)   668ns × (0.99,1.01)  -0.80% (p=0.001)
      
      The slowdown of the first few benchmarks seems to be due to the new
      atomic operations for certain small size allocations. But the larger
      benchmarks mostly improve, probably due to the decreased memory
      pressure from having half as much heap bitmap.
      
      CL 9706, which removes the no-longer-used wbshadow mode,
      gets back what this CL loses in the early microbenchmarks.
      
      Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
      Reviewed-on: https://go-review.googlesource.com/9705
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: optimize heapBitsSetType · feb8a3b6
      Russ Cox authored
      For the conversion of the heap bitmap from 4-bit to 2-bit fields,
      I replaced heapBitsSetType with the dumbest thing that could possibly work:
      two atomic operations (atomicand8+atomicor8) per 2-bit field.
      
      This CL replaces that code with a proper implementation that
      avoids the atomics whenever possible. Benchmarks vs base CL
      (before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.
      
      Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
      is 10ns slower because a race against the concurrent GC requires the
      use of an atomicor8 that used to be an ordinary write. This slowdown
      was present even in the base CL.
      
      Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
      is 10ns slower because it too needs a new atomic: with the
      denser representation, the byte at the end of the allocation is now shared
      with the object next to it; this was not true with the 4-bit representation.
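The arithmetic behind the shared byte can be checked directly. This sketch assumes the object starts on a bitmap byte boundary, and bitmapBytes is a hypothetical helper. A 10-word object fills 5 whole bytes at 4 bits per word (2 words per byte), but at 2 bits per word (4 words per byte) it covers 2.5 bytes, leaving a final byte whose other half describes the neighboring object.

```go
package main

import "fmt"

// bitmapBytes reports how many bitmap bytes an object of nwords words fills
// completely and whether a partially used trailing byte remains, assuming
// the object starts on a bitmap byte boundary. A partially used byte also
// holds bits for the next object, which other code (for instance the
// concurrent GC) may be updating, so writing it needs an atomic operation.
func bitmapBytes(nwords, bitsPerWord int) (full int, sharedTail bool) {
	wordsPerByte := 8 / bitsPerWord
	return nwords / wordsPerByte, nwords%wordsPerByte != 0
}

func main() {
	fmt.Println(bitmapBytes(10, 4)) // 5 false
	fmt.Println(bitmapBytes(10, 2)) // 2 true
}
```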
      
      Excluding these two (fundamental) slowdowns due to the use of atomics,
      the new code is noticeably faster than both Go 1.4 and the base CL.
      
      The next CL will reintroduce the ``typeDead'' optimization.
      
      Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).
      
      Compared to base CL (** = new atomic)
      name                  old mean              new mean              delta
      SetTypePtr            14.1ns × (0.99,1.02)  14.7ns × (0.93,1.10)     ~    (p=0.175)
      SetTypePtr8           18.4ns × (1.00,1.01)  18.6ns × (0.81,1.21)     ~    (p=0.866)
      SetTypePtr16          28.7ns × (1.00,1.00)  22.4ns × (0.90,1.27)  -21.88% (p=0.015)
      SetTypePtr32          52.3ns × (1.00,1.00)  33.8ns × (0.93,1.24)  -35.37% (p=0.001)
      SetTypePtr64          79.2ns × (1.00,1.00)  55.1ns × (1.00,1.01)  -30.43% (p=0.000)
      SetTypePtr126          118ns × (1.00,1.00)   100ns × (1.00,1.00)  -15.97% (p=0.000)
      SetTypePtr128          130ns × (0.92,1.19)    98ns × (1.00,1.00)  -24.36% (p=0.008)
      SetTypePtrSlice        726ns × (0.96,1.08)   760ns × (1.00,1.00)     ~    (p=0.152)
      SetTypeNode1          14.1ns × (0.94,1.15)  12.0ns × (1.00,1.01)  -14.60% (p=0.020)
      SetTypeNode1Slice      135ns × (0.96,1.07)    88ns × (1.00,1.00)  -34.53% (p=0.000)
      SetTypeNode8          20.9ns × (1.00,1.01)  32.6ns × (1.00,1.00)  +55.37% (p=0.000) **
      SetTypeNode8Slice      414ns × (0.99,1.02)   244ns × (1.00,1.00)  -41.09% (p=0.000)
      SetTypeNode64         80.0ns × (1.00,1.00)  57.4ns × (1.00,1.00)  -28.23% (p=0.000)
      SetTypeNode64Slice    2.15µs × (1.00,1.01)  1.56µs × (1.00,1.00)  -27.43% (p=0.000)
      SetTypeNode124         119ns × (0.99,1.00)   100ns × (1.00,1.00)  -16.11% (p=0.000)
      SetTypeNode124Slice   3.40µs × (1.00,1.00)  2.93µs × (1.00,1.00)  -13.80% (p=0.000)
      SetTypeNode126         120ns × (1.00,1.01)    98ns × (1.00,1.00)  -18.19% (p=0.000)
      SetTypeNode126Slice   3.53µs × (0.98,1.08)  3.02µs × (1.00,1.00)  -14.49% (p=0.002)
      SetTypeNode1024        726ns × (0.97,1.09)   740ns × (1.00,1.00)     ~    (p=0.451)
      SetTypeNode1024Slice  24.9µs × (0.89,1.37)  23.1µs × (1.00,1.00)     ~    (p=0.476)
      
      Compared to Go 1.4 (** = new atomic)
      name                  old mean               new mean              delta
      SetTypePtr            5.71ns × (0.89,1.19)  14.68ns × (0.93,1.10)  +157.24% (p=0.000) **
      SetTypePtr8           19.3ns × (0.96,1.10)   18.6ns × (0.81,1.21)      ~    (p=0.638)
      SetTypePtr16          30.7ns × (0.99,1.03)   22.4ns × (0.90,1.27)   -26.88% (p=0.005)
      SetTypePtr32          51.5ns × (1.00,1.00)   33.8ns × (0.93,1.24)   -34.40% (p=0.001)
      SetTypePtr64          83.6ns × (0.94,1.12)   55.1ns × (1.00,1.01)   -34.12% (p=0.001)
      SetTypePtr126          137ns × (0.87,1.26)    100ns × (1.00,1.00)   -27.10% (p=0.028)
      SetTypePtrSlice        865ns × (0.80,1.23)    760ns × (1.00,1.00)      ~    (p=0.243)
      SetTypeNode1          15.2ns × (0.88,1.12)   12.0ns × (1.00,1.01)   -20.89% (p=0.014)
      SetTypeNode1Slice      156ns × (0.93,1.16)     88ns × (1.00,1.00)   -43.57% (p=0.001)
      SetTypeNode8          23.8ns × (0.90,1.18)   32.6ns × (1.00,1.00)   +36.76% (p=0.003) **
      SetTypeNode8Slice      502ns × (0.92,1.10)    244ns × (1.00,1.00)   -51.46% (p=0.000)
      SetTypeNode64         85.6ns × (0.94,1.11)   57.4ns × (1.00,1.00)   -32.89% (p=0.001)
      SetTypeNode64Slice    2.36µs × (0.91,1.14)   1.56µs × (1.00,1.00)   -33.96% (p=0.002)
      SetTypeNode124         130ns × (0.91,1.12)    100ns × (1.00,1.00)   -23.49% (p=0.004)
      SetTypeNode124Slice   3.81µs × (0.90,1.22)   2.93µs × (1.00,1.00)   -23.09% (p=0.025)
      
      There are fewer benchmarks vs Go 1.4 because unrolling directly
      into the heap bitmap is not yet implemented, so those would not
      be meaningful comparisons.
      
      These benchmarks were not present in Go 1.4 as distributed.
      The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
      commit 71d5ee5.
      
      Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
      Reviewed-on: https://go-review.googlesource.com/9704
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: use 2-bit heap bitmap (in place of 4-bit) · 0234dfd4
      Russ Cox authored
      Previous CLs changed the representation of the non-heap type bitmaps
      to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
      stored a 2-bit type for each word and a mark bit and checkmark bit
      for the first word of the object. (There used to be additional per-word bits.)
      
      Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
      and the other used for mark, checkmark, and "keep scanning forward
      to find pointers in this object." See comments for details.
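Halving the bits per word halves the bitmap. A quick calculation, assuming a 64-bit system with 8-byte words: a 4-bit bitmap costs 1/16 of the heap and a 2-bit bitmap 1/32. bitmapSize is a hypothetical helper for this arithmetic only.

```go
package main

import "fmt"

// bitmapSize returns the heap bitmap size in bytes for a heap of heapBytes
// bytes, given the word size in bytes and the bitmap bits per word.
func bitmapSize(heapBytes, wordSize, bitsPerWord uint64) uint64 {
	words := heapBytes / wordSize
	return words * bitsPerWord / 8
}

func main() {
	const gib = 1 << 30
	fmt.Println(bitmapSize(gib, 8, 4)>>20, "MiB with 4-bit entries") // 64
	fmt.Println(bitmapSize(gib, 8, 2)>>20, "MiB with 2-bit entries") // 32
}
```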
      
      This CL replaces heapBitsSetType with very slow but obviously correct code.
      A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)
      
      Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
      Reviewed-on: https://go-review.googlesource.com/9703
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: use 1-bit pointer bitmaps in type representation · 6d8a147b
      Russ Cox authored
      The type information in reflect.Type and the GC programs is now
      1 bit per word, down from 2 bits.
      
      The in-memory unrolled type bitmap representation is now
      1 bit per word, down from 4 bits.
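A sketch of a 1-bit pointer bitmap: one bit per word, set when that word holds a pointer. The LSB-first bit order and the helper names packPtrmask and ptrBit are assumptions for illustration, and the example layout assumes 8-byte words.

```go
package main

import "fmt"

// packPtrmask packs one bit per word (1 = pointer) into bytes, with word i
// at bit i%8 of byte i/8.
func packPtrmask(isPtr []bool) []byte {
	mask := make([]byte, (len(isPtr)+7)/8)
	for i, p := range isPtr {
		if p {
			mask[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return mask
}

// ptrBit reports whether word i is marked as a pointer.
func ptrBit(mask []byte, i int) bool {
	return mask[i/8]>>(uint(i)%8)&1 != 0
}

func main() {
	// struct { p *int; n int; s []byte }: the slice header is three words
	// (data pointer, len, cap), so only words 0 and 2 hold pointers.
	mask := packPtrmask([]bool{true, false, true, false, false})
	fmt.Printf("%08b %v\n", mask[0], ptrBit(mask, 2)) // 00000101 true
}
```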
      
      The conversion from the unrolled (now 1-bit) bitmap to the
      heap bitmap (still 4-bit) is not optimized. A followup CL will
      work on that, after the heap bitmap has been converted to 2-bit.
      
      The typeDead optimization, in which a special value denotes
      that there are no more pointers anywhere in the object, is lost
      in this CL. A followup CL will bring it back in the final form of
      heapBitsSetType.
      
      Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
      Reviewed-on: https://go-review.googlesource.com/9702
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: add benchmark of heapBitsSetType · 7d9e16ab
      Russ Cox authored
      There was an old benchmark that measured this indirectly
      via allocation, but I don't understand how to factor out the
      allocation cost when interpreting the numbers.
      
      Replace it with a benchmark that only calls heapBitsSetType
      and does not allocate. This was not possible when the
      benchmark was first written, because heapBitsSetType had
      not been factored out of mallocgc.
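The approach can be sketched with the standard testing package: do all allocation before the timed loop so the measurement covers only the bitmap write. setBits is a hypothetical stand-in for heapBitsSetType, which is internal to the runtime; testing.AllocsPerRun is used to confirm that the measured call does not allocate.

```go
package main

import (
	"fmt"
	"testing"
)

// setBits is a hypothetical stand-in for a bitmap writer such as
// heapBitsSetType: it writes into a preallocated buffer and allocates nothing.
func setBits(dst, ptrmask []byte) {
	copy(dst, ptrmask)
}

// BenchmarkSetBits shows the benchmark shape: setup happens before
// ResetTimer, so the b.N loop times only setBits. It would normally live
// in a _test.go file and run under `go test -bench`.
func BenchmarkSetBits(b *testing.B) {
	dst := make([]byte, 16)
	mask := make([]byte, 16)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		setBits(dst, mask)
	}
}

// allocsPerCall measures allocations of the bitmap-writing step alone; the
// buffers are allocated once, outside the measured function.
func allocsPerCall() float64 {
	dst := make([]byte, 16)
	mask := make([]byte, 16)
	return testing.AllocsPerRun(1000, func() { setBits(dst, mask) })
}

func main() {
	fmt.Println(allocsPerCall())
}
```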
      
      Change-Id: I30f0f02362efab3465a50769398be859832e6640
      Reviewed-on: https://go-review.googlesource.com/9701
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: enable profiling on g0 · db6f88a8
      Daniel Morsing authored
      Since we now have stack information for code running on the
      systemstack, we can trace back over it. To make CPU profiles useful,
      add a case in gentraceback to jump over systemstack switches.
      
      Fixes #10609.
      
      Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
      Reviewed-on: https://go-review.googlesource.com/9506
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    • internal/syscall/windows/registry: handle invalid integer values · 19e81a9b
      Patrick Mezard authored
      I have around twenty such values on a Windows 7 development machine.
      regedit displays (translated): "invalid 32-bits DWORD value".
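A platform-independent sketch of the kind of check involved, assuming the raw bytes of a value have already been read: REG_DWORD data should be exactly 4 little-endian bytes, so any other length is reported as an error instead of being decoded. decodeDWORD is hypothetical and is not the registry package's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeDWORD interprets raw registry data as a REG_DWORD. Data of any
// length other than 4 is rejected rather than misread (or, for short data,
// indexed out of range by Uint32).
func decodeDWORD(data []byte) (uint32, error) {
	if len(data) != 4 {
		return 0, errors.New("DWORD value is not 4 bytes long")
	}
	return binary.LittleEndian.Uint32(data), nil
}

func main() {
	fmt.Println(decodeDWORD([]byte{0x2a, 0, 0, 0})) // 42 <nil>
	fmt.Println(decodeDWORD([]byte{0x2a, 0}))       // 0 DWORD value is not 4 bytes long
}
```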
      
      Change-Id: Ib37a414ee4c85e891b0a25fed2ddad9e105f5f4e
      Reviewed-on: https://go-review.googlesource.com/9901
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
    • misc/trace: add license for the trace-viewer · dce432b3
      Shenghou Ma authored
      The trace-viewer doesn't use the Go license, so it makes sense
      to include the license text in the README.md file.
      
      While we're here, reformat the existing text using real Markdown
      syntax.
      
      Change-Id: I13e42d3cc6a0ca7e64e3d46ad460dc0460f7ed09
      Reviewed-on: https://go-review.googlesource.com/9882
      Reviewed-by: Rob Pike <r@golang.org>
    • net: increase timeout in TestWriteTimeoutFluctuation on darwin/arm · cbcc7584
      Mikio Hara authored
      On darwin/arm, the test sometimes fails with:
      
      Process 557 resuming
      --- FAIL: TestWriteTimeoutFluctuation (1.64s)
      	timeout_test.go:706: Write took over 1s; expected 0.1s
      FAIL
      Process 557 exited with status = 1 (0x00000001)
      go_darwin_arm_exec: timeout running tests
      
      This change increases the timeout on iOS builders from 1s to 3s as a
      temporary fix.
      
      Updates #10775.
      
      Change-Id: Ifdaf99cf5b8582c1a636a0f7d5cc66bb276efd72
      Reviewed-on: https://go-review.googlesource.com/9915
      Reviewed-by: Minux Ma <minux@golang.org>
  2. 10 May, 2015 2 commits
  3. 09 May, 2015 3 commits
  4. 08 May, 2015 6 commits
  5. 07 May, 2015 9 commits