1. 24 Apr, 2015 20 commits
    • runtime: replace per-M workbuf cache with per-P gcWork cache · 1b4025f4
      Austin Clements authored
      Currently, each M has a cache of the most recently used *workbuf. This
      is used primarily by the write barrier so it doesn't have to access
      the global workbuf lists on every write barrier. It's also used by
      stack scanning because it's convenient.
      
      This cache is important for write barrier performance, but this
      particular approach has several downsides. It's faster than no cache,
      but far from optimal (as the benchmarks below show). It's complex:
      access to the cache is sprinkled through most of the workbuf list
      operations and it requires special care to transform into and back out
      of the gcWork cache that's actually used for scanning and marking. It
      requires atomic exchanges to take ownership of the cached workbuf and
      to return it to the M's cache even though it's almost always used by
      only the current M. Since it's per-M, flushing these caches is O(# of
      Ms), which may be high. And it has some significant subtleties: for
      example, in general the cache shouldn't be used after the
      harvestwbufs() in mark termination because it could hide work from
      mark termination, but stack scanning can happen after this and *will*
      use the cache (but it turns out this is okay because it will always be
      followed by a getfull(), which drains the cache).
      
      This change replaces this cache with a per-P gcWork object. This
      gcWork cache can be used directly by scanning and marking (as long as
      preemption is disabled, which is a general requirement of gcWork).
      Since it's per-P, it doesn't require synchronization, which simplifies
      things and means the only atomic operations in the write barrier are
      occasionally fetching new work buffers and setting a mark bit if the
      object isn't already marked. This cache can be flushed in O(# of Ps),
      which is generally small. It follows a simple flushing rule: the cache
      can be used during any phase, but during mark termination it must be
      flushed before allowing preemption. This also makes the dispose during
      mutator assist no longer necessary, which eliminates the vast majority
      of gcWork dispose calls and reduces contention on the global workbuf
      lists. And it's a lot faster on some benchmarks:
      
      benchmark                          old ns/op       new ns/op       delta
      BenchmarkBinaryTree17              11963668673     11206112763     -6.33%
      BenchmarkFannkuch11                2643217136      2649182499      +0.23%
      BenchmarkFmtFprintfEmpty           70.4            70.2            -0.28%
      BenchmarkFmtFprintfString          364             307             -15.66%
      BenchmarkFmtFprintfInt             317             282             -11.04%
      BenchmarkFmtFprintfIntInt          512             483             -5.66%
      BenchmarkFmtFprintfPrefixedInt     404             380             -5.94%
      BenchmarkFmtFprintfFloat           521             479             -8.06%
      BenchmarkFmtManyArgs               2164            1894            -12.48%
      BenchmarkGobDecode                 30366146        22429593        -26.14%
      BenchmarkGobEncode                 29867472        26663152        -10.73%
      BenchmarkGzip                      391236616       396779490       +1.42%
      BenchmarkGunzip                    96639491        96297024        -0.35%
      BenchmarkHTTPClientServer          100110          70763           -29.31%
      BenchmarkJSONEncode                51866051        52511382        +1.24%
      BenchmarkJSONDecode                103813138       86094963        -17.07%
      BenchmarkMandelbrot200             4121834         4120886         -0.02%
      BenchmarkGoParse                   16472789        5879949         -64.31%
      BenchmarkRegexpMatchEasy0_32       140             140             +0.00%
      BenchmarkRegexpMatchEasy0_1K       394             394             +0.00%
      BenchmarkRegexpMatchEasy1_32       120             120             +0.00%
      BenchmarkRegexpMatchEasy1_1K       621             614             -1.13%
      BenchmarkRegexpMatchMedium_32      209             202             -3.35%
      BenchmarkRegexpMatchMedium_1K      54889           55175           +0.52%
      BenchmarkRegexpMatchHard_32        2682            2675            -0.26%
      BenchmarkRegexpMatchHard_1K        79383           79524           +0.18%
      BenchmarkRevcomp                   584116718       584595320       +0.08%
      BenchmarkTemplate                  125400565       109620196       -12.58%
      BenchmarkTimeParse                 386             387             +0.26%
      BenchmarkTimeFormat                580             447             -22.93%
      
      (Best out of 10 runs. The delta of averages is similar.)
      
      This also puts us in a good position to flush these caches when
      nearing the end of concurrent marking, which will let us increase the
      size of the work buffers while still controlling mark termination
      pause time.
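The caching idea in this message can be illustrated outside the runtime. The following is only a minimal sketch, assuming a mutex-protected slice as a stand-in for the global workbuf lists and a small unsynchronized buffer as a stand-in for a per-P gcWork. The names globalList, localCache, and run are hypothetical and do not appear in the runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// globalList stands in for the shared work lists: every access takes a lock.
type globalList struct {
	mu    sync.Mutex
	items []int
}

// putBatch copies a whole batch into the shared list under one lock.
func (g *globalList) putBatch(batch []int) {
	g.mu.Lock()
	g.items = append(g.items, batch...)
	g.mu.Unlock()
}

// localCache stands in for a per-worker cache. It has a single owner,
// so put needs no synchronization until the buffer fills up.
type localCache struct {
	buf    []int
	limit  int
	global *globalList
}

func (c *localCache) put(x int) {
	c.buf = append(c.buf, x)
	if len(c.buf) >= c.limit {
		c.flush()
	}
}

// flush hands everything cached so far to the shared list.
func (c *localCache) flush() {
	if len(c.buf) == 0 {
		return
	}
	c.global.putBatch(c.buf)
	c.buf = c.buf[:0]
}

// run has each worker queue perWorker items through its own cache and
// returns how many items ended up on the shared list.
func run(workers, perWorker, limit int) int {
	g := &globalList{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := &localCache{limit: limit, global: g}
			for i := 0; i < perWorker; i++ {
				c.put(i)
			}
			c.flush() // flush before giving up the cache
		}()
	}
	wg.Wait()
	return len(g.items)
}

func main() {
	fmt.Println(run(4, 1000, 64))
}
```

With a limit of 64, each worker in this sketch takes the shared lock once per 64 items rather than once per item, which is the kind of contention reduction the message describes.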
      
      Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
      Reviewed-on: https://go-review.googlesource.com/9178
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: fix check for pending GC work · d1cae635
      Austin Clements authored
      When findRunnable considers running a fractional mark worker, it first
      checks if there's any work to be done; if there isn't there's no point
      in running the worker because it will just reschedule immediately.
      However, currently findRunnable just checks work.full and
      work.partial, whereas getfull can *also* draw work from m.currentwbuf.
      As a result, findRunnable may not start a worker even though there
      actually is work.
      
      This problem manifests itself in occasional failures of the
      test/init1.go test. This test is unusual because it performs a large
      amount of allocation without executing any write barriers, which means
      there's nothing to force the pointers in currentwbuf out to the
      work.partial/full lists where findRunnable can see them.
      
      This change fixes this problem by making findRunnable also check for a
      currentwbuf. This aligns findRunnable with trygetfull's notion of
      whether or not there's work.
      
      Change-Id: Ic76d22b7b5d040bc4f58a6b5975e9217650e66c4
      Reviewed-on: https://go-review.googlesource.com/9299
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: start dedicated mark workers even if there's no work · 26eac917
      Austin Clements authored
      Currently, findRunnable only considers running a mark worker if
      there's work in the work queue. In principle, this can delay the start
      of the desired number of dedicated mark workers if there's no work
      pending. This is unlikely to occur in practice, since there should be
      work queued from the scan phase, but if it were to come up, a CPU hog
      mutator could slow down or delay garbage collection.
      
      This check makes sense for fractional mark workers, since they'll just
      return to the scheduler immediately if there's no work, but we want
      the scheduler to start all of the dedicated mark workers promptly,
      even if there's currently no queued work. Hence, this change moves the
      pending work check after the check for starting a dedicated worker.
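The reordering described above can be pictured as a small decision function. This is a sketch under stated assumptions, not the scheduler's code: shouldRunMarkWorker and its three boolean parameters are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// shouldRunMarkWorker sketches the order of checks after this change:
// a needed dedicated worker starts regardless of queued work, while a
// fractional worker only starts when there is work pending.
func shouldRunMarkWorker(needDedicated, wantFractional, workPending bool) bool {
	if needDedicated {
		return true // checked first, before looking at the work queue
	}
	return wantFractional && workPending
}

func main() {
	fmt.Println(shouldRunMarkWorker(true, false, false))
	fmt.Println(shouldRunMarkWorker(false, true, false))
	fmt.Println(shouldRunMarkWorker(false, true, true))
}
```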
      
      Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
      Reviewed-on: https://go-review.googlesource.com/9298
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: fix some out-of-date comments · 711a1642
      Austin Clements authored
      bgMarkCount no longer exists.
      
      Change-Id: I3aa406fdccfca659814da311229afbae55af8304
      Reviewed-on: https://go-review.googlesource.com/9297
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • misc/cgo/testcshared: make test.bash resilient against noise. · e9a89b80
      Hyang-Ah Hana Kim authored
      Instead of comparing against the entire output that may include
      verbose warning messages, use the last line of the output and check
      it includes the expected success message (PASS).
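The check itself lives in a shell script, but the idea of looking only at the last line can be sketched in Go. The helpers lastLine and passed below are hypothetical and are not part of test.bash.

```go
package main

import (
	"fmt"
	"strings"
)

// lastLine returns the final non-empty line of output, ignoring
// trailing newlines.
func lastLine(output string) string {
	trimmed := strings.TrimRight(output, "\r\n")
	if i := strings.LastIndex(trimmed, "\n"); i >= 0 {
		return trimmed[i+1:]
	}
	return trimmed
}

// passed reports whether the last line contains the success marker,
// so warnings printed earlier in the output do not matter.
func passed(output string) bool {
	return strings.Contains(lastLine(output), "PASS")
}

func main() {
	fmt.Println(passed("WARNING: something noisy\nPASS\n"))
}
```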
      
      Change-Id: Iafd583ee5529a8aef5439b9f1f6ce0185e4b1331
      Reviewed-on: https://go-review.googlesource.com/9304
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
    • cmd/go: rename doc.go to alldocs.go in preparation for "go doc" · b3000b6f
      Rob Pike authored
      Also rename and update mkdoc.sh to mkalldocs.sh
      
      Change-Id: Ief3673c22d45624e173fc65ee279cea324da03b5
      Reviewed-on: https://go-review.googlesource.com/9226
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: implement xadduintptr and update system mstats using it · 6ad33be2
      Srdjan Petrovic authored
      The motivation is that sysAlloc/Free() currently aren't safe to be
      called without a valid G, because arm's xadd64() uses locks that require
      a valid G.
      
      The solution here was proposed by Dmitry Vyukov: use xadduintptr()
      instead of xadd64(), until arm can support xadd64 on all of its
      architectures (not a trivial task for arm).
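The runtime's xadduintptr is internal, but the public sync/atomic package offers the analogous AddUintptr. The following sketch assumes that analogy only for illustration. addStat, subStat, and concurrentTotal are hypothetical helpers, not the functions touched by this change.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addStat atomically adds n to *stat and returns the new value.
func addStat(stat *uintptr, n uintptr) uintptr {
	return atomic.AddUintptr(stat, n)
}

// subStat atomically subtracts n from *stat. For an unsigned type,
// ^(n - 1) is the two's-complement negation of n.
func subStat(stat *uintptr, n uintptr) uintptr {
	return atomic.AddUintptr(stat, ^(n - 1))
}

// concurrentTotal has each of workers goroutines add n once and
// returns the final counter value.
func concurrentTotal(workers int, n uintptr) uintptr {
	var stat uintptr
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addStat(&stat, n)
		}()
	}
	wg.Wait()
	return atomic.LoadUintptr(&stat)
}

func main() {
	fmt.Println(concurrentTotal(8, 4096))
	var s uintptr = 100
	fmt.Println(subStat(&s, 40))
}
```

A pointer-sized counter uses the machine's native word width, which is why it sidesteps the 64-bit atomic problem on 32-bit arm that the message mentions.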
      
      Change-Id: I250252079357ea2e4360e1235958b1c22051498f
      Reviewed-on: https://go-review.googlesource.com/9002
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    • misc/cgo/testcshared: add a c-shared test for android/arm. · 85669799
      Hyang-Ah Hana Kim authored
      - main3.c tests that main.main is exported when compiled for GOOS=android.
      - wait longer for main2.c (it's slow on android/arm)
      - rearranged test.bash
      
      Fixes #10070.
      
      Change-Id: I6e5a98d1c5fae776afa54ecb5da633b59b269316
      Reviewed-on: https://go-review.googlesource.com/9296
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
    • cmd/internal/gc, cmd/internal/ld, cmd/internal/obj: teach compiler about local symbols · 029c7bbd
      Michael Hudson-Doyle authored
      This lets us avoid loading string constants via the GOT and (together with
      http://golang.org/cl/9102) results in the fannkuch benchmark having very similar
      register usage with -dynlink as without.
      
      Change-Id: Ic3892b399074982b76773c3e547cfbba5dabb6f9
      Reviewed-on: https://go-review.googlesource.com/9103
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
    • runtime: simplify process for starting GC goroutine · 0e6a6c51
      Austin Clements authored
      Currently, when allocation reaches the GC trigger, the runtime uses
      readyExecute to start the GC goroutine immediately rather than wait
      for the scheduler to get around to the GC goroutine while the mutator
      continues to grow the heap.
      
      Now that the scheduler runs the most recently readied goroutine when a
      goroutine yields its time slice, this rigmarole is no longer
      necessary. The runtime can simply ready the GC goroutine and yield
      from the readying goroutine.
      
      Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
      Reviewed-on: https://go-review.googlesource.com/9292
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • runtime: use park/ready to wake up GC at end of concurrent mark · ce502b06
      Austin Clements authored
      Currently, the main GC goroutine sleeps on a note during concurrent
      mark and the first background mark worker or assist to finish marking
      wakes up that note to let the main goroutine proceed into mark
      termination. Unfortunately, the latency of this wakeup can be quite
      high, since the GC goroutine will typically have lost its P while in
      the futex sleep, meaning it will be placed on the global run queue and
      will wait there until some P is kind enough to pick it up. This delay
      gives the mutator more time to allocate and create floating garbage,
      growing the heap unnecessarily. Worse, it's likely that background
      marking has stopped at this point (unless GOMAXPROCS>4), so anything
      that's allocated and published to the heap during this window will
      have to be scanned during mark termination while the world is stopped.
      
      This change replaces the note sleep/wakeup with a gopark/ready
      scheme. This keeps the wakeup inside the Go scheduler and lets the
      garbage collector take advantage of the new scheduler semantics that
      run the ready()d goroutine immediately when the ready()ing goroutine
      sleeps.
      
      For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
      reduces the delay in waking up the GC goroutine and entering mark
      termination once concurrent marking is done from ~100ms to typically
      <100µs.
      
      Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
      Reviewed-on: https://go-review.googlesource.com/9291
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: use timer for GC control revise rather than timeout · 4e32718d
      Austin Clements authored
      Currently, we use a note sleep with a timeout in a loop in func gc to
      periodically revise the GC control variables. Replace this with a
      fully blocking note sleep and use a periodic timer to trigger the
      revise instead. This is a step toward replacing the note sleep in func
      gc.
      
      Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63
      Reviewed-on: https://go-review.googlesource.com/9290
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: yield time slice to most recently readied G · e870f06c
      Austin Clements authored
      Currently, when the runtime ready()s a G, it adds it to the end of the
      current P's run queue and continues running. If there are many other
      things in the run queue, this can result in a significant delay before
      the ready()d G actually runs and can hurt fairness when other Gs in
      the run queue are CPU hogs. For example, if there are three Gs sharing
      a P, one of which is a CPU hog that never voluntarily gives up the P
      and the other two of which are doing small amounts of work and
      communicating back and forth on an unbuffered channel, the two
      communicating Gs will get very little CPU time.
      
      Change this so that when G1 ready()s G2 and then blocks, the scheduler
      immediately hands off the remainder of G1's time slice to G2. In the
      above example, the two communicating Gs will now act as a unit and
      together get half of the CPU time, while the CPU hog gets the other
      half of the CPU time.
      
      This fixes the problem demonstrated by the ping-pong benchmark added
      in the previous commit:
      
      benchmark                old ns/op     new ns/op     delta
      BenchmarkPingPongHog     684287        825           -99.88%
      
      On the x/benchmarks suite, this change improves the performance of
      garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
      GOMAXPROCS=1 and 4. It has negligible effect on heap size.
      
      This has no effect on the go1 benchmark suite since those benchmarks
      are mostly single-threaded.
      
      Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
      Reviewed-on: https://go-review.googlesource.com/9289
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: benchmark for ping-pong in the presence of a CPU hog · da0e37fa
      Austin Clements authored
      This benchmark demonstrates a current problem with the scheduler where
      a set of frequently communicating goroutines get very little CPU time
      in the presence of another goroutine that hogs that CPU, even if one
      of those communicating goroutines is always runnable.
      
      Currently it takes about 0.5 milliseconds to switch between
      ping-ponging goroutines in the presence of a CPU hog:
      
      BenchmarkPingPongHog	    2000	    684287 ns/op
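The scenario being measured can be approximated with a small standalone program. This is not the benchmark added by the commit, only a sketch: pingPong and its parameters are hypothetical, and the hog loop relies on asynchronous preemption (Go 1.14 and later) so that the spinning goroutine cannot hold the single P forever.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// pingPong bounces a value between two goroutines over unbuffered
// channels on a single P, optionally next to a spinning goroutine.
// It returns the number of completed round trips and the elapsed time.
func pingPong(rounds int, withHog bool) (int, time.Duration) {
	// Run on one P for the duration of the call, then restore the old value.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

	var stop int32
	hogDone := make(chan struct{})
	if withHog {
		go func() {
			defer close(hogDone)
			for atomic.LoadInt32(&stop) == 0 {
				// Spin without ever yielding voluntarily.
			}
		}()
	} else {
		close(hogDone)
	}

	ping, pong := make(chan int), make(chan int)
	go func() {
		for v := range ping {
			pong <- v + 1
		}
		close(pong)
	}()

	start := time.Now()
	trips := 0
	for i := 0; i < rounds; i++ {
		ping <- i
		if <-pong == i+1 {
			trips++
		}
	}
	elapsed := time.Since(start)

	close(ping)
	atomic.StoreInt32(&stop, 1)
	<-hogDone
	return trips, elapsed
}

func main() {
	trips, elapsed := pingPong(1000, true)
	fmt.Println(trips, "round trips in", elapsed)
}
```

Timings from this sketch depend on the Go version and machine, so no expected output is given.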
      
      Change-Id: I278848c84f778de32344921ae8a4a8056e4898b0
      Reviewed-on: https://go-review.googlesource.com/9288
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • runtime: factor checking if P run queue is empty · e5e52f4f
      Austin Clements authored
      There are a variety of places where we check if a P's run queue is
      empty. This test is about to get slightly more complicated, so factor
      it out into a new function, runqempty. This function is inlinable, so
      this has no effect on performance.
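Factoring an emptiness test into one small helper might look like the following sketch. It assumes a fixed-size ring buffer with head and tail counters. The runQueue type and its methods are hypothetical and much simpler than the runtime's P structure.

```go
package main

import "fmt"

// runQueue is a hypothetical fixed-size ring buffer of goroutine IDs.
type runQueue struct {
	head, tail uint32
	buf        [8]int
}

// empty is the single place that knows how emptiness is defined, so
// callers do not repeat the head/tail comparison themselves.
func (q *runQueue) empty() bool {
	return q.head == q.tail
}

// put appends g and reports false if the queue is full.
func (q *runQueue) put(g int) bool {
	if q.tail-q.head == uint32(len(q.buf)) {
		return false
	}
	q.buf[q.tail%uint32(len(q.buf))] = g
	q.tail++
	return true
}

// get removes the oldest entry and reports false if there is none.
func (q *runQueue) get() (int, bool) {
	if q.empty() {
		return 0, false
	}
	g := q.buf[q.head%uint32(len(q.buf))]
	q.head++
	return g, true
}

func main() {
	var q runQueue
	fmt.Println(q.empty())
	q.put(7)
	fmt.Println(q.empty())
}
```

If the definition of "empty" later has to consider more state, only the helper changes, which is the motivation the message gives.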
      
      Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b
      Reviewed-on: https://go-review.googlesource.com/9287
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
    • cmd/internal/gc: add and test write barrier debug output · 9406f68e
      Russ Cox authored
      We can expand the test cases as we discover problems.
      These are some basic tests plus all the things I got wrong
      in some recent work.
      
      Change-Id: Id875fcfaf74eb087ae42b441fe47a34c5b8ccb39
      Reviewed-on: https://go-review.googlesource.com/9158
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • hash/crc32: clarify documentation · 80f575b7
      Aamir Khan authored
      Explicitly specify that we represent the polynomial in reversed notation.
      
      Fixes #8229
      
      Change-Id: Idf094c01fd82f133cd0c1b50fa967d12c577bdb5
      Reviewed-on: https://go-review.googlesource.com/9237
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/dist: allow $GO_TEST_TIMEOUT_SCALE to override timeoutScale · 7579867f
      Shenghou Ma authored
      Some machines are so slow that even with the default timeoutScale,
      they still time out on some tests. For example, currently some linux/arm
      builders and the openbsd/arm builder are timing out on the runtime
      test, and CL 8397 was proposed to skip some tests on openbsd/arm
      to fix the build.
      
      Instead of increasing timeoutScale or skipping tests, this CL
      introduces an environment variable, $GO_TEST_TIMEOUT_SCALE, that
      can be set to manually specify a larger timeoutScale for those
      machines/builders.
      
      Fixes #10314.
      
      Change-Id: I16c9a9eb980d6a63309e4cacd79eee2fe05769ee
      Reviewed-on: https://go-review.googlesource.com/9223
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: signal forwarding · 5c8fbc6f
      Srdjan Petrovic authored
      Forward signals to signal handlers installed before Go installs its own,
      under certain circumstances.  In particular, as iant@ suggests, signals are
      forwarded iff:
         (1) a non-SIG_DFL signal handler existed before Go, and
         (2) the signal is synchronous (i.e., one of SIGSEGV, SIGBUS, SIGFPE), and
             (3a) the signal occurred on a non-Go thread, or
             (3b) the signal occurred on a Go thread but in CGo code.
      
      Supported only on Linux, for now.
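The forwarding conditions listed above reduce to a single boolean expression. The sketch below restates them. The sigContext struct and shouldForward function are hypothetical and are not the runtime's implementation.

```go
package main

import "fmt"

// sigContext is a hypothetical summary of the facts the rule depends on.
type sigContext struct {
	hadPriorHandler bool // (1) a non-SIG_DFL handler existed before Go
	synchronous     bool // (2) SIGSEGV, SIGBUS, or SIGFPE
	onGoThread      bool // whether the signal arrived on a Go thread
	inCgo           bool // whether that thread was running cgo code
}

// shouldForward applies conditions (1), (2), and either (3a) or (3b).
func shouldForward(c sigContext) bool {
	return c.hadPriorHandler && c.synchronous && (!c.onGoThread || c.inCgo)
}

func main() {
	fmt.Println(shouldForward(sigContext{hadPriorHandler: true, synchronous: true}))
	fmt.Println(shouldForward(sigContext{hadPriorHandler: true, synchronous: true, onGoThread: true}))
}
```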
      
      Change-Id: I403219ee47b26cf65da819fb86cf1ec04d3e25f5
      Reviewed-on: https://go-review.googlesource.com/8712
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • encoding/base64: Optimize EncodeToString and DecodeString. · b075d1fc
      Egon Elbre authored
      benchmark                   old ns/op     new ns/op     delta
      BenchmarkEncodeToString     31281         23821         -23.85%
      BenchmarkDecodeString       156508        82254         -47.44%
      
      benchmark                   old MB/s     new MB/s     speedup
      BenchmarkEncodeToString     261.88       343.89       1.31x
      BenchmarkDecodeString       69.80        132.81       1.90x
      
      Change-Id: I115e0b18c3a6d5ef6bfdcb3f637644f02f290907
      Reviewed-on: https://go-review.googlesource.com/8808
      Reviewed-by: Nigel Tao <nigeltao@golang.org>
  2. 23 Apr, 2015 11 commits
    • cmd/9g, etc: remove // fallthrough comments · 04829a41
      Josh Bleecher Snyder authored
      They are vestiges of the c2go transition.
      
      Change-Id: I22672e40373ef77d7a0bf69cfff8017e46353055
      Reviewed-on: https://go-review.googlesource.com/9265
      Reviewed-by: Minux Ma <minux@golang.org>
    • math/big: add partial arm64 assembly support · 56a7c5b9
      Josh Bleecher Snyder authored
      benchmark                       old ns/op      new ns/op      delta
      BenchmarkAddVV_1                18.7           14.8           -20.86%
      BenchmarkAddVV_2                21.8           16.6           -23.85%
      BenchmarkAddVV_3                26.1           17.1           -34.48%
      BenchmarkAddVV_4                30.4           21.9           -27.96%
      BenchmarkAddVV_5                35.5           19.8           -44.23%
      BenchmarkAddVV_1e1              63.0           28.3           -55.08%
      BenchmarkAddVV_1e2              593            178            -69.98%
      BenchmarkAddVV_1e3              5691           1490           -73.82%
      BenchmarkAddVV_1e4              56868          20761          -63.49%
      BenchmarkAddVV_1e5              569062         207679         -63.51%
      BenchmarkAddVW_1                15.8           12.6           -20.25%
      BenchmarkAddVW_2                17.8           13.1           -26.40%
      BenchmarkAddVW_3                21.2           13.9           -34.43%
      BenchmarkAddVW_4                23.6           14.7           -37.71%
      BenchmarkAddVW_5                26.0           15.8           -39.23%
      BenchmarkAddVW_1e1              41.3           21.6           -47.70%
      BenchmarkAddVW_1e2              383            145            -62.14%
      BenchmarkAddVW_1e3              3703           1264           -65.87%
      BenchmarkAddVW_1e4              36920          14359          -61.11%
      BenchmarkAddVW_1e5              370345         143046         -61.37%
      BenchmarkAddMulVVW_1            33.2           32.5           -2.11%
      BenchmarkAddMulVVW_2            58.0           57.2           -1.38%
      BenchmarkAddMulVVW_3            95.2           93.9           -1.37%
      BenchmarkAddMulVVW_4            108            106            -1.85%
      BenchmarkAddMulVVW_5            159            156            -1.89%
      BenchmarkAddMulVVW_1e1          344            340            -1.16%
      BenchmarkAddMulVVW_1e2          3644           3624           -0.55%
      BenchmarkAddMulVVW_1e3          37344          37208          -0.36%
      BenchmarkAddMulVVW_1e4          373295         372170         -0.30%
      BenchmarkAddMulVVW_1e5          3438116        3425606        -0.36%
      BenchmarkBitLen0                7.21           4.32           -40.08%
      BenchmarkBitLen1                6.49           4.32           -33.44%
      BenchmarkBitLen2                7.23           4.32           -40.25%
      BenchmarkBitLen3                6.49           4.32           -33.44%
      BenchmarkBitLen4                7.22           4.32           -40.17%
      BenchmarkBitLen5                6.52           4.33           -33.59%
      BenchmarkBitLen8                7.22           4.32           -40.17%
      BenchmarkBitLen9                6.49           4.32           -33.44%
      BenchmarkBitLen16               8.66           4.32           -50.12%
      BenchmarkBitLen17               7.95           4.32           -45.66%
      BenchmarkBitLen31               8.69           4.32           -50.29%
      BenchmarkGCD10x10               5021           5033           +0.24%
      BenchmarkGCD10x100              5571           5572           +0.02%
      BenchmarkGCD10x1000             6707           6729           +0.33%
      BenchmarkGCD10x10000            13526          13419          -0.79%
      BenchmarkGCD10x100000           85668          83242          -2.83%
      BenchmarkGCD100x100             24196          23936          -1.07%
      BenchmarkGCD100x1000            28802          27309          -5.18%
      BenchmarkGCD100x10000           64111          51704          -19.35%
      BenchmarkGCD100x100000          385840         274385         -28.89%
      BenchmarkGCD1000x1000           262892         236269         -10.13%
      BenchmarkGCD1000x10000          371393         277883         -25.18%
      BenchmarkGCD1000x100000         1311795        589055         -55.10%
      BenchmarkGCD10000x10000         9596740        6123930        -36.19%
      BenchmarkGCD10000x100000        16404000       7269610        -55.68%
      BenchmarkGCD100000x100000       776660000      419270000      -46.02%
      BenchmarkHilbert                13478980       13402270       -0.57%
      BenchmarkBinomial               9802           9440           -3.69%
      BenchmarkBitset                 142            142            +0.00%
      BenchmarkBitsetNeg              328            279            -14.94%
      BenchmarkBitsetOrig             853            861            +0.94%
      BenchmarkBitsetNegOrig          1489           1444           -3.02%
      BenchmarkMul                    420949000      410481000      -2.49%
      BenchmarkExp3Power0x10          1148           1229           +7.06%
      BenchmarkExp3Power0x40          1322           1376           +4.08%
      BenchmarkExp3Power0x100         2437           2486           +2.01%
      BenchmarkExp3Power0x400         9456           9346           -1.16%
      BenchmarkExp3Power0x1000        113623         108701         -4.33%
      BenchmarkExp3Power0x4000        1134933        1101481        -2.95%
      BenchmarkExp3Power0x10000       10773570       10396160       -3.50%
      BenchmarkExp3Power0x40000       101362100      97788300       -3.53%
      BenchmarkExp3Power0x100000      921114000      885249000      -3.89%
      BenchmarkExp3Power0x400000      8323094000     7969020000     -4.25%
      BenchmarkFibo                   322021600      92554450       -71.26%
      BenchmarkScanPi                 1264583        321065         -74.61%
      BenchmarkStringPiParallel       1644661        554216         -66.30%
      BenchmarkScan10Base2            1111           1080           -2.79%
      BenchmarkScan100Base2           6645           6345           -4.51%
      BenchmarkScan1000Base2          84084          62405          -25.78%
      BenchmarkScan10000Base2         3105998        932551         -69.98%
      BenchmarkScan100000Base2        257234800      40113333       -84.41%
      BenchmarkScan10Base8            571            573            +0.35%
      BenchmarkScan100Base8           2810           2543           -9.50%
      BenchmarkScan1000Base8          47383          25834          -45.48%
      BenchmarkScan10000Base8         2739518        567203         -79.30%
      BenchmarkScan100000Base8        253952400      36495680       -85.63%
      BenchmarkScan10Base10           553            556            +0.54%
      BenchmarkScan100Base10          2640           2385           -9.66%
      BenchmarkScan1000Base10         50865          24049          -52.72%
      BenchmarkScan10000Base10        3279916        549313         -83.25%
      BenchmarkScan100000Base10       309121000      36213140       -88.29%
      BenchmarkScan10Base16           478            483            +1.05%
      BenchmarkScan100Base16          2353           2144           -8.88%
      BenchmarkScan1000Base16         48091          24246          -49.58%
      BenchmarkScan10000Base16        2858886        586475         -79.49%
      BenchmarkScan100000Base16       266320000      38190500       -85.66%
      BenchmarkString10Base2          736            730            -0.82%
      BenchmarkString100Base2         2695           2707           +0.45%
      BenchmarkString1000Base2        20549          20388          -0.78%
      BenchmarkString10000Base2       212638         210782         -0.87%
      BenchmarkString100000Base2      1944963        1938033        -0.36%
      BenchmarkString10Base8          524            517            -1.34%
      BenchmarkString100Base8         1326           1320           -0.45%
      BenchmarkString1000Base8        8213           8249           +0.44%
      BenchmarkString10000Base8       72204          72092          -0.16%
      BenchmarkString100000Base8      769068         765993         -0.40%
      BenchmarkString10Base10         1018           982            -3.54%
      BenchmarkString100Base10        3485           3206           -8.01%
      BenchmarkString1000Base10       37102          18935          -48.97%
      BenchmarkString10000Base10      188633         88637          -53.01%
      BenchmarkString100000Base10     124490300      19700940       -84.17%
      BenchmarkString10Base16         509            502            -1.38%
      BenchmarkString100Base16        1084           1098           +1.29%
      BenchmarkString1000Base16       5641           5650           +0.16%
      BenchmarkString10000Base16      46900          46745          -0.33%
      BenchmarkString100000Base16     508957         505840         -0.61%
      BenchmarkLeafSize0              8934320        8149465        -8.78%
      BenchmarkLeafSize1              237666         118381         -50.19%
      BenchmarkLeafSize2              237807         117854         -50.44%
      BenchmarkLeafSize3              1688640        353494         -79.07%
      BenchmarkLeafSize4              235676         116196         -50.70%
      BenchmarkLeafSize5              2121896        430325         -79.72%
      BenchmarkLeafSize6              1682306        351775         -79.09%
      BenchmarkLeafSize7              1051847        251436         -76.10%
      BenchmarkLeafSize8              232697         115674         -50.29%
      BenchmarkLeafSize9              2403616        488443         -79.68%
      BenchmarkLeafSize10             2120975        429545         -79.75%
      BenchmarkLeafSize11             2023789        426525         -78.92%
      BenchmarkLeafSize12             1684830        351985         -79.11%
      BenchmarkLeafSize13             1465529        337906         -76.94%
      BenchmarkLeafSize14             1050498        253872         -75.83%
      BenchmarkLeafSize15             683228         197384         -71.11%
      BenchmarkLeafSize16             232496         116026         -50.10%
      BenchmarkLeafSize32             245841         126671         -48.47%
      BenchmarkLeafSize64             301728         190285         -36.93%
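
The last column of the table above is the relative change between the two timing columns. A minimal sketch of that arithmetic, assuming the columns are old and new ns/op (the function name percentDelta is made up for illustration):

```go
package main

import "fmt"

// percentDelta returns the relative change from oldNs to newNs in percent.
// Negative values mean the new measurement is faster.
func percentDelta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// Figures taken from the BenchmarkString1000Base10 row.
	fmt.Printf("%+.2f%%\n", percentDelta(37102, 18935))
}
```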
      
      Change-Id: I63e63297896d96b89c9a275b893c2b405a7e105d
      Reviewed-on: https://go-review.googlesource.com/9260
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      56a7c5b9
    • Srdjan Petrovic's avatar
      runtime: deflake TestNewOSProc0, fix _rt0_amd64_linux_lib stack alignment · 1f65c9c1
      Srdjan Petrovic authored
      This addresses iant's comments from CL 9164.
      
      Change-Id: I7b5b282f61b11aab587402c2d302697e76666376
      Reviewed-on: https://go-review.googlesource.com/9222
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      1f65c9c1
    • Austin Clements's avatar
      runtime: fix underflow in next_gc calculation · ed09e0e2
      Austin Clements authored
      Currently, it's possible for the next_gc calculation to underflow.
      Since next_gc is unsigned, this wraps around and effectively disables
      GC for the rest of the program's execution. Besides being obviously
      wrong, this is causing test failures on 32-bit because some tests are
      running out of heap.
      
      This underflow happens for two reasons, both having to do with how we
      estimate the reachable heap size at the end of the GC cycle.
      
      One reason is that this calculation depends on the value of heap_live
      at the beginning of the GC cycle, but we currently only record that
      value during a concurrent GC and not during a forced STW GC. Fix this
      by moving the recorded value from gcController to work and recording
      it on a common code path.
      
      The other reason is that we use the amount of allocation during the GC
      cycle as an approximation of the amount of floating garbage and
      subtract it from the marked heap to estimate the reachable heap.
      However, since this is only an approximation, it's possible for the
      amount of allocation during the cycle to be *larger* than the marked
      heap size (since the runtime allocates white and it's possible for
      these allocations to never be made reachable from the heap). Currently
      this causes wrap-around in our estimate of the reachable heap size,
      which in turn causes wrap-around in next_gc. Fix this by bottoming out
      the reachable heap estimate at 0, in which case we just fall back to
      triggering GC at heapminimum (which is okay since this only happens on
      small heaps).
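
The wrap-around and the clamping fix described above can be sketched as follows. This is an illustration under stated assumptions, not the runtime's code: the names reachableEstimate and nextGC, the 4 MB value of heapMinimum, and the growth formula are all chosen here for the example.

```go
package main

import "fmt"

// heapMinimum is a hypothetical floor for the GC trigger, chosen only for
// this illustration.
const heapMinimum uint64 = 4 << 20

// reachableEstimate approximates the reachable heap as the marked bytes
// minus the bytes allocated during the cycle. It bottoms out at 0 instead
// of letting the unsigned subtraction wrap around.
func reachableEstimate(marked, allocatedDuringCycle uint64) uint64 {
	if allocatedDuringCycle > marked {
		return 0
	}
	return marked - allocatedDuringCycle
}

// nextGC grows the estimate by gcPercent percent and never returns less
// than heapMinimum.
func nextGC(reachable, gcPercent uint64) uint64 {
	goal := reachable + reachable*gcPercent/100
	if goal < heapMinimum {
		return heapMinimum
	}
	return goal
}

func main() {
	marked, allocated := uint64(1<<20), uint64(3<<20)
	// The naive unsigned subtraction wraps to a huge value.
	fmt.Println("naive estimate:", marked-allocated)
	// The clamped estimate falls back to the minimum trigger.
	fmt.Println("clamped next_gc:", nextGC(reachableEstimate(marked, allocated), 100))
}
```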
      
      Fixes #10555, fixes #10556, and fixes #10559.
      
      Change-Id: Iad07b529c03772356fede2ae557732f13ebfdb63
      Reviewed-on: https://go-review.googlesource.com/9286
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
      ed09e0e2
    • Rick Hudson's avatar
      runtime: Improve scanning performance · 77f56af0
      Rick Hudson authored
      To achieve a 2% improvement in the garbage benchmark this CL removes
      an unneeded assert and avoids one hbits.next() call per object
      being scanned.
      
      Change-Id: Ibd542d01e9c23eace42228886f9edc488354df0d
      Reviewed-on: https://go-review.googlesource.com/9244
      Reviewed-by: Austin Clements <austin@google.com>
      77f56af0
    • Hyang-Ah Hana Kim's avatar
      runtime: disable TestNewOSProc0 on android/arm. · aef54d40
      Hyang-Ah Hana Kim authored
      newosproc0 does not work on android/arm.
      See issue #10548.
      
      Change-Id: Ieaf6f5d0b77cddf5bf0b6c89fd12b1c1b8723f9b
      Reviewed-on: https://go-review.googlesource.com/9293
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      aef54d40
    • Nigel Tao's avatar
      image/png: don't silently swallow io.ReadFull's io.EOF error when it · ba8fa0e1
      Nigel Tao authored
      lands exactly on an IDAT row boundary.
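
For context, io.ReadFull returns io.EOF only when it reads no bytes at all, and io.ErrUnexpectedEOF when it reads some but not all of the buffer. A minimal sketch of why a clean io.EOF still signals truncation when more rows are expected (readRows is a hypothetical helper, not the image/png decoder):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readRows reads nRows rows of rowLen bytes each from data. The caller
// expects every row to be present, so an io.EOF that lands exactly on a
// row boundary is reported as io.ErrUnexpectedEOF instead of being dropped.
func readRows(data []byte, rowLen, nRows int) error {
	r := bytes.NewReader(data)
	row := make([]byte, rowLen)
	for i := 0; i < nRows; i++ {
		if _, err := io.ReadFull(r, row); err != nil {
			if err == io.EOF {
				err = io.ErrUnexpectedEOF
			}
			return err
		}
	}
	return nil
}

func main() {
	data := []byte{1, 2, 3, 4} // exactly one 4-byte row
	fmt.Println(readRows(data, 4, 1))
	fmt.Println(readRows(data, 4, 2)) // data ends on a row boundary
}
```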
      
      Fixes #10493
      
      Change-Id: I12be7c5bdcde7032e17ed1d4400db5f17c72bc87
      Reviewed-on: https://go-review.googlesource.com/9270
      Reviewed-by: Rob Pike <r@golang.org>
      ba8fa0e1
    • Dmitry Savintsev's avatar
      doc/faq: replace reference to goven with gomvpkg · 133966d3
      Dmitry Savintsev authored
      github.com/kr/goven says it's deprecated and anyway
      it would be preferable to point users to a standard Go tool.
      
      Change-Id: Iac4a0d13233604a36538748d498f5770b2afce19
      Reviewed-on: https://go-review.googlesource.com/8969
      Reviewed-by: Minux Ma <minux@golang.org>
      133966d3
    • Brad Fitzpatrick's avatar
      net: use Go's DNS resolver when system configuration permits · 4a0ba7aa
      Brad Fitzpatrick authored
      If the machine's network configuration files (resolv.conf,
      nsswitch.conf) don't have any unsupported options, prefer Go's DNS
      resolver, which doesn't have the cgo & thread overhead.
      
      It means users can have more than 500 DNS requests outstanding (our
      current limit for cgo lookups) and not have one blocked thread per
      outstanding request.
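
A cap on outstanding blocking lookups, like the one mentioned above, is commonly built from a buffered channel used as a counting semaphore. The sketch below shows only that general technique. The names limiter and maxInFlight are hypothetical, and nothing here is taken from the net package.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter bounds how many calls run at once by using a buffered channel
// as a counting semaphore.
type limiter chan struct{}

// do blocks until a slot is free, runs f, and then releases the slot.
func (l limiter) do(f func()) {
	l <- struct{}{}
	defer func() { <-l }()
	f()
}

// maxInFlight pushes n tasks through a limiter of the given size and
// reports the highest concurrency it observed.
func maxInFlight(limit, n int) int {
	l := make(limiter, limit)
	var (
		mu        sync.Mutex
		cur, peak int
		wg        sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() {
				mu.Lock()
				cur++
				if cur > peak {
					peak = cur
				}
				mu.Unlock()
				// A real blocking lookup would happen here.
				mu.Lock()
				cur--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", maxInFlight(3, 50))
}
```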
      
      Discussed in thread https://groups.google.com/d/msg/golang-dev/2ZUi792oztM/Q0rg_DkF5HMJ
      
      Change-Id: I3f685d70aff6b47bec30b63e9fba674b20507f95
      Reviewed-on: https://go-review.googlesource.com/8945
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      4a0ba7aa
    • Josh Bleecher Snyder's avatar
      cmd/internal/gc: remove /*untyped*/ comments · c2312280
      Josh Bleecher Snyder authored
      They are vestiges of the c2go translation.
      
      Change-Id: I9a10536f5986b751a35cc7d84b5ba69ae0c2ede7
      Reviewed-on: https://go-review.googlesource.com/9262
      Reviewed-by: Minux Ma <minux@golang.org>
      c2312280
    • Nigel Tao's avatar
      image/jpeg: have the LargeImageWithShortData test only allocate 64 MiB, not 604 · 5e9ab665
      Nigel Tao authored
      MiB.
      
      Fixes #10531
      
      Change-Id: I9eece86837c3df2b1f7df315d5ec94bd3ede3eec
      Reviewed-on: https://go-review.googlesource.com/9238
      Run-TryBot: Nigel Tao <nigeltao@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      5e9ab665
  3. 22 Apr, 2015 9 commits
    • Shenghou Ma's avatar
      runtime: fix build after CL 9164 on Linux · edc53e1f
      Shenghou Ma authored
      There is an assumption that the function executed in the child thread
      created by runtime.clone should not return. And different systems
      enforce that differently: some exit that thread, some exit the
      whole process.
      
      The test TestNewOSProc0 introduced in CL 9164 breaks that assumption,
      so we need to adjust the code to only exit the thread should the
      called function return.
      
      Change-Id: Id631cb2f02ec6fbd765508377a79f3f96c6a2ed6
      Reviewed-on: https://go-review.googlesource.com/9246
      Reviewed-by: Dave Cheney <dave@cheney.net>
      edc53e1f
    • Shenghou Ma's avatar
      log/syslog: make the BUG notes visible on golang.org · 43618e62
      Shenghou Ma authored
      It was only visible when you ran godoc with an explicit GOOS=windows,
      which is less useful for people developing portable applications on
      non-Windows platforms.
      
      Also added a note that log/syslog is not supported on NaCl.
      
      Change-Id: I81650445fb2a5ee161da7e0608c3d3547d5ac2a6
      Reviewed-on: https://go-review.googlesource.com/9245
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      43618e62
    • Michael Hudson-Doyle's avatar
      cmd/link, cmd/internal/goobj: update constants, regenerate testdata · 68f55700
      Michael Hudson-Doyle authored
      The constants in cmd/internal/goobj had gone stale (we had three copies of
      these constants; working on reducing that is what led me to notice this).
      
      Some of the changes to link.hello.darwin.amd64 are the change from absolute
      to %rip-relative addressing, a change which happened quite a while ago...
      
      Depends on http://golang.org/cl/9113.
      
      Fixes #10501.
      
      Change-Id: Iaa1511f458a32228c2df2ccd0076bb9ae212a035
      Reviewed-on: https://go-review.googlesource.com/9105
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      68f55700
    • Austin Clements's avatar
      runtime: use reachable heap estimate to set trigger/goal · 4655aadd
      Austin Clements authored
      Currently, we set the heap goal for the next GC cycle using the size
      of the marked heap at the end of the current cycle. This can lead to a
      bad feedback loop if the mutator is rapidly allocating and releasing
      pointers that can significantly bloat heap size.
      
      If the GC were STW, the marked heap size would be exactly the
      reachable heap size (call it stwLive). However, in concurrent GC,
      marked=stwLive+floatLive, where floatLive is the amount of "floating
      garbage": objects that were reachable at some point during the cycle
      and were marked, but which are no longer reachable by the end of the
      cycle. If the GC cycle is short, then the mutator doesn't have much
      time to create floating garbage, so marked≈stwLive. However, if the GC
      cycle is long and the mutator is allocating and creating floating
      garbage very rapidly, then it's possible that marked≫stwLive. Since
      the runtime currently sets the heap goal based on marked, this will
      cause it to set a high heap goal. This means that 1) the next GC cycle
      will take longer because of the larger heap and 2) the assist ratio
      will be low because of the large distance between the trigger and the
      goal. The combination of these lets the mutator produce even more
      floating garbage in the next cycle, which further exacerbates the
      problem.
      
      For example, on the garbage benchmark with GOMAXPROCS=1, this causes
      the heap to grow to ~500MB and the garbage collector to retain upwards
      of ~300MB of heap, while the true reachable heap size is ~32MB. This,
      in turn, causes the GC cycle to take upwards of ~3 seconds.
      
      Fix this bad feedback loop by estimating the true reachable heap size
      (stwLive) and using this rather than the marked heap size
      (stwLive+floatLive) as the basis for the GC trigger and heap goal.
      This breaks the bad feedback loop and causes the mutator to assist
      more, which decreases the rate at which it can create floating
      garbage. On the same garbage benchmark, this reduces the maximum heap
      size to ~73MB, the retained heap to ~40MB, and the duration of the GC
      cycle to ~200ms.
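
The effect of the choice of base can be sketched with the relation marked=stwLive+floatLive from above. This is only an illustration: it assumes GOGC=100 and the usual "grow by GOGC percent" rule, the 268MB of floating garbage is a hypothetical figure picked so that marked comes to ~300MB, and heapGoal is a made-up name.

```go
package main

import "fmt"

const mb = 1 << 20

// heapGoal lets the heap grow by gcPercent percent over the chosen base.
func heapGoal(base, gcPercent uint64) uint64 {
	return base + base*gcPercent/100
}

func main() {
	stwLive := uint64(32 * mb)    // true reachable heap, from the message
	floatLive := uint64(268 * mb) // hypothetical floating garbage
	marked := stwLive + floatLive

	// Basing the goal on the marked heap rewards floating garbage with
	// an even larger heap.
	fmt.Println("goal from marked heap (MB):", heapGoal(marked, 100)/mb)
	// Basing it on the reachable estimate keeps the goal near the real
	// working set.
	fmt.Println("goal from reachable estimate (MB):", heapGoal(stwLive, 100)/mb)
}
```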
      
      Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
      Reviewed-on: https://go-review.googlesource.com/9177
      Reviewed-by: Rick Hudson <rlh@golang.org>
      4655aadd
    • Michael Hudson-Doyle's avatar
      cmd/go: refactor creation of top-level actions for -buildmode=shared · 91318dc7
      Michael Hudson-Doyle authored
      Change-Id: I429402dd91243cd9415b054ee17bfebccc68ed57
      Reviewed-on: https://go-review.googlesource.com/9197
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      91318dc7
    • Austin Clements's avatar
      runtime: include heap goal in gctrace line · 1ccc577b
      Austin Clements authored
      This may or may not be useful to the end user, but it's incredibly
      useful for us to understand the behavior of the pacer. Currently this
      is fairly easy (though not trivial) to derive from the other heap
      stats we print, but we're about to change how we compute the goal,
      which will make it much harder to derive.
      
      Change-Id: I796ef233d470c01f606bd9929820c01ece1f585a
      Reviewed-on: https://go-review.googlesource.com/9176
      Reviewed-by: Rick Hudson <rlh@golang.org>
      1ccc577b
    • Austin Clements's avatar
      runtime: avoid divide-by-zero in GC trigger controller · 1f39beb0
      Austin Clements authored
      The trigger controller computes GC CPU utilization by dividing by the
      wall-clock time that's passed since concurrent mark began. Since this
      delta is in nanoseconds it's borderline impossible for it to be zero, but
      if it is zero we'll currently divide by zero. Be robust to this
      possibility by ignoring the utilization in the error term if no time
      has elapsed.
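
A guard of this kind might look like the sketch below. The function name utilization and its signature are assumptions made for illustration. The real controller's variables and error term are not shown in this message.

```go
package main

import "fmt"

// utilization returns GC CPU time as a fraction of the wall-clock time
// elapsed since concurrent mark began. ok is false when no time has
// elapsed, so the caller can leave the utilization out of its error term
// instead of dividing by zero.
func utilization(gcCPUNanos, elapsedNanos int64) (u float64, ok bool) {
	if elapsedNanos <= 0 {
		return 0, false
	}
	return float64(gcCPUNanos) / float64(elapsedNanos), true
}

func main() {
	if u, ok := utilization(50, 200); ok {
		fmt.Println("utilization:", u)
	}
	if _, ok := utilization(50, 0); !ok {
		fmt.Println("no time elapsed; utilization ignored")
	}
}
```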
      
      Change-Id: I93dfc9e84735682af3e637f6538d1e7602634f09
      Reviewed-on: https://go-review.googlesource.com/9175
      Reviewed-by: Rick Hudson <rlh@golang.org>
      1f39beb0
    • Michael Hudson-Doyle's avatar
      cmd/internal/gc, cmd/internal/ld: fixes for global vars of types from other modules · 7820d270
      Michael Hudson-Doyle authored
      To make the gcprog for global data containing variables of types defined in other shared
      libraries, we need to know a lot about those types. So read the value of any symbol with
      a name starting with "type.". If a type uses a mask, the name of the symbol defining the
      mask unfortunately cannot be predicted from the type name so I have to keep track of the
      addresses of every such symbol and associate them with the type symbols after the fact.
      
      I'm not very happy about this change, but something like this is needed and this is as
      pleasant as I know how to make it.
      
      Change-Id: I408d831b08b3b31e0610688c41367b23998e975c
      Reviewed-on: https://go-review.googlesource.com/8334
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      7820d270
    • Michael Hudson-Doyle's avatar
      cmd/5g, etc, cmd/internal/gc, cmd/internal/obj, etc: coalesce bool2int implementations · ac1cdd13
      Michael Hudson-Doyle authored
      There were 10 implementations of the trivial bool2int function, 9 of which
      were the only thing in their file.  Remove all of them in favor of one in
      cmd/internal/obj.
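
For reference, a function of this kind is typically just the following. This is a sketch of the usual shape, not a copy of the version kept in cmd/internal/obj.

```go
package main

import "fmt"

// bool2int converts a bool to 0 or 1. Go has no implicit or explicit
// conversion between bool and integer types, so a helper like this is
// needed wherever a flag feeds into arithmetic.
func bool2int(b bool) int {
	if b {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(bool2int(true), bool2int(false))
}
```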
      
      Change-Id: I9c51d30716239df51186860b9842a5e9b27264d3
      Reviewed-on: https://go-review.googlesource.com/9230
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      ac1cdd13