1. 29 Apr, 2016 8 commits
    • [dev.garbage] runtime: fix allocfreetrace · 6d114905
      Austin Clements authored
      We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
      when we removed the sweep over the mark bitmap. Fix it by
      re-introducing the sweep over the bitmap specifically if we're in
      allocfreetrace mode. This doesn't have to be even remotely efficient,
      since the overhead of allocfreetrace is huge anyway, so we can keep
      the code for this down to just a few lines.
      
      Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
      Reviewed-on: https://go-review.googlesource.com/22592
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: reintroduce no-zeroing optimization · 38f67468
      Austin Clements authored
      Currently we always zero objects when we allocate them. We used to
      have an optimization that would not zero objects that had not been
      allocated since the whole span was last zeroed (either by getting it
      from the system or by getting it from the heap, which does a bulk
      zero), but this depended on the sweeper clobbering the first two words
      of each object. Hence, we lost this optimization when the bitmap
      sweeper went away.
      
      Re-introduce this optimization using a different mechanism. Each span
      already keeps a flag indicating that it just came from the OS or was
      just bulk zeroed by the mheap. We can simply use this flag to know
      when we don't need to zero an object. This is slightly less efficient
      than the old optimization: if a span gets allocated and partially
      used, then GC happens and the span gets returned to the mcentral, then
      the span gets re-acquired, the old optimization knew that it only had
      to re-zero the objects that had been reclaimed, whereas this
      optimization will re-zero everything. However, in this case, you're
      already paying for the garbage collection, and you've only wasted one
      zeroing of the span, so in practice there seems to be little
      difference. (If we did want to revive the full optimization, each span
      could keep track of a frontier beyond which all free slots are zeroed.
      I prototyped this and it didn't obviously do any better than the much
      simpler approach in this commit.)
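
The flag-based mechanism described above can be sketched as follows. This is a minimal illustration, not the runtime's code: the `span` type, its fields, and the `release` helper are hypothetical stand-ins, and `release` resets the whole span for simplicity rather than tracking which slots were reclaimed.

```go
package main

import "fmt"

// span is a hypothetical, heavily simplified stand-in for the runtime's mspan.
type span struct {
	mem       []byte // backing memory for the span
	elemSize  int    // size of each object slot in bytes
	freeIndex int    // next slot to hand out
	needZero  bool   // false while the whole span is known to be zeroed
}

// alloc returns the next slot, zeroing it only when the span may hold
// stale data. It returns nil when the span is full.
func (s *span) alloc() []byte {
	if (s.freeIndex+1)*s.elemSize > len(s.mem) {
		return nil
	}
	obj := s.mem[s.freeIndex*s.elemSize : (s.freeIndex+1)*s.elemSize]
	s.freeIndex++
	if s.needZero {
		for i := range obj {
			obj[i] = 0
		}
	}
	return obj
}

// release models (as an assumption) a span going back to the central list
// after a GC: from now on every slot must be treated as possibly dirty.
func (s *span) release() {
	s.freeIndex = 0
	s.needZero = true
}

func main() {
	// Freshly obtained memory is already zero, so needZero starts false.
	s := &span{mem: make([]byte, 64), elemSize: 16}
	a := s.alloc() // no zeroing performed here
	a[0] = 0xff

	s.release()
	b := s.alloc() // same slot again, now zeroed on allocation
	fmt.Println(b[0], s.needZero)
}
```

The trade-off mentioned in the commit message shows up in `release`: once the flag is set, every later allocation from the span pays for zeroing, even for slots that were never used.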
      
      This significantly improves BinaryTree17, which is allocation-heavy
      (and runs first, so most pages are already zeroed), and slightly
      improves everything else.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
      DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
      DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
      DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
      DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
      DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
      DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
      DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
      DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
      Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
      FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
      FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
      FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
      FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
      FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
      FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
      FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
      GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
      GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
      Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
      Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
      HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
      JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
      JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
      Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
      GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
      RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
      RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
      RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
      RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
      RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
      RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
      Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
      Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
      TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
      TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
      [Geo mean]                  5.99µs         5.96µs       -0.54%
      
      Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
      Reviewed-on: https://go-review.googlesource.com/22591
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: eliminate mspan.start · 3e246238
      Austin Clements authored
      This converts all remaining uses of mspan.start to instead use
      mspan.base(). In many cases, this actually reduces the complexity of
      the code.
      
      Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
      Reviewed-on: https://go-review.googlesource.com/22590
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: use s.base() everywhere it makes sense · b7adc41f
      Austin Clements authored
      Currently we have lots of (s.start << _PageShift) and variants. We now
      have an s.base() function that returns this. It's faster and more
      readable, so use it.
      
      Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
      Reviewed-on: https://go-review.googlesource.com/22559
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: document sysAlloc · 2e8b74b6
      Austin Clements authored
      In particular, it always returns an aligned pointer.
      
      Change-Id: I763789a539a4bfd8b0efb36a39a80be1a479d3e2
      Reviewed-on: https://go-review.googlesource.com/22558
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • [dev.garbage] runtime: remove unused head/end arguments from freeSpan · 15744c92
      Austin Clements authored
      These used to be used for the list of newly freed objects, but that's
      no longer a thing.
      
      Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
      Reviewed-on: https://go-review.googlesource.com/22557
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • [dev.garbage] runtime: use sys.Ctz64 intrinsic · 2fb75ea6
      Rick Hudson authored
      Our compilers now provide intrinsics including
      sys.Ctz64 that support CTZ (count trailing zero)
      instructions. This CL replaces the Go versions
      of CTZ with the compiler intrinsic.
      
      Count trailing zeros (CTZ) finds the least
      significant 1 in a word and returns the number
      of less significant 0s in the word.
      
      Allocation uses the bitmap created by the garbage
      collector to locate an unmarked object. The logic
      takes a word of the bitmap, complements, and then
      caches it. It then uses CTZ to locate an available
      unmarked object. It then shifts marked bits out of
      the bitmap word preparing it for the next search.
      Once all the unmarked objects are used in the
      cached word, the bitmap gets another word and
      repeats the process.
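
The allocation loop described above might look roughly like the sketch below. It uses `math/bits.TrailingZeros64` from the standard library in place of the runtime-internal `sys.Ctz64`; the `allocator` type and its fields are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// allocator is a hypothetical model of allocating from GC mark bits.
type allocator struct {
	markBits []uint64 // 1 = marked (still in use), as left by the GC
	word     int      // index of the next bitmap word to load
	cache    uint64   // complemented, shifted copy of the current word: 1 = free
	base     int      // object index that bit 0 of cache corresponds to
}

// next returns the index of the next unmarked object, or -1 if none remain.
func (a *allocator) next() int {
	for a.cache == 0 {
		if a.word == len(a.markBits) {
			return -1
		}
		// Complement so that free objects become 1 bits, then cache the word.
		a.cache = ^a.markBits[a.word]
		a.base = a.word * 64
		a.word++
	}
	n := bits.TrailingZeros64(a.cache) // position of the lowest free bit
	idx := a.base + n
	// Shift the consumed bits out so the next search starts after idx.
	a.cache >>= uint(n + 1)
	a.base += n + 1
	return idx
}

func main() {
	// Objects 2 and 5 are free in the first word, object 64 in the second.
	a := &allocator{markBits: []uint64{
		^uint64(0) &^ (1<<2 | 1<<5),
		^uint64(0) &^ 1,
	}}
	for i := a.next(); i >= 0; i = a.next() {
		fmt.Println("free object index:", i)
	}
}
```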
      
      Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
      Reviewed-on: https://go-review.googlesource.com/21366
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: restructure alloc and mark bits · 2063d5d9
      Rick Hudson authored
      Two changes are included here that depend on each other.
      The first is that allocBits and gcmarkBits are changed to
      a *uint8 which points to the first byte of that span's
      mark and alloc bits. Several places were altered to
      perform pointer arithmetic to locate the byte corresponding
      to an object in the span. The actual bit corresponding
      to an object is indexed in the byte by using the lower three
      bits of the object's index.
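
The byte and bit lookup described above can be illustrated with a small sketch. A `[]uint8` slice stands in for the `*uint8` plus pointer arithmetic used in the runtime; the helper names `bitPos`, `isMarked`, and `setMarked` are assumptions made for this example.

```go
package main

import "fmt"

// bitPos maps an object index to the byte holding its bit and to a mask
// selecting that bit. The low three bits of the index pick the bit.
func bitPos(objIndex uintptr) (byteIndex uintptr, mask uint8) {
	return objIndex / 8, uint8(1) << (objIndex & 7)
}

// isMarked reports whether the bit for objIndex is set.
func isMarked(markBits []uint8, objIndex uintptr) bool {
	i, mask := bitPos(objIndex)
	return markBits[i]&mask != 0
}

// setMarked sets the bit for objIndex.
func setMarked(markBits []uint8, objIndex uintptr) {
	i, mask := bitPos(objIndex)
	markBits[i] |= mask
}

func main() {
	markBits := make([]uint8, 4) // room for 32 objects
	setMarked(markBits, 10)      // byte 1, bit 2
	fmt.Printf("%08b %v %v\n", markBits[1], isMarked(markBits, 10), isMarked(markBits, 9))
}
```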
      
      The second change avoids the redundant calculation of an
      object's index. The index is returned from heapBitsForObject
      and then used by the functions indexing allocBits
      and gcmarkBits.
      
      Finally we no longer allocate the gc bits in the span
      structures. Instead we use an arena based allocation scheme
      that allows for a more compact bit map as well as recycling
      and bulk clearing of the mark bits.
      
      Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
      Reviewed-on: https://go-review.googlesource.com/20705
      Reviewed-by: Austin Clements <austin@google.com>
  2. 27 Apr, 2016 32 commits
    • [dev.garbage] Merge remote-tracking branch 'origin/master' into HEAD · 23aeb34d
      Rick Hudson authored
      Change-Id: I282fd9ce9db435dfd35e882a9502ab1abc185297
    • [dev.garbage] runtime: add gc work buffer tryGet and put fast paths · 1354b32c
      Rick Hudson authored
      The complexity of the GC work buffers put and tryGet
      prevented them from being inlined. This CL simplifies
      the fast path thus enabling inlining. If the fast
      path does not succeed the previous put and tryGet
      functions are called.
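
The fast-path/slow-path split described above follows a common pattern, sketched below. The `workbuf` and `gcWork` types here are simplified, hypothetical models (tiny fixed-size buffers, a plain slice of full buffers); the method bodies are not the runtime's implementation.

```go
package main

import "fmt"

// workbuf is a hypothetical, tiny fixed-size buffer of pointers to scan.
type workbuf struct {
	obj [4]uintptr
	n   int
}

// gcWork is a simplified model holding one current buffer and any full ones.
type gcWork struct {
	cur  *workbuf
	full []*workbuf
}

// putFast handles only the common case and is small enough to inline.
// It reports false when the caller must fall back to put.
func (w *gcWork) putFast(p uintptr) bool {
	b := w.cur
	if b == nil || b.n == len(b.obj) {
		return false
	}
	b.obj[b.n] = p
	b.n++
	return true
}

// put is the slow path: it also handles missing and full buffers.
func (w *gcWork) put(p uintptr) {
	if w.cur != nil && w.cur.n == len(w.cur.obj) {
		w.full = append(w.full, w.cur)
		w.cur = nil
	}
	if w.cur == nil {
		w.cur = &workbuf{}
	}
	w.cur.obj[w.cur.n] = p
	w.cur.n++
}

// tryGetFast pops from the current buffer if it has anything in it.
func (w *gcWork) tryGetFast() (uintptr, bool) {
	b := w.cur
	if b == nil || b.n == 0 {
		return 0, false
	}
	b.n--
	return b.obj[b.n], true
}

// tryGet is the slow path: it switches to a full buffer when needed.
func (w *gcWork) tryGet() (uintptr, bool) {
	if w.cur == nil || w.cur.n == 0 {
		if len(w.full) == 0 {
			return 0, false
		}
		w.cur = w.full[len(w.full)-1]
		w.full = w.full[:len(w.full)-1]
	}
	w.cur.n--
	return w.cur.obj[w.cur.n], true
}

func main() {
	w := &gcWork{}
	for p := uintptr(1); p <= 6; p++ {
		if !w.putFast(p) { // callers try the fast path first
			w.put(p)
		}
	}
	count := 0
	for {
		p, ok := w.tryGetFast()
		if !ok {
			if p, ok = w.tryGet(); !ok {
				break
			}
		}
		_ = p
		count++
	}
	fmt.Println("items retrieved:", count)
}
```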
      
      Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
      Reviewed-on: https://go-review.googlesource.com/20704
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: cleanup and optimize span.base() · f8d0d4fd
      Rick Hudson authored
      Prior to this CL the base of a span was calculated in various
      places using shifts or calls to base(). This CL now
      always calls base() which has been optimized to calculate the
      base of the span when the span is initialized and store that
      value in the span structure.
      
      Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
      Reviewed-on: https://go-review.googlesource.com/20703
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: remove heapBitsSweepSpan · 8dda1c4c
      Rick Hudson authored
      Prior to this CL the sweep phase was responsible for locating
      all objects that were about to be freed and calling a function
      to process the object. This was done by the function
      heapBitsSweepSpan. Part of processing included calls to
      tracefree and msanfree as well as counting how many objects
      were freed.
      
      The calls to tracefree and msanfree have been moved into the
      gcmalloc routine and called when the object is about to be
      reallocated. The counting of free objects has been optimized
      using an array based popcnt algorithm, and if all the objects
      in a span are free then the span is freed.
      
      Similarly the code to locate the next free object has been
      optimized to use an array based ctz (count trailing zero).
      Various hot paths in the allocation logic have been optimized.
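
An array based popcnt of the kind mentioned above can be sketched with a 256-entry lookup table. The table construction and the `countMarked` helper are illustrative assumptions, not the code from this CL; `math/bits.OnesCount8` is used only to cross-check the table.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popTable[b] holds the number of 1 bits in the byte value b.
var popTable [256]uint8

func init() {
	// Each entry reuses the count for the value with its lowest bit removed.
	for i := 1; i < len(popTable); i++ {
		popTable[i] = popTable[i>>1] + uint8(i&1)
	}
}

// countMarked returns the number of set bits across a span's mark bytes
// using one table lookup per byte.
func countMarked(markBits []uint8) int {
	n := 0
	for _, b := range markBits {
		n += int(popTable[b])
	}
	return n
}

func main() {
	markBits := []uint8{0xff, 0x0f} // 12 of 16 objects marked
	nelems := 16
	marked := countMarked(markBits)
	fmt.Println("marked:", marked, "free:", nelems-marked)

	// Cross-check the table against the standard library.
	check := 0
	for _, b := range markBits {
		check += bits.OnesCount8(b)
	}
	fmt.Println("matches math/bits:", check == marked)
}
```

If the marked count is zero, every object in the span is free, which is the condition under which the whole span can be released.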
      
      At this point the garbage benchmark is within 3% of the 1.6
      release.
      
      Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
      Reviewed-on: https://go-review.googlesource.com/20201
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero) · 40934815
      Rick Hudson authored
      Add to each span a 64 bit cache (allocCache) of the allocBits
      at freeindex. allocCache is shifted such that the lowest bit
      corresponds to the bit freeindex. allocBits uses a 0 to
      indicate an object is free, on the other hand allocCache
      uses a 1 to indicate an object is free. This facilitates
      ctz64 (count trailing zero) which counts the number of 0s
      trailing the least significant 1. This is also the index of
      the least significant 1.
      
      Each span maintains a freeindex indicating the boundary
      between allocated objects and unallocated objects. allocCache
      is shifted as freeindex is incremented such that the low bit
      in allocCache corresponds to the bit at freeindex in the
      allocBits array.
      
      Currently ctz64 is written in Go using a for loop so it is
      not very efficient. Use of the hardware instruction will
      follow. With this in mind comparisons of the garbage
      benchmark are as follows.
      
      1.6 release        2.8 seconds
      dev.garbage branch 3.1 seconds
      
      Profiling shows the Go implementation of ctz64 takes up
      1% of the total time.
      
      Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
      Reviewed-on: https://go-review.googlesource.com/20200
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: logic that uses count trailing zero (ctz) · 44fe90d0
      Rick Hudson authored
      Most (all?) processors that Go supports supply a hardware
      instruction that takes a byte and returns the number
      of zeros trailing the first 1 encountered, or 8
      if no ones are found. This is the index within the
      byte of the first 1 encountered. CTZ should improve the
      performance of the nextFreeIndex function.
      
      Since nextFreeIndex wants the next unmarked (0) bit
      a bit-wise complement is needed before calling ctz.
      Furthermore unmarked bits associated with previously
      allocated objects need to be ignored. Instead of writing
      a 1 as we allocate the code masks all bits less than the
      freeindex after loading the byte.
      
      While this CL does not actually execute a CTZ instruction
      it supplies a ctz function with the appropriate signature
      along with the logic to execute it.
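
The complement-then-mask logic described above can be sketched at byte granularity. The loop-based `ctz8` and the `nextFreeInByte` helper are hypothetical illustrations written for this example, not the functions added by the CL.

```go
package main

import "fmt"

// ctz8 returns the number of zero bits below the lowest 1 bit of b,
// or 8 if b has no 1 bits. It is a plain Go loop, not a hardware instruction.
func ctz8(b uint8) int {
	for i := uint(0); i < 8; i++ {
		if b&(uint8(1)<<i) != 0 {
			return int(i)
		}
	}
	return 8
}

// nextFreeInByte finds the first unmarked (0) bit in markByte at or above
// bit position freeIndex. It returns 8 when no such bit exists.
func nextFreeInByte(markByte uint8, freeIndex uint) int {
	free := ^markByte // complement: unmarked objects become 1 bits
	// Ignore bits below freeIndex; those slots were already handed out.
	free &^= (uint8(1) << freeIndex) - 1
	return ctz8(free)
}

func main() {
	const markByte = 0x17 // binary 00010111: objects 0, 1, 2 and 4 are marked
	fmt.Println(nextFreeInByte(markByte, 0)) // first free slot overall
	fmt.Println(nextFreeInByte(markByte, 4)) // first free slot at or after 4
	fmt.Println(nextFreeInByte(0xff, 0))     // nothing free in this byte
}
```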
      
      Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
      Reviewed-on: https://go-review.googlesource.com/20169
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: replace ref with allocCount · e4ac2d4a
      Rick Hudson authored
      This is a renaming of the field ref to the
      more appropriate allocCount. The field
      holds the number of objects in the span
      that are currently allocated. Some throw
      strings were adjusted to more accurately
      convey the meaning of allocCount.
      
      Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
      Reviewed-on: https://go-review.googlesource.com/19518
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: allocate directly from GC mark bits · 3479b065
      Rick Hudson authored
      Instead of building a freelist from the mark bits generated
      by the GC this CL allocates directly from the mark bits.
      
      The approach moves the mark bits from the pointer/no pointer
      heap structures into their own per span data structures. The
      mark/allocation vectors consist of a single mark bit per
      object. Two vectors are maintained, one for allocation and
      one for the GC's mark phase. During the GC cycle's sweep
      phase the interpretation of the vectors is swapped. The
      mark vector becomes the allocation vector and the old
      allocation vector is cleared and becomes the mark vector that
      the next GC cycle will use.
      
      Marked entries in the allocation vector indicate that the
      object is not free. Each allocation vector maintains a boundary
      between areas of the span already allocated from and areas
      not yet allocated from. As objects are allocated this boundary
      is moved until it reaches the end of the span. At this point
      further allocations will be done from another span.
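
The swap of the two vectors and the moving allocation boundary can be sketched as below. The `span` type, the `sweep` method, and `nextFree` are simplified, hypothetical models of the scheme described above.

```go
package main

import "fmt"

// span is a hypothetical model with one mark bit and one alloc bit per object.
type span struct {
	allocBits  []uint8 // 1 = not free; consulted by the allocator
	gcmarkBits []uint8 // 1 = reached by the current GC cycle's mark phase
	freeIndex  int     // boundary: slots below it were already considered
	nelems     int     // number of objects in the span
}

// sweep swaps the roles of the two vectors: the mark results become the
// allocation vector, and the old allocation vector is cleared for reuse
// as the next cycle's mark vector.
func (s *span) sweep() {
	s.allocBits, s.gcmarkBits = s.gcmarkBits, s.allocBits
	for i := range s.gcmarkBits {
		s.gcmarkBits[i] = 0
	}
	s.freeIndex = 0
}

// nextFree advances the boundary to the next unmarked slot and returns its
// index, or -1 once the boundary reaches the end of the span.
func (s *span) nextFree() int {
	for s.freeIndex < s.nelems {
		i := s.freeIndex
		s.freeIndex++
		if s.allocBits[i/8]&(uint8(1)<<(uint(i)%8)) == 0 {
			return i
		}
	}
	return -1
}

func main() {
	s := &span{
		allocBits:  []uint8{0xff}, // everything was allocated before the GC
		gcmarkBits: []uint8{0xa5}, // the GC marked objects 0, 2, 5 and 7
		nelems:     8,
	}
	s.sweep()
	for i := s.nextFree(); i >= 0; i = s.nextFree() {
		fmt.Println("allocating slot", i)
	}
}
```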
      
      Since we no longer sweep a span inspecting each freed object,
      the responsibility for maintaining pointer/scalar bits in
      the heapBitMap now falls to the routines doing the actual
      allocation.
      
      This CL is functionally complete and ready for performance
      tuning.
      
      Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
      Reviewed-on: https://go-review.googlesource.com/19470
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: mark/allocation helper functions · dc65a82e
      Rick Hudson authored
      The gcmarkBits is a bit vector used by the GC to mark
      reachable objects. Once a GC cycle is complete the gcmarkBits
      swap places with the allocBits. allocBits is then used directly
      by malloc to locate free objects, thus avoiding the
      construction of a linked free list. This CL introduces a set
      of helper functions for manipulating gcmarkBits and allocBits
      that will be used by later CLs to realize the actual
      algorithm. Minimal attempts have been made to optimize these
      helper routines.
      
      Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
      Reviewed-on: https://go-review.googlesource.com/19420
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Rick Hudson <rlh@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • [dev.garbage] runtime: refactor next free object · e1c4e9a7
      Rick Hudson authored
      In preparation for changing how the next free object is chosen
      refactor and consolidate code into a single function.
      
      Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
      Reviewed-on: https://go-review.googlesource.com/19317
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: add stackfreelist · aed86103
      Rick Hudson authored
      The freelist for normal objects and the freelist
      for stacks share the same mspan field for holding
      the list head but are operated on by different code
      sequences. This overloading complicates the use of bit
      vectors for allocation of normal objects. This change
      refactors the use of the stackfreelist out from the
      use of freelist.
      
      Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0
      Reviewed-on: https://go-review.googlesource.com/19315
      Reviewed-by: Austin Clements <austin@google.com>
    • [dev.garbage] runtime: bitmap allocation data structs · 2ac8bdc5
      Rick Hudson authored
      Add prototypes of the bitmap allocation data structures. Before
      this is released these underlying data structures need
      to be more performant but the signatures of helper
      functions utilizing these structures will remain stable.
      
      Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
      Reviewed-on: https://go-review.googlesource.com/19221
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile/internal/gc: remove oconv(op, 0) calls · d3c79d32
      Dave Cheney authored
      Updates #15462
      
      Automatic refactor with sed -e.
      
      Replace all oconv(op, 0) string conversions with the raw op value,
      which fmt's %v verb can print directly.
      
      The remaining oconv(op, FmtSharp) will be replaced with op.GoString and
      %#v in the next CL.
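
The reason the raw op value can be passed straight to fmt is that `%v` calls a value's `String` method and `%#v` calls its `GoString` method when they exist. The sketch below shows this with a hypothetical `Op` type; its names and output strings are assumptions for illustration and are not taken from the compiler.

```go
package main

import "fmt"

// Op is a hypothetical opcode type standing in for the compiler's Op.
type Op uint8

const (
	OADD Op = iota
	OSUB
)

var opNames = [...]string{OADD: "ADD", OSUB: "SUB"}

// String is what fmt's %v verb calls (fmt.Stringer).
func (o Op) String() string {
	if int(o) < len(opNames) {
		return opNames[o]
	}
	return fmt.Sprintf("Op(%d)", uint8(o))
}

// GoString is what fmt's %#v verb calls (fmt.GoStringer).
func (o Op) GoString() string {
	return "O" + o.String()
}

// describe and describeSharp format an Op with no explicit conversion call.
func describe(o Op) string      { return fmt.Sprintf("%v", o) }
func describeSharp(o Op) string { return fmt.Sprintf("%#v", o) }

func main() {
	fmt.Println(describe(OADD), describeSharp(OSUB))
}
```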
      
      Change-Id: I5e2f7ee0bd35caa65c6dd6cb1a866b5e4519e641
      Reviewed-on: https://go-review.googlesource.com/22499
      Run-TryBot: Dave Cheney <dave@cheney.net>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • net: search domain from hostname if no search directives · cbd72318
      Dan Peterson authored
      Fixes #14897
      
      Change-Id: Iffe7462983a5623a37aa0dc6f74c8c70e10c3244
      Reviewed-on: https://go-review.googlesource.com/21464
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • syscall: fix uint64->int cast of control message header · 4edb40d4
      Damien Neil authored
      Change-Id: I28980b307d10730b122a4f833809bc400d6aff24
      Reviewed-on: https://go-review.googlesource.com/22525
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • misc/cgo/testcarchive: fix path of libgo.a for darwin/arm · 78bcdeb6
      Cherry Zhang authored
      After CL 22461, c-archive build on darwin/arm is by default compiled
      with -shared, so update the install path.
      
      Fix build.
      
      Change-Id: Ie93dbd226ed416b834da0234210f4b98bc0e3606
      Reviewed-on: https://go-review.googlesource.com/22507
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: don't rescan globals · b49b71ae
      Austin Clements authored
      Currently the runtime rescans globals during mark 2 and mark
      termination. This costs as much as 500µs/MB in STW time, which is
      enough to surpass the 10ms STW limit with only 20MB of globals.
      
      It's also basically unnecessary. The compiler already generates write
      barriers for global -> heap pointer updates and the regular write
      barrier doesn't check whether the slot is a global or in the heap.
      Some less common write barriers do cause problems.
      heapBitsBulkBarrier, which is used by typedmemmove and related
      functions, currently depends on having access to the pointer bitmap
      and as a result ignores writes to globals. Likewise, the
      reflect-related write barriers reflect_typedmemmovepartial and
      callwritebarrier ignore non-heap destinations; though it appears they
      can never be called with global pointers anyway.
      
      This commit makes heapBitsBulkBarrier issue write barriers for writes
      to global pointers using the data and BSS pointer bitmaps, removes the
      inheap checks from the reflection write barriers, and eliminates the
      rescans during mark 2 and mark termination. It also adds a test that
      writes to globals have write barriers.
      
      Programs with large data+BSS segments (with pointers) aren't common,
      but for programs that do have large data+BSS segments, this
      significantly reduces pause time:
      
      name \ 95%ile-time/markTerm              old         new  delta
      LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)
      
      This very slightly improves the go1 benchmarks:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
      Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
      FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
      FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
      FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
      FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
      FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
      GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
      GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
      Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
      Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
      HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
      JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
      JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
      Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
      GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
      RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
      RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
      RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
      RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
      RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
      RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
      RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
      RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
      Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
      Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
      TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
      TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
      [Geo mean]                  55.3µs         55.0µs       -0.41%
      
      Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
      Reviewed-on: https://go-review.googlesource.com/20829
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: make {add,subtract}{b,1} nosplit · 30172f18
      Austin Clements authored
      These are used at the bottom level of various GC operations that must
      not be preempted. To be on the safe side, mark them all nosplit.
      
      Change-Id: I8f7360e79c9852bd044df71413b8581ad764380c
      Reviewed-on: https://go-review.googlesource.com/22504
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • reflect: fix strings of SliceOf-created types · bddfc337
      David Crawshaw authored
      The new type was inheriting the tflagExtraStar from its prototype.
      
      Fixes #15467
      
      Change-Id: Ic22c2a55cee7580cb59228d52b97e1c0a1e60220
      Reviewed-on: https://go-review.googlesource.com/22501
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
    • reflect: unnamed interface types have no name · 217be5b3
      David Crawshaw authored
      Fixes #15468
      
      Change-Id: I8723171f87774a98d5e80e7832ebb96dd1fbea74
      Reviewed-on: https://go-review.googlesource.com/22524
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
    • cmd/compile: enable const division for arm64 · 74a9bad6
      Zhongwei Yao authored
      performance:
      benchmark                   old ns/op     new ns/op     delta
      BenchmarkDivconstI64-8      8.28          2.70          -67.39%
      BenchmarkDivconstU64-8      8.28          4.69          -43.36%
      BenchmarkDivconstI32-8      8.28          6.39          -22.83%
      BenchmarkDivconstU32-8      8.28          4.43          -46.50%
      BenchmarkDivconstI16-8      5.17          5.17          +0.00%
      BenchmarkDivconstU16-8      5.33          5.34          +0.19%
      BenchmarkDivconstI8-8       3.50          3.50          +0.00%
      BenchmarkDivconstU8-8       3.51          3.50          -0.28%
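
For context, const division generally means replacing a divide by a compile-time constant with a multiply and a shift. The sketch below shows that general technique for unsigned 32-bit division by 3 using the well-known multiplier 0xAAAAAAAB; it is an illustration only and does not reproduce the instruction sequence the arm64 backend emits.

```go
package main

import "fmt"

// div3 computes x / 3 without a divide instruction.
// 0xAAAAAAAB is ceil(2^33 / 3), so multiplying and shifting right by 33
// gives the exact quotient for every 32-bit unsigned x.
func div3(x uint32) uint32 {
	return uint32((uint64(x) * 0xAAAAAAAB) >> 33)
}

func main() {
	for _, x := range []uint32{0, 1, 2, 3, 100, 1 << 31, 0xFFFFFFFF} {
		fmt.Println(x, div3(x), div3(x) == x/3)
	}
}
```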
      
      Fixes #15382
      
      Change-Id: Ibce7b28f0586d593b33c4d4ecc5d5e7e7c905d13
      Reviewed-on: https://go-review.googlesource.com/22292
      Reviewed-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: switch to compact export format by default · 7538b1db
      Robert Griesemer authored
      builtin.go was auto-generated via go generate; all other
      changes were manual.
      
      The new format reduces the export data size by ~65% on average
      for the std library packages (and there is still quite a bit of
      room for improvement).
      
      The average time to write export data is reduced by (at least)
      62% as measured in one run over the std lib; it is likely more.
      
      The average time to read import data is reduced by (at least)
      37% as measured in one run over the std lib; it is likely more.
      There is also room to improve this time.
      
      The compiler transparently handles both packages using the old
      and the new format.
      
      Comparing the -S output of the go build for each package via
      the cmp.bash script (added) shows identical assembly code for
      all packages, but 6 files show file:line differences:
      
      The following files have differences because they use cgo
      and cgo uses different temp. directories for different builds.
      Harmless.
      
      	src/crypto/x509
      	src/net
      	src/os/user
      	src/runtime/cgo
      
      The following files have file:line differences that are not yet
      fully explained; however the differences exist w/ and w/o new export
      format (pre-existing condition). See issue #15453.
      
      	src/go/internal/gccgoimporter
      	src/go/internal/gcimporter
      
      In summary, switching to the new export format produces the same
      package files as before for all practical purposes.
      
      How can you tell which one you have (if you care): Open a package
      (.a) file in an editor. Textual export data starts with a $$ after
      the header and is more or less legible; binary export data starts
      with a $$B after the header and is mostly unreadable. A stand-alone
      decoder (for debugging) is in the works.
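
      The same check can be sketched in Go. This is a minimal
      illustration, assuming the marker is the first "$$" found at the
      start of a line; the function name exportFormat and the sample
      inputs are hypothetical, and real package files contain more
      structure than this handles.

```go
package main

import (
	"bytes"
	"fmt"
)

// exportFormat reports which export data marker appears first in data.
// Assumption: the marker is the first "$$" that begins a line.
func exportFormat(data []byte) string {
	marker := []byte("\n$$")
	i := bytes.Index(data, marker)
	if i < 0 {
		return "unknown"
	}
	rest := data[i+len(marker):]
	if bytes.HasPrefix(rest, []byte("B")) {
		return "binary" // "$$B" introduces binary export data
	}
	return "textual" // plain "$$" introduces textual export data
}

func main() {
	// Made-up inputs standing in for the contents of a package file.
	fmt.Println(exportFormat([]byte("header\n$$\npackage p\n")))
	fmt.Println(exportFormat([]byte("header\n$$B\n\x00\x01\x02")))
}
```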
      
      In case of a problem, please first try reverting back to the old
      textual format to determine if the cause is the new export format:
      
      For a stand-alone compiler invocation:
      - go tool compile -newexport=0 <files>
      
      For a single package:
      - go build -gcflags="-newexport=0" <pkg>
      
      For make/all.bash:
      - (export GO_GCFLAGS="-newexport=0"; sh make.bash)
      
      Fixes #13241.
      
      Change-Id: I2588cb463be80af22446bf80c225e92ab79878b8
      Reviewed-on: https://go-review.googlesource.com/22123
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Robert Griesemer <gri@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • regexp: add a harder regexp to the benchmarks · 70d95a48
      Michael Matloob authored
      This regexp has many parallel alternations
      
      Change-Id: I8044f460aa7d18f20cb0452e9470557b87facd6d
      Reviewed-on: https://go-review.googlesource.com/22471
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/link: remove absolute address for c-archive on darwin/arm · 9629f55f
      Cherry Zhang authored
      Now it is possible to build a c-archive as PIC on darwin/arm (this is
      now the default). Then the system linker can link the binary using
      the archive as PIE.
      
      Fixes #12896.
      
      Change-Id: Iad84131572422190f5fa036e7d71910dc155f155
      Reviewed-on: https://go-review.googlesource.com/22461
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
    • cmd/compile: don't write pos info for builtin packages · 86c93c98
      Robert Griesemer authored
      TestBuiltin will fail if run on Windows and builtin.go was generated
      on a non-Windows machine (or vice versa) because path names have
      different separators. Avoid the problem altogether by not writing pos
      info for builtin packages. It's not needed.
      
      Affects -newexport only.
      
      Change-Id: I8944f343452faebaea9a08b5fb62829bed77c148
      Reviewed-on: https://go-review.googlesource.com/22498
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: don't use line numbers from ONAME and named OLITERALs · a19e60b2
      Keith Randall authored
      The line numbers of ONAMEs are the location of their
      declaration, not their use.
      
      The line numbers of named OLITERALs are also the location
      of their declaration.
      
      Ignore both of these.  Instead, we will inherit the line number from
      the containing syntactic item.
      
      Fixes #14742
      Fixes #15430
      
      Change-Id: Ie43b5b9f6321cbf8cead56e37ccc9364d0702f2f
      Reviewed-on: https://go-review.googlesource.com/22479
      Reviewed-by: Robert Griesemer <gri@golang.org>
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/asm: fix SIMD register name on arm64 · c9389a10
      Zhongwei Yao authored
      Current V-register range is V32~V63 on arm64. This patch changes it to
      V0~V31.
      
      Fixes #15465.
      
      Change-Id: I90dab42dea46825ec5d7a8321ec4f6550735feb8
      Reviewed-on: https://go-review.googlesource.com/22520
      Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
      Run-TryBot: Aram Hăvărneanu <aram@mgk.ro>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime/race: improve TestNoRaceIOHttp test · 6dfba5c7
      Dmitry Vyukov authored
      TestNoRaceIOHttp does all kinds of bad things:
      1. Binds to a fixed port, so concurrent tests fail.
      2. Registers HTTP handler multiple times, so repeated tests fail.
      3. Relies on sleep to wait for listen.
      
      Fix all of that.
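
      One way to avoid all three problems is sketched below. It is a
      minimal illustration, not the code from this change: the function
      name serveAndFetch is hypothetical. It listens on port 0 so the OS
      picks a free port, registers the handler on a private ServeMux
      rather than the global one, and needs no sleep because the listener
      is already bound before the client connects.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// serveAndFetch starts a throwaway HTTP server and fetches "/" from it.
func serveAndFetch() (string, error) {
	// Port 0 asks the OS for any free port, so concurrent runs do not collide.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// A private mux avoids registering on http.DefaultServeMux,
	// which panics if the same pattern is registered twice.
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	go http.Serve(ln, mux)

	// The listener is already accepting connections, so no sleep is needed.
	resp, err := http.Get("http://" + ln.Addr().String() + "/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	fmt.Println(serveAndFetch())
}
```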
      
      Change-Id: I1210b7797ef5e92465b37dc407246d92a2a24fe8
      Reviewed-on: https://go-review.googlesource.com/19953
      Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • image/color: optimize RGBToYCbCr · 102cf2ae
      Martin Möhrmann authored
      Apply optimizations used to speed up YCbCrToRGB from
      https://go-review.googlesource.com/#/c/21910/
      to RGBToYCbCr.
      
      name             old time/op  new time/op  delta
      RGBToYCbCr/0-2   6.81ns ± 0%  5.96ns ± 0%  -12.48%  (p=0.000 n=38+50)
      RGBToYCbCr/Cb-2  7.68ns ± 0%  6.13ns ± 0%  -20.21%  (p=0.000 n=50+33)
      RGBToYCbCr/Cr-2  6.84ns ± 0%  6.04ns ± 0%  -11.70%  (p=0.000 n=39+42)
      
      Updates #15260
      
      Change-Id: If3ea5393ae371a955ddf18ab226aae20b48f9692
      Reviewed-on: https://go-review.googlesource.com/22411
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ralph Corderoy <ralph@inputplus.co.uk>
    • cmd/compile/internal: unexport gc.Oconv · 8f2e780e
      Dave Cheney authored
      Updates #15462
      
      Semi-automatic change with gofmt -r and hand fixups for callers outside
      internal/gc.
      
      All the uses of gc.Oconv outside cmd/compile/internal/gc were for the
      Oconv(op, 0) form, which is already handled by the Op.String method.
      
      Replace the use of gc.Oconv(op, 0) with op itself, which will call
      Op.String via the %v or %s verb. Unexport Oconv.
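
      The mechanism relied on here is that fmt calls a value's String
      method for the %v and %s verbs. A minimal sketch follows; the Op
      type, its constants, and the describe helper are hypothetical
      stand-ins, not the compiler's definitions.

```go
package main

import "fmt"

// Op is a hypothetical stand-in for the compiler's operation type.
type Op uint8

const (
	OADD Op = iota
	OSUB
)

var opNames = [...]string{OADD: "ADD", OSUB: "SUB"}

// String makes Op satisfy fmt.Stringer, so %v and %s print the name.
func (o Op) String() string {
	if int(o) < len(opNames) {
		return opNames[o]
	}
	return fmt.Sprintf("Op(%d)", uint8(o))
}

// describe passes op directly; no explicit conversion helper is needed.
func describe(op Op) string {
	return fmt.Sprintf("unexpected op %v", op)
}

func main() {
	fmt.Println(describe(OADD))
	fmt.Println(describe(Op(9)))
}
```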
      
      Change-Id: I84da2a2e4381b35f52efce427b2d6a3bccdf2526
      Reviewed-on: https://go-review.googlesource.com/22496
      Run-TryBot: Dave Cheney <dave@cheney.net>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: fix opnames · 707aed03
      Josh Bleecher Snyder authored
      Change-Id: Ief4707747338912216a8509b1adbf655c8ffac56
      Reviewed-on: https://go-review.googlesource.com/22495
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • net/http: remove idle transport connections from Transport when server closes · 2e302182
      Brad Fitzpatrick authored
      Previously the Transport would cache idle connections for later
      reuse, but if a peer server disconnected
      (e.g. idle timeout), we would not proactively remove the *persistConn
      from the Transport's idle list, leading to a waste of memory
      (potentially forever).
      
      Instead, when the persistConn's readLoop terminates, remove it from
      the idle list, if present.
      
      This also adds the beginning of accounting for the total number of
      idle connections, which will be needed for Transport.MaxIdleConns
      later.
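
      The idea can be sketched with a simplified idle list. Everything in
      this example (the conn and idlePool types and their methods) is
      hypothetical and much simpler than the real Transport; it only
      shows removal on read-loop exit plus a running total of idle
      connections.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a persistent connection.
type conn struct{ key string }

// idlePool is a hypothetical, simplified idle list keyed by destination.
type idlePool struct {
	mu    sync.Mutex
	idle  map[string][]*conn
	total int // idle connections across all keys
}

// put records c as idle and available for reuse.
func (p *idlePool) put(c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.idle == nil {
		p.idle = make(map[string][]*conn)
	}
	p.idle[c.key] = append(p.idle[c.key], c)
	p.total++
}

// remove drops c from the idle list if present; otherwise it does nothing.
func (p *idlePool) remove(c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	list := p.idle[c.key]
	for i, v := range list {
		if v != c {
			continue
		}
		list = append(list[:i], list[i+1:]...)
		if len(list) == 0 {
			delete(p.idle, c.key)
		} else {
			p.idle[c.key] = list
		}
		p.total--
		return
	}
}

// readLoop stands in for the per-connection read loop. When the peer
// closes the connection the loop ends and the conn leaves the idle list.
func (p *idlePool) readLoop(c *conn, closed <-chan struct{}) {
	defer p.remove(c)
	<-closed
}

func main() {
	p := &idlePool{}
	c := &conn{key: "example.com:80"}
	p.put(c)

	closed := make(chan struct{})
	done := make(chan struct{})
	go func() { p.readLoop(c, closed); close(done) }()

	close(closed) // simulate the server dropping the idle connection
	<-done
	fmt.Println("idle connections:", p.total)
}
```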
      
      Updates #15461
      
      Change-Id: Iab091f180f8dd1ee0d78f34b9705d68743b5557b
      Reviewed-on: https://go-review.googlesource.com/22492
      Reviewed-by: Andrew Gerrand <adg@golang.org>