1. 01 May, 2016 18 commits
  2. 30 Apr, 2016 7 commits
    • net/http: expand documentation of Server.MaxHeaderBytes · 38cfaa5f
      Brad Fitzpatrick authored
      Clarify that it includes the RFC 7230 "request-line".
      
      Fixes #15494
      
      Change-Id: I9cc5dd5f2d85ebf903229539208cec4da5c38d04
      Reviewed-on: https://go-review.googlesource.com/22656
      Reviewed-by: Andrew Gerrand <adg@golang.org>
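
      For the change above, a minimal sketch of where MaxHeaderBytes applies
      (the limit covers the request-line plus the header block; the 1 << 16
      value and the handler are illustrative choices, not recommendations):

      package main

      import (
          "io"
          "log"
          "net/http"
      )

      func main() {
          srv := &http.Server{
              Addr: ":8080",
              // MaxHeaderBytes bounds the total size of the request header,
              // including the RFC 7230 request-line (method, URI, protocol).
              MaxHeaderBytes: 1 << 16,
              Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                  io.WriteString(w, "ok\n")
              }),
          }
          // Requests whose request-line plus headers exceed the limit are
          // rejected before the handler ever runs.
          log.Fatal(srv.ListenAndServe())
      }
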
    • database/sql: clone data for named []byte types · 4e0cd1ee
      Kevin Burke authored
      Previously named byte types like json.RawMessage could get dirty
      database memory from a call to Scan. These types would activate a
      code path that didn't clone the byte data coming from the database
      before assigning it. Another thread could then overwrite the byte
      array in src, which has unexpected consequences.
      
      Originally reported by Jason Moiron; the patch and test are his
      suggestions. Fixes #13905.
      
      Change-Id: Iacfef61cbc9dd51c8fccef9b2b9d9544c77dd0e0
      Reviewed-on: https://go-review.googlesource.com/22393
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
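
      A small, self-contained sketch of the aliasing hazard this change
      guards against (buffer and variable names are illustrative, not the
      database/sql internals): without a defensive copy of the kind the fix
      makes, a named []byte value silently tracks the driver's reused buffer.

      package main

      import (
          "encoding/json"
          "fmt"
      )

      func main() {
          // Pretend this is a driver-owned buffer that will be reused for
          // the next row.
          driverBuf := []byte(`{"n":1}`)

          aliased := json.RawMessage(driverBuf)                        // shares driverBuf's memory
          cloned := json.RawMessage(append([]byte(nil), driverBuf...)) // defensive copy

          driverBuf[5] = '9' // the driver overwrites its buffer

          fmt.Println(string(aliased)) // {"n":9} — dirty database memory
          fmt.Println(string(cloned))  // {"n":1} — unaffected
      }
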
    • runtime: reclaim scan/dead bit in first word · a20fd1f6
      Austin Clements authored
      With the switch to separate mark bitmaps, the scan/dead bit for the
      first word of each object is now unused. Reclaim this bit and use it
      as a scan/dead bit, just like words three and on. The second word is
      still used for checkmark.
      
      This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
      since they no longer need different cases for 1, 2, and 3+ word
      objects. They can instead just manipulate the heap bitmap for the
      first word and be done with it.
      
      In order to enable this, we change heapBitsSetType and runGCProg to
      always set the scan/dead bit to scan for the first word on every code
      path. Since these functions only apply to types that have pointers,
      there's no need to do this conditionally: it's *always* necessary to
      set the scan bit in the first word.
      
      We also change every place that scans an object and checks if there
      are more pointers. Rather than only checking morePointers if the word
      is >= 2, we now check morePointers if word != 1 (since that's the
      checkmark word).
      
      Looking forward, we should probably reclaim the checkmark bit, too,
      but that's going to be quite a bit more work.
      
      Tested by setting doubleCheck in heapBitsSetType and running all.bash
      on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
      
      This particularly improves the FmtFprintf* go1 benchmarks, since they
      do a large amount of noscan allocation.
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
      Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
      FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
      FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
      FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
      FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
      FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
      FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
      GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
      GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
      Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
      Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
      HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
      JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
      JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
      Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
      GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
      RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
      RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
      RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
      RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
      RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
      RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
      RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
      Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
      Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
      TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
      TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
      [Geo mean]                  52.1µs         51.5µs       -1.00%
      
      Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
      Reviewed-on: https://go-review.googlesource.com/22632
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
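
      A toy model of the scan-loop shape described above (invented types,
      not the runtime's heap bitmap): the scan/dead bit is consulted for
      every word of an object except word 1, which stays reserved for
      checkmarks.

      package main

      import "fmt"

      type wordBits struct {
          pointer      bool // this word holds a pointer
          morePointers bool // scan/dead: pointers may still follow at or after this word
      }

      // scanObject walks one object's words and counts the pointers it would
      // enqueue, stopping early once the scan/dead bit says the rest is dead.
      func scanObject(obj []wordBits) (found int) {
          for i, b := range obj {
              if i != 1 && !b.morePointers { // word 1 is the checkmark word
                  break // no pointers remain in this object
              }
              if b.pointer {
                  found++ // a real scanner would enqueue the pointee here
              }
          }
          return found
      }

      func main() {
          obj := []wordBits{
              {pointer: true, morePointers: true}, // word 0: its scan/dead bit is now meaningful
              {pointer: false},                    // word 1: checkmark word, bit not consulted
              {pointer: true, morePointers: true}, // word 2
              {pointer: false},                    // word 3: scan/dead clear, scanning stops
              {pointer: true, morePointers: true}, // never reached
          }
          fmt.Println("pointers found:", scanObject(obj)) // 2
      }
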
    • runtime: use morePointers and isPointer in more places · d5e3d08b
      Austin Clements authored
      This makes this code better self-documenting and makes it easier to
      find these places in the future.
      
      Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
      Reviewed-on: https://go-review.googlesource.com/22631
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: avoid conditional execution in morePointers and isPointer · a5d3f7ec
      Austin Clements authored
      heapBits.bits is carefully written to produce good machine code. Use
      it in heapBits.morePointers and heapBits.isPointer to get good machine
      code there, too.
      
      Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
      Reviewed-on: https://go-review.googlesource.com/22630
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
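
      A generic illustration of the "avoid conditional execution" idea (not
      the runtime's heapBits code; bitScan here is a local constant for the
      example): returning the masked comparison directly gives the compiler
      a straight-line flag test rather than an if/else.

      package main

      import "fmt"

      const bitScan = 1 << 4 // example bit position only

      // branchy spells the test out with an if, inviting a conditional jump.
      func branchy(bits uint8) bool {
          if bits&bitScan != 0 {
              return true
          }
          return false
      }

      // branchFree returns the comparison itself, which typically compiles
      // to a test-and-set-flags sequence with no branch.
      func branchFree(bits uint8) bool {
          return bits&bitScan != 0
      }

      func main() {
          fmt.Println(branchy(bitScan), branchFree(bitScan)) // true true
          fmt.Println(branchy(0), branchFree(0))             // false false
      }
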
    • cmd/compile: ecx is reserved for PIC, don't let peep work on it · 7a60a962
      Keith Randall authored
      Fixes #15496
      
      Change-Id: Ieb5be1caa4b1c23e23b20d56c1a0a619032a9f5d
      Reviewed-on: https://go-review.googlesource.com/22652
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • runtime: fix cgocallback_gofunc on ppc64x · 58f52cbb
      Michael Munday authored
      Fix issues introduced in 5f9a870b.
      
      Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
      Reviewed-on: https://go-review.googlesource.com/22661
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  3. 29 Apr, 2016 15 commits
    • runtime: fix cgocallback_gofunc argument passing on arm64 · 9fe572e5
      Ian Lance Taylor authored
      Change-Id: I4b34bcd5cde71ecfbb352b39c4231de6168cc7f3
      Reviewed-on: https://go-review.googlesource.com/22651
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Michael Munday <munday@ca.ibm.com>
    • root: remove dev.garbage file · 36b6c038
      Matthew Dempsky authored
      Change-Id: I99b2ca52824341d986090f5c78ab4f396594bcdf
      Reviewed-on: https://go-review.googlesource.com/22660
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/cgo, runtime, runtime/cgo: use cgo context function · 5f9a870b
      Ian Lance Taylor authored
      Add support for the context function set by runtime.SetCgoTraceback.
      The context function was added in CL 17761, without support.
      This CL is the support.
      
      This CL has not been tested for real C code, as a working context
      function for C code requires unwind support that does not seem to exist.
      I wanted to get the CL out before the freeze.
      
      I apologize for the length of this CL.  It's mostly plumbing, but
      unfortunately the plumbing is processor-specific.
      
      Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
      Reviewed-on: https://go-review.googlesource.com/22508
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • crypto/cipher, crypto/aes: add s390x implementation of AES-CTR · c717675c
      Michael Munday authored
      This commit adds the new 'ctrAble' interface to the crypto/cipher
      package. The role of ctrAble is the same as gcmAble but for CTR
      instead of GCM. It allows block ciphers to provide optimized CTR
      implementations.
      
      The primary benefit of adding CTR support to the s390x AES
      implementation is that it allows us to encrypt the counter values
      in bulk, giving the cipher message instruction a larger chunk of
      data to work on per invocation.
      
      The xorBytes assembly is necessary because xorBytes becomes a
      bottleneck when CTR is done in this way. Hopefully it will be
      possible to remove this once s390x has migrated to the ssa
      backend.
      
      name      old speed     new speed     delta
      AESCTR1K  160MB/s ± 6%  867MB/s ± 0%  +442.42%  (p=0.000 n=9+10)
      
      Change-Id: I1ae16b0ce0e2641d2bdc7d7eabc94dd35f6e9318
      Reviewed-on: https://go-review.googlesource.com/22195
      Reviewed-by: Adam Langley <agl@golang.org>
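
      The ctrAble hook is invisible to callers: cipher.NewCTR type-asserts
      the block cipher for it and, when present (as on s390x AES after this
      change), delegates to the optimized implementation. A runnable sketch
      of the unchanged caller side:

      package main

      import (
          "crypto/aes"
          "crypto/cipher"
          "crypto/rand"
          "fmt"
          "io"
      )

      func main() {
          key := make([]byte, 16)
          iv := make([]byte, aes.BlockSize)
          if _, err := io.ReadFull(rand.Reader, key); err != nil {
              panic(err)
          }
          if _, err := io.ReadFull(rand.Reader, iv); err != nil {
              panic(err)
          }

          block, err := aes.NewCipher(key)
          if err != nil {
              panic(err)
          }

          // On s390x the AES block now satisfies the unexported ctrAble
          // interface and NewCTR returns the assembly-backed stream; on
          // other platforms the generic counter-mode code runs instead.
          stream := cipher.NewCTR(block, iv)

          plaintext := []byte("the optimization is transparent to callers")
          ciphertext := make([]byte, len(plaintext))
          stream.XORKeyStream(ciphertext, plaintext)
          fmt.Printf("%x\n", ciphertext)
      }
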
    • crypto/cipher, crypto/aes: add s390x implementation of AES-CBC · 2f847564
      Michael Munday authored
      This commit adds the cbcEncAble and cbcDecAble interfaces that
      can be implemented by block ciphers that support an optimized
      implementation of CBC. This is similar to what is done for GCM
      with the gcmAble interface.
      
      The cbcEncAble, cbcDecAble and gcmAble interfaces all now have
      tests to ensure they are detected correctly in the cipher
      package.
      
      name             old speed     new speed      delta
      AESCBCEncrypt1K  152MB/s ± 1%  1362MB/s ± 0%  +795.59%   (p=0.000 n=10+9)
      AESCBCDecrypt1K  143MB/s ± 1%  1362MB/s ± 0%  +853.00%   (p=0.000 n=10+9)
      
      Change-Id: I715f686ab3686b189a3dac02f86001178fa60580
      Reviewed-on: https://go-review.googlesource.com/22523
      Run-TryBot: Michael Munday <munday@ca.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
    • cmd/compile: make vet happy with ssa code · cd956576
      Keith Randall authored
      Fixes #15488
      
      Change-Id: I054eb1e1c859de315e3cdbdef5428682bce693fd
      Reviewed-on: https://go-review.googlesource.com/22609
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • Merge remote-tracking branch 'origin/dev.garbage' · 56b54912
      Rick Hudson authored
      This commit moves the GC from free list allocation to
      bit mark allocation. Instead of using the bitmaps
      generated during the mark phase to build free lists
      and then allocating from those free lists, we now
      allocate directly from the bitmaps.
      
      The change in the garbage benchmark
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.22ms ± 1%  2.13ms ± 1%  -3.90%  (p=0.000 n=18+18)
      
      Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa
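
      A toy of the allocation scheme this merge describes (invented types,
      not the runtime's mspan): the next object comes straight from a
      bitmap word, with no intermediate free list.

      package main

      import (
          "fmt"
          "math/bits"
      )

      // toySpan tracks 64 object slots with a single bitmap word: bit i set
      // means slot i is live (marked) or already handed out.
      type toySpan struct {
          allocBits uint64
      }

      // nextFree claims the lowest clear bit, consulting the mark bitmap
      // directly instead of a free list built from it.
      func (s *toySpan) nextFree() (slot int, ok bool) {
          free := ^s.allocBits
          if free == 0 {
              return 0, false // span is full
          }
          slot = bits.TrailingZeros64(free)
          s.allocBits |= 1 << uint(slot)
          return slot, true
      }

      func main() {
          s := &toySpan{allocBits: 1<<0 | 1<<1 | 1<<3} // slots 0, 1, 3 survived marking
          for i := 0; i < 3; i++ {
              slot, _ := s.nextFree()
              fmt.Println("allocated slot", slot) // 2, then 4, then 5
          }
      }
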
    • [dev.garbage] runtime: simplify nextFreeFast so it is inlined · e9eaa181
      Rick Hudson authored
      nextFreeFast is currently not inlined by the compiler due
      to its size and complexity. This CL simplifies
      nextFreeFast by letting the slow path (nextFree)
      handle the corner cases.
      
      Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
      Reviewed-on: https://go-review.googlesource.com/22598
      Reviewed-by: Austin Clements <austin@google.com>
      Run-TryBot: Austin Clements <austin@google.com>
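
      The pattern at work, as a generic sketch (toy cache variable, not the
      runtime's allocCache): keep the fast path down to one bit scan and one
      branch so the compiler will inline it, and push every corner case into
      a slow helper.

      package main

      import (
          "fmt"
          "math/bits"
      )

      var freeCache uint64 = 0xF0 // toy stand-in for cached free-slot bits

      // nextFreeFast is deliberately tiny so it stays under the inliner's
      // budget at every allocation site.
      func nextFreeFast() (slot int, ok bool) {
          if freeCache == 0 {
              return 0, false // anything unusual goes to the slow path
          }
          slot = bits.TrailingZeros64(freeCache)
          freeCache &^= 1 << uint(slot)
          return slot, true
      }

      // nextFreeSlow can be arbitrarily large: refill the cache, sweep,
      // fetch a new span, and so on. It runs rarely.
      func nextFreeSlow() int {
          freeCache = 0xFFFF // pretend a fresh span was acquired
          slot, _ := nextFreeFast()
          return slot
      }

      func alloc() int {
          if slot, ok := nextFreeFast(); ok {
              return slot
          }
          return nextFreeSlow()
      }

      func main() {
          for i := 0; i < 6; i++ {
              fmt.Print(alloc(), " ") // 4 5 6 7, then a refill yields 0 1
          }
          fmt.Println()
      }
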
    • cmd/compile: Move divconst_test out of test/bench/go1 · d8d33514
      David Chase authored
      This is necessary to avoid disrupting the go1 suite and gives
      us a place to put other tests of basic compiler function and
      correctness.
      
      Change-Id: I36933819ff2bfe6a2121fff2be9a98efd2123d9a
      Reviewed-on: https://go-review.googlesource.com/22597
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: clean up rewrite rules · fa9435cd
      Keith Randall authored
      Break really long lines.
      Add spacing to line up columns.
      
      In AMD64, put all the optimization rules after all the
      lowering rules.
      
      Change-Id: I45cc7368bf278416e67f89e74358db1bd4326a93
      Reviewed-on: https://go-review.googlesource.com/22470
      Reviewed-by: David Chase <drchase@google.com>
    • [dev.garbage] runtime: revive sweep fast path · b3579c09
      Austin Clements authored
      sweep used to skip mcentral.freeSpan (and its locking) if it didn't
      find any new free objects. We lost that optimization when the
      freed-object counting changed in dad83f7 to count total free objects
      instead of newly freed objects.
      
      The previous commit brings back counting of newly freed objects, so we
      can easily revive this optimization by checking that count (like we
      used to) instead of the total free objects count.
      
      Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
      Reviewed-on: https://go-review.googlesource.com/22596
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: fix nfree accounting · d97625ae
      Austin Clements authored
      Commit 8dda1c4c changed the meaning of "nfree" in sweep from the number
      of newly freed objects to the total number of free objects in the
      span, but didn't update where sweep added nfree to c.local_nsmallfree.
      Hence, we're over-accounting the number of frees. This is causing
      TestArrayHash to fail with "too many allocs NNN - hash not balanced".
      
      Fix this by computing the number of newly freed objects and adding
      that to c.local_nsmallfree, so it behaves like it used to. Computing
      this requires a small tweak to mallocgc: apparently we've never set
      s.allocCount when allocating a large object; fix this by setting it to
      1 so sweep doesn't get confused.
      
      Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
      Reviewed-on: https://go-review.googlesource.com/22595
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
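
      The accounting fix, reduced to its arithmetic with made-up numbers
      (illustrative variables, not the runtime's fields): the per-P counter
      wants the number of objects freed by this sweep, not the span's total
      free count.

      package main

      import "fmt"

      func main() {
          const nelems = 512 // object slots in the span
          liveBefore := 300  // allocated objects before this sweep
          liveAfter := 180   // allocated objects after this sweep

          totalFree := nelems - liveAfter      // what "nfree" became after 8dda1c4c: 332
          newlyFreed := liveBefore - liveAfter // what local_nsmallfree should grow by: 120

          // Adding totalFree over-counts every object that was already free
          // before this sweep; adding newlyFreed restores the old behaviour.
          fmt.Println(totalFree, newlyFreed)
      }
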
    • [dev.garbage] runtime: fix allocfreetrace · 6d114905
      Austin Clements authored
      We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
      when we removed the sweep over the mark bitmap. Fix it by
      re-introducing the sweep over the bitmap specifically if we're in
      allocfreetrace mode. This doesn't have to be even remotely efficient,
      since the overhead of allocfreetrace is huge anyway, so we can keep
      the code for this down to just a few lines.
      
      Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
      Reviewed-on: https://go-review.googlesource.com/22592
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • [dev.garbage] runtime: reintroduce no-zeroing optimization · 38f67468
      Austin Clements authored
      Currently we always zero objects when we allocate them. We used to
      have an optimization that would not zero objects that had not been
      allocated since the whole span was last zeroed (either by getting it
      from the system or by getting it from the heap, which does a bulk
      zero), but this depended on the sweeper clobbering the first two words
      of each object. Hence, we lost this optimization when the bitmap
      sweeper went away.
      
      Re-introduce this optimization using a different mechanism. Each span
      already keeps a flag indicating that it just came from the OS or was
      just bulk zeroed by the mheap. We can simply use this flag to know
      when we don't need to zero an object. This is slightly less efficient
      than the old optimization: if a span gets allocated and partially
      used, then GC happens and the span gets returned to the mcentral, then
      the span gets re-acquired, the old optimization knew that it only had
      to re-zero the objects that had been reclaimed, whereas this
      optimization will re-zero everything. However, in this case, you're
      already paying for the garbage collection, and you've only wasted one
      zeroing of the span, so in practice there seems to be little
      difference. (If we did want to revive the full optimization, each span
      could keep track of a frontier beyond which all free slots are zeroed.
      I prototyped this and it didn't obviously do any better than the much
      simpler approach in this commit.)
      
      This significantly improves BinaryTree17, which is allocation-heavy
      (and runs first, so most pages are already zeroed), and slightly
      improves everything else.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
      DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
      DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
      DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
      DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
      DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
      DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
      DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
      DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
      Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
      FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
      FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
      FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
      FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
      FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
      FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
      FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
      GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
      GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
      Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
      Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
      HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
      JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
      JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
      Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
      GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
      RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
      RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
      RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
      RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
      RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
      RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
      RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
      Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
      Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
      TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
      TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
      [Geo mean]                  5.99µs         5.96µs       -0.54%
      
      Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
      Reviewed-on: https://go-review.googlesource.com/22591
      Reviewed-by: Rick Hudson <rlh@golang.org>
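
      A sketch of the flag-based mechanism described above (toy types; in
      the runtime the flag lives on the span structure): memory that just
      came from the OS or was just bulk-zeroed can skip the per-object clear.

      package main

      import "fmt"

      type toySpan struct {
          mem      []byte
          needZero bool // false right after the span arrives from the OS or a bulk clear
      }

      // alloc returns size bytes at offset off, zeroing them only if the
      // span may still hold stale data from previously freed objects.
      func (s *toySpan) alloc(off, size int) []byte {
          obj := s.mem[off : off+size]
          if s.needZero {
              for i := range obj {
                  obj[i] = 0
              }
          }
          return obj
      }

      func main() {
          s := &toySpan{mem: make([]byte, 64)} // fresh memory: already zero, needZero is false
          a := s.alloc(0, 16)                  // no clearing work done
          fmt.Println(a[0])

          s.needZero = true // once objects have been freed into the span, reuse must zero
          b := s.alloc(0, 16)
          fmt.Println(b[0])
      }
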
    • compress/flate: use a constant hash table size for Best Speed. · 1fb4e4de
      Nigel Tao authored
      This makes compress/flate's version of Snappy diverge from the upstream
      golang/snappy version, but the latter has a goal of matching C++ snappy
      output byte-for-byte. Both C++ and the asm version of golang/snappy can
      use a smaller N for the O(N) zero-initialization of the hash table when
      the input is small, even if the pure Go golang/snappy algorithm cannot:
      "var table [tableSize]uint16" zeroes all tableSize elements.
      
      For this package, we don't have the match-C++-snappy goal, so we can use
      a different (constant) hash table size.
      
      This is a small win, in terms of throughput and output size, but it also
      enables us to re-use the (constant size) hash table between
      encodeBestSpeed calls, avoiding the cost of zero-initializing the hash
      table altogether. This will be implemented in follow-up commits.
      
      This package's benchmarks:
      name                    old speed      new speed      delta
      EncodeDigitsSpeed1e4-8  72.8MB/s ± 1%  73.5MB/s ± 1%  +0.86%  (p=0.000 n=10+10)
      EncodeDigitsSpeed1e5-8  77.5MB/s ± 1%  78.0MB/s ± 0%  +0.69%  (p=0.000 n=10+10)
      EncodeDigitsSpeed1e6-8  82.0MB/s ± 1%  82.7MB/s ± 1%  +0.85%   (p=0.000 n=10+9)
      EncodeTwainSpeed1e4-8   65.1MB/s ± 1%  65.6MB/s ± 0%  +0.78%   (p=0.000 n=10+9)
      EncodeTwainSpeed1e5-8   80.0MB/s ± 0%  80.6MB/s ± 1%  +0.66%   (p=0.000 n=9+10)
      EncodeTwainSpeed1e6-8   81.6MB/s ± 1%  82.1MB/s ± 1%  +0.55%  (p=0.017 n=10+10)
      
      Input size in bytes, output size (and time taken) before and after on
      some larger files:
      1073741824   57269781 (  3183ms)   57269781 (  3177ms) adresser.001
      1000000000  391052000 ( 11071ms)  391051996 ( 11067ms) enwik9
      1911399616  378679516 ( 13450ms)  378679514 ( 13079ms) gob-stream
      8558382592 3972329193 ( 99962ms) 3972329193 ( 91290ms) rawstudio-mint14.tar
       200000000  200015265 (   776ms)  200015265 (   774ms) sharnd.out
      
      Thanks to Klaus Post for the original suggestion on cl/21021.
      
      Change-Id: Ia4c63a8d1b92c67e1765ec5c3c8c69d289d9a6ce
      Reviewed-on: https://go-review.googlesource.com/22604
      Reviewed-by: Russ Cox <rsc@golang.org>
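
      A sketch of the idea (not the actual compress/flate encoder): with a
      constant tableSize the table can live in a reusable struct and be
      zeroed once, instead of "var table [tableSize]uint16" re-zeroing every
      entry on each call. Stale entries are tolerable because a real matcher
      re-checks the bytes at a candidate position before emitting a match.

      package main

      import "fmt"

      const tableBits = 14
      const tableSize = 1 << tableBits // a compile-time constant, as in this change

      // encoder owns its hash table, so the table is zeroed once when the
      // encoder value is created rather than on every encode call.
      type encoder struct {
          table [tableSize]uint16 // stores position+1; 0 means empty (or stale)
      }

      func hash(x uint32) uint32 {
          return (x * 0x1e35a7bd) >> (32 - tableBits) // Snappy-style multiplicative hash
      }

      // index records, for each 4-byte window of src, where that window's
      // hash was last seen.
      func (e *encoder) index(src []byte) {
          for i := 0; i+4 <= len(src); i++ {
              v := uint32(src[i]) | uint32(src[i+1])<<8 | uint32(src[i+2])<<16 | uint32(src[i+3])<<24
              e.table[hash(v)] = uint16(i + 1)
          }
      }

      func main() {
          var e encoder // the only full zeroing of the 16384-entry table
          e.index([]byte("to be or not to be"))
          e.index([]byte("tomorrow and tomorrow")) // reuses the table without re-zeroing
          fmt.Println("table entries:", len(e.table))
      }
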