1. 23 Mar, 2016 11 commits
    • bytes: Equal perf improvements on ppc64le/ppc64 · baec1487
      Lynn Boger authored
      The existing implementations of Equal and similar
      functions in the bytes package operate on one byte at
      a time.  This performs poorly on ppc64/ppc64le, especially
      when the byte buffers are large.  This change improves
      those functions by loading and comparing double words where
      possible.  The common code has been moved to a function
      that can be shared by the other functions in this
      file which perform the same type of comparison.
      Further optimizations are done for the case where
      >= 32 bytes are being compared.  The new function
      memeqbody is used by memeq_varlen, Equal, and eqstring.
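
The double-word idea can be illustrated in portable Go. This is only a sketch of the technique, not the ppc64 assembly from the change: `equalWords` is a hypothetical helper, and it leaves out the extra handling the commit describes for inputs of 32 bytes or more.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalWords reports whether a and b hold the same bytes.
// It compares 8 bytes (one double word) per step and falls back to
// single-byte comparison for the tail. Hypothetical helper for
// illustration; it is not the runtime's memeqbody.
func equalWords(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	i := 0
	// Main loop: load and compare one 64-bit word from each slice.
	for ; i+8 <= len(a); i += 8 {
		if binary.LittleEndian.Uint64(a[i:]) != binary.LittleEndian.Uint64(b[i:]) {
			return false
		}
	}
	// Tail: fewer than 8 bytes remain, compare them one at a time.
	for ; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	x := []byte("0123456789abcdefXYZ")
	y := []byte("0123456789abcdefXYZ")
	z := []byte("0123456789abcdefXYQ")
	fmt.Println(equalWords(x, y), equalWords(x, z))
}
```

For inputs shorter than 8 bytes only the tail loop runs, which is consistent with the small-size benchmarks below gaining nothing from the wider loads.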
      
      Results from running the bytes tests with -test.bench=Equal:
      
      benchmark                     old MB/s     new MB/s     speedup
      BenchmarkEqual1               164.83       129.49       0.79x
      BenchmarkEqual6               563.51       445.47       0.79x
      BenchmarkEqual9               656.15       1099.00      1.67x
      BenchmarkEqual15              591.93       1024.30      1.73x
      BenchmarkEqual16              613.25       1914.12      3.12x
      BenchmarkEqual20              682.37       1687.04      2.47x
      BenchmarkEqual32              807.96       3843.29      4.76x
      BenchmarkEqual4K              1076.25      23280.51     21.63x
      BenchmarkEqual4M              1079.30      13120.14     12.16x
      BenchmarkEqual64M             1073.28      10876.92     10.13x
      
      It was determined that the degradation in the smaller byte tests
      was due to unfavorable code alignment of the single-byte loop.
      
      Fixes #14368
      
      Change-Id: I0dd87382c28887c70f4fbe80877a8ba03c31d7cd
      Reviewed-on: https://go-review.googlesource.com/20249
      Reviewed-by: Minux Ma <minux@golang.org>
    • cmd/link: Clean up Pcln struct · 516c6b40
      Shahar Kohanim authored
      Removes unnecessary fields from Pcln.
      
      Change-Id: I175049ca749b510eedaf65162355bc4d7a93315e
      Reviewed-on: https://go-review.googlesource.com/21041
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
    • compress/flate: rework matching algorithm · 53efe1e1
      Klaus Post authored
      This changes how matching is done in the deflate algorithm.
      
      The major change is that we no longer look for matches that are only
      3 bytes long; matches must be at least 4 bytes.
      Contrary to what you would expect, this actually improves the
      compression ratio, since 3 literal bytes will often be shorter
      than a match after Huffman encoding.
      This varies a bit by source, but is most often the case when the
      source is "easy" to compress.
      
      Second, a "stronger" hash is used. The hash is similar to
      the hashing function used by Snappy.
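
A rough sketch of the two ideas, a multiplicative hash over 4 input bytes and a 4-byte minimum match, might look like the following. The names `hash4`, `matchLen`, `hashBits`, and the map-based table are illustrative assumptions rather than the package's actual code; the multiplier is the constant used by Snappy's hash.

```go
package main

import "fmt"

const (
	hashBits = 17         // assumed: hash values index a table of 1<<17 entries
	hashMul  = 0x1e35a7bd // multiplier used by Snappy's hash function
	minMatch = 4          // matches shorter than this are not considered
)

// hash4 hashes the first 4 bytes of b into hashBits bits.
// The caller must ensure len(b) >= 4.
func hash4(b []byte) uint32 {
	v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
	return (v * hashMul) >> (32 - hashBits)
}

// matchLen returns how many leading bytes of a and b agree.
func matchLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	src := []byte("abcdefgh--abcdefgh")
	table := make(map[uint32]int) // hash -> most recent position (simplified)
	for i := 0; i+minMatch <= len(src); i++ {
		h := hash4(src[i:])
		if prev, seen := table[h]; seen {
			// A hash hit is only a candidate; verify and enforce minMatch.
			if n := matchLen(src[prev:], src[i:]); n >= minMatch {
				fmt.Printf("match at %d: offset %d, length %d\n", i, i-prev, n)
				break
			}
		}
		table[h] = i
	}
}
```

Hashing 4 bytes instead of 3 means a table hit already implies a candidate of useful length, so positions that could only yield a 3-byte match are emitted as literals.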
      
      Overall, the speed impact is biggest on higher compression levels.
      I intend to replace the "speed" compression level, which can be
      seen in CL 21021.
      
      The built-in benchmark using "digits" is slower at level 1.
      I see this as an exception, since "digits" is a special type
      of data, where you have low entropy (numbers 0->9), but no
      significant matches. Again, CL 21021 fixes that case.
      
      NewWriterDict is also made considerably faster, by not running data
      through the entire encoder. This is not reflected by the benchmark.
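
For readers unfamiliar with preset dictionaries, here is a minimal round trip through `flate.NewWriterDict` and `flate.NewReaderDict`. It shows only the public API that this change speeds up, not the internal shortcut; `roundTrip` is a hypothetical helper and the sample data is made up.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// roundTrip compresses data with a preset dictionary and decompresses it
// again with the same dictionary. Hypothetical helper for illustration.
func roundTrip(dict, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriterDict(&buf, flate.DefaultCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	// The reader must be given the same dictionary as the writer.
	r := flate.NewReaderDict(&buf, dict)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	dict := []byte("hello world ")
	out, err := roundTrip(dict, []byte("hello world hello world"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```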
      
      COMPARED to tip/master:
      name                       old time/op    new time/op     delta
      EncodeDigitsSpeed1e4-4        401µs ± 1%      345µs ± 2%   -13.95%
      EncodeDigitsSpeed1e5-4       3.19ms ± 1%     4.27ms ± 3%   +33.96%
      EncodeDigitsSpeed1e6-4       27.7ms ± 4%     43.8ms ± 3%   +58.00%
      EncodeDigitsDefault1e4-4      641µs ± 0%      403µs ± 1%   -37.15%
      EncodeDigitsDefault1e5-4     13.8ms ± 1%      6.4ms ± 3%   -53.73%
      EncodeDigitsDefault1e6-4      162ms ± 1%       64ms ± 2%   -60.51%
      EncodeDigitsCompress1e4-4     627µs ± 1%      405µs ± 2%   -35.45%
      EncodeDigitsCompress1e5-4    13.9ms ± 0%      6.3ms ± 2%   -54.46%
      EncodeDigitsCompress1e6-4     159ms ± 1%       64ms ± 0%   -59.91%
      EncodeTwainSpeed1e4-4         433µs ± 4%      331µs ± 1%   -23.53%
      EncodeTwainSpeed1e5-4        2.82ms ± 1%     3.08ms ± 0%    +9.10%
      EncodeTwainSpeed1e6-4        28.1ms ± 2%     28.8ms ± 0%    +2.82%
      EncodeTwainDefault1e4-4       695µs ± 4%      474µs ± 1%   -31.78%
      EncodeTwainDefault1e5-4      11.8ms ± 0%      7.4ms ± 0%   -37.31%
      EncodeTwainDefault1e6-4       128ms ± 0%       75ms ± 0%   -40.93%
      EncodeTwainCompress1e4-4      719µs ± 3%      480µs ± 0%   -33.27%
      EncodeTwainCompress1e5-4     15.0ms ± 3%      8.2ms ± 2%   -45.55%
      EncodeTwainCompress1e6-4      170ms ± 0%       85ms ± 1%   -49.99%
      
      name                       old speed      new speed       delta
      EncodeDigitsSpeed1e4-4     25.0MB/s ± 1%   29.0MB/s ± 2%   +16.24%
      EncodeDigitsSpeed1e5-4     31.4MB/s ± 1%   23.4MB/s ± 3%   -25.34%
      EncodeDigitsSpeed1e6-4     36.1MB/s ± 4%   22.8MB/s ± 3%   -36.74%
      EncodeDigitsDefault1e4-4   15.6MB/s ± 0%   24.8MB/s ± 1%   +59.11%
      EncodeDigitsDefault1e5-4   7.27MB/s ± 1%  15.72MB/s ± 3%  +116.23%
      EncodeDigitsDefault1e6-4   6.16MB/s ± 0%  15.60MB/s ± 2%  +153.25%
      EncodeDigitsCompress1e4-4  15.9MB/s ± 1%   24.7MB/s ± 2%   +54.97%
      EncodeDigitsCompress1e5-4  7.19MB/s ± 0%  15.78MB/s ± 2%  +119.62%
      EncodeDigitsCompress1e6-4  6.27MB/s ± 1%  15.65MB/s ± 0%  +149.52%
      EncodeTwainSpeed1e4-4      23.1MB/s ± 4%   30.2MB/s ± 1%   +30.68%
      EncodeTwainSpeed1e5-4      35.4MB/s ± 1%   32.5MB/s ± 0%    -8.34%
      EncodeTwainSpeed1e6-4      35.6MB/s ± 2%   34.7MB/s ± 0%    -2.77%
      EncodeTwainDefault1e4-4    14.4MB/s ± 4%   21.1MB/s ± 1%   +46.48%
      EncodeTwainDefault1e5-4    8.49MB/s ± 0%  13.55MB/s ± 0%   +59.50%
      EncodeTwainDefault1e6-4    7.83MB/s ± 0%  13.25MB/s ± 0%   +69.19%
      EncodeTwainCompress1e4-4   13.9MB/s ± 3%   20.8MB/s ± 0%   +49.83%
      EncodeTwainCompress1e5-4   6.65MB/s ± 3%  12.20MB/s ± 2%   +83.51%
      EncodeTwainCompress1e6-4   5.88MB/s ± 0%  11.76MB/s ± 1%  +100.06%
      
      Change-Id: I724e33c1dd3e3a6a1b0a68e094baa959352baf32
      Reviewed-on: https://go-review.googlesource.com/20929
      Run-TryBot: Nigel Tao <nigeltao@golang.org>
      Reviewed-by: Nigel Tao <nigeltao@golang.org>
    • cmd/compile/internal/ssa: avoid string conversion in zcse · e6beec1f
      Dave Cheney authored
      Some ssa.Type implementations fell through to gc.Tconv which generated
      garbage to produce a string form of the Type.
      
      name      old time/op    new time/op    delta
      Template     405ms ± 7%     401ms ± 6%    ~     (p=0.478 n=20+20)
      GoTypes      1.32s ± 1%     1.30s ± 2%  -1.27%  (p=0.000 n=19+20)
      Compiler     6.07s ± 2%     6.03s ± 2%    ~     (p=0.121 n=20+20)
      
      name      old alloc/op   new alloc/op   delta
      Template    63.9MB ± 0%    63.7MB ± 0%  -0.21%  (p=0.000 n=19+20)
      GoTypes      220MB ± 0%     219MB ± 0%  -0.21%  (p=0.000 n=20+20)
      Compiler     966MB ± 0%     965MB ± 0%  -0.11%  (p=0.000 n=20+20)
      
      name      old allocs/op  new allocs/op  delta
      Template      708k ± 0%      701k ± 0%  -0.99%  (p=0.000 n=20+20)
      GoTypes      2.20M ± 0%     2.17M ± 0%  -1.43%  (p=0.000 n=17+20)
      Compiler     9.45M ± 0%     9.36M ± 0%  -0.91%  (p=0.000 n=20+20)
      
      Change-Id: I5fcc30e0f76a823d1c301d4980b583d716a75ce3
      Reviewed-on: https://go-review.googlesource.com/20844
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal/gc: remove redundant Nod(OXXX, ...) pattern · a4be24cb
      Dave Cheney authored
      The pattern
      
          n := Nod(OXXX, nil, nil)
          Nodconst(n, ...)
      
      is a leftover from the C days, when n had to be heap allocated.
      
      No change in benchmarks, none expected as n escapes anyway.
      
      name      old time/op    new time/op    delta
      Template     391ms ± 6%     388ms ± 5%    ~     (p=0.659 n=20+20)
      GoTypes      1.27s ± 1%     1.27s ± 2%    ~     (p=0.828 n=18+20)
      Compiler     6.16s ± 2%     6.15s ± 1%    ~     (p=0.947 n=20+20)
      
      name      old alloc/op   new alloc/op   delta
      Template    63.7MB ± 0%    63.7MB ± 0%    ~     (p=0.414 n=20+20)
      GoTypes      219MB ± 0%     219MB ± 0%    ~     (p=0.904 n=20+20)
      Compiler     980MB ± 0%     980MB ± 0%  +0.00%  (p=0.007 n=20+19)
      
      name      old allocs/op  new allocs/op  delta
      Template      586k ± 0%      586k ± 0%    ~     (p=0.564 n=19+20)
      GoTypes      1.80M ± 0%     1.80M ± 0%    ~     (p=0.718 n=20+20)
      Compiler     7.74M ± 0%     7.74M ± 0%    ~     (p=0.358 n=20+20)
      
      The reuse of nc in multiple overlapping scopes in walk.go is the worst.
      
      Change-Id: I4ed6a63f7ffbfff68124ad609f6e3a68d95cbbba
      Reviewed-on: https://go-review.googlesource.com/21015
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/vet: check lock copy in function calls and return statements · 1374515a
      Aliaksandr Valialkin authored
      Fixes #14529
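
As an illustration of the kind of code this check is about, the sketch below passes and returns a lock-bearing struct by pointer, and notes in comments which by-value forms would copy the lock. The `counter` type and both functions are hypothetical; the exact diagnostics vet prints are not shown in this commit message and are not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type that holds a lock by value.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc takes a pointer, so caller and callee share one mutex.
// Declaring it as inc(c counter) and calling inc(*c) would copy the
// lock at every call site, which is the function-call case in the title.
func inc(c *counter) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// newCounter returns a pointer for the same reason: a function with
// result type counter that does "return *c" would hand back a copied
// lock, which is the return-statement case in the title.
func newCounter() *counter {
	return &counter{}
}

func main() {
	c := newCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inc(c)
		}()
	}
	wg.Wait()
	fmt.Println(c.n)
}
```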
      
      Change-Id: I6ed059d279ba0fe12d76416859659f28d61781d2
      Reviewed-on: https://go-review.googlesource.com/20832
      Run-TryBot: Rob Pike <r@golang.org>
      Reviewed-by: Rob Pike <r@golang.org>
    • fmt: cleanup and optimize doPrintf for simple formats · 49da9312
      Martin Möhrmann authored
      Make a fast path for format strings that do not use
      precision or width specifications or argument indices.
      
      Only check and enforce the restriction to not pad left with zeros
      in code paths that change either f.minus or f.zero.
      
      Consolidate the if chains at the end of the main doPrintf loop
      into a switch statement. Move error printing into extra
      functions to reduce size of this switch statement.
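
To make the distinction concrete, the calls below contrast format strings with bare verbs against ones that use width, precision, argument indices, or the minus and zero flags together. Which internal path each call takes is an assumption based on the description above; `examples` is a hypothetical helper, and the outputs follow from fmt's documented behavior.

```go
package main

import "fmt"

// examples returns formatted strings for four kinds of format string.
// Hypothetical helper for illustration.
func examples() []string {
	return []string{
		// Bare verbs only: no width, precision, or argument index.
		fmt.Sprintf("%d %s %v", 42, "go", true),
		// Width and precision.
		fmt.Sprintf("[%5d] [%.2f]", 42, 3.14159),
		// Explicit argument indices.
		fmt.Sprintf("%[2]d %[1]d", 1, 2),
		// With both '-' and '0', padding goes on the right with spaces:
		// zeros are never added to the right of a number.
		fmt.Sprintf("[%-05d] [%05d]", 42, 42),
	}
}

func main() {
	for _, s := range examples() {
		fmt.Println(s)
	}
	// Prints:
	// 42 go true
	// [   42] [3.14]
	// 2 1
	// [42   ] [00042]
}
```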
      
      name                             old time/op  new time/op  delta
      SprintfPadding-2                  234ns ± 1%   233ns ± 1%   -0.54%  (p=0.010 n=19+19)
      SprintfEmpty-2                   37.0ns ± 3%  39.1ns ±14%     ~     (p=0.501 n=17+20)
      SprintfString-2                   112ns ± 1%   101ns ± 1%   -9.21%  (p=0.000 n=19+20)
      SprintfTruncateString-2           139ns ± 1%   139ns ± 0%   +0.57%  (p=0.000 n=19+19)
      SprintfQuoteString-2              402ns ± 0%   392ns ± 0%   -2.35%  (p=0.000 n=19+20)
      SprintfInt-2                      114ns ± 1%   102ns ± 2%  -10.92%  (p=0.000 n=20+20)
      SprintfIntInt-2                   177ns ± 2%   155ns ± 2%  -12.67%  (p=0.000 n=18+18)
      SprintfPrefixedInt-2              260ns ± 3%   249ns ± 3%   -4.55%  (p=0.000 n=20+20)
      SprintfFloat-2                    190ns ± 1%   178ns ± 2%   -6.54%  (p=0.000 n=20+20)
      SprintfComplex-2                  533ns ± 1%   517ns ± 3%   -2.95%  (p=0.000 n=20+20)
      SprintfBoolean-2                  102ns ± 1%    93ns ± 2%   -9.30%  (p=0.000 n=20+20)
      SprintfHexString-2                176ns ± 0%   168ns ± 2%   -4.49%  (p=0.000 n=16+19)
      SprintfHexBytes-2                 181ns ± 1%   174ns ± 2%   -4.27%  (p=0.000 n=20+20)
      SprintfBytes-2                    326ns ± 1%   311ns ± 1%   -4.51%  (p=0.000 n=20+20)
      ManyArgs-2                        540ns ± 2%   497ns ± 1%   -8.08%  (p=0.000 n=18+16)
      FprintInt-2                       150ns ± 0%   149ns ± 0%   -0.33%  (p=0.000 n=20+18)
      FprintfBytes-2                    185ns ± 0%   165ns ± 0%  -10.98%  (p=0.000 n=20+18)
      FprintIntNoAlloc-2                113ns ± 0%   112ns ± 0%   -0.88%  (p=0.000 n=20+20)
      
      Change-Id: I9ada8faa1f46aa67ea116a94ab3f4ad3e405c8fe
      Reviewed-on: https://go-review.googlesource.com/20919
      Run-TryBot: Rob Pike <r@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rob Pike <r@golang.org>
    • database/sql/driver: remove string exclusion · 7162c4d0
      Tamir Duberstein authored
      The exclusion of string from IsScanValue prevents driver authors from
      writing their drivers in such a way that would allow users to
      distinguish between strings and byte arrays returned from a database.
      Such drivers are possible today, but require their authors to deviate
      from the guidance provided by the standard library.
      
      This exclusion has been in place since the birth of this package in
      https://github.com/golang/go/commit/357f2cb1a385f4d1418e48856f9abe0cce,
      but the fakedb implementation shipped in the same commit violates the
      exclusion!
      
      Strictly speaking this is a breaking change, but it increases the set
      of permissible Scan types, and should not cause breakage in practice.
      
      No test changes are necessary because fakedb already exercises this.
      
      Fixes #6497.
      
      Change-Id: I69dbd3a59d90464bcae8c852d7ec6c97bfd120f8
      Reviewed-on: https://go-review.googlesource.com/19439
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • misc/cgo/testcarchive: rewrite test from bash to Go · bac0005e
      Ian Lance Taylor authored
      This is to support https://golang.org/cl/18057, which is going to add
      Windows support to this directory.  Better to write the test in Go than
      to have both test.bash and test.bat.
      
      Update #13494.
      
      Change-Id: I4af7004416309e885049ee60b9470926282f210d
      Reviewed-on: https://go-review.googlesource.com/20892
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/dist: disable misc/cgo/fortran test on dragonfly · bafa0275
      Mikio Hara authored
      Updates #14544.
      
      Change-Id: I24ab8e6f9ad9d290a672216fc2f50f78c3ed8812
      Reviewed-on: https://go-review.googlesource.com/21014
      Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: MOVBload and MOVBQZXload are the same op · 68e86e6d
      Keith Randall authored
      No need to have both ops when they do the same thing.
      Just declare MOVBload to zero extend and we can get rid
      of MOVBQZXload.  Same for W and L.
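
In Go terms, a zero-extending byte load corresponds to widening an unsigned byte: the upper bits of the wider value are filled with zeros. The sketch below contrasts that with sign extension purely to show the difference; it says nothing about which SSA ops the compiler emits, and both function names are hypothetical.

```go
package main

import "fmt"

// zeroExtend widens a byte to 64 bits, filling the upper bits with zeros.
func zeroExtend(b byte) uint64 { return uint64(b) }

// signExtend reinterprets the byte as signed before widening, so the
// upper bits are copies of bit 7.
func signExtend(b byte) int64 { return int64(int8(b)) }

func main() {
	b := byte(0xF0)
	fmt.Printf("%#x\n", zeroExtend(b))         // 0xf0
	fmt.Printf("%#x\n", uint64(signExtend(b))) // 0xfffffffffffffff0
}
```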
      
      Kind of a follow-on cleanup for https://go-review.googlesource.com/c/19506/
      Should enable an easier fix for #14920
      
      Change-Id: I7cfac909a8ba387f433a6ae75c050740ebb34d42
      Reviewed-on: https://go-review.googlesource.com/21004
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. 22 Mar, 2016 18 commits
  3. 21 Mar, 2016 11 commits