1. 16 Sep, 2017 2 commits
    • runtime: improve fastrand with a better generator · e7e4a4ff
      Giovanni Bajo authored
      The current generator is a simple LFSR, which showed strong
      correlation in the higher bits, as manifested by fastrandn().
      
      Replace it with xorshift64+, which is slightly more complex and
      has a larger state, but has a period of 2^64-1 and does much better
      in statistical tests. The version used here passes Diehard and
      even SmallCrush.
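
As a rough illustration of the generator described above, here is a minimal standalone sketch of xorshift64+ built from two 32-bit state words. The `rng` type, the method names, the shift constants (17, 7, 16) and the multiply-and-shift range reduction in `nextn` are assumptions made for this sketch; they are not quoted from the commit itself.

```go
package main

import "fmt"

// rng holds the two 32-bit words of state (64 bits in total).
// The state must not be all zero.
type rng struct {
	s [2]uint32
}

// next advances the generator and returns 32 pseudo-random bits.
// The shift constants 17, 7 and 16 are an assumption of this sketch.
func (r *rng) next() uint32 {
	s1, s0 := r.s[0], r.s[1]
	s1 ^= s1 << 17
	s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
	r.s[0], r.s[1] = s0, s1
	return s0 + s1
}

// nextn maps next() into [0, n) with a multiply-and-shift reduction.
// The result depends mostly on the high bits of next(), which is why
// correlation in those bits would show up here.
func (r *rng) nextn(n uint32) uint32 {
	return uint32(uint64(r.next()) * uint64(n) >> 32)
}

func main() {
	r := rng{s: [2]uint32{1, 2}}
	counts := make([]int, 4)
	for i := 0; i < 1000; i++ {
		counts[r.nextn(4)]++
	}
	fmt.Println(counts)
}
```

Seeding with any non-zero state works; an all-zero state would make the sketch emit zeros forever.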
      
      Speed is slightly worse, but the difference is probably insignificant:
      
      name                old time/op  new time/op  delta
      Fastrand-4          0.77ns ±12%  0.91ns ±21%  +17.31%  (p=0.048 n=5+5)
      FastrandHashiter-4  13.6ns ±21%  15.2ns ±17%     ~     (p=0.160 n=6+5)
      Fastrandn/2-4       2.30ns ± 5%  2.45ns ±15%     ~     (p=0.222 n=5+5)
      Fastrandn/3-4       2.36ns ± 7%  2.45ns ± 6%     ~     (p=0.222 n=5+5)
      Fastrandn/4-4       2.33ns ± 8%  2.61ns ±30%     ~     (p=0.126 n=6+5)
      Fastrandn/5-4       2.33ns ± 5%  2.48ns ± 9%     ~     (p=0.052 n=6+5)
      
      Fixes #21806
      
      Change-Id: I013bb37b463fdfc229a7f324df8fe2da8d286f33
      Reviewed-on: https://go-review.googlesource.com/62530
      Run-TryBot: Michael Munday <mike.munday@ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: test constant folded integer to/from float conversions · a5d6b414
      Michael Munday authored
      Improves test coverage of the rules added in CL 63795 and would have
      detected the bug fixed by CL 63950.
      
      Change-Id: I107ee8d8e0b6684ce85b2446bd5018c5a03d608a
      Reviewed-on: https://go-review.googlesource.com/64130
      Reviewed-by: Keith Randall <khr@golang.org>
  2. 15 Sep, 2017 11 commits
    • cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSD · a07176b4
      Ben Shi authored
      The Go compiler can generate better ARM code with these more
      efficient FP instructions. The overall improvement is small,
      but some special cases improve significantly.
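
To make the affected code shapes concrete, here is a small sketch of Go source that such instructions could serve. It assumes that MULAF/MULAD accumulate a product into a register and MULSF/MULSD subtract one, which the commit message does not spell out. The function names are hypothetical, and whether the compiler selects these instructions for a given expression depends on rewrite rules not shown here.

```go
package main

import "fmt"

// mulAdd has the a + b*c shape that a multiply-accumulate
// instruction could compute in one step.
func mulAdd(a, b, c float64) float64 {
	return a + b*c
}

// mulSub has the a - b*c shape of a multiply-subtract.
func mulSub(a, b, c float64) float64 {
	return a - b*c
}

// mandelStep is one iteration of z = z*z + c, with the complex
// numbers split into real and imaginary parts. It is dense in
// multiply-add patterns, like the Mandelbrot benchmark.
func mandelStep(zr, zi, cr, ci float64) (float64, float64) {
	return zr*zr - zi*zi + cr, 2*zr*zi + ci
}

func main() {
	fmt.Println(mulAdd(1, 2, 3), mulSub(10, 2, 3))
	re, im := mandelStep(1, 1, 0.5, 0.25)
	fmt.Println(re, im)
}
```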
      
      1. The size of pkg/linux_arm/math.a shrinks by 2.4%.
      
      2. There is neither improvement nor regression in the compilecmp benchmark.
      name        old time/op       new time/op       delta
      Template          2.32s ± 2%        2.32s ± 1%    ~     (p=1.000 n=9+10)
      Unicode           1.32s ± 4%        1.32s ± 4%    ~     (p=0.912 n=10+10)
      GoTypes           7.76s ± 1%        7.79s ± 1%    ~     (p=0.447 n=9+10)
      Compiler          37.4s ± 2%        37.2s ± 2%    ~     (p=0.218 n=10+10)
      SSA               84.8s ± 2%        85.0s ± 1%    ~     (p=0.604 n=10+9)
      Flate             1.45s ± 2%        1.44s ± 2%    ~     (p=0.075 n=10+10)
      GoParser          1.82s ± 1%        1.81s ± 1%    ~     (p=0.190 n=10+10)
      Reflect           5.06s ± 1%        5.05s ± 1%    ~     (p=0.315 n=10+9)
      Tar               2.37s ± 1%        2.37s ± 2%    ~     (p=0.912 n=10+10)
      XML               2.56s ± 1%        2.58s ± 2%    ~     (p=0.089 n=10+10)
      [Geo mean]        4.77s             4.77s       -0.08%
      
      name        old user-time/op  new user-time/op  delta
      Template          2.74s ± 2%        2.75s ± 2%    ~     (p=0.856 n=9+10)
      Unicode           1.61s ± 4%        1.62s ± 3%    ~     (p=0.693 n=10+10)
      GoTypes           9.55s ± 1%        9.49s ± 2%    ~     (p=0.056 n=9+10)
      Compiler          45.9s ± 1%        45.8s ± 1%    ~     (p=0.345 n=9+10)
      SSA                110s ± 1%         110s ± 1%    ~     (p=0.763 n=9+10)
      Flate             1.68s ± 2%        1.68s ± 3%    ~     (p=0.616 n=10+10)
      GoParser          2.14s ± 4%        2.14s ± 1%    ~     (p=0.825 n=10+9)
      Reflect           5.95s ± 1%        5.97s ± 3%    ~     (p=0.951 n=9+10)
      Tar               2.94s ± 3%        2.93s ± 2%    ~     (p=0.359 n=10+10)
      XML               3.03s ± 3%        3.07s ± 6%    ~     (p=0.166 n=10+10)
      [Geo mean]        5.76s             5.77s       +0.12%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)
      
      3. Mandelbrot200 improves by 15%, though the overall
         improvement is small.
      name                     old time/op    new time/op    delta
      BinaryTree17-4              41.7s ± 1%     41.7s ± 1%     ~     (p=0.264 n=29+23)
      Fannkuch11-4                24.2s ± 0%     24.1s ± 1%   -0.13%  (p=0.050 n=30+30)
      FmtFprintfEmpty-4           826ns ± 1%     824ns ± 1%   -0.24%  (p=0.038 n=25+30)
      FmtFprintfString-4         1.38µs ± 1%    1.38µs ± 0%   -0.42%  (p=0.000 n=27+25)
      FmtFprintfInt-4            1.46µs ± 1%    1.46µs ± 0%     ~     (p=0.060 n=30+23)
      FmtFprintfIntInt-4         2.11µs ± 1%    2.08µs ± 0%   -1.04%  (p=0.000 n=30+30)
      FmtFprintfPrefixedInt-4    2.23µs ± 1%    2.22µs ± 1%   -0.51%  (p=0.000 n=30+30)
      FmtFprintfFloat-4          4.49µs ± 1%    4.48µs ± 1%   -0.22%  (p=0.004 n=26+30)
      FmtManyArgs-4              8.06µs ± 1%    8.12µs ± 1%   +0.68%  (p=0.000 n=25+30)
      GobDecode-4                 104ms ± 1%     104ms ± 2%     ~     (p=0.362 n=29+29)
      GobEncode-4                92.9ms ± 1%    92.8ms ± 2%     ~     (p=0.786 n=30+30)
      Gzip-4                      4.12s ± 1%     4.12s ± 1%     ~     (p=0.314 n=30+30)
      Gunzip-4                    602ms ± 1%     603ms ± 1%     ~     (p=0.164 n=30+30)
      HTTPClientServer-4          659µs ± 1%     655µs ± 2%   -0.64%  (p=0.006 n=25+28)
      JSONEncode-4                234ms ± 1%     235ms ± 1%   +0.29%  (p=0.050 n=30+30)
      JSONDecode-4                912ms ± 0%     911ms ± 0%     ~     (p=0.385 n=18+24)
      Mandelbrot200-4            49.2ms ± 0%    41.7ms ± 0%  -15.35%  (p=0.000 n=25+27)
      GoParse-4                  46.3ms ± 1%    46.3ms ± 2%     ~     (p=0.572 n=30+30)
      RegexpMatchEasy0_32-4      1.29µs ± 1%    1.27µs ± 0%   -1.59%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4      7.62µs ± 4%    7.71µs ± 3%     ~     (p=0.074 n=30+30)
      RegexpMatchEasy1_32-4      1.31µs ± 0%    1.30µs ± 1%   -0.71%  (p=0.000 n=23+30)
      RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.3µs ± 5%     ~     (p=0.105 n=30+30)
      RegexpMatchMedium_32-4     2.06µs ± 1%    2.06µs ± 1%     ~     (p=0.100 n=30+30)
      RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%     ~     (p=0.254 n=29+30)
      RegexpMatchHard_32-4       28.9µs ± 0%    28.9µs ± 0%     ~     (p=0.154 n=30+30)
      RegexpMatchHard_1K-4        868µs ± 1%     867µs ± 0%     ~     (p=0.729 n=30+23)
      Revcomp-4                  66.9ms ± 1%    67.2ms ± 2%     ~     (p=0.102 n=28+29)
      Template-4                  1.07s ± 1%     1.06s ± 1%   -0.53%  (p=0.000 n=30+30)
      TimeParse-4                7.07µs ± 1%    7.01µs ± 0%   -0.85%  (p=0.000 n=30+25)
      TimeFormat-4               13.1µs ± 0%    13.2µs ± 1%   +0.77%  (p=0.000 n=27+27)
      [Geo mean]                  721µs          716µs        -0.70%
      
      name                     old speed      new speed      delta
      GobDecode-4              7.38MB/s ± 1%  7.37MB/s ± 2%     ~     (p=0.399 n=29+29)
      GobEncode-4              8.26MB/s ± 1%  8.27MB/s ± 2%     ~     (p=0.790 n=30+30)
      Gzip-4                   4.71MB/s ± 1%  4.71MB/s ± 1%     ~     (p=0.885 n=30+30)
      Gunzip-4                 32.2MB/s ± 1%  32.2MB/s ± 1%     ~     (p=0.190 n=30+30)
      JSONEncode-4             8.28MB/s ± 1%  8.25MB/s ± 1%     ~     (p=0.053 n=30+30)
      JSONDecode-4             2.13MB/s ± 0%  2.12MB/s ± 1%     ~     (p=0.072 n=18+30)
      GoParse-4                1.25MB/s ± 1%  1.25MB/s ± 2%     ~     (p=0.863 n=30+30)
      RegexpMatchEasy0_32-4    24.8MB/s ± 0%  25.2MB/s ± 1%   +1.61%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4     134MB/s ± 4%   133MB/s ± 3%     ~     (p=0.074 n=30+30)
      RegexpMatchEasy1_32-4    24.5MB/s ± 0%  24.6MB/s ± 1%   +0.72%  (p=0.000 n=23+30)
      RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  99.8MB/s ± 5%     ~     (p=0.105 n=30+30)
      RegexpMatchMedium_32-4    483kB/s ± 1%   487kB/s ± 1%   +0.83%  (p=0.002 n=30+30)
      RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%     ~     (p=0.058 n=30+30)
      RegexpMatchHard_32-4     1.10MB/s ± 0%  1.11MB/s ± 0%     ~     (p=0.804 n=30+30)
      RegexpMatchHard_1K-4     1.18MB/s ± 0%  1.18MB/s ± 0%     ~     (all equal)
      Revcomp-4                38.0MB/s ± 1%  37.8MB/s ± 2%     ~     (p=0.098 n=28+29)
      Template-4               1.82MB/s ± 1%  1.83MB/s ± 1%   +0.55%  (p=0.000 n=29+29)
      [Geo mean]               6.79MB/s       6.79MB/s        +0.09%
      
      Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b
      Reviewed-on: https://go-review.googlesource.com/63770
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr) · 8c67f210
      isharipo authored
      This change makes it easier to express instructions
      with an arbitrary number of operands.
      
      Rationale: the previous approach of operand "hiding" does
      not scale well. AVX and especially AVX512 have many
      instructions with 3+ operands.
      
      The x86 asm backend is updated to handle up to 6 explicit operands.
      This also fixes an issue with type checks for a 4th immediate operand.
      All `ytab` tables are updated accordingly.
      
      Changes to non-x86 backends only include these patterns:
      `p.From3 = X` => `p.SetFrom3(X)`
      `p.From3.X = Y` => `p.GetFrom3().X = Y`
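
The two rewrite patterns above can be sketched with simplified stand-in types. `Addr` and `Prog` here are heavily reduced, hypothetical versions of the real structures, and the accessor bodies are an assumption about how such helpers might be written, not the code from this change.

```go
package main

import "fmt"

// Addr is a simplified stand-in for an instruction operand.
type Addr struct {
	Reg    int16
	Offset int64
}

// Prog is a simplified instruction: From and To plus extra operands.
type Prog struct {
	From, To Addr
	// RestArgs holds operands beyond From and To.
	RestArgs []Addr
}

// SetFrom3 stores a as the only extra operand.
func (p *Prog) SetFrom3(a Addr) {
	p.RestArgs = []Addr{a}
}

// GetFrom3 returns a pointer to the first extra operand,
// or nil if there is none.
func (p *Prog) GetFrom3() *Addr {
	if len(p.RestArgs) == 0 {
		return nil
	}
	return &p.RestArgs[0]
}

func main() {
	var p Prog
	// Previously: p.From3 = X
	p.SetFrom3(Addr{Reg: 3})
	// Previously: p.From3.X = Y
	p.GetFrom3().Offset = 8
	// Further operands simply extend the slice.
	p.RestArgs = append(p.RestArgs, Addr{Reg: 4})
	fmt.Println(len(p.RestArgs), *p.GetFrom3())
}
```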
      
      Over time, other backends can adopt Prog.RestArgs
      and reduce the number of workarounds.
      
      -- Performance --
      
      x/benchmark/build:
      
      $ benchstat upstream.bench patched.bench
      name      old time/op                 new time/op                 delta
      Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)
      
      name      old binary-size             new binary-size             delta
      Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)
      
      name      old build-time/op           new build-time/op           delta
      Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)
      
      name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
      Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)
      
      name      old build-user+sys-time/op  new build-user+sys-time/op  delta
      Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)
      
      Microbenchmark shows a slight slowdown.
      
      name        old time/op  new time/op  delta
      AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)
      
      func BenchmarkAMD64asm(b *testing.B) {
        for i := 0; i < b.N; i++ {
          TestAMD64EndToEnd(nil)
          TestAMD64Encoder(nil)
        }
      }
      
      Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
      Reviewed-on: https://go-review.googlesource.com/63490
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: fix lexical block of captured variables · e1cf2be7
      Alessandro Arzilli authored
      Variables captured by a closure were always assigned to the root scope
      in their declaration function. Using decl.Name.Defn.Pos will result in
      the correct scope for both the declaration function and the capturing
      function.
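
A small program of the kind this fix concerns might look like the sketch below. The function and variable names are made up for illustration. The point is only that `step` is declared in a nested block and captured by a closure, so a debugger reading the scope information should see it in that block rather than in the root scope of `counter`.

```go
package main

import "fmt"

func counter() func() int {
	// total is declared in the function's root scope.
	total := 0
	{
		// step is declared in a nested lexical block.
		step := 2
		return func() int {
			// Both variables are captured by this closure.
			total += step
			return total
		}
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next())
}
```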
      
      Fixes #21515
      
      Change-Id: I3960aface3c4fc97e15b36191a74a7bed5b5ebc1
      Reviewed-on: https://go-review.googlesource.com/56830
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • runtime: return deltimer early if timer.timersBucket is unset · a72e26f2
      Emmanuel Odeke authored
      Return early from deltimer, with false as the result,
      to indicate that we couldn't delete the timer since its
      timersBucket was nil (not set) in the first place.
      
      This happens when a user creates the timer
      from a Ticker with:
      
        t := time.Ticker{C: c}
      
      The above usage skips the entire setup of assigning
      the appropriate underlying runtimeTimer and timersBucket,
      steps that are done for us by time.NewTicker.
      
      CL 34784 introduced this bug with an optimization, by changing
      stopTimer to retrieve the timersBucket from the timer itself
      (which is unset with the usage pattern above),
      whereas the old behavior relied on indexing
      by goroutine ID into the global slice of runtime
      timers to retrieve the appropriate timersBucket.
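
The usage pattern described above can be reproduced with a few lines of user code. This is a sketch; `stopBareTicker` is a hypothetical helper name, and the recover wrapper is only there so the program can report the outcome instead of crashing on a toolchain without the fix.

```go
package main

import (
	"fmt"
	"time"
)

// stopBareTicker builds a Ticker as a struct literal, bypassing
// time.NewTicker, and stops it. It reports whether Stop panicked.
func stopBareTicker() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	c := make(chan time.Time)
	t := time.Ticker{C: c}
	t.Stop()
	return false
}

func main() {
	fmt.Println("Stop panicked:", stopBareTicker())
}
```

On a Go version that includes this fix, Stop on such a Ticker is expected to return without panicking.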
      
      Fixes #21874
      
      Change-Id: Ie9ccc6bdee685414b2430dc4aa74ef618cea2b33
      Reviewed-on: https://go-review.googlesource.com/63970
      Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/internal/objabi: remove unused flag funcs · 37fc70ba
      Matthew Dempsky authored
      Change-Id: I728c5606882ece949d58e86f9558fc16ae4ffd85
      Reviewed-on: https://go-review.googlesource.com/64052
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/link: replace unrolled Cput loops with Cwrite/Cwritestring · f84a1db1
      Matthew Dempsky authored
      Passes toolstash-check -all.
      
      Change-Id: I1c85a2c0390517f4e9cdbddddbf3c353edca65b3
      Reviewed-on: https://go-review.googlesource.com/64051
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: change lockedg/lockedm to guintptr/muintptr · 165c15af
      Ian Lance Taylor authored
      This change has no real effect in itself. It prepares for a
      follow-up change that will call lockOSThread during a cgo callback when
      there is no p assigned, and therefore when lockOSThread cannot use a
      write barrier.
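
The idea behind an integer-typed pointer such as guintptr can be sketched outside the runtime as follows. The `g` and `m` types here are tiny stand-ins, and the `ptr`/`set` helpers are written for this illustration. Because the field is an integer, storing to it is a plain integer store with no write barrier, at the cost that the garbage collector does not treat it as a reference.

```go
package main

import (
	"fmt"
	"unsafe"
)

// g is a stand-in for a goroutine descriptor.
type g struct{ id int64 }

// guintptr holds a *g as a plain integer. The garbage collector does
// not see it as a reference, so the g must be kept alive elsewhere.
type guintptr uintptr

// ptr converts the stored integer back into a *g.
func (gp guintptr) ptr() *g { return (*g)(unsafe.Pointer(gp)) }

// set stores x as an integer; this is an ordinary integer store.
func (gp *guintptr) set(x *g) { *gp = guintptr(unsafe.Pointer(x)) }

// m is a stand-in for an OS thread descriptor.
type m struct{ lockedg guintptr }

// g0 is package-level so it stays reachable for the whole program.
var g0 = g{id: 1}

func main() {
	var mp m
	mp.lockedg.set(&g0)
	fmt.Println(mp.lockedg.ptr().id, mp.lockedg != 0)
}
```

This pattern is only safe when something else guarantees the pointed-to object stays alive and does not move; ordinary Go code should keep using real pointers.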
      
      Change-Id: Ia122d41acf54191864bcb68f393f2ed3b2f87abc
      Reviewed-on: https://go-review.googlesource.com/63630
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: replace GOROOT in //line directives · 27e80f7c
      David Crawshaw authored
      The compiler replaces any path of the form /path/to/goroot/src/net/port.go
      with GOROOT/src/net/port.go so that the same object file is
      produced if the GOROOT is moved. It was skipping this transformation
      for any absolute path into the GOROOT that came from //line directives,
      such as those generated by cmd/cgo.
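
The path transformation described above can be sketched as a simple prefix rewrite. `rewriteGOROOT` is a hypothetical helper written for this illustration, and the literal `GOROOT` placeholder follows the wording of the commit message rather than the compiler's actual internal representation.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteGOROOT is a hypothetical helper: if path lies inside goroot,
// it replaces that prefix with the literal text "GOROOT".
func rewriteGOROOT(path, goroot string) string {
	goroot = strings.TrimSuffix(goroot, "/")
	if goroot != "" && strings.HasPrefix(path, goroot+"/") {
		return "GOROOT" + path[len(goroot):]
	}
	return path
}

func main() {
	root := "/path/to/goroot"
	fmt.Println(rewriteGOROOT("/path/to/goroot/src/net/port.go", root))
	fmt.Println(rewriteGOROOT("/home/user/proj/main.go", root))
}
```

The fix amounts to applying the same rewrite to file names that arrive through //line directives, so object files stay identical when the GOROOT is moved.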
      
      Fixes #21373
      Fixes #21720
      Fixes #21825
      
      Change-Id: I2784c701b4391cfb92e23efbcb091a84957d61dd
      Reviewed-on: https://go-review.googlesource.com/63693
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: fix typo in floating point rule · af860838
      Todd Neal authored
      Change-Id: Idfb64fcb26f48d5b70bab872f9a3d96a036be681
      Reviewed-on: https://go-review.googlesource.com/63950
      Run-TryBot: Todd Neal <todd@tneal.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • all: fix article typos · 5a986eca
      Kunpei Sakai authored
      a -> an
      
      Change-Id: I7362bdc199e83073a712be657f5d9ba16df3077e
      Reviewed-on: https://go-review.googlesource.com/63850
      Reviewed-by: Rob Pike <r@golang.org>
    • cmd/go: correctly report that -msan needs CGO_ENABLED=1 · 33cb1481
      Emmanuel Odeke authored
      Previously, if CGO_ENABLED=0 was set when building
      with -msan, the error message printed was:
      
        -race requires cgo; enable cgo by setting CGO_ENABLED=1
      
      yet the instrumentation flag passed in was -msan. This CL
      fixes the message to report that -msan needs CGO_ENABLED=1
      and, likewise, that -race needs it when -race is the flag passed.
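
The corrected behavior can be sketched as picking the flag name before formatting the message. `instrumentError` and its parameters are hypothetical and exist only for this illustration; the message text follows the format quoted above.

```go
package main

import "fmt"

// instrumentError is a hypothetical helper that names the flag the
// user actually passed when cgo is disabled.
func instrumentError(race, msan, cgoEnabled bool) error {
	if cgoEnabled || (!race && !msan) {
		return nil
	}
	name := "-race"
	if msan {
		name = "-msan"
	}
	return fmt.Errorf("%s requires cgo; enable cgo by setting CGO_ENABLED=1", name)
}

func main() {
	fmt.Println(instrumentError(false, true, false))
	fmt.Println(instrumentError(true, false, false))
}
```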
      
      Fixes #21895
      
      Change-Id: If423d520daae7847fb38cc97c3192ada5d960f9d
      Reviewed-on: https://go-review.googlesource.com/63930
      Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
  3. 14 Sep, 2017 8 commits
  4. 13 Sep, 2017 14 commits
  5. 12 Sep, 2017 5 commits