1. 13 Mar, 2018 13 commits
  2. 12 Mar, 2018 8 commits
    • Filippo Valsorda's avatar
      C: add Filippo Valsorda's @golang.org email · 2086f350
      Filippo Valsorda authored
      Change-Id: I3758a4350b600af304b2cff7ad59c7368a01ab5c
      Reviewed-on: https://go-review.googlesource.com/100215Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      2086f350
    • Josh Bleecher Snyder's avatar
      runtime: convert g.waitreason from string to uint8 · 4eea887f
      Josh Bleecher Snyder authored
      Every time I poke at #14921, the g.waitreason string
      pointer writes show up.
      
      They're not particularly important performance-wise,
      but it'd be nice to clear the noise away.
      
      And it does open up a few extra bytes in the g struct
      for some future use.
      
      Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
      Reviewed-on: https://go-review.googlesource.com/99078
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      4eea887f
    • Tobias Klauser's avatar
      runtime: simplify range expressions in tests · 025134b0
      Tobias Klauser authored
      Generated by running gofmt -s on the files in question.
      
      Change-Id: If6578b150e1bfced8657196d2af01f5d36879f93
      Reviewed-on: https://go-review.googlesource.com/100135
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      025134b0
    • Vladimir Kuzmin's avatar
      cmd/compile: avoid extra mapaccess in "m[k] op= r" · 73950831
      Vladimir Kuzmin authored
      Currently, order desugars map assignment operations like
      
          m[k] op= r
      
      into
      
          m[k] = m[k] op r
      
      which in turn is transformed during walk into:
      
          tmp := *mapaccess(m, k)
          tmp = tmp op r
          *mapassign(m, k) = tmp
      
      However, this is suboptimal, as we could instead produce just:
      
          *mapassign(m, k) op= r
      
      One complication though is if "r == 0", then "m[k] /= r" and "m[k] %=
      r" will panic, and they need to do so *before* calling mapassign,
      otherwise we may insert a new zero-value element into the map.
      
      It would be spec compliant to just emit the "r != 0" check before
      calling mapassign (see #23735), but currently these checks aren't
      generated until SSA construction. For now, it's simpler to continue
      desugaring /= and %= into two map indexing operations.
      
      Fixes #23661.
      
      Change-Id: I46e3739d9adef10e92b46fdd78b88d5aabe68952
      Reviewed-on: https://go-review.googlesource.com/91557
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      73950831
    • isharipo's avatar
      cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86 · 85a8d25d
      isharipo authored
      cmd/asm now supports three-operand form of IMUL,
      so instead of using IMUL with resultInArg0, emit IMUL3 instruction.
      
      This results in less redundant MOVs where SSA assigns
      different registers to input[0] and dst arguments.
      
      Note: these have exactly the same encoding when reg0=reg1:
            IMUL3x $const, reg0, reg1
            IMULx $const, reg
      Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
      This is why we don't bother to generate IMULx for the case where
      dst is the same as input[0].
      
      Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
      Reviewed-on: https://go-review.googlesource.com/99656
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      Reviewed-by: default avatarKeith Randall <khr@golang.org>
      85a8d25d
    • Giovanni Bajo's avatar
      cmd/compile: implement CMOV on amd64 · 080187f4
      Giovanni Bajo authored
      This builds upon the branchelim pass, activating it for amd64 and
      lowering CondSelect. Special care is made to FPU instructions for
      NaN handling.
      
      Benchmark results on Xeon E5630 (Westmere EP):
      
      name                      old time/op    new time/op    delta
      BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
      Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
      FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
      FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
      FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
      FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
      FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
      FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
      FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
      GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
      GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
      Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
      HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
      JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
      JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
      Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
      GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
      RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
      RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
      RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
      RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
      RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
      RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
      RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
      Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
      Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
      TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
      TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)
      
      name                      old speed      new speed      delta
      GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
      GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
      Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
      JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
      JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
      GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
      RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
      RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
      RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
      RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
      RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
      RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
      RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
      Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
      Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)
      
      Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
      Reviewed-on: https://go-review.googlesource.com/98695Reviewed-by: default avatarIlya Tocar <ilya.tocar@intel.com>
      080187f4
    • fanzha02's avatar
      cmd/asm: fix ARM64 vector register arrangement encoding bug · fdf5aaf5
      fanzha02 authored
      The current code assigns vector register arrangement a wrong value
      when the arrangement specifier is S2, which causes the incorrect
      assembly.
      
      The patch fixes the issue and adds the test cases.
      
      Fixes #24249
      
      Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
      Reviewed-on: https://go-review.googlesource.com/98555Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      fdf5aaf5
    • erifan01's avatar
      math/big: optimize shlVU and shrVU on arm64 · 140bfe9c
      erifan01 authored
      This CL implements shlVU and shrVU with arm64 HW instructions "LDP" and "STP" to reduce load cost,
      it also removes unnecessary checks on the number of shifts for better performance.
      
      Benchmarks:
      
      name                              old time/op    new time/op    delta
      AddVV/1-8                           21.6ns ± 1%    21.6ns ± 1%      ~     (p=0.683 n=5+5)
      AddVV/2-8                           13.5ns ± 0%    13.5ns ± 0%      ~     (all equal)
      AddVV/3-8                           15.5ns ± 0%    15.5ns ± 0%      ~     (all equal)
      AddVV/4-8                           17.5ns ± 0%    17.5ns ± 0%      ~     (all equal)
      AddVV/5-8                           19.5ns ± 0%    19.5ns ± 0%      ~     (all equal)
      AddVV/10-8                          29.5ns ± 0%    29.5ns ± 0%      ~     (all equal)
      AddVV/100-8                          217ns ± 0%     217ns ± 0%      ~     (all equal)
      AddVV/1000-8                        2.02µs ± 0%    2.03µs ± 0%    +0.73%  (p=0.008 n=5+5)
      AddVV/10000-8                       20.5µs ± 0%    20.5µs ± 0%    -0.01%  (p=0.008 n=5+5)
      AddVV/100000-8                       246µs ± 5%     250µs ± 4%      ~     (p=0.548 n=5+5)
      AddVW/1-8                           9.26ns ± 0%    9.32ns ± 0%    +0.65%  (p=0.016 n=4+5)
      AddVW/2-8                           19.8ns ± 3%    19.8ns ± 0%      ~     (p=0.143 n=5+5)
      AddVW/3-8                           11.5ns ± 0%    11.5ns ± 0%      ~     (all equal)
      AddVW/4-8                           13.0ns ± 0%    13.0ns ± 0%      ~     (all equal)
      AddVW/5-8                           14.5ns ± 0%    14.5ns ± 0%      ~     (all equal)
      AddVW/10-8                          22.0ns ± 0%    22.0ns ± 0%      ~     (all equal)
      AddVW/100-8                          167ns ± 0%     166ns ± 0%    -0.60%  (p=0.000 n=5+4)
      AddVW/1000-8                        1.52µs ± 0%    1.52µs ± 0%      ~     (all equal)
      AddVW/10000-8                       15.1µs ± 0%    15.1µs ± 0%    +0.01%  (p=0.008 n=5+5)
      AddVW/100000-8                       163µs ± 4%     153µs ± 3%    -5.97%  (p=0.016 n=5+5)
      AddMulVVW/1-8                       32.4ns ± 1%    33.0ns ± 1%    +1.73%  (p=0.040 n=5+5)
      AddMulVVW/2-8                       56.4ns ± 2%    55.9ns ± 1%      ~     (p=0.135 n=5+5)
      AddMulVVW/3-8                       85.4ns ± 1%    85.1ns ± 0%      ~     (p=0.079 n=5+5)
      AddMulVVW/4-8                        129ns ± 1%     129ns ± 0%      ~     (p=0.397 n=5+5)
      AddMulVVW/5-8                        148ns ± 0%     148ns ± 0%      ~     (all equal)
      AddMulVVW/10-8                       270ns ± 0%     268ns ± 0%    -0.74%  (p=0.029 n=4+4)
      AddMulVVW/100-8                     2.75µs ± 0%    2.75µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                    26.0µs ± 0%    26.0µs ± 0%    -0.06%  (p=0.024 n=5+5)
      AddMulVVW/10000-8                    312µs ± 0%     312µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                  2.89ms ± 0%    2.89ms ± 0%    +0.14%  (p=0.016 n=5+5)
      DecimalConversion-8                  315µs ± 1%     312µs ± 0%      ~     (p=0.095 n=5+5)
      FloatString/100-8                   2.56µs ± 1%    2.52µs ± 1%    -1.31%  (p=0.016 n=5+5)
      FloatString/1000-8                  58.6µs ± 0%    58.2µs ± 0%    -0.75%  (p=0.008 n=5+5)
      FloatString/10000-8                 4.59ms ± 0%    4.59ms ± 0%      ~     (p=0.056 n=5+5)
      FloatString/100000-8                 446ms ± 0%     446ms ± 0%    -0.04%  (p=0.008 n=5+5)
      FloatAdd/10-8                        184ns ± 0%     178ns ± 0%    -3.48%  (p=0.008 n=5+5)
      FloatAdd/100-8                       189ns ± 3%     178ns ± 2%    -6.02%  (p=0.008 n=5+5)
      FloatAdd/1000-8                      371ns ± 0%     267ns ± 0%   -27.99%  (p=0.000 n=5+4)
      FloatAdd/10000-8                    1.87µs ± 0%    1.03µs ± 0%   -44.74%  (p=0.008 n=5+5)
      FloatAdd/100000-8                   17.1µs ± 0%     8.8µs ± 0%   -48.71%  (p=0.016 n=5+4)
      FloatSub/10-8                        148ns ± 0%     146ns ± 0%    -1.35%  (p=0.000 n=5+4)
      FloatSub/100-8                       148ns ± 0%     140ns ± 0%    -5.41%  (p=0.008 n=5+5)
      FloatSub/1000-8                      242ns ± 0%     191ns ± 0%   -21.24%  (p=0.008 n=5+5)
      FloatSub/10000-8                    1.07µs ± 0%    0.64µs ± 1%   -39.89%  (p=0.016 n=4+5)
      FloatSub/100000-8                   9.48µs ± 0%    5.32µs ± 0%   -43.87%  (p=0.008 n=5+5)
      ParseFloatSmallExp-8                29.3µs ± 3%    28.6µs ± 1%      ~     (p=0.310 n=5+5)
      ParseFloatLargeExp-8                 125µs ± 1%     123µs ± 0%    -1.99%  (p=0.008 n=5+5)
      GCD10x10/WithoutXY-8                 278ns ± 4%     289ns ± 5%    +3.96%  (p=0.040 n=5+5)
      GCD10x10/WithXY-8                   2.12µs ± 2%    2.15µs ± 2%      ~     (p=0.095 n=5+5)
      GCD10x100/WithoutXY-8                615ns ± 1%     629ns ± 5%      ~     (p=0.135 n=5+5)
      GCD10x100/WithXY-8                  3.42µs ± 1%    3.53µs ± 2%    +3.38%  (p=0.008 n=5+5)
      GCD10x1000/WithoutXY-8              1.39µs ± 1%    1.38µs ± 1%      ~     (p=0.460 n=5+5)
      GCD10x1000/WithXY-8                 7.47µs ± 2%    7.49µs ± 3%      ~     (p=1.000 n=5+5)
      GCD10x10000/WithoutXY-8             8.71µs ± 1%    8.71µs ± 0%      ~     (p=0.841 n=5+5)
      GCD10x10000/WithXY-8                28.4µs ± 2%    27.2µs ± 2%    -4.24%  (p=0.008 n=5+5)
      GCD10x100000/WithoutXY-8            78.9µs ± 1%    79.1µs ± 0%      ~     (p=0.222 n=5+5)
      GCD10x100000/WithXY-8                240µs ± 1%     228µs ± 1%    -4.98%  (p=0.008 n=5+5)
      GCD100x100/WithoutXY-8              1.87µs ± 2%    1.89µs ± 1%      ~     (p=0.095 n=5+5)
      GCD100x100/WithXY-8                 26.6µs ± 1%    26.3µs ± 0%    -1.14%  (p=0.032 n=5+5)
      GCD100x1000/WithoutXY-8             4.44µs ± 2%    4.47µs ± 2%      ~     (p=0.444 n=5+5)
      GCD100x1000/WithXY-8                36.7µs ± 1%    36.0µs ± 1%    -1.96%  (p=0.008 n=5+5)
      GCD100x10000/WithoutXY-8            22.8µs ± 1%    22.3µs ± 1%    -2.52%  (p=0.008 n=5+5)
      GCD100x10000/WithXY-8                145µs ± 1%     142µs ± 0%    -1.88%  (p=0.008 n=5+5)
      GCD100x100000/WithoutXY-8            198µs ± 1%     190µs ± 0%    -4.06%  (p=0.008 n=5+5)
      GCD100x100000/WithXY-8              1.11ms ± 0%    1.09ms ± 0%    -1.87%  (p=0.008 n=5+5)
      GCD1000x1000/WithoutXY-8            25.4µs ± 1%    25.0µs ± 1%    -1.34%  (p=0.008 n=5+5)
      GCD1000x1000/WithXY-8                515µs ± 1%     485µs ± 0%    -5.85%  (p=0.008 n=5+5)
      GCD1000x10000/WithoutXY-8           57.3µs ± 1%    56.2µs ± 1%    -1.95%  (p=0.008 n=5+5)
      GCD1000x10000/WithXY-8              1.21ms ± 0%    1.18ms ± 0%    -2.65%  (p=0.008 n=5+5)
      GCD1000x100000/WithoutXY-8           358µs ± 0%     352µs ± 1%    -1.71%  (p=0.008 n=5+5)
      GCD1000x100000/WithXY-8             8.72ms ± 0%    8.66ms ± 0%    -0.71%  (p=0.008 n=5+5)
      GCD10000x10000/WithoutXY-8           690µs ± 0%     687µs ± 1%      ~     (p=0.095 n=5+5)
      GCD10000x10000/WithXY-8             16.0ms ± 0%    12.5ms ± 0%   -22.01%  (p=0.008 n=5+5)
      GCD10000x100000/WithoutXY-8         2.09ms ± 0%    2.07ms ± 0%    -0.58%  (p=0.008 n=5+5)
      GCD10000x100000/WithXY-8            86.8ms ± 0%    83.4ms ± 0%    -3.95%  (p=0.008 n=5+5)
      GCD100000x100000/WithoutXY-8        51.2ms ± 0%    51.2ms ± 0%      ~     (p=0.548 n=5+5)
      GCD100000x100000/WithXY-8            1.25s ± 0%     0.89s ± 0%   -28.98%  (p=0.008 n=5+5)
      Hilbert-8                           2.46ms ± 2%    2.53ms ± 1%    +2.89%  (p=0.032 n=5+5)
      Binomial-8                          5.15µs ± 4%    4.92µs ± 1%    -4.43%  (p=0.032 n=5+5)
      QuoRem-8                            7.10µs ± 0%    7.05µs ± 0%    -0.59%  (p=0.008 n=5+5)
      Exp-8                                161ms ± 0%     161ms ± 0%    -0.24%  (p=0.008 n=5+5)
      Exp2-8                               161ms ± 0%     161ms ± 0%    -0.30%  (p=0.016 n=4+5)
      Bitset-8                            40.4ns ± 0%    40.3ns ± 0%      ~     (p=0.159 n=5+5)
      BitsetNeg-8                          158ns ± 4%     155ns ± 2%      ~     (p=0.183 n=5+5)
      BitsetOrig-8                         374ns ± 0%     383ns ± 1%    +2.35%  (p=0.008 n=5+5)
      BitsetNegOrig-8                      620ns ± 1%     663ns ± 2%    +7.00%  (p=0.008 n=5+5)
      ModSqrt225_Tonelli-8                7.26ms ± 0%    7.27ms ± 0%      ~     (p=0.841 n=5+5)
      ModSqrt224_3Mod4-8                  2.24ms ± 0%    2.24ms ± 0%      ~     (p=0.690 n=5+5)
      ModSqrt5430_Tonelli-8                62.3s ± 0%     62.4s ± 0%    +0.15%  (p=0.008 n=5+5)
      ModSqrt5430_3Mod4-8                  20.8s ± 0%     20.8s ± 0%      ~     (p=0.151 n=5+5)
      Sqrt-8                               101µs ± 0%      97µs ± 0%    -3.99%  (p=0.008 n=5+5)
      IntSqr/1-8                          32.7ns ± 1%    32.5ns ± 1%      ~     (p=0.325 n=5+5)
      IntSqr/2-8                           161ns ± 4%     160ns ± 4%      ~     (p=0.659 n=5+5)
      IntSqr/3-8                           296ns ± 7%     297ns ± 6%      ~     (p=0.841 n=5+5)
      IntSqr/5-8                           752ns ± 7%     755ns ± 6%      ~     (p=0.889 n=5+5)
      IntSqr/8-8                          1.91µs ± 3%    1.90µs ± 3%      ~     (p=0.746 n=5+5)
      IntSqr/10-8                         2.99µs ± 4%    3.00µs ± 4%      ~     (p=0.516 n=5+5)
      IntSqr/20-8                         6.29µs ± 2%    6.19µs ± 2%      ~     (p=0.151 n=5+5)
      IntSqr/30-8                         14.0µs ± 1%    13.8µs ± 2%      ~     (p=0.056 n=5+5)
      IntSqr/50-8                         38.1µs ± 3%    37.9µs ± 3%      ~     (p=0.548 n=5+5)
      IntSqr/80-8                         95.1µs ± 1%    94.7µs ± 1%      ~     (p=0.310 n=5+5)
      IntSqr/100-8                         148µs ± 1%     148µs ± 1%      ~     (p=0.548 n=5+5)
      IntSqr/200-8                         587µs ± 1%     587µs ± 1%      ~     (p=1.000 n=5+5)
      IntSqr/300-8                        1.31ms ± 1%    1.32ms ± 1%      ~     (p=0.151 n=5+5)
      IntSqr/500-8                        2.48ms ± 0%    2.49ms ± 0%      ~     (p=0.310 n=5+5)
      IntSqr/800-8                        4.68ms ± 0%    4.67ms ± 0%      ~     (p=0.548 n=5+5)
      IntSqr/1000-8                       7.57ms ± 0%    7.56ms ± 0%      ~     (p=0.421 n=5+5)
      Mul-8                                311ms ± 0%     311ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x10-8                     584ns ± 2%     573ns ± 1%      ~     (p=0.190 n=5+5)
      Exp3Power/0x40-8                     646ns ± 2%     649ns ± 1%      ~     (p=0.690 n=5+5)
      Exp3Power/0x100-8                   1.42µs ± 2%    1.45µs ± 1%    +2.03%  (p=0.032 n=5+5)
      Exp3Power/0x400-8                   8.28µs ± 1%    8.39µs ± 0%    +1.33%  (p=0.008 n=5+5)
      Exp3Power/0x1000-8                  60.1µs ± 0%    59.8µs ± 0%    -0.44%  (p=0.008 n=5+5)
      Exp3Power/0x4000-8                   818µs ± 0%     816µs ± 0%    -0.23%  (p=0.008 n=5+5)
      Exp3Power/0x10000-8                 7.79ms ± 0%    7.78ms ± 0%      ~     (p=0.690 n=5+5)
      Exp3Power/0x40000-8                 73.4ms ± 0%    73.3ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x100000-8                 665ms ± 0%     664ms ± 0%    -0.16%  (p=0.016 n=4+5)
      Exp3Power/0x400000-8                 5.99s ± 0%     5.97s ± 0%    -0.24%  (p=0.008 n=5+5)
      Fibo-8                               116ms ± 0%     117ms ± 0%    +0.42%  (p=0.008 n=5+5)
      NatSqr/1-8                           113ns ± 2%     112ns ± 1%      ~     (p=0.190 n=5+5)
      NatSqr/2-8                           249ns ± 2%     250ns ± 2%      ~     (p=0.365 n=5+5)
      NatSqr/3-8                           379ns ± 1%     381ns ± 2%      ~     (p=0.127 n=5+5)
      NatSqr/5-8                           838ns ± 3%     841ns ± 5%      ~     (p=0.754 n=5+5)
      NatSqr/8-8                          1.97µs ± 3%    1.97µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/10-8                         3.04µs ± 4%    3.04µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/20-8                         6.49µs ± 3%    6.50µs ± 2%      ~     (p=0.841 n=5+5)
      NatSqr/30-8                         14.3µs ± 2%    14.2µs ± 2%      ~     (p=0.548 n=5+5)
      NatSqr/50-8                         38.5µs ± 3%    38.3µs ± 3%      ~     (p=0.421 n=5+5)
      NatSqr/80-8                         96.3µs ± 1%    96.1µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/100-8                         149µs ± 1%     148µs ± 1%      ~     (p=0.310 n=5+5)
      NatSqr/200-8                         591µs ± 1%     592µs ± 1%      ~     (p=0.690 n=5+5)
      NatSqr/300-8                        1.31ms ± 1%    1.32ms ± 0%      ~     (p=0.190 n=5+4)
      NatSqr/500-8                        2.49ms ± 0%    2.49ms ± 0%      ~     (p=0.095 n=5+5)
      NatSqr/800-8                        4.70ms ± 0%    4.69ms ± 0%      ~     (p=0.222 n=5+5)
      NatSqr/1000-8                       7.60ms ± 0%    7.58ms ± 0%      ~     (p=0.222 n=5+5)
      ScanPi-8                             326µs ± 0%     327µs ± 1%      ~     (p=0.222 n=5+5)
      StringPiParallel-8                  71.4µs ± 5%    67.7µs ± 4%      ~     (p=0.095 n=5+5)
      Scan/10/Base2-8                     1.09µs ± 0%    1.10µs ± 1%      ~     (p=0.810 n=5+5)
      Scan/100/Base2-8                    7.79µs ± 0%    7.83µs ± 0%    +0.53%  (p=0.008 n=5+5)
      Scan/1000/Base2-8                   78.9µs ± 0%    79.0µs ± 0%      ~     (p=0.151 n=5+5)
      Scan/10000/Base2-8                  1.22ms ± 0%    1.23ms ± 1%      ~     (p=0.690 n=5+5)
      Scan/100000/Base2-8                 55.1ms ± 0%    55.1ms ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/10/Base8-8                      512ns ± 1%     534ns ± 1%    +4.34%  (p=0.008 n=5+5)
      Scan/100/Base8-8                    2.90µs ± 1%    2.92µs ± 0%    +0.67%  (p=0.024 n=5+5)
      Scan/1000/Base8-8                   31.0µs ± 0%    31.1µs ± 0%    +0.27%  (p=0.008 n=5+5)
      Scan/10000/Base8-8                   741µs ± 0%     744µs ± 1%      ~     (p=0.310 n=5+5)
      Scan/100000/Base8-8                 50.5ms ± 0%    50.7ms ± 0%    +0.23%  (p=0.016 n=5+4)
      Scan/10/Base10-8                     485ns ± 0%     510ns ± 1%    +5.15%  (p=0.008 n=5+5)
      Scan/100/Base10-8                   2.68µs ± 0%    2.70µs ± 0%    +0.84%  (p=0.008 n=5+5)
      Scan/1000/Base10-8                  28.7µs ± 0%    28.8µs ± 0%    +0.34%  (p=0.008 n=5+5)
      Scan/10000/Base10-8                  717µs ± 0%     720µs ± 1%      ~     (p=0.238 n=5+5)
      Scan/100000/Base10-8                50.3ms ± 0%    50.3ms ± 0%    +0.02%  (p=0.016 n=4+5)
      Scan/10/Base16-8                     439ns ± 0%     461ns ± 1%    +5.06%  (p=0.008 n=5+5)
      Scan/100/Base16-8                   2.48µs ± 0%    2.49µs ± 0%    +0.59%  (p=0.024 n=5+5)
      Scan/1000/Base16-8                  27.2µs ± 0%    27.3µs ± 0%      ~     (p=0.063 n=5+5)
      Scan/10000/Base16-8                  722µs ± 0%     725µs ± 1%      ~     (p=0.421 n=5+5)
      Scan/100000/Base16-8                52.7ms ± 0%    52.7ms ± 0%      ~     (p=0.686 n=4+4)
      String/10/Base2-8                    248ns ± 1%     248ns ± 1%      ~     (p=0.802 n=5+5)
      String/100/Base2-8                  1.51µs ± 0%    1.51µs ± 0%    -0.54%  (p=0.024 n=5+5)
      String/1000/Base2-8                 13.6µs ± 0%    13.6µs ± 0%      ~     (p=0.548 n=5+5)
      String/10000/Base2-8                 135µs ± 1%     135µs ± 2%      ~     (p=0.421 n=5+5)
      String/100000/Base2-8               1.32ms ± 1%    1.33ms ± 1%      ~     (p=0.310 n=5+5)
      String/10/Base8-8                    169ns ± 0%     170ns ± 0%      ~     (p=0.079 n=5+5)
      String/100/Base8-8                   635ns ± 1%     633ns ± 1%      ~     (p=0.595 n=5+5)
      String/1000/Base8-8                 5.33µs ± 0%    5.30µs ± 0%      ~     (p=0.063 n=5+5)
      String/10000/Base8-8                50.7µs ± 1%    50.7µs ± 1%      ~     (p=1.000 n=5+5)
      String/100000/Base8-8                499µs ± 1%     500µs ± 1%      ~     (p=1.000 n=5+5)
      String/10/Base10-8                   517ns ± 1%     512ns ± 1%    -1.01%  (p=0.032 n=5+5)
      String/100/Base10-8                 1.97µs ± 0%    2.01µs ± 1%    +2.13%  (p=0.008 n=5+5)
      String/1000/Base10-8                12.6µs ± 1%    12.1µs ± 1%    -4.16%  (p=0.008 n=5+5)
      String/10000/Base10-8               57.9µs ± 1%    54.8µs ± 1%    -5.46%  (p=0.008 n=5+5)
      String/100000/Base10-8              25.6ms ± 0%    25.6ms ± 0%    -0.12%  (p=0.008 n=5+5)
      String/10/Base16-8                   149ns ± 0%     149ns ± 1%      ~     (p=1.000 n=5+5)
      String/100/Base16-8                  514ns ± 0%     514ns ± 1%      ~     (p=0.825 n=5+5)
      String/1000/Base16-8                4.01µs ± 0%    4.01µs ± 0%      ~     (p=0.595 n=5+5)
      String/10000/Base16-8               37.7µs ± 0%    37.8µs ± 1%      ~     (p=0.222 n=5+5)
      String/100000/Base16-8               373µs ± 1%     372µs ± 0%      ~     (p=1.000 n=5+5)
      LeafSize/0-8                        6.64ms ± 0%    6.66ms ± 0%    +0.32%  (p=0.008 n=5+5)
      LeafSize/1-8                        74.0µs ± 1%    71.2µs ± 1%    -3.75%  (p=0.008 n=5+5)
      LeafSize/2-8                        74.1µs ± 0%    70.7µs ± 1%    -4.53%  (p=0.008 n=5+5)
      LeafSize/3-8                         379µs ± 0%     374µs ± 0%    -1.25%  (p=0.008 n=5+5)
      LeafSize/4-8                        72.7µs ± 0%    69.2µs ± 0%    -4.79%  (p=0.008 n=5+5)
      LeafSize/5-8                         471µs ± 0%     466µs ± 0%    -1.05%  (p=0.008 n=5+5)
      LeafSize/6-8                         377µs ± 0%     373µs ± 0%    -1.16%  (p=0.008 n=5+5)
      LeafSize/7-8                         245µs ± 0%     241µs ± 0%    -1.65%  (p=0.008 n=5+5)
      LeafSize/8-8                        73.1µs ± 0%    69.4µs ± 0%    -5.10%  (p=0.008 n=5+5)
      LeafSize/9-8                         538µs ± 0%     532µs ± 0%    -1.01%  (p=0.008 n=5+5)
      LeafSize/10-8                        472µs ± 0%     467µs ± 0%    -1.07%  (p=0.008 n=5+5)
      LeafSize/11-8                        460µs ± 0%     454µs ± 0%    -1.22%  (p=0.008 n=5+5)
      LeafSize/12-8                        378µs ± 0%     373µs ± 0%    -1.34%  (p=0.008 n=5+5)
      LeafSize/13-8                        344µs ± 0%     338µs ± 0%    -1.61%  (p=0.008 n=5+5)
      LeafSize/14-8                        247µs ± 0%     243µs ± 0%    -1.62%  (p=0.008 n=5+5)
      LeafSize/15-8                        169µs ± 0%     165µs ± 0%    -2.71%  (p=0.008 n=5+5)
      LeafSize/16-8                       73.3µs ± 1%    69.5µs ± 0%    -5.11%  (p=0.008 n=5+5)
      LeafSize/32-8                       82.7µs ± 0%    79.2µs ± 0%    -4.24%  (p=0.008 n=5+5)
      LeafSize/64-8                        135µs ± 0%     132µs ± 0%    -2.20%  (p=0.008 n=5+5)
      ProbablyPrime/n=0-8                 44.2ms ± 0%    43.9ms ± 0%    -0.69%  (p=0.008 n=5+5)
      ProbablyPrime/n=1-8                 64.8ms ± 0%    64.4ms ± 0%    -0.60%  (p=0.008 n=5+5)
      ProbablyPrime/n=5-8                  147ms ± 0%     147ms ± 0%    -0.34%  (p=0.008 n=5+5)
      ProbablyPrime/n=10-8                 250ms ± 0%     249ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/n=20-8                 456ms ± 0%     455ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/Lucas-8               23.6ms ± 0%    23.2ms ± 0%    -1.44%  (p=0.008 n=5+5)
      ProbablyPrime/MillerRabinBase2-8    20.6ms ± 0%    20.6ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatSqrt/64-8                      2.27µs ± 1%    2.11µs ± 1%    -7.02%  (p=0.008 n=5+5)
      FloatSqrt/128-8                     4.93µs ± 1%    4.40µs ± 1%   -10.73%  (p=0.008 n=5+5)
      FloatSqrt/256-8                     13.6µs ± 0%     6.6µs ± 1%   -51.40%  (p=0.008 n=5+5)
      FloatSqrt/1000-8                    69.8µs ± 0%    31.2µs ± 0%   -55.27%  (p=0.008 n=5+5)
      FloatSqrt/10000-8                   1.91ms ± 0%    0.59ms ± 0%   -69.17%  (p=0.008 n=5+5)
      FloatSqrt/100000-8                  55.4ms ± 0%    17.8ms ± 0%   -67.79%  (p=0.008 n=5+5)
      FloatSqrt/1000000-8                  4.56s ± 0%     1.52s ± 0%   -66.59%  (p=0.008 n=5+5)
      
      Change-Id: Icce52c69668f564490c69b908338b21a2288e116
      Reviewed-on: https://go-review.googlesource.com/79355Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      140bfe9c
  3. 11 Mar, 2018 1 commit
  4. 10 Mar, 2018 7 commits
  5. 09 Mar, 2018 11 commits
    • David Chase's avatar
      cmd/compile: add DWARF reg defs & fix 32-bit location list bug · 0eacf8cb
      David Chase authored
      Before DWARF location lists can be turned on, 3 bugs need
      fixing.
      
      This CL addresses two -- lack of register definitions for
      various architectures, and bugs on 32-bit platforms.
      The third bug comes later.
      
      Passes
      GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
      (-no-rebuild because the map dependence causes trouble)
      
      Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
      Reviewed-on: https://go-review.googlesource.com/99255
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      0eacf8cb
    • Robert Griesemer's avatar
      go/scanner: recognize //line and /*line directives incl. columns · 99c30211
      Robert Griesemer authored
      This change updates go/scanner to recognize the extended line
      directives that are now also handled by cmd/compile:
      
      //line filename:line
      //line filename:line:column
      /*line filename:line*/
      /*line filename:line:column*/
      
      As before, //-style line directives must start in column 1.
      /*-style line directives may be placed anywhere in the code.
      In both cases, the specified position applies to the character
      immediately following the comment; for line comments that is
      the first character on the next line (after the newline of the
      comment).
      
      The go/token API is extended by a new method
      
      File.AddLineColumnInfo(offset int, filename string, line, column int)
      
      which extends the existing
      
      File.AddLineInfo(offset int, filename string, line int)
      
      by adding a column parameter.
      
      Adjusted token.Position computation is changed to take into account
      column information if provided via a line directive: A (line-directive)
      relative position will have a non-zero column iff the line directive
      specified a column; if the position is on the same line as the line
      directive, the column is relative to the specified column (otherwise
      it is relative to the line beginning). See also #24183.
      
      Finally, Position.String() has been adjusted to not print a column
      value if the column is unknown (== 0).
      
      Fixes #24143.
      
      Change-Id: I5518c825ad94443365c049a95677407b46ba55a1
      Reviewed-on: https://go-review.googlesource.com/97795Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
      99c30211
    • Ben Shi's avatar
      cmd/compile: fix an issue in MNEG of ARM64 · 20046020
      Ben Shi authored
      There are two less optimized SSA rules in my previous CL
      https://go-review.googlesource.com/c/go/+/95075 .
      
      This CL fixes that issue and a test case gets about 10%
      performance improvement.
      name    old time/op  new time/op  delta
      MNEG-4   263µs ± 3%   235µs ± 3%  -10.53%  (p=0.000 n=20+20)
      (https://github.com/benshi001/ugo1/blob/master/mneg_7_test.go)
      
      Change-Id: I30087097e281dd9d9d1c870d32e13b4ef4a96ad3
      Reviewed-on: https://go-review.googlesource.com/99495
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      20046020
    • Austin Clements's avatar
      runtime: fix abort handling on arm64 · 0def0f2e
      Austin Clements authored
      The implementation of runtime.abort on arm64 currently branches to
      address 0, which results in a signal from PC 0, rather than from
      runtime.abort, so the runtime fails to recognize it as an abort.
      
      Fix runtime.abort on arm64 to read from address 0 like what other
      architectures do and recognize this in the signal handler.
      
      Should fix the linux/arm64 build.
      
      Change-Id: I960ab630daaeadc9190287604d4d8337b1ea3853
      Reviewed-on: https://go-review.googlesource.com/99895
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      0def0f2e
    • Matthew Dempsky's avatar
      cmd/compile: fix Node.Etype overloading · e4de522c
      Matthew Dempsky authored
      Add helper methods that validate n.Op and convert to/from the
      appropriate type.
      
      Notably, there was a lot of code in walk.go that thought setting
      Etype=1 on an OADDR node affected escape analysis.
      
      Passes toolstash-check.
      
      TBR=marvin
      
      Change-Id: Ieae7c67225c1459c9719f9e6a748a25b975cf758
      Reviewed-on: https://go-review.googlesource.com/99535Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      e4de522c
    • Ilya Tocar's avatar
      runtime: use bytes.IndexByte in findnull · 91102bf7
      Ilya Tocar authored
      bytes.IndexByte is heavily optimized. Use it in findnull.
      This is second attempt, similar to CL97523.
      In this version we never call IndexByte on region of memory,
      that crosses page boundary. A bit slower than CL97523,
      but still fast:
      
      name        old time/op  new time/op  delta
      GoString-6   164ns ± 2%   118ns ± 0%  -28.00%  (p=0.000 n=10+6)
      
      findnull is also used in gostringnocopy,
      which is used in many hot spots in the runtime.
      
      Fixes #23830
      
      Change-Id: Id843dd4f65a34309d92bdd8df229e484d26b0cb2
      Reviewed-on: https://go-review.googlesource.com/98015
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      91102bf7
    • Ian Lance Taylor's avatar
      runtime: set libcall values for Solaris system calls · 49027786
      Ian Lance Taylor authored
      This lets SIGPROF signals get a useful traceback.
      Without it we just see sysvicallN calling asmcgocall.
      
      Updates #24142
      
      Change-Id: I5dfe3add51f0c3a4cb1c98acb7738be6396214bc
      Reviewed-on: https://go-review.googlesource.com/99617
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      49027786
    • Alberto Donizetti's avatar
      test/codegen: add README file for the codegen test harness · 8e427a38
      Alberto Donizetti authored
      This change adds a README file inside the test/codegen directory,
      explaining how to run the codegen tests and the syntax of the regexps
      comments used to match assembly instructions.
      
      Change-Id: Ica4eb3ffa9c6975371538cc8ae0ac3c1a3a03baf
      Reviewed-on: https://go-review.googlesource.com/99156Reviewed-by: default avatarKeith Randall <khr@golang.org>
      8e427a38
    • Daniel Martí's avatar
      encoding/gob: work around TestFuzzOneByte panic · 1c6144d0
      Daniel Martí authored
      The index 248 results in the decoder calling reflect.MakeMapWithSize
      with a size of 14754407682 - just under 15GB - which ends up in a
      runtime out of memory panic after some recent runtime changes on
      machines with 8GB of memory.
      
      Until that is fixed in either runtime or gob, skip the troublesome
      index.
      
      Updates #24308.
      
      Change-Id: Ia450217271c983e7386ba2f3f88c9ba50aa346f4
      Reviewed-on: https://go-review.googlesource.com/99655
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      1c6144d0
    • Josh Bleecher Snyder's avatar
      runtime: add TestSizeof · 031f71ef
      Josh Bleecher Snyder authored
      Borrowed from cmd/compile, TestSizeof ensures
      that the size of important types doesn't change unexpectedly.
      It also helps reviewers see the impact of intended changes.
      
      Change-Id: If57955f0c3e66054de3f40c6bba585b88694c7be
      Reviewed-on: https://go-review.googlesource.com/99837
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      031f71ef
    • Vladimir Kuzmin's avatar
      net: optimize IP.String for IPv4 · b2d1cd2a
      Vladimir Kuzmin authored
      This is optimization is only for IPv4. It allocates a result buffer and
      writes the IPv4 octets as dotted decimal into it before converting
      it to a string just once, reducing allocations.
      
      Benchmark shows performance improvement:
      
      name             old time/op    new time/op    delta
      IPString/IPv4-8     284ns ± 4%     144ns ± 6%  -49.35%  (p=0.000 n=19+17)
      IPString/IPv6-8    1.34µs ± 5%    1.14µs ± 5%  -14.37%  (p=0.000 n=19+20)
      
      name             old alloc/op   new alloc/op   delta
      IPString/IPv4-8     24.0B ± 0%     16.0B ± 0%  -33.33%  (p=0.000 n=20+20)
      IPString/IPv6-8      232B ± 0%      224B ± 0%   -3.45%  (p=0.000 n=20+20)
      
      name             old allocs/op  new allocs/op  delta
      IPString/IPv4-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=20+20)
      IPString/IPv6-8      12.0 ± 0%      11.0 ± 0%   -8.33%  (p=0.000 n=20+20)
      
      Fixes #24306
      
      Change-Id: I4e2d30d364e78183d55a42907d277744494b6df3
      Reviewed-on: https://go-review.googlesource.com/99395Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      b2d1cd2a