1. 14 Mar, 2018 3 commits
  2. 13 Mar, 2018 16 commits
  3. 12 Mar, 2018 8 commits
    • Filippo Valsorda's avatar
      C: add Filippo Valsorda's @golang.org email · 2086f350
      Filippo Valsorda authored
      Change-Id: I3758a4350b600af304b2cff7ad59c7368a01ab5c
      Reviewed-on: https://go-review.googlesource.com/100215Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      2086f350
    • Josh Bleecher Snyder's avatar
      runtime: convert g.waitreason from string to uint8 · 4eea887f
      Josh Bleecher Snyder authored
      Every time I poke at #14921, the g.waitreason string
      pointer writes show up.
      
      They're not particularly important performance-wise,
      but it'd be nice to clear the noise away.
      
      And it does open up a few extra bytes in the g struct
      for some future use.
      
      Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
      Reviewed-on: https://go-review.googlesource.com/99078
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      4eea887f
    • Tobias Klauser's avatar
      runtime: simplify range expressions in tests · 025134b0
      Tobias Klauser authored
      Generated by running gofmt -s on the files in question.
      
      Change-Id: If6578b150e1bfced8657196d2af01f5d36879f93
      Reviewed-on: https://go-review.googlesource.com/100135
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      025134b0
    • Vladimir Kuzmin's avatar
      cmd/compile: avoid extra mapaccess in "m[k] op= r" · 73950831
      Vladimir Kuzmin authored
      Currently, order desugars map assignment operations like
      
          m[k] op= r
      
      into
      
          m[k] = m[k] op r
      
      which in turn is transformed during walk into:
      
          tmp := *mapaccess(m, k)
          tmp = tmp op r
          *mapassign(m, k) = tmp
      
      However, this is suboptimal, as we could instead produce just:
      
          *mapassign(m, k) op= r
      
      One complication though is if "r == 0", then "m[k] /= r" and "m[k] %=
      r" will panic, and they need to do so *before* calling mapassign,
      otherwise we may insert a new zero-value element into the map.
      
      It would be spec compliant to just emit the "r != 0" check before
      calling mapassign (see #23735), but currently these checks aren't
      generated until SSA construction. For now, it's simpler to continue
      desugaring /= and %= into two map indexing operations.
      
      Fixes #23661.
      
      Change-Id: I46e3739d9adef10e92b46fdd78b88d5aabe68952
      Reviewed-on: https://go-review.googlesource.com/91557
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarAustin Clements <austin@google.com>
      73950831
    • isharipo's avatar
      cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86 · 85a8d25d
      isharipo authored
      cmd/asm now supports three-operand form of IMUL,
      so instead of using IMUL with resultInArg0, emit IMUL3 instruction.
      
      This results in less redundant MOVs where SSA assigns
      different registers to input[0] and dst arguments.
      
      Note: these have exactly the same encoding when reg0=reg1:
            IMUL3x $const, reg0, reg1
            IMULx $const, reg
      Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
      This is why we don't bother to generate IMULx for the case where
      dst is the same as input[0].
      
      Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
      Reviewed-on: https://go-review.googlesource.com/99656
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      Reviewed-by: default avatarKeith Randall <khr@golang.org>
      85a8d25d
    • Giovanni Bajo's avatar
      cmd/compile: implement CMOV on amd64 · 080187f4
      Giovanni Bajo authored
      This builds upon the branchelim pass, activating it for amd64 and
      lowering CondSelect. Special care is made to FPU instructions for
      NaN handling.
      
      Benchmark results on Xeon E5630 (Westmere EP):
      
      name                      old time/op    new time/op    delta
      BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
      Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
      FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
      FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
      FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
      FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
      FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
      FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
      FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
      GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
      GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
      Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
      HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
      JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
      JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
      Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
      GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
      RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
      RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
      RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
      RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
      RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
      RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
      RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
      Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
      Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
      TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
      TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)
      
      name                      old speed      new speed      delta
      GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
      GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
      Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
      JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
      JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
      GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
      RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
      RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
      RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
      RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
      RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
      RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
      RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
      Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
      Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)
      
      Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
      Reviewed-on: https://go-review.googlesource.com/98695Reviewed-by: default avatarIlya Tocar <ilya.tocar@intel.com>
      080187f4
    • fanzha02's avatar
      cmd/asm: fix ARM64 vector register arrangement encoding bug · fdf5aaf5
      fanzha02 authored
      The current code assigns vector register arrangement a wrong value
      when the arrangement specifier is S2, which causes the incorrect
      assembly.
      
      The patch fixes the issue and adds the test cases.
      
      Fixes #24249
      
      Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
      Reviewed-on: https://go-review.googlesource.com/98555Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      fdf5aaf5
    • erifan01's avatar
      math/big: optimize shlVU and shrVU on arm64 · 140bfe9c
      erifan01 authored
      This CL implements shlVU and shrVU with arm64 HW instructions "LDP" and "STP" to reduce load cost,
      it also removes unnecessary checks on the number of shifts for better performance.
      
      Benchmarks:
      
      name                              old time/op    new time/op    delta
      AddVV/1-8                           21.6ns ± 1%    21.6ns ± 1%      ~     (p=0.683 n=5+5)
      AddVV/2-8                           13.5ns ± 0%    13.5ns ± 0%      ~     (all equal)
      AddVV/3-8                           15.5ns ± 0%    15.5ns ± 0%      ~     (all equal)
      AddVV/4-8                           17.5ns ± 0%    17.5ns ± 0%      ~     (all equal)
      AddVV/5-8                           19.5ns ± 0%    19.5ns ± 0%      ~     (all equal)
      AddVV/10-8                          29.5ns ± 0%    29.5ns ± 0%      ~     (all equal)
      AddVV/100-8                          217ns ± 0%     217ns ± 0%      ~     (all equal)
      AddVV/1000-8                        2.02µs ± 0%    2.03µs ± 0%    +0.73%  (p=0.008 n=5+5)
      AddVV/10000-8                       20.5µs ± 0%    20.5µs ± 0%    -0.01%  (p=0.008 n=5+5)
      AddVV/100000-8                       246µs ± 5%     250µs ± 4%      ~     (p=0.548 n=5+5)
      AddVW/1-8                           9.26ns ± 0%    9.32ns ± 0%    +0.65%  (p=0.016 n=4+5)
      AddVW/2-8                           19.8ns ± 3%    19.8ns ± 0%      ~     (p=0.143 n=5+5)
      AddVW/3-8                           11.5ns ± 0%    11.5ns ± 0%      ~     (all equal)
      AddVW/4-8                           13.0ns ± 0%    13.0ns ± 0%      ~     (all equal)
      AddVW/5-8                           14.5ns ± 0%    14.5ns ± 0%      ~     (all equal)
      AddVW/10-8                          22.0ns ± 0%    22.0ns ± 0%      ~     (all equal)
      AddVW/100-8                          167ns ± 0%     166ns ± 0%    -0.60%  (p=0.000 n=5+4)
      AddVW/1000-8                        1.52µs ± 0%    1.52µs ± 0%      ~     (all equal)
      AddVW/10000-8                       15.1µs ± 0%    15.1µs ± 0%    +0.01%  (p=0.008 n=5+5)
      AddVW/100000-8                       163µs ± 4%     153µs ± 3%    -5.97%  (p=0.016 n=5+5)
      AddMulVVW/1-8                       32.4ns ± 1%    33.0ns ± 1%    +1.73%  (p=0.040 n=5+5)
      AddMulVVW/2-8                       56.4ns ± 2%    55.9ns ± 1%      ~     (p=0.135 n=5+5)
      AddMulVVW/3-8                       85.4ns ± 1%    85.1ns ± 0%      ~     (p=0.079 n=5+5)
      AddMulVVW/4-8                        129ns ± 1%     129ns ± 0%      ~     (p=0.397 n=5+5)
      AddMulVVW/5-8                        148ns ± 0%     148ns ± 0%      ~     (all equal)
      AddMulVVW/10-8                       270ns ± 0%     268ns ± 0%    -0.74%  (p=0.029 n=4+4)
      AddMulVVW/100-8                     2.75µs ± 0%    2.75µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                    26.0µs ± 0%    26.0µs ± 0%    -0.06%  (p=0.024 n=5+5)
      AddMulVVW/10000-8                    312µs ± 0%     312µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                  2.89ms ± 0%    2.89ms ± 0%    +0.14%  (p=0.016 n=5+5)
      DecimalConversion-8                  315µs ± 1%     312µs ± 0%      ~     (p=0.095 n=5+5)
      FloatString/100-8                   2.56µs ± 1%    2.52µs ± 1%    -1.31%  (p=0.016 n=5+5)
      FloatString/1000-8                  58.6µs ± 0%    58.2µs ± 0%    -0.75%  (p=0.008 n=5+5)
      FloatString/10000-8                 4.59ms ± 0%    4.59ms ± 0%      ~     (p=0.056 n=5+5)
      FloatString/100000-8                 446ms ± 0%     446ms ± 0%    -0.04%  (p=0.008 n=5+5)
      FloatAdd/10-8                        184ns ± 0%     178ns ± 0%    -3.48%  (p=0.008 n=5+5)
      FloatAdd/100-8                       189ns ± 3%     178ns ± 2%    -6.02%  (p=0.008 n=5+5)
      FloatAdd/1000-8                      371ns ± 0%     267ns ± 0%   -27.99%  (p=0.000 n=5+4)
      FloatAdd/10000-8                    1.87µs ± 0%    1.03µs ± 0%   -44.74%  (p=0.008 n=5+5)
      FloatAdd/100000-8                   17.1µs ± 0%     8.8µs ± 0%   -48.71%  (p=0.016 n=5+4)
      FloatSub/10-8                        148ns ± 0%     146ns ± 0%    -1.35%  (p=0.000 n=5+4)
      FloatSub/100-8                       148ns ± 0%     140ns ± 0%    -5.41%  (p=0.008 n=5+5)
      FloatSub/1000-8                      242ns ± 0%     191ns ± 0%   -21.24%  (p=0.008 n=5+5)
      FloatSub/10000-8                    1.07µs ± 0%    0.64µs ± 1%   -39.89%  (p=0.016 n=4+5)
      FloatSub/100000-8                   9.48µs ± 0%    5.32µs ± 0%   -43.87%  (p=0.008 n=5+5)
      ParseFloatSmallExp-8                29.3µs ± 3%    28.6µs ± 1%      ~     (p=0.310 n=5+5)
      ParseFloatLargeExp-8                 125µs ± 1%     123µs ± 0%    -1.99%  (p=0.008 n=5+5)
      GCD10x10/WithoutXY-8                 278ns ± 4%     289ns ± 5%    +3.96%  (p=0.040 n=5+5)
      GCD10x10/WithXY-8                   2.12µs ± 2%    2.15µs ± 2%      ~     (p=0.095 n=5+5)
      GCD10x100/WithoutXY-8                615ns ± 1%     629ns ± 5%      ~     (p=0.135 n=5+5)
      GCD10x100/WithXY-8                  3.42µs ± 1%    3.53µs ± 2%    +3.38%  (p=0.008 n=5+5)
      GCD10x1000/WithoutXY-8              1.39µs ± 1%    1.38µs ± 1%      ~     (p=0.460 n=5+5)
      GCD10x1000/WithXY-8                 7.47µs ± 2%    7.49µs ± 3%      ~     (p=1.000 n=5+5)
      GCD10x10000/WithoutXY-8             8.71µs ± 1%    8.71µs ± 0%      ~     (p=0.841 n=5+5)
      GCD10x10000/WithXY-8                28.4µs ± 2%    27.2µs ± 2%    -4.24%  (p=0.008 n=5+5)
      GCD10x100000/WithoutXY-8            78.9µs ± 1%    79.1µs ± 0%      ~     (p=0.222 n=5+5)
      GCD10x100000/WithXY-8                240µs ± 1%     228µs ± 1%    -4.98%  (p=0.008 n=5+5)
      GCD100x100/WithoutXY-8              1.87µs ± 2%    1.89µs ± 1%      ~     (p=0.095 n=5+5)
      GCD100x100/WithXY-8                 26.6µs ± 1%    26.3µs ± 0%    -1.14%  (p=0.032 n=5+5)
      GCD100x1000/WithoutXY-8             4.44µs ± 2%    4.47µs ± 2%      ~     (p=0.444 n=5+5)
      GCD100x1000/WithXY-8                36.7µs ± 1%    36.0µs ± 1%    -1.96%  (p=0.008 n=5+5)
      GCD100x10000/WithoutXY-8            22.8µs ± 1%    22.3µs ± 1%    -2.52%  (p=0.008 n=5+5)
      GCD100x10000/WithXY-8                145µs ± 1%     142µs ± 0%    -1.88%  (p=0.008 n=5+5)
      GCD100x100000/WithoutXY-8            198µs ± 1%     190µs ± 0%    -4.06%  (p=0.008 n=5+5)
      GCD100x100000/WithXY-8              1.11ms ± 0%    1.09ms ± 0%    -1.87%  (p=0.008 n=5+5)
      GCD1000x1000/WithoutXY-8            25.4µs ± 1%    25.0µs ± 1%    -1.34%  (p=0.008 n=5+5)
      GCD1000x1000/WithXY-8                515µs ± 1%     485µs ± 0%    -5.85%  (p=0.008 n=5+5)
      GCD1000x10000/WithoutXY-8           57.3µs ± 1%    56.2µs ± 1%    -1.95%  (p=0.008 n=5+5)
      GCD1000x10000/WithXY-8              1.21ms ± 0%    1.18ms ± 0%    -2.65%  (p=0.008 n=5+5)
      GCD1000x100000/WithoutXY-8           358µs ± 0%     352µs ± 1%    -1.71%  (p=0.008 n=5+5)
      GCD1000x100000/WithXY-8             8.72ms ± 0%    8.66ms ± 0%    -0.71%  (p=0.008 n=5+5)
      GCD10000x10000/WithoutXY-8           690µs ± 0%     687µs ± 1%      ~     (p=0.095 n=5+5)
      GCD10000x10000/WithXY-8             16.0ms ± 0%    12.5ms ± 0%   -22.01%  (p=0.008 n=5+5)
      GCD10000x100000/WithoutXY-8         2.09ms ± 0%    2.07ms ± 0%    -0.58%  (p=0.008 n=5+5)
      GCD10000x100000/WithXY-8            86.8ms ± 0%    83.4ms ± 0%    -3.95%  (p=0.008 n=5+5)
      GCD100000x100000/WithoutXY-8        51.2ms ± 0%    51.2ms ± 0%      ~     (p=0.548 n=5+5)
      GCD100000x100000/WithXY-8            1.25s ± 0%     0.89s ± 0%   -28.98%  (p=0.008 n=5+5)
      Hilbert-8                           2.46ms ± 2%    2.53ms ± 1%    +2.89%  (p=0.032 n=5+5)
      Binomial-8                          5.15µs ± 4%    4.92µs ± 1%    -4.43%  (p=0.032 n=5+5)
      QuoRem-8                            7.10µs ± 0%    7.05µs ± 0%    -0.59%  (p=0.008 n=5+5)
      Exp-8                                161ms ± 0%     161ms ± 0%    -0.24%  (p=0.008 n=5+5)
      Exp2-8                               161ms ± 0%     161ms ± 0%    -0.30%  (p=0.016 n=4+5)
      Bitset-8                            40.4ns ± 0%    40.3ns ± 0%      ~     (p=0.159 n=5+5)
      BitsetNeg-8                          158ns ± 4%     155ns ± 2%      ~     (p=0.183 n=5+5)
      BitsetOrig-8                         374ns ± 0%     383ns ± 1%    +2.35%  (p=0.008 n=5+5)
      BitsetNegOrig-8                      620ns ± 1%     663ns ± 2%    +7.00%  (p=0.008 n=5+5)
      ModSqrt225_Tonelli-8                7.26ms ± 0%    7.27ms ± 0%      ~     (p=0.841 n=5+5)
      ModSqrt224_3Mod4-8                  2.24ms ± 0%    2.24ms ± 0%      ~     (p=0.690 n=5+5)
      ModSqrt5430_Tonelli-8                62.3s ± 0%     62.4s ± 0%    +0.15%  (p=0.008 n=5+5)
      ModSqrt5430_3Mod4-8                  20.8s ± 0%     20.8s ± 0%      ~     (p=0.151 n=5+5)
      Sqrt-8                               101µs ± 0%      97µs ± 0%    -3.99%  (p=0.008 n=5+5)
      IntSqr/1-8                          32.7ns ± 1%    32.5ns ± 1%      ~     (p=0.325 n=5+5)
      IntSqr/2-8                           161ns ± 4%     160ns ± 4%      ~     (p=0.659 n=5+5)
      IntSqr/3-8                           296ns ± 7%     297ns ± 6%      ~     (p=0.841 n=5+5)
      IntSqr/5-8                           752ns ± 7%     755ns ± 6%      ~     (p=0.889 n=5+5)
      IntSqr/8-8                          1.91µs ± 3%    1.90µs ± 3%      ~     (p=0.746 n=5+5)
      IntSqr/10-8                         2.99µs ± 4%    3.00µs ± 4%      ~     (p=0.516 n=5+5)
      IntSqr/20-8                         6.29µs ± 2%    6.19µs ± 2%      ~     (p=0.151 n=5+5)
      IntSqr/30-8                         14.0µs ± 1%    13.8µs ± 2%      ~     (p=0.056 n=5+5)
      IntSqr/50-8                         38.1µs ± 3%    37.9µs ± 3%      ~     (p=0.548 n=5+5)
      IntSqr/80-8                         95.1µs ± 1%    94.7µs ± 1%      ~     (p=0.310 n=5+5)
      IntSqr/100-8                         148µs ± 1%     148µs ± 1%      ~     (p=0.548 n=5+5)
      IntSqr/200-8                         587µs ± 1%     587µs ± 1%      ~     (p=1.000 n=5+5)
      IntSqr/300-8                        1.31ms ± 1%    1.32ms ± 1%      ~     (p=0.151 n=5+5)
      IntSqr/500-8                        2.48ms ± 0%    2.49ms ± 0%      ~     (p=0.310 n=5+5)
      IntSqr/800-8                        4.68ms ± 0%    4.67ms ± 0%      ~     (p=0.548 n=5+5)
      IntSqr/1000-8                       7.57ms ± 0%    7.56ms ± 0%      ~     (p=0.421 n=5+5)
      Mul-8                                311ms ± 0%     311ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x10-8                     584ns ± 2%     573ns ± 1%      ~     (p=0.190 n=5+5)
      Exp3Power/0x40-8                     646ns ± 2%     649ns ± 1%      ~     (p=0.690 n=5+5)
      Exp3Power/0x100-8                   1.42µs ± 2%    1.45µs ± 1%    +2.03%  (p=0.032 n=5+5)
      Exp3Power/0x400-8                   8.28µs ± 1%    8.39µs ± 0%    +1.33%  (p=0.008 n=5+5)
      Exp3Power/0x1000-8                  60.1µs ± 0%    59.8µs ± 0%    -0.44%  (p=0.008 n=5+5)
      Exp3Power/0x4000-8                   818µs ± 0%     816µs ± 0%    -0.23%  (p=0.008 n=5+5)
      Exp3Power/0x10000-8                 7.79ms ± 0%    7.78ms ± 0%      ~     (p=0.690 n=5+5)
      Exp3Power/0x40000-8                 73.4ms ± 0%    73.3ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x100000-8                 665ms ± 0%     664ms ± 0%    -0.16%  (p=0.016 n=4+5)
      Exp3Power/0x400000-8                 5.99s ± 0%     5.97s ± 0%    -0.24%  (p=0.008 n=5+5)
      Fibo-8                               116ms ± 0%     117ms ± 0%    +0.42%  (p=0.008 n=5+5)
      NatSqr/1-8                           113ns ± 2%     112ns ± 1%      ~     (p=0.190 n=5+5)
      NatSqr/2-8                           249ns ± 2%     250ns ± 2%      ~     (p=0.365 n=5+5)
      NatSqr/3-8                           379ns ± 1%     381ns ± 2%      ~     (p=0.127 n=5+5)
      NatSqr/5-8                           838ns ± 3%     841ns ± 5%      ~     (p=0.754 n=5+5)
      NatSqr/8-8                          1.97µs ± 3%    1.97µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/10-8                         3.04µs ± 4%    3.04µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/20-8                         6.49µs ± 3%    6.50µs ± 2%      ~     (p=0.841 n=5+5)
      NatSqr/30-8                         14.3µs ± 2%    14.2µs ± 2%      ~     (p=0.548 n=5+5)
      NatSqr/50-8                         38.5µs ± 3%    38.3µs ± 3%      ~     (p=0.421 n=5+5)
      NatSqr/80-8                         96.3µs ± 1%    96.1µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/100-8                         149µs ± 1%     148µs ± 1%      ~     (p=0.310 n=5+5)
      NatSqr/200-8                         591µs ± 1%     592µs ± 1%      ~     (p=0.690 n=5+5)
      NatSqr/300-8                        1.31ms ± 1%    1.32ms ± 0%      ~     (p=0.190 n=5+4)
      NatSqr/500-8                        2.49ms ± 0%    2.49ms ± 0%      ~     (p=0.095 n=5+5)
      NatSqr/800-8                        4.70ms ± 0%    4.69ms ± 0%      ~     (p=0.222 n=5+5)
      NatSqr/1000-8                       7.60ms ± 0%    7.58ms ± 0%      ~     (p=0.222 n=5+5)
      ScanPi-8                             326µs ± 0%     327µs ± 1%      ~     (p=0.222 n=5+5)
      StringPiParallel-8                  71.4µs ± 5%    67.7µs ± 4%      ~     (p=0.095 n=5+5)
      Scan/10/Base2-8                     1.09µs ± 0%    1.10µs ± 1%      ~     (p=0.810 n=5+5)
      Scan/100/Base2-8                    7.79µs ± 0%    7.83µs ± 0%    +0.53%  (p=0.008 n=5+5)
      Scan/1000/Base2-8                   78.9µs ± 0%    79.0µs ± 0%      ~     (p=0.151 n=5+5)
      Scan/10000/Base2-8                  1.22ms ± 0%    1.23ms ± 1%      ~     (p=0.690 n=5+5)
      Scan/100000/Base2-8                 55.1ms ± 0%    55.1ms ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/10/Base8-8                      512ns ± 1%     534ns ± 1%    +4.34%  (p=0.008 n=5+5)
      Scan/100/Base8-8                    2.90µs ± 1%    2.92µs ± 0%    +0.67%  (p=0.024 n=5+5)
      Scan/1000/Base8-8                   31.0µs ± 0%    31.1µs ± 0%    +0.27%  (p=0.008 n=5+5)
      Scan/10000/Base8-8                   741µs ± 0%     744µs ± 1%      ~     (p=0.310 n=5+5)
      Scan/100000/Base8-8                 50.5ms ± 0%    50.7ms ± 0%    +0.23%  (p=0.016 n=5+4)
      Scan/10/Base10-8                     485ns ± 0%     510ns ± 1%    +5.15%  (p=0.008 n=5+5)
      Scan/100/Base10-8                   2.68µs ± 0%    2.70µs ± 0%    +0.84%  (p=0.008 n=5+5)
      Scan/1000/Base10-8                  28.7µs ± 0%    28.8µs ± 0%    +0.34%  (p=0.008 n=5+5)
      Scan/10000/Base10-8                  717µs ± 0%     720µs ± 1%      ~     (p=0.238 n=5+5)
      Scan/100000/Base10-8                50.3ms ± 0%    50.3ms ± 0%    +0.02%  (p=0.016 n=4+5)
      Scan/10/Base16-8                     439ns ± 0%     461ns ± 1%    +5.06%  (p=0.008 n=5+5)
      Scan/100/Base16-8                   2.48µs ± 0%    2.49µs ± 0%    +0.59%  (p=0.024 n=5+5)
      Scan/1000/Base16-8                  27.2µs ± 0%    27.3µs ± 0%      ~     (p=0.063 n=5+5)
      Scan/10000/Base16-8                  722µs ± 0%     725µs ± 1%      ~     (p=0.421 n=5+5)
      Scan/100000/Base16-8                52.7ms ± 0%    52.7ms ± 0%      ~     (p=0.686 n=4+4)
      String/10/Base2-8                    248ns ± 1%     248ns ± 1%      ~     (p=0.802 n=5+5)
      String/100/Base2-8                  1.51µs ± 0%    1.51µs ± 0%    -0.54%  (p=0.024 n=5+5)
      String/1000/Base2-8                 13.6µs ± 0%    13.6µs ± 0%      ~     (p=0.548 n=5+5)
      String/10000/Base2-8                 135µs ± 1%     135µs ± 2%      ~     (p=0.421 n=5+5)
      String/100000/Base2-8               1.32ms ± 1%    1.33ms ± 1%      ~     (p=0.310 n=5+5)
      String/10/Base8-8                    169ns ± 0%     170ns ± 0%      ~     (p=0.079 n=5+5)
      String/100/Base8-8                   635ns ± 1%     633ns ± 1%      ~     (p=0.595 n=5+5)
      String/1000/Base8-8                 5.33µs ± 0%    5.30µs ± 0%      ~     (p=0.063 n=5+5)
      String/10000/Base8-8                50.7µs ± 1%    50.7µs ± 1%      ~     (p=1.000 n=5+5)
      String/100000/Base8-8                499µs ± 1%     500µs ± 1%      ~     (p=1.000 n=5+5)
      String/10/Base10-8                   517ns ± 1%     512ns ± 1%    -1.01%  (p=0.032 n=5+5)
      String/100/Base10-8                 1.97µs ± 0%    2.01µs ± 1%    +2.13%  (p=0.008 n=5+5)
      String/1000/Base10-8                12.6µs ± 1%    12.1µs ± 1%    -4.16%  (p=0.008 n=5+5)
      String/10000/Base10-8               57.9µs ± 1%    54.8µs ± 1%    -5.46%  (p=0.008 n=5+5)
      String/100000/Base10-8              25.6ms ± 0%    25.6ms ± 0%    -0.12%  (p=0.008 n=5+5)
      String/10/Base16-8                   149ns ± 0%     149ns ± 1%      ~     (p=1.000 n=5+5)
      String/100/Base16-8                  514ns ± 0%     514ns ± 1%      ~     (p=0.825 n=5+5)
      String/1000/Base16-8                4.01µs ± 0%    4.01µs ± 0%      ~     (p=0.595 n=5+5)
      String/10000/Base16-8               37.7µs ± 0%    37.8µs ± 1%      ~     (p=0.222 n=5+5)
      String/100000/Base16-8               373µs ± 1%     372µs ± 0%      ~     (p=1.000 n=5+5)
      LeafSize/0-8                        6.64ms ± 0%    6.66ms ± 0%    +0.32%  (p=0.008 n=5+5)
      LeafSize/1-8                        74.0µs ± 1%    71.2µs ± 1%    -3.75%  (p=0.008 n=5+5)
      LeafSize/2-8                        74.1µs ± 0%    70.7µs ± 1%    -4.53%  (p=0.008 n=5+5)
      LeafSize/3-8                         379µs ± 0%     374µs ± 0%    -1.25%  (p=0.008 n=5+5)
      LeafSize/4-8                        72.7µs ± 0%    69.2µs ± 0%    -4.79%  (p=0.008 n=5+5)
      LeafSize/5-8                         471µs ± 0%     466µs ± 0%    -1.05%  (p=0.008 n=5+5)
      LeafSize/6-8                         377µs ± 0%     373µs ± 0%    -1.16%  (p=0.008 n=5+5)
      LeafSize/7-8                         245µs ± 0%     241µs ± 0%    -1.65%  (p=0.008 n=5+5)
      LeafSize/8-8                        73.1µs ± 0%    69.4µs ± 0%    -5.10%  (p=0.008 n=5+5)
      LeafSize/9-8                         538µs ± 0%     532µs ± 0%    -1.01%  (p=0.008 n=5+5)
      LeafSize/10-8                        472µs ± 0%     467µs ± 0%    -1.07%  (p=0.008 n=5+5)
      LeafSize/11-8                        460µs ± 0%     454µs ± 0%    -1.22%  (p=0.008 n=5+5)
      LeafSize/12-8                        378µs ± 0%     373µs ± 0%    -1.34%  (p=0.008 n=5+5)
      LeafSize/13-8                        344µs ± 0%     338µs ± 0%    -1.61%  (p=0.008 n=5+5)
      LeafSize/14-8                        247µs ± 0%     243µs ± 0%    -1.62%  (p=0.008 n=5+5)
      LeafSize/15-8                        169µs ± 0%     165µs ± 0%    -2.71%  (p=0.008 n=5+5)
      LeafSize/16-8                       73.3µs ± 1%    69.5µs ± 0%    -5.11%  (p=0.008 n=5+5)
      LeafSize/32-8                       82.7µs ± 0%    79.2µs ± 0%    -4.24%  (p=0.008 n=5+5)
      LeafSize/64-8                        135µs ± 0%     132µs ± 0%    -2.20%  (p=0.008 n=5+5)
      ProbablyPrime/n=0-8                 44.2ms ± 0%    43.9ms ± 0%    -0.69%  (p=0.008 n=5+5)
      ProbablyPrime/n=1-8                 64.8ms ± 0%    64.4ms ± 0%    -0.60%  (p=0.008 n=5+5)
      ProbablyPrime/n=5-8                  147ms ± 0%     147ms ± 0%    -0.34%  (p=0.008 n=5+5)
      ProbablyPrime/n=10-8                 250ms ± 0%     249ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/n=20-8                 456ms ± 0%     455ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/Lucas-8               23.6ms ± 0%    23.2ms ± 0%    -1.44%  (p=0.008 n=5+5)
      ProbablyPrime/MillerRabinBase2-8    20.6ms ± 0%    20.6ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatSqrt/64-8                      2.27µs ± 1%    2.11µs ± 1%    -7.02%  (p=0.008 n=5+5)
      FloatSqrt/128-8                     4.93µs ± 1%    4.40µs ± 1%   -10.73%  (p=0.008 n=5+5)
      FloatSqrt/256-8                     13.6µs ± 0%     6.6µs ± 1%   -51.40%  (p=0.008 n=5+5)
      FloatSqrt/1000-8                    69.8µs ± 0%    31.2µs ± 0%   -55.27%  (p=0.008 n=5+5)
      FloatSqrt/10000-8                   1.91ms ± 0%    0.59ms ± 0%   -69.17%  (p=0.008 n=5+5)
      FloatSqrt/100000-8                  55.4ms ± 0%    17.8ms ± 0%   -67.79%  (p=0.008 n=5+5)
      FloatSqrt/1000000-8                  4.56s ± 0%     1.52s ± 0%   -66.59%  (p=0.008 n=5+5)
      
      Change-Id: Icce52c69668f564490c69b908338b21a2288e116
      Reviewed-on: https://go-review.googlesource.com/79355Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      140bfe9c
  4. 11 Mar, 2018 1 commit
  5. 10 Mar, 2018 7 commits
  6. 09 Mar, 2018 5 commits
    • David Chase's avatar
      cmd/compile: add DWARF reg defs & fix 32-bit location list bug · 0eacf8cb
      David Chase authored
      Before DWARF location lists can be turned on, 3 bugs need
      fixing.
      
      This CL addresses two -- lack of register definitions for
      various architectures, and bugs on 32-bit platforms.
      The third bug comes later.
      
      Passes
      GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
      (-no-rebuild because the map dependence causes trouble)
      
      Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
      Reviewed-on: https://go-review.googlesource.com/99255
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      0eacf8cb
    • Robert Griesemer's avatar
      go/scanner: recognize //line and /*line directives incl. columns · 99c30211
      Robert Griesemer authored
      This change updates go/scanner to recognize the extended line
      directives that are now also handled by cmd/compile:
      
      //line filename:line
      //line filename:line:column
      /*line filename:line*/
      /*line filename:line:column*/
      
      As before, //-style line directives must start in column 1.
      /*-style line directives may be placed anywhere in the code.
      In both cases, the specified position applies to the character
      immediately following the comment; for line comments that is
      the first character on the next line (after the newline of the
      comment).
      
      The go/token API is extended by a new method
      
      File.AddLineColumnInfo(offset int, filename string, line, column int)
      
      which extends the existing
      
      File.AddLineInfo(offset int, filename string, line int)
      
      by adding a column parameter.
      
      Adjusted token.Position computation is changed to take into account
      column information if provided via a line directive: A (line-directive)
      relative position will have a non-zero column iff the line directive
      specified a column; if the position is on the same line as the line
      directive, the column is relative to the specified column (otherwise
      it is relative to the line beginning). See also #24183.
      
      Finally, Position.String() has been adjusted to not print a column
      value if the column is unknown (== 0).
      
      Fixes #24143.
      
      Change-Id: I5518c825ad94443365c049a95677407b46ba55a1
      Reviewed-on: https://go-review.googlesource.com/97795Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
      99c30211
    • Ben Shi's avatar
      cmd/compile: fix an issue in MNEG of ARM64 · 20046020
      Ben Shi authored
      There are two less optimized SSA rules in my previous CL
      https://go-review.googlesource.com/c/go/+/95075 .
      
      This CL fixes that issue and a test case gets about 10%
      performance improvement.
      name    old time/op  new time/op  delta
      MNEG-4   263µs ± 3%   235µs ± 3%  -10.53%  (p=0.000 n=20+20)
      (https://github.com/benshi001/ugo1/blob/master/mneg_7_test.go)
      
      Change-Id: I30087097e281dd9d9d1c870d32e13b4ef4a96ad3
      Reviewed-on: https://go-review.googlesource.com/99495
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      20046020
    • Austin Clements's avatar
      runtime: fix abort handling on arm64 · 0def0f2e
      Austin Clements authored
      The implementation of runtime.abort on arm64 currently branches to
      address 0, which results in a signal from PC 0, rather than from
      runtime.abort, so the runtime fails to recognize it as an abort.
      
      Fix runtime.abort on arm64 to read from address 0 like what other
      architectures do and recognize this in the signal handler.
      
      Should fix the linux/arm64 build.
      
      Change-Id: I960ab630daaeadc9190287604d4d8337b1ea3853
      Reviewed-on: https://go-review.googlesource.com/99895
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      0def0f2e
    • Matthew Dempsky's avatar
      cmd/compile: fix Node.Etype overloading · e4de522c
      Matthew Dempsky authored
      Add helper methods that validate n.Op and convert to/from the
      appropriate type.
      
      Notably, there was a lot of code in walk.go that thought setting
      Etype=1 on an OADDR node affected escape analysis.
      
      Passes toolstash-check.
      
      TBR=marvin
      
      Change-Id: Ieae7c67225c1459c9719f9e6a748a25b975cf758
      Reviewed-on: https://go-review.googlesource.com/99535Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      e4de522c