1. 14 Mar, 2018 11 commits
    • cmd/compile: improve PPC64.rules to reduce size of rewritePPC64.go · aff222cd
      Lynn Boger authored
      Some rules in PPC64.rules cause an extremely large rewritePPC64.go
      file to be generated, due to rules with commutative operations and
      many operands. This happens with the existing
      rules for combining byte loads in little endian order, and
      also happens with the pending change to do the same for bytes
      in big endian order.
      
      The change improves the existing rules and reduces the size of
      the rewrite file by more than 60%. Once this change is merged,
      then the pending change for big endian ordered rules will be
      updated to use rules that avoid generating an excessively large
      rewrite file.
      
      This also includes a fix to a performance regression for
      littleEndian.PutUint16 on ppc64le.
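
For illustration, the byte-load combining rules mentioned above target source patterns like the one sketched below. Because OR is commutative, the four operands can reach the rewriter in any order, which is the reason a naive rule expands into many generated variants. The helper name load32le is hypothetical, and whether a particular compiler version merges these loads into a single wide load is an assumption, not something this sketch verifies.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32le assembles a uint32 from four bytes in little-endian order.
// This is the source-level shape that load-combining rules look for.
// The name is hypothetical and used only for this illustration.
func load32le(b []byte) uint32 {
	_ = b[3] // single up-front bounds check
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	b := []byte{0x78, 0x56, 0x34, 0x12}
	// Both calls decode the same little-endian value.
	fmt.Printf("%#x %#x\n", load32le(b), binary.LittleEndian.Uint32(b))
}
```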
      
      Change-Id: I8d2ea42885fa2b84b30c63aa124b0a9b130564ff
      Reviewed-on: https://go-review.googlesource.com/100675
      Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • math/big: add comment about internal assumptions on nat values · 7d4d2cb6
      Robert Griesemer authored
      Change-Id: I7ed40507a019c0bf521ba748fc22c03d74bb17b7
      Reviewed-on: https://go-review.googlesource.com/100719
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: improve arm64 memclr implementation · b46d3988
      Balaram Makam authored
      Improve runtime memclr_arm64.s by using the ZVA feature to zero out
      memory when n is at least 64 bytes.
      
      Also add the DCZID_EL0 system register for use in the MRS instruction.
      
      Benchmark results of runtime/Memclr on Amberwing:
      name          old time/op    new time/op    delta
      Memclr/5        12.7ns ± 0%    12.7ns ± 0%      ~     (all equal)
      Memclr/16       12.7ns ± 0%    12.2ns ± 1%    -4.13%  (p=0.000 n=7+8)
      Memclr/64       14.0ns ± 0%    14.6ns ± 1%    +4.29%  (p=0.000 n=7+8)
      Memclr/256      23.7ns ± 0%    25.7ns ± 0%    +8.44%  (p=0.000 n=8+7)
      Memclr/4096      204ns ± 0%      74ns ± 0%   -63.71%  (p=0.000 n=8+8)
      Memclr/65536    2.89µs ± 0%    0.84µs ± 0%   -70.91%  (p=0.000 n=8+8)
      Memclr/1M       45.9µs ± 0%    17.0µs ± 0%   -62.88%  (p=0.000 n=8+8)
      Memclr/4M        184µs ± 0%      77µs ± 4%   -57.94%  (p=0.001 n=6+8)
      Memclr/8M        367µs ± 0%     144µs ± 1%   -60.72%  (p=0.000 n=7+8)
      Memclr/16M       734µs ± 0%     293µs ± 1%   -60.09%  (p=0.000 n=8+8)
      Memclr/64M      2.94ms ± 0%    1.23ms ± 0%   -58.06%  (p=0.000 n=7+8)
      GoMemclr/5      8.00ns ± 0%    8.79ns ± 0%    +9.83%  (p=0.000 n=8+8)
      GoMemclr/16     8.00ns ± 0%    7.60ns ± 0%    -5.00%  (p=0.000 n=8+8)
      GoMemclr/64     10.8ns ± 0%    10.4ns ± 0%    -3.70%  (p=0.000 n=8+8)
      GoMemclr/256    20.4ns ± 0%    21.2ns ± 0%    +3.92%  (p=0.000 n=8+8)
      
      name          old speed      new speed      delta
      Memclr/5       394MB/s ± 0%   393MB/s ± 0%    -0.28%  (p=0.006 n=8+8)
      Memclr/16     1.26GB/s ± 0%  1.31GB/s ± 1%    +4.07%  (p=0.000 n=7+8)
      Memclr/64     4.57GB/s ± 0%  4.39GB/s ± 2%    -3.91%  (p=0.000 n=7+8)
      Memclr/256    10.8GB/s ± 0%  10.0GB/s ± 0%    -7.95%  (p=0.001 n=7+6)
      Memclr/4096   20.1GB/s ± 0%  55.3GB/s ± 0%  +175.46%  (p=0.000 n=8+8)
      Memclr/65536  22.6GB/s ± 0%  77.8GB/s ± 0%  +243.63%  (p=0.000 n=7+8)
      Memclr/1M     22.8GB/s ± 0%  61.5GB/s ± 0%  +169.38%  (p=0.000 n=8+8)
      Memclr/4M     22.8GB/s ± 0%  54.3GB/s ± 4%  +137.85%  (p=0.001 n=6+8)
      Memclr/8M     22.8GB/s ± 0%  58.1GB/s ± 1%  +154.56%  (p=0.000 n=7+8)
      Memclr/16M    22.8GB/s ± 0%  57.2GB/s ± 1%  +150.54%  (p=0.000 n=8+8)
      Memclr/64M    22.8GB/s ± 0%  54.4GB/s ± 0%  +138.42%  (p=0.000 n=7+8)
      GoMemclr/5     625MB/s ± 0%   569MB/s ± 0%    -8.90%  (p=0.000 n=7+8)
      GoMemclr/16   2.00GB/s ± 0%  2.10GB/s ± 0%    +5.26%  (p=0.000 n=8+8)
      GoMemclr/64   5.92GB/s ± 0%  6.15GB/s ± 0%    +3.83%  (p=0.000 n=7+8)
      GoMemclr/256  12.5GB/s ± 0%  12.1GB/s ± 0%    -3.77%  (p=0.000 n=8+7)
      
      Benchmark results of runtime/Memclr on Amberwing without ZVA:
      name          old time/op    new time/op    delta
      Memclr/5        12.7ns ± 0%    12.8ns ± 0%   +0.79%  (p=0.008 n=5+5)
      Memclr/16       12.7ns ± 0%    12.7ns ± 0%     ~     (p=0.444 n=5+5)
      Memclr/64       14.0ns ± 0%    14.4ns ± 0%   +2.86%  (p=0.008 n=5+5)
      Memclr/256      23.7ns ± 1%    19.2ns ± 0%  -19.06%  (p=0.008 n=5+5)
      Memclr/4096      203ns ± 0%     119ns ± 0%  -41.38%  (p=0.008 n=5+5)
      Memclr/65536    2.89µs ± 0%    1.66µs ± 0%  -42.76%  (p=0.008 n=5+5)
      Memclr/1M       45.9µs ± 0%    26.2µs ± 0%  -42.82%  (p=0.008 n=5+5)
      Memclr/4M        184µs ± 0%     105µs ± 0%  -42.81%  (p=0.008 n=5+5)
      Memclr/8M        367µs ± 0%     210µs ± 0%  -42.76%  (p=0.008 n=5+5)
      Memclr/16M       734µs ± 0%     420µs ± 0%  -42.74%  (p=0.008 n=5+5)
      Memclr/64M      2.94ms ± 0%    1.69ms ± 0%  -42.46%  (p=0.008 n=5+5)
      GoMemclr/5      8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
      GoMemclr/16     8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
      GoMemclr/64     10.8ns ± 0%     9.6ns ± 0%  -11.02%  (p=0.008 n=5+5)
      GoMemclr/256    20.4ns ± 0%    17.2ns ± 0%  -15.69%  (p=0.008 n=5+5)
      
      name          old speed      new speed      delta
      Memclr/5       393MB/s ± 0%   391MB/s ± 0%   -0.64%  (p=0.008 n=5+5)
      Memclr/16     1.26GB/s ± 0%  1.26GB/s ± 0%   -0.55%  (p=0.008 n=5+5)
      Memclr/64     4.57GB/s ± 0%  4.44GB/s ± 0%   -2.79%  (p=0.008 n=5+5)
      Memclr/256    10.8GB/s ± 0%  13.3GB/s ± 0%  +23.07%  (p=0.016 n=4+5)
      Memclr/4096   20.1GB/s ± 0%  34.3GB/s ± 0%  +70.91%  (p=0.008 n=5+5)
      Memclr/65536  22.7GB/s ± 0%  39.6GB/s ± 0%  +74.65%  (p=0.008 n=5+5)
      Memclr/1M     22.8GB/s ± 0%  40.0GB/s ± 0%  +74.88%  (p=0.008 n=5+5)
      Memclr/4M     22.8GB/s ± 0%  39.9GB/s ± 0%  +74.84%  (p=0.008 n=5+5)
      Memclr/8M     22.9GB/s ± 0%  39.9GB/s ± 0%  +74.71%  (p=0.008 n=5+5)
      Memclr/16M    22.9GB/s ± 0%  39.9GB/s ± 0%  +74.64%  (p=0.008 n=5+5)
      Memclr/64M    22.8GB/s ± 0%  39.7GB/s ± 0%  +73.79%  (p=0.008 n=5+5)
      GoMemclr/5     625MB/s ± 0%   595MB/s ± 0%   -4.77%  (p=0.000 n=4+5)
      GoMemclr/16   2.00GB/s ± 0%  1.90GB/s ± 0%   -4.77%  (p=0.008 n=5+5)
      GoMemclr/64   5.92GB/s ± 0%  6.66GB/s ± 0%  +12.48%  (p=0.016 n=4+5)
      GoMemclr/256  12.5GB/s ± 0%  14.9GB/s ± 0%  +18.95%  (p=0.008 n=5+5)
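
As a rough sketch of the kind of Go code that ends up in the runtime's memory-clearing routine, the loop below zeroes a byte slice with the range-loop idiom, which the Go compiler can recognize and replace with a call to memclr. This assumes the GoMemclr rows above measure that idiom; the helper name clearBytes is hypothetical.

```go
package main

import "fmt"

// clearBytes zeroes b using the range-loop idiom that the compiler
// may lower to a single runtime memclr call. The name is hypothetical.
func clearBytes(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	buf := make([]byte, 4096)
	for i := range buf {
		buf[i] = 0xff // dirty the buffer first
	}
	clearBytes(buf)
	fmt.Println(buf[0], buf[len(buf)-1])
}
```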
      
      Fixes #22948
      
      Change-Id: Iaae4e22391e25b54d299821bb7f8a81ac3986b93
      Reviewed-on: https://go-review.googlesource.com/82055
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: document new line directives · e65d6a6a
      Robert Griesemer authored
      Fixes #24183.
      
      Change-Id: I5ef31c4a3aad7e05568b7de1227745d686d4aff8
      Reviewed-on: https://go-review.googlesource.com/100462
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime, syscall: add RawSyscall6 on Solaris and make it panic · f0939ba5
      Tobias Klauser authored
      The syscall package currently declares RawSyscall6 for every GOOS, but
      does not define it on Solaris. As a result, code using the function
      compiles but fails to link. Fix this by adding RawSyscall6 on Solaris
      and making it panic.
      
      Also remove the obsolete comment above runtime.syscall_syscall as
      pointed out by Aram.
      
      Updates #24357
      
      Change-Id: I1b1423121d1c99de2ecc61cd9a935dba9b39e3a4
      Reviewed-on: https://go-review.googlesource.com/100655
      Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
    • test/codegen: port all small memmove tests to codegen · cd3aae9b
      Alberto Donizetti authored
      This change ports all the remaining tests checking that small memmoves
      are replaced with MOVs to the new codegen test harness, and deletes
      them from the asm_test file.
      
      Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
      Reviewed-on: https://go-review.googlesource.com/100476
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • test/codegen: add codegen tests for div · 858042b8
      Alberto Donizetti authored
      Change-Id: I6ce8981e85fd55ade6078b0946e54a9215d9deca
      Reviewed-on: https://go-review.googlesource.com/100575
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/asm: move manual tests out of generated file · b8d26225
      Daniel Martí authored
      Thanks to Iskander Sharipov for spotting this in an earlier CL of mine.
      
      Change-Id: Idf45ad266205ff83985367cb38f585badfbed151
      Reviewed-on: https://go-review.googlesource.com/100535
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Iskander Sharipov <iskander.sharipov@intel.com>
      Reviewed-by: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: don't use floating point in findnull on Plan 9 · 523f2ea7
      David du Colombier authored
      In CL 98015, findnull was rewritten so it uses bytes.IndexByte.
      
      This broke the build on plan9/amd64 because the implementation
      of bytes.IndexByte on AMD64 relies on SSE instructions while
      floating point instructions are not allowed in the note handler.
      
      This change fixes findnull by using the former implementation
      on Plan 9, so it doesn't use bytes.IndexByte.
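
To make the trade-off concrete, here is a minimal sketch of a plain byte-by-byte search for the first NUL byte, next to the bytes.IndexByte call it can stand in for. This is not the runtime's findnull (which operates on a raw *byte rather than a slice); the function name findNullLoop is hypothetical. The point is only that the loop needs no vector or floating-point instructions.

```go
package main

import (
	"bytes"
	"fmt"
)

// findNullLoop returns the index of the first zero byte in s, or -1 if
// there is none. It uses only scalar byte comparisons. Hypothetical name.
func findNullLoop(s []byte) int {
	for i, c := range s {
		if c == 0 {
			return i
		}
	}
	return -1
}

func main() {
	s := []byte("plan9\x00tail")
	// Both approaches report the same position.
	fmt.Println(findNullLoop(s), bytes.IndexByte(s, 0))
}
```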
      
      Fixes #24387.
      
      Change-Id: I084d1a44d38d9f77a6c1ad492773f0a98226be16
      Reviewed-on: https://go-review.googlesource.com/100577
      Run-TryBot: David du Colombier <0intro@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rob Pike <r@golang.org>
    • test: check that size argument errors are emitted at call site · d32018a5
      Tobias Klauser authored
      Add tests checking that the "negative size argument in make.*" and "size
      argument too large in make.*" error messages are reported at the call
      site when the size is a const defined on another line.
      
      As suggested by Matthew in a comment on CL 69910.
      
      Change-Id: I5c33d4bec4e3d20bb21fe8019df27999997ddff3
      Reviewed-on: https://go-review.googlesource.com/100395
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • runtime: fix typo in gdb script · 4d38d3ae
      Josh Bleecher Snyder authored
      Change-Id: I9d4b3e25b00724f0e4870c6082671b4f14cc18fc
      Reviewed-on: https://go-review.googlesource.com/100463
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. 13 Mar, 2018 16 commits
  3. 12 Mar, 2018 8 commits
    • C: add Filippo Valsorda's @golang.org email · 2086f350
      Filippo Valsorda authored
      Change-Id: I3758a4350b600af304b2cff7ad59c7368a01ab5c
      Reviewed-on: https://go-review.googlesource.com/100215
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: convert g.waitreason from string to uint8 · 4eea887f
      Josh Bleecher Snyder authored
      Every time I poke at #14921, the g.waitreason string
      pointer writes show up.
      
      They're not particularly important performance-wise,
      but it'd be nice to clear the noise away.
      
      And it does open up a few extra bytes in the g struct
      for some future use.
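
The general technique can be sketched as follows: replace a string field with a small integer code and keep the human-readable text in a lookup table. The type, constants, and table below are illustrative only and are not copied from the runtime; the set of reasons shown is an assumption made for the example.

```go
package main

import "fmt"

// waitReason is a one-byte code standing in for a descriptive string.
// All names and values here are illustrative.
type waitReason uint8

const (
	waitReasonZero waitReason = iota
	waitReasonChanReceive
	waitReasonSleep
)

// waitReasonStrings maps each code to its display text.
var waitReasonStrings = [...]string{
	waitReasonZero:        "",
	waitReasonChanReceive: "chan receive",
	waitReasonSleep:       "sleep",
}

// String returns the display text for w, guarding against codes
// that fall outside the table.
func (w waitReason) String() string {
	if int(w) >= len(waitReasonStrings) {
		return "unknown wait reason"
	}
	return waitReasonStrings[w]
}

func main() {
	r := waitReasonSleep
	fmt.Println(r) // fmt uses the String method
}
```

Storing the code instead of the string means an update writes a single byte rather than a string header containing a pointer, which is the kind of pointer write the commit message describes as noise.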
      
      Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
      Reviewed-on: https://go-review.googlesource.com/99078
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: simplify range expressions in tests · 025134b0
      Tobias Klauser authored
      Generated by running gofmt -s on the files in question.
      
      Change-Id: If6578b150e1bfced8657196d2af01f5d36879f93
      Reviewed-on: https://go-review.googlesource.com/100135
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/compile: avoid extra mapaccess in "m[k] op= r" · 73950831
      Vladimir Kuzmin authored
      Currently, order desugars map assignment operations like
      
          m[k] op= r
      
      into
      
          m[k] = m[k] op r
      
      which in turn is transformed during walk into:
      
          tmp := *mapaccess(m, k)
          tmp = tmp op r
          *mapassign(m, k) = tmp
      
      However, this is suboptimal, as we could instead produce just:
      
          *mapassign(m, k) op= r
      
      One complication though is if "r == 0", then "m[k] /= r" and "m[k] %=
      r" will panic, and they need to do so *before* calling mapassign,
      otherwise we may insert a new zero-value element into the map.
      
      It would be spec compliant to just emit the "r != 0" check before
      calling mapassign (see #23735), but currently these checks aren't
      generated until SSA construction. For now, it's simpler to continue
      desugaring /= and %= into two map indexing operations.
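
From the user's side, the behavior discussed above can be observed with a small program. The sketch below uses hypothetical helpers addTo and divBy: the first exercises the "m[k] op= r" form that can be served by a single map assignment, and the second shows that dividing by a zero-valued variable panics. The program also prints whether the key exists afterwards, which is the property the commit message says must be preserved; the exact result is left for the reader to observe rather than asserted here.

```go
package main

import "fmt"

// addTo performs m[k] += r, the kind of statement this change optimizes.
// Hypothetical helper for illustration.
func addTo(m map[string]int, k string, r int) {
	m[k] += r
}

// divBy performs m[k] /= r and reports whether the statement panicked.
// Hypothetical helper for illustration.
func divBy(m map[string]int, k string, r int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m[k] /= r
	return false
}

func main() {
	m := map[string]int{}
	addTo(m, "hits", 2) // key is created on first use
	addTo(m, "hits", 3)
	fmt.Println("hits:", m["hits"])

	zero := 0
	p := divBy(m, "ratio", zero)
	_, present := m["ratio"]
	fmt.Println("panicked:", p, "key present:", present)
}
```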
      
      Fixes #23661.
      
      Change-Id: I46e3739d9adef10e92b46fdd78b88d5aabe68952
      Reviewed-on: https://go-review.googlesource.com/91557
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86 · 85a8d25d
      isharipo authored
      cmd/asm now supports three-operand form of IMUL,
      so instead of using IMUL with resultInArg0, emit IMUL3 instruction.
      
      This results in fewer redundant MOVs where SSA assigns
      different registers to input[0] and dst arguments.
      
      Note: these have exactly the same encoding when reg0=reg1:
            IMUL3x $const, reg0, reg1
            IMULx $const, reg
      Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
      This is why we don't bother to generate IMULx for the case where
      dst is the same as input[0].
      
      Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
      Reviewed-on: https://go-review.googlesource.com/99656
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: implement CMOV on amd64 · 080187f4
      Giovanni Bajo authored
      This builds upon the branchelim pass, activating it for amd64 and
      lowering CondSelect. Special care is taken with FPU instructions for
      NaN handling.
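
As an illustration of the kind of source shape a branch-elimination pass can work with, the function below assigns a default and then conditionally overwrites it. Whether the compiler actually emits a conditional move for this code depends on the compiler version and its heuristics, so that part is an assumption; the function name maxInt is hypothetical.

```go
package main

import "fmt"

// maxInt returns the larger of a and b. The short "assign, then
// conditionally overwrite" shape is a candidate for being compiled
// into a conditional select instead of a branch. Hypothetical name.
func maxInt(a, b int) int {
	m := a
	if b > a {
		m = b
	}
	return m
}

func main() {
	fmt.Println(maxInt(3, 7), maxInt(7, 3))
}
```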
      
      Benchmark results on Xeon E5630 (Westmere EP):
      
      name                      old time/op    new time/op    delta
      BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
      Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
      FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
      FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
      FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
      FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
      FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
      FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
      FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
      GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
      GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
      Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
      HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
      JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
      JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
      Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
      GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
      RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
      RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
      RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
      RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
      RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
      RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
      RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
      Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
      Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
      TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
      TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)
      
      name                      old speed      new speed      delta
      GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
      GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
      Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
      Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
      JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
      JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
      GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
      RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
      RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
      RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
      RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
      RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
      RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
      RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
      RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
      Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
      Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)
      
      Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
      Reviewed-on: https://go-review.googlesource.com/98695
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
    • cmd/asm: fix ARM64 vector register arrangement encoding bug · fdf5aaf5
      fanzha02 authored
      The current code assigns the wrong value to the vector register
      arrangement when the arrangement specifier is S2, which produces
      incorrect assembly.
      
      This patch fixes the issue and adds test cases.
      
      Fixes #24249
      
      Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
      Reviewed-on: https://go-review.googlesource.com/98555
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • math/big: optimize shlVU and shrVU on arm64 · 140bfe9c
      erifan01 authored
      This CL implements shlVU and shrVU with the arm64 hardware instructions
      "LDP" and "STP" to reduce load cost. It also removes unnecessary checks
      on the number of shifts for better performance.
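
For readers unfamiliar with these routines, the sketch below shows in portable Go what a vector left shift of this kind computes: every word of x is shifted left by s bits, the bits that fall off one word are carried into the next, and the bits shifted out of the top word are returned. It is a simplified illustration, not the math/big source: it uses uint64 in place of big.Word, stores the least significant word first, and assumes 0 <= s < 64 and len(x) >= len(z).

```go
package main

import "fmt"

const wordBits = 64

// shlVU sets z = x << s word by word (least significant word first)
// and returns the bits shifted out of the most significant word.
// Simplified sketch: assumes 0 <= s < 64 and len(x) >= len(z).
func shlVU(z, x []uint64, s uint) (c uint64) {
	if len(z) == 0 {
		return 0
	}
	if s == 0 {
		copy(z, x)
		return 0
	}
	r := wordBits - s
	c = x[len(z)-1] >> r // bits leaving the top word
	// Walk from the top so that z and x may share storage.
	for i := len(z) - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>r
	}
	z[0] = x[0] << s
	return c
}

func main() {
	x := []uint64{1 << 63, 1 << 63}
	z := make([]uint64, len(x))
	c := shlVU(z, x, 1)
	fmt.Println(z, c)
}
```

Each iteration reads two adjacent words, which is presumably why paired load and store instructions such as LDP and STP help an assembly version.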
      
      Benchmarks:
      
      name                              old time/op    new time/op    delta
      AddVV/1-8                           21.6ns ± 1%    21.6ns ± 1%      ~     (p=0.683 n=5+5)
      AddVV/2-8                           13.5ns ± 0%    13.5ns ± 0%      ~     (all equal)
      AddVV/3-8                           15.5ns ± 0%    15.5ns ± 0%      ~     (all equal)
      AddVV/4-8                           17.5ns ± 0%    17.5ns ± 0%      ~     (all equal)
      AddVV/5-8                           19.5ns ± 0%    19.5ns ± 0%      ~     (all equal)
      AddVV/10-8                          29.5ns ± 0%    29.5ns ± 0%      ~     (all equal)
      AddVV/100-8                          217ns ± 0%     217ns ± 0%      ~     (all equal)
      AddVV/1000-8                        2.02µs ± 0%    2.03µs ± 0%    +0.73%  (p=0.008 n=5+5)
      AddVV/10000-8                       20.5µs ± 0%    20.5µs ± 0%    -0.01%  (p=0.008 n=5+5)
      AddVV/100000-8                       246µs ± 5%     250µs ± 4%      ~     (p=0.548 n=5+5)
      AddVW/1-8                           9.26ns ± 0%    9.32ns ± 0%    +0.65%  (p=0.016 n=4+5)
      AddVW/2-8                           19.8ns ± 3%    19.8ns ± 0%      ~     (p=0.143 n=5+5)
      AddVW/3-8                           11.5ns ± 0%    11.5ns ± 0%      ~     (all equal)
      AddVW/4-8                           13.0ns ± 0%    13.0ns ± 0%      ~     (all equal)
      AddVW/5-8                           14.5ns ± 0%    14.5ns ± 0%      ~     (all equal)
      AddVW/10-8                          22.0ns ± 0%    22.0ns ± 0%      ~     (all equal)
      AddVW/100-8                          167ns ± 0%     166ns ± 0%    -0.60%  (p=0.000 n=5+4)
      AddVW/1000-8                        1.52µs ± 0%    1.52µs ± 0%      ~     (all equal)
      AddVW/10000-8                       15.1µs ± 0%    15.1µs ± 0%    +0.01%  (p=0.008 n=5+5)
      AddVW/100000-8                       163µs ± 4%     153µs ± 3%    -5.97%  (p=0.016 n=5+5)
      AddMulVVW/1-8                       32.4ns ± 1%    33.0ns ± 1%    +1.73%  (p=0.040 n=5+5)
      AddMulVVW/2-8                       56.4ns ± 2%    55.9ns ± 1%      ~     (p=0.135 n=5+5)
      AddMulVVW/3-8                       85.4ns ± 1%    85.1ns ± 0%      ~     (p=0.079 n=5+5)
      AddMulVVW/4-8                        129ns ± 1%     129ns ± 0%      ~     (p=0.397 n=5+5)
      AddMulVVW/5-8                        148ns ± 0%     148ns ± 0%      ~     (all equal)
      AddMulVVW/10-8                       270ns ± 0%     268ns ± 0%    -0.74%  (p=0.029 n=4+4)
      AddMulVVW/100-8                     2.75µs ± 0%    2.75µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                    26.0µs ± 0%    26.0µs ± 0%    -0.06%  (p=0.024 n=5+5)
      AddMulVVW/10000-8                    312µs ± 0%     312µs ± 0%    -0.09%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                  2.89ms ± 0%    2.89ms ± 0%    +0.14%  (p=0.016 n=5+5)
      DecimalConversion-8                  315µs ± 1%     312µs ± 0%      ~     (p=0.095 n=5+5)
      FloatString/100-8                   2.56µs ± 1%    2.52µs ± 1%    -1.31%  (p=0.016 n=5+5)
      FloatString/1000-8                  58.6µs ± 0%    58.2µs ± 0%    -0.75%  (p=0.008 n=5+5)
      FloatString/10000-8                 4.59ms ± 0%    4.59ms ± 0%      ~     (p=0.056 n=5+5)
      FloatString/100000-8                 446ms ± 0%     446ms ± 0%    -0.04%  (p=0.008 n=5+5)
      FloatAdd/10-8                        184ns ± 0%     178ns ± 0%    -3.48%  (p=0.008 n=5+5)
      FloatAdd/100-8                       189ns ± 3%     178ns ± 2%    -6.02%  (p=0.008 n=5+5)
      FloatAdd/1000-8                      371ns ± 0%     267ns ± 0%   -27.99%  (p=0.000 n=5+4)
      FloatAdd/10000-8                    1.87µs ± 0%    1.03µs ± 0%   -44.74%  (p=0.008 n=5+5)
      FloatAdd/100000-8                   17.1µs ± 0%     8.8µs ± 0%   -48.71%  (p=0.016 n=5+4)
      FloatSub/10-8                        148ns ± 0%     146ns ± 0%    -1.35%  (p=0.000 n=5+4)
      FloatSub/100-8                       148ns ± 0%     140ns ± 0%    -5.41%  (p=0.008 n=5+5)
      FloatSub/1000-8                      242ns ± 0%     191ns ± 0%   -21.24%  (p=0.008 n=5+5)
      FloatSub/10000-8                    1.07µs ± 0%    0.64µs ± 1%   -39.89%  (p=0.016 n=4+5)
      FloatSub/100000-8                   9.48µs ± 0%    5.32µs ± 0%   -43.87%  (p=0.008 n=5+5)
      ParseFloatSmallExp-8                29.3µs ± 3%    28.6µs ± 1%      ~     (p=0.310 n=5+5)
      ParseFloatLargeExp-8                 125µs ± 1%     123µs ± 0%    -1.99%  (p=0.008 n=5+5)
      GCD10x10/WithoutXY-8                 278ns ± 4%     289ns ± 5%    +3.96%  (p=0.040 n=5+5)
      GCD10x10/WithXY-8                   2.12µs ± 2%    2.15µs ± 2%      ~     (p=0.095 n=5+5)
      GCD10x100/WithoutXY-8                615ns ± 1%     629ns ± 5%      ~     (p=0.135 n=5+5)
      GCD10x100/WithXY-8                  3.42µs ± 1%    3.53µs ± 2%    +3.38%  (p=0.008 n=5+5)
      GCD10x1000/WithoutXY-8              1.39µs ± 1%    1.38µs ± 1%      ~     (p=0.460 n=5+5)
      GCD10x1000/WithXY-8                 7.47µs ± 2%    7.49µs ± 3%      ~     (p=1.000 n=5+5)
      GCD10x10000/WithoutXY-8             8.71µs ± 1%    8.71µs ± 0%      ~     (p=0.841 n=5+5)
      GCD10x10000/WithXY-8                28.4µs ± 2%    27.2µs ± 2%    -4.24%  (p=0.008 n=5+5)
      GCD10x100000/WithoutXY-8            78.9µs ± 1%    79.1µs ± 0%      ~     (p=0.222 n=5+5)
      GCD10x100000/WithXY-8                240µs ± 1%     228µs ± 1%    -4.98%  (p=0.008 n=5+5)
      GCD100x100/WithoutXY-8              1.87µs ± 2%    1.89µs ± 1%      ~     (p=0.095 n=5+5)
      GCD100x100/WithXY-8                 26.6µs ± 1%    26.3µs ± 0%    -1.14%  (p=0.032 n=5+5)
      GCD100x1000/WithoutXY-8             4.44µs ± 2%    4.47µs ± 2%      ~     (p=0.444 n=5+5)
      GCD100x1000/WithXY-8                36.7µs ± 1%    36.0µs ± 1%    -1.96%  (p=0.008 n=5+5)
      GCD100x10000/WithoutXY-8            22.8µs ± 1%    22.3µs ± 1%    -2.52%  (p=0.008 n=5+5)
      GCD100x10000/WithXY-8                145µs ± 1%     142µs ± 0%    -1.88%  (p=0.008 n=5+5)
      GCD100x100000/WithoutXY-8            198µs ± 1%     190µs ± 0%    -4.06%  (p=0.008 n=5+5)
      GCD100x100000/WithXY-8              1.11ms ± 0%    1.09ms ± 0%    -1.87%  (p=0.008 n=5+5)
      GCD1000x1000/WithoutXY-8            25.4µs ± 1%    25.0µs ± 1%    -1.34%  (p=0.008 n=5+5)
      GCD1000x1000/WithXY-8                515µs ± 1%     485µs ± 0%    -5.85%  (p=0.008 n=5+5)
      GCD1000x10000/WithoutXY-8           57.3µs ± 1%    56.2µs ± 1%    -1.95%  (p=0.008 n=5+5)
      GCD1000x10000/WithXY-8              1.21ms ± 0%    1.18ms ± 0%    -2.65%  (p=0.008 n=5+5)
      GCD1000x100000/WithoutXY-8           358µs ± 0%     352µs ± 1%    -1.71%  (p=0.008 n=5+5)
      GCD1000x100000/WithXY-8             8.72ms ± 0%    8.66ms ± 0%    -0.71%  (p=0.008 n=5+5)
      GCD10000x10000/WithoutXY-8           690µs ± 0%     687µs ± 1%      ~     (p=0.095 n=5+5)
      GCD10000x10000/WithXY-8             16.0ms ± 0%    12.5ms ± 0%   -22.01%  (p=0.008 n=5+5)
      GCD10000x100000/WithoutXY-8         2.09ms ± 0%    2.07ms ± 0%    -0.58%  (p=0.008 n=5+5)
      GCD10000x100000/WithXY-8            86.8ms ± 0%    83.4ms ± 0%    -3.95%  (p=0.008 n=5+5)
      GCD100000x100000/WithoutXY-8        51.2ms ± 0%    51.2ms ± 0%      ~     (p=0.548 n=5+5)
      GCD100000x100000/WithXY-8            1.25s ± 0%     0.89s ± 0%   -28.98%  (p=0.008 n=5+5)
      Hilbert-8                           2.46ms ± 2%    2.53ms ± 1%    +2.89%  (p=0.032 n=5+5)
      Binomial-8                          5.15µs ± 4%    4.92µs ± 1%    -4.43%  (p=0.032 n=5+5)
      QuoRem-8                            7.10µs ± 0%    7.05µs ± 0%    -0.59%  (p=0.008 n=5+5)
      Exp-8                                161ms ± 0%     161ms ± 0%    -0.24%  (p=0.008 n=5+5)
      Exp2-8                               161ms ± 0%     161ms ± 0%    -0.30%  (p=0.016 n=4+5)
      Bitset-8                            40.4ns ± 0%    40.3ns ± 0%      ~     (p=0.159 n=5+5)
      BitsetNeg-8                          158ns ± 4%     155ns ± 2%      ~     (p=0.183 n=5+5)
      BitsetOrig-8                         374ns ± 0%     383ns ± 1%    +2.35%  (p=0.008 n=5+5)
      BitsetNegOrig-8                      620ns ± 1%     663ns ± 2%    +7.00%  (p=0.008 n=5+5)
      ModSqrt225_Tonelli-8                7.26ms ± 0%    7.27ms ± 0%      ~     (p=0.841 n=5+5)
      ModSqrt224_3Mod4-8                  2.24ms ± 0%    2.24ms ± 0%      ~     (p=0.690 n=5+5)
      ModSqrt5430_Tonelli-8                62.3s ± 0%     62.4s ± 0%    +0.15%  (p=0.008 n=5+5)
      ModSqrt5430_3Mod4-8                  20.8s ± 0%     20.8s ± 0%      ~     (p=0.151 n=5+5)
      Sqrt-8                               101µs ± 0%      97µs ± 0%    -3.99%  (p=0.008 n=5+5)
      IntSqr/1-8                          32.7ns ± 1%    32.5ns ± 1%      ~     (p=0.325 n=5+5)
      IntSqr/2-8                           161ns ± 4%     160ns ± 4%      ~     (p=0.659 n=5+5)
      IntSqr/3-8                           296ns ± 7%     297ns ± 6%      ~     (p=0.841 n=5+5)
      IntSqr/5-8                           752ns ± 7%     755ns ± 6%      ~     (p=0.889 n=5+5)
      IntSqr/8-8                          1.91µs ± 3%    1.90µs ± 3%      ~     (p=0.746 n=5+5)
      IntSqr/10-8                         2.99µs ± 4%    3.00µs ± 4%      ~     (p=0.516 n=5+5)
      IntSqr/20-8                         6.29µs ± 2%    6.19µs ± 2%      ~     (p=0.151 n=5+5)
      IntSqr/30-8                         14.0µs ± 1%    13.8µs ± 2%      ~     (p=0.056 n=5+5)
      IntSqr/50-8                         38.1µs ± 3%    37.9µs ± 3%      ~     (p=0.548 n=5+5)
      IntSqr/80-8                         95.1µs ± 1%    94.7µs ± 1%      ~     (p=0.310 n=5+5)
      IntSqr/100-8                         148µs ± 1%     148µs ± 1%      ~     (p=0.548 n=5+5)
      IntSqr/200-8                         587µs ± 1%     587µs ± 1%      ~     (p=1.000 n=5+5)
      IntSqr/300-8                        1.31ms ± 1%    1.32ms ± 1%      ~     (p=0.151 n=5+5)
      IntSqr/500-8                        2.48ms ± 0%    2.49ms ± 0%      ~     (p=0.310 n=5+5)
      IntSqr/800-8                        4.68ms ± 0%    4.67ms ± 0%      ~     (p=0.548 n=5+5)
      IntSqr/1000-8                       7.57ms ± 0%    7.56ms ± 0%      ~     (p=0.421 n=5+5)
      Mul-8                                311ms ± 0%     311ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x10-8                     584ns ± 2%     573ns ± 1%      ~     (p=0.190 n=5+5)
      Exp3Power/0x40-8                     646ns ± 2%     649ns ± 1%      ~     (p=0.690 n=5+5)
      Exp3Power/0x100-8                   1.42µs ± 2%    1.45µs ± 1%    +2.03%  (p=0.032 n=5+5)
      Exp3Power/0x400-8                   8.28µs ± 1%    8.39µs ± 0%    +1.33%  (p=0.008 n=5+5)
      Exp3Power/0x1000-8                  60.1µs ± 0%    59.8µs ± 0%    -0.44%  (p=0.008 n=5+5)
      Exp3Power/0x4000-8                   818µs ± 0%     816µs ± 0%    -0.23%  (p=0.008 n=5+5)
      Exp3Power/0x10000-8                 7.79ms ± 0%    7.78ms ± 0%      ~     (p=0.690 n=5+5)
      Exp3Power/0x40000-8                 73.4ms ± 0%    73.3ms ± 0%      ~     (p=0.151 n=5+5)
      Exp3Power/0x100000-8                 665ms ± 0%     664ms ± 0%    -0.16%  (p=0.016 n=4+5)
      Exp3Power/0x400000-8                 5.99s ± 0%     5.97s ± 0%    -0.24%  (p=0.008 n=5+5)
      Fibo-8                               116ms ± 0%     117ms ± 0%    +0.42%  (p=0.008 n=5+5)
      NatSqr/1-8                           113ns ± 2%     112ns ± 1%      ~     (p=0.190 n=5+5)
      NatSqr/2-8                           249ns ± 2%     250ns ± 2%      ~     (p=0.365 n=5+5)
      NatSqr/3-8                           379ns ± 1%     381ns ± 2%      ~     (p=0.127 n=5+5)
      NatSqr/5-8                           838ns ± 3%     841ns ± 5%      ~     (p=0.754 n=5+5)
      NatSqr/8-8                          1.97µs ± 3%    1.97µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/10-8                         3.04µs ± 4%    3.04µs ± 4%      ~     (p=1.000 n=5+5)
      NatSqr/20-8                         6.49µs ± 3%    6.50µs ± 2%      ~     (p=0.841 n=5+5)
      NatSqr/30-8                         14.3µs ± 2%    14.2µs ± 2%      ~     (p=0.548 n=5+5)
      NatSqr/50-8                         38.5µs ± 3%    38.3µs ± 3%      ~     (p=0.421 n=5+5)
      NatSqr/80-8                         96.3µs ± 1%    96.1µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/100-8                         149µs ± 1%     148µs ± 1%      ~     (p=0.310 n=5+5)
      NatSqr/200-8                         591µs ± 1%     592µs ± 1%      ~     (p=0.690 n=5+5)
      NatSqr/300-8                        1.31ms ± 1%    1.32ms ± 0%      ~     (p=0.190 n=5+4)
      NatSqr/500-8                        2.49ms ± 0%    2.49ms ± 0%      ~     (p=0.095 n=5+5)
      NatSqr/800-8                        4.70ms ± 0%    4.69ms ± 0%      ~     (p=0.222 n=5+5)
      NatSqr/1000-8                       7.60ms ± 0%    7.58ms ± 0%      ~     (p=0.222 n=5+5)
      ScanPi-8                             326µs ± 0%     327µs ± 1%      ~     (p=0.222 n=5+5)
      StringPiParallel-8                  71.4µs ± 5%    67.7µs ± 4%      ~     (p=0.095 n=5+5)
      Scan/10/Base2-8                     1.09µs ± 0%    1.10µs ± 1%      ~     (p=0.810 n=5+5)
      Scan/100/Base2-8                    7.79µs ± 0%    7.83µs ± 0%    +0.53%  (p=0.008 n=5+5)
      Scan/1000/Base2-8                   78.9µs ± 0%    79.0µs ± 0%      ~     (p=0.151 n=5+5)
      Scan/10000/Base2-8                  1.22ms ± 0%    1.23ms ± 1%      ~     (p=0.690 n=5+5)
      Scan/100000/Base2-8                 55.1ms ± 0%    55.1ms ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/10/Base8-8                      512ns ± 1%     534ns ± 1%    +4.34%  (p=0.008 n=5+5)
      Scan/100/Base8-8                    2.90µs ± 1%    2.92µs ± 0%    +0.67%  (p=0.024 n=5+5)
      Scan/1000/Base8-8                   31.0µs ± 0%    31.1µs ± 0%    +0.27%  (p=0.008 n=5+5)
      Scan/10000/Base8-8                   741µs ± 0%     744µs ± 1%      ~     (p=0.310 n=5+5)
      Scan/100000/Base8-8                 50.5ms ± 0%    50.7ms ± 0%    +0.23%  (p=0.016 n=5+4)
      Scan/10/Base10-8                     485ns ± 0%     510ns ± 1%    +5.15%  (p=0.008 n=5+5)
      Scan/100/Base10-8                   2.68µs ± 0%    2.70µs ± 0%    +0.84%  (p=0.008 n=5+5)
      Scan/1000/Base10-8                  28.7µs ± 0%    28.8µs ± 0%    +0.34%  (p=0.008 n=5+5)
      Scan/10000/Base10-8                  717µs ± 0%     720µs ± 1%      ~     (p=0.238 n=5+5)
      Scan/100000/Base10-8                50.3ms ± 0%    50.3ms ± 0%    +0.02%  (p=0.016 n=4+5)
      Scan/10/Base16-8                     439ns ± 0%     461ns ± 1%    +5.06%  (p=0.008 n=5+5)
      Scan/100/Base16-8                   2.48µs ± 0%    2.49µs ± 0%    +0.59%  (p=0.024 n=5+5)
      Scan/1000/Base16-8                  27.2µs ± 0%    27.3µs ± 0%      ~     (p=0.063 n=5+5)
      Scan/10000/Base16-8                  722µs ± 0%     725µs ± 1%      ~     (p=0.421 n=5+5)
      Scan/100000/Base16-8                52.7ms ± 0%    52.7ms ± 0%      ~     (p=0.686 n=4+4)
      String/10/Base2-8                    248ns ± 1%     248ns ± 1%      ~     (p=0.802 n=5+5)
      String/100/Base2-8                  1.51µs ± 0%    1.51µs ± 0%    -0.54%  (p=0.024 n=5+5)
      String/1000/Base2-8                 13.6µs ± 0%    13.6µs ± 0%      ~     (p=0.548 n=5+5)
      String/10000/Base2-8                 135µs ± 1%     135µs ± 2%      ~     (p=0.421 n=5+5)
      String/100000/Base2-8               1.32ms ± 1%    1.33ms ± 1%      ~     (p=0.310 n=5+5)
      String/10/Base8-8                    169ns ± 0%     170ns ± 0%      ~     (p=0.079 n=5+5)
      String/100/Base8-8                   635ns ± 1%     633ns ± 1%      ~     (p=0.595 n=5+5)
      String/1000/Base8-8                 5.33µs ± 0%    5.30µs ± 0%      ~     (p=0.063 n=5+5)
      String/10000/Base8-8                50.7µs ± 1%    50.7µs ± 1%      ~     (p=1.000 n=5+5)
      String/100000/Base8-8                499µs ± 1%     500µs ± 1%      ~     (p=1.000 n=5+5)
      String/10/Base10-8                   517ns ± 1%     512ns ± 1%    -1.01%  (p=0.032 n=5+5)
      String/100/Base10-8                 1.97µs ± 0%    2.01µs ± 1%    +2.13%  (p=0.008 n=5+5)
      String/1000/Base10-8                12.6µs ± 1%    12.1µs ± 1%    -4.16%  (p=0.008 n=5+5)
      String/10000/Base10-8               57.9µs ± 1%    54.8µs ± 1%    -5.46%  (p=0.008 n=5+5)
      String/100000/Base10-8              25.6ms ± 0%    25.6ms ± 0%    -0.12%  (p=0.008 n=5+5)
      String/10/Base16-8                   149ns ± 0%     149ns ± 1%      ~     (p=1.000 n=5+5)
      String/100/Base16-8                  514ns ± 0%     514ns ± 1%      ~     (p=0.825 n=5+5)
      String/1000/Base16-8                4.01µs ± 0%    4.01µs ± 0%      ~     (p=0.595 n=5+5)
      String/10000/Base16-8               37.7µs ± 0%    37.8µs ± 1%      ~     (p=0.222 n=5+5)
      String/100000/Base16-8               373µs ± 1%     372µs ± 0%      ~     (p=1.000 n=5+5)
      LeafSize/0-8                        6.64ms ± 0%    6.66ms ± 0%    +0.32%  (p=0.008 n=5+5)
      LeafSize/1-8                        74.0µs ± 1%    71.2µs ± 1%    -3.75%  (p=0.008 n=5+5)
      LeafSize/2-8                        74.1µs ± 0%    70.7µs ± 1%    -4.53%  (p=0.008 n=5+5)
      LeafSize/3-8                         379µs ± 0%     374µs ± 0%    -1.25%  (p=0.008 n=5+5)
      LeafSize/4-8                        72.7µs ± 0%    69.2µs ± 0%    -4.79%  (p=0.008 n=5+5)
      LeafSize/5-8                         471µs ± 0%     466µs ± 0%    -1.05%  (p=0.008 n=5+5)
      LeafSize/6-8                         377µs ± 0%     373µs ± 0%    -1.16%  (p=0.008 n=5+5)
      LeafSize/7-8                         245µs ± 0%     241µs ± 0%    -1.65%  (p=0.008 n=5+5)
      LeafSize/8-8                        73.1µs ± 0%    69.4µs ± 0%    -5.10%  (p=0.008 n=5+5)
      LeafSize/9-8                         538µs ± 0%     532µs ± 0%    -1.01%  (p=0.008 n=5+5)
      LeafSize/10-8                        472µs ± 0%     467µs ± 0%    -1.07%  (p=0.008 n=5+5)
      LeafSize/11-8                        460µs ± 0%     454µs ± 0%    -1.22%  (p=0.008 n=5+5)
      LeafSize/12-8                        378µs ± 0%     373µs ± 0%    -1.34%  (p=0.008 n=5+5)
      LeafSize/13-8                        344µs ± 0%     338µs ± 0%    -1.61%  (p=0.008 n=5+5)
      LeafSize/14-8                        247µs ± 0%     243µs ± 0%    -1.62%  (p=0.008 n=5+5)
      LeafSize/15-8                        169µs ± 0%     165µs ± 0%    -2.71%  (p=0.008 n=5+5)
      LeafSize/16-8                       73.3µs ± 1%    69.5µs ± 0%    -5.11%  (p=0.008 n=5+5)
      LeafSize/32-8                       82.7µs ± 0%    79.2µs ± 0%    -4.24%  (p=0.008 n=5+5)
      LeafSize/64-8                        135µs ± 0%     132µs ± 0%    -2.20%  (p=0.008 n=5+5)
      ProbablyPrime/n=0-8                 44.2ms ± 0%    43.9ms ± 0%    -0.69%  (p=0.008 n=5+5)
      ProbablyPrime/n=1-8                 64.8ms ± 0%    64.4ms ± 0%    -0.60%  (p=0.008 n=5+5)
      ProbablyPrime/n=5-8                  147ms ± 0%     147ms ± 0%    -0.34%  (p=0.008 n=5+5)
      ProbablyPrime/n=10-8                 250ms ± 0%     249ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/n=20-8                 456ms ± 0%     455ms ± 0%    -0.29%  (p=0.008 n=5+5)
      ProbablyPrime/Lucas-8               23.6ms ± 0%    23.2ms ± 0%    -1.44%  (p=0.008 n=5+5)
      ProbablyPrime/MillerRabinBase2-8    20.6ms ± 0%    20.6ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatSqrt/64-8                      2.27µs ± 1%    2.11µs ± 1%    -7.02%  (p=0.008 n=5+5)
      FloatSqrt/128-8                     4.93µs ± 1%    4.40µs ± 1%   -10.73%  (p=0.008 n=5+5)
      FloatSqrt/256-8                     13.6µs ± 0%     6.6µs ± 1%   -51.40%  (p=0.008 n=5+5)
      FloatSqrt/1000-8                    69.8µs ± 0%    31.2µs ± 0%   -55.27%  (p=0.008 n=5+5)
      FloatSqrt/10000-8                   1.91ms ± 0%    0.59ms ± 0%   -69.17%  (p=0.008 n=5+5)
      FloatSqrt/100000-8                  55.4ms ± 0%    17.8ms ± 0%   -67.79%  (p=0.008 n=5+5)
      FloatSqrt/1000000-8                  4.56s ± 0%     1.52s ± 0%   -66.59%  (p=0.008 n=5+5)
      
      Change-Id: Icce52c69668f564490c69b908338b21a2288e116
      Reviewed-on: https://go-review.googlesource.com/79355
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      140bfe9c
  4. 11 Mar, 2018 1 commit
  5. 10 Mar, 2018 4 commits