1. 07 Mar, 2018 9 commits
    • Hana Kim's avatar
      cmd/trace: force GC occassionally · 93b0261d
      Hana Kim authored
      to return memory to the OS after completing potentially
      large operations.
      
      Update #21870
      
      Sys went down to 3.7G
      
      $ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
      
      2018/03/07 09:35:52 Parsing trace...
      after parsing trace
       Alloc:	3385754360 Bytes
       Sys:	3662047864 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	3488907264 Bytes
       HeapInUse:	3426549760 Bytes
       HeapAlloc:	3385754360 Bytes
      Enter to continue...
      2018/03/07 09:36:09 Splitting trace...
      after spliting trace
       Alloc:	3238309424 Bytes
       Sys:	3684410168 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	3488874496 Bytes
       HeapInUse:	3266461696 Bytes
       HeapAlloc:	3238309424 Bytes
      Enter to continue...
      2018/03/07 09:36:39 Opening browser. Trace viewer is listening on http://100.101.224.241:12345
      
      after httpJsonTrace
       Alloc:	3000633872 Bytes
       Sys:	3693978424 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	3488743424 Bytes
       HeapInUse:	3030966272 Bytes
       HeapAlloc:	3000633872 Bytes
      Enter to continue...
      
      Change-Id: I56f64cae66c809cbfbad03fba7bd0d35494c1d04
      Reviewed-on: https://go-review.googlesource.com/92376Reviewed-by: default avatarPeter Weinberger <pjw@google.com>
      93b0261d
    • jimmyfrasche's avatar
      go/build: correct value of .Doc field · 20b14b71
      jimmyfrasche authored
      Build could use the package comment from test files to populate the .Doc
      field on *Package.
      
      As go list uses this data and several packages in the standard library
      have tests with package comments, this lead to:
      
      $ go list -f '{{.Doc}}' flag container/heap image
      These examples demonstrate more intricate uses of the flag package.
      This example demonstrates an integer heap built using the heap interface.
      This example demonstrates decoding a JPEG image and examining its pixels.
      
      This change now only examines non-test files when attempting to populate
      .Doc, resulting in the expected behavior:
      
      $ gotip list -f '{{.Doc}}' flag container/heap image
      Package flag implements command-line flag parsing.
      Package heap provides heap operations for any type that implements heap.Interface.
      Package image implements a basic 2-D image library.
      
      Fixes #23594
      
      Change-Id: I37171c26ec5cc573efd273556a05223c6f675968
      Reviewed-on: https://go-review.googlesource.com/96976
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarDaniel Martí <mvdan@mvdan.cc>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      20b14b71
    • Hana Kim's avatar
      cmd/trace: generate jsontrace data in a streaming fashion · ee465831
      Hana Kim authored
      Update #21870
      
      The Sys went down to 4.25G from 6.2G.
      
      $ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
      2018/03/07 08:49:01 Parsing trace...
      after parsing trace
       Alloc:	3385757184 Bytes
       Sys:	3661195896 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	3488841728 Bytes
       HeapInUse:	3426516992 Bytes
       HeapAlloc:	3385757184 Bytes
      Enter to continue...
      2018/03/07 08:49:18 Splitting trace...
      after spliting trace
       Alloc:	2352071904 Bytes
       Sys:	4243825464 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	4025712640 Bytes
       HeapInUse:	2377703424 Bytes
       HeapAlloc:	2352071904 Bytes
      Enter to continue...
      after httpJsonTrace
       Alloc:	3228697832 Bytes
       Sys:	4250379064 Bytes
       HeapReleased:	0 Bytes
       HeapSys:	4025647104 Bytes
       HeapInUse:	3260014592 Bytes
       HeapAlloc:	3228697832 Bytes
      
      Change-Id: I546f26bdbc68b1e58f1af1235a0e299dc0ff115e
      Reviewed-on: https://go-review.googlesource.com/92375
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      Reviewed-by: default avatarPeter Weinberger <pjw@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ee465831
    • Yuval Pavel Zholkover's avatar
      runtime: add missing build constraints to os_linux_{be64,noauxv,novdso,ppc64x}.go files · 083f3957
      Yuval Pavel Zholkover authored
      They do not match the file name patterns of
        *_GOOS
        *_GOARCH
        *_GOOS_GOARCH
      therefore the implicit linux constraint was not being added.
      
      Change-Id: Ie506c51cee6818db445516f96fffaa351df62cf5
      Reviewed-on: https://go-review.googlesource.com/99116Reviewed-by: default avatarTobias Klauser <tobias.klauser@gmail.com>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      083f3957
    • Elias Naur's avatar
      androidtest.bash: don't require GOARCH set · 9094946f
      Elias Naur authored
      The host GOARCH is most likely supported (386, amd64, arm, arm64).
      
      Change-Id: I86324b9c00f22c592ba54bda7d2ae97c86bda904
      Reviewed-on: https://go-review.googlesource.com/99155
      Run-TryBot: Elias Naur <elias.naur@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarHyang-Ah Hana Kim <hyangah@gmail.com>
      9094946f
    • Alex Brainman's avatar
      os: use WIN32_FIND_DATA.Reserved0 to identify symlinks · e83601b4
      Alex Brainman authored
      os.Stat implementation uses instructions described at
      https://blogs.msdn.microsoft.com/oldnewthing/20100212-00/?p=14963/
      to distinguish symlinks. In particular, it calls
      GetFileAttributesEx or FindFirstFile and checks
      either WIN32_FILE_ATTRIBUTE_DATA.dwFileAttributes
      or WIN32_FIND_DATA.dwFileAttributes to see if
      FILE_ATTRIBUTES_REPARSE_POINT flag is set.
      And that seems to worked fine so far.
      
      But now we discovered that OneDrive root folder
      is determined as directory:
      
      c:\>dir C:\Users\Alex | grep OneDrive
      30/11/2017  07:25 PM    <DIR>          OneDrive
      c:\>
      
      while Go identified it as symlink.
      
      But we did not follow Microsoft's advice to the letter - we never
      checked WIN32_FIND_DATA.Reserved0. And adding that extra check
      makes Go treat OneDrive as symlink. So use FindFirstFile and
      WIN32_FIND_DATA.Reserved0 to determine symlinks.
      
      Fixes #22579
      
      Change-Id: I0cb88929eb8b47b1d24efaf1907ad5a0e20de83f
      Reviewed-on: https://go-review.googlesource.com/86556Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      e83601b4
    • Matthew Dempsky's avatar
      cmd/compile: remove funcdepth variables · d7eb4901
      Matthew Dempsky authored
      There were only two large classes of use for these variables:
      
      1) Testing "funcdepth != 0" or "funcdepth > 0", which is equivalent to
      checking "Curfn != nil".
      
      2) In oldname, detecting whether a closure variable has been created
      for the current function, which can be handled by instead testing
      "n.Name.Curfn != Curfn".
      
      Lastly, merge funcstart into funchdr, since it's only called once, and
      it better matches up with funcbody now.
      
      Passes toolstash-check.
      
      Change-Id: I8fe159a9d37ef7debc4cd310354cea22a8b23394
      Reviewed-on: https://go-review.googlesource.com/99076
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      d7eb4901
    • Matthew Dempsky's avatar
      cmd/compile: cleanup funccompile and compile · aa00ca12
      Matthew Dempsky authored
      Bring these functions next to each other, and clean them up a little
      bit. Also, change emitptrargsmap to take Curfn as a parameter instead
      of a global.
      
      Passes toolstash-check.
      
      Change-Id: Ib9c94fda3b2cb6f0dcec1585622b33b4f311b5e9
      Reviewed-on: https://go-review.googlesource.com/99075
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      aa00ca12
    • Kunpei Sakai's avatar
      cmd/compile: prevent detection of wrong duplicates · b75e8a2a
      Kunpei Sakai authored
      by including *types.Type in typeVal.
      
      Updates #21866
      Fixes #24159
      
      Change-Id: I2f8cac252d88d43e723124f2867b1410b7abab7b
      Reviewed-on: https://go-review.googlesource.com/98476
      Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
      b75e8a2a
  2. 06 Mar, 2018 21 commits
    • Matthew Dempsky's avatar
      cmd/compile: fix miscompilation of "defer delete(m, k)" · 2c0c68d6
      Matthew Dempsky authored
      Previously, for slow map key types (i.e., any type other than a 32-bit
      or 64-bit plain memory type), we would rewrite
      
          defer delete(m, k)
      
      into
      
          ktmp := k
          defer delete(m, &ktmp)
      
      However, if the defer statement was inside a loop, we would end up
      reusing the same ktmp value for all of the deferred deletes.
      
      We already rewrite
      
          defer print(x, y, z)
      
      into
      
          defer func(a1, a2, a3) {
              print(a1, a2, a3)
          }(x, y, z)
      
      This CL generalizes this rewrite to also apply for slow map deletes.
      
      This could be extended to apply even more generally to other builtins,
      but as discussed on #24259, there are cases where we must *not* do
      this (e.g., "defer recover()"). However, if we elect to do this more
      generally, this CL should still make that easier.
      
      Lastly, while here, fix a few isues in wrapCall (nee walkprintfunc):
      
      1) lookupN appends the generation number to the symbol anyway, so "%d"
      was being literally included in the generated function names.
      
      2) walkstmt will be called when the function is compiled later anyway,
      so no need to do it now.
      
      Fixes #24259.
      
      Change-Id: I70286867c64c69c18e9552f69e3f4154a0fc8b04
      Reviewed-on: https://go-review.googlesource.com/99017
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      2c0c68d6
    • Ian Lance Taylor's avatar
      internal/poll: if poller init fails, assume blocking mode · 558769a6
      Ian Lance Taylor authored
      Fixes #23943
      
      Change-Id: I16e604872f1615963925ec3c4710106bcce1330c
      Reviewed-on: https://go-review.googlesource.com/99015
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      558769a6
    • ChrisALiles's avatar
      cmd/compile: improve compiler error on embedded structs · 42ecf39e
      ChrisALiles authored
      Fixes #23609
      
      Change-Id: I751aae3d849de7fce1306324fcb1a4c3842d873e
      Reviewed-on: https://go-review.googlesource.com/97076Reviewed-by: default avatarMatthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      42ecf39e
    • Alberto Donizetti's avatar
      test/codegen: port math/bits.ReverseBytes tests to codegen · 8516ecd0
      Alberto Donizetti authored
      And remove them from ssa_test.
      
      Change-Id: If767af662801219774d1bdb787c77edfa6067770
      Reviewed-on: https://go-review.googlesource.com/98976
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      8516ecd0
    • Wei Xiao's avatar
      cmd/compile/internal/ssa: improve store combine optimization on arm64 · 05962561
      Wei Xiao authored
      Current implementation doesn't consider MOVDreg type operand and fail to combine
      it into larger store. This patch fixes the issue.
      
      Fixes #24242
      
      Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
      Reviewed-on: https://go-review.googlesource.com/98397
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      05962561
    • Josh Bleecher Snyder's avatar
      encoding/binary: use an offset instead of slicing · b8543397
      Josh Bleecher Snyder authored
      While running make.bash, over 5% of all pointer writes
      come from encoding/binary doing struct reads.
      
      This change replaces slicing during such reads with an offset.
      This avoids updating the slice pointer with every
      struct field read or write.
      
      This has no impact when the write barrier is off.
      Running the benchmarks with GOGC=1, however,
      shows significant improvement:
      
      name          old time/op    new time/op    delta
      ReadStruct-8    13.2µs ± 6%    10.1µs ± 5%  -23.24%  (p=0.000 n=10+10)
      
      name          old speed      new speed      delta
      ReadStruct-8  5.69MB/s ± 6%  7.40MB/s ± 5%  +30.18%  (p=0.000 n=10+10)
      
      Change-Id: I22904263196bfeddc38abe8989428e263aee5253
      Reviewed-on: https://go-review.googlesource.com/98757
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      b8543397
    • Josh Bleecher Snyder's avatar
      runtime: skip pointless writes in freedefer · f7739c07
      Josh Bleecher Snyder authored
      Change-Id: I501a0e5c87ec88616c7dcdf1b723758b6df6c088
      Reviewed-on: https://go-review.googlesource.com/98758
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      f7739c07
    • Josh Bleecher Snyder's avatar
      debug/macho: use bytes.IndexByte instead of a loop · 4599419e
      Josh Bleecher Snyder authored
      Simpler, and no doubt faster.
      
      Change-Id: Idd401918da07a257de365087721e9ff061e6fd07
      Reviewed-on: https://go-review.googlesource.com/98759
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      4599419e
    • Balaram Makam's avatar
      cmd/compile/internal/ssa: inline small memmove for arm64 · 0e8b7110
      Balaram Makam authored
      This patch enables the optimization for arm64 target.
      
      Performance results on Amberwing for strconv benchmark:
      name             old time/op  new time/op  delta
      Quote             721ns ± 0%   617ns ± 0%  -14.40%  (p=0.016 n=5+4)
      QuoteRune         118ns ± 0%   117ns ± 0%   -0.85%  (p=0.008 n=5+5)
      AppendQuote       436ns ± 2%   321ns ± 0%  -26.31%  (p=0.008 n=5+5)
      AppendQuoteRune  34.7ns ± 0%  28.4ns ± 0%  -18.16%  (p=0.000 n=5+4)
      [Geo mean]        189ns        160ns       -15.41%
      
      Change-Id: I5714c474e7483d07ca338fbaf49beb4bbcc11c44
      Reviewed-on: https://go-review.googlesource.com/98735
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      0e8b7110
    • Alberto Donizetti's avatar
      test/codegen: port math/bits.OnesCount tests to codegen · 18ae5eca
      Alberto Donizetti authored
      And remove them from ssa_test.
      
      Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
      Reviewed-on: https://go-review.googlesource.com/98815Reviewed-by: default avatarKeith Randall <khr@golang.org>
      18ae5eca
    • Cherry Zhang's avatar
      cmd/internal/obj/arm64: gofmt · f6244454
      Cherry Zhang authored
      Change-Id: Ica778fef2d0245fbb14f595597e45c7cf6adef84
      Reviewed-on: https://go-review.googlesource.com/98895
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      f6244454
    • Elias Naur's avatar
      iostest.bash: don't build std library twice · f5f16d1e
      Elias Naur authored
      Instead, mirror androidtest.bash and build once, then run run.bash.
      
      Change-Id: I174ae30b2a429a62b20bb290a70cb07ed712b1e4
      Reviewed-on: https://go-review.googlesource.com/98915Reviewed-by: default avatarHyang-Ah Hana Kim <hyangah@gmail.com>
      Run-TryBot: Elias Naur <elias.naur@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      f5f16d1e
    • Elias Naur's avatar
      cmd/dist: default to GOARM=7 on android · ad87a67c
      Elias Naur authored
      Auto-detecting GOARM on Android makes as little sense as for nacl/arm
      and darwin/arm.
      
      Also update androidtest.sh to not require GOARM set.
      
      Change-Id: Id409ce1573d3c668d00fa4b7e3562ad7ece6fef5
      Reviewed-on: https://go-review.googlesource.com/98875Reviewed-by: default avatarHyang-Ah Hana Kim <hyangah@gmail.com>
      Run-TryBot: Elias Naur <elias.naur@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      ad87a67c
    • Cherry Zhang's avatar
      math/big: don't use R18 in ARM64 assembly · 084143d8
      Cherry Zhang authored
      R18 seems reserved on Apple platforms.
      
      May fix darwin/arm64 build.
      
      Change-Id: Ia2c1de550a64827c85a64affa53b94c62aacce8e
      Reviewed-on: https://go-review.googlesource.com/98896
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarElias Naur <elias.naur@gmail.com>
      084143d8
    • Tobias Klauser's avatar
      runtime: fix stack switch check in walltime/nanotime on linux/arm · 9745397e
      Tobias Klauser authored
      CL 98095 got the check wrong. We should be testing
      'getg() == getg().m.curg', not 'getg().m == getg().m.curg'.
      
      Change-Id: I32f6238b00409b67afa8efe732513d542aec5bc7
      Reviewed-on: https://go-review.googlesource.com/98855
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      9745397e
    • Alberto Donizetti's avatar
      test/codegen: port math/bits.TrailingZeros tests to codegen · 85dcc709
      Alberto Donizetti authored
      And remove them from ssa_test.
      
      Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
      Reviewed-on: https://go-review.googlesource.com/98795Reviewed-by: default avatarGiovanni Bajo <rasky@develer.com>
      85dcc709
    • as's avatar
      net/http: correct subtle transposition of offset and whence in test · df8c2b90
      as authored
      Change-Id: I788972bdf85c0225397c0e74901bf9c33c6d30c7
      GitHub-Last-Rev: 57737fe782bf7ad2d765c2efd80d75b3baca2c7b
      GitHub-Pull-Request: golang/go#24265
      Reviewed-on: https://go-review.googlesource.com/98761Reviewed-by: default avatarBrad Fitzpatrick <bradfitz@golang.org>
      df8c2b90
    • Meng Zhuo's avatar
      runtime, cmd/compile: use ldp for DUFFCOPY on ARM64 · 8916773a
      Meng Zhuo authored
      name         old time/op  new time/op  delta
      CopyFat8     2.15ns ± 1%  2.19ns ± 6%     ~     (p=0.171 n=8+9)
      CopyFat12    2.15ns ± 0%  2.17ns ± 2%     ~     (p=0.137 n=8+10)
      CopyFat16    2.17ns ± 3%  2.15ns ± 0%     ~     (p=0.211 n=10+10)
      CopyFat24    2.16ns ± 1%  2.15ns ± 0%     ~     (p=0.087 n=10+10)
      CopyFat32    11.5ns ± 0%  12.8ns ± 2%  +10.87%  (p=0.000 n=8+10)
      CopyFat64    20.2ns ± 2%  12.9ns ± 0%  -36.11%  (p=0.000 n=10+10)
      CopyFat128   37.2ns ± 0%  21.5ns ± 0%  -42.20%  (p=0.000 n=10+10)
      CopyFat256   71.6ns ± 0%  38.7ns ± 0%  -45.95%  (p=0.000 n=10+10)
      CopyFat512    140ns ± 0%    73ns ± 0%  -47.86%  (p=0.000 n=10+9)
      CopyFat520    142ns ± 0%    74ns ± 0%  -47.54%  (p=0.000 n=10+10)
      CopyFat1024   277ns ± 0%   141ns ± 0%  -49.10%  (p=0.000 n=10+10)
      
      Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
      Reviewed-on: https://go-review.googlesource.com/97395Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      8916773a
    • Rob Pike's avatar
      cmd/doc: make local dot-slash path names work · baf3eb16
      Rob Pike authored
      Before, an argument that started ./ or ../ was not treated as
      a package relative to the current directory. Thus
      
      	$ cd $GOROOT/src/text
      	$ go doc ./template
      
      could find html/template as $GOROOT/src/html/./template
      is a valid Go source directory.
      
      Fix this by catching such paths and making them absolute before
      processing.
      
      Fixes #23383.
      
      Change-Id: Ic2a92eaa3a6328f728635657f9de72ac3ee82afb
      Reviewed-on: https://go-review.googlesource.com/98396Reviewed-by: default avatarIan Lance Taylor <iant@golang.org>
      baf3eb16
    • Fangming.Fang's avatar
      crypto/aes: optimize arm64 AES implementation · 917e7269
      Fangming.Fang authored
      This patch makes use of arm64 AES instructions to accelerate AES computation
      and only supports optimization on Linux for arm64
      
      name        old time/op    new time/op     delta
      Encrypt-32     255ns ± 0%       26ns ± 0%   -89.73%
      Decrypt-32     256ns ± 0%       26ns ± 0%   -89.77%
      Expand-32      990ns ± 5%      901ns ± 0%    -9.05%
      
      name        old speed      new speed       delta
      Encrypt-32  62.5MB/s ± 0%  610.4MB/s ± 0%  +876.39%
      Decrypt-32  62.3MB/s ± 0%  610.2MB/s ± 0%  +879.6%
      
      Fixes #18498
      
      Change-Id: If416e5a151785325527b32ff72f6da3812493ed0
      Reviewed-on: https://go-review.googlesource.com/64490
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      917e7269
    • erifan01's avatar
      math/big: optimize addVV and subVV on arm64 · c4f3fe95
      erifan01 authored
      The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance.
      By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance.
      
      Benchmarks:
      
      name                              old time/op    new time/op     delta
      AddVV/1-8                           21.5ns ± 0%     11.5ns ± 0%   -46.51%  (p=0.008 n=5+5)
      AddVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
      AddVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
      AddVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
      AddVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
      AddVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
      AddVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
      AddVV/1000-8                        2.02µs ± 0%     1.03µs ± 0%   -48.85%  (p=0.008 n=5+5)
      AddVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.70%  (p=0.008 n=5+5)
      AddVV/100000-8                       247µs ± 3%      154µs ± 0%   -37.52%  (p=0.008 n=5+5)
      SubVV/1-8                           21.5ns ± 0%     11.5ns ± 0%      ~     (p=0.079 n=4+5)
      SubVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
      SubVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
      SubVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
      SubVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
      SubVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
      SubVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
      SubVV/1000-8                        2.02µs ± 0%     0.80µs ± 0%   -60.50%  (p=0.008 n=5+5)
      SubVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.99%  (p=0.008 n=5+5)
      SubVV/100000-8                       221µs ±11%      223µs ±16%      ~     (p=0.690 n=5+5)
      AddVW/1-8                           9.32ns ± 0%     9.32ns ± 0%      ~     (all equal)
      AddVW/2-8                           19.7ns ± 1%     19.7ns ± 0%      ~     (p=0.381 n=5+4)
      AddVW/3-8                           11.5ns ± 0%     11.5ns ± 0%      ~     (all equal)
      AddVW/4-8                           13.0ns ± 0%     13.0ns ± 0%      ~     (all equal)
      AddVW/5-8                           14.5ns ± 0%     14.5ns ± 0%      ~     (all equal)
      AddVW/10-8                          22.0ns ± 0%     22.0ns ± 0%      ~     (all equal)
      AddVW/100-8                          167ns ± 0%      167ns ± 0%      ~     (all equal)
      AddVW/1000-8                        1.52µs ± 0%     1.52µs ± 0%    +0.40%  (p=0.008 n=5+5)
      AddVW/10000-8                       15.1µs ± 0%     15.1µs ± 0%      ~     (p=0.556 n=5+4)
      AddVW/100000-8                       152µs ± 1%      152µs ± 1%      ~     (p=0.690 n=5+5)
      AddMulVVW/1-8                       33.3ns ± 0%     32.7ns ± 1%    -1.86%  (p=0.008 n=5+5)
      AddMulVVW/2-8                       59.3ns ± 1%     56.9ns ± 1%    -4.15%  (p=0.008 n=5+5)
      AddMulVVW/3-8                       80.5ns ± 1%     85.4ns ± 3%    +6.19%  (p=0.008 n=5+5)
      AddMulVVW/4-8                        127ns ± 0%      111ns ± 1%   -13.19%  (p=0.008 n=5+5)
      AddMulVVW/5-8                        144ns ± 0%      149ns ± 0%    +3.47%  (p=0.016 n=4+5)
      AddMulVVW/10-8                       298ns ± 1%      283ns ± 0%    -4.77%  (p=0.008 n=5+5)
      AddMulVVW/100-8                     3.06µs ± 0%     2.99µs ± 0%    -2.21%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                    31.3µs ± 0%     26.9µs ± 0%   -14.17%  (p=0.008 n=5+5)
      AddMulVVW/10000-8                    316µs ± 0%      305µs ± 0%    -3.51%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                  3.17ms ± 0%     3.17ms ± 1%      ~     (p=0.690 n=5+5)
      DecimalConversion-8                  316µs ± 1%      313µs ± 2%      ~     (p=0.095 n=5+5)
      FloatString/100-8                   2.53µs ± 1%     2.56µs ± 2%      ~     (p=0.222 n=5+5)
      FloatString/1000-8                  58.4µs ± 0%     58.5µs ± 0%      ~     (p=0.206 n=5+5)
      FloatString/10000-8                 4.59ms ± 0%     4.58ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatString/100000-8                 446ms ± 0%      444ms ± 0%    -0.31%  (p=0.008 n=5+5)
      FloatAdd/10-8                        184ns ± 0%      172ns ± 0%    -6.30%  (p=0.008 n=5+5)
      FloatAdd/100-8                       189ns ± 2%      191ns ± 4%      ~     (p=0.381 n=5+5)
      FloatAdd/1000-8                      371ns ± 0%      347ns ± 1%    -6.42%  (p=0.008 n=5+5)
      FloatAdd/10000-8                    1.87µs ± 0%     1.68µs ± 0%   -10.16%  (p=0.008 n=5+5)
      FloatAdd/100000-8                   17.1µs ± 0%     15.6µs ± 0%    -8.74%  (p=0.016 n=5+4)
      FloatSub/10-8                        152ns ± 0%      138ns ± 0%    -9.47%  (p=0.000 n=4+5)
      FloatSub/100-8                       148ns ± 0%      142ns ± 0%    -4.05%  (p=0.000 n=5+4)
      FloatSub/1000-8                      245ns ± 1%      217ns ± 0%   -11.28%  (p=0.000 n=5+4)
      FloatSub/10000-8                    1.07µs ± 0%     0.88µs ± 1%   -18.14%  (p=0.008 n=5+5)
      FloatSub/100000-8                   9.58µs ± 0%     7.96µs ± 0%   -16.84%  (p=0.008 n=5+5)
      ParseFloatSmallExp-8                28.8µs ± 1%     29.0µs ± 1%      ~     (p=0.095 n=5+5)
      ParseFloatLargeExp-8                 126µs ± 1%      126µs ± 1%      ~     (p=0.841 n=5+5)
      GCD10x10/WithoutXY-8                 277ns ± 2%      281ns ± 4%      ~     (p=0.746 n=5+5)
      GCD10x10/WithXY-8                   2.10µs ± 1%     2.12µs ± 3%      ~     (p=0.548 n=5+5)
      GCD10x100/WithoutXY-8                615ns ± 3%      607ns ± 2%      ~     (p=0.135 n=5+5)
      GCD10x100/WithXY-8                  3.50µs ± 2%     3.62µs ± 5%      ~     (p=0.151 n=5+5)
      GCD10x1000/WithoutXY-8              1.39µs ± 2%     1.39µs ± 3%      ~     (p=0.690 n=5+5)
      GCD10x1000/WithXY-8                 7.39µs ± 1%     7.34µs ± 2%      ~     (p=0.135 n=5+5)
      GCD10x10000/WithoutXY-8             8.66µs ± 1%     8.68µs ± 1%      ~     (p=0.421 n=5+5)
      GCD10x10000/WithXY-8                28.1µs ± 2%     27.0µs ± 2%    -3.81%  (p=0.008 n=5+5)
      GCD10x100000/WithoutXY-8            79.3µs ± 1%     79.3µs ± 1%      ~     (p=0.841 n=5+5)
      GCD10x100000/WithXY-8                238µs ± 0%      227µs ± 1%    -4.74%  (p=0.008 n=5+5)
      GCD100x100/WithoutXY-8              1.89µs ± 1%     1.88µs ± 2%      ~     (p=0.968 n=5+5)
      GCD100x100/WithXY-8                 26.7µs ± 1%     27.0µs ± 1%    +1.44%  (p=0.032 n=5+5)
      GCD100x1000/WithoutXY-8             4.48µs ± 1%     4.45µs ± 2%      ~     (p=0.341 n=5+5)
      GCD100x1000/WithXY-8                36.3µs ± 1%     35.1µs ± 1%    -3.27%  (p=0.008 n=5+5)
      GCD100x10000/WithoutXY-8            22.8µs ± 0%     22.7µs ± 1%      ~     (p=0.056 n=5+5)
      GCD100x10000/WithXY-8                145µs ± 1%      133µs ± 1%    -8.33%  (p=0.008 n=5+5)
      GCD100x100000/WithoutXY-8            198µs ± 0%      195µs ± 0%    -1.56%  (p=0.008 n=5+5)
      GCD100x100000/WithXY-8              1.11ms ± 0%     1.00ms ± 0%   -10.04%  (p=0.008 n=5+5)
      GCD1000x1000/WithoutXY-8            25.2µs ± 1%     24.8µs ± 1%    -1.63%  (p=0.016 n=5+5)
      GCD1000x1000/WithXY-8                513µs ± 0%      517µs ± 2%      ~     (p=0.421 n=5+5)
      GCD1000x10000/WithoutXY-8           57.0µs ± 0%     52.7µs ± 1%    -7.56%  (p=0.008 n=5+5)
      GCD1000x10000/WithXY-8              1.20ms ± 0%     1.10ms ± 0%    -8.70%  (p=0.008 n=5+5)
      GCD1000x100000/WithoutXY-8           358µs ± 0%      318µs ± 1%   -11.03%  (p=0.008 n=5+5)
      GCD1000x100000/WithXY-8             8.71ms ± 0%     7.65ms ± 0%   -12.19%  (p=0.008 n=5+5)
      GCD10000x10000/WithoutXY-8           690µs ± 0%      630µs ± 0%    -8.71%  (p=0.008 n=5+5)
      GCD10000x10000/WithXY-8             16.0ms ± 1%     14.9ms ± 0%    -6.85%  (p=0.008 n=5+5)
      GCD10000x100000/WithoutXY-8         2.09ms ± 0%     1.75ms ± 0%   -16.09%  (p=0.016 n=5+4)
      GCD10000x100000/WithXY-8            86.8ms ± 0%     76.3ms ± 0%   -12.09%  (p=0.008 n=5+5)
      GCD100000x100000/WithoutXY-8        51.1ms ± 0%     46.0ms ± 0%    -9.97%  (p=0.008 n=5+5)
      GCD100000x100000/WithXY-8            1.25s ± 0%      1.15s ± 0%    -7.92%  (p=0.008 n=5+5)
      Hilbert-8                           2.45ms ± 1%     2.49ms ± 1%    +1.99%  (p=0.008 n=5+5)
      Binomial-8                          4.98µs ± 3%     4.90µs ± 2%      ~     (p=0.421 n=5+5)
      QuoRem-8                            7.10µs ± 0%     6.21µs ± 0%   -12.55%  (p=0.016 n=5+4)
      Exp-8                                161ms ± 0%      161ms ± 0%      ~     (p=0.421 n=5+5)
      Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=0.151 n=5+5)
      Bitset-8                            40.4ns ± 0%     40.3ns ± 0%      ~     (p=0.190 n=5+5)
      BitsetNeg-8                          163ns ± 3%      137ns ± 2%   -15.91%  (p=0.008 n=5+5)
      BitsetOrig-8                         377ns ± 1%      372ns ± 1%    -1.22%  (p=0.024 n=5+5)
      BitsetNegOrig-8                      631ns ± 1%      605ns ± 1%    -4.09%  (p=0.008 n=5+5)
      ModSqrt225_Tonelli-8                7.26ms ± 0%     7.26ms ± 0%      ~     (p=0.548 n=5+5)
      ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=1.000 n=5+5)
      ModSqrt5430_Tonelli-8                62.4s ± 0%      62.4s ± 0%      ~     (p=0.841 n=5+5)
      ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.7s ± 0%      ~     (p=0.056 n=5+5)
      Sqrt-8                               101µs ± 0%       89µs ± 0%   -12.17%  (p=0.008 n=5+5)
      IntSqr/1-8                          32.5ns ± 1%     32.7ns ± 1%      ~     (p=0.056 n=5+5)
      IntSqr/2-8                           160ns ± 5%      158ns ± 0%      ~     (p=0.397 n=5+4)
      IntSqr/3-8                           298ns ± 4%      296ns ± 4%      ~     (p=0.667 n=5+5)
      IntSqr/5-8                           737ns ± 5%      761ns ± 3%    +3.34%  (p=0.016 n=5+5)
      IntSqr/8-8                          1.87µs ± 4%     1.90µs ± 3%      ~     (p=0.222 n=5+5)
      IntSqr/10-8                         2.96µs ± 4%     2.92µs ± 6%      ~     (p=0.310 n=5+5)
      IntSqr/20-8                         6.28µs ± 3%     6.21µs ± 2%      ~     (p=0.310 n=5+5)
      IntSqr/30-8                         14.0µs ± 2%     13.9µs ± 2%      ~     (p=0.548 n=5+5)
      IntSqr/50-8                         37.7µs ± 3%     38.3µs ± 2%      ~     (p=0.095 n=5+5)
      IntSqr/80-8                         95.9µs ± 2%     95.1µs ± 1%      ~     (p=0.310 n=5+5)
      IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.841 n=5+5)
      IntSqr/200-8                         586µs ± 1%      587µs ± 1%      ~     (p=1.000 n=5+5)
      IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 1%    -0.73%  (p=0.032 n=5+5)
      IntSqr/500-8                        2.48ms ± 0%     2.45ms ± 0%    -1.15%  (p=0.008 n=5+5)
      IntSqr/800-8                        4.68ms ± 0%     4.62ms ± 0%    -1.23%  (p=0.008 n=5+5)
      IntSqr/1000-8                       7.57ms ± 0%     7.50ms ± 0%    -0.84%  (p=0.008 n=5+5)
      Mul-8                                311ms ± 0%      308ms ± 0%    -0.81%  (p=0.008 n=5+5)
      Exp3Power/0x10-8                     574ns ± 1%      578ns ± 2%      ~     (p=0.500 n=5+5)
      Exp3Power/0x40-8                     640ns ± 1%      646ns ± 0%      ~     (p=0.056 n=5+5)
      Exp3Power/0x100-8                   1.42µs ± 1%     1.42µs ± 1%      ~     (p=0.246 n=5+5)
      Exp3Power/0x400-8                   8.30µs ± 1%     8.29µs ± 1%      ~     (p=0.802 n=5+5)
      Exp3Power/0x1000-8                  60.0µs ± 0%     59.9µs ± 0%    -0.24%  (p=0.016 n=5+5)
      Exp3Power/0x4000-8                   817µs ± 0%      816µs ± 0%    -0.17%  (p=0.008 n=5+5)
      Exp3Power/0x10000-8                 7.80ms ± 1%     7.70ms ± 0%    -1.23%  (p=0.008 n=5+5)
      Exp3Power/0x40000-8                 73.4ms ± 0%     72.5ms ± 0%    -1.28%  (p=0.008 n=5+5)
      Exp3Power/0x100000-8                 665ms ± 0%      656ms ± 0%    -1.34%  (p=0.008 n=5+5)
      Exp3Power/0x400000-8                 5.99s ± 0%      5.90s ± 0%    -1.40%  (p=0.008 n=5+5)
      Fibo-8                               116ms ± 0%       50ms ± 0%   -57.09%  (p=0.008 n=5+5)
      NatSqr/1-8                           112ns ± 4%      112ns ± 2%      ~     (p=0.968 n=5+5)
      NatSqr/2-8                           251ns ± 2%      250ns ± 1%      ~     (p=0.571 n=5+5)
      NatSqr/3-8                           378ns ± 2%      379ns ± 2%      ~     (p=0.794 n=5+5)
      NatSqr/5-8                           829ns ± 3%      827ns ± 2%      ~     (p=1.000 n=5+5)
      NatSqr/8-8                          1.97µs ± 2%     1.95µs ± 2%      ~     (p=0.310 n=5+5)
      NatSqr/10-8                         3.02µs ± 2%     2.99µs ± 2%      ~     (p=0.421 n=5+5)
      NatSqr/20-8                         6.51µs ± 2%     6.49µs ± 1%      ~     (p=0.841 n=5+5)
      NatSqr/30-8                         14.1µs ± 2%     14.0µs ± 2%      ~     (p=0.841 n=5+5)
      NatSqr/50-8                         38.1µs ± 2%     38.3µs ± 3%      ~     (p=0.690 n=5+5)
      NatSqr/80-8                         95.5µs ± 2%     96.0µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/100-8                         150µs ± 1%      148µs ± 2%      ~     (p=0.095 n=5+5)
      NatSqr/200-8                         588µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
      NatSqr/300-8                        1.32ms ± 1%     1.31ms ± 1%      ~     (p=0.841 n=5+5)
      NatSqr/500-8                        2.50ms ± 0%     2.47ms ± 0%    -1.03%  (p=0.008 n=5+5)
      NatSqr/800-8                        4.70ms ± 0%     4.64ms ± 0%    -1.31%  (p=0.008 n=5+5)
      NatSqr/1000-8                       7.60ms ± 0%     7.52ms ± 0%    -1.01%  (p=0.008 n=5+5)
      ScanPi-8                             326µs ± 0%      326µs ± 0%      ~     (p=0.841 n=5+5)
      StringPiParallel-8                  70.3µs ± 5%     63.8µs ±10%      ~     (p=0.056 n=5+5)
      Scan/10/Base2-8                     1.09µs ± 0%     1.09µs ± 0%      ~     (p=0.317 n=5+5)
      Scan/100/Base2-8                    7.79µs ± 0%     7.78µs ± 0%      ~     (p=0.063 n=5+5)
      Scan/1000/Base2-8                   79.0µs ± 0%     78.9µs ± 0%    -0.18%  (p=0.008 n=5+5)
      Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%    -0.15%  (p=0.008 n=5+5)
      Scan/100000/Base2-8                 55.1ms ± 0%     55.2ms ± 0%    +0.20%  (p=0.008 n=5+5)
      Scan/10/Base8-8                      512ns ± 0%      512ns ± 1%      ~     (p=0.810 n=5+5)
      Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%      ~     (p=0.810 n=5+5)
      Scan/1000/Base8-8                   31.0µs ± 0%     31.0µs ± 0%      ~     (p=0.151 n=5+5)
      Scan/10000/Base8-8                   740µs ± 0%      741µs ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/100000/Base8-8                 50.6ms ± 0%     50.6ms ± 0%    +0.08%  (p=0.008 n=5+5)
      Scan/10/Base10-8                     487ns ± 0%      487ns ± 0%      ~     (p=0.571 n=5+5)
      Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.810 n=5+5)
      Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%    +0.06%  (p=0.008 n=5+5)
      Scan/10000/Base10-8                  716µs ± 0%      717µs ± 0%      ~     (p=0.222 n=5+5)
      Scan/100000/Base10-8                50.3ms ± 0%     50.3ms ± 0%    +0.10%  (p=0.008 n=5+5)
      Scan/10/Base16-8                     438ns ± 0%      437ns ± 1%      ~     (p=0.786 n=5+5)
      Scan/100/Base16-8                   2.47µs ± 0%     2.47µs ± 0%    -0.19%  (p=0.048 n=5+5)
      Scan/1000/Base16-8                  27.2µs ± 0%     27.3µs ± 0%      ~     (p=0.087 n=5+5)
      Scan/10000/Base16-8                  722µs ± 0%      722µs ± 0%    +0.11%  (p=0.008 n=5+5)
      Scan/100000/Base16-8                52.6ms ± 0%     52.7ms ± 0%    +0.15%  (p=0.008 n=5+5)
      String/10/Base2-8                    247ns ± 2%      248ns ± 1%      ~     (p=0.437 n=5+5)
      String/100/Base2-8                  1.51µs ± 0%     1.51µs ± 0%    -0.37%  (p=0.024 n=5+5)
      String/1000/Base2-8                 13.6µs ± 1%     13.5µs ± 0%      ~     (p=0.095 n=5+5)
      String/10000/Base2-8                 135µs ± 0%      135µs ± 1%      ~     (p=0.841 n=5+5)
      String/100000/Base2-8               1.32ms ± 1%     1.32ms ± 1%      ~     (p=0.690 n=5+5)
      String/10/Base8-8                    169ns ± 1%      169ns ± 1%      ~     (p=1.000 n=5+5)
      String/100/Base8-8                   636ns ± 0%      634ns ± 1%      ~     (p=0.413 n=5+5)
      String/1000/Base8-8                 5.33µs ± 1%     5.32µs ± 0%      ~     (p=0.222 n=5+5)
      String/10000/Base8-8                50.9µs ± 1%     50.7µs ± 0%      ~     (p=0.151 n=5+5)
      String/100000/Base8-8                500µs ± 1%      497µs ± 0%      ~     (p=0.421 n=5+5)
      String/10/Base10-8                   516ns ± 1%      513ns ± 0%    -0.62%  (p=0.016 n=5+4)
      String/100/Base10-8                 1.97µs ± 0%     1.96µs ± 0%      ~     (p=0.667 n=4+5)
      String/1000/Base10-8                12.5µs ± 0%     11.5µs ± 0%    -7.92%  (p=0.008 n=5+5)
      String/10000/Base10-8               57.7µs ± 0%     52.5µs ± 0%    -8.93%  (p=0.008 n=5+5)
      String/100000/Base10-8              25.6ms ± 0%     21.6ms ± 0%   -15.94%  (p=0.008 n=5+5)
      String/10/Base16-8                   150ns ± 1%      149ns ± 0%      ~     (p=0.413 n=5+4)
      String/100/Base16-8                  514ns ± 1%      514ns ± 1%      ~     (p=0.849 n=5+5)
      String/1000/Base16-8                4.01µs ± 0%     4.01µs ± 0%      ~     (p=0.421 n=5+5)
      String/10000/Base16-8               37.8µs ± 1%     37.8µs ± 1%      ~     (p=0.841 n=5+5)
      String/100000/Base16-8               373µs ± 2%      373µs ± 0%      ~     (p=0.421 n=5+5)
      LeafSize/0-8                        6.63ms ± 0%     6.63ms ± 0%      ~     (p=0.730 n=4+5)
      LeafSize/1-8                        74.0µs ± 0%     67.7µs ± 1%    -8.53%  (p=0.008 n=5+5)
      LeafSize/2-8                        74.2µs ± 0%     68.3µs ± 1%    -7.99%  (p=0.008 n=5+5)
      LeafSize/3-8                         379µs ± 0%      309µs ± 0%   -18.52%  (p=0.008 n=5+5)
      LeafSize/4-8                        72.7µs ± 1%     66.7µs ± 0%    -8.37%  (p=0.008 n=5+5)
      LeafSize/5-8                         471µs ± 0%      384µs ± 0%   -18.55%  (p=0.008 n=5+5)
      LeafSize/6-8                         378µs ± 0%      308µs ± 0%   -18.59%  (p=0.008 n=5+5)
      LeafSize/7-8                         245µs ± 0%      204µs ± 1%   -16.75%  (p=0.008 n=5+5)
      LeafSize/8-8                        73.4µs ± 0%     66.9µs ± 1%    -8.79%  (p=0.008 n=5+5)
      LeafSize/9-8                         538µs ± 0%      437µs ± 0%   -18.75%  (p=0.008 n=5+5)
      LeafSize/10-8                        472µs ± 0%      396µs ± 1%   -16.01%  (p=0.008 n=5+5)
      LeafSize/11-8                        460µs ± 0%      374µs ± 0%   -18.58%  (p=0.008 n=5+5)
      LeafSize/12-8                        378µs ± 0%      308µs ± 0%   -18.38%  (p=0.008 n=5+5)
      LeafSize/13-8                        343µs ± 0%      284µs ± 0%   -17.30%  (p=0.008 n=5+5)
      LeafSize/14-8                        248µs ± 0%      206µs ± 0%   -16.94%  (p=0.008 n=5+5)
      LeafSize/15-8                        169µs ± 0%      144µs ± 0%   -14.69%  (p=0.008 n=5+5)
      LeafSize/16-8                       72.9µs ± 0%     66.8µs ± 1%    -8.27%  (p=0.008 n=5+5)
      LeafSize/32-8                       82.5µs ± 0%     76.7µs ± 0%    -7.04%  (p=0.008 n=5+5)
      LeafSize/64-8                        134µs ± 0%      129µs ± 0%    -3.80%  (p=0.008 n=5+5)
      ProbablyPrime/n=0-8                 44.2ms ± 0%     43.4ms ± 0%    -1.95%  (p=0.008 n=5+5)
      ProbablyPrime/n=1-8                 64.9ms ± 0%     64.0ms ± 0%    -1.27%  (p=0.008 n=5+5)
      ProbablyPrime/n=5-8                  147ms ± 0%      146ms ± 0%    -0.58%  (p=0.008 n=5+5)
      ProbablyPrime/n=10-8                 250ms ± 0%      249ms ± 0%    -0.35%  (p=0.008 n=5+5)
      ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.18%  (p=0.008 n=5+5)
      ProbablyPrime/Lucas-8               23.6ms ± 0%     22.7ms ± 0%    -3.74%  (p=0.008 n=5+5)
      ProbablyPrime/MillerRabinBase2-8    20.7ms ± 0%     20.6ms ± 0%      ~     (p=0.421 n=5+5)
      FloatSqrt/64-8                      2.25µs ± 1%     2.29µs ± 0%    +1.48%  (p=0.008 n=5+5)
      FloatSqrt/128-8                     4.86µs ± 1%     4.92µs ± 1%    +1.21%  (p=0.032 n=5+5)
      FloatSqrt/256-8                     13.6µs ± 0%     13.7µs ± 1%    +1.31%  (p=0.032 n=5+5)
      FloatSqrt/1000-8                    70.0µs ± 1%     70.1µs ± 0%      ~     (p=0.690 n=5+5)
      FloatSqrt/10000-8                   1.92ms ± 0%     1.90ms ± 0%    -0.59%  (p=0.008 n=5+5)
      FloatSqrt/100000-8                  55.3ms ± 0%     54.8ms ± 0%    -1.01%  (p=0.008 n=5+5)
      FloatSqrt/1000000-8                  4.56s ± 0%      4.50s ± 0%    -1.28%  (p=0.008 n=5+5)
      
      name                              old speed      new speed       delta
      AddVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.85%  (p=0.008 n=5+5)
      AddVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.50%  (p=0.008 n=5+5)
      AddVV/3-8                         12.4GB/s ± 0%   14.7GB/s ± 0%   +19.10%  (p=0.008 n=5+5)
      AddVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.63%  (p=0.016 n=4+5)
      AddVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=5+4)
      AddVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
      AddVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
      AddVV/1000-8                      31.7GB/s ± 0%   61.9GB/s ± 0%   +95.43%  (p=0.008 n=5+5)
      AddVV/10000-8                     31.2GB/s ± 0%   56.4GB/s ± 0%   +80.83%  (p=0.008 n=5+5)
      AddVV/100000-8                    25.9GB/s ± 3%   41.4GB/s ± 0%   +59.98%  (p=0.008 n=5+5)
      SubVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.97%  (p=0.016 n=4+5)
      SubVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.51%  (p=0.008 n=5+5)
      SubVV/3-8                         12.4GB/s ± 0%   14.8GB/s ± 0%   +19.23%  (p=0.016 n=4+5)
      SubVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.56%  (p=0.008 n=5+5)
      SubVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=4+5)
      SubVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
      SubVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
      SubVV/1000-8                      31.6GB/s ± 0%   80.1GB/s ± 0%  +153.08%  (p=0.008 n=5+5)
      SubVV/10000-8                     31.2GB/s ± 0%   56.7GB/s ± 0%   +81.79%  (p=0.008 n=5+5)
      SubVV/100000-8                    29.1GB/s ±10%   29.0GB/s ±18%      ~     (p=0.690 n=5+5)
      AddVW/1-8                          859MB/s ± 0%    859MB/s ± 0%    -0.01%  (p=0.008 n=5+5)
      AddVW/2-8                          811MB/s ± 1%    814MB/s ± 0%      ~     (p=0.413 n=5+4)
      AddVW/3-8                         2.08GB/s ± 0%   2.08GB/s ± 0%      ~     (p=0.206 n=5+5)
      AddVW/4-8                         2.46GB/s ± 0%   2.46GB/s ± 0%      ~     (p=0.056 n=5+5)
      AddVW/5-8                         2.75GB/s ± 0%   2.75GB/s ± 0%      ~     (p=0.508 n=5+5)
      AddVW/10-8                        3.63GB/s ± 0%   3.63GB/s ± 0%      ~     (p=0.214 n=5+5)
      AddVW/100-8                       4.79GB/s ± 0%   4.79GB/s ± 0%      ~     (p=0.500 n=5+5)
      AddVW/1000-8                      5.27GB/s ± 0%   5.25GB/s ± 0%    -0.43%  (p=0.008 n=5+5)
      AddVW/10000-8                     5.30GB/s ± 0%   5.30GB/s ± 0%      ~     (p=0.397 n=5+5)
      AddVW/100000-8                    5.27GB/s ± 1%   5.25GB/s ± 1%      ~     (p=0.690 n=5+5)
      AddMulVVW/1-8                     1.92GB/s ± 0%   1.96GB/s ± 1%    +1.95%  (p=0.008 n=5+5)
      AddMulVVW/2-8                     2.16GB/s ± 1%   2.25GB/s ± 1%    +4.32%  (p=0.008 n=5+5)
      AddMulVVW/3-8                     2.39GB/s ± 1%   2.25GB/s ± 3%    -5.79%  (p=0.008 n=5+5)
      AddMulVVW/4-8                     2.00GB/s ± 0%   2.31GB/s ± 1%   +15.31%  (p=0.008 n=5+5)
      AddMulVVW/5-8                     2.22GB/s ± 0%   2.14GB/s ± 0%    -3.86%  (p=0.008 n=5+5)
      AddMulVVW/10-8                    2.15GB/s ± 1%   2.25GB/s ± 0%    +5.03%  (p=0.008 n=5+5)
      AddMulVVW/100-8                   2.09GB/s ± 0%   2.14GB/s ± 0%    +2.25%  (p=0.008 n=5+5)
      AddMulVVW/1000-8                  2.04GB/s ± 0%   2.38GB/s ± 0%   +16.52%  (p=0.008 n=5+5)
      AddMulVVW/10000-8                 2.03GB/s ± 0%   2.10GB/s ± 0%    +3.64%  (p=0.008 n=5+5)
      AddMulVVW/100000-8                2.02GB/s ± 0%   2.02GB/s ± 1%      ~     (p=0.690 n=5+5)
      
      Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db
      Reviewed-on: https://go-review.googlesource.com/77831Reviewed-by: default avatarCherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c4f3fe95
  3. 05 Mar, 2018 10 commits