1. 23 Feb, 2018 14 commits
    • runtime: reduce arena size to 4MB on 64-bit Windows · 78846472
      Austin Clements authored
      Currently, we use 64MB heap arenas on 64-bit platforms. This works
      well on UNIX-like OSes because they treat untouched pages as
      essentially free. However, on Windows, committed memory is charged
      against a process whether or not it has demand-faulted physical pages
      in. Hence, on Windows, even a process with a tiny heap will commit
      64MB for one heap arena, plus another 32MB for the arena map. Things
      are much worse under the race detector, which increases the heap
      commitment by a factor of 5.5X, leading to 384MB of committed memory
      at runtime init.
      
      Fix this by reducing the heap arena size to 4MB on Windows.
      
      To counterbalance the effect of increasing the arena map size by a
      factor of 16, and to further reduce the impact of the commitment for
      the arena map, we switch from a single entry L1 arena map to a 64
      entry L1 arena map.
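The sizes quoted above can be checked with a little arithmetic. The sketch below assumes a 48-bit heap address space and one 8-byte pointer per arena map entry. Both are assumptions chosen to be consistent with the 32MB figure in this message, not values stated here, and the function names are illustrative.

```go
package main

import "fmt"

// Assumptions (not stated in the commit message): 48 usable address bits
// and one 8-byte pointer per arena map entry.
const (
	heapAddrBits = 48
	entryBytes   = 8
)

// arenaMapBytes returns the size of a flat, single-level arena map that
// covers the whole address space with arenas of 1<<logArenaBytes bytes.
func arenaMapBytes(logArenaBytes uint) uint64 {
	entries := uint64(1) << (heapAddrBits - logArenaBytes)
	return entries * entryBytes
}

// l2MapBytes returns the size of one L2 array when the same map is split
// across 1<<l1Bits L1 entries.
func l2MapBytes(logArenaBytes, l1Bits uint) uint64 {
	return arenaMapBytes(logArenaBytes) >> l1Bits
}

func main() {
	const mb = 1 << 20
	fmt.Println("64MB arenas, flat map (MB):", arenaMapBytes(26)/mb)
	fmt.Println("4MB arenas, flat map (MB):", arenaMapBytes(22)/mb)
	fmt.Println("4MB arenas, one of 64 L2 arrays (MB):", l2MapBytes(22, 6)/mb)
}
```

Under these assumptions the flat map grows from 32MB to 512MB (the factor of 16), and each of the 64 L2 arrays covers 8MB of map, so a small heap only needs to commit the L2 arrays it actually touches.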
      
      Compared to the original arena design, this slows down the
      x/benchmarks garbage benchmark by 0.49% (the slowdown of this commit
      alone is 1.59%, but the previous commit bought us a 1% speed-up):
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)
      
      (https://perf.golang.org/search?q=upload:20180223.1)
      
      (This was measured on linux/amd64 by modifying its arena configuration
      as above.)
      
      Fixes #23900.
      
      Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
      Reviewed-on: https://go-review.googlesource.com/96780
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: support a two-level arena map · ec252105
      Austin Clements authored
      Currently, the heap arena map is a single, large array that covers
      every possible arena frame in the entire address space. This is
      practical up to about 48 bits of address space with 64 MB arenas.
      
      However, there are two problems with this:
      
      1. mips64, ppc64, and s390x support full 64-bit address spaces (though
         on Linux only s390x has kernel support for 64-bit address spaces).
         On these platforms, it would be good to support these larger
         address spaces.
      
      2. On Windows, processes are charged for untouched memory, so for
         processes with small heaps, the mostly-untouched 32 MB arena map
         plus a 64 MB arena are significant overhead. Hence, it would be
         good to reduce both the arena map size and the arena size, but with
         a single-level arena, these are inversely proportional.
      
      This CL adds support for a two-level arena map. Arena frame numbers
      are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
      index.
      
      At the moment, arenaL1Bits is always 0, so we effectively have a
      single level map. We do a few things so that this has no cost beyond
      the current single-level map:
      
      1. We embed the L2 array directly in mheap, so if there's a single
         entry in the L2 array, the representation is identical to the
         current representation and there's no extra level of indirection.
      
      2. Hot code that accesses the arena map is structured so that it
         optimizes to nearly the same machine code as it does currently.
      
      3. We make some small tweaks to hot code paths and to the inliner
         itself to keep some important functions inlined despite their
         now-larger ASTs. In particular, this is necessary for
         heapBitsForAddr and heapBits.next.
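As a rough illustration of the two-level lookup described above, here is a minimal sketch. The names arenaL1Bits and arenaL2Bits come from the message, but the values, logArenaBytes, the arenaInfo type, and the helper functions are assumptions made for this example. Unlike the embedded representation in item 1, the sketch always goes through a pointer to the L2 array.

```go
package main

import "fmt"

// Illustrative values only; the commit itself keeps arenaL1Bits at 0.
const (
	logArenaBytes = 22 // assumed 4MB arenas
	arenaL1Bits   = 6
	arenaL2Bits   = 20
)

// arenaInfo stands in for per-arena metadata (hypothetical type).
type arenaInfo struct{ base uint64 }

// arenaMap is a two-level map: the L1 array holds pointers to lazily
// allocated L2 arrays of per-arena metadata.
type arenaMap struct {
	l1 [1 << arenaL1Bits]*[1 << arenaL2Bits]*arenaInfo
}

// split divides an arena frame number into its L1 and L2 indexes.
func split(frame uint64) (l1, l2 uint64) {
	return frame >> arenaL2Bits, frame & (1<<arenaL2Bits - 1)
}

// insert records the arena containing addr, allocating the L2 array on
// first use. It reports false if addr is outside the covered range.
func (m *arenaMap) insert(addr uint64) bool {
	l1, l2 := split(addr >> logArenaBytes)
	if l1 >= uint64(len(m.l1)) {
		return false
	}
	if m.l1[l1] == nil {
		m.l1[l1] = new([1 << arenaL2Bits]*arenaInfo)
	}
	m.l1[l1][l2] = &arenaInfo{base: addr &^ (1<<logArenaBytes - 1)}
	return true
}

// lookup returns the metadata for the arena containing addr, or nil.
func (m *arenaMap) lookup(addr uint64) *arenaInfo {
	l1, l2 := split(addr >> logArenaBytes)
	if l1 >= uint64(len(m.l1)) {
		return nil
	}
	if l2s := m.l1[l1]; l2s != nil {
		return l2s[l2]
	}
	return nil
}

func main() {
	var m arenaMap
	addr := uint64(0xc000123456)
	m.insert(addr)
	fmt.Println(m.lookup(addr) != nil, m.lookup(addr+1<<logArenaBytes) != nil)
}
```

With arenaL1Bits set to 0 the L1 array would have a single entry, which is the case the message says should cost nothing extra.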
      
      Possibly as a result of some of the tweaks, this actually slightly
      improves the performance of the x/benchmarks garbage benchmark:
      
      name                       old time/op  new time/op  delta
      Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)
      
      (https://perf.golang.org/search?q=upload:20180223.2)
      
      For #23900.
      
      Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
      Reviewed-on: https://go-review.googlesource.com/96779
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • cmd/compile: teach front-end deadcode about && and || · 2dbf15e8
      Austin Clements authored
      The front-end dead code elimination is very simple. Currently, it just
      looks for if statements with constant boolean conditions. Its main
      purpose is to reduce load on the compiler and shrink code before
      inlining computes hairiness.
      
      This CL teaches front-end dead code elimination about short-circuiting
      boolean expressions && and ||, since they're essentially the same as
      if statements.
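To make the equivalence concrete, the sketch below writes the same constant guard once as an if statement and once with &&. The names debug, expensiveCheck, viaIf, and viaAnd are made up for this example. Whether the compiler actually removes the dead call is not observable from the program; the example only shows that the two forms behave the same.

```go
package main

import "fmt"

// debug is a compile-time constant, so conditions built from it are
// candidates for the dead code elimination described above.
const debug = false

// calls counts how often expensiveCheck actually runs.
var calls int

func expensiveCheck() bool {
	calls++
	return true
}

// viaIf guards the call with a constant if statement.
func viaIf() bool {
	if debug {
		return expensiveCheck()
	}
	return false
}

// viaAnd expresses the same guard with a short-circuiting &&.
func viaAnd() bool {
	return debug && expensiveCheck()
}

func main() {
	fmt.Println(viaIf(), viaAnd(), calls) // prints "false false 0"
}
```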
      
      This also teaches the inliner that the constant 'if' form left behind
      by deadcode is free.
      
      These changes will help with runtime modifications in the next CL that
      would otherwise inhibit inlining in some hot code paths. Currently,
      however, they have no significant impact on benchmarks.
      
      Change-Id: I886203b3c4acdbfef08148fddd7f3a7af5afc7c1
      Reviewed-on: https://go-review.googlesource.com/96778
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • runtime: rename "arena index" to "arena map" · 33b76920
      Austin Clements authored
      There are too many places where I want to talk about "indexing into
      the arena index". Make this less awkward and ambiguous by calling it
      the "arena map" instead.
      
      Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
      Reviewed-on: https://go-review.googlesource.com/96777
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: don't assume arena is in address order · 9680980e
      Austin Clements authored
      On amd64, the arena is no longer in address space order, but currently
      the heap dumper assumes that it is. Fix this assumption.
      
      Change-Id: Iab1953cd36b359d0fb78ed49e5eb813116a18855
      Reviewed-on: https://go-review.googlesource.com/96776
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • path: use OS-specific function in MkdirAll, don't always keep trailing slash · b86e7668
      Ian Lance Taylor authored
      CL 86295 changed MkdirAll to always pass a trailing path separator to
      support extended-length paths on Windows.
      
      However, when Stat is called on an existing file followed by a trailing
      slash, it will return a "not a directory" error, skipping the fast
      path at the beginning of MkdirAll.
      
      This change fixes MkdirAll to only pass the trailing path separator
      where required on Windows, by using an OS-specific function fixRootDirectory.
      
      Updates #23918
      
      Change-Id: I23f84a20e65ccce556efa743d026d352b4812c34
      Reviewed-on: https://go-review.googlesource.com/95255
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David du Colombier <0intro@gmail.com>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
    • cmd/vet: use type info to detect the atomic funcs · bae3fd66
      Daniel Martí authored
      Simply checking if a name is "atomic" isn't enough, as that might be a
      var or another imported package. Now that vet requires type information,
      we can do better. And add a simple regression test.
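A small sketch of why the name alone is not enough: below, a local variable called atomic has a method named like sync/atomic.AddUint64, while the real package is imported under a different name. The counter type, the satomic alias, and bump are hypothetical names for this example, and which lines vet reports is my reading of the message rather than something verified here.

```go
package main

import (
	"fmt"
	satomic "sync/atomic" // renamed so the local "atomic" below is legal
)

// counter is a hypothetical type whose method happens to share a name with
// sync/atomic.AddUint64.
type counter struct{}

func (counter) AddUint64(p *uint64, delta uint64) uint64 {
	*p += delta
	return *p
}

func bump() (uint64, uint64) {
	var atomic counter // a variable named "atomic", not the package
	var x, y uint64

	// Matches the misuse pattern by name only; this is an ordinary,
	// non-atomic method call, so a name-based check would misfire here.
	x = atomic.AddUint64(&x, 1)

	// Correct use of the real package: the result is not assigned back.
	satomic.AddUint64(&y, 1)
	return x, y
}

func main() {
	fmt.Println(bump())
}
```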
      
      Change-Id: Ibd2004428374e3628cd3cd0ffb5f37cedaf448ea
      Reviewed-on: https://go-review.googlesource.com/91795
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • crypto/x509: tighten EKU checking for requested EKUs. · 0681c7c3
      Adam Langley authored
      There are, sadly, many exceptions to EKU checking to reflect mistakes
      that CAs have made in practice. However, the requirements for checking
      requested EKUs against the leaf should be tighter than for checking leaf
      EKUs against a CA.
      
      Fixes #23884
      
      Change-Id: I05ea874c4ada0696d8bb18cac4377c0b398fcb5e
      Reviewed-on: https://go-review.googlesource.com/96379
      Reviewed-by: Jonathan Rudenberg <jonathan@titanous.com>
      Reviewed-by: Filippo Valsorda <hi@filippo.io>
      Run-TryBot: Filippo Valsorda <hi@filippo.io>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • regexp: Regexp shouldn't keep references to inputs · 72635401
      Oleg Bulatov authored
      If you try to find something in a slice of bytes using a Regexp object,
      the byte array will not be released by GC until you use the Regexp object
      on another slice of bytes. This happens because the Regexp object keeps
      references to the input data in its cache.
      
      Change-Id: I873107f15c1900aa53ccae5d29dbc885b9562808
      Reviewed-on: https://go-review.googlesource.com/96715
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: add code generation tests for sqrt intrinsics · 37a038a3
      Alberto Donizetti authored
      Add "sqrt-intrinsified" code generation tests for mips64 and 386, where
      we weren't intrinsifying math.Sqrt (see CL 96615 and CL 95916), and for
      mips and amd64, which lacked sqrt intrinsics tests.
      
      Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e
      Reviewed-on: https://go-review.googlesource.com/96716
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: rename the TestGcHashmapIndirection to TestGcMapIndirection · fceaa2e2
      mingrammer authored
      There was still the word 'Hashmap' in gc_test.go, so I renamed it to just 'Map'.
      
      Previous renaming commit: https://golang.org/cl/90336
      
      Change-Id: I5b0e5c2229d1c30937c7216247f4533effb81ce7
      Reviewed-on: https://go-review.googlesource.com/96675
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: intrinsify math.Sqrt on 386 · 9ee78af8
      Alberto Donizetti authored
      It seems like all the pieces were already there, it only needed the
      final plumbing.
      
      Before:
      
      	0x001b 00027 (test.go:9)	MOVSD	X0, (SP)
      	0x0020 00032 (test.go:9)	CALL	math.Sqrt(SB)
      	0x0025 00037 (test.go:9)	MOVSD	8(SP), X0
      
      After:
      
      	0x0018 00024 (test.go:9)	SQRTSD	X0, X0
      
      name    old time/op  new time/op  delta
      Sqrt-4  4.60ns ± 2%  0.45ns ± 1%  -90.33%  (p=0.000 n=10+10)
      
      Change-Id: I0f623958e19e726840140bf9b495d3f3a9184b9d
      Reviewed-on: https://go-review.googlesource.com/96615
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: use | in the last repetitive generic rules · f6c67813
      Alberto Donizetti authored
      This change or-ifies the last low-hanging rules in generic. Again,
      this is limited to short and repetitive rules, where the use of ors
      does not impact readability.
      
      Ran rulegen, no change in the actual compiler code.
      
      Change-Id: I972b523bc08532f173a3645b47d6936b6e1218c8
      Reviewed-on: https://go-review.googlesource.com/96335
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
    • runtime: fix a few typos in comments · 5b3cd560
      Jerrin Shaji George authored
      Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
      Reviewed-on: https://go-review.googlesource.com/96475
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. 22 Feb, 2018 10 commits
    • go/types: add -panic flag to gotype command for debugging · 70b09c72
      Robert Griesemer authored
      Setting -panic will cause gotype to panic with the first reported
      error, producing a stack trace for debugging.
      
      For #23914.
      
      Change-Id: I40c41cf10aa13d1dd9a099f727ef4201802de13a
      Reviewed-on: https://go-review.googlesource.com/96375
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • syscall: remove list of unimplemented syscalls · 6450c591
      Tobias Klauser authored
      The syscall package is frozen and we don't want to encourage anyone to
      implement these syscalls.
      
      Change-Id: I6b6e33e32a4b097da6012226aa15300735e50e9f
      Reviewed-on: https://go-review.googlesource.com/96315
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • go/types: fix regression with short variable declarations · 2465ae64
      Robert Griesemer authored
      The variables on the lhs of a short variable declaration are
      only in scope after the variable declaration. Specifically,
      function literals on the rhs of a short variable declaration
      must not see newly declared variables on the lhs.
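The scoping rule can be seen in a few lines of Go. In this sketch (the function name scopes is made up for the example), the function literal on the right-hand side reads x before the new x comes into scope, so it sees the outer variable.

```go
package main

import "fmt"

// scopes returns the value computed inside the inner block and the value
// of the outer x afterwards.
func scopes() (inner, outer int) {
	x := 1
	{
		// The function literal on the right-hand side is evaluated before
		// the new x exists, so the x it reads is the outer one (1).
		x, y := func() int { return x + 10 }(), 2
		inner = x + y // new x is 11, y is 2
	}
	return inner, x // outer x is still 1
}

func main() {
	fmt.Println(scopes()) // prints "13 1"
}
```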
      
      This used to work and this bug was likely introduced with
      https://go-review.googlesource.com/c/go/+/83397 for go1.11.
      Luckily this is just an oversight and the fix is trivial:
      Simply use the mechanism for delayed type-checking of function
      literals introduced in the aforementioned change here as well.
      
      Fixes #24026.
      
      Change-Id: I74ce3a0d05c5a2a42ce4b27601645964f906e82d
      Reviewed-on: https://go-review.googlesource.com/96177
      Reviewed-by: Alan Donovan <adonovan@google.com>
    • cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64 · 7113d3a5
      Ben Shi authored
      Two ARM64 rules are added to avoid an FP accuracy issue that causes
      a build failure:
      https://build.golang.org/log/1360f5c9ef3f37968216350283c1013e9681725d
      
      Fixes #24033
      
      Change-Id: I9b74b584ab5cc53fa49476de275dc549adf97610
      Reviewed-on: https://go-review.googlesource.com/96355
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • database/sql: add String method to IsolationLevel · ef3ab3f5
      Alexey Palazhchenko authored
      Fixes #23632
      
      Change-Id: I7197e13df6cf28400a6dd86c110f41129550abb6
      Reviewed-on: https://go-review.googlesource.com/92235
      Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
    • cmd/compile: use | in the most repetitive s390x rules · 1e05924c
      Alberto Donizetti authored
      For now, limited to the most repetitive rules that are also short and
      simple, so that we can have a substantial conciseness win without
      compromising rules readability.
      
      Ran rulegen, no changes in the rewrite files.
      
      Change-Id: I8447784895a218c5c1b4dfa1cdb355bd73dabfd1
      Reviewed-on: https://go-review.googlesource.com/95955
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
    • reflect: avoid calling common if type is known to be *rtype · 1dbe4c50
      Martin Möhrmann authored
      If the type of Type is known to be *rtype then the common
      function is a no-op and does not need to be called.
      
      name  old time/op  new time/op  delta
      New   31.0ns ± 5%  30.2ns ± 4%  -2.74%  (p=0.008 n=20+20)
      
      Change-Id: I5d00346dbc782e34c530166d1ee0499b24068b51
      Reviewed-on: https://go-review.googlesource.com/96115
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: improve FP performance on ARM64 · f4c3072c
      Ben Shi authored
      FMADD/FMSUB/FNMADD/FNMSUB are efficient FP instructions, which can
      be used by the compiler to improve FP performance. This CL implements
      this optimization.
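The source-level shape such an optimization targets is a multiply followed by an add or subtract. The sketch below uses made-up function names and is not taken from the CL. It contrasts an expression the compiler is allowed to fuse with one where an explicit conversion forces the product to be rounded first, which the Go spec says prevents fusion.

```go
package main

import "fmt"

// mulAdd has the x*y + z shape that a compiler is allowed to fuse into a
// single multiply-add instruction.
func mulAdd(x, y, z float64) float64 {
	return x*y + z
}

// mulAddRounded rounds the product explicitly. Per the Go spec, an explicit
// floating-point conversion prevents fusion with the following add.
func mulAddRounded(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	// With small integers both forms are exact and agree; they can differ
	// in the last bit only when the product needs rounding.
	fmt.Println(mulAdd(2, 3, 4), mulAddRounded(2, 3, 4))
}
```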
      
      1. The compilecmp benchmark shows little change.
      name        old time/op       new time/op       delta
      Template          2.35s ± 4%        2.38s ± 4%    ~     (p=0.161 n=15+15)
      Unicode           1.36s ± 5%        1.36s ± 4%    ~     (p=0.685 n=14+13)
      GoTypes           8.11s ± 3%        8.13s ± 2%    ~     (p=0.624 n=15+15)
      Compiler          40.5s ± 2%        40.7s ± 2%    ~     (p=0.137 n=15+15)
      SSA                115s ± 3%         116s ± 1%    ~     (p=0.270 n=15+14)
      Flate             1.46s ± 4%        1.45s ± 5%    ~     (p=0.870 n=15+15)
      GoParser          1.85s ± 2%        1.87s ± 3%    ~     (p=0.477 n=14+15)
      Reflect           5.11s ± 4%        5.10s ± 2%    ~     (p=0.624 n=15+15)
      Tar               2.23s ± 3%        2.23s ± 5%    ~     (p=0.624 n=15+15)
      XML               2.72s ± 5%        2.74s ± 3%    ~     (p=0.290 n=15+14)
      [Geo mean]        5.02s             5.03s       +0.29%
      
      name        old user-time/op  new user-time/op  delta
      Template          2.90s ± 2%        2.90s ± 3%    ~     (p=0.780 n=14+15)
      Unicode           1.71s ± 5%        1.70s ± 3%    ~     (p=0.458 n=14+13)
      GoTypes           9.77s ± 2%        9.76s ± 2%    ~     (p=0.838 n=15+15)
      Compiler          49.1s ± 2%        49.1s ± 2%    ~     (p=0.902 n=15+15)
      SSA                144s ± 1%         144s ± 2%    ~     (p=0.567 n=15+15)
      Flate             1.75s ± 5%        1.74s ± 3%    ~     (p=0.461 n=15+15)
      GoParser          2.22s ± 2%        2.21s ± 3%    ~     (p=0.233 n=15+15)
      Reflect           5.99s ± 2%        5.95s ± 1%    ~     (p=0.093 n=14+15)
      Tar               2.68s ± 2%        2.67s ± 3%    ~     (p=0.310 n=14+15)
      XML               3.22s ± 2%        3.24s ± 3%    ~     (p=0.512 n=15+15)
      [Geo mean]        6.08s             6.07s       -0.19%
      
      name        old text-bytes    new text-bytes    delta
      HelloSize         641kB ± 0%        641kB ± 0%    ~     (all equal)
      
      name        old data-bytes    new data-bytes    delta
      HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)
      
      name        old bss-bytes     new bss-bytes     delta
      HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)
      
      name        old exe-bytes     new exe-bytes     delta
      HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)
      
      2. The go1 benchmark shows little improvement in total (excluding noise),
      but some improvement in test case Mandelbrot200 and FmtFprintfFloat.
      name                     old time/op    new time/op    delta
      BinaryTree17-4              42.1s ± 2%     42.0s ± 2%    ~     (p=0.453 n=30+28)
      Fannkuch11-4                33.5s ± 3%     33.3s ± 3%  -0.38%  (p=0.045 n=30+30)
      FmtFprintfEmpty-4           534ns ± 0%     534ns ± 0%    ~     (all equal)
      FmtFprintfString-4         1.09µs ± 0%    1.09µs ± 0%  -0.27%  (p=0.000 n=23+17)
      FmtFprintfInt-4            1.16µs ± 3%    1.16µs ± 3%    ~     (p=0.714 n=30+30)
      FmtFprintfIntInt-4         1.76µs ± 1%    1.77µs ± 0%  +0.15%  (p=0.002 n=23+23)
      FmtFprintfPrefixedInt-4    2.21µs ± 3%    2.20µs ± 3%    ~     (p=0.390 n=30+30)
      FmtFprintfFloat-4          3.28µs ± 0%    3.11µs ± 0%  -5.01%  (p=0.000 n=25+26)
      FmtManyArgs-4              7.18µs ± 0%    7.19µs ± 0%  +0.13%  (p=0.000 n=24+25)
      GobDecode-4                94.9ms ± 0%    95.6ms ± 5%  +0.83%  (p=0.002 n=23+29)
      GobEncode-4                80.7ms ± 4%    79.8ms ± 0%  -1.11%  (p=0.003 n=30+24)
      Gzip-4                      4.58s ± 4%     4.59s ± 3%  +0.26%  (p=0.002 n=30+26)
      Gunzip-4                    449ms ± 4%     443ms ± 0%    ~     (p=0.096 n=30+26)
      HTTPClientServer-4          553µs ± 1%     548µs ± 1%  -0.96%  (p=0.000 n=30+30)
      JSONEncode-4                215ms ± 4%     214ms ± 4%  -0.29%  (p=0.000 n=30+30)
      JSONDecode-4                868ms ± 4%     875ms ± 5%  +0.79%  (p=0.008 n=30+30)
      Mandelbrot200-4            51.4ms ± 0%    46.7ms ± 3%  -9.09%  (p=0.000 n=25+26)
      GoParse-4                  42.1ms ± 0%    41.8ms ± 0%  -0.61%  (p=0.000 n=25+24)
      RegexpMatchEasy0_32-4      1.02µs ± 4%    1.02µs ± 4%  -0.17%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4      3.90µs ± 0%    3.95µs ± 4%    ~     (p=0.516 n=23+30)
      RegexpMatchEasy1_32-4       970ns ± 3%     973ns ± 3%    ~     (p=0.951 n=30+30)
      RegexpMatchEasy1_1K-4      6.43µs ± 3%    6.33µs ± 0%  -1.62%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4     1.75µs ± 0%    1.75µs ± 0%    ~     (p=0.422 n=25+24)
      RegexpMatchMedium_1K-4      568µs ± 3%     562µs ± 0%    ~     (p=0.079 n=30+24)
      RegexpMatchHard_32-4       30.8µs ± 0%    31.2µs ± 4%  +1.46%  (p=0.018 n=23+30)
      RegexpMatchHard_1K-4        932µs ± 0%     946µs ± 3%  +1.49%  (p=0.000 n=24+30)
      Revcomp-4                   7.69s ± 3%     7.69s ± 2%  +0.04%  (p=0.032 n=24+25)
      Template-4                  893ms ± 5%     880ms ± 6%  -1.53%  (p=0.000 n=30+30)
      TimeParse-4                4.90µs ± 3%    4.84µs ± 0%    ~     (p=0.080 n=30+25)
      TimeFormat-4               4.70µs ± 1%    4.76µs ± 0%  +1.21%  (p=0.000 n=23+26)
      [Geo mean]                  710µs          706µs       -0.63%
      
      name                     old speed      new speed      delta
      GobDecode-4              8.09MB/s ± 0%  8.03MB/s ± 5%  -0.77%  (p=0.002 n=23+29)
      GobEncode-4              9.52MB/s ± 4%  9.62MB/s ± 0%  +1.07%  (p=0.003 n=30+24)
      Gzip-4                   4.24MB/s ± 4%  4.23MB/s ± 3%  -0.35%  (p=0.002 n=30+26)
      Gunzip-4                 43.2MB/s ± 4%  43.8MB/s ± 0%    ~     (p=0.123 n=30+26)
      JSONEncode-4             9.03MB/s ± 4%  9.06MB/s ± 4%  +0.28%  (p=0.000 n=30+30)
      JSONDecode-4             2.24MB/s ± 4%  2.22MB/s ± 5%  -0.79%  (p=0.008 n=30+30)
      GoParse-4                1.38MB/s ± 1%  1.38MB/s ± 0%    ~     (p=0.401 n=25+17)
      RegexpMatchEasy0_32-4    31.4MB/s ± 4%  31.5MB/s ± 3%  +0.16%  (p=0.000 n=30+30)
      RegexpMatchEasy0_1K-4     262MB/s ± 0%   259MB/s ± 4%    ~     (p=0.693 n=23+30)
      RegexpMatchEasy1_32-4    33.0MB/s ± 3%  32.9MB/s ± 3%    ~     (p=0.139 n=30+30)
      RegexpMatchEasy1_1K-4     159MB/s ± 3%   162MB/s ± 0%  +1.60%  (p=0.000 n=30+25)
      RegexpMatchMedium_32-4    570kB/s ± 0%   570kB/s ± 0%    ~     (all equal)
      RegexpMatchMedium_1K-4   1.80MB/s ± 3%  1.82MB/s ± 0%  +1.09%  (p=0.007 n=30+24)
      RegexpMatchHard_32-4     1.04MB/s ± 0%  1.03MB/s ± 3%  -1.38%  (p=0.003 n=23+30)
      RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.08MB/s ± 3%  -1.52%  (p=0.000 n=24+30)
      Revcomp-4                33.0MB/s ± 3%  33.0MB/s ± 2%    ~     (p=0.128 n=24+25)
      Template-4               2.17MB/s ± 5%  2.21MB/s ± 6%  +1.61%  (p=0.000 n=30+30)
      [Geo mean]               7.79MB/s       7.79MB/s       +0.05%
      
      Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
      Reviewed-on: https://go-review.googlesource.com/94901
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/asm: add arm64 instructions for math optimization · f5de4200
      erifan01 authored
      Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
      FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.
      
      Add check on register element index and test cases.
      
      Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
      Reviewed-on: https://go-review.googlesource.com/90876
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: decouple emitted block order from regalloc block order · c18ff184
      David Chase authored
      While tinkering with different block orders for the preemptible
      loop experiment, crashed the register allocator with a "bad"
      one (these exist).  Realized that one knob was controlling
      two things (register allocation and branch patterns) and
      decided that life would be simpler if the two orders were
      independent.
      
      Ran some experiments and determined that we have probably,
      mostly, been optimizing for register allocation effects, not
      branch effects.  Bad block orders for register allocation are
      somewhat costly.
      
      This will also allow separate experimentation with perhaps-
      better block orders for register allocation.
      
      Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
      Reviewed-on: https://go-review.googlesource.com/47314
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
  3. 21 Feb, 2018 16 commits