1. 16 Oct, 2018 2 commits
    • cmd/compile: optimize 386's load/store combination · 4b78fe57
      Ben Shi authored
      This CL adds more rules that combine two consecutive MOVBload/MOVBstore
      operations into a single MOVWload/MOVWstore.
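
      For illustration only: the Go functions below (hypothetical names) contain the kind of adjacent byte-sized loads and stores that such rules look for, assembling and splitting a little-endian 16-bit value. Whether a given snippet is actually merged into one MOVWload/MOVWstore depends on the compiler version and on how the SSA rules match it; running `GOARCH=386 go build -gcflags=-S` shows the generated assembly.

```go
package main

import "fmt"

// load16LE reads a little-endian uint16 from two consecutive bytes.
// The two byte loads (MOVBload on 386) are candidates for being
// combined into a single 16-bit load (MOVWload).
func load16LE(b []byte) uint16 {
	_ = b[1] // one bounds check up front covers both accesses
	return uint16(b[0]) | uint16(b[1])<<8
}

// store16LE writes v as two consecutive bytes, low byte first.
// The two byte stores (MOVBstore) are candidates for a single MOVWstore.
func store16LE(b []byte, v uint16) {
	_ = b[1]
	b[0] = byte(v)
	b[1] = byte(v >> 8)
}

func main() {
	buf := make([]byte, 2)
	store16LE(buf, 0x1234)
	fmt.Printf("% x -> %#x\n", buf, load16LE(buf))
}
```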
      
      1. The size of the go executable decreases by about 4KB, and the total
      size of pkg/linux_386 (excluding cmd/compile) decreases by about 1.5KB.
      
      2. There is no regression in the go1 benchmark results, excluding noise.
      name                     old time/op    new time/op    delta
      BinaryTree17-4              3.28s ± 2%     3.29s ± 2%    ~     (p=0.151 n=40+40)
      Fannkuch11-4                3.52s ± 1%     3.51s ± 1%  -0.28%  (p=0.002 n=40+40)
      FmtFprintfEmpty-4          45.4ns ± 4%    45.0ns ± 4%  -0.89%  (p=0.019 n=40+40)
      FmtFprintfString-4         81.9ns ± 7%    81.3ns ± 1%    ~     (p=0.660 n=40+25)
      FmtFprintfInt-4            91.9ns ± 9%    91.4ns ± 9%    ~     (p=0.249 n=40+40)
      FmtFprintfIntInt-4          143ns ± 4%     143ns ± 4%    ~     (p=0.760 n=40+40)
      FmtFprintfPrefixedInt-4     184ns ± 3%     183ns ± 4%    ~     (p=0.485 n=40+40)
      FmtFprintfFloat-4           408ns ± 3%     409ns ± 3%    ~     (p=0.961 n=40+40)
      FmtManyArgs-4               597ns ± 4%     602ns ± 3%    ~     (p=0.413 n=40+40)
      GobDecode-4                7.13ms ± 6%    7.14ms ± 6%    ~     (p=0.859 n=40+40)
      GobEncode-4                6.86ms ± 9%    6.94ms ± 7%    ~     (p=0.162 n=40+40)
      Gzip-4                      395ms ± 4%     396ms ± 3%    ~     (p=0.099 n=40+40)
      Gunzip-4                   40.9ms ± 4%    41.1ms ± 3%    ~     (p=0.064 n=40+40)
      HTTPClientServer-4         63.6µs ± 2%    63.6µs ± 3%    ~     (p=0.832 n=36+39)
      JSONEncode-4               16.1ms ± 3%    15.8ms ± 3%  -1.60%  (p=0.001 n=40+40)
      JSONDecode-4               61.0ms ± 3%    61.5ms ± 4%    ~     (p=0.065 n=40+40)
      Mandelbrot200-4            5.16ms ± 3%    5.18ms ± 3%    ~     (p=0.056 n=40+40)
      GoParse-4                  3.25ms ± 2%    3.23ms ± 3%    ~     (p=0.727 n=40+40)
      RegexpMatchEasy0_32-4      90.2ns ± 3%    89.3ns ± 6%  -0.98%  (p=0.002 n=40+40)
      RegexpMatchEasy0_1K-4       812ns ± 3%     815ns ± 3%    ~     (p=0.309 n=40+40)
      RegexpMatchEasy1_32-4       103ns ± 6%     103ns ± 5%    ~     (p=0.680 n=40+40)
      RegexpMatchEasy1_1K-4      1.01µs ± 4%    1.02µs ± 3%    ~     (p=0.326 n=40+33)
      RegexpMatchMedium_32-4      120ns ± 4%     120ns ± 5%    ~     (p=0.834 n=40+40)
      RegexpMatchMedium_1K-4     40.1µs ± 3%    39.5µs ± 4%  -1.35%  (p=0.000 n=40+40)
      RegexpMatchHard_32-4       2.27µs ± 6%    2.23µs ± 4%  -1.67%  (p=0.011 n=40+40)
      RegexpMatchHard_1K-4       67.2µs ± 3%    67.2µs ± 3%    ~     (p=0.149 n=40+40)
      Revcomp-4                   1.84s ± 2%     1.86s ± 3%  +0.70%  (p=0.020 n=40+40)
      Template-4                 69.0ms ± 4%    69.8ms ± 3%  +1.20%  (p=0.003 n=40+40)
      TimeParse-4                 438ns ± 3%     439ns ± 4%    ~     (p=0.650 n=40+40)
      TimeFormat-4                412ns ± 3%     412ns ± 3%    ~     (p=0.888 n=40+40)
      [Geo mean]                 65.2µs         65.2µs       -0.04%
      
      name                     old speed      new speed      delta
      GobDecode-4               108MB/s ± 6%   108MB/s ± 6%    ~     (p=0.855 n=40+40)
      GobEncode-4               112MB/s ± 9%   111MB/s ± 8%    ~     (p=0.159 n=40+40)
      Gzip-4                   49.2MB/s ± 4%  49.1MB/s ± 3%    ~     (p=0.102 n=40+40)
      Gunzip-4                  474MB/s ± 3%   472MB/s ± 3%    ~     (p=0.063 n=40+40)
      JSONEncode-4              121MB/s ± 3%   123MB/s ± 3%  +1.62%  (p=0.001 n=40+40)
      JSONDecode-4             31.9MB/s ± 3%  31.6MB/s ± 4%    ~     (p=0.070 n=40+40)
      GoParse-4                17.9MB/s ± 2%  17.9MB/s ± 3%    ~     (p=0.696 n=40+40)
      RegexpMatchEasy0_32-4     355MB/s ± 3%   358MB/s ± 5%  +0.99%  (p=0.002 n=40+40)
      RegexpMatchEasy0_1K-4    1.26GB/s ± 3%  1.26GB/s ± 3%    ~     (p=0.381 n=40+40)
      RegexpMatchEasy1_32-4     310MB/s ± 5%   310MB/s ± 4%    ~     (p=0.655 n=40+40)
      RegexpMatchEasy1_1K-4    1.01GB/s ± 4%  1.01GB/s ± 3%    ~     (p=0.351 n=40+33)
      RegexpMatchMedium_32-4   8.32MB/s ± 4%  8.34MB/s ± 5%    ~     (p=0.696 n=40+40)
      RegexpMatchMedium_1K-4   25.6MB/s ± 3%  25.9MB/s ± 4%  +1.36%  (p=0.000 n=40+40)
      RegexpMatchHard_32-4     14.1MB/s ± 6%  14.3MB/s ± 4%  +1.64%  (p=0.011 n=40+40)
      RegexpMatchHard_1K-4     15.2MB/s ± 3%  15.2MB/s ± 3%    ~     (p=0.147 n=40+40)
      Revcomp-4                 138MB/s ± 2%   137MB/s ± 3%  -0.70%  (p=0.021 n=40+40)
      Template-4               28.1MB/s ± 4%  27.8MB/s ± 3%  -1.19%  (p=0.003 n=40+40)
      [Geo mean]               83.7MB/s       83.7MB/s       +0.03%
      
      Change-Id: I2a2b3a942b5c45467491515d201179fd192e65c9
      Reviewed-on: https://go-review.googlesource.com/c/141650
      Run-TryBot: Ben Shi <powerman1st@163.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • test/codegen: fix confusing test cases · 3785be30
      Ben Shi authored
      ARMv7's MULAF/MULSF/MULAD/MULSD are not fused, so this CL fixes
      the confusing test cases.
      
      Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886
      Reviewed-on: https://go-review.googlesource.com/c/142439
      Run-TryBot: Ben Shi <powerman1st@163.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Akhil Indurti <aindurti@gmail.com>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
  2. 15 Oct, 2018 19 commits
    • cmd/compile: don't panic on invalid map key declarations · 7f331313
      Daniel Martí authored
      In golang.org/cl/75310, the compiler's typechecker was changed so that
      map key types were validated at a later stage, to make sure that all the
      necessary type information was present.
      
      This still worked for map type declarations, but caused a regression for
      top-level map variable declarations. These now caused a fatal panic
      instead of a typechecking error.
      
      The cause was that checkMapKeys was run too early, before all
      typechecking was done. In particular, top-level map variable
      declarations are typechecked as external declarations, much later than
      where checkMapKeys was run.
      
      Add a test case for both exported and unexported top-level map
      declarations, and add a second call to checkMapKeys at the actual end of
      typechecking. Simply moving the one call isn't a good solution either;
      the comments expand on that.
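
      A minimal sketch of the expected behavior, using the standard go/types checker rather than cmd/compile itself (so it shows the desired outcome, not the compiler's fix): an invalid top-level map key type, exported or unexported, should be reported as an ordinary type error. The helper checkSource is hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// checkSource parses and type-checks a single Go file and returns the
// first error, or nil if the file is well-typed.
func checkSource(src string) error {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return err
	}
	conf := types.Config{} // no Importer needed: the snippets import nothing
	_, err = conf.Check("p", fset, []*ast.File{f}, nil)
	return err
}

func main() {
	// Slices are not comparable, so []int is an invalid map key type.
	// Both an exported and an unexported top-level declaration are tried.
	for _, src := range []string{
		"package p\nvar M map[[]int]int\n",
		"package p\nvar m map[[]int]int\n",
	} {
		fmt.Println(checkSource(src))
	}
}
```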
      
      Fixes #28058.
      
      Change-Id: Ia5febb01a1d877447cf66ba44fb49a7e0f4f18a5
      Reviewed-on: https://go-review.googlesource.com/c/140417
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • internal/cpu: add invalid option warnings and support to enable cpu features · 3e0227f6
      Martin Möhrmann authored
      This CL adds the ability to enable the cpu feature FEATURE by specifying
      FEATURE=on in GODEBUGCPU. Support for enabling cpu features is useful
      in combination with a preceding all=off, to disable all but a few
      specific cpu features. Example:
      
      GODEBUGCPU=all=off,sse3=on
      
      This CL also implements printing of warnings for invalid GODEBUGCPU settings:
      - requests to enable features that are not supported by the current CPU
      - values other than 'on' or 'off' for a feature
      - settings for unknown cpu feature names
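
      The option syntax and the three warning cases can be sketched as a small parser. This is a hypothetical model, not the internal/cpu implementation: parseCPUOptions and the supported map (feature name to whether the CPU has it) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCPUOptions parses a GODEBUGCPU-style string such as
// "all=off,sse3=on". supported maps each known feature name to whether
// the current CPU has it. It returns the resulting enabled set and one
// warning per invalid setting.
func parseCPUOptions(s string, supported map[string]bool) (map[string]bool, []string) {
	enabled := make(map[string]bool)
	for name, has := range supported {
		enabled[name] = has // default: use every feature the CPU has
	}
	var warnings []string
	for _, field := range strings.Split(s, ",") {
		if field == "" {
			continue
		}
		key, value := field, ""
		if i := strings.IndexByte(field, '='); i >= 0 {
			key, value = field[:i], field[i+1:]
		}
		if value != "on" && value != "off" {
			warnings = append(warnings, fmt.Sprintf("invalid value %q for %s", value, key))
			continue
		}
		on := value == "on"
		if key == "all" {
			// all=on can only turn on features the CPU actually has.
			for name := range enabled {
				enabled[name] = on && supported[name]
			}
			continue
		}
		has, known := supported[key]
		switch {
		case !known:
			warnings = append(warnings, "unknown cpu feature "+key)
		case on && !has:
			warnings = append(warnings, "cannot enable "+key+": not supported by this CPU")
		default:
			enabled[key] = on
		}
	}
	return enabled, warnings
}

func main() {
	supported := map[string]bool{"sse3": true, "avx2": false}
	enabled, warnings := parseCPUOptions("all=off,sse3=on,avx2=on,foo=on", supported)
	fmt.Println(enabled, warnings)
}
```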
      
      Updates #27218
      
      Change-Id: Ic13e5c4c35426a390c50eaa4bd2a408ef2ee21be
      Reviewed-on: https://go-review.googlesource.com/c/141800
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • strconv: add comment explaining bounded shift in formatBits · f81d73e8
      Martin Möhrmann authored
      The compiler can generate better code for shifts bounded to be less than 32
      and thereby known to be less than any register width.
      See https://golang.org/cl/109776.
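
      The technique can be sketched as a power-of-two formatter in the style of formatBits. This is an illustrative reimplementation (formatPow2 is a hypothetical name), not the strconv source: masking the shift count with 31 makes it provably less than 32, so the compiler does not need extra code for shift counts at or above the register width.

```go
package main

import (
	"fmt"
	"math/bits"
)

// formatPow2 formats u in a power-of-two base (2, 4, 8, 16 or 32) using
// shifts and masks instead of division.
func formatPow2(u uint64, base int) string {
	const digits = "0123456789abcdefghijklmnopqrstuvwxyz"
	if base < 2 || base > 32 || base&(base-1) != 0 {
		panic("base must be a power of two between 2 and 32")
	}
	// The largest real shift is 5 (base 32). The & 31 bounds the shift
	// count below 32, so no oversized-shift handling is required.
	shift := uint(bits.TrailingZeros(uint(base))) & 31
	mask := uint64(base - 1)
	var buf [64]byte // base 2 needs at most 64 digits
	i := len(buf)
	for u >= uint64(base) {
		i--
		buf[i] = digits[u&mask]
		u >>= shift
	}
	i--
	buf[i] = digits[u]
	return string(buf[i:])
}

func main() {
	fmt.Println(formatPow2(255, 16), formatPow2(5, 2))
}
```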
      
      Change-Id: I0c4c9f0faafa065fce3c10fd328830deb92f9e38
      Reviewed-on: https://go-review.googlesource.com/c/111735
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: avoid string allocations when map key is struct or array literal · a0f57c3f
      Martin Möhrmann authored
      x = m[string(byteslice)] is already optimized by the compiler to avoid a
      string allocation. This CL generalizes this optimization to:
      
      x = m[T1{ ... Tn{..., string(byteslice), ...} ... }]
      where T1 through Tn are nested struct and array literals.
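
      A sketch of the lookup shape this generalization covers, assuming a hypothetical key struct of two strings: both string(...) conversions sit inside the composite literal used as the map index, so with this CL the compiler may avoid allocating them. Whether allocations are avoided in practice depends on the compiler version; testing.AllocsPerRun can be used to check.

```go
package main

import "fmt"

// key is a hypothetical composite map key made of strings.
type key struct {
	host, path string
}

var routes = map[key]int{
	{"example.com", "/a"}: 1,
	{"example.com", "/b"}: 2,
}

// lookup converts the []byte arguments directly inside the struct
// literal used as the map index, which is the pattern the CL targets.
func lookup(host, path []byte) (int, bool) {
	v, ok := routes[key{string(host), string(path)}]
	return v, ok
}

func main() {
	fmt.Println(lookup([]byte("example.com"), []byte("/b")))
}
```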
      
      Found in a hot code path that used a struct of strings made from []byte
      slices as a map lookup key.
      
      There are no uses of the more generalized optimization in the standard library.
      Passes toolstash -cmp.
      
      name                           old time/op    new time/op    delta
      MapStringConversion/32/simple    21.9ns ± 2%    21.9ns ± 3%      ~     (p=0.995 n=17+20)
      MapStringConversion/32/struct    28.8ns ± 3%    22.0ns ± 2%   -23.80%  (p=0.000 n=20+20)
      MapStringConversion/32/array     28.5ns ± 2%    21.9ns ± 2%   -23.14%  (p=0.000 n=19+16)
      MapStringConversion/64/simple    21.0ns ± 2%    21.1ns ± 3%      ~     (p=0.072 n=19+18)
      MapStringConversion/64/struct    72.4ns ± 3%    21.3ns ± 2%   -70.53%  (p=0.000 n=20+20)
      MapStringConversion/64/array     72.8ns ± 1%    21.0ns ± 2%   -71.13%  (p=0.000 n=17+19)
      
      name                           old allocs/op  new allocs/op  delta
      MapStringConversion/32/simple      0.00           0.00           ~     (all equal)
      MapStringConversion/32/struct      0.00           0.00           ~     (all equal)
      MapStringConversion/32/array       0.00           0.00           ~     (all equal)
      MapStringConversion/64/simple      0.00           0.00           ~     (all equal)
      MapStringConversion/64/struct      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
      MapStringConversion/64/array       1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
      
      Change-Id: I483b4d84d8d74b1025b62c954da9a365e79b7a3a
      Reviewed-on: https://go-review.googlesource.com/c/116275
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: add intrinsics for runtime/internal/math on 386 and amd64 · a1ca4893
      Martin Möhrmann authored
      Add generic, 386-specific and amd64-specific ops and SSA rules for
      multiplication with overflow and for branching based on overflow flags.
      Use these to intrinsify runtime/internal/math.MulUintptr.
      
      On amd64
        mul, overflow := math.MulUintptr(a, b)
        if overflow {
      is lowered to two instructions:
        MULQ SI
        JO 0x10ee35c
      
      There are no codegen tests, because codegen tests cannot currently test
      unexported internal runtime functions.
      
      amd64:
      name              old time/op  new time/op  delta
      MulUintptr/small  1.16ns ± 5%  0.88ns ± 6%  -24.36%  (p=0.000 n=19+20)
      MulUintptr/large  10.7ns ± 1%   1.1ns ± 1%  -89.28%  (p=0.000 n=17+19)
      
      Change-Id: If60739a86f820e5044d677276c21df90d3c7a86a
      Reviewed-on: https://go-review.googlesource.com/c/141820
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: avoid implicit bounds checks after explicit checks for append · 9f66b41b
      Martin Möhrmann authored
      The generated code for the append builtin already checks whether the
      slice being appended to has enough capacity, and calls growslice if it
      does not. Trust that this guarantees sufficient capacity and omit the
      implicit bounds check when reslicing the slice to its new length.
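
      The reasoning can be illustrated with a hand-written equivalent of append for []int. This is a sketch, not the compiler's actual lowering, and the growth policy (2*cap+1) is an arbitrary assumption; the point is that once the capacity check (or the grow step) has run, the following reslice to the new length can never be out of range.

```go
package main

import "fmt"

// appendInt mimics append(s, v) for []int.
func appendInt(s []int, v int) []int {
	n := len(s) + 1
	if n > cap(s) {
		// Not enough capacity: allocate a larger backing array.
		// growslice plays this role for the real append.
		grown := make([]int, len(s), 2*cap(s)+1)
		copy(grown, s)
		s = grown
	}
	// n <= cap(s) holds on both paths above, so this reslice cannot
	// fail. This is the implicit bounds check the CL removes.
	s = s[:n]
	s[n-1] = v
	return s
}

func main() {
	var s []int
	for i := 0; i < 5; i++ {
		s = appendInt(s, i)
	}
	fmt.Println(s, len(s), cap(s))
}
```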
      
      Removes 365 panicslice calls (-14%) from the go binary, which
      reduces the binary size by about 12KB.
      
      Change-Id: I1b88418675ff409bc0b956853c9e95241274d5a6
      Reviewed-on: https://go-review.googlesource.com/c/119315
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime/internal/math: add multiplication with overflow check · c9130cae
      Martin Möhrmann authored
      This CL adds a new internal math package for use by the runtime.
      The new package exports a MulUintptr function that takes uintptr
      arguments a and b and returns uintptr(a*b) together with whether the
      full-width product a*b overflows the uintptr value range (uintptr(a*b) != a*b).
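
      A minimal sketch of such a function, assuming the contract described above; mulUintptr is a local, hypothetical name, and the real runtime/internal/math code may differ in detail. It uses a cheap fast path when both operands fit in half a word, and otherwise a division-based overflow test.

```go
package main

import (
	"fmt"
	"unsafe"
)

// maxUintptr is the largest uintptr value.
const maxUintptr = ^uintptr(0)

// halfBits is half the width of uintptr in bits (32 on 64-bit targets).
const halfBits = 4 * unsafe.Sizeof(uintptr(0))

// mulUintptr returns a*b truncated to uintptr and reports whether the
// full-width product overflowed, i.e. whether uintptr(a*b) != a*b.
func mulUintptr(a, b uintptr) (uintptr, bool) {
	// Fast path: if both operands fit in half a word, the product fits.
	if a|b < 1<<halfBits || a == 0 {
		return a * b, false
	}
	// Otherwise a*b overflows exactly when b > maxUintptr/a.
	return a * b, b > maxUintptr/a
}

func main() {
	fmt.Println(mulUintptr(1<<10, 1<<10))
	fmt.Println(mulUintptr(maxUintptr, 2))
}
```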
      
      Uses of MulUintptr in the runtime, and intrinsics for performance,
      will be added in follow-up CLs.
      
      Updates #21588
      
      Change-Id: Ia5a02eeabc955249118e4edf68c67d9fc0858058
      Reviewed-on: https://go-review.googlesource.com/c/91755
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: check order temp has correct type · 240a30da
      Keith Randall authored
      Follow-on from CL 140306.
      
      Change-Id: Ic71033d2301105b15b60645d895a076107f44a2e
      Reviewed-on: https://go-review.googlesource.com/c/142178
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • test/codegen: test ppc64 TrailingZeros, OnesCount codegen · 7c96d87e
      Alberto Donizetti authored
      This change adds codegen tests for the intrinsification on ppc64 of
      the OnesCount{64,32,16,8}, and TrailingZeros{64,32,16,8} math/bits
      functions.
      
      Change-Id: Id3364921fbd18316850e15c8c71330c906187fdb
      Reviewed-on: https://go-review.googlesource.com/c/141897
      Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
    • cmd/compile: fuse before branchelim · a55f3ee4
      Josh Bleecher Snyder authored
      The branchelim pass works better after fuse.
      Running fuse before branchelim also increases
      the stability of generated code amidst other compiler changes,
      which was the original motivation behind this change.
      
      The fuse pass is not cheap enough to run in its entirety
      before branchelim, but the most important half of it is.
      This change makes it possible to run "plain fuse" independently
      and does so before branchelim.
      
      During make.bash, elimIf occurrences increase from 4244 to 4288 (1%),
      and elimIfElse occurrences increase from 989 to 1079 (9%).
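
      For orientation, the two counts refer to different branch shapes. Based on the pass names (an assumption, not stated in this CL), elimIf targets a one-way branch (an if without else) and elimIfElse a two-way branch (if/else); both can become a conditional move instead of a jump. The Go functions below are hypothetical examples of each shape; whether they are actually compiled without branches depends on the architecture and compiler version.

```go
package main

import "fmt"

// clampLow contains a one-way branch that conditionally overwrites x.
func clampLow(x, lo int) int {
	if x < lo {
		x = lo
	}
	return x
}

// pick contains a two-way branch selecting between two values.
func pick(cond bool, a, b int) int {
	var r int
	if cond {
		r = a
	} else {
		r = b
	}
	return r
}

func main() {
	fmt.Println(clampLow(3, 5), pick(false, 1, 2))
}
```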
      
      Toolspeed impact is marginal; plain fuse pays for itself.
      
      name        old time/op       new time/op       delta
      Template          189ms ± 2%        189ms ± 2%    ~     (p=0.890 n=45+46)
      Unicode          93.2ms ± 5%       93.4ms ± 7%    ~     (p=0.790 n=48+48)
      GoTypes           662ms ± 4%        660ms ± 4%    ~     (p=0.186 n=48+49)
      Compiler          2.89s ± 4%        2.91s ± 3%  +0.89%  (p=0.050 n=49+44)
      SSA               8.23s ± 2%        8.21s ± 1%    ~     (p=0.165 n=46+44)
      Flate             123ms ± 4%        123ms ± 3%  +0.58%  (p=0.031 n=47+49)
      GoParser          154ms ± 4%        154ms ± 4%    ~     (p=0.492 n=49+48)
      Reflect           430ms ± 4%        429ms ± 4%    ~     (p=1.000 n=48+48)
      Tar               171ms ± 3%        170ms ± 4%    ~     (p=0.122 n=48+48)
      XML               232ms ± 3%        232ms ± 2%    ~     (p=0.850 n=46+49)
      [Geo mean]        394ms             394ms       +0.02%
      
      name        old user-time/op  new user-time/op  delta
      Template          236ms ± 5%        236ms ± 4%    ~     (p=0.934 n=50+50)
      Unicode           132ms ± 7%        130ms ± 9%    ~     (p=0.087 n=50+50)
      GoTypes           861ms ± 3%        867ms ± 4%    ~     (p=0.124 n=48+50)
      Compiler          3.93s ± 4%        3.94s ± 3%    ~     (p=0.584 n=49+44)
      SSA               12.2s ± 2%        12.3s ± 1%    ~     (p=0.610 n=46+45)
      Flate             149ms ± 4%        150ms ± 4%    ~     (p=0.194 n=48+49)
      GoParser          193ms ± 5%        191ms ± 6%    ~     (p=0.239 n=49+50)
      Reflect           553ms ± 5%        556ms ± 5%    ~     (p=0.091 n=49+49)
      Tar               218ms ± 5%        218ms ± 5%    ~     (p=0.359 n=49+50)
      XML               299ms ± 5%        298ms ± 4%    ~     (p=0.482 n=50+49)
      [Geo mean]        516ms             516ms       -0.01%
      
      name        old alloc/op      new alloc/op      delta
      Template         36.3MB ± 0%       36.3MB ± 0%  -0.02%  (p=0.000 n=49+49)
      Unicode          29.7MB ± 0%       29.7MB ± 0%    ~     (p=0.270 n=50+50)
      GoTypes           126MB ± 0%        126MB ± 0%  -0.34%  (p=0.000 n=50+49)
      Compiler          534MB ± 0%        531MB ± 0%  -0.50%  (p=0.000 n=50+50)
      SSA              1.98GB ± 0%       1.98GB ± 0%  -0.06%  (p=0.000 n=49+49)
      Flate            24.6MB ± 0%       24.6MB ± 0%  -0.29%  (p=0.000 n=50+50)
      GoParser         29.5MB ± 0%       29.4MB ± 0%  -0.15%  (p=0.000 n=49+50)
      Reflect          87.3MB ± 0%       87.2MB ± 0%  -0.13%  (p=0.000 n=49+50)
      Tar              35.6MB ± 0%       35.5MB ± 0%  -0.17%  (p=0.000 n=50+50)
      XML              48.2MB ± 0%       48.0MB ± 0%  -0.30%  (p=0.000 n=48+50)
      [Geo mean]       83.1MB            82.9MB       -0.20%
      
      name        old allocs/op     new allocs/op     delta
      Template           352k ± 0%         352k ± 0%  -0.01%  (p=0.004 n=49+49)
      Unicode            341k ± 0%         341k ± 0%    ~     (p=0.341 n=48+50)
      GoTypes           1.28M ± 0%        1.28M ± 0%  -0.03%  (p=0.000 n=50+49)
      Compiler          4.96M ± 0%        4.96M ± 0%  -0.05%  (p=0.000 n=50+49)
      SSA               15.5M ± 0%        15.5M ± 0%  -0.01%  (p=0.000 n=50+49)
      Flate              233k ± 0%         233k ± 0%  +0.01%  (p=0.032 n=49+49)
      GoParser           294k ± 0%         294k ± 0%    ~     (p=0.052 n=46+48)
      Reflect           1.04M ± 0%        1.04M ± 0%    ~     (p=0.171 n=50+47)
      Tar                343k ± 0%         343k ± 0%  -0.03%  (p=0.000 n=50+50)
      XML                429k ± 0%         429k ± 0%  -0.04%  (p=0.000 n=50+50)
      [Geo mean]         812k              812k       -0.02%
      
      Object files grow slightly; branchelim often increases binary size, at least on amd64.
      
      name        old object-bytes  new object-bytes  delta
      Template          509kB ± 0%        509kB ± 0%  -0.01%  (p=0.008 n=5+5)
      Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
      GoTypes          1.84MB ± 0%       1.84MB ± 0%  +0.00%  (p=0.008 n=5+5)
      Compiler         6.71MB ± 0%       6.71MB ± 0%  +0.01%  (p=0.008 n=5+5)
      SSA              21.2MB ± 0%       21.2MB ± 0%  +0.01%  (p=0.008 n=5+5)
      Flate             324kB ± 0%        324kB ± 0%  -0.00%  (p=0.008 n=5+5)
      GoParser          404kB ± 0%        404kB ± 0%  -0.02%  (p=0.008 n=5+5)
      Reflect          1.40MB ± 0%       1.40MB ± 0%  +0.09%  (p=0.008 n=5+5)
      Tar               452kB ± 0%        452kB ± 0%  +0.06%  (p=0.008 n=5+5)
      XML               596kB ± 0%        596kB ± 0%  +0.00%  (p=0.008 n=5+5)
      [Geo mean]       1.04MB            1.04MB       +0.01%
      
      Change-Id: I535c711b85380ff657fc0f022bebd9cb14ddd07f
      Reviewed-on: https://go-review.googlesource.com/c/129378
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: provide types for all order-allocated temporaries · 63e964e1
      Keith Randall authored
      Ensure that we correctly type the stack temps for regular closures,
      method function closures, and slice literals.
      
      Then we don't need to override the dummy types later.
      Furthermore, this allows order to reuse temporaries of these types.
      
      OARRAYLIT doesn't need a temporary as far as I can tell, so I
      removed that case from order.
      
      Change-Id: Ic58520fa50c90639393ff78f33d3c831d5c4acb9
      Reviewed-on: https://go-review.googlesource.com/c/140306
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: fix gdb stepping test · 296b7aea
      Keith Randall authored
      Not sure why this changed behavior, but it seems mostly harmless.
      
      Fixes #28198
      
      Change-Id: Ie25c6e1fcb64912a582c7ae7bf92c4c1642e83cb
      Reviewed-on: https://go-review.googlesource.com/c/141649
      Reviewed-by: David Chase <drchase@google.com>
    • test/codegen: add tests of FMA for arm/arm64 · 93e27e01
      Ben Shi authored
      This CL adds tests of fused multiplication-accumulation
      on arm/arm64.
      
      Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39
      Reviewed-on: https://go-review.googlesource.com/c/141652
      Run-TryBot: Ben Shi <powerman1st@163.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • internal/cpu: expose ARM feature flags for FMA · bb3bf5bb
      Akhil Indurti authored
      This change exposes feature flags needed to implement an FMA intrinsic
      on ARM CPUs via auxv's HWCAP bits. Specifically, it exposes HasVFPv4 to
      detect whether an ARM processor has version 4 of the vector floating
      point unit (VFPv4). The relevant instruction for this CL is VFMA, emitted
      in Go as FMULAD.
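
      On Linux, such flags come from the AT_HWCAP entry of the auxiliary vector. A minimal sketch of the bit test, assuming the Linux/ARM value HWCAP_VFPv4 = 1<<16 from the kernel's asm/hwcap.h; hasVFPv4 is a hypothetical helper, not the internal/cpu API, and the sample HWCAP value is hand-picked.

```go
package main

import "fmt"

// hwcapVFPv4 is the Linux/ARM HWCAP bit for VFPv4 (HWCAP_VFPv4).
const hwcapVFPv4 = 1 << 16

// hasVFPv4 reports whether the VFPv4 bit is set in an AT_HWCAP value.
func hasVFPv4(hwcap uint) bool {
	return hwcap&hwcapVFPv4 != 0
}

func main() {
	// Sample value with VFP (1<<6), NEON (1<<12) and VFPv4 (1<<16) set.
	fmt.Println(hasVFPv4(1<<6 | 1<<12 | 1<<16))
}
```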
      
      Updates #26630.
      
      Change-Id: Ibbc04fb24c2b4d994f93762360f1a37bc6d83ff7
      Reviewed-on: https://go-review.googlesource.com/c/126315
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Martin Möhrmann <moehrmann@google.com>
    • cmd/compile: simplify as2 method of *Order · d6e80069
      Martin Möhrmann authored
      Merge the two for loops that set up the node lists for
      temporaries into one for loop.
      
      Passes toolstash -cmp
      
      Change-Id: Ibc739115f38c8869b0dcfbf9819fdc2fc96962e0
      Reviewed-on: https://go-review.googlesource.com/c/141819
      Reviewed-by: Keith Randall <khr@golang.org>
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/cgo: simplify switch statement to if statement · 9322b533
      avsharapov authored
      Change-Id: Ie7dce45d554fde69d682680f55abba6a7fc55036
      Reviewed-on: https://go-review.googlesource.com/c/142017
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • pprof: replace bits = bits + "..." with bits += "..." where bits is a string. · e47c11d8
      Ivan Sharavuev authored
      Change-Id: Ic77ebbdf2670b7fdf2c381cd1ba768624b07e57c
      Reviewed-on: https://go-review.googlesource.com/c/141998
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • src/cmd/compile/internal/ssa: replace `s = s + x` with `s += x`. · 85066acc
      OlgaVlPetrova authored
      Change-Id: I1f399a8a0aa200bfda01f97f920b1345e59956ba
      Reviewed-on: https://go-review.googlesource.com/c/142057
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • test/codegen: add tests for multiplication-subtraction · c3208842
      Ben Shi authored
      This CL adds tests for armv7's MULS and arm64's MSUBW.
      
      Change-Id: Id0fd5d26fd477e4ed14389b0d33cad930423eb5b
      Reviewed-on: https://go-review.googlesource.com/c/141651
      Run-TryBot: Ben Shi <powerman1st@163.com>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  3. 14 Oct, 2018 3 commits
    • cmd/compile: reuse temporaries in order pass · 389e9427
      Keith Randall authored
      Instead of allocating a new temporary each time one
      is needed, keep a list of temporaries which are free
      (have already been VARKILLed on every path) and use
      one of them.
      
      This should save a lot of stack space. In a function like this:
      
      func main() {
           fmt.Printf("%d %d\n", 2, 3)
           fmt.Printf("%d %d\n", 4, 5)
           fmt.Printf("%d %d\n", 6, 7)
      }
      
      The three [2]interface{} arrays used to hold the ... args
      all use the same autotmp, instead of 3 different autotmps
      as was the case before this CL.
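
      The free-list idea can be modeled in a few lines. This is a hypothetical sketch (tempPool, get and release are made-up names, and types are represented as strings), not the order pass's data structure: a temporary goes back on the list for its type once it is dead on every path, and the next request for that type reuses it.

```go
package main

import "fmt"

// tempPool hands out stack temporaries by type, reusing free ones.
type tempPool struct {
	free map[string][]int // type name -> IDs of free temporaries
	next int              // next fresh temporary ID
}

func newTempPool() *tempPool {
	return &tempPool{free: make(map[string][]int)}
}

// get returns a free temporary of type typ, or allocates a new one.
func (p *tempPool) get(typ string) int {
	if ids := p.free[typ]; len(ids) > 0 {
		id := ids[len(ids)-1]
		p.free[typ] = ids[:len(ids)-1]
		return id
	}
	id := p.next
	p.next++
	return id
}

// release marks a temporary as free once it is dead on every path
// (in the compiler, after it has been VARKILLed).
func (p *tempPool) release(typ string, id int) {
	p.free[typ] = append(p.free[typ], id)
}

func main() {
	p := newTempPool()
	// Three Printf calls, each needing a [2]interface{} for its ... args.
	for i := 0; i < 3; i++ {
		t := p.get("[2]interface{}")
		fmt.Println("call", i, "uses autotmp", t)
		p.release("[2]interface{}", t)
	}
}
```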
      
      Change-Id: I2d728e226f81e05ae68ca8247af62014a1b032d3
      Reviewed-on: https://go-review.googlesource.com/c/140301
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • runtime,cmd/compile: pass strings and slices to convT2{E,I} by value · 0e9f8a21
      Keith Randall authored
      When we pass these types by reference, we usually have to allocate
      temporaries on the stack, initialize them, then pass their address
      to the conversion functions. It's simpler to pass these types
      directly by value.
      
      This particularly applies to conversions needed for fmt.Printf
      (to interface{} for constructing a [...]interface{}).
      
      func f(a, b, c string) {
           fmt.Printf("%s %s\n", a, b)
           fmt.Printf("%s %s\n", b, c)
      }
      
      This function's stack frame shrinks from 200 to 136 bytes, and
      its code shrinks from 535 to 453 bytes.
      
      The go binary shrinks 0.3%.
      
      Update #24286
      
      Aside: for this function f, we don't really need to allocate
      temporaries for the convT2E function. We could use the address
      of a, b, and c directly. That might get similar (or maybe better?)
      improvements. I investigated a bit, but it seemed complicated
      to do it safely. This change was much easier.
      
      Change-Id: I78cbe51b501fb41e1e324ce4203f0de56a1db82d
      Reviewed-on: https://go-review.googlesource.com/c/135377
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: optimize loads from readonly globals into constants · 653a4bd8
      Keith Randall authored
      Instead of
         MOVB go.string."foo"(SB), AX
      do
         MOVB $102, AX
      
      When we know the global we're loading from is readonly, we can
      do that read at compile time.
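
      Conceptually, the fold is a byte read from data already known at compile time. A hypothetical sketch, assuming a table of read-only symbols and little-endian byte order (true of amd64, 386, arm and arm64); foldLoad and readonlyData are illustrative names, not compiler APIs.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readonlyData stands in for read-only symbols known at compile time.
var readonlyData = map[string][]byte{
	`go.string."foo"`: []byte("foo"),
}

// foldLoad returns the constant that a size-byte little-endian load at
// offset off from a read-only symbol would produce, and whether the
// load can be folded at all.
func foldLoad(sym string, off, size int) (uint64, bool) {
	data, ok := readonlyData[sym]
	if !ok || off < 0 || off+size > len(data) {
		return 0, false
	}
	b := data[off : off+size]
	switch size {
	case 1:
		return uint64(b[0]), true
	case 2:
		return uint64(binary.LittleEndian.Uint16(b)), true
	case 4:
		return uint64(binary.LittleEndian.Uint32(b)), true
	case 8:
		return binary.LittleEndian.Uint64(b), true
	}
	return 0, false
}

func main() {
	// MOVB go.string."foo"(SB), AX can become MOVB $102, AX.
	fmt.Println(foldLoad(`go.string."foo"`, 0, 1))
}
```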
      
      I've made this arch-dependent mostly because the cases where this
      happens often are memory->memory moves, and those don't get
      decomposed until lowering.
      
      This CL covers amd64/386/arm/arm64. Other architectures could follow.
      
      Update #26498
      
      Change-Id: I41b1dc831b2cd0a52dac9b97f4f4457888a46389
      Reviewed-on: https://go-review.googlesource.com/c/141118
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
  4. 13 Oct, 2018 4 commits
  5. 12 Oct, 2018 12 commits
    • cmd/compile: remove ineffectual -i flag · b4150f76
      Matthew Dempsky authored
      This flag lost its usefulness in CL 34273.
      
      Change-Id: I033c29f105937139b4e359a340906be439f1ed07
      Reviewed-on: https://go-review.googlesource.com/c/141646
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • doc: fix spelling of `comp[]hensive` to `comp[r]ehensive` · 57456527
      Mak Kolybabi authored
      Change-Id: Idd93e45fab30e7496105b84fc2fce1884711b580
      GitHub-Last-Rev: 43aa04e876655e31fc1c4b2b5ae0702472e49102
      GitHub-Pull-Request: golang/go#27983
      Reviewed-on: https://go-review.googlesource.com/c/141645
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • regexp: add partial Deprecation comment to Copy · bf68744a
      Russ Cox authored
      Change-Id: I21b7817e604a48330f1ee250f7b1b2adc1f16067
      Reviewed-on: https://go-review.googlesource.com/c/139784
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • regexp: add DeepEqual test · 5160e0d1
      Russ Cox authored
      This locks in behavior we accidentally broke
      and then restored during the Go 1.11 cycle.
      See #26219.
      
      It also locks in new behavior that DeepEqual
      always works, instead of only usually working.
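
      The property the test locks in can be checked with a short program, assuming a Go release that includes this series of CLs; deepEqualAfterMatch is a hypothetical helper. Two Regexps compiled from the same pattern should stay deeply equal even after one of them has been used for matching.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// deepEqualAfterMatch compiles pattern twice, runs a match on only the
// first copy, and reports whether the two are still deeply equal.
func deepEqualAfterMatch(pattern, input string) bool {
	re1 := regexp.MustCompile(pattern)
	re2 := regexp.MustCompile(pattern)
	re1.MatchString(input) // previously could leave cached state in re1
	return reflect.DeepEqual(re1, re2)
}

func main() {
	fmt.Println(deepEqualAfterMatch("a.*b.*c.*d", "abcdefghijklmn"))
}
```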
      
      This CL is the final piece of a series of CLs to make
      DeepEqual always work, by eliminating the machine
      cache and making other related optimizations.
      Overall, this whole sequence of CLs achieves:
      
      name                             old time/op    new time/op    delta
      Find-12                             264ns ± 3%     260ns ± 0%   -1.59%  (p=0.000 n=10+9)
      FindAllNoMatches-12                 140ns ± 2%     133ns ± 0%   -5.34%  (p=0.000 n=10+7)
      FindString-12                       256ns ± 0%     249ns ± 0%   -2.73%  (p=0.000 n=8+8)
      FindSubmatch-12                     339ns ± 1%     333ns ± 1%   -1.73%  (p=0.000 n=9+10)
      FindStringSubmatch-12               322ns ± 0%     322ns ± 1%     ~     (p=0.450 n=8+10)
      Literal-12                          100ns ± 2%      92ns ± 0%   -8.13%  (p=0.000 n=10+10)
      NotLiteral-12                      1.50µs ± 0%    1.47µs ± 0%   -1.65%  (p=0.000 n=8+8)
      MatchClass-12                      2.18µs ± 0%    2.15µs ± 0%   -1.05%  (p=0.000 n=10+9)
      MatchClass_InRange-12              2.12µs ± 0%    2.11µs ± 0%   -0.65%  (p=0.000 n=10+9)
      ReplaceAll-12                      1.41µs ± 0%    1.41µs ± 0%     ~     (p=0.254 n=7+10)
      AnchoredLiteralShortNonMatch-12    89.8ns ± 0%    81.5ns ± 0%   -9.22%  (p=0.000 n=8+9)
      AnchoredLiteralLongNonMatch-12      105ns ± 3%      97ns ± 0%   -7.21%  (p=0.000 n=10+10)
      AnchoredShortMatch-12               141ns ± 0%     128ns ± 0%   -9.22%  (p=0.000 n=9+9)
      AnchoredLongMatch-12                276ns ± 4%     253ns ± 2%   -8.23%  (p=0.000 n=10+10)
      OnePassShortA-12                    620ns ± 0%     587ns ± 0%   -5.26%  (p=0.000 n=10+6)
      NotOnePassShortA-12                 575ns ± 3%     547ns ± 1%   -4.77%  (p=0.000 n=10+10)
      OnePassShortB-12                    493ns ± 0%     455ns ± 0%   -7.62%  (p=0.000 n=8+9)
      NotOnePassShortB-12                 423ns ± 0%     406ns ± 1%   -3.95%  (p=0.000 n=8+10)
      OnePassLongPrefix-12                112ns ± 0%     109ns ± 1%   -2.77%  (p=0.000 n=9+10)
      OnePassLongNotPrefix-12             405ns ± 0%     349ns ± 0%  -13.74%  (p=0.000 n=8+9)
      MatchParallelShared-12              501ns ± 1%      38ns ± 2%  -92.42%  (p=0.000 n=10+10)
      MatchParallelCopied-12             39.1ns ± 0%    38.6ns ± 1%   -1.38%  (p=0.002 n=6+10)
      QuoteMetaAll-12                    94.6ns ± 0%    94.8ns ± 0%   +0.26%  (p=0.001 n=10+9)
      QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
      Match/Easy0/32-12                  79.1ns ± 0%    72.0ns ± 0%   -8.95%  (p=0.000 n=9+9)
      Match/Easy0/1K-12                   307ns ± 1%     297ns ± 0%   -3.32%  (p=0.000 n=10+7)
      Match/Easy0/32K-12                 4.65µs ± 2%    4.67µs ± 1%     ~     (p=0.633 n=10+8)
      Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.684 n=10+10)
      Match/Easy0/32M-12                 7.98ms ± 1%    7.96ms ± 0%   -0.31%  (p=0.014 n=9+9)
      Match/Easy0i/32-12                 1.13µs ± 1%    1.10µs ± 0%   -3.18%  (p=0.000 n=9+10)
      Match/Easy0i/1K-12                 32.5µs ± 0%    31.7µs ± 0%   -2.61%  (p=0.000 n=9+9)
      Match/Easy0i/32K-12                1.59ms ± 0%    1.26ms ± 0%  -20.71%  (p=0.000 n=9+7)
      Match/Easy0i/1M-12                 51.0ms ± 0%    40.4ms ± 0%  -20.68%  (p=0.000 n=10+7)
      Match/Easy0i/32M-12                 1.63s ± 0%     1.30s ± 0%  -20.62%  (p=0.001 n=7+7)
      Match/Easy1/32-12                  75.1ns ± 1%    67.4ns ± 0%  -10.24%  (p=0.000 n=8+10)
      Match/Easy1/1K-12                   861ns ± 0%     879ns ± 0%   +2.18%  (p=0.000 n=8+8)
      Match/Easy1/32K-12                 39.2µs ± 1%    34.1µs ± 0%  -13.01%  (p=0.000 n=10+8)
      Match/Easy1/1M-12                  1.38ms ± 0%    1.17ms ± 0%  -15.06%  (p=0.000 n=10+8)
      Match/Easy1/32M-12                 44.2ms ± 1%    37.5ms ± 0%  -15.15%  (p=0.000 n=10+9)
      Match/Medium/32-12                 1.04µs ± 1%    1.03µs ± 0%   -0.64%  (p=0.002 n=9+8)
      Match/Medium/1K-12                 31.3µs ± 0%    31.2µs ± 0%   -0.36%  (p=0.000 n=9+9)
      Match/Medium/32K-12                1.44ms ± 0%    1.20ms ± 0%  -17.02%  (p=0.000 n=8+7)
      Match/Medium/1M-12                 46.1ms ± 0%    38.2ms ± 0%  -17.14%  (p=0.001 n=6+8)
      Match/Medium/32M-12                 1.48s ± 0%     1.23s ± 0%  -17.10%  (p=0.000 n=9+7)
      Match/Hard/32-12                   1.54µs ± 1%    1.47µs ± 0%   -4.64%  (p=0.000 n=9+10)
      Match/Hard/1K-12                   46.4µs ± 1%    44.4µs ± 0%   -4.35%  (p=0.000 n=9+8)
      Match/Hard/32K-12                  2.19ms ± 0%    1.78ms ± 7%  -18.74%  (p=0.000 n=8+10)
      Match/Hard/1M-12                   70.1ms ± 0%    57.7ms ± 7%  -17.62%  (p=0.000 n=8+10)
      Match/Hard/32M-12                   2.24s ± 0%     1.84s ± 8%  -17.92%  (p=0.000 n=8+10)
      Match/Hard1/32-12                  8.17µs ± 1%    7.95µs ± 0%   -2.72%  (p=0.000 n=8+10)
      Match/Hard1/1K-12                   254µs ± 2%     245µs ± 0%   -3.62%  (p=0.000 n=9+10)
      Match/Hard1/32K-12                 9.58ms ± 1%    8.54ms ± 7%  -10.87%  (p=0.000 n=10+10)
      Match/Hard1/1M-12                   306ms ± 1%     271ms ± 8%  -11.42%  (p=0.000 n=9+10)
      Match/Hard1/32M-12                  9.79s ± 1%     8.58s ± 9%  -12.37%  (p=0.000 n=9+10)
      Match_onepass_regex/32-12           808ns ± 0%     716ns ± 1%  -11.39%  (p=0.000 n=8+9)
      Match_onepass_regex/1K-12          27.8µs ± 0%    19.9µs ± 2%  -28.51%  (p=0.000 n=8+9)
      Match_onepass_regex/32K-12          925µs ± 0%     631µs ± 2%  -31.71%  (p=0.000 n=9+9)
      Match_onepass_regex/1M-12          29.5ms ± 0%    20.2ms ± 2%  -31.53%  (p=0.000 n=10+9)
      Match_onepass_regex/32M-12          945ms ± 0%     648ms ± 2%  -31.39%  (p=0.000 n=9+9)
      CompileOnepass-12                  4.67µs ± 0%    4.60µs ± 0%   -1.48%  (p=0.000 n=10+10)
      [Geo mean]                         24.5µs         21.4µs       -12.94%
      
      https://perf.golang.org/search?q=upload:20181004.5
      
      Change-Id: Icb17b306830dc5489efbb55900937b94ce0eb047
      Reviewed-on: https://go-review.googlesource.com/c/139783
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • regexp: evaluate context flags lazily · 3ca1f28e
      Russ Cox authored
      There's no point in computing whether we're at the
      beginning of the line if the NFA isn't going to ask.
      Wait to compute that until asked.
      
      This change also recovers any minor slowdowns introduced by
      the conversion to pools that other optimizations had not
      already repaid.
      
      name                             old time/op    new time/op    delta
      Find-12                             252ns ± 0%     260ns ± 0%   +3.34%  (p=0.000 n=10+8)
      FindAllNoMatches-12                 136ns ± 4%     134ns ± 4%   -0.96%  (p=0.033 n=10+10)
      FindString-12                       246ns ± 0%     250ns ± 0%   +1.46%  (p=0.000 n=8+10)
      FindSubmatch-12                     332ns ± 1%     332ns ± 0%     ~     (p=0.101 n=9+10)
      FindStringSubmatch-12               321ns ± 1%     322ns ± 1%     ~     (p=0.717 n=9+10)
      Literal-12                         91.6ns ± 0%    92.3ns ± 0%   +0.74%  (p=0.000 n=9+9)
      NotLiteral-12                      1.47µs ± 0%    1.47µs ± 0%   +0.38%  (p=0.000 n=9+8)
      MatchClass-12                      2.15µs ± 0%    2.15µs ± 0%   +0.39%  (p=0.000 n=10+10)
      MatchClass_InRange-12              2.09µs ± 0%    2.11µs ± 0%   +0.75%  (p=0.000 n=9+9)
      ReplaceAll-12                      1.40µs ± 0%    1.40µs ± 0%     ~     (p=0.525 n=10+10)
      AnchoredLiteralShortNonMatch-12    83.5ns ± 0%    81.6ns ± 0%   -2.28%  (p=0.000 n=9+10)
      AnchoredLiteralLongNonMatch-12      101ns ± 0%      97ns ± 1%   -3.54%  (p=0.000 n=10+10)
      AnchoredShortMatch-12               131ns ± 0%     128ns ± 0%   -2.29%  (p=0.000 n=10+9)
      AnchoredLongMatch-12                268ns ± 1%     252ns ± 1%   -6.04%  (p=0.000 n=10+10)
      OnePassShortA-12                    614ns ± 0%     587ns ± 1%   -4.33%  (p=0.000 n=6+10)
      NotOnePassShortA-12                 552ns ± 0%     547ns ± 1%   -0.89%  (p=0.000 n=10+10)
      OnePassShortB-12                    494ns ± 0%     455ns ± 0%   -7.96%  (p=0.000 n=9+9)
      NotOnePassShortB-12                 411ns ± 0%     406ns ± 0%   -1.30%  (p=0.000 n=9+9)
      OnePassLongPrefix-12                109ns ± 0%     108ns ± 1%     ~     (p=0.064 n=8+9)
      OnePassLongNotPrefix-12             403ns ± 0%     349ns ± 0%  -13.30%  (p=0.000 n=9+8)
      MatchParallelShared-12             38.9ns ± 1%    37.9ns ± 1%   -2.65%  (p=0.000 n=10+8)
      MatchParallelCopied-12             39.2ns ± 1%    38.3ns ± 2%   -2.20%  (p=0.001 n=10+10)
      QuoteMetaAll-12                    94.5ns ± 0%    94.7ns ± 0%   +0.18%  (p=0.043 n=10+9)
      QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
      Match/Easy0/32-12                  72.2ns ± 0%    71.9ns ± 0%   -0.38%  (p=0.009 n=8+10)
      Match/Easy0/1K-12                   296ns ± 1%     297ns ± 0%   +0.51%  (p=0.001 n=10+9)
      Match/Easy0/32K-12                 4.57µs ± 3%    4.61µs ± 2%     ~     (p=0.280 n=10+10)
      Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.986 n=10+10)
      Match/Easy0/32M-12                 7.96ms ± 0%    7.98ms ± 0%   +0.22%  (p=0.010 n=10+9)
      Match/Easy0i/32-12                 1.09µs ± 0%    1.10µs ± 0%   +0.23%  (p=0.000 n=8+9)
      Match/Easy0i/1K-12                 31.7µs ± 0%    31.7µs ± 0%   +0.09%  (p=0.003 n=9+8)
      Match/Easy0i/32K-12                1.61ms ± 0%    1.27ms ± 1%  -21.03%  (p=0.000 n=8+10)
      Match/Easy0i/1M-12                 51.4ms ± 0%    40.4ms ± 0%  -21.29%  (p=0.000 n=8+8)
      Match/Easy0i/32M-12                 1.65s ± 0%     1.30s ± 1%  -21.22%  (p=0.000 n=9+9)
      Match/Easy1/32-12                  67.6ns ± 1%    67.2ns ± 0%     ~     (p=0.085 n=10+9)
      Match/Easy1/1K-12                   873ns ± 2%     880ns ± 0%   +0.78%  (p=0.006 n=9+7)
      Match/Easy1/32K-12                 39.7µs ± 1%    34.3µs ± 3%  -13.53%  (p=0.000 n=10+10)
      Match/Easy1/1M-12                  1.41ms ± 1%    1.19ms ± 3%  -15.48%  (p=0.000 n=10+10)
      Match/Easy1/32M-12                 44.9ms ± 1%    38.0ms ± 2%  -15.21%  (p=0.000 n=10+10)
      Match/Medium/32-12                 1.04µs ± 0%    1.03µs ± 0%   -0.57%  (p=0.000 n=9+9)
      Match/Medium/1K-12                 31.2µs ± 0%    31.4µs ± 1%   +0.61%  (p=0.000 n=8+10)
      Match/Medium/32K-12                1.45ms ± 1%    1.20ms ± 0%  -17.70%  (p=0.000 n=10+8)
      Match/Medium/1M-12                 46.4ms ± 0%    38.4ms ± 2%  -17.32%  (p=0.000 n=6+9)
      Match/Medium/32M-12                 1.49s ± 1%     1.24s ± 1%  -16.81%  (p=0.000 n=10+10)
      Match/Hard/32-12                   1.47µs ± 0%    1.47µs ± 0%   -0.31%  (p=0.000 n=9+10)
      Match/Hard/1K-12                   44.5µs ± 1%    44.4µs ± 0%     ~     (p=0.075 n=10+10)
      Match/Hard/32K-12                  2.09ms ± 0%    1.78ms ± 7%  -14.88%  (p=0.000 n=8+10)
      Match/Hard/1M-12                   67.8ms ± 5%    56.9ms ± 7%  -16.05%  (p=0.000 n=10+10)
      Match/Hard/32M-12                   2.17s ± 5%     1.84s ± 6%  -15.21%  (p=0.000 n=10+10)
      Match/Hard1/32-12                  7.89µs ± 0%    7.94µs ± 0%   +0.61%  (p=0.000 n=9+9)
      Match/Hard1/1K-12                   246µs ± 0%     245µs ± 0%   -0.30%  (p=0.010 n=9+10)
      Match/Hard1/32K-12                 8.93ms ± 0%    8.17ms ± 0%   -8.44%  (p=0.000 n=9+8)
      Match/Hard1/1M-12                   286ms ± 0%     269ms ± 9%   -5.66%  (p=0.028 n=9+10)
      Match/Hard1/32M-12                  9.16s ± 0%     8.61s ± 8%   -5.98%  (p=0.028 n=9+10)
      Match_onepass_regex/32-12           825ns ± 0%     712ns ± 0%  -13.75%  (p=0.000 n=8+8)
      Match_onepass_regex/1K-12          28.7µs ± 1%    19.8µs ± 0%  -30.99%  (p=0.000 n=9+8)
      Match_onepass_regex/32K-12          950µs ± 1%     628µs ± 0%  -33.83%  (p=0.000 n=9+8)
      Match_onepass_regex/1M-12          30.4ms ± 0%    20.1ms ± 0%  -33.74%  (p=0.000 n=9+8)
      Match_onepass_regex/32M-12          974ms ± 1%     646ms ± 0%  -33.73%  (p=0.000 n=9+8)
      CompileOnepass-12                  4.60µs ± 0%    4.59µs ± 0%     ~     (p=0.063 n=8+9)
      [Geo mean]                         23.1µs         21.3µs        -7.44%
      
      https://perf.golang.org/search?q=upload:20181004.4
      
      Change-Id: I47cdd09f6dcde1d7c317080e9b4df42c7d0a8d24
      Reviewed-on: https://go-review.googlesource.com/c/139782
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • regexp: use pools for NFA machines · a376435a
      Russ Cox authored
      Now the machine struct is only used for NFA execution.
      Use global pools to cache machines instead of per-Regexp lists.
      
      Also eliminate some tail calls in NFA execution, to pay for
      the added overhead of sync.Pool.
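A minimal sketch of a global pool replacing per-Regexp lists, assuming a drastically simplified `machine` that holds only capture slots. The names `machine`, `machinePool`, `getMachine`, and `putMachine` are hypothetical. Because `sync.Pool` may drop items at any garbage collection, `getMachine` resets whatever it receives, whether fresh or reused.

```go
package main

import (
	"fmt"
	"sync"
)

// machine stands in for per-match NFA scratch state; the real type has
// many more fields.
type machine struct {
	matchcap []int // capture positions for the current match
}

// machinePool is shared by all patterns instead of each pattern keeping
// its own list of spare machines.
var machinePool = sync.Pool{
	New: func() any { return new(machine) },
}

// getMachine returns a machine whose capture slice has length ncap and is
// reset to -1, reusing pooled storage when it is large enough.
func getMachine(ncap int) *machine {
	m := machinePool.Get().(*machine)
	if cap(m.matchcap) < ncap {
		m.matchcap = make([]int, ncap)
	}
	m.matchcap = m.matchcap[:ncap]
	for i := range m.matchcap {
		m.matchcap[i] = -1
	}
	return m
}

// putMachine returns a machine to the pool so a later match, possibly for
// a different pattern, can reuse its memory.
func putMachine(m *machine) { machinePool.Put(m) }

func main() {
	m := getMachine(4)
	m.matchcap[0], m.matchcap[1] = 0, 3
	putMachine(m)
	m2 := getMachine(2)
	fmt.Println(m2.matchcap) // [-1 -1]
	putMachine(m2)
}
```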
      
      name                             old time/op    new time/op    delta
      Find-12                             252ns ± 0%     252ns ± 0%     ~     (p=1.000 n=10+10)
      FindAllNoMatches-12                 134ns ± 1%     136ns ± 4%     ~     (p=0.443 n=9+10)
      FindString-12                       246ns ± 0%     246ns ± 0%   -0.16%  (p=0.046 n=10+8)
      FindSubmatch-12                     333ns ± 2%     332ns ± 1%     ~     (p=0.489 n=10+9)
      FindStringSubmatch-12               320ns ± 0%     321ns ± 1%   +0.55%  (p=0.005 n=10+9)
      Literal-12                         91.1ns ± 0%    91.6ns ± 0%   +0.55%  (p=0.000 n=10+9)
      NotLiteral-12                      1.45µs ± 0%    1.47µs ± 0%   +0.82%  (p=0.000 n=10+9)
      MatchClass-12                      2.19µs ± 0%    2.15µs ± 0%   -2.01%  (p=0.000 n=9+10)
      MatchClass_InRange-12              2.09µs ± 0%    2.09µs ± 0%     ~     (p=0.082 n=10+9)
      ReplaceAll-12                      1.39µs ± 0%    1.40µs ± 0%   +0.50%  (p=0.000 n=10+10)
      AnchoredLiteralShortNonMatch-12    82.4ns ± 0%    83.5ns ± 0%   +1.36%  (p=0.000 n=8+9)
      AnchoredLiteralLongNonMatch-12      106ns ± 1%     101ns ± 0%   -4.36%  (p=0.000 n=10+10)
      AnchoredShortMatch-12               130ns ± 0%     131ns ± 0%   +0.77%  (p=0.000 n=9+10)
      AnchoredLongMatch-12                272ns ± 0%     268ns ± 1%   -1.46%  (p=0.000 n=8+10)
      OnePassShortA-12                    615ns ± 0%     614ns ± 0%     ~     (p=0.094 n=10+6)
      NotOnePassShortA-12                 549ns ± 0%     552ns ± 0%   +0.52%  (p=0.000 n=9+10)
      OnePassShortB-12                    494ns ± 0%     494ns ± 0%     ~     (p=0.247 n=8+9)
      NotOnePassShortB-12                 412ns ± 1%     411ns ± 0%     ~     (p=0.625 n=10+9)
      OnePassLongPrefix-12                108ns ± 0%     109ns ± 0%   +0.93%  (p=0.000 n=10+8)
      OnePassLongNotPrefix-12             402ns ± 0%     403ns ± 0%   +0.14%  (p=0.041 n=8+9)
      MatchParallelShared-12             38.6ns ± 2%    38.9ns ± 1%     ~     (p=0.172 n=9+10)
      MatchParallelCopied-12             39.4ns ± 7%    39.2ns ± 1%     ~     (p=0.423 n=10+10)
      QuoteMetaAll-12                    94.9ns ± 0%    94.5ns ± 0%   -0.42%  (p=0.000 n=9+10)
      QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
      Match/Easy0/32-12                  72.1ns ± 0%    72.2ns ± 0%     ~     (p=0.435 n=9+8)
      Match/Easy0/1K-12                   298ns ± 0%     296ns ± 1%   -1.01%  (p=0.000 n=8+10)
      Match/Easy0/32K-12                 4.64µs ± 1%    4.57µs ± 3%   -1.39%  (p=0.030 n=10+10)
      Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.971 n=10+10)
      Match/Easy0/32M-12                 7.95ms ± 0%    7.96ms ± 0%     ~     (p=0.278 n=9+10)
      Match/Easy0i/32-12                 1.10µs ± 0%    1.09µs ± 0%   -0.29%  (p=0.000 n=9+8)
      Match/Easy0i/1K-12                 31.8µs ± 1%    31.7µs ± 0%     ~     (p=0.704 n=10+9)
      Match/Easy0i/32K-12                1.62ms ± 1%    1.61ms ± 0%   -1.12%  (p=0.000 n=10+8)
      Match/Easy0i/1M-12                 51.8ms ± 0%    51.4ms ± 0%   -0.84%  (p=0.000 n=8+8)
      Match/Easy0i/32M-12                 1.65s ± 0%     1.65s ± 0%   -0.46%  (p=0.000 n=9+9)
      Match/Easy1/32-12                  67.7ns ± 1%    67.6ns ± 1%     ~     (p=0.723 n=10+10)
      Match/Easy1/1K-12                   873ns ± 0%     873ns ± 2%     ~     (p=0.345 n=10+9)
      Match/Easy1/32K-12                 39.4µs ± 0%    39.7µs ± 1%   +0.66%  (p=0.000 n=10+10)
      Match/Easy1/1M-12                  1.39ms ± 0%    1.41ms ± 1%   +1.10%  (p=0.000 n=10+10)
      Match/Easy1/32M-12                 44.3ms ± 0%    44.9ms ± 1%   +1.18%  (p=0.000 n=10+10)
      Match/Medium/32-12                 1.04µs ± 0%    1.04µs ± 0%   -0.58%  (p=0.000 n=9+9)
      Match/Medium/1K-12                 31.4µs ± 0%    31.2µs ± 0%   -0.62%  (p=0.000 n=8+8)
      Match/Medium/32K-12                1.45ms ± 0%    1.45ms ± 1%     ~     (p=0.356 n=9+10)
      Match/Medium/1M-12                 46.4ms ± 0%    46.4ms ± 0%     ~     (p=0.142 n=8+6)
      Match/Medium/32M-12                 1.49s ± 1%     1.49s ± 1%     ~     (p=0.739 n=10+10)
      Match/Hard/32-12                   1.48µs ± 0%    1.47µs ± 0%   -0.53%  (p=0.000 n=9+9)
      Match/Hard/1K-12                   45.0µs ± 1%    44.5µs ± 1%   -1.06%  (p=0.000 n=10+10)
      Match/Hard/32K-12                  2.24ms ± 0%    2.09ms ± 0%   -6.56%  (p=0.000 n=8+8)
      Match/Hard/1M-12                   71.6ms ± 0%    67.8ms ± 5%   -5.36%  (p=0.000 n=7+10)
      Match/Hard/32M-12                   2.29s ± 0%     2.17s ± 5%   -5.40%  (p=0.000 n=9+10)
      Match/Hard1/32-12                  7.89µs ± 0%    7.89µs ± 0%     ~     (p=0.053 n=9+9)
      Match/Hard1/1K-12                   244µs ± 0%     246µs ± 0%   +0.71%  (p=0.000 n=10+9)
      Match/Hard1/32K-12                 10.3ms ± 0%     8.9ms ± 0%  -13.76%  (p=0.000 n=10+9)
      Match/Hard1/1M-12                   331ms ± 0%     286ms ± 0%  -13.72%  (p=0.000 n=9+9)
      Match/Hard1/32M-12                  10.6s ± 0%      9.2s ± 0%  -13.72%  (p=0.000 n=10+9)
      Match_onepass_regex/32-12           830ns ± 0%     825ns ± 0%   -0.57%  (p=0.000 n=9+8)
      Match_onepass_regex/1K-12          28.7µs ± 1%    28.7µs ± 1%   -0.22%  (p=0.040 n=9+9)
      Match_onepass_regex/32K-12          949µs ± 0%     950µs ± 1%     ~     (p=0.236 n=8+9)
      Match_onepass_regex/1M-12          30.4ms ± 0%    30.4ms ± 0%     ~     (p=0.059 n=8+9)
      Match_onepass_regex/32M-12          973ms ± 0%     974ms ± 1%     ~     (p=0.258 n=9+9)
      CompileOnepass-12                  4.64µs ± 0%    4.60µs ± 0%   -0.90%  (p=0.000 n=10+8)
      [Geo mean]                         23.3µs         23.1µs        -1.16%
      
      https://perf.golang.org/search?q=upload:20181004.3
      
      Change-Id: I46f3d52ce89c8cd992cf554473c27af81fd81bfd
      Reviewed-on: https://go-review.googlesource.com/c/139781
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • regexp: split one-pass execution out of machine struct · 60b29711
      Russ Cox authored
      This allows the one-pass executions to have their
      own pool of (much smaller) allocated structures.
      A step toward eliminating the per-Regexp machine cache.
      
      Not much effect on benchmarks, since there are no
      optimizations here, and pools are a tiny bit slower than a
      locked data structure for single-threaded code.
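For contrast with `sync.Pool`, the "locked data structure" mentioned above can be pictured as a mutex-guarded free list. This is a sketch under that assumption, with hypothetical names (`scratch`, `lockedCache`). One goroutine pays only an uncontended lock per get and put, but goroutines sharing one pattern all contend on the same mutex; `sync.Pool` spreads items across per-processor caches to avoid that single lock.

```go
package main

import (
	"fmt"
	"sync"
)

// scratch is hypothetical per-match state.
type scratch struct{ buf []int }

// lockedCache is a mutex-guarded free list of spare scratch values.
type lockedCache struct {
	mu   sync.Mutex
	free []*scratch
}

// get pops a spare value if one exists, otherwise allocates a new one.
func (c *lockedCache) get() *scratch {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n := len(c.free); n > 0 {
		s := c.free[n-1]
		c.free = c.free[:n-1]
		return s
	}
	return &scratch{}
}

// put pushes a value back onto the free list.
func (c *lockedCache) put(s *scratch) {
	c.mu.Lock()
	c.free = append(c.free, s)
	c.mu.Unlock()
}

func main() {
	var c lockedCache
	s := c.get()
	c.put(s)
	fmt.Println(c.get() == s) // true: the free list hands back the spare
}
```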
      
      name                             old time/op    new time/op    delta
      Find-12                             254ns ± 0%     252ns ± 0%  -0.94%  (p=0.000 n=9+10)
      FindAllNoMatches-12                 135ns ± 0%     134ns ± 1%  -0.49%  (p=0.002 n=9+9)
      FindString-12                       247ns ± 0%     246ns ± 0%  -0.24%  (p=0.003 n=8+10)
      FindSubmatch-12                     334ns ± 0%     333ns ± 2%    ~     (p=0.283 n=10+10)
      FindStringSubmatch-12               321ns ± 0%     320ns ± 0%  -0.51%  (p=0.000 n=9+10)
      Literal-12                         92.2ns ± 0%    91.1ns ± 0%  -1.25%  (p=0.000 n=9+10)
      NotLiteral-12                      1.47µs ± 0%    1.45µs ± 0%  -0.99%  (p=0.000 n=9+10)
      MatchClass-12                      2.17µs ± 0%    2.19µs ± 0%  +0.84%  (p=0.000 n=7+9)
      MatchClass_InRange-12              2.13µs ± 0%    2.09µs ± 0%  -1.70%  (p=0.000 n=10+10)
      ReplaceAll-12                      1.39µs ± 0%    1.39µs ± 0%  +0.51%  (p=0.000 n=10+10)
      AnchoredLiteralShortNonMatch-12    83.2ns ± 0%    82.4ns ± 0%  -0.96%  (p=0.000 n=8+8)
      AnchoredLiteralLongNonMatch-12      105ns ± 0%     106ns ± 1%    ~     (p=0.087 n=10+10)
      AnchoredShortMatch-12               131ns ± 0%     130ns ± 0%  -0.76%  (p=0.000 n=10+9)
      AnchoredLongMatch-12                267ns ± 0%     272ns ± 0%  +2.01%  (p=0.000 n=10+8)
      OnePassShortA-12                    611ns ± 0%     615ns ± 0%  +0.61%  (p=0.000 n=9+10)
      NotOnePassShortA-12                 552ns ± 0%     549ns ± 0%  -0.46%  (p=0.000 n=8+9)
      OnePassShortB-12                    491ns ± 0%     494ns ± 0%  +0.61%  (p=0.000 n=8+8)
      NotOnePassShortB-12                 412ns ± 0%     412ns ± 1%    ~     (p=0.151 n=9+10)
      OnePassLongPrefix-12                112ns ± 0%     108ns ± 0%  -3.57%  (p=0.000 n=10+10)
      OnePassLongNotPrefix-12             410ns ± 0%     402ns ± 0%  -1.95%  (p=0.000 n=9+8)
      MatchParallelShared-12             38.8ns ± 1%    38.6ns ± 2%    ~     (p=0.536 n=10+9)
      MatchParallelCopied-12             39.2ns ± 3%    39.4ns ± 7%    ~     (p=0.986 n=10+10)
      QuoteMetaAll-12                    94.6ns ± 0%    94.9ns ± 0%  +0.29%  (p=0.001 n=8+9)
      QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%    ~     (all equal)
      Match/Easy0/32-12                  72.9ns ± 0%    72.1ns ± 0%  -1.07%  (p=0.000 n=9+9)
      Match/Easy0/1K-12                   298ns ± 0%     298ns ± 0%    ~     (p=0.140 n=6+8)
      Match/Easy0/32K-12                 4.60µs ± 2%    4.64µs ± 1%    ~     (p=0.171 n=10+10)
      Match/Easy0/1M-12                   235µs ± 0%     234µs ± 0%  -0.14%  (p=0.004 n=10+10)
      Match/Easy0/32M-12                 7.96ms ± 0%    7.95ms ± 0%  -0.12%  (p=0.043 n=10+9)
      Match/Easy0i/32-12                 1.09µs ± 0%    1.10µs ± 0%  +0.15%  (p=0.000 n=8+9)
      Match/Easy0i/1K-12                 31.7µs ± 0%    31.8µs ± 1%    ~     (p=0.905 n=9+10)
      Match/Easy0i/32K-12                1.61ms ± 0%    1.62ms ± 1%  +1.12%  (p=0.000 n=9+10)
      Match/Easy0i/1M-12                 51.4ms ± 0%    51.8ms ± 0%  +0.85%  (p=0.000 n=8+8)
      Match/Easy0i/32M-12                 1.65s ± 1%     1.65s ± 0%    ~     (p=0.113 n=9+9)
      Match/Easy1/32-12                  67.9ns ± 0%    67.7ns ± 1%    ~     (p=0.232 n=8+10)
      Match/Easy1/1K-12                   884ns ± 0%     873ns ± 0%  -1.29%  (p=0.000 n=9+10)
      Match/Easy1/32K-12                 39.2µs ± 0%    39.4µs ± 0%  +0.50%  (p=0.000 n=9+10)
      Match/Easy1/1M-12                  1.39ms ± 0%    1.39ms ± 0%  +0.29%  (p=0.000 n=9+10)
      Match/Easy1/32M-12                 44.2ms ± 1%    44.3ms ± 0%  +0.21%  (p=0.029 n=10+10)
      Match/Medium/32-12                 1.05µs ± 0%    1.04µs ± 0%  -0.27%  (p=0.001 n=8+9)
      Match/Medium/1K-12                 31.3µs ± 0%    31.4µs ± 0%  +0.39%  (p=0.000 n=9+8)
      Match/Medium/32K-12                1.45ms ± 0%    1.45ms ± 0%  +0.33%  (p=0.000 n=8+9)
      Match/Medium/1M-12                 46.2ms ± 0%    46.4ms ± 0%  +0.35%  (p=0.000 n=9+8)
      Match/Medium/32M-12                 1.48s ± 0%     1.49s ± 1%  +0.70%  (p=0.000 n=8+10)
      Match/Hard/32-12                   1.49µs ± 0%    1.48µs ± 0%  -0.43%  (p=0.000 n=10+9)
      Match/Hard/1K-12                   45.1µs ± 1%    45.0µs ± 1%    ~     (p=0.393 n=10+10)
      Match/Hard/32K-12                  2.18ms ± 1%    2.24ms ± 0%  +2.71%  (p=0.000 n=9+8)
      Match/Hard/1M-12                   69.7ms ± 1%    71.6ms ± 0%  +2.76%  (p=0.000 n=9+7)
      Match/Hard/32M-12                   2.23s ± 1%     2.29s ± 0%  +2.65%  (p=0.000 n=9+9)
      Match/Hard1/32-12                  7.89µs ± 0%    7.89µs ± 0%    ~     (p=0.286 n=9+9)
      Match/Hard1/1K-12                   244µs ± 0%     244µs ± 0%    ~     (p=0.905 n=9+10)
      Match/Hard1/32K-12                 10.3ms ± 0%    10.3ms ± 0%    ~     (p=0.796 n=10+10)
      Match/Hard1/1M-12                   331ms ± 0%     331ms ± 0%    ~     (p=0.167 n=8+9)
      Match/Hard1/32M-12                  10.6s ± 0%     10.6s ± 0%    ~     (p=0.315 n=8+10)
      Match_onepass_regex/32-12           812ns ± 0%     830ns ± 0%  +2.19%  (p=0.000 n=10+9)
      Match_onepass_regex/1K-12          28.5µs ± 0%    28.7µs ± 1%  +0.97%  (p=0.000 n=10+9)
      Match_onepass_regex/32K-12          936µs ± 0%     949µs ± 0%  +1.43%  (p=0.000 n=10+8)
      Match_onepass_regex/1M-12          30.2ms ± 0%    30.4ms ± 0%  +0.62%  (p=0.000 n=10+8)
      Match_onepass_regex/32M-12          970ms ± 0%     973ms ± 0%  +0.35%  (p=0.000 n=10+9)
      CompileOnepass-12                  4.63µs ± 1%    4.64µs ± 0%    ~     (p=0.060 n=10+10)
      [Geo mean]                         23.3µs         23.3µs       +0.12%
      
      https://perf.golang.org/search?q=upload:20181004.2
      
      Change-Id: Iff9e9f9d4a4698162126a2f300e8ed1b1a39361e
      Reviewed-on: https://go-review.googlesource.com/c/139780
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • regexp: split bit-state execution out of machine struct · 2d4346b3
      Russ Cox authored
      This allows the bit-state executions to have their
      own pool of allocated structures. A step toward
      eliminating the per-Regexp machine cache.
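"Bit-state" refers to a backtracking matcher that records each (instruction, input position) pair it has tried in a bitmap, so no pair is explored twice. The sketch below shows that bitmap drawn from its own pool. It is a simplified illustration with hypothetical names (`bitState`, `newBitState`, `shouldVisit`, `freeBitState`), not the package's code.

```go
package main

import (
	"fmt"
	"sync"
)

// bitState holds the scratch state a bit-state backtracker needs: one bit
// per (pc, pos) pair for an input of length end.
type bitState struct {
	end     int
	visited []uint32
}

var bitStatePool = sync.Pool{New: func() any { return new(bitState) }}

// newBitState fetches a bitState from the pool and clears a bitmap large
// enough for numInst instructions over an input of length end.
func newBitState(numInst, end int) *bitState {
	b := bitStatePool.Get().(*bitState)
	b.end = end
	n := (numInst*(end+1) + 31) / 32
	if cap(b.visited) < n {
		b.visited = make([]uint32, n)
	} else {
		b.visited = b.visited[:n]
		for i := range b.visited {
			b.visited[i] = 0
		}
	}
	return b
}

// shouldVisit reports whether (pc, pos) is new and marks it as visited.
func (b *bitState) shouldVisit(pc, pos int) bool {
	n := pc*(b.end+1) + pos
	word, bit := n/32, uint32(1)<<uint(n%32)
	if b.visited[word]&bit != 0 {
		return false
	}
	b.visited[word] |= bit
	return true
}

func freeBitState(b *bitState) { bitStatePool.Put(b) }

func main() {
	b := newBitState(3, 4)
	fmt.Println(b.shouldVisit(1, 2), b.shouldVisit(1, 2), b.shouldVisit(2, 4)) // true false true
	freeBitState(b)
}
```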
      
      Note especially the -92% on MatchParallelShared.
      This is real but not a complete story: the other
      execution engines still need to be de-shared,
      but the benchmark was only using bit-state.
      
      The tiny slowdowns in unrelated code are noise.
      
      name                             old time/op    new time/op    delta
      Find-12                             264ns ± 3%     254ns ± 0%   -3.86%  (p=0.000 n=10+9)
      FindAllNoMatches-12                 140ns ± 2%     135ns ± 0%   -3.91%  (p=0.000 n=10+9)
      FindString-12                       256ns ± 0%     247ns ± 0%   -3.52%  (p=0.000 n=8+8)
      FindSubmatch-12                     339ns ± 1%     334ns ± 0%   -1.41%  (p=0.000 n=9+10)
      FindStringSubmatch-12               322ns ± 0%     321ns ± 0%   -0.21%  (p=0.005 n=8+9)
      Literal-12                          100ns ± 2%      92ns ± 0%   -8.10%  (p=0.000 n=10+9)
      NotLiteral-12                      1.50µs ± 0%    1.47µs ± 0%   -1.91%  (p=0.000 n=8+9)
      MatchClass-12                      2.18µs ± 0%    2.17µs ± 0%   -0.20%  (p=0.001 n=10+7)
      MatchClass_InRange-12              2.12µs ± 0%    2.13µs ± 0%   +0.23%  (p=0.000 n=10+10)
      ReplaceAll-12                      1.41µs ± 0%    1.39µs ± 0%   -1.30%  (p=0.000 n=7+10)
      AnchoredLiteralShortNonMatch-12    89.8ns ± 0%    83.2ns ± 0%   -7.35%  (p=0.000 n=8+8)
      AnchoredLiteralLongNonMatch-12      105ns ± 3%     105ns ± 0%     ~     (p=0.186 n=10+10)
      AnchoredShortMatch-12               141ns ± 0%     131ns ± 0%   -7.09%  (p=0.000 n=9+10)
      AnchoredLongMatch-12                276ns ± 4%     267ns ± 0%   -3.23%  (p=0.000 n=10+10)
      OnePassShortA-12                    620ns ± 0%     611ns ± 0%   -1.39%  (p=0.000 n=10+9)
      NotOnePassShortA-12                 575ns ± 3%     552ns ± 0%   -3.97%  (p=0.000 n=10+8)
      OnePassShortB-12                    493ns ± 0%     491ns ± 0%   -0.33%  (p=0.000 n=8+8)
      NotOnePassShortB-12                 423ns ± 0%     412ns ± 0%   -2.60%  (p=0.000 n=8+9)
      OnePassLongPrefix-12                112ns ± 0%     112ns ± 0%     ~     (all equal)
      OnePassLongNotPrefix-12             405ns ± 0%     410ns ± 0%   +1.23%  (p=0.000 n=8+9)
      MatchParallelShared-12              501ns ± 1%      39ns ± 1%  -92.27%  (p=0.000 n=10+10)
      MatchParallelCopied-12             39.1ns ± 0%    39.2ns ± 3%     ~     (p=0.785 n=6+10)
      QuoteMetaAll-12                    94.6ns ± 0%    94.6ns ± 0%     ~     (p=0.439 n=10+8)
      QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
      Match/Easy0/32-12                  79.1ns ± 0%    72.9ns ± 0%   -7.85%  (p=0.000 n=9+9)
      Match/Easy0/1K-12                   307ns ± 1%     298ns ± 0%   -2.99%  (p=0.000 n=10+6)
      Match/Easy0/32K-12                 4.65µs ± 2%    4.60µs ± 2%     ~     (p=0.159 n=10+10)
      Match/Easy0/1M-12                   234µs ± 0%     235µs ± 0%   +0.17%  (p=0.003 n=10+10)
      Match/Easy0/32M-12                 7.98ms ± 1%    7.96ms ± 0%     ~     (p=0.278 n=9+10)
      Match/Easy0i/32-12                 1.13µs ± 1%    1.09µs ± 0%   -3.24%  (p=0.000 n=9+8)
      Match/Easy0i/1K-12                 32.5µs ± 0%    31.7µs ± 0%   -2.66%  (p=0.000 n=9+9)
      Match/Easy0i/32K-12                1.59ms ± 0%    1.61ms ± 0%   +0.75%  (p=0.000 n=9+9)
      Match/Easy0i/1M-12                 51.0ms ± 0%    51.4ms ± 0%   +0.77%  (p=0.000 n=10+8)
      Match/Easy0i/32M-12                 1.63s ± 0%     1.65s ± 1%   +1.24%  (p=0.000 n=7+9)
      Match/Easy1/32-12                  75.1ns ± 1%    67.9ns ± 0%   -9.54%  (p=0.000 n=8+8)
      Match/Easy1/1K-12                   861ns ± 0%     884ns ± 0%   +2.71%  (p=0.000 n=8+9)
      Match/Easy1/32K-12                 39.2µs ± 1%    39.2µs ± 0%     ~     (p=0.090 n=10+9)
      Match/Easy1/1M-12                  1.38ms ± 0%    1.39ms ± 0%     ~     (p=0.095 n=10+9)
      Match/Easy1/32M-12                 44.2ms ± 1%    44.2ms ± 1%     ~     (p=0.218 n=10+10)
      Match/Medium/32-12                 1.04µs ± 1%    1.05µs ± 0%   +1.05%  (p=0.000 n=9+8)
      Match/Medium/1K-12                 31.3µs ± 0%    31.3µs ± 0%   -0.14%  (p=0.004 n=9+9)
      Match/Medium/32K-12                1.44ms ± 0%    1.45ms ± 0%   +0.18%  (p=0.001 n=8+8)
      Match/Medium/1M-12                 46.1ms ± 0%    46.2ms ± 0%   +0.13%  (p=0.003 n=6+9)
      Match/Medium/32M-12                 1.48s ± 0%     1.48s ± 0%   +0.20%  (p=0.002 n=9+8)
      Match/Hard/32-12                   1.54µs ± 1%    1.49µs ± 0%   -3.60%  (p=0.000 n=9+10)
      Match/Hard/1K-12                   46.4µs ± 1%    45.1µs ± 1%   -2.78%  (p=0.000 n=9+10)
      Match/Hard/32K-12                  2.19ms ± 0%    2.18ms ± 1%   -0.51%  (p=0.006 n=8+9)
      Match/Hard/1M-12                   70.1ms ± 0%    69.7ms ± 1%   -0.52%  (p=0.006 n=8+9)
      Match/Hard/32M-12                   2.24s ± 0%     2.23s ± 1%   -0.42%  (p=0.046 n=8+9)
      Match/Hard1/32-12                  8.17µs ± 1%    7.89µs ± 0%   -3.42%  (p=0.000 n=8+9)
      Match/Hard1/1K-12                   254µs ± 2%     244µs ± 0%   -3.91%  (p=0.000 n=9+9)
      Match/Hard1/32K-12                 9.58ms ± 1%   10.35ms ± 0%   +8.00%  (p=0.000 n=10+10)
      Match/Hard1/1M-12                   306ms ± 1%     331ms ± 0%   +8.27%  (p=0.000 n=9+8)
      Match/Hard1/32M-12                  9.79s ± 1%    10.60s ± 0%   +8.29%  (p=0.000 n=9+8)
      Match_onepass_regex/32-12           808ns ± 0%     812ns ± 0%   +0.47%  (p=0.000 n=8+10)
      Match_onepass_regex/1K-12          27.8µs ± 0%    28.5µs ± 0%   +2.32%  (p=0.000 n=8+10)
      Match_onepass_regex/32K-12          925µs ± 0%     936µs ± 0%   +1.24%  (p=0.000 n=9+10)
      Match_onepass_regex/1M-12          29.5ms ± 0%    30.2ms ± 0%   +2.38%  (p=0.000 n=10+10)
      Match_onepass_regex/32M-12          945ms ± 0%     970ms ± 0%   +2.60%  (p=0.000 n=9+10)
      CompileOnepass-12                  4.67µs ± 0%    4.63µs ± 1%   -0.84%  (p=0.000 n=10+10)
      [Geo mean]                         24.5µs         23.3µs        -5.04%
      
      https://perf.golang.org/search?q=upload:20181004.1
      
      Change-Id: Idbc2b76223718265657819ff38be2d9aba1c54b4
      Reviewed-on: https://go-review.googlesource.com/c/139779
      Run-TryBot: Russ Cox <rsc@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • testing: implement -benchtime=100x · 8e0aea16
      Russ Cox authored
      When running benchmarks with profilers and trying to
      compare one run against another, it is very useful to be
      able to force each run to execute exactly the same number
      of iterations.
      
      Discussion on the proposal issue #24735 led to the decision
      to overload -benchtime, so that instead of saying
      -benchtime 10s to run a benchmark for 10 seconds,
      you say -benchtime 100x to run a benchmark 100 times.
      
      Fixes #24735.
      
      Change-Id: Id17c5bd18bd09987bb48ed12420d61ae9e200fd7
      Reviewed-on: https://go-review.googlesource.com/c/139258
      Run-TryBot: Russ Cox <rsc@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • Robert Griesemer
      go/types: remove a test case and update comment · 56131cbd
      Robert Griesemer authored
      The original need for the extra test case and issue was eliminated
      by https://golang.org/cl/116815, which introduced systematic cycle
      detection. Now that we correctly report the cycle, we can't say much
      about the invalid cast anyway (the type is invalid due to the cycle).
      
      A more sophisticated approach would be able to tell the size of
      a function type independent of the details of that type, but the
      type-checker is not set up for this kind of lazy type-checking.
      
      Fixes #23127.
      
      Change-Id: Ia8479e66baf630ce96f6f36770c8e1c810c59ddc
      Reviewed-on: https://go-review.googlesource.com/c/141640
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Alan Donovan <adonovan@google.com>
    • Martin Möhrmann
      internal/cpu: use 'off' for disabling cpu capabilities instead of '0' · 4fb8b1de
      Martin Möhrmann authored
      Updates #27218
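As a rough illustration of the new spelling, the sketch below scans a GODEBUG-style string of comma-separated `key=value` pairs and collects the `cpu.` entries whose value is `off`. It is a hypothetical helper (`disabledFeatures` is not part of internal/cpu), and the feature names in the example are placeholders. Under this sketch the old `=0` spelling is simply not recognized.

```go
package main

import (
	"fmt"
	"strings"
)

// disabledFeatures is a hypothetical helper: it returns the set of CPU
// feature names that appear as "cpu.<name>=off" in a GODEBUG-style string.
// Entries without the "cpu." prefix, or with any other value, are ignored.
func disabledFeatures(godebug string) map[string]bool {
	off := make(map[string]bool)
	for _, field := range strings.Split(godebug, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok || !strings.HasPrefix(key, "cpu.") {
			continue
		}
		if value == "off" {
			off[strings.TrimPrefix(key, "cpu.")] = true
		}
	}
	return off
}

func main() {
	fmt.Println(disabledFeatures("cpu.sse41=off,cpu.avx2=0,gctrace=1"))
}
```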
      
      Change-Id: I4ce20376fd601b5f958d79014af7eaf89e9de613
      Reviewed-on: https://go-review.googlesource.com/c/141818
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • Tobias Klauser
      internal/poll: add FD.Fsync on aix · d82e51a1
      Tobias Klauser authored
      Follow-up for CL 138717. This fixes the build of the os package on
      aix.
      
      Change-Id: I879b9360e71837ab622ae3a7b6144782cf5a9ce7
      Reviewed-on: https://go-review.googlesource.com/c/141797
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>