1. 09 Mar, 2019 5 commits
    • sync: allow inlining the Once.Do fast path · ca835484
      Carlo Alberto Ferraris authored
      Using Once.Do is now extremely cheap because the fast path is just an inlined
      atomic load of a variable that is written only once and a conditional jump.
      This is very beneficial for Once.Do because, due to its nature, the fast path
      will be used for every call after the first one.
      
      In an attempt to minimize the code size increase, reorder the fields so that the
      pointer to Once is also the pointer to Once.done, which is the only field used
      in the hot path. This allows more compact instruction encodings, or fewer
      instructions, in the hot path (which is inlined at every call site).
      
      name     old time/op  new time/op  delta
      Once     4.54ns ± 0%  2.06ns ± 0%  -54.59%  (p=0.000 n=19+16)
      Once-4   1.18ns ± 0%  0.55ns ± 0%  -53.39%  (p=0.000 n=15+16)
      Once-16  0.53ns ± 0%  0.17ns ± 0%  -67.92%  (p=0.000 n=18+17)
      
      linux/amd64 bin/go 14675861 (previous commit 14663387, +12474/+0.09%)
      
      Change-Id: Ie2708103ab473787875d66746d2f20f1d90a6916
      Reviewed-on: https://go-review.googlesource.com/c/go/+/152697
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      ca835484
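The field reordering and the split between an inlinable fast path and an out-of-line slow path described in the Once.Do commit above can be sketched roughly as follows. This is a minimal illustration, not the actual sync.Once source: the type name `onceSketch` and the method name `doSlow` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceSketch is a hypothetical stand-in for sync.Once (not the real type).
type onceSketch struct {
	// done is the first field, so a pointer to onceSketch is also a
	// pointer to done, the only field the hot path reads.
	done uint32
	m    sync.Mutex
}

// Do keeps the fast path to one atomic load and one branch, which is
// small enough for the compiler to consider inlining at call sites.
func (o *onceSketch) Do(f func()) {
	if atomic.LoadUint32(&o.done) == 0 {
		o.doSlow(f)
	}
}

// doSlow holds everything that is too large to inline: locking,
// re-checking done, running f, and publishing completion.
func (o *onceSketch) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done == 0 {
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}

func main() {
	var o onceSketch
	calls := 0
	for i := 0; i < 3; i++ {
		o.Do(func() { calls++ })
	}
	fmt.Println(calls) // prints 1
}
```

Every call after the first takes only the load-and-branch path, which is why the benchmark gains in the commit apply to almost all calls in practice.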
    • sync: allow inlining the Mutex.Lock fast path · 41cb0aed
      Carlo Alberto Ferraris authored
      name                    old time/op  new time/op  delta
      MutexUncontended        18.9ns ± 0%  16.2ns ± 0%  -14.29%  (p=0.000 n=19+19)
      MutexUncontended-4      4.75ns ± 1%  4.08ns ± 0%  -14.20%  (p=0.000 n=20+19)
      MutexUncontended-16     2.05ns ± 0%  2.11ns ± 0%   +2.93%  (p=0.000 n=19+16)
      Mutex                   19.3ns ± 1%  16.2ns ± 0%  -15.86%  (p=0.000 n=17+19)
      Mutex-4                 52.4ns ± 4%  48.6ns ± 9%   -7.22%  (p=0.000 n=20+20)
      Mutex-16                 139ns ± 2%   140ns ± 3%   +1.03%  (p=0.011 n=16+20)
      MutexSlack              18.9ns ± 1%  16.2ns ± 1%  -13.96%  (p=0.000 n=20+20)
      MutexSlack-4             225ns ± 8%   211ns ±10%   -5.94%  (p=0.000 n=18+19)
      MutexSlack-16           98.4ns ± 1%  90.9ns ± 1%   -7.60%  (p=0.000 n=17+18)
      MutexWork               58.2ns ± 3%  55.4ns ± 0%   -4.82%  (p=0.000 n=20+17)
      MutexWork-4              103ns ± 7%    95ns ±18%   -8.03%  (p=0.000 n=20+20)
      MutexWork-16             163ns ± 2%   155ns ± 2%   -4.47%  (p=0.000 n=18+18)
      MutexWorkSlack          57.7ns ± 1%  55.4ns ± 0%   -3.99%  (p=0.000 n=20+13)
      MutexWorkSlack-4         276ns ±13%   260ns ±10%   -5.64%  (p=0.001 n=19+19)
      MutexWorkSlack-16        147ns ± 0%   156ns ± 1%   +5.87%  (p=0.000 n=14+19)
      MutexNoSpin              968ns ± 0%   900ns ± 1%   -6.98%  (p=0.000 n=20+18)
      MutexNoSpin-4            270ns ± 2%   255ns ± 2%   -5.74%  (p=0.000 n=19+20)
      MutexNoSpin-16           120ns ± 4%   112ns ± 0%   -6.99%  (p=0.000 n=19+14)
      MutexSpin               3.13µs ± 1%  3.19µs ± 6%     ~     (p=0.401 n=20+20)
      MutexSpin-4              832ns ± 2%   831ns ± 1%   -0.17%  (p=0.023 n=16+18)
      MutexSpin-16             395ns ± 0%   399ns ± 0%   +0.94%  (p=0.000 n=17+19)
      RWMutexUncontended      69.5ns ± 0%  68.4ns ± 0%   -1.59%  (p=0.000 n=20+20)
      RWMutexUncontended-4    17.5ns ± 0%  16.7ns ± 0%   -4.30%  (p=0.000 n=18+17)
      RWMutexUncontended-16   7.92ns ± 0%  7.87ns ± 0%   -0.61%  (p=0.000 n=18+17)
      RWMutexWrite100         24.9ns ± 1%  25.0ns ± 1%   +0.32%  (p=0.000 n=20+20)
      RWMutexWrite100-4       46.2ns ± 4%  46.2ns ± 5%     ~     (p=0.840 n=19+20)
      RWMutexWrite100-16      69.9ns ± 5%  69.9ns ± 3%     ~     (p=0.545 n=20+19)
      RWMutexWrite10          27.0ns ± 2%  26.8ns ± 2%   -0.98%  (p=0.001 n=20+20)
      RWMutexWrite10-4        34.7ns ± 2%  35.0ns ± 4%     ~     (p=0.191 n=18+20)
      RWMutexWrite10-16       37.2ns ± 4%  37.3ns ± 2%     ~     (p=0.438 n=20+19)
      RWMutexWorkWrite100      164ns ± 0%   163ns ± 0%   -0.24%  (p=0.025 n=20+20)
      RWMutexWorkWrite100-4    193ns ± 3%   191ns ± 2%   -1.06%  (p=0.027 n=20+20)
      RWMutexWorkWrite100-16   210ns ± 3%   207ns ± 3%   -1.22%  (p=0.038 n=20+20)
      RWMutexWorkWrite10       153ns ± 0%   153ns ± 0%     ~     (all equal)
      RWMutexWorkWrite10-4     178ns ± 2%   179ns ± 2%     ~     (p=0.186 n=20+20)
      RWMutexWorkWrite10-16    192ns ± 2%   192ns ± 2%     ~     (p=0.731 n=19+20)
      
      linux/amd64 bin/go 14663387 (previous commit 14630572, +32815/+0.22%)
      
      Change-Id: I98171006dce14069b1a62da07c3d165455a7906b
      Reviewed-on: https://go-review.googlesource.com/c/go/+/148959
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      41cb0aed
    • cmd/compile: reverse order of slice bounds checks · 83a33d38
      Keith Randall authored
      Turns out this makes the fix for 28797 unnecessary, because this order
      ensures that the RHS of IsSliceInBounds ops is always nonnegative.
      
      The real reason for this change is that it also makes dealing with
      <0 values easier for reporting values in bounds check panics (issue #30116).
      
      Makes cmd/go negligibly smaller.
      
      Update #28797
      
      Change-Id: I1f25ba6d2b3b3d4a72df3105828aa0a4b629ce85
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166377
      Run-TryBot: Keith Randall <khr@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      83a33d38
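One way to picture why the order of the checks matters in the slice bounds commit above: if the high bound is compared against the capacity first, the later comparison of the low bound against the high bound always has a nonnegative right-hand side. The sketch below models this in plain Go for an expression like s[i:j]; the function `sliceInBounds` is hypothetical and is not the compiler's SSA code.

```go
package main

import "fmt"

// sliceInBounds is a hypothetical model of the checks for s[i:j] on a
// slice with capacity c. It is an illustration, not compiler code.
func sliceInBounds(i, j, c int) bool {
	// Check the high bound first. c is never negative, so a single
	// unsigned comparison rejects both j < 0 and j > c.
	if uint(j) > uint(c) {
		return false
	}
	// j is now known to lie in [0, c], so the right-hand side of this
	// comparison is nonnegative and an unsigned compare also rejects i < 0.
	return uint(i) <= uint(j)
}

func main() {
	fmt.Println(sliceInBounds(1, 3, 4))  // true
	fmt.Println(sliceInBounds(-1, 3, 4)) // false
	fmt.Println(sliceInBounds(2, 1, 4))  // false
	fmt.Println(sliceInBounds(0, 5, 4))  // false
}
```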
    • cmd/link: enable DWARF with external linker on aix/ppc64 · 3cf89e50
      Clément Chigot authored
      To allow DWARF with ld, the symbol table is adapted.
      In internal linkmode, each package is treated as a .FILE. However, the
      current version of ld crashes on a few programs because of
      relocations between DWARF symbols. Treating all packages as part of
      one .FILE seems to bypass this bug.
      As it might be fixed in a future release, the size of each package
      in DWARF sections is still retrieved and can be used once it is fixed.
      Moreover, this improves internal linkmode, which should have done it
      anyway.
      
      Change-Id: If3d023fe118b24b9f0f46d201a4849eee8d5e333
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164006
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      3cf89e50
    • debug/gosym: simplify parsing symbol name rule · b37b35ed
      LE Manh Cuong authored
      A symbol name with a linker prefix such as "type." or "go." is not parsed
      correctly: the prefix is returned as part of the package name.
      
      So just return an empty string for symbol names that start with a linker prefix.
      
      Fixes #29551
      
      Change-Id: Idb4ce872345e5781a5a5da2b2146faeeebd9e63b
      Reviewed-on: https://go-review.googlesource.com/c/go/+/156397
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      b37b35ed
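The rule described in the debug/gosym commit above might look roughly like the standalone sketch below. The function `packageName` is a hypothetical helper written for illustration; it is not the library's method, and the real package may apply additional version-specific rules.

```go
package main

import (
	"fmt"
	"strings"
)

// packageName is a hypothetical sketch of the simplified rule: symbols
// that start with a linker prefix belong to no package.
func packageName(name string) string {
	// "type." and "go." mark compiler- or linker-generated symbols.
	if strings.HasPrefix(name, "go.") || strings.HasPrefix(name, "type.") {
		return ""
	}
	// The package path ends at the first "." after the last "/".
	pathEnd := strings.LastIndex(name, "/")
	if pathEnd < 0 {
		pathEnd = 0
	}
	if i := strings.Index(name[pathEnd:], "."); i != -1 {
		return name[:pathEnd+i]
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", packageName("main.main"))             // "main"
	fmt.Printf("%q\n", packageName("net/http.(*Client).Do")) // "net/http"
	fmt.Printf("%q\n", packageName("type.*T"))               // ""
}
```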
  2. 08 Mar, 2019 23 commits
  3. 07 Mar, 2019 12 commits
    • cmd/link: fix suspicious code in emitPcln · 3a62f4ee
      Cherry Zhang authored
      In cmd/link/internal/ld/pcln.go:emitPcln, the code and the
      comment don't match. I think the comment is right. Fix the code.
      
      As a consequence, on Linux/AMD64, internal linking with PIE
      buildmode with cgo (at least the cgo packages in the standard
      library) now works. Add a test.
      
      Change-Id: I091cf81ba89571052bc0ec1fa0a6a688dec07b04
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166017
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      3a62f4ee
    • cmd/compile/internal/ssa: set OFOR bBody.Pos to AST Pos · 7afd58d4
      Peter Waller authored
      Assign SSA OFOR's bBody.Pos to AST (*Node).Pos as it is created.
      
      An empty for loop has no other information which may be used to give
      correct position information in the resulting executable. Such a for
      loop may compile to a single `JMP *self` and it is important that the
      location of this is in the right place.
      
      Fixes #30167.
      
      Change-Id: Iec44f0281c462c33fac6b7b8ccfc2ef37434c247
      Reviewed-on: https://go-review.googlesource.com/c/go/+/163019
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      7afd58d4
    • cmd/compile: optimize arm64 comparison of x and 0.0 with "FCMP $(0.0), Fn" · 27cce773
      fanzha02 authored
      Code:
      func comp(x float64) bool {return x < 0}
      
      Previous version:
        FMOVD	"".x(FP), F0
        FMOVD	ZR, F1
        FCMPD	F1, F0
        CSET	MI, R0
        MOVB	R0, "".~r1+8(FP)
        RET	(R30)
      
      Optimized version:
        FMOVD	"".x(FP), F0
        FCMPD	$(0.0), F0
        CSET	MI, R0
        MOVB	R0, "".~r1+8(FP)
        RET	(R30)
      
      Math package benchmark results:
      name                   old time/op          new time/op          delta
      Acos-8                   77.500000ns +- 0%    77.400000ns +- 0%   -0.13%  (p=0.000 n=9+10)
      Acosh-8                  98.600000ns +- 0%    98.100000ns +- 0%   -0.51%  (p=0.000 n=10+9)
      Asin-8                   67.600000ns +- 0%    66.600000ns +- 0%   -1.48%  (p=0.000 n=9+10)
      Asinh-8                 108.000000ns +- 0%   109.000000ns +- 0%   +0.93%  (p=0.000 n=10+10)
      Atan-8                   36.788889ns +- 0%    36.000000ns +- 0%   -2.14%  (p=0.000 n=9+10)
      Atanh-8                 104.000000ns +- 0%   105.000000ns +- 0%   +0.96%  (p=0.000 n=10+10)
      Atan2-8                  67.100000ns +- 0%    66.600000ns +- 0%   -0.75%  (p=0.000 n=10+10)
      Cbrt-8                   89.100000ns +- 0%    82.000000ns +- 0%   -7.97%  (p=0.000 n=10+10)
      Erf-8                    43.500000ns +- 0%    43.000000ns +- 0%   -1.15%  (p=0.000 n=10+10)
      Erfc-8                   49.000000ns +- 0%    48.220000ns +- 0%   -1.59%  (p=0.000 n=9+10)
      Erfinv-8                 59.100000ns +- 0%    58.600000ns +- 0%   -0.85%  (p=0.000 n=10+10)
      Erfcinv-8                59.100000ns +- 0%    58.600000ns +- 0%   -0.85%  (p=0.000 n=10+10)
      Expm1-8                  56.600000ns +- 0%    56.040000ns +- 0%   -0.99%  (p=0.000 n=8+10)
      Exp2Go-8                 97.600000ns +- 0%    99.400000ns +- 0%   +1.84%  (p=0.000 n=10+10)
      Dim-8                     2.500000ns +- 0%     2.250000ns +- 0%  -10.00%  (p=0.000 n=10+10)
      Mod-8                   108.000000ns +- 0%   106.000000ns +- 0%   -1.85%  (p=0.000 n=8+8)
      Frexp-8                  12.000000ns +- 0%    12.500000ns +- 0%   +4.17%  (p=0.000 n=10+10)
      Gamma-8                  67.100000ns +- 0%    67.600000ns +- 0%   +0.75%  (p=0.000 n=10+10)
      Hypot-8                  17.100000ns +- 0%    17.000000ns +- 0%   -0.58%  (p=0.002 n=8+10)
      Ilogb-8                   9.010000ns +- 0%     8.510000ns +- 0%   -5.55%  (p=0.000 n=10+9)
      J1-8                    288.000000ns +- 0%   287.000000ns +- 0%   -0.35%  (p=0.000 n=10+10)
      Jn-8                    605.000000ns +- 0%   604.000000ns +- 0%   -0.17%  (p=0.001 n=8+9)
      Logb-8                   10.600000ns +- 0%    10.500000ns +- 0%   -0.94%  (p=0.000 n=9+10)
      Log2-8                   16.500000ns +- 0%    17.000000ns +- 0%   +3.03%  (p=0.000 n=10+10)
      PowFrac-8               232.000000ns +- 0%   233.000000ns +- 0%   +0.43%  (p=0.000 n=10+10)
      Remainder-8              70.600000ns +- 0%    69.600000ns +- 0%   -1.42%  (p=0.000 n=10+10)
      SqrtGoLatency-8          77.600000ns +- 0%    76.600000ns +- 0%   -1.29%  (p=0.000 n=10+10)
      Tanh-8                   97.600000ns +- 0%    94.100000ns +- 0%   -3.59%  (p=0.000 n=10+10)
      Y1-8                    289.000000ns +- 0%   288.000000ns +- 0%   -0.35%  (p=0.000 n=10+10)
      Yn-8                    603.000000ns +- 0%   589.000000ns +- 0%   -2.32%  (p=0.000 n=10+10)
      
      Change-Id: I6920734f8662b329aa58f5b8e4eeae73b409984d
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164719
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      27cce773
    • cmd/compile: change the condition flags of floating-point comparisons in arm64 backend · 6efd51c6
      fanzha02 authored
      The current compiler reverses operands to work around NaN in
      "less than" and "less than or equal" comparisons. But if we
      want to use "FCMPD/FCMPS $(0.0), Fn" for optimization,
      this workaround does not work, because the assembler does
      not support the instruction "FCMPD/FCMPS Fn, $(0.0)".
      
      This CL sets condition flags for floating-point comparisons
      to resolve this problem.
      
      Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164718
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      6efd51c6
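The NaN concern behind the condition-flags commit above comes from a language-level requirement: ordered comparisons involving NaN must evaluate to false, whichever operand is NaN. The short program below only demonstrates that requirement in Go source terms; it does not show the arm64 condition codes themselves, and the helper names `less` and `lessEqual` are made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// less and lessEqual are hypothetical helpers that wrap the two
// comparisons mentioned in the commit message.
func less(x, y float64) bool      { return x < y }
func lessEqual(x, y float64) bool { return x <= y }

func main() {
	nan := math.NaN()
	// Each comparison is false because one operand is NaN, regardless
	// of which side the NaN is on.
	fmt.Println(less(nan, 0), less(0, nan))           // false false
	fmt.Println(lessEqual(nan, 0), lessEqual(0, nan)) // false false
	// Ordinary values behave as expected.
	fmt.Println(less(-1, 0), lessEqual(0, 0)) // true true
}
```

Whatever condition code the backend tests after a floating-point compare has to preserve these results for the unordered case, with or without swapping operands.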
    • cmd/compile: remove work-arounds for 0o/0O octals · a77f85a6
      Robert Griesemer authored
      With math/big supporting the new octal prefixes directly,
      the compiler doesn't have to manually convert such numbers
      into old-style 0-prefix octals anymore.
      
      Updates #12711.
      
      Change-Id: I300bdd095836595426a1478d68da179f39e5531a
      Reviewed-on: https://go-review.googlesource.com/c/go/+/165861
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      a77f85a6
    • math/big: support new octal prefixes 0o and 0O · 129c6e44
      Robert Griesemer authored
      This CL extends the various SetString and Parse methods for
      Ints, Rats, and Floats to accept the new octal prefixes.
      
      The main change is in natconv.go, all other changes are
      documentation and test updates.
      
      Finally, this CL also fixes TestRatSetString which silently
      dropped certain failures.
      
      Updates #12711.
      
      Change-Id: I5ee5879e25013ba1e6eda93ff280915f25ab5d55
      Reviewed-on: https://go-review.googlesource.com/c/go/+/165898
      Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
      129c6e44
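As a small illustration of the math/big change above, the sketch below parses the same value written with the new and old octal prefixes. It assumes a Go release that includes this change (Go 1.13 or later); the helper `parseInt` is a name invented for the example.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseInt is a hypothetical helper: base 0 asks SetString to pick the
// base from the literal's prefix.
func parseInt(s string) (int64, bool) {
	v, ok := new(big.Int).SetString(s, 0)
	if !ok {
		return 0, false
	}
	return v.Int64(), true
}

func main() {
	for _, s := range []string{"0o17", "0O17", "017"} {
		v, ok := parseInt(s)
		fmt.Println(s, v, ok) // each line reports 15 true
	}
}
```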
    • test: improve test coverage for heap sampling · 2dd066d4
      Raul Silvera authored
      Update the test in test/heapsampling.go to more thoroughly validate heap sampling.
      Lower the sampling rate on the test to ensure allocations both smaller and
      larger than the sampling rate are tested.
      
      Tighten up the validation check to a 10% difference between the unsampled and correct value.
      Because of the nature of random sampling, it is possible that the unsampled value fluctuates
      over that range. To avoid flakes, run the experiment three times and only report an issue if the
      same location consistently falls out of range on all experiments.
      
      This tests the sampling fix in cl/158337.
      
      Change-Id: I54a709e5c75827b8b1c2d87cdfb425ab09759677
      GitHub-Last-Rev: 7c04f126034f9e323efc220c896d75e7984ffd39
      GitHub-Pull-Request: golang/go#26944
      Reviewed-on: https://go-review.googlesource.com/c/go/+/129117
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      2dd066d4
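The flake-avoidance rule in the heap sampling commit above (report a location only if it is out of range in every experiment) could be sketched as below. This is not the code in test/heapsampling.go; the function names, the map-based data layout, and the sample numbers are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// outOfRange reports whether got differs from want by more than the
// given relative tolerance (0.10 for the 10% check).
func outOfRange(got, want, tol float64) bool {
	return math.Abs(got-want) > tol*want
}

// consistentFailures is a hypothetical helper: it returns the locations
// whose estimate is out of range in every run, sorted by name.
func consistentFailures(runs []map[string]float64, want map[string]float64, tol float64) []string {
	var bad []string
	for loc, w := range want {
		failedEvery := len(runs) > 0
		for _, run := range runs {
			if !outOfRange(run[loc], w, tol) {
				failedEvery = false
				break
			}
		}
		if failedEvery {
			bad = append(bad, loc)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	want := map[string]float64{"a": 1000, "b": 1000}
	runs := []map[string]float64{
		{"a": 1200, "b": 1150},
		{"a": 1010, "b": 1200},
		{"a": 1190, "b": 1300},
	}
	// "a" is within 10% in the second run, so only "b" is reported.
	fmt.Println(consistentFailures(runs, want, 0.10)) // [b]
}
```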
    • net/http: let Transport request body writes use sendfile · 6ebfbbaa
      Chris Marchesi authored
      net.TCPConn has the ability to send data out using system calls such as
      sendfile when the source data comes from an *os.File. However, the way
      that I/O has been laid out in the transport means that the File is
      actually wrapped behind two outer io.Readers, and as such the TCP stack
      cannot properly type-assert the reader, ensuring that it falls back to
      genericReadFrom.
      
      This commit does the following:
      
      * Removes transferBodyReader and moves its functionality to a new
      doBodyCopy helper. This is not an io.Reader implementation, but no
      functionality is lost this way, and it allows us to unwrap one layer
      from the body.
      
      * The second layer of the body is unwrapped if the original reader
      was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
      body in if it's not a ReadCloser on its own. The unwrap operation
      passes through the existing body if there's no nopCloser.
      
      Note that this depends on change https://golang.org/cl/163737 to
      properly function, as the lack of ReaderFrom implementation otherwise
      means that this functionality is essentially walled off.
      
      Benchmarks between this commit and https://golang.org/cl/163862,
      incorporating https://golang.org/cl/163737:
      
      linux/amd64:
      name                        old time/op    new time/op    delta
      FileAndServer_1KB/NoTLS-4     53.2µs ± 0%    53.3µs ± 0%      ~     (p=0.075 n=10+9)
      FileAndServer_1KB/TLS-4       61.2µs ± 0%    60.7µs ± 0%    -0.77%  (p=0.000 n=10+9)
      FileAndServer_16MB/NoTLS-4    25.3ms ± 5%     3.8ms ± 6%   -84.95%  (p=0.000 n=10+10)
      FileAndServer_16MB/TLS-4      33.2ms ± 2%    13.4ms ± 2%   -59.57%  (p=0.000 n=10+10)
      FileAndServer_64MB/NoTLS-4     106ms ± 4%      16ms ± 2%   -84.45%  (p=0.000 n=10+10)
      FileAndServer_64MB/TLS-4       129ms ± 1%      54ms ± 3%   -58.32%  (p=0.000 n=8+10)
      
      name                        old speed      new speed      delta
      FileAndServer_1KB/NoTLS-4   19.2MB/s ± 0%  19.2MB/s ± 0%      ~     (p=0.095 n=10+9)
      FileAndServer_1KB/TLS-4     16.7MB/s ± 0%  16.9MB/s ± 0%    +0.78%  (p=0.000 n=10+9)
      FileAndServer_16MB/NoTLS-4   664MB/s ± 5%  4415MB/s ± 6%  +565.27%  (p=0.000 n=10+10)
      FileAndServer_16MB/TLS-4     505MB/s ± 2%  1250MB/s ± 2%  +147.32%  (p=0.000 n=10+10)
      FileAndServer_64MB/NoTLS-4   636MB/s ± 4%  4090MB/s ± 2%  +542.81%  (p=0.000 n=10+10)
      FileAndServer_64MB/TLS-4     522MB/s ± 1%  1251MB/s ± 3%  +139.95%  (p=0.000 n=8+10)
      
      darwin/amd64:
      name                        old time/op    new time/op     delta
      FileAndServer_1KB/NoTLS-8     93.0µs ± 5%     96.6µs ±11%      ~     (p=0.190 n=10+10)
      FileAndServer_1KB/TLS-8        105µs ± 7%      100µs ± 5%    -5.14%  (p=0.002 n=10+9)
      FileAndServer_16MB/NoTLS-8    87.5ms ±19%     10.0ms ± 6%   -88.57%  (p=0.000 n=10+10)
      FileAndServer_16MB/TLS-8      52.7ms ±11%     17.4ms ± 5%   -66.92%  (p=0.000 n=10+10)
      FileAndServer_64MB/NoTLS-8     363ms ±54%       39ms ± 7%   -89.24%  (p=0.000 n=10+10)
      FileAndServer_64MB/TLS-8       209ms ±13%       73ms ± 5%   -65.37%  (p=0.000 n=9+10)
      
      name                        old speed      new speed       delta
      FileAndServer_1KB/NoTLS-8   11.0MB/s ± 5%   10.6MB/s ±10%      ~     (p=0.184 n=10+10)
      FileAndServer_1KB/TLS-8     9.75MB/s ± 7%  10.27MB/s ± 5%    +5.26%  (p=0.003 n=10+9)
      FileAndServer_16MB/NoTLS-8   194MB/s ±16%   1680MB/s ± 6%  +767.83%  (p=0.000 n=10+10)
      FileAndServer_16MB/TLS-8     319MB/s ±10%    963MB/s ± 4%  +201.36%  (p=0.000 n=10+10)
      FileAndServer_64MB/NoTLS-8   180MB/s ±31%   1719MB/s ± 7%  +853.61%  (p=0.000 n=9+10)
      FileAndServer_64MB/TLS-8     321MB/s ±12%    926MB/s ± 5%  +188.24%  (p=0.000 n=9+10)
      
      Updates #30377.
      
      Change-Id: I631a73cea75371dfbb418c9cd487c4aa35e73fcd
      GitHub-Last-Rev: 4a77dd1b80140274bf3ed20ad7465ff3cc06febf
      GitHub-Pull-Request: golang/go#30378
      Reviewed-on: https://go-review.googlesource.com/c/go/+/163599
      Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
      6ebfbbaa
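The wrapping problem described in the net/http commit above can be reproduced in miniature: once an *os.File sits behind another reader type, a type assertion for *os.File fails, and unwrapping the outer layer restores it. The sketch below uses its own `nopCloser` type and an `unwrap` helper; both are hypothetical stand-ins and not the transport's actual code.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// nopCloser is a hypothetical stand-in for the wrapper that gives a
// plain io.Reader a no-op Close method.
type nopCloser struct{ io.Reader }

func (nopCloser) Close() error { return nil }

// unwrap returns the inner reader when body is a nopCloser and passes
// any other body through unchanged.
func unwrap(body io.ReadCloser) io.Reader {
	if nc, ok := body.(nopCloser); ok {
		return nc.Reader
	}
	return body
}

// demo reports whether the *os.File assertion succeeds on the wrapped
// body directly and after unwrapping it.
func demo() (direct, unwrapped bool) {
	var body io.ReadCloser = nopCloser{os.Stdin}
	_, direct = body.(*os.File)
	_, unwrapped = unwrap(body).(*os.File)
	return direct, unwrapped
}

func main() {
	fmt.Println(demo()) // false true
}
```

Only when the assertion succeeds can a destination that special-cases *os.File take a sendfile-style path instead of a generic copy loop.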
    • runtime/pprof: document labels bug · 9a710158
      Komu Wairagu authored
      Currently only the CPU profile utilizes tag information.
      This change documents that fact.
      
      Updates #23458
      
      Change-Id: Ic893e85f63af0da9100d8cba7d3328c294e8c810
      GitHub-Last-Rev: be99a126296493b3085aa5ade91895b36fb1de73
      GitHub-Pull-Request: golang/go#27198
      Reviewed-on: https://go-review.googlesource.com/c/go/+/131275
      Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
      9a710158
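For context on the labels (tags) that the runtime/pprof commit above refers to, the sketch below attaches a label to a region of code with pprof.Do and reads it back from the context. It only shows how labels are set; which profile types record them is what the commit documents. The helper `labelSeenInside` and the key and value strings are invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

// labelSeenInside is a hypothetical helper: it runs a function under a
// pprof label and reports the label value visible inside that function.
func labelSeenInside(key, value string) (string, bool) {
	var got string
	var ok bool
	pprof.Do(context.Background(), pprof.Labels(key, value), func(ctx context.Context) {
		// Work done here is attributed to the label by profiles that
		// record label information.
		got, ok = pprof.Label(ctx, key)
	})
	return got, ok
}

func main() {
	fmt.Println(labelSeenInside("handler", "upload")) // upload true
}
```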
    • log: make the name of error clearer · 91170d72
      royeo authored
      Change-Id: Id0398b51336cc74f2172d9b8e18cb1dcb520b9a0
      GitHub-Last-Rev: b5cf80bf9d7f79eab1a398ad3c03f3b424aafdf1
      GitHub-Pull-Request: golang/go#29931
      Reviewed-on: https://go-review.googlesource.com/c/go/+/159537
      Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      91170d72
    • cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm64 · 4e2b0dda
      erifan01 authored
      This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64.
      If the input argument is a nonzero value, then the ORconst operation can also be eliminated.
      
      Benchmarks:
      
      name               old time/op  new time/op  delta
      TrailingZeros-8    2.75ns ± 0%  2.75ns ± 0%     ~     (all equal)
      TrailingZeros8-8   3.49ns ± 1%  2.93ns ± 0%  -16.00%  (p=0.000 n=10+10)
      TrailingZeros16-8  3.49ns ± 1%  2.93ns ± 0%  -16.05%  (p=0.000 n=9+10)
      TrailingZeros32-8  2.67ns ± 1%  2.68ns ± 1%     ~     (p=0.468 n=10+10)
      TrailingZeros64-8  2.67ns ± 1%  2.65ns ± 0%   -0.62%  (p=0.022 n=10+9)
      
      code:
      
      func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) }
      
      Before:
      
      "".f16 STEXT size=48 args=0x8 locals=0x0 leaf
              0x0000 00000 (test.go:7)        TEXT    "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
              0x0000 00000 (test.go:7)        FUNCDATA        ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        FUNCDATA        $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        PCDATA  $2, ZR
              0x0000 00000 (test.go:7)        PCDATA  ZR, ZR
              0x0000 00000 (test.go:7)        MOVD    "".x(FP), R0
              0x0004 00004 (test.go:7)        MOVHU   R0, R0
              0x0008 00008 (test.go:7)        ORR     $65536, R0, R0
              0x000c 00012 (test.go:7)        RBIT    R0, R0
              0x0010 00016 (test.go:7)        CLZ     R0, R0
              0x0014 00020 (test.go:7)        MOVD    R0, "".z(SB)
              0x0020 00032 (test.go:7)        RET     (R30)
      
      This line of code is unnecessary:
              0x0004 00004 (test.go:7)        MOVHU   R0, R0
      
      After:
      
      "".f16 STEXT size=32 args=0x8 locals=0x0 leaf
              0x0000 00000 (test.go:7)        TEXT    "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
              0x0000 00000 (test.go:7)        FUNCDATA        ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        FUNCDATA        $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
              0x0000 00000 (test.go:7)        PCDATA  $2, ZR
              0x0000 00000 (test.go:7)        PCDATA  ZR, ZR
              0x0000 00000 (test.go:7)        MOVD    "".x(FP), R0
              0x0004 00004 (test.go:7)        ORR     $65536, R0, R0
              0x0008 00008 (test.go:7)        RBITW   R0, R0
              0x000c 00012 (test.go:7)        CLZW    R0, R0
              0x0010 00016 (test.go:7)        MOVD    R0, "".z(SB)
              0x001c 00028 (test.go:7)        RET     (R30)
      
      The situation of TrailingZeros8 is similar to TrailingZeros16.
      
      Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d
      Reviewed-on: https://go-review.googlesource.com/c/go/+/156999
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      4e2b0dda
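The `ORR $65536` in the listings of the TrailingZeros commit above sets bit 16 as a sentinel, so a wider trailing-zero count still returns 16 for a zero input. The sketch below models that idea in Go and checks it against math/bits for every 16-bit value. The function `tz16` is a hypothetical model of the lowered sequence, not the compiler's rewrite rule.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tz16 is a hypothetical model: set bit 16 as a sentinel, then count
// trailing zeros in 32 bits. Bits above the sentinel cannot affect the
// result because the count stops at bit 16 at the latest.
func tz16(x uint16) int {
	return bits.TrailingZeros32(uint32(x) | 1<<16)
}

// allMatch compares tz16 with bits.TrailingZeros16 for every input.
func allMatch() bool {
	for x := 0; x <= 0xFFFF; x++ {
		if tz16(uint16(x)) != bits.TrailingZeros16(uint16(x)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(tz16(0), tz16(8), allMatch()) // 16 3 true
}
```

When the input is known to be nonzero, the sentinel is never the lowest set bit, which is why the OR can be dropped in that case.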
    • cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm · fee84cc9
      erifan01 authored
      This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
      x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
      This optimization rule can be used for math/bits.ReverseBytes16.
      
      Benchmarks on arm v6:
      name               old time/op  new time/op  delta
      ReverseBytes-32    2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
      ReverseBytes16-32  2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
      ReverseBytes32-32  1.29ns ± 0%  1.29ns ± 0%   ~     (all equal)
      ReverseBytes64-32  1.43ns ± 0%  1.43ns ± 0%   ~     (all equal)
      
      Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
      Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      fee84cc9
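The source patterns named in the ReverseBytes16 commit above are equivalent because, for a uint16, `x<<8` and `x>>8` occupy disjoint bits, so combining them with `|`, `+`, or `^` gives the same result. The sketch below checks that claim against math/bits for every 16-bit value; the helper names are invented for the example, and whether a given build actually emits REV16 depends on the target and compiler version.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Three hypothetical spellings of a 16-bit byte swap.
func swapOr(x uint16) uint16  { return (x << 8) | (x >> 8) }
func swapAdd(x uint16) uint16 { return (x << 8) + (x >> 8) }
func swapXor(x uint16) uint16 { return (x << 8) ^ (x >> 8) }

// allAgree checks every uint16 value against bits.ReverseBytes16.
func allAgree() bool {
	for i := 0; i <= 0xFFFF; i++ {
		x := uint16(i)
		want := bits.ReverseBytes16(x)
		if swapOr(x) != want || swapAdd(x) != want || swapXor(x) != want {
			return false
		}
	}
	return true
}

func main() {
	fmt.Printf("%#x %v\n", swapOr(0x1234), allAgree()) // 0x3412 true
}
```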