1. 09 Mar, 2019 11 commits
    • math/big: add fast path for pure Go addVW for large z · fe24837c
      Josh Bleecher Snyder authored
      In the normal case, only a few words have to be updated when adding a word to a vector.
      When that happens, we can simply copy the rest of the words, which is much faster.
      However, the overhead of that makes it prohibitive for small vectors,
      so we check the size at the beginning.
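A minimal sketch of the idea described above, not the actual math/big code: once the carry becomes zero, the remaining words of x can be copied into z in one step. The names `addVWSmall`, `addVWLarge`, and the cutoff `largeThreshold` are assumptions made for illustration; the commit message does not state the cutoff.

```go
package main

import (
	"fmt"
	"math/bits"
)

// largeThreshold is a hypothetical cutoff below which the plain loop is used.
const largeThreshold = 32

// addVW sets z = x + y (y is a single word) and returns the carry out.
// It assumes len(z) == len(x).
func addVW(z, x []uint, y uint) uint {
	if len(z) < largeThreshold {
		return addVWSmall(z, x, y)
	}
	return addVWLarge(z, x, y)
}

// addVWSmall propagates the carry through every word.
func addVWSmall(z, x []uint, y uint) uint {
	c := y
	for i := 0; i < len(z) && i < len(x); i++ {
		z[i], c = bits.Add(x[i], c, 0)
	}
	return c
}

// addVWLarge stops adding as soon as the carry is zero and copies the rest.
func addVWLarge(z, x []uint, y uint) uint {
	c := y
	for i := 0; i < len(z) && i < len(x); i++ {
		if c == 0 {
			copy(z[i:], x[i:]) // nothing left to add: bulk copy
			return 0
		}
		z[i], c = bits.Add(x[i], c, 0)
	}
	return c
}

func main() {
	x := make([]uint, 100)
	x[0], x[1], x[2] = ^uint(0), ^uint(0), 7
	z := make([]uint, len(x))
	c := addVW(z, x, 1)
	fmt.Println(z[:3], c) // [0 0 8] 0
}
```

In this sketch the size check lives in a small wrapper so that the wrapper itself stays cheap; whether that matches the real inlining arrangement is not something the commit message spells out.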
      
      The implementation is a bit weird to allow addVW to continue to be inlined; see #30548.
      
      The AddVW benchmarks are surprising, but fully repeatable.
      The SubVW benchmarks are more or less as expected.
      I expect that removing the indirect function call will
      help both and make them a bit more normal.
      
      name            old time/op    new time/op     delta
      AddVW/1-8         4.27ns ± 2%     3.81ns ± 3%   -10.83%  (p=0.000 n=89+90)
      AddVW/2-8         4.91ns ± 2%     4.34ns ± 1%   -11.60%  (p=0.000 n=83+90)
      AddVW/3-8         5.77ns ± 4%     5.76ns ± 2%      ~     (p=0.365 n=91+87)
      AddVW/4-8         6.03ns ± 1%     6.03ns ± 1%      ~     (p=0.392 n=80+76)
      AddVW/5-8         6.48ns ± 2%     6.63ns ± 1%    +2.27%  (p=0.000 n=76+74)
      AddVW/10-8        9.56ns ± 2%     9.56ns ± 1%    -0.02%  (p=0.002 n=69+76)
      AddVW/100-8       90.6ns ± 0%     18.1ns ± 4%   -79.99%  (p=0.000 n=72+94)
      AddVW/1000-8       865ns ± 0%       85ns ± 6%   -90.14%  (p=0.000 n=66+96)
      AddVW/10000-8     8.57µs ± 2%     1.82µs ± 3%   -78.73%  (p=0.000 n=99+94)
      AddVW/100000-8    84.4µs ± 2%     31.8µs ± 4%   -62.29%  (p=0.000 n=93+98)
      
      name            old time/op    new time/op     delta
      SubVW/1-8         3.90ns ± 2%     4.13ns ± 4%    +6.02%  (p=0.000 n=92+95)
      SubVW/2-8         4.15ns ± 1%     5.20ns ± 1%   +25.22%  (p=0.000 n=83+85)
      SubVW/3-8         5.50ns ± 2%     6.22ns ± 6%   +13.21%  (p=0.000 n=91+97)
      SubVW/4-8         5.99ns ± 1%     6.63ns ± 1%   +10.63%  (p=0.000 n=79+61)
      SubVW/5-8         6.75ns ± 4%     6.88ns ± 2%    +1.82%  (p=0.000 n=98+73)
      SubVW/10-8        9.57ns ± 1%     9.56ns ± 1%    -0.13%  (p=0.000 n=77+64)
      SubVW/100-8       90.3ns ± 1%     18.1ns ± 2%   -80.00%  (p=0.000 n=75+94)
      SubVW/1000-8       860ns ± 4%       85ns ± 7%   -90.14%  (p=0.000 n=97+99)
      SubVW/10000-8     8.51µs ± 3%     1.77µs ± 6%   -79.21%  (p=0.000 n=100+97)
      SubVW/100000-8    84.4µs ± 3%     31.5µs ± 3%   -62.66%  (p=0.000 n=92+92)
      
      Change-Id: I721d7031d40f245b4a284f5bdd93e7bb85e7e937
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164968
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • math/big: remove bounds checks in pure Go implementations · 4c227a09
      Josh Bleecher Snyder authored
      These routines are quite sensitive to BCE.
      
      This change eliminates bounds checks from loops.
      It does so at the cost of a bit of safety:
      malformed input will now return incorrect answers
      instead of panicking.
      
      This isn't as bad as it sounds: math/big has very good
      test coverage, and the alternative implementations are in
      assembly, which could do much worse things with malformed input.
      
      If the compiler's BCE improves, so could these routines.
      
      Notable BCE improvements for these routines would be:
      
      * Allowing and propagating more cross-slice length hints.
        Then hints like _ = y[:len(z)] would eliminate bounds checks for y[i].
      
      * Propagating enough information so that we could do
        n := len(x)
        if len(z) < n {
          n = len(z)
        }
        and then have i < n eliminate the same bounds checks as
        i < len(x) && i < len(z) currently does.
      
      * Providing some way to do BCE for unrolled loops.
        Now that we have math/bits implementations,
        it is possible to write things like ADC chains in
        pure Go, if you can reasonably unroll loops.
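As a rough illustration of the trade-off described above, the following sketch bounds the loop index by the length of every slice it touches, which is the condition form the message says currently lets the compiler drop the per-element checks. The function name `addVV` and the use of `uint` instead of math/big's word type are simplifications assumed here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV sets z = x + y word by word and returns the final carry.
// The loop condition limits i by the length of each slice, so no access
// can be out of range. With mismatched lengths the loop simply stops
// early, giving a wrong answer instead of a panic.
func addVV(z, x, y []uint) (c uint) {
	for i := 0; i < len(z) && i < len(x) && i < len(y); i++ {
		z[i], c = bits.Add(x[i], y[i], c)
	}
	return c
}

func main() {
	x := []uint{^uint(0), 1}
	y := []uint{1, 2}
	z := make([]uint, 2)
	c := addVV(z, x, y)
	fmt.Println(z, c) // [0 4] 0
}
```

Building with `go build -gcflags=-d=ssa/check_bce/debug=1` is one way to see which bounds checks remain in a given function.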
      
      Benchmarks below are for amd64, using -tags=math_big_pure_go.
      
      name            old time/op    new time/op    delta
      AddVV/1-8         5.15ns ± 3%    4.65ns ± 4%   -9.81%  (p=0.000 n=93+86)
      AddVV/2-8         6.40ns ± 2%    5.58ns ± 4%  -12.78%  (p=0.000 n=90+95)
      AddVV/3-8         7.07ns ± 2%    6.66ns ± 2%   -5.88%  (p=0.000 n=87+83)
      AddVV/4-8         7.94ns ± 5%    7.41ns ± 4%   -6.65%  (p=0.000 n=94+98)
      AddVV/5-8         8.55ns ± 1%    8.80ns ± 0%   +2.92%  (p=0.000 n=87+92)
      AddVV/10-8        12.7ns ± 1%    12.3ns ± 1%   -3.12%  (p=0.000 n=83+71)
      AddVV/100-8        119ns ± 5%     117ns ± 4%   -1.60%  (p=0.000 n=93+90)
      AddVV/1000-8      1.14µs ± 4%    1.14µs ± 5%     ~     (p=0.812 n=95+91)
      AddVV/10000-8     11.4µs ± 5%    11.3µs ± 5%     ~     (p=0.503 n=97+96)
      AddVV/100000-8     114µs ± 4%     113µs ± 5%   -0.98%  (p=0.002 n=97+90)
      
      name            old time/op    new time/op    delta
      SubVV/1-8         5.23ns ± 5%    4.65ns ± 3%  -11.18%  (p=0.000 n=89+91)
      SubVV/2-8         6.49ns ± 5%    5.58ns ± 3%  -14.04%  (p=0.000 n=92+94)
      SubVV/3-8         7.10ns ± 3%    6.65ns ± 2%   -6.28%  (p=0.000 n=87+80)
      SubVV/4-8         8.04ns ± 1%    7.44ns ± 5%   -7.49%  (p=0.000 n=83+98)
      SubVV/5-8         8.55ns ± 2%    8.32ns ± 1%   -2.75%  (p=0.000 n=84+92)
      SubVV/10-8        12.7ns ± 1%    12.3ns ± 1%   -3.09%  (p=0.000 n=80+75)
      SubVV/100-8        119ns ± 0%     116ns ± 3%   -1.83%  (p=0.000 n=87+98)
      SubVV/1000-8      1.13µs ± 5%    1.13µs ± 3%     ~     (p=0.082 n=96+98)
      SubVV/10000-8     11.2µs ± 1%    11.3µs ± 3%   +0.76%  (p=0.000 n=87+97)
      SubVV/100000-8     112µs ± 2%     113µs ± 3%   +0.55%  (p=0.000 n=76+88)
      
      name            old time/op    new time/op    delta
      AddVW/1-8         4.30ns ± 4%    3.96ns ± 6%  -8.02%  (p=0.000 n=89+97)
      AddVW/2-8         5.15ns ± 2%    4.91ns ± 1%  -4.56%  (p=0.000 n=87+80)
      AddVW/3-8         5.59ns ± 3%    5.75ns ± 2%  +2.91%  (p=0.000 n=91+88)
      AddVW/4-8         6.20ns ± 1%    6.03ns ± 1%  -2.71%  (p=0.000 n=75+90)
      AddVW/5-8         6.93ns ± 3%    6.49ns ± 2%  -6.35%  (p=0.000 n=100+82)
      AddVW/10-8        10.0ns ± 7%     9.6ns ± 0%  -4.02%  (p=0.000 n=98+74)
      AddVW/100-8       91.1ns ± 1%    90.6ns ± 1%  -0.55%  (p=0.000 n=84+80)
      AddVW/1000-8       866ns ± 1%     856ns ± 4%  -1.06%  (p=0.000 n=69+96)
      AddVW/10000-8     8.64µs ± 1%    8.53µs ± 4%  -1.25%  (p=0.000 n=67+99)
      AddVW/100000-8    84.3µs ± 2%    85.4µs ± 4%  +1.22%  (p=0.000 n=89+99)
      
      name            old time/op    new time/op    delta
      SubVW/1-8         4.28ns ± 2%    3.82ns ± 3%  -10.63%  (p=0.000 n=91+89)
      SubVW/2-8         4.61ns ± 1%    4.48ns ± 3%   -2.67%  (p=0.000 n=94+96)
      SubVW/3-8         5.54ns ± 1%    5.81ns ± 4%   +4.87%  (p=0.000 n=92+97)
      SubVW/4-8         6.20ns ± 1%    6.08ns ± 2%   -1.99%  (p=0.000 n=71+88)
      SubVW/5-8         6.91ns ± 3%    6.64ns ± 1%   -3.90%  (p=0.000 n=97+70)
      SubVW/10-8        9.85ns ± 2%    9.62ns ± 0%   -2.31%  (p=0.000 n=82+62)
      SubVW/100-8       91.1ns ± 1%    90.9ns ± 3%   -0.14%  (p=0.010 n=71+93)
      SubVW/1000-8       859ns ± 3%     867ns ± 1%   +0.98%  (p=0.000 n=99+78)
      SubVW/10000-8     8.54µs ± 5%    8.57µs ± 2%   +0.38%  (p=0.007 n=98+92)
      SubVW/100000-8    84.5µs ± 3%    84.6µs ± 3%     ~     (p=0.334 n=95+94)
      
      name                old time/op    new time/op    delta
      AddMulVVW/1-8         5.43ns ± 3%    4.36ns ± 2%  -19.67%  (p=0.000 n=95+94)
      AddMulVVW/2-8         6.56ns ± 4%    6.11ns ± 1%   -6.90%  (p=0.000 n=91+91)
      AddMulVVW/3-8         8.00ns ± 1%    7.80ns ± 4%   -2.52%  (p=0.000 n=83+95)
      AddMulVVW/4-8         9.81ns ± 2%    9.53ns ± 1%   -2.86%  (p=0.000 n=77+64)
      AddMulVVW/5-8         11.4ns ± 3%    11.3ns ± 5%   -0.89%  (p=0.000 n=95+97)
      AddMulVVW/10-8        18.9ns ± 5%    19.1ns ± 5%   +0.89%  (p=0.000 n=91+94)
      AddMulVVW/100-8        165ns ± 5%     165ns ± 4%     ~     (p=0.427 n=97+98)
      AddMulVVW/1000-8      1.56µs ± 3%    1.56µs ± 4%     ~     (p=0.167 n=98+96)
      AddMulVVW/10000-8     15.7µs ± 5%    15.6µs ± 5%   -0.31%  (p=0.044 n=95+97)
      AddMulVVW/100000-8     156µs ± 3%     157µs ± 8%     ~     (p=0.373 n=72+99)
      
      Change-Id: Ibc720785d5b95f6a797103b1363843205f4d56bf
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164966
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • reflect: make all flag.mustBe* methods inlinable · 788e038e
      Daniel Martí authored
      mustBe was barely over budget, so manually inlining the first flag.kind
      call is enough. Add a TODO to reverse that in the future, once the
      compiler gets better.
      
      mustBeExported and mustBeAssignable were over budget by a larger amount,
      so add slow path functions instead. This is the same strategy used in
      the sync package for common methods like Once.Do, for example.
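The slow-path strategy mentioned above can be sketched as follows. This is an illustration under assumptions, not the reflect package's code: the `flag` type, the `flagRO` bit position, and the panic messages are all made up here.

```go
package main

import "fmt"

// flag is a stand-in for reflect's unexported flag type; the bit layout
// is illustrative only.
type flag uintptr

const flagRO flag = 1 << 5 // hypothetical "obtained via unexported field" bit

// mustBeExported is the cheap check meant to fit the inlining budget:
// one condition, and a call that is only taken when the check fails.
func (f flag) mustBeExported() {
	if f == 0 || f&flagRO != 0 {
		f.mustBeExportedSlow()
	}
}

// mustBeExportedSlow holds the expensive part (the panics), keeping it out
// of the inlined fast path.
func (f flag) mustBeExportedSlow() {
	if f == 0 {
		panic("sketch: call on zero Value")
	}
	if f&flagRO != 0 {
		panic("sketch: value obtained using unexported field")
	}
}

// panics reports whether fn panics.
func panics(fn func()) (p bool) {
	defer func() { p = recover() != nil }()
	fn()
	return false
}

func main() {
	fmt.Println(
		panics(func() { flag(1).mustBeExported() }),
		panics(func() { flag(0).mustBeExported() }),
		panics(func() { (flag(1) | flagRO).mustBeExported() }),
	) // false true true
}
```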
      
      Lots of exported reflect.Value methods call these assert-like unexported
      methods, so avoiding the function call overhead in the common case does
      shave off a percent from most exported APIs.
      
      Finally, add the methods to TestIntendedInlining.
      
      While at it, replace a couple of uses of the 0 Kind with its descriptive
      name, Invalid.
      
      name     old time/op    new time/op    delta
      Call-8     68.0ns ± 1%    66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
      PtrTo-8    8.00ns ± 2%    7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)
      
      Updates #7818.
      
      Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: update TestIntendedInlining · cc5dc001
      Daniel Martí authored
      Value.CanInterface and Value.pointer are now inlinable, since we have a
      limited form of mid-stack inlining. Their calls to panic were preventing
      that in previous Go releases. The other three methods still go over
      budget, so update that comment.
      
      In recent commits, sync.Once.Do and multiple lock/unlock methods have
      also been made inlinable, so add those as well. They have standalone
      tests like test/inline_sync.go already, but it's best if the funcs are
      in this global test table too. They aren't inlinable on every platform
      yet, though.
      
      Finally, use math/bits.UintSize to check if GOARCH is 64-bit, now that
      we can.
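For reference, a word-size check based on math/bits.UintSize might look like the sketch below. The constant name `is64Bit` is an assumption; the test in the commit may express the check differently.

```go
package main

import (
	"fmt"
	"math/bits"
)

// is64Bit is true when uint is 64 bits wide on the target architecture.
// bits.UintSize is a constant, so this is resolved at compile time.
const is64Bit = bits.UintSize == 64

func main() {
	fmt.Println(bits.UintSize, is64Bit)
}
```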
      
      Change-Id: I65cc681b77015f7746dba3126637e236dcd494e0
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166461
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • sync: allow inlining the RWMutex.RUnlock fast path · 05051b56
      Carlo Alberto Ferraris authored
      RWMutex.RLock is already inlineable, so add a test for it as well.
      
      name                    old time/op  new time/op  delta
      RWMutexUncontended      66.5ns ± 0%  60.3ns ± 1%  -9.38%  (p=0.000 n=12+20)
      RWMutexUncontended-4    16.7ns ± 0%  15.3ns ± 1%  -8.49%  (p=0.000 n=17+20)
      RWMutexUncontended-16   7.86ns ± 0%  7.69ns ± 0%  -2.08%  (p=0.000 n=18+15)
      RWMutexWrite100         25.1ns ± 0%  24.0ns ± 1%  -4.28%  (p=0.000 n=20+18)
      RWMutexWrite100-4       46.7ns ± 5%  44.1ns ± 4%  -5.53%  (p=0.000 n=20+20)
      RWMutexWrite100-16      68.3ns ±11%  65.7ns ± 8%  -3.81%  (p=0.003 n=20+20)
      RWMutexWrite10          26.7ns ± 1%  25.7ns ± 0%  -3.75%  (p=0.000 n=17+14)
      RWMutexWrite10-4        34.9ns ± 2%  33.8ns ± 2%  -3.15%  (p=0.000 n=20+20)
      RWMutexWrite10-16       37.4ns ± 2%  36.1ns ± 2%  -3.51%  (p=0.000 n=18+20)
      RWMutexWorkWrite100      163ns ± 0%   162ns ± 0%  -0.89%  (p=0.000 n=18+20)
      RWMutexWorkWrite100-4    189ns ± 4%   184ns ± 4%  -2.89%  (p=0.000 n=19+20)
      RWMutexWorkWrite100-16   207ns ± 4%   200ns ± 2%  -3.07%  (p=0.000 n=19+20)
      RWMutexWorkWrite10       153ns ± 0%   151ns ± 1%  -0.75%  (p=0.000 n=20+20)
      RWMutexWorkWrite10-4     177ns ± 1%   176ns ± 2%  -0.63%  (p=0.004 n=17+20)
      RWMutexWorkWrite10-16    191ns ± 2%   189ns ± 1%  -0.83%  (p=0.015 n=20+17)
      
      linux/amd64 bin/go 14688201 (previous commit 14675861, +12340/+0.08%)
      
      The cumulative effect of this and the previous 3 commits is:
      
      name                    old time/op  new time/op  delta
      MutexUncontended        19.3ns ± 1%  16.4ns ± 1%  -15.13%  (p=0.000 n=20+20)
      MutexUncontended-4      5.24ns ± 0%  4.09ns ± 0%  -21.95%  (p=0.000 n=20+18)
      MutexUncontended-16     2.10ns ± 0%  2.12ns ± 0%   +0.95%  (p=0.000 n=15+17)
      Mutex                   19.6ns ± 0%  16.3ns ± 1%  -17.12%  (p=0.000 n=20+20)
      Mutex-4                 54.6ns ± 5%  45.6ns ±10%  -16.51%  (p=0.000 n=20+19)
      Mutex-16                 133ns ± 5%   130ns ± 3%   -1.99%  (p=0.002 n=20+20)
      MutexSlack              33.4ns ± 2%  16.2ns ± 0%  -51.44%  (p=0.000 n=19+20)
      MutexSlack-4             206ns ± 5%   209ns ± 9%     ~     (p=0.154 n=20+20)
      MutexSlack-16           89.4ns ± 1%  90.9ns ± 2%   +1.70%  (p=0.000 n=18+17)
      MutexWork               60.5ns ± 0%  55.3ns ± 1%   -8.59%  (p=0.000 n=12+20)
      MutexWork-4              105ns ± 5%    97ns ±11%   -7.95%  (p=0.000 n=20+20)
      MutexWork-16             157ns ± 1%   158ns ± 1%   +0.66%  (p=0.001 n=18+17)
      MutexWorkSlack          70.2ns ± 5%  55.3ns ± 0%  -21.30%  (p=0.000 n=19+18)
      MutexWorkSlack-4         277ns ±13%   260ns ±15%   -6.35%  (p=0.002 n=20+18)
      MutexWorkSlack-16        156ns ± 0%   146ns ± 1%   -6.40%  (p=0.000 n=16+19)
      MutexNoSpin              966ns ± 0%   976ns ± 1%   +0.97%  (p=0.000 n=15+17)
      MutexNoSpin-4            269ns ± 4%   272ns ± 4%   +1.15%  (p=0.048 n=20+18)
      MutexNoSpin-16           122ns ± 0%   119ns ± 1%   -2.63%  (p=0.000 n=19+15)
      MutexSpin               3.13µs ± 0%  3.12µs ± 0%   -0.17%  (p=0.000 n=18+18)
      MutexSpin-4              826ns ± 1%   833ns ± 1%   +0.84%  (p=0.000 n=19+17)
      MutexSpin-16             397ns ± 1%   394ns ± 1%   -0.78%  (p=0.000 n=19+19)
      Once                    5.67ns ± 0%  2.07ns ± 2%  -63.43%  (p=0.000 n=20+20)
      Once-4                  1.47ns ± 2%  0.54ns ± 3%  -63.49%  (p=0.000 n=19+20)
      Once-16                 0.58ns ± 0%  0.17ns ± 5%  -70.49%  (p=0.000 n=17+17)
      RWMutexUncontended      71.4ns ± 0%  60.3ns ± 1%  -15.60%  (p=0.000 n=16+20)
      RWMutexUncontended-4    18.4ns ± 4%  15.3ns ± 1%  -17.14%  (p=0.000 n=20+20)
      RWMutexUncontended-16   8.01ns ± 0%  7.69ns ± 0%   -3.91%  (p=0.000 n=18+15)
      RWMutexWrite100         24.9ns ± 0%  24.0ns ± 1%   -3.57%  (p=0.000 n=19+18)
      RWMutexWrite100-4       46.5ns ± 3%  44.1ns ± 4%   -5.09%  (p=0.000 n=17+20)
      RWMutexWrite100-16      68.9ns ± 3%  65.7ns ± 8%   -4.65%  (p=0.000 n=18+20)
      RWMutexWrite10          27.1ns ± 0%  25.7ns ± 0%   -5.25%  (p=0.000 n=17+14)
      RWMutexWrite10-4        34.8ns ± 1%  33.8ns ± 2%   -2.96%  (p=0.000 n=20+20)
      RWMutexWrite10-16       37.5ns ± 2%  36.1ns ± 2%   -3.72%  (p=0.000 n=20+20)
      RWMutexWorkWrite100      164ns ± 0%   162ns ± 0%   -1.49%  (p=0.000 n=12+20)
      RWMutexWorkWrite100-4    186ns ± 3%   184ns ± 4%     ~     (p=0.097 n=20+20)
      RWMutexWorkWrite100-16   204ns ± 2%   200ns ± 2%   -1.58%  (p=0.000 n=18+20)
      RWMutexWorkWrite10       153ns ± 0%   151ns ± 1%   -1.21%  (p=0.000 n=20+20)
      RWMutexWorkWrite10-4     179ns ± 1%   176ns ± 2%   -1.25%  (p=0.000 n=19+20)
      RWMutexWorkWrite10-16    191ns ± 1%   189ns ± 1%   -0.94%  (p=0.000 n=15+17)
      
      Change-Id: I9269bf2ac42a04c610624f707d3268dcb17390f8
      Reviewed-on: https://go-review.googlesource.com/c/go/+/152698
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • bytes: return early in Repeat if count is 0 · 0e9d7d43
      Tobias Klauser authored
      This matches the implementation of strings.Repeat and slightly increases
      performance:
      
      name      old time/op  new time/op  delta
      Repeat-8   145ns ±12%   125ns ±29%  -13.35%  (p=0.009 n=10+10)
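A simplified sketch of the early return, assuming a hypothetical helper named `repeat` rather than the real bytes.Repeat (which also guards against overflow of len(b)*count):

```go
package main

import "fmt"

// repeat returns count copies of b. It is a simplified stand-in for
// bytes.Repeat and omits the overflow check.
func repeat(b []byte, count int) []byte {
	if count == 0 {
		return []byte{} // early return: nothing to allocate or copy
	}
	if count < 0 {
		panic("repeat: negative count")
	}
	out := make([]byte, len(b)*count)
	n := copy(out, b)
	for n < len(out) {
		copy(out[n:], out[:n]) // double the filled prefix each round
		n *= 2
	}
	return out
}

func main() {
	fmt.Printf("%q %q\n", repeat([]byte("ab"), 3), repeat([]byte("ab"), 0)) // "ababab" ""
}
```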
      
      Change-Id: Ic0a0e2ea9e36591286a49def320ddb67fe0b2c50
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166399
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • sync: allow inlining the Once.Do fast path · ca835484
      Carlo Alberto Ferraris authored
      Using Once.Do is now extremely cheap because the fast path is just an inlined
      atomic load of a variable that is written only once and a conditional jump.
      This is very beneficial for Once.Do because, due to its nature, the fast path
      will be used for every call after the first one.
      
      In an attempt to minimize the code size increase, reorder the fields so that the
      pointer to Once is also the pointer to Once.done, which is the only field used
      in the hot path. This allows more compact instruction encodings or fewer
      instructions in the hot path (which is inlined at every call site).
      
      name     old time/op  new time/op  delta
      Once     4.54ns ± 0%  2.06ns ± 0%  -54.59%  (p=0.000 n=19+16)
      Once-4   1.18ns ± 0%  0.55ns ± 0%  -53.39%  (p=0.000 n=15+16)
      Once-16  0.53ns ± 0%  0.17ns ± 0%  -67.92%  (p=0.000 n=18+17)
      
      linux/amd64 bin/go 14675861 (previous commit 14663387, +12474/+0.09%)
      
      Change-Id: Ie2708103ab473787875d66746d2f20f1d90a6916
      Reviewed-on: https://go-review.googlesource.com/c/go/+/152697
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • sync: allow inlining the Mutex.Lock fast path · 41cb0aed
      Carlo Alberto Ferraris authored
      name                    old time/op  new time/op  delta
      MutexUncontended        18.9ns ± 0%  16.2ns ± 0%  -14.29%  (p=0.000 n=19+19)
      MutexUncontended-4      4.75ns ± 1%  4.08ns ± 0%  -14.20%  (p=0.000 n=20+19)
      MutexUncontended-16     2.05ns ± 0%  2.11ns ± 0%   +2.93%  (p=0.000 n=19+16)
      Mutex                   19.3ns ± 1%  16.2ns ± 0%  -15.86%  (p=0.000 n=17+19)
      Mutex-4                 52.4ns ± 4%  48.6ns ± 9%   -7.22%  (p=0.000 n=20+20)
      Mutex-16                 139ns ± 2%   140ns ± 3%   +1.03%  (p=0.011 n=16+20)
      MutexSlack              18.9ns ± 1%  16.2ns ± 1%  -13.96%  (p=0.000 n=20+20)
      MutexSlack-4             225ns ± 8%   211ns ±10%   -5.94%  (p=0.000 n=18+19)
      MutexSlack-16           98.4ns ± 1%  90.9ns ± 1%   -7.60%  (p=0.000 n=17+18)
      MutexWork               58.2ns ± 3%  55.4ns ± 0%   -4.82%  (p=0.000 n=20+17)
      MutexWork-4              103ns ± 7%    95ns ±18%   -8.03%  (p=0.000 n=20+20)
      MutexWork-16             163ns ± 2%   155ns ± 2%   -4.47%  (p=0.000 n=18+18)
      MutexWorkSlack          57.7ns ± 1%  55.4ns ± 0%   -3.99%  (p=0.000 n=20+13)
      MutexWorkSlack-4         276ns ±13%   260ns ±10%   -5.64%  (p=0.001 n=19+19)
      MutexWorkSlack-16        147ns ± 0%   156ns ± 1%   +5.87%  (p=0.000 n=14+19)
      MutexNoSpin              968ns ± 0%   900ns ± 1%   -6.98%  (p=0.000 n=20+18)
      MutexNoSpin-4            270ns ± 2%   255ns ± 2%   -5.74%  (p=0.000 n=19+20)
      MutexNoSpin-16           120ns ± 4%   112ns ± 0%   -6.99%  (p=0.000 n=19+14)
      MutexSpin               3.13µs ± 1%  3.19µs ± 6%     ~     (p=0.401 n=20+20)
      MutexSpin-4              832ns ± 2%   831ns ± 1%   -0.17%  (p=0.023 n=16+18)
      MutexSpin-16             395ns ± 0%   399ns ± 0%   +0.94%  (p=0.000 n=17+19)
      RWMutexUncontended      69.5ns ± 0%  68.4ns ± 0%   -1.59%  (p=0.000 n=20+20)
      RWMutexUncontended-4    17.5ns ± 0%  16.7ns ± 0%   -4.30%  (p=0.000 n=18+17)
      RWMutexUncontended-16   7.92ns ± 0%  7.87ns ± 0%   -0.61%  (p=0.000 n=18+17)
      RWMutexWrite100         24.9ns ± 1%  25.0ns ± 1%   +0.32%  (p=0.000 n=20+20)
      RWMutexWrite100-4       46.2ns ± 4%  46.2ns ± 5%     ~     (p=0.840 n=19+20)
      RWMutexWrite100-16      69.9ns ± 5%  69.9ns ± 3%     ~     (p=0.545 n=20+19)
      RWMutexWrite10          27.0ns ± 2%  26.8ns ± 2%   -0.98%  (p=0.001 n=20+20)
      RWMutexWrite10-4        34.7ns ± 2%  35.0ns ± 4%     ~     (p=0.191 n=18+20)
      RWMutexWrite10-16       37.2ns ± 4%  37.3ns ± 2%     ~     (p=0.438 n=20+19)
      RWMutexWorkWrite100      164ns ± 0%   163ns ± 0%   -0.24%  (p=0.025 n=20+20)
      RWMutexWorkWrite100-4    193ns ± 3%   191ns ± 2%   -1.06%  (p=0.027 n=20+20)
      RWMutexWorkWrite100-16   210ns ± 3%   207ns ± 3%   -1.22%  (p=0.038 n=20+20)
      RWMutexWorkWrite10       153ns ± 0%   153ns ± 0%     ~     (all equal)
      RWMutexWorkWrite10-4     178ns ± 2%   179ns ± 2%     ~     (p=0.186 n=20+20)
      RWMutexWorkWrite10-16    192ns ± 2%   192ns ± 2%     ~     (p=0.731 n=19+20)
      
      linux/amd64 bin/go 14663387 (previous commit 14630572, +32815/+0.22%)
      
      Change-Id: I98171006dce14069b1a62da07c3d165455a7906b
      Reviewed-on: https://go-review.googlesource.com/c/go/+/148959
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: reverse order of slice bounds checks · 83a33d38
      Keith Randall authored
      Turns out this makes the fix for #28797 unnecessary, because this order
      ensures that the RHS of IsSliceInBounds ops are always nonnegative.
      
      The real reason for this change is that it also makes dealing with
      <0 values easier for reporting values in bounds check panics (issue #30116).
      
      Makes cmd/go negligibly smaller.
      
      Update #28797
      
      Change-Id: I1f25ba6d2b3b3d4a72df3105828aa0a4b629ce85
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166377
      Run-TryBot: Keith Randall <khr@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/link: enable DWARF with external linker on aix/ppc64 · 3cf89e50
      Clément Chigot authored
      To allow DWARF with ld, the symbol table is adapted.
      In internal linkmode, each package is treated as a .FILE. However,
      the current version of ld crashes on a few programs because of
      relocations between DWARF symbols. Treating all packages as part of
      one .FILE seems to bypass this bug.
      As it might be fixed in a future release, the size of each package
      in DWARF sections is still retrieved and can be used once it is fixed.
      Moreover, this improves internal linkmode, which should have done it
      anyway.
      
      Change-Id: If3d023fe118b24b9f0f46d201a4849eee8d5e333
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164006
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • debug/gosym: simplify parsing symbol name rule · b37b35ed
      LE Manh Cuong authored
      A symbol name with a linker prefix like "type." or "go." is not parsed
      correctly: the prefix is returned as part of the package name.
      
      So just return the empty string for symbol names that start with a linker prefix.
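The rule can be sketched as a standalone function. `packageName` is a hypothetical helper written for this example; it is modeled on the behavior the message describes, not copied from debug/gosym.

```go
package main

import (
	"fmt"
	"strings"
)

// packageName returns the package part of a symbol name, or "" if there is
// none. Names starting with the linker prefixes "go." or "type." are treated
// as belonging to no package.
func packageName(name string) string {
	if strings.HasPrefix(name, "go.") || strings.HasPrefix(name, "type.") {
		return ""
	}
	// Only look for the package/identifier dot after the last slash, so dots
	// inside an import path such as "github.com" are skipped.
	pathEnd := strings.LastIndex(name, "/")
	if pathEnd < 0 {
		pathEnd = 0
	}
	if i := strings.Index(name[pathEnd:], "."); i != -1 {
		return name[:pathEnd+i]
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", packageName("runtime.main"))             // "runtime"
	fmt.Printf("%q\n", packageName("github.com/user/pkg.Func")) // "github.com/user/pkg"
	fmt.Printf("%q\n", packageName("type..eq.[2]int"))          // ""
}
```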
      
      Fixes #29551
      
      Change-Id: Idb4ce872345e5781a5a5da2b2146faeeebd9e63b
      Reviewed-on: https://go-review.googlesource.com/c/go/+/156397
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. 08 Mar, 2019 23 commits
  3. 07 Mar, 2019 6 commits
    • cmd/link: fix suspicious code in emitPcln · 3a62f4ee
      Cherry Zhang authored
      In cmd/link/internal/ld/pcln.go:emitPcln, the code and the
      comment don't match. I think the comment is right. Fix the code.
      
      As a consequence, on Linux/AMD64, internal linking with PIE
      buildmode with cgo (at least the cgo packages in the standard
      library) now works. Add a test.
      
      Change-Id: I091cf81ba89571052bc0ec1fa0a6a688dec07b04
      Reviewed-on: https://go-review.googlesource.com/c/go/+/166017
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile/internal/ssa: set OFOR bBody.Pos to AST Pos · 7afd58d4
      Peter Waller authored
      Assign SSA OFOR's bBody.Pos to AST (*Node).Pos as it is created.
      
      An empty for loop has no other information which may be used to give
      correct position information in the resulting executable. Such a for
      loop may compile to a single `JMP *self` and it is important that the
      location of this is in the right place.
      
      Fixes #30167.
      
      Change-Id: Iec44f0281c462c33fac6b7b8ccfc2ef37434c247
      Reviewed-on: https://go-review.googlesource.com/c/go/+/163019
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: optimize arm64 comparison of x and 0.0 with "FCMP $(0.0), Fn" · 27cce773
      fanzha02 authored
      Code:
      func comp(x float64) bool {return x < 0}
      
      Previous version:
        FMOVD	"".x(FP), F0
        FMOVD	ZR, F1
        FCMPD	F1, F0
        CSET	MI, R0
        MOVB	R0, "".~r1+8(FP)
        RET	(R30)
      
      Optimized version:
        FMOVD	"".x(FP), F0
        FCMPD	$(0.0), F0
        CSET	MI, R0
        MOVB	R0, "".~r1+8(FP)
        RET	(R30)
      
      Math package benchmark results:
      name                  old time/op       new time/op    delta
      Acos-8                77.5ns ± 0%       77.4ns ± 0%   -0.13%  (p=0.000 n=9+10)
      Acosh-8               98.6ns ± 0%       98.1ns ± 0%   -0.51%  (p=0.000 n=10+9)
      Asin-8                67.6ns ± 0%       66.6ns ± 0%   -1.48%  (p=0.000 n=9+10)
      Asinh-8                108ns ± 0%        109ns ± 0%   +0.93%  (p=0.000 n=10+10)
      Atan-8           36.788889ns ± 0%       36.0ns ± 0%   -2.14%  (p=0.000 n=9+10)
      Atanh-8                104ns ± 0%        105ns ± 0%   +0.96%  (p=0.000 n=10+10)
      Atan2-8               67.1ns ± 0%       66.6ns ± 0%   -0.75%  (p=0.000 n=10+10)
      Cbrt-8                89.1ns ± 0%       82.0ns ± 0%   -7.97%  (p=0.000 n=10+10)
      Erf-8                 43.5ns ± 0%       43.0ns ± 0%   -1.15%  (p=0.000 n=10+10)
      Erfc-8                49.0ns ± 0%      48.22ns ± 0%   -1.59%  (p=0.000 n=9+10)
      Erfinv-8              59.1ns ± 0%       58.6ns ± 0%   -0.85%  (p=0.000 n=10+10)
      Erfcinv-8             59.1ns ± 0%       58.6ns ± 0%   -0.85%  (p=0.000 n=10+10)
      Expm1-8               56.6ns ± 0%      56.04ns ± 0%   -0.99%  (p=0.000 n=8+10)
      Exp2Go-8              97.6ns ± 0%       99.4ns ± 0%   +1.84%  (p=0.000 n=10+10)
      Dim-8                 2.50ns ± 0%       2.25ns ± 0%  -10.00%  (p=0.000 n=10+10)
      Mod-8                  108ns ± 0%        106ns ± 0%   -1.85%  (p=0.000 n=8+8)
      Frexp-8               12.0ns ± 0%       12.5ns ± 0%   +4.17%  (p=0.000 n=10+10)
      Gamma-8               67.1ns ± 0%       67.6ns ± 0%   +0.75%  (p=0.000 n=10+10)
      Hypot-8               17.1ns ± 0%       17.0ns ± 0%   -0.58%  (p=0.002 n=8+10)
      Ilogb-8               9.01ns ± 0%       8.51ns ± 0%   -5.55%  (p=0.000 n=10+9)
      J1-8                   288ns ± 0%        287ns ± 0%   -0.35%  (p=0.000 n=10+10)
      Jn-8                   605ns ± 0%        604ns ± 0%   -0.17%  (p=0.001 n=8+9)
      Logb-8                10.6ns ± 0%       10.5ns ± 0%   -0.94%  (p=0.000 n=9+10)
      Log2-8                16.5ns ± 0%       17.0ns ± 0%   +3.03%  (p=0.000 n=10+10)
      PowFrac-8              232ns ± 0%        233ns ± 0%   +0.43%  (p=0.000 n=10+10)
      Remainder-8           70.6ns ± 0%       69.6ns ± 0%   -1.42%  (p=0.000 n=10+10)
      SqrtGoLatency-8       77.6ns ± 0%       76.6ns ± 0%   -1.29%  (p=0.000 n=10+10)
      Tanh-8                97.6ns ± 0%       94.1ns ± 0%   -3.59%  (p=0.000 n=10+10)
      Y1-8                   289ns ± 0%        288ns ± 0%   -0.35%  (p=0.000 n=10+10)
      Yn-8                   603ns ± 0%        589ns ± 0%   -2.32%  (p=0.000 n=10+10)
      
      Change-Id: I6920734f8662b329aa58f5b8e4eeae73b409984d
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164719
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: change the condition flags of floating-point comparisons in arm64 backend · 6efd51c6
      fanzha02 authored
      The current compiler reverses operands to work around NaN in
      "less than" and "less than or equal" comparisons. But if we
      want to use "FCMPD/FCMPS $(0.0), Fn" for optimization,
      that workaround does not work, because the assembler does
      not support the instruction "FCMPD/FCMPS Fn, $(0.0)".
      
      This CL sets condition flags for floating-point comparisons
      to resolve this problem.
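As background for why NaN needs special care here: in Go, every ordered comparison involving NaN must evaluate to false, whichever condition flags the backend picks. The small program below only demonstrates that language-level behavior; the helper names `less` and `lessEqual` are assumptions, and it says nothing about the arm64 instruction selection itself.

```go
package main

import (
	"fmt"
	"math"
)

// less and lessEqual are the two comparisons the commit message mentions.
func less(x, y float64) bool      { return x < y }
func lessEqual(x, y float64) bool { return x <= y }

func main() {
	nan := math.NaN()
	// All four comparisons involve NaN and therefore report false.
	fmt.Println(less(nan, 0), lessEqual(nan, 0), less(0, nan), lessEqual(0, nan))
	// Ordinary operands behave as usual.
	fmt.Println(less(-1, 0), lessEqual(0, 0))
}
```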
      
      Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33
      Reviewed-on: https://go-review.googlesource.com/c/go/+/164718
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: remove work-arounds for 0o/0O octals · a77f85a6
      Robert Griesemer authored
      With math/big supporting the new octal prefixes directly,
      the compiler doesn't have to manually convert such numbers
      into old-style 0-prefix octals anymore.
      
      Updates #12711.
      
      Change-Id: I300bdd095836595426a1478d68da179f39e5531a
      Reviewed-on: https://go-review.googlesource.com/c/go/+/165861
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • math/big: support new octal prefixes 0o and 0O · 129c6e44
      Robert Griesemer authored
      This CL extends the various SetString and Parse methods for
      Ints, Rats, and Floats to accept the new octal prefixes.
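A short usage sketch for big.Int, assuming a Go release that includes this change. Passing base 0 to SetString lets the prefix select the base; the helper name `parseInt` is introduced here for illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseInt parses s with base 0, so the prefix ("0x", "0b", "0o", "0O",
// or a bare leading "0") determines the base.
func parseInt(s string) (*big.Int, bool) {
	return new(big.Int).SetString(s, 0)
}

func main() {
	for _, s := range []string{"0o17", "0O17", "017"} {
		v, ok := parseInt(s)
		fmt.Println(s, v, ok) // each parses as decimal 15
	}
}
```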
      
      The main change is in natconv.go, all other changes are
      documentation and test updates.
      
      Finally, this CL also fixes TestRatSetString which silently
      dropped certain failures.
      
      Updates #12711.
      
      Change-Id: I5ee5879e25013ba1e6eda93ff280915f25ab5d55
      Reviewed-on: https://go-review.googlesource.com/c/go/+/165898
      Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>