1. 17 Feb, 2017 12 commits
    • cmd/compile: check both syms when folding address into load/store on ARM64 · 3557d546
      Cherry Zhang authored
      The rules for folding addresses into loads/stores check that sym1 is
      not on the stack (because the stack offset is not known at that point).
      But sym1 could be nil, which invalidates the check. Check the merged
      sym instead.
      
      Fixes #19137.
      
      Change-Id: I8574da22ced1216bb5850403d8f08ec60a8d1005
      Reviewed-on: https://go-review.googlesource.com/37145
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • math/bits: fix benchmarks (make sure calls don't get optimized away) · 3a239a6a
      Robert Griesemer authored
      Sum up function results and store them in an exported (global)
      variable. This prevents the compiler from optimizing away the
      otherwise side-effect free function calls.
      
      We now have a more realistic set of benchmark numbers...
      
      Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
      
      Note: These measurements are based on the same "old"
      implementation as the prior measurements (commit 7d5c003a).
      
      benchmark                     old ns/op     new ns/op     delta
      BenchmarkReverse-8            72.9          8.50          -88.34%
      BenchmarkReverse8-8           13.2          2.17          -83.56%
      BenchmarkReverse16-8          21.2          2.89          -86.37%
      BenchmarkReverse32-8          36.3          3.55          -90.22%
      BenchmarkReverse64-8          71.3          6.81          -90.45%
      BenchmarkReverseBytes-8       11.2          3.49          -68.84%
      BenchmarkReverseBytes16-8     6.24          0.93          -85.10%
      BenchmarkReverseBytes32-8     7.40          1.55          -79.05%
      BenchmarkReverseBytes64-8     10.5          2.47          -76.48%
      
      name              old time/op  new time/op  delta
      Reverse-8         72.9ns ± 0%   8.5ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse8-8        13.2ns ± 0%   2.2ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse16-8       21.2ns ± 0%   2.9ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse32-8       36.3ns ± 0%   3.5ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse64-8       71.3ns ± 0%   6.8ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes-8    11.2ns ± 0%   3.5ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes16-8  6.24ns ± 0%  0.93ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes32-8  7.40ns ± 0%  1.55ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes64-8  10.5ns ± 0%   2.5ns ± 0%   ~     (p=1.000 n=1+1)
      
      Change-Id: I8aef1334b84f6cafd25edccad7e6868b37969efb
      Reviewed-on: https://go-review.googlesource.com/37213
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
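The sink technique described in this commit message can be sketched as follows. This is a minimal illustration, not the benchmark code from the CL: the names `Sink` and `benchmarkReverse64` are assumptions made for the example, and `testing.Benchmark` is used only so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"math/bits"
	"testing"
)

// Sink is a hypothetical exported global. Storing the accumulated result
// here gives the loop below an observable side effect, so the compiler
// cannot discard the otherwise side-effect-free calls.
var Sink uint64

// benchmarkReverse64 sums the results of bits.Reverse64 and publishes the
// sum through Sink once the loop is done.
func benchmarkReverse64(b *testing.B) {
	var s uint64
	for i := 0; i < b.N; i++ {
		s += bits.Reverse64(uint64(i))
	}
	Sink = s
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(benchmarkReverse64)
	fmt.Println(r)
}
```

Without the final store into `Sink`, the compiler may be free to drop the calls entirely, which is one plausible explanation for the sub-nanosecond figures in the earlier measurements.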
    • math/bits: much faster ReverseBytes, added respective benchmarks · ddb15cea
      Robert Griesemer authored
      Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
      
      benchmark                     old ns/op     new ns/op     delta
      BenchmarkReverseBytes-8       11.4          3.51          -69.21%
      BenchmarkReverseBytes16-8     6.87          0.64          -90.68%
      BenchmarkReverseBytes32-8     7.79          0.65          -91.66%
      BenchmarkReverseBytes64-8     11.6          0.64          -94.48%
      
      name              old time/op  new time/op  delta
      ReverseBytes-8    11.4ns ± 0%   3.5ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes16-8  6.87ns ± 0%  0.64ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes32-8  7.79ns ± 0%  0.65ns ± 0%   ~     (p=1.000 n=1+1)
      ReverseBytes64-8  11.6ns ± 0%   0.6ns ± 0%   ~     (p=1.000 n=1+1)
      
      Change-Id: I67b529652b3b613c61687e9e185e8d4ee40c51a2
      Reviewed-on: https://go-review.googlesource.com/37211
      Run-TryBot: Robert Griesemer <gri@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • math/bits: much faster Reverse, added respective benchmarks · 7d5c003a
      Robert Griesemer authored
      Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
      
      name         old time/op  new time/op  delta
      Reverse-8    76.6ns ± 0%   8.1ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse8-8   12.6ns ± 0%   0.6ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse16-8  20.8ns ± 0%   0.6ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse32-8  36.5ns ± 0%   0.6ns ± 0%   ~     (p=1.000 n=1+1)
      Reverse64-8  74.0ns ± 0%   6.4ns ± 0%   ~     (p=1.000 n=1+1)
      
      benchmark                old ns/op     new ns/op     delta
      BenchmarkReverse-8       76.6          8.07          -89.46%
      BenchmarkReverse8-8      12.6          0.64          -94.92%
      BenchmarkReverse16-8     20.8          0.64          -96.92%
      BenchmarkReverse32-8     36.5          0.64          -98.25%
      BenchmarkReverse64-8     74.0          6.38          -91.38%
      
      Change-Id: I6b99b10cee2f2babfe79342b50ee36a45a34da30
      Reviewed-on: https://go-review.googlesource.com/37149
      Run-TryBot: Robert Griesemer <gri@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: fix some types in SSA · c4b8dadb
      Cherry Zhang authored
      These do not seem to really matter, but it is good to be correct.
      
      Change-Id: I02edb9797c3d6739725cfbe4723c75f151acd05e
      Reviewed-on: https://go-review.googlesource.com/36837
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: redo writebarrier pass · c4ef597c
      Cherry Zhang authored
      SSA's writebarrier pass requires that WB store ops always be at the
      end of a block. If we move write barrier insertion into SSA and
      emit normal Store ops when building SSA, this requirement becomes
      impractical: it would create too many blocks for all the Store
      ops.
      
      Redo SSA's writebarrier pass to explicitly order values in store
      order, so it no longer needs this requirement.
      
      Updates #17583.
      Fixes #19067.
      
      Change-Id: I66e817e526affb7e13517d4245905300a90b7170
      Reviewed-on: https://go-review.googlesource.com/36834
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: re-enable nilcheck removal in same block · 98061fa5
      Cherry Zhang authored
      Nil check removal in the same block is disabled due to issue 18725:
      because the values are not ordered, a nilcheck may influence a
      value that is logically before it. This CL re-enables same-block
      nilcheck removal by ordering values in store order first.
      
      Updates #18725.
      
      Change-Id: I287a38525230c14c5412cbcdbc422547dabd54f6
      Reviewed-on: https://go-review.googlesource.com/35496
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • math/bits: expand doc strings for all functions · 81acd308
      Robert Griesemer authored
      Follow-up on https://go-review.googlesource.com/36315.
      No functionality change.
      
      For #18616.
      
      Change-Id: Id4df34dd7d0381be06eea483a11bf92f4a01f604
      Reviewed-on: https://go-review.googlesource.com/37140
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • all: fix a few typos in comments · 045ad5ba
      Koki Ide authored
      Change-Id: I0455ffaa51c661803d8013c7961910f920d3c3cc
      Reviewed-on: https://go-review.googlesource.com/37043
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • sync: make Mutex more fair · 0556e262
      Dmitry Vyukov authored
      Add a new starvation mode for Mutex.
      In starvation mode, ownership is handed off directly from the
      unlocking goroutine to the next waiter. Newly arriving goroutines
      don't compete for ownership.
      Unfair wait time is now limited to 1ms.
      Also fix a long-standing bug where goroutines were requeued
      at the tail of the wait queue. That led to even more unfair
      acquisition times with multiple waiters.
      
      Performance of normal mode is not considerably affected.
      
      Fixes #13086
      
      On the lockskew program provided in the issue:
      
      done in 1.207853ms
      done in 1.177451ms
      done in 1.184168ms
      done in 1.198633ms
      done in 1.185797ms
      done in 1.182502ms
      done in 1.316485ms
      done in 1.211611ms
      done in 1.182418ms
      
      name                    old time/op  new time/op   delta
      MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
      Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
      MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
      MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
      MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
      MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
      MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
      Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
      RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
      RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
      RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
      RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
      Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
      MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
      Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
      MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
      MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
      MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
      MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
      MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
      RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
      RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
      RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
      RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
      Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
      MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
      Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
      MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
      MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
      MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
      MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
      MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
      RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
      RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
      RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
      RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
      Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
      MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
      Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
      MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
      MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
      MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
      MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
      MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
      RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
      RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
      RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
      RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
      Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
      MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
      Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
      MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
      MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
      MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
      MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
      MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
      RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
      RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
      RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
      RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
      Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
      MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
      Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
      MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
      MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
      MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
      MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
      MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
      RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
      RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
      RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
      RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)
      
      Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
      Reviewed-on: https://go-review.googlesource.com/34310
      Run-TryBot: Russ Cox <rsc@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
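The kind of contention this commit targets can be reproduced with a small program along the following lines. It is a sketch modeled on the scenario the message describes (one goroutine that re-locks immediately after unlocking, and one that asks for the lock only occasionally). The function name `contend` and the 100µs intervals are assumptions made for illustration, and this is not necessarily the exact lockskew program from the issue.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// contend starts a "greedy" goroutine that holds mu for 100µs at a time and
// re-locks right after unlocking, while the calling goroutine makes n short
// acquisitions. It returns how long those n acquisitions took.
func contend(n int) time.Duration {
	var mu sync.Mutex
	var wg sync.WaitGroup
	done := make(chan struct{})

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				mu.Lock()
				time.Sleep(100 * time.Microsecond)
				mu.Unlock()
			}
		}
	}()

	start := time.Now()
	for i := 0; i < n; i++ {
		time.Sleep(100 * time.Microsecond)
		// Without a fairness bound, the greedy goroutine can keep winning
		// the race for mu and this Lock can wait a long time.
		mu.Lock()
		mu.Unlock()
	}
	elapsed := time.Since(start)

	close(done)
	wg.Wait()
	return elapsed
}

func main() {
	fmt.Println("done in", contend(10))
}
```

With starvation mode, a waiter that has been blocked for longer than about 1ms should receive the lock directly from the unlocking goroutine, so the total time stays bounded. Exact timings depend on the machine and scheduler.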
    • syscall: only call setgroups if we need to · 79f6a5c7
      Wander Lairson Costa authored
      If the caller sets up a Credential in os/exec.Command,
      os/exec.Command.Start will end up calling setgroups(2), even if no
      supplementary groups were given.
      
      Only root can call setgroups(2) on BSD kernels, which causes Start to
      fail for non-root users when they try to set uid and gid for the new
      process.
      
      We fix this by introducing a new field to syscall.Credential named
      NoSetGroups; setgroups(2) is only called if it is false.
      The field uses inverted logic to preserve backward
      compatibility.
      
      RELNOTES=yes
      
      Change-Id: I3cff1f21c117a1430834f640ef21fd4e87e06804
      Reviewed-on: https://go-review.googlesource.com/36697
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
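A caller could opt out of the setgroups(2) call roughly as shown below. This is a sketch for Unix-like systems only. The helper name `commandAs` and the uid/gid values are hypothetical, and the command is built but deliberately not started.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAs is a hypothetical helper that prepares a command to run under
// the given uid and gid without touching supplementary groups.
func commandAs(uid, gid uint32, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid: uid,
			Gid: gid,
			// The zero value (false) keeps the old behavior of calling
			// setgroups(2); true skips the call.
			NoSetGroups: true,
		},
	}
	return cmd
}

func main() {
	// 1000/1000 are placeholder IDs chosen for the example.
	cmd := commandAs(1000, 1000, "id")
	fmt.Println("NoSetGroups:", cmd.SysProcAttr.Credential.NoSetGroups)
}
```

Because the field defaults to false, existing code that never sets it keeps calling setgroups(2) as before, which is the backward-compatibility point made in the message.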
    • cmd/compile: move constant divide strength reduction to SSA rules · 708ba22a
      Keith Randall authored
      Currently the conversion from constant divides to multiplies is mostly
      done during the walk pass.  This is suboptimal because SSA can
      determine that the value being divided by is constant more often
      (e.g. after inlining).
      
      Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
      Reviewed-on: https://go-review.googlesource.com/37015
      Run-TryBot: Keith Randall <khr@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
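To illustrate what converting a constant divide into a multiply means, here is a hand-written sketch for unsigned 32-bit division by 3. It shows the general idea only and is not the compiler's actual rewrite rule. The function name `div3` is an assumption, and the magic constant is the well-known value ceil(2^33 / 3).

```go
package main

import "fmt"

// div3 computes x/3 for a uint32 using a multiply and a shift instead of a
// divide instruction.
func div3(x uint32) uint32 {
	const magic = 0xAAAAAAAB // ceil(2^33 / 3)
	// The 64-bit product cannot overflow: both factors are below 2^32.
	return uint32((uint64(x) * magic) >> 33)
}

func main() {
	for _, x := range []uint32{0, 1, 2, 3, 100, 1<<32 - 1} {
		fmt.Println(x, div3(x), x/3)
	}
}
```

The benefit of doing this in SSA is that a divisor which only becomes a known constant after inlining (for example, a helper called with a literal argument) can still be rewritten this way.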
  2. 16 Feb, 2017 12 commits
  3. 15 Feb, 2017 16 commits