1. 18 Mar, 2019 4 commits
    • cmd/go: allow -o to point to a folder that writes multiple execs · b48bda9c
      Daniel Theophanes authored
      If -o points to a directory that exists then allow multiple
      executables to be written to that directory.
      
      Fixes #14295
      
      Change-Id: Ic951e637c70a2ada5e7534bae9a43901a39fe2c5
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167679
      Run-TryBot: Daniel Theophanes <kardianos@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Jay Conrod <jayconrod@google.com>
    • runtime: replace division by span element size by multiply and shifts · 6ca51f78
      Martin Möhrmann authored
      Divisions are generally slow. The compiler can optimize a division
      to use a sequence of faster multiplies and shifts (magic constants)
      if the divisor is known at compile time.
      
      The value of the span element size in mcentral.grow is not known at
      compile time but magic constants to compute n / span.elementsize
      are already stored in class_to_divmagic and mspan.
      They however need to be adjusted to work for
      (0 <= n <= span.npages * pagesize) instead of
      (0 <= n <  span.npages * pagesize).
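
      As an illustration of the technique only (the runtime's stored
      constants and exact formula may differ), the sketch below divides
      by a runtime-known d with one multiply and one shift. The names
      magic and divMagic are hypothetical. The loop in main checks the
      inclusive upper bound n == spanBytes.

```go
package main

import "fmt"

// magic returns m = ceil(2^32 / d), computed in 64 bits so that d == 1
// does not overflow. Hypothetical helper, not the runtime's table.
func magic(d uint32) uint64 {
	return uint64(^uint32(0))/uint64(d) + 1
}

// divMagic computes n/d as (n*m)>>32, where m = magic(d).
// The result is exact whenever n*d < 2^32.
func divMagic(n uint32, m uint64) uint32 {
	return uint32((uint64(n) * m) >> 32)
}

func main() {
	const elemSize, spanBytes = 48, 8192 // assumed example sizes
	m := magic(elemSize)
	for n := uint32(0); n <= spanBytes; n++ { // note: <=, not <
		if divMagic(n, m) != n/elemSize {
			panic("mismatch")
		}
	}
	fmt.Println(divMagic(spanBytes, m), spanBytes/elemSize)
}
```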
      
      Change-Id: Ieea59f1c94525a88d012f2557d43691967900deb
      Reviewed-on: https://go-review.googlesource.com/c/go/+/148057
      Run-TryBot: Martin Möhrmann <moehrmann@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • encoding/json: fix performance regression in the decoder · e5f6e2d1
      Daniel Martí authored
      In golang.org/cl/145218, a feature was added where the JSON decoder
      would keep track of the entire path to a field when reporting an
      UnmarshalTypeError.
      
      However, we all failed to check if this affected the benchmarks - myself
      included, as a reviewer. Below are the numbers comparing the CL's parent
      with itself, once it was merged:
      
      name           old time/op    new time/op    delta
      CodeDecoder-8    12.9ms ± 1%    28.2ms ± 2%   +119.33%  (p=0.002 n=6+6)
      
      name           old speed      new speed      delta
      CodeDecoder-8   151MB/s ± 1%    69MB/s ± 3%    -54.40%  (p=0.002 n=6+6)
      
      name           old alloc/op   new alloc/op   delta
      CodeDecoder-8    2.74MB ± 0%  109.39MB ± 0%  +3891.83%  (p=0.002 n=6+6)
      
      name           old allocs/op  new allocs/op  delta
      CodeDecoder-8     77.5k ± 0%    168.5k ± 0%   +117.30%  (p=0.004 n=6+5)
      
      The decoder got twice as slow because it now allocated ~40x as much
      memory and about twice as many objects, which puts a lot of pressure
      on the garbage collector.
      
      The reason is that the CL concatenated strings every time a nested field
      was decoded. In other words, practically every field generated garbage
      when decoded. This is hugely wasteful, especially considering that the
      vast majority of JSON decoding inputs won't return UnmarshalTypeError.
      
      Instead, use a stack of fields, and make sure to always use the same
      backing array, to ensure we only need to grow the slice to the maximum
      depth once.
      
      The original CL also introduced a bug. The field stack string wasn't
      reset to its original state when reaching "d.opcode == scanEndObject",
      so the last field in a decoded struct could leak. For example, an added
      test decodes a list of structs, and encoding/json before this CL would
      fail:
      
      	got:  cannot unmarshal string into Go struct field T.Ts.Y.Y.Y of type int
      	want: cannot unmarshal string into Go struct field T.Ts.Y of type int
      
      To fix that, simply reset the stack after decoding every field, even if
      it's the last.
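
      A minimal sketch of the field-stack idea, assuming a hypothetical
      decoder type (this is not the encoding/json implementation). The
      stack is truncated after every field, including the last one, so
      the backing array is reused and the dotted path is only built when
      an error is reported.

```go
package main

import (
	"fmt"
	"strings"
)

// decoder is a hypothetical stand-in for the JSON decode state.
type decoder struct {
	fieldStack []string // path to the field currently being decoded
}

// withField pushes name, runs decode, then truncates the stack back to
// its previous length. Truncating keeps the backing array, so after the
// slice has grown to the maximum depth, append no longer allocates.
func (d *decoder) withField(name string, decode func() error) error {
	orig := len(d.fieldStack)
	d.fieldStack = append(d.fieldStack, name)
	err := decode()
	d.fieldStack = d.fieldStack[:orig] // reset even after the last field
	return err
}

// typeError joins the path only on the (rare) error path.
func (d *decoder) typeError(structName, typ string) error {
	return fmt.Errorf("cannot unmarshal string into Go struct field %s.%s of type %s",
		structName, strings.Join(d.fieldStack, "."), typ)
}

func main() {
	d := &decoder{}
	var err error
	// Decode a list of three structs under field Ts, each with a field Y.
	d.withField("Ts", func() error {
		for i := 0; i < 3; i++ {
			err = d.withField("Y", func() error { return d.typeError("T", "int") })
		}
		return nil
	})
	fmt.Println(err)
}
```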
      
      Below is the original performance versus this CL. There's a tiny
      performance hit, probably due to the append for every decoded field, but
      at least we're back to the usual ~150MB/s.
      
      name           old time/op    new time/op    delta
      CodeDecoder-8    12.9ms ± 1%    13.0ms ± 1%  +1.25%  (p=0.009 n=6+6)
      
      name           old speed      new speed      delta
      CodeDecoder-8   151MB/s ± 1%   149MB/s ± 1%  -1.24%  (p=0.009 n=6+6)
      
      name           old alloc/op   new alloc/op   delta
      CodeDecoder-8    2.74MB ± 0%    2.74MB ± 0%  +0.00%  (p=0.002 n=6+6)
      
      name           old allocs/op  new allocs/op  delta
      CodeDecoder-8     77.5k ± 0%     77.5k ± 0%  +0.00%  (p=0.002 n=6+6)
      
      Finally, make all of these benchmarks report allocs by default. The
      decoder ones are pretty sensitive to generated garbage, so ReportAllocs
      would have made the performance regression more obvious.
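
      For reference, a benchmark opts in to allocation reporting with
      b.ReportAllocs. The benchmark below is an illustrative example, not
      one of the encoding/json benchmarks, and it is run via
      testing.Benchmark so that it works as a standalone program.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// benchmarkDecode is an illustrative benchmark with an assumed input.
func benchmarkDecode(b *testing.B) {
	data := []byte(`{"A":{"B":{"C":1}}}`)
	b.ReportAllocs() // include B/op and allocs/op in the result
	for i := 0; i < b.N; i++ {
		var v struct{ A struct{ B struct{ C int } } }
		if err := json.Unmarshal(data, &v); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	res := testing.Benchmark(benchmarkDecode)
	fmt.Println(res.String(), res.MemString())
}
```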
      
      Change-Id: I67b50f86b2e72f55539429450c67bfb1a9464b67
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167978
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • internal/bytealg: share code for IndexByte functions on arm · 3496ff1d
      Tobias Klauser authored
      Move the shared code of IndexByte and IndexByteString into
      indexbytebody. This will allow optimizations (e.g. for #29001)
      to be implemented in a single function.
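
      The actual change is in arm assembly. As a rough Go analogy only
      (requires Go 1.18 generics, which postdate this commit), two entry
      points can delegate to one shared body so that an optimization
      lands in a single place:

```go
package main

import "fmt"

// indexByteBody is a hypothetical shared implementation; the real
// indexbytebody is an assembly routine.
func indexByteBody[T string | []byte](s T, c byte) int {
	for i := 0; i < len(s); i++ {
		if s[i] == c {
			return i
		}
	}
	return -1
}

// IndexByte and IndexByteString are thin wrappers around the shared body.
func IndexByte(b []byte, c byte) int       { return indexByteBody(b, c) }
func IndexByteString(s string, c byte) int { return indexByteBody(s, c) }

func main() {
	fmt.Println(IndexByte([]byte("gopher"), 'p'), IndexByteString("gopher", 'z'))
}
```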
      
      Change-Id: I1d550da8eb65f95e492a460a12058cc35b1162b6
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167939
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. 17 Mar, 2019 3 commits
    • os: only fallback to root directory if $HOME fails for UserHomeDir · bedb6a18
      Elias Naur authored
      UserHomeDir always returns "/" for platforms where the home directory
      is not always well defined. However, the user might set HOME before
      running a Go program on those platforms and on at least iOS, HOME
      is actually set to something useful (the root of the app specific
      writable directory).
      
      This CL changes UserHomeDir to use the root directory "/" only if
      $HOME is empty.
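
      The new behavior for those platforms can be sketched as follows.
      The function name and the injected getenv parameter are
      assumptions for illustration; this is not the os package source.

```go
package main

import (
	"fmt"
	"os"
)

// userHomeDir is a hypothetical simplification of the described logic:
// prefer $HOME and use the root directory only when $HOME is empty.
func userHomeDir(getenv func(string) string) string {
	if home := getenv("HOME"); home != "" {
		return home
	}
	return "/"
}

func main() {
	fmt.Println(userHomeDir(os.Getenv))
}
```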
      
      Change-Id: Icaa01de53cd585d527d9a23b1629375d6b7f67e9
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167802
      Run-TryBot: Elias Naur <mail@eliasnaur.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
    • internal/bytealg: use word-wise comparison for Equal on arm · a734601b
      Tobias Klauser authored
      Follow CL 165338 and use word-wise comparison for aligned buffers in
      Equal on arm, otherwise fall back to the current byte-wise comparison.
      
      name                 old time/op    new time/op    delta
      Equal/0-4              25.7ns ± 1%    23.5ns ± 1%    -8.78%  (p=0.000 n=10+10)
      Equal/1-4              65.8ns ± 0%    60.1ns ± 1%    -8.69%  (p=0.000 n=10+9)
      Equal/6-4              82.9ns ± 1%    86.7ns ± 0%    +4.59%  (p=0.000 n=10+10)
      Equal/9-4              90.0ns ± 0%   101.0ns ± 0%   +12.18%  (p=0.000 n=9+10)
      Equal/15-4              108ns ± 0%     119ns ± 0%   +10.19%  (p=0.000 n=8+8)
      Equal/16-4              111ns ± 0%      82ns ± 0%   -26.37%  (p=0.000 n=8+10)
      Equal/20-4              124ns ± 1%      87ns ± 1%   -29.94%  (p=0.000 n=9+10)
      Equal/32-4              160ns ± 1%      97ns ± 1%   -39.40%  (p=0.000 n=10+10)
      Equal/4K-4             14.0µs ± 0%     3.6µs ± 1%   -74.57%  (p=0.000 n=9+10)
      Equal/4M-4             12.8ms ± 1%     3.2ms ± 0%   -74.93%  (p=0.000 n=9+9)
      Equal/64M-4             204ms ± 1%      51ms ± 0%   -74.78%  (p=0.000 n=10+10)
      EqualPort/1-4          47.0ns ± 1%    46.8ns ± 0%    -0.40%  (p=0.015 n=10+6)
      EqualPort/6-4          82.6ns ± 1%    81.9ns ± 1%    -0.87%  (p=0.002 n=10+10)
      EqualPort/32-4          232ns ± 0%     232ns ± 0%      ~     (p=0.496 n=8+10)
      EqualPort/4K-4         29.0µs ± 1%    29.0µs ± 1%      ~     (p=0.604 n=9+10)
      EqualPort/4M-4         24.0ms ± 1%    23.8ms ± 0%    -0.65%  (p=0.001 n=9+9)
      EqualPort/64M-4         383ms ± 1%     382ms ± 0%      ~     (p=0.218 n=10+10)
      CompareBytesEqual-4    61.2ns ± 1%    61.0ns ± 1%      ~     (p=0.539 n=10+10)
      
      name                 old speed      new speed      delta
      Equal/1-4            15.2MB/s ± 0%  16.6MB/s ± 1%    +9.52%  (p=0.000 n=10+9)
      Equal/6-4            72.4MB/s ± 1%  69.2MB/s ± 0%    -4.40%  (p=0.000 n=10+10)
      Equal/9-4             100MB/s ± 0%    89MB/s ± 0%   -11.40%  (p=0.000 n=9+10)
      Equal/15-4            138MB/s ± 1%   125MB/s ± 1%    -9.41%  (p=0.000 n=10+10)
      Equal/16-4            144MB/s ± 1%   196MB/s ± 0%   +36.41%  (p=0.000 n=10+10)
      Equal/20-4            162MB/s ± 1%   231MB/s ± 1%   +42.98%  (p=0.000 n=9+10)
      Equal/32-4            200MB/s ± 1%   331MB/s ± 1%   +65.64%  (p=0.000 n=10+10)
      Equal/4K-4            292MB/s ± 0%  1149MB/s ± 1%  +293.19%  (p=0.000 n=9+10)
      Equal/4M-4            328MB/s ± 1%  1307MB/s ± 0%  +298.87%  (p=0.000 n=9+9)
      Equal/64M-4           329MB/s ± 1%  1306MB/s ± 0%  +296.56%  (p=0.000 n=10+10)
      EqualPort/1-4        21.3MB/s ± 1%  21.4MB/s ± 0%    +0.42%  (p=0.002 n=10+9)
      EqualPort/6-4        72.6MB/s ± 1%  73.2MB/s ± 1%    +0.87%  (p=0.003 n=10+10)
      EqualPort/32-4        138MB/s ± 0%   138MB/s ± 0%      ~     (p=0.953 n=9+10)
      EqualPort/4K-4        141MB/s ± 1%   141MB/s ± 1%      ~     (p=0.382 n=10+10)
      EqualPort/4M-4        175MB/s ± 1%   176MB/s ± 0%    +0.65%  (p=0.001 n=9+9)
      EqualPort/64M-4       175MB/s ± 1%   176MB/s ± 0%      ~     (p=0.225 n=10+10)
      
      The 5-12% decrease in performance on Equal/{6,9,15} is due to the
      benchmarks splitting the bytes buffer in half. The b argument to Equal
      then ends up being unaligned and thus the fast word-wise compare doesn't
      kick in.
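
      The real implementation is arm assembly. A portable Go sketch of
      the same strategy, with hypothetical names and an assumed 4-byte
      word, looks like this: compare word by word when both buffers are
      aligned, otherwise compare byte by byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

const wordSize = 4 // assumed: 32-bit words, as on arm

// aligned reports whether b starts on a word boundary.
func aligned(b []byte) bool {
	return len(b) == 0 || uintptr(unsafe.Pointer(&b[0]))%wordSize == 0
}

// equal is an illustrative sketch, not the bytealg implementation.
func equal(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	i := 0
	if aligned(a) && aligned(b) {
		// Fast path: compare one word at a time.
		for ; i+wordSize <= len(a); i += wordSize {
			if binary.LittleEndian.Uint32(a[i:]) != binary.LittleEndian.Uint32(b[i:]) {
				return false
			}
		}
	}
	// Tail bytes, or the whole buffer if either side is unaligned.
	for ; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	buf := []byte("0123456789abcdef0123456789abcdef")
	fmt.Println(equal(buf[:16], buf[16:]), equal(buf[1:16], buf[17:]))
}
```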
      
      Updates #29001
      
      Change-Id: I73be501c18e67d211ed19da7771b4f254254e609
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167557
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/go,misc/ios: fix tests on iOS · 746f405f
      Elias Naur authored
      Now that modules are always on, cmd/go tests require a valid
      GOCACHE. However, on iOS where the go tool is not available, the
      cmd/go test driver ends up setting GOCACHE to the empty string.
      Fix it by falling back to the builtin default cache directory.
      
      The iOS exec wrapper passes the environment variables to the app
      on the device, including $HOME used for the default cache directory.
      Skip $HOME to let the device specific and writable $HOME be used
      instead.
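
      Skipping $HOME when forwarding the environment could look roughly
      like the sketch below. The helper name filterEnv is hypothetical
      and this is not the exec wrapper's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// filterEnv is a hypothetical helper: it returns env without any HOME
// entry, so the device's own writable $HOME stays in effect.
func filterEnv(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, "HOME=") {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	env := []string{"HOME=/Users/builder", "GOCACHE=", "HOMEBREW_PREFIX=/opt"}
	fmt.Println(filterEnv(env))
}
```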
      
      Should fix cmd/go on the iOS builders that broke when GO111MODULE
      defaulted to on.
      
      Change-Id: I0939f5b8aaa1d2db95e64c99f4130eee2d0b4d4d
      Reviewed-on: https://go-review.googlesource.com/c/go/+/167938
      Run-TryBot: Elias Naur <mail@eliasnaur.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  3. 16 Mar, 2019 1 commit
  4. 15 Mar, 2019 16 commits
  5. 14 Mar, 2019 16 commits