1. 24 Mar, 2018 7 commits
    • runtime: adjust GOARM floating point compatibility error message · 786899a7
      Tobias Klauser authored
      As pointed out by Josh Bleecher Snyder in CL 99780.
      
      The check is for GOARM > 6, so suggest recompiling with either GOARM=5
      or GOARM=6.
      
      Change-Id: I6a97e87bdc17aa3932f5c8cb598bba85c3cf4be9
      Reviewed-on: https://go-review.googlesource.com/101936
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: in prove, shortcircuit self-facts · d54902ec
      Giovanni Bajo authored
      Sometimes, we can end up calling update with a self-relation
      about a variable (x REL x). In this case, there is no need
      to record anything: the relation is unsatisfiable if and only
      if it doesn't contain eq.
      
      This also helps avoid an infinite loop in the next CL, which will
      introduce transitive closure of relations.
      
      Passes toolstash -cmp.
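
The self-relation rule described above can be illustrated with a small bitmask sketch. The type and constant names here are assumptions chosen for the example and are not copied from the compiler's prove pass.

```go
package main

import "fmt"

// relation is a bitmask of the orderings that may still hold between two
// values. The names are illustrative, not taken from cmd/compile.
type relation uint

const (
	lt relation = 1 << iota
	eq
	gt
)

// selfRelationUnsat reports whether "x r x" can never hold. A value always
// equals itself, so the relation is satisfiable exactly when r allows eq.
func selfRelationUnsat(r relation) bool {
	return r&eq == 0
}

func main() {
	fmt.Println(selfRelationUnsat(lt))      // x < x
	fmt.Println(selfRelationUnsat(lt | eq)) // x <= x
	fmt.Println(selfRelationUnsat(lt | gt)) // x != x
}
```

Because the answer depends only on the eq bit, nothing needs to be stored for a self-relation.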
      
      Change-Id: Ic408452ec1c13653f22ada35466ec98bc14aaa8e
      Reviewed-on: https://go-review.googlesource.com/100276
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: in prove, fail fast when unsat is found · 385d936f
      Giovanni Bajo authored
      When an unsatisfiable relation is recorded in the facts table,
      there is no need to compute further relations or update
      additional data structures.
      
      Since we're about to transitively propagate relations, make
      sure to fail as fast as possible to avoid doing useless work
      in dead branches.
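
A toy version of the fail-fast idea, assuming a hypothetical `factsTable` that keys relations by a pair of variable names. The real table in the compiler is more involved; this only shows the early return once a contradiction has been recorded.

```go
package main

import "fmt"

type relation uint

const (
	lt relation = 1 << iota
	eq
	gt
)

type pair struct{ a, b string }

// factsTable is a simplified, hypothetical stand-in for the prover's table.
type factsTable struct {
	facts map[pair]relation
	unsat bool
	work  int // number of updates that actually did work
}

func newFactsTable() *factsTable {
	return &factsTable{facts: make(map[pair]relation)}
}

// update intersects the known relation between a and b with r.
func (ft *factsTable) update(a, b string, r relation) {
	if ft.unsat {
		return // fail fast: this branch is already known to be dead
	}
	ft.work++
	old, ok := ft.facts[pair{a, b}]
	if !ok {
		old = lt | eq | gt // nothing known yet
	}
	merged := old & r
	ft.facts[pair{a, b}] = merged
	if merged == 0 {
		ft.unsat = true
	}
}

func main() {
	ft := newFactsTable()
	ft.update("x", "y", lt)
	ft.update("x", "y", gt) // contradicts x < y
	ft.update("y", "z", eq) // skipped, table is already unsat
	fmt.Println(ft.unsat, ft.work)
}
```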
      
      Passes toolstash -cmp.
      
      Change-Id: I23eed376d62776824c33088163c7ac9620abce85
      Reviewed-on: https://go-review.googlesource.com/100275
      Reviewed-by: Austin Clements <austin@google.com>
    • cmd/compile: add patterns for bit set/clear/complement on amd64 · 79112707
      Giovanni Bajo authored
      This patch completes implementation of BT(Q|L), and adds support
      for BT(S|R|C)(Q|L).
      
      Example of code changes from time.(*Time).addSec:
      
              if t.wall&hasMonotonic != 0 {
        0x1073465               488b08                  MOVQ 0(AX), CX
        0x1073468               4889ca                  MOVQ CX, DX
        0x107346b               48c1e93f                SHRQ $0x3f, CX
        0x107346f               48c1e13f                SHLQ $0x3f, CX
        0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
        0x107347a               746b                    JE 0x10734e7
      
              if t.wall&hasMonotonic != 0 {
        0x1073435               488b08                  MOVQ 0(AX), CX
        0x1073438               480fbae13f              BTQ $0x3f, CX
        0x107343d               7363                    JAE 0x10734a2
      
      Another example:
      
                              t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
        0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
        0x10734cf               48c1e61e                SHLQ $0x1e, SI
        0x10734d3               4809ce                  ORQ CX, SI
        0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
        0x10734e0               4809f1                  ORQ SI, CX
        0x10734e3               488908                  MOVQ CX, 0(AX)
      
                              t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
        0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
        0x1073492		48c1e61e		SHLQ $0x1e, SI
        0x1073496		4809f2			ORQ SI, DX
        0x1073499		480fbaea3f		BTSQ $0x3f, DX
        0x107349e		488910			MOVQ DX, 0(AX)
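
For reference, these are the kinds of Go source expressions the examples above come from: testing, setting, clearing and complementing a single constant bit. The helper names are made up for this sketch, and whether a given toolchain emits BT/BTS/BTR/BTC for them depends on the compiler version and target.

```go
package main

import "fmt"

// hasMonotonic mirrors the top-bit flag used in the time package example.
const hasMonotonic = 1 << 63

// testBit is a candidate for BTQ: a single-bit test feeding a branch.
func testBit(x uint64) bool { return x&hasMonotonic != 0 }

// setBit is a candidate for BTSQ.
func setBit(x uint64) uint64 { return x | hasMonotonic }

// clearBit is a candidate for BTRQ.
func clearBit(x uint64) uint64 { return x &^ hasMonotonic }

// flipBit is a candidate for BTCQ.
func flipBit(x uint64) uint64 { return x ^ hasMonotonic }

func main() {
	x := setBit(1)
	fmt.Printf("%#x %v\n", x, testBit(x))
	fmt.Printf("%#x %#x\n", clearBit(x), flipBit(x))
}
```

Building with `go build -gcflags=-S` prints the generated assembly, which is one way to check what a particular compiler does with these functions.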
      
      Go1 benchmarks seem unaffected, and I would be surprised
      otherwise:
      
      name                     old time/op    new time/op     delta
      BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
      Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
      FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
      FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
      FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
      FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
      FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
      FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
      FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
      GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
      GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
      Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
      Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
      HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
      JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
      JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
      Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
      GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
      RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
      RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
      RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
      RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
      RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
      RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
      RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
      RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
      Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
      Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
      TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
      TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)
      
      name                     old speed      new speed       delta
      GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
      GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
      Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
      Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
      JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
      JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
      GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
      RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
      RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
      RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
      RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
      RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
      RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
      RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
      RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
      Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
      Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)
      
      The rules match ~1800 times during all.bash.
      
      Fixes #18943
      
      Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
      Reviewed-on: https://go-review.googlesource.com/94766
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal/gc: properly initialize ssa.Func Type field · 3afd2d7f
      isharipo authored
      ssa.Func has a Type field that is documented as the
      function's signature type.
      
      It is never assigned and remains nil, so the signature
      is printed as "<T>".
      
      Given this function declaration:
      	func foo(x int, f func() string) (int, error)
      
      GOSSAFUNC printed it as below:
      	compiling foo
      	foo <T>
      
      After this change:
      	compiling foo
      	foo func(int, func() string) (int, error)
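
As a point of comparison only, the same signature text can be obtained at run time through the reflect package. This sketch does not touch ssa.Func or GOSSAFUNC; it just shows what a printed function signature type looks like for the declaration above. The body of `foo` is a placeholder.

```go
package main

import (
	"fmt"
	"reflect"
)

// foo has the signature from the example; its body is irrelevant here.
func foo(x int, f func() string) (int, error) { return x, nil }

// signature returns the printed form of fn's type.
func signature(fn interface{}) string {
	return reflect.TypeOf(fn).String()
}

func main() {
	fmt.Println("foo", signature(foo))
}
```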
      
      Change-Id: Iec5eec8aac5c76ff184659e30f41b2f5fe86d329
      Reviewed-on: https://go-review.googlesource.com/102375
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: always write pack files · ea668e18
      Matthew Dempsky authored
      By always writing out pack files, the object file format can be
      simplified somewhat. In particular, the export data format will no
      longer require escaping, because the pack file provides appropriate
      framing.
      
      This CL does not affect build systems that use -pack, which includes
      all major Go build systems (cmd/go, gb, bazel).
      
      Also, existing package import logic already distinguishes pack/object
      files based on file contents rather than file extension.
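
Detecting a pack file by content can be as simple as checking for the standard Unix ar magic string at the start of the file. The sketch below is not the importer's actual code; `isArchive` is a name invented for the illustration, and the sample inputs are abbreviated stand-ins for real file contents.

```go
package main

import (
	"bytes"
	"fmt"
)

// arMagic is the 8-byte header that begins every Unix ar archive,
// which is the container format used for pack files.
const arMagic = "!<arch>\n"

// isArchive reports whether data starts like an ar archive,
// regardless of what the file is called.
func isArchive(data []byte) bool {
	return bytes.HasPrefix(data, []byte(arMagic))
}

func main() {
	fmt.Println(isArchive([]byte("!<arch>\n__.PKGDEF ...")))
	fmt.Println(isArchive([]byte("go object linux amd64 ...")))
}
```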
      
      The only exception is cmd/pack, which specially handled object files
      created by cmd/compile when used with the 'c' mode. This mode is
      extended to now recognize the pack files produced by cmd/compile and
      handle them as before.
      
      Passes toolstash-check.
      
      Updates #21705.
      Updates #24512.
      
      Change-Id: Idf131013bfebd73a5cde7e087eb19964503a9422
      Reviewed-on: https://go-review.googlesource.com/102236
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/link: skip __.PKGDEF in archives · 699b0d4e
      Matthew Dempsky authored
      The __.PKGDEF file is a compiler object file only intended for other
      compilers. Also, for build systems that use -linkobj, all of the
      information it contains is present within the linker object files
      already, so look for it there instead.
      
      This requires a little bit of code reorganization. Significantly,
      previously when loading an archive file, the __.PKGDEF file was
      authoritative on whether the package was "main" and/or "safe". Now
      that we're using the Go object files instead, there's the issue that
      there can be multiple Go object files in an archive (because when
      using assembly, each assembly file becomes its own additional object
      file).
      
      The solution taken here is to check if any object file within the
      package declares itself as "main" and/or "safe".
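
The "any object file" rule amounts to OR-ing the flags across every object in the archive. A minimal sketch, assuming a hypothetical `objectFile` summary type rather than the linker's real data structures:

```go
package main

import "fmt"

// objectFile is a hypothetical summary of one object file in an archive.
type objectFile struct {
	name string
	main bool // the object declares package main
	safe bool // the object declares itself safe
}

// packageFlags treats the package as "main" and/or "safe" if any
// object file in the archive says so.
func packageFlags(objs []objectFile) (isMain, isSafe bool) {
	for _, o := range objs {
		isMain = isMain || o.main
		isSafe = isSafe || o.safe
	}
	return isMain, isSafe
}

func main() {
	objs := []objectFile{
		{name: "_go_.o", main: true},
		{name: "asm_amd64.o"}, // assembly objects carry no such flags here
	}
	fmt.Println(packageFlags(objs))
}
```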
      
      Updates #24512.
      
      Change-Id: I70243a293bdf34b8555c0bf1833f8933b2809449
      Reviewed-on: https://go-review.googlesource.com/102281
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. 23 Mar, 2018 5 commits
  3. 22 Mar, 2018 9 commits
    • cmd/compile: change unsafeUintptrTag from var to const · 50921bfa
      Matthew Dempsky authored
      Change-Id: Ie30878199e24cce5b75428e6b602c017ebd16642
      Reviewed-on: https://go-review.googlesource.com/102175
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
    • crypto/x509: follow OpenSSL and emit Extension structures directly in CSRs. · 0b37f05d
      Adam Langley authored
      I don't know if I got lost in the old PKCS documents, or whether this is
      a case where reality diverges from the spec, but OpenSSL clearly stuffs
      PKIX Extension objects in CSR attributes directly[1].
      
      In either case, doing what OpenSSL does seems valid here and allows the
      critical flag in extensions to be serialised.
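
A round trip through the public crypto/x509 API shows the effect on a toolchain that includes this change: a critical extension placed in `ExtraExtensions` should come back with its critical flag intact after parsing. The OID and the extension value below are made up for the example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

// exampleOID is an arbitrary OID used only for this illustration.
var exampleOID = asn1.ObjectIdentifier{1, 2, 3, 4, 5}

// roundTripCritical creates a CSR with a critical extension, parses it
// back, and returns the critical flag found on that extension.
func roundTripCritical() (bool, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, err
	}
	value, err := asn1.Marshal("example")
	if err != nil {
		return false, err
	}
	tmpl := &x509.CertificateRequest{
		Subject: pkix.Name{CommonName: "example.test"},
		ExtraExtensions: []pkix.Extension{
			{Id: exampleOID, Critical: true, Value: value},
		},
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return false, err
	}
	csr, err := x509.ParseCertificateRequest(der)
	if err != nil {
		return false, err
	}
	for _, ext := range csr.Extensions {
		if ext.Id.Equal(exampleOID) {
			return ext.Critical, nil
		}
	}
	return false, fmt.Errorf("extension %v not found", exampleOID)
}

func main() {
	critical, err := roundTripCritical()
	fmt.Println(critical, err)
}
```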
      
      Fixes #13739.
      
      [1] https://github.com/openssl/openssl/blob/e3713c365c2657236439fea00822a43aa396d112/crypto/x509/x509_req.c#L173
      
      Change-Id: Ic1e73ba9bd383a357a2aa8fc4f6bd76811bbefcc
      Reviewed-on: https://go-review.googlesource.com/70851
      Run-TryBot: Adam Langley <agl@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
    • crypto/tls: support keying material export · c529141d
      Mike Danese authored
      This change implements keying material export as described in:
      
      https://tools.ietf.org/html/rfc5705
      
      I verified the implementation against openssl s_client and openssl
      s_server.
      
      Change-Id: I4dcdd2fb929c63ab4e92054616beab6dae7b1c55
      Signed-off-by: Mike Danese <mikedanese@google.com>
      Reviewed-on: https://go-review.googlesource.com/85115
      Run-TryBot: Adam Langley <agl@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Adam Langley <agl@golang.org>
    • cmd/compile: use more range fors in gc · 02798ed9
      Daniel Martí authored
      Slightly simplifies the code. Made sure to exclude the cases that would
      change behavior, such as when the iterated value is a string, when the
      index is modified within the body, or when the slice is modified.
      
      Also checked that all the elements are of pointer type, to avoid the
      corner case where non-pointer types could be copied by mistake.
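
Two of the excluded cases are easy to reproduce in isolation. The sketch below uses invented helper names: the first pair shows that modifying the index inside the body behaves differently in a range loop, and the second pair shows that ranging over non-pointer elements works on copies.

```go
package main

import "fmt"

// countIndexLoop skips the element after each zero by bumping the index.
func countIndexLoop(s []int) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == 0 {
			i++ // affects the iteration
		}
		n++
	}
	return n
}

// countRangeLoop looks the same, but the range clause resets i each time.
func countRangeLoop(s []int) int {
	n := 0
	for i := range s {
		if s[i] == 0 {
			i++ // has no effect on the iteration
		}
		n++
	}
	return n
}

type node struct{ n int }

func (p *node) reset() { p.n = 0 }

// resetValues calls reset on a copy of each element; the slice is unchanged.
func resetValues(s []node) {
	for _, v := range s {
		v.reset()
	}
}

// resetPointers calls reset through each pointer; the nodes are changed.
func resetPointers(s []*node) {
	for _, p := range s {
		p.reset()
	}
}

func main() {
	in := []int{0, 1, 0, 1}
	fmt.Println(countIndexLoop(in), countRangeLoop(in))

	vals := []node{{1}, {2}}
	resetValues(vals)
	ptrs := []*node{{1}, {2}}
	resetPointers(ptrs)
	fmt.Println(vals[0].n, ptrs[0].n)
}
```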
      
      Change-Id: Iea64feb2a9a6a4c94ada9ff3ace40ee173505849
      Reviewed-on: https://go-review.googlesource.com/100557
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    • cmd/compile: fix GOEXPERIMENT=preemptibleloops type-checking · 48f990b4
      Austin Clements authored
      This experiment has gone stale. It causes a type-checking failure
      because the condition of the OIF produced by range loop lowering has
      type "untyped bool". Fix this by typechecking the whole OIF statement,
      not just its condition.
      
      This doesn't quite fix the whole experiment, but it gets further.
      Something about preemption point insertion is causing failures like
      "internal compiler error: likeliness prediction 1 for block b10 with 1
      successors" in cmd/compile/internal/gc.
      
      Change-Id: I7d80d618d7c91c338bf5f2a8dc174d582a479df3
      Reviewed-on: https://go-review.googlesource.com/102157
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: specialize Move up to 79B on amd64 · 4f7b7748
      Travis Bischel authored
      Move currently uses mov instructions directly up to 31 bytes and then
      switches to duffcopy. Moving 31 bytes takes 4 instructions, corresponding
      to two loads and two stores (or 6 if !useSSE). Depending on the usage,
      duffcopy takes five (one or two mov, two or three lea, one call).
      
      This adds direct mov instructions for Moves of size 32, 48, and 64 with
      SSE, and only for size 32 without.
      With useSSE:
      - 32 is 4 instructions (byte +/- comparison below)
      - 33 thru 48 is 6
      - 49 thru 64 is 8
      
      Without:
      - 32 is 8
      
      Note that the only platform with useSSE set to false is Plan 9. I have
      built three projects based off tip and tip with this patch, and each
      project's binary size is equal to or smaller than it was before.
      
      The basis of this change is that copying data with instructions directly
      is nearly free, whereas calling into duffcopy adds a bit of overhead.
      This is most noticeable in range statements where elements are 32+
      bytes. For code with the following pattern:
      
      func Benchmark32Range(b *testing.B) {
              var f s32
              for _, count := range []int{10, 100, 1000, 10000} {
                      name := strconv.Itoa(count)
                      b.Run(name, func(b *testing.B) {
                              base := make([]s32, count)
                              for i := 0; i < b.N; i++ {
                                      for _, v := range base {
                                              f = v
                                      }
                              }
                      })
              }
              _ = f
      }
      
      These are the resulting benchmarks:
      benchmark                    old ns/op     new ns/op     delta
      Benchmark16Range/10-4        19.1          19.1          +0.00%
      Benchmark16Range/100-4       169           170           +0.59%
      Benchmark16Range/1000-4      1684          1691          +0.42%
      Benchmark16Range/10000-4     18147         18124         -0.13%
      Benchmark31Range/10-4        141           142           +0.71%
      Benchmark31Range/100-4       1407          1410          +0.21%
      Benchmark31Range/1000-4      14070         14074         +0.03%
      Benchmark31Range/10000-4     141781        141759        -0.02%
      Benchmark32Range/10-4        71.4          32.2          -54.90%
      Benchmark32Range/100-4       695           326           -53.09%
      Benchmark32Range/1000-4      7166          3313          -53.77%
      Benchmark32Range/10000-4     72571         35425         -51.19%
      Benchmark64Range/10-4        87.8          64.9          -26.08%
      Benchmark64Range/100-4       868           629           -27.53%
      Benchmark64Range/1000-4      9355          6907          -26.17%
      Benchmark64Range/10000-4     94463         70385         -25.49%
      Benchmark79Range/10-4        177           152           -14.12%
      Benchmark79Range/100-4       1769          1531          -13.45%
      Benchmark79Range/1000-4      17893         15532         -13.20%
      Benchmark79Range/10000-4     178947        155551        -13.07%
      Benchmark80Range/10-4        99.6          99.7          +0.10%
      Benchmark80Range/100-4       987           985           -0.20%
      Benchmark80Range/1000-4      10573         10560         -0.12%
      Benchmark80Range/10000-4     106792        106639        -0.14%
      
      For runtime's BenchCopyFat* benchmarks:
      name           old time/op  new time/op  delta
      CopyFat8-4     0.40ns ± 0%  0.40ns ± 0%      ~     (all equal)
      CopyFat12-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=9+9)
      CopyFat16-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=10+8)
      CopyFat24-4    0.80ns ± 0%  0.40ns ± 0%   -50.00%  (p=0.001 n=8+9)
      CopyFat32-4    2.01ns ± 0%  0.40ns ± 0%   -80.10%  (p=0.000 n=8+8)
      CopyFat64-4    2.87ns ± 0%  0.40ns ± 0%   -86.07%  (p=0.000 n=8+10)
      CopyFat128-4   4.82ns ± 0%  4.82ns ± 0%      ~     (p=1.000 n=8+8)
      CopyFat256-4   8.83ns ± 0%  8.83ns ± 0%      ~     (p=1.000 n=8+8)
      CopyFat512-4   16.9ns ± 0%  16.9ns ± 0%      ~     (all equal)
      CopyFat520-4   14.6ns ± 0%  14.6ns ± 1%      ~     (p=0.529 n=8+9)
      CopyFat1024-4  32.9ns ± 0%  33.0ns ± 0%    +0.20%  (p=0.041 n=8+9)
      
      Function calls do not benefit as much due to how they are compiled, but
      other benchmarks I ran show that calling a function with 64-byte elements
      is marginally improved.
      
      The main downside with this change is that it may increase binary sizes
      depending on the size of the copy, but this change also decreases
      binaries for moves of 48 bytes or less.
      
      For the following code:
      package main
      
      type size [32]byte
      
      //go:noinline
      func use(t size) {
      }
      
      //go:noinline
      func get() size {
      	var z size
      	return z
      }
      
      func main() {
      	var a size
      	use(a)
      }
      
      Changing size around gives the following assembly leading up to the call
      (the initialization and actual call are removed):
      
      tip func call with 32B arg: 27B
          48 89 e7                 mov    %rsp,%rdi
          48 8d 74 24 20           lea    0x20(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 53 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 19B (-8B)
          0f 10 44 24 20           movups 0x20(%rsp),%xmm0
          0f 11 04 24              movups %xmm0,(%rsp)
          0f 10 44 24 30           movups 0x30(%rsp),%xmm0
          0f 11 44 24 10           movups %xmm0,0x10(%rsp)
      -
      tip with 47B arg: 29B
          48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
          48 8d 74 24 40           lea    0x40(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 43 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 20B (-9B)
          0f 10 44 24 40           movups 0x40(%rsp),%xmm0
          0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
          0f 10 44 24 50           movups 0x50(%rsp),%xmm0
          0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
      -
      tip with 64B arg: 27B
          48 89 e7                 mov    %rsp,%rdi
          48 8d 74 24 40           lea    0x40(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 1f ab ff ff           callq  448948 <runtime.duffcopy+0x348>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 39B [+12B]
          0f 10 44 24 40           movups 0x40(%rsp),%xmm0
          0f 11 04 24              movups %xmm0,(%rsp)
          0f 10 44 24 50           movups 0x50(%rsp),%xmm0
          0f 11 44 24 10           movups %xmm0,0x10(%rsp)
          0f 10 44 24 60           movups 0x60(%rsp),%xmm0
          0f 11 44 24 20           movups %xmm0,0x20(%rsp)
          0f 10 44 24 70           movups 0x70(%rsp),%xmm0
          0f 11 44 24 30           movups %xmm0,0x30(%rsp)
      -
      tip with 79B arg: 29B
          48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
          48 8d 74 24 60           lea    0x60(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 09 ab ff ff           callq  448948 <runtime.duffcopy+0x348>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 46B [+17B]
          0f 10 44 24 60           movups 0x60(%rsp),%xmm0
          0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
          0f 10 44 24 70           movups 0x70(%rsp),%xmm0
          0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
          0f 10 84 24 80 00 00     movups 0x80(%rsp),%xmm0
          00
          0f 11 44 24 2f           movups %xmm0,0x2f(%rsp)
          0f 10 84 24 90 00 00     movups 0x90(%rsp),%xmm0
          00
          0f 11 44 24 3f           movups %xmm0,0x3f(%rsp)
      
      So, at best we save 9B, at worst we gain 17. I do not think that copying
      around 65+B sized types is common enough to bloat program sizes. Using
      bincmp on the go binary itself shows a zero byte difference; there are
      gains and losses all over. One of the largest gains in binary size comes
      from cmd/go/internal/cache.(*Cache).Get, which passes around a 64 byte
      sized type -- this is one of the cases I would expect to be benefitted
      by this change.
      
      I think that this marginal improvement in struct copying for 64 byte
      structs is worth it: most data structs / work items I use in my programs
      are small, but few are smaller than 32 bytes: with one slice, the budget
      is up. The 32 rule alone would allow another 16 bytes, the 48 and 64
      rules allow another 32 and 48.
      
      Change-Id: I19a8f9190d5d41825091f17f268f4763bfc12a62
      Reviewed-on: https://go-review.googlesource.com/100718
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • test/codegen: port direct comparisons with memory tests · fc6280d4
      Alberto Donizetti authored
      And remove them from asm_test.
      
      Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
      Reviewed-on: https://go-review.googlesource.com/102115
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement... · 6633bb2a
      Carlos Eduardo Seo authored
      cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
      
      This change implements faster atomics for ppc64x based on the ISA 2.07B,
      Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
      cases.
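
Nothing changes at the source level for users of sync/atomic; the difference is in the barrier instructions generated on ppc64x. For orientation, this is the kind of code that exercises those atomics. The function name and workload are invented for the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently increments a shared counter from several goroutines
// using atomic adds and returns the final value.
func countConcurrently(workers, perWorker int) int64 {
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&total, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```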
      
      Updates #21348
      
      name                                           old time/op new time/op    delta
      Cond1-16                                           955ns     856ns      -10.33%
      Cond2-16                                          2.38µs    2.03µs      -14.59%
      Cond4-16                                          5.90µs    5.44µs       -7.88%
      Cond8-16                                          12.1µs    11.1µs       -8.42%
      Cond16-16                                         27.0µs    25.1µs       -7.04%
      Cond32-16                                         59.1µs    55.5µs       -6.14%
      LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
      LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
      LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
      LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
      LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
      LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
      LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
      LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
      LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
      LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
      LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
      LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
      LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
      Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
      Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
      Range/*sync.Map-16                                2.00µs    2.01µs         ~
      AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
      AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
      AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
      AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
      AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
      AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
      UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
      ContendedSemaphore-16                              112ns      97ns      -13.77%
      MutexUncontended-16                               3.34ns    2.51ns      -24.69%
      Mutex-16                                           266ns     191ns      -28.26%
      MutexSlack-16                                      226ns     159ns      -29.55%
      MutexWork-16                                       377ns     338ns      -10.14%
      MutexWorkSlack-16                                  335ns     308ns       -8.20%
      MutexNoSpin-16                                     196ns     184ns       -5.91%
      MutexSpin-16                                       710ns     666ns       -6.21%
      Once-16                                           1.29ns    1.29ns         ~
      Pool-16                                           8.64ns    8.71ns         ~
      PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
      SemaUncontended-16                                5.39ns    4.42ns      -17.96%
      SemaSyntNonblock-16                                539ns     483ns      -10.42%
      SemaSyntBlock-16                                   413ns     354ns      -14.20%
      SemaWorkNonblock-16                                305ns     258ns      -15.36%
      SemaWorkBlock-16                                   266ns     229ns      -14.06%
      RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
      RWMutexWrite100-16                                 203ns     147ns      -27.47%
      RWMutexWrite10-16                                  177ns     119ns      -32.74%
      RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
      RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
      WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
      WaitGroupAddDone-16                                402ns     355ns      -11.54%
      WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
      WaitGroupWait-16                                  1.21ns    1.21ns         ~
      WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
      WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%
      
      Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
      Reviewed-on: https://go-review.googlesource.com/95175
      Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
    • doc: first version of new contribute guide · a3d83269
      Giovanni Bajo authored
      I've reorganized the guide and rewritten large sections.
      
      The structure is now clearer and more logical, and can
      be understood and navigated using the summary displayed at
      the top of the page (before, the summary was confusing because
      the guide contained H1s that were being ignored by the summary).
      
      Both the initial onboarding process and the Gerrit
      change submission process have been reworked to
      include a concise checklist of steps that can be
      read and understood in a few seconds, for people
      who don't want or need to bother with details.
      More in-depth descriptions have been moved into
      separate sections, one per checklist step.
      This is by far the biggest improvement, as the previous
      approach of having to read several pages just to understand
      the required steps was very intimidating for beginners, in
      addition to being harder to navigate.
      
      GitHub pull requests have been integrated as a different
      way to submit a change, suggested for first-time contributors.
      
      The review process has been described in more detail,
      documenting the workflow and the conventions used.
      
      Most miscellanea have been moved into an "advanced
      topics" chapter.
      
      Paragraphs describing how to use git have been removed
      to simplify reading. This guide should focus on Go contribution,
      and not on helping users get familiar with git, for which many
      guides are available.
      
      Change-Id: I6f4b76583c9878b230ba1d0225745a1708fad2e8
      Reviewed-on: https://go-review.googlesource.com/93495
      Reviewed-by: Rob Pike <r@golang.org>
      a3d83269
  4. 21 Mar, 2018 7 commits
    • Ilya Tocar's avatar
      compress/bzip2: remove bit-tricks · 9eb21948
      Ilya Tocar authored
      Since the compiler is now able to generate conditional moves, we can replace
      bit-tricks with a simple if/else. This even results in slightly better performance:
      
      name            old time/op    new time/op    delta
      DecodeDigits-6    13.4ms ± 4%    13.0ms ± 2%  -2.63%  (p=0.003 n=10+10)
      DecodeTwain-6     37.5ms ± 1%    36.3ms ± 1%  -3.03%  (p=0.000 n=10+9)
      DecodeRand-6      4.23ms ± 1%    4.07ms ± 1%  -3.67%  (p=0.000 n=10+9)
      
      name            old speed      new speed      delta
      DecodeDigits-6  7.47MB/s ± 4%  7.67MB/s ± 2%  +2.69%  (p=0.002 n=10+10)
      DecodeTwain-6   10.4MB/s ± 1%  10.7MB/s ± 1%  +3.25%  (p=0.000 n=10+8)
      DecodeRand-6    3.87MB/s ± 1%  4.03MB/s ± 2%  +4.08%  (p=0.000 n=10+10)
      diff --git a/src/compress/bzip2/huffman.go b/src/compress/bzip2/huffman.go
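
As a rough illustration of the kind of rewrite described above, the sketch below contrasts a mask-based select with a plain if/else. The function names and the exact mask trick are assumptions made for this example; they are not taken from huffman.go, whose actual code may differ.

```go
package main

import "fmt"

// selectMask picks l when bit == 1 and r when bit == 0 without a branch,
// by expanding the bit into an all-ones or all-zeros mask.
// (Hypothetical helper, not the code from compress/bzip2.)
func selectMask(bit, l, r uint16) uint16 {
	mask := -bit // 0xFFFF when bit == 1, 0x0000 when bit == 0
	return (l & mask) | (r &^ mask)
}

// selectIf expresses the same choice with a plain if/else, which a
// compiler that can emit conditional moves may lower to branch-free code.
func selectIf(bit, l, r uint16) uint16 {
	if bit == 1 {
		return l
	}
	return r
}

func main() {
	for _, bit := range []uint16{0, 1} {
		fmt.Println(bit, selectMask(bit, 10, 20), selectIf(bit, 10, 20))
	}
}
```

Both helpers return the same value for bit values 0 and 1; the if/else form is simply easier to read.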
      
      Change-Id: Ie96ef1a9e07013b07e78f22cdccd531f3341caca
      Reviewed-on: https://go-review.googlesource.com/102015
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Joe Tsai <joetsai@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      9eb21948
    • Tim Wright's avatar
      all: enable c-shared/c-archive support for freebsd/amd64 · 88129f0c
      Tim Wright authored
      Fixes #14327
      Much of the code is based on the linux/amd64 code that implements these
      build modes, and code is shared where possible.
      
      Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
      Reviewed-on: https://go-review.googlesource.com/93875
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      88129f0c
    • isharipo's avatar
      runtime,sync/atomic: replace asm BYTEs with insts for x86 · ff5cf43d
      isharipo authored
      For each replacement, a test case is added to the new 386enc.s file,
      with the exception of EMMS, SYSENTER, MFENCE and LFENCE, as they
      are already covered in amd64enc.s (same on amd64 and 386).
      
      The replacement became less obvious after go vet suggested changes.
      Before:
      	BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
      Changed to MOVQ (this form is being tested):
      	MOVQ M0, 8(SP)
      Refactored to FP-relative access (go vet advice):
      	MOVQ M0, val+4(FP)
      
      Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
      Reviewed-on: https://go-review.googlesource.com/101475
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
      ff5cf43d
    • Ross Light's avatar
      net/url: fix contradiction in PathUnescape docs · 65727ab5
      Ross Light authored
      Change-Id: If35e3faa738c5d7d72cf77d14b276690579180a1
      Reviewed-on: https://go-review.googlesource.com/101921
      Run-TryBot: Ross Light <light@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      65727ab5
    • Tobias Klauser's avatar
      runtime: parse auxv on freebsd · 2e84dc25
      Tobias Klauser authored
      Decode AT_PAGESZ to determine physPageSize on freebsd/{386,amd64,arm}
      and AT_HWCAP for hwcap and hardDiv on freebsd/arm. Also use hwcap to
      perform the FP checks in checkgoarm akin to the linux/arm
      implementation.
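
The decoding step can be pictured as a walk over (tag, value) pairs. The sketch below is a simplified, hypothetical version: the type and function names are invented for illustration, and the numeric tag values are assumptions rather than constants copied from the runtime.

```go
package main

import "fmt"

// Tag values are assumptions for illustration; the real constants come
// from the platform headers and the runtime's OS-specific files.
const (
	atNull   = 0  // terminates the vector
	atPagesz = 6  // system page size
	atHwcap  = 25 // hardware capability bits (assumed FreeBSD value)
)

// auxInfo is a hypothetical container for the values of interest.
type auxInfo struct {
	pageSize uintptr
	hwcap    uintptr
}

// parseAuxv scans a flat slice of (tag, value) pairs until it meets
// the terminating atNull tag or runs out of pairs.
func parseAuxv(auxv []uintptr) auxInfo {
	var info auxInfo
	for i := 0; i+1 < len(auxv); i += 2 {
		tag, val := auxv[i], auxv[i+1]
		switch tag {
		case atNull:
			return info
		case atPagesz:
			info.pageSize = val
		case atHwcap:
			info.hwcap = val
		}
	}
	return info
}

func main() {
	// A made-up vector: page size 4096, one capability bit, terminator.
	info := parseAuxv([]uintptr{atPagesz, 4096, atHwcap, 1 << 12, atNull, 0})
	fmt.Println(info.pageSize, info.hwcap)
}
```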
      
      Change-Id: I532810a1581efe66277e4305cb234acdc79ee91e
      Reviewed-on: https://go-review.googlesource.com/99780
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      2e84dc25
    • Daniel Martí's avatar
      cmd/doc: use empty GOPATH when running the tests · 77c3ef6f
      Daniel Martí authored
      Otherwise, a populated GOPATH might result in failures such as:
      
      	$ go test
      	[...] no buildable Go source files in [...]/gopherjs/compiler/natives/src/crypto/rand
      	exit status 1
      
      Move the initialization of the dirs walker out of the init func, so that
      we can control its behavior in the tests.
      
      Updates #24464.
      
      Change-Id: I4b26a7d3d6809bdd8e9b6b0556d566e7855f80fe
      Reviewed-on: https://go-review.googlesource.com/101836
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      77c3ef6f
    • Alberto Donizetti's avatar
      cmd/trace: remove unused variable in tests · 041c5d83
      Alberto Donizetti authored
      Unused variables in closures are currently not diagnosed by the
      compiler (this is Issue #3059), while go/types catches them.
      
      One unused variable in the cmd/trace tests is causing the go/types
      test that typechecks the whole standard library to fail:
      
        FAIL: TestStdlib (8.05s)
          stdlib_test.go:223: cmd/trace/annotations_test.go:241:6: gcTime
          declared but not used
        FAIL
      
      Remove it.
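
To see go/types reporting this class of error, the sketch below type-checks a small source string that declares an unused variable inside a closure. The source snippet and the helper name typeCheckErrors are made up for this example; the exact wording of the error differs between Go releases, so none is shown here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a made-up package with an unused variable inside a closure.
const src = `package p

func f() {
	_ = func() {
		var gcTime int
	}
}
`

// typeCheckErrors parses and type-checks source, returning every error
// that go/types reports through the Config.Error callback.
func typeCheckErrors(source string) []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", source, 0)
	if err != nil {
		return []string{err.Error()}
	}
	var errs []string
	conf := types.Config{
		Error: func(err error) { errs = append(errs, err.Error()) },
	}
	// The returned package and error are ignored; errors are collected above.
	conf.Check("p", fset, []*ast.File{file}, nil)
	return errs
}

func main() {
	for _, e := range typeCheckErrors(src) {
		fmt.Println(e)
	}
}
```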
      
      Updates #24464
      
      Change-Id: I0f1b9db6ae1f0130616ee649bdbfdc91e38d2184
      Reviewed-on: https://go-review.googlesource.com/101815
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
      041c5d83
  5. 20 Mar, 2018 12 commits
    • Hiroshi Ioka's avatar
      go/internal/srcimporter: simplify and fix package file lookup · 5f0a9ba1
      Hiroshi Ioka authored
      The old code was a blend of (copied) code that existed before go/build,
      and incorrect adjustments made when go/build was introduced. This change
      leaves package path determination entirely to go/build and in the process
      fixes issues with relative import paths.
      
      Fixes #23092
      Fixes #24392
      
      Change-Id: I9e900538b365398751bace56964495c5440ac4ae
      Reviewed-on: https://go-review.googlesource.com/83415
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
      5f0a9ba1
    • Paul Querna's avatar
      net/http: remove extraneous call to VerifyHostname · 2638001e
      Paul Querna authored
      VerifyHostname is called by tls.Conn during Handshake and does not need to be called explicitly.
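
A small experiment can make this behaviour visible. The sketch below, which assumes a loopback listener is available, starts an httptest TLS server and dials it twice with different ServerName values; the helper names are invented for this example, and it relies on the httptest certificate covering the name example.com.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handshakeErr dials addr over TLS (tls.Dial performs the handshake)
// and returns any error from connecting or handshaking.
func handshakeErr(addr string, roots *x509.CertPool, serverName string) error {
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		RootCAs:    roots,
		ServerName: serverName,
	})
	if err != nil {
		return err
	}
	conn.Close()
	return nil
}

// demo returns the handshake result for a matching and a mismatched name.
func demo() (okErr, badErr error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {}))
	defer srv.Close()

	// Trust only the test server's own certificate.
	roots := x509.NewCertPool()
	roots.AddCert(srv.Certificate())
	addr := srv.Listener.Addr().String()

	okErr = handshakeErr(addr, roots, "example.com")
	badErr = handshakeErr(addr, roots, "wrong.invalid")
	return okErr, badErr
}

func main() {
	okErr, badErr := demo()
	fmt.Println("matching name:", okErr)
	fmt.Println("mismatched name:", badErr)
}
```

The mismatched name is rejected during the handshake itself, without any explicit VerifyHostname call in the client code.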
      
      Change-Id: I22b7fa137e76bb4be3d0018813a571acfb882219
      Reviewed-on: https://go-review.googlesource.com/98618
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
      2638001e
    • Adam Langley's avatar
      crypto/x509: support the PSS certificates that OpenSSL 1.1.0 generates. · 8a151924
      Adam Langley authored
      It serialises optional parameters as empty rather than NULL. It's
      probably technically correct, although ASN.1 has a long history of doing
      this different ways.
      
      But OpenSSL is likely common enough that we want to support this
      encoding.
      
      Fixes #23847
      
      Change-Id: I81c60f0996edfecf59467dfdf75b0cf8ba7b1efb
      Reviewed-on: https://go-review.googlesource.com/96417
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      8a151924
    • Ilya Tocar's avatar
      cmd/compile/internal/ssa: update regalloc in loops · 983dcf70
      Ilya Tocar authored
      Currently we don't lift a spill out of a loop if the loop contains a call.
      However, we often have code like this:
      
      for .. {
          if hard_case {
              call()
          }
          // simple case, without call
      }
      
      So instead of checking for any call, check for an unavoidable call.
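
A concrete Go function with the loop shape described above might look like the following sketch. The function names and the 0x80 threshold are hypothetical; the point is only that the call sits on a rarely taken path, so it is avoidable on most iterations.

```go
package main

import "fmt"

// slowPath stands in for the rarely needed call. The noinline directive
// keeps it an actual call rather than letting the compiler inline it.
//
//go:noinline
func slowPath(b byte) int {
	return int(b) * 2
}

// sum has a hot loop whose only call is inside the hard case.
func sum(data []byte) int {
	total := 0
	for _, b := range data {
		if b >= 0x80 { // hard case: rare, requires a call
			total += slowPath(b)
			continue
		}
		total += int(b) // simple case, without call
	}
	return total
}

func main() {
	fmt.Println(sum([]byte{1, 2, 3}))
	fmt.Println(sum([]byte{1, 0x80}))
}
```
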
      For #22698 cases I see:
      mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
      And:
      compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e5-6        760µs ± 1%      725µs ± 1%    -4.56%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%     7.24ms ± 0%    -4.07%  (p=0.000 n=8+7)
      
      There are no significant changes on go1 benchmarks.
      But for cases with runtime arch checks, where we call the generic version on old hardware,
      there are respectable performance gains:
      math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
      math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)
      
      Also, on some runtime benchmarks, loops have fewer loads and higher performance:
      runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
      runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)
      
      Fixes #22698
      Updates #22234
      
      Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
      Reviewed-on: https://go-review.googlesource.com/84055
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
      983dcf70
    • Alberto Donizetti's avatar
      test/codegen: port comparisons tests to codegen · be371edd
      Alberto Donizetti authored
      And delete them from asm_test.
      
      Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
      Reviewed-on: https://go-review.googlesource.com/101595
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      be371edd
    • Than McIntosh's avatar
      cmd/compile: fix regression in DWARF inlined routine variable tracking · f45c07e8
      Than McIntosh authored
      Fix a bug in the code that generates the pre-inlined variable
      declaration table used as raw material for emitting DWARF inline
      routine records. The fix for issue 23704 altered the recipe for
      assigning file/line/col to variables in one part of the compiler, but
      didn't update a similar recipe in the code for variable tracking.
      Added a new test that should catch problems of a similar nature.
      
      Fixes #24460.
      
      Change-Id: I255c036637f4151aa579c0e21d123fd413724d61
      Reviewed-on: https://go-review.googlesource.com/101676
      Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
      f45c07e8
    • Michael Munday's avatar
      cmd/compile: mark LAA and LAAG as clobbering flags on s390x · ae10914e
      Michael Munday authored
      The atomic add instructions modify the condition code and so need to
      be marked as clobbering flags.
      
      Fixes #24449.
      
      Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897
      Reviewed-on: https://go-review.googlesource.com/101455
      Run-TryBot: Michael Munday <mike.munday@ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      ae10914e
    • Fangming.Fang's avatar
      cmd/asm: fix bug about VMOV instruction (move a vector element to another) on ARM64 · 9c312245
      Fangming.Fang authored
      This change fixes an index error when encoding a VMOV instruction whose pattern
      is vmov Vn.<T>[index], Vd.<T>[index].
      
      Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
      Reviewed-on: https://go-review.googlesource.com/101335
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      9c312245
    • Fangming.Fang's avatar
      cmd/asm: fix bug about VMOV instruction (move register to vector element) on ARM64 · 7673e305
      Fangming.Fang authored
      This change fixes an index error when encoding a VMOV instruction whose pattern is
      VMOV Rn, V.<T>[index]. For example, VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0].
      
      Fixes #24400
      Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
      Reviewed-on: https://go-review.googlesource.com/101297
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      7673e305
    • Vladimir Kuzmin's avatar
      cmd/compile: avoid mapaccess at m[k]=append(m[k].. · c12b185a
      Vladimir Kuzmin authored
      Currently, the rvalue m[k] is transformed during walk into:
      
              tmp1 := *mapaccess(m, k)
              tmp2 := append(tmp1, ...)
              *mapassign(m, k) = tmp2
      
      However, this is suboptimal, as we could instead produce just:
      
              tmp := mapassign(m, k)
              *tmp = append(*tmp, ...)
      
      The optimization is possible only if Order can tell that m[k] is
      exactly the same on the left and right sides of the assignment. It
      does not apply to:
      1) m[f(k)] = append(m[f(k)], ...)
      2) sink, m[k] = sink, append(m[k]...)
      3) m[k] = append(..., m[k],...)
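
At the source level, the qualifying statement and the three excluded shapes look like the sketch below. The map, keys, and helper names are made up for illustration; every statement produces the same result with or without the optimization, and only the generated code is expected to differ.

```go
package main

import "fmt"

// key is a hypothetical function used to form a call expression as a map key.
func key(k string) string { return k }

// build exercises the statement shapes discussed above.
func build() map[string][]int {
	m := map[string][]int{}
	var sink int

	// Same map and same key on both sides: the shape the change targets.
	m["a"] = append(m["a"], 1)
	m["a"] = append(m["a"], 2, 3)

	// 1) The key is a function call, so the two sides are not known to match.
	m[key("b")] = append(m[key("b")], 4)

	// 2) Multiple assignment.
	sink, m["c"] = sink, append(m["c"], 5)

	// 3) m[k] is not the first argument to append.
	m["d"] = append([]int{0}, m["d"]...)

	_ = sink
	return m
}

func main() {
	fmt.Println(build())
}
```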
      
      Benchmark:
      name                           old time/op    new time/op    delta
      MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
      MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
      MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
      MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
      MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)
      
      name                           old alloc/op   new alloc/op   delta
      MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
      MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
      MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
      MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
      MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)
      
      Fixes #24364
      Updates #5147
      
      Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
      Reviewed-on: https://go-review.googlesource.com/100838
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      c12b185a
    • Cherry Zhang's avatar
      Revert "bytes: add optimized Compare for arm64" · e22d2413
      Cherry Zhang authored
      This reverts commit bfa8b6f8.
      
      Reason for revert: This depends on another CL which is not yet submitted.
      
      Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07
      Reviewed-on: https://go-review.googlesource.com/101496
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      e22d2413
    • fanzha02's avatar
      bytes: add optimized Compare for arm64 · bfa8b6f8
      fanzha02 authored
      Use LDP instructions to load 16 bytes per loop iteration when the source is long.
      Handle the 8-byte, 4-byte and 2-byte lengths specially for better performance.
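
The general idea of comparing wide chunks instead of single bytes can be sketched in portable Go. This is not the arm64 assembly from the change: the function name is hypothetical, and it compares 8 bytes per step with encoding/binary rather than 16 bytes with LDP.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// compareChunked returns -1, 0, or +1 like bytes.Compare. It compares
// 8 bytes at a time while both inputs have that much data left, then
// finishes the tail byte by byte.
func compareChunked(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// A big-endian load keeps lexicographic byte order as integer order.
		x := binary.BigEndian.Uint64(a[i:])
		y := binary.BigEndian.Uint64(b[i:])
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	for ; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	// The common prefix is equal; the shorter slice sorts first.
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

func main() {
	a, b := []byte("abcdefghij"), []byte("abcdefghik")
	fmt.Println(compareChunked(a, b) == bytes.Compare(a, b))
}
```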
      
      Benchmark result:
      name                           old time/op   new time/op    delta
      BytesCompare/1-8                21.0ns ± 0%    10.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2-8                11.5ns ± 0%    10.5ns ± 0%    -8.70%  (p=0.008 n=5+5)
      BytesCompare/4-8                13.5ns ± 0%    10.0ns ± 0%   -25.93%  (p=0.008 n=5+5)
      BytesCompare/8-8                28.8ns ± 0%     9.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/16-8               40.5ns ± 0%    10.5ns ± 0%   -74.07%  (p=0.008 n=5+5)
      BytesCompare/32-8               64.6ns ± 0%    12.5ns ± 0%   -80.65%  (p=0.008 n=5+5)
      BytesCompare/64-8                112ns ± 0%      16ns ± 0%   -85.27%  (p=0.008 n=5+5)
      BytesCompare/128-8               208ns ± 0%      24ns ± 0%   -88.22%  (p=0.008 n=5+5)
      BytesCompare/256-8               400ns ± 0%      50ns ± 0%   -87.62%  (p=0.008 n=5+5)
      BytesCompare/512-8               785ns ± 0%      82ns ± 0%   -89.61%  (p=0.008 n=5+5)
      BytesCompare/1024-8             1.55µs ± 0%    0.14µs ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2048-8             3.09µs ± 0%    0.27µs ± 0%      ~     (p=0.079 n=4+5)
      CompareBytesEqual-8             39.0ns ± 0%    12.0ns ± 0%   -69.23%  (p=0.008 n=5+5)
      CompareBytesToNil-8             8.57ns ± 5%    8.23ns ± 2%    -3.99%  (p=0.016 n=5+5)
      CompareBytesEmpty-8             7.37ns ± 0%    7.36ns ± 4%      ~     (p=0.690 n=5+5)
      CompareBytesIdentical-8         7.39ns ± 0%    7.46ns ± 2%      ~     (p=0.667 n=5+5)
      CompareBytesSameLength-8        17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesDifferentLength-8   17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesBigUnaligned-8      1.58ms ± 0%    0.19ms ± 0%   -88.31%  (p=0.016 n=4+5)
      CompareBytesBig-8               1.59ms ± 0%    0.19ms ± 0%   -88.27%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8      7.01ns ± 0%    6.60ns ± 3%    -5.91%  (p=0.008 n=5+5)
      
      name                           old speed     new speed      delta
      CompareBytesBigUnaligned-8     662MB/s ± 0%  5660MB/s ± 0%  +755.15%  (p=0.016 n=4+5)
      CompareBytesBig-8              661MB/s ± 0%  5636MB/s ± 0%  +752.57%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8     150TB/s ± 0%   159TB/s ± 3%    +6.27%  (p=0.008 n=5+5)
      
      Change-Id: I70332de06f873df3bc12c4a5af1028307b670046
      Reviewed-on: https://go-review.googlesource.com/90175
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      bfa8b6f8