1. 22 Mar, 2018 4 commits
    • cmd/compile: specialize Move up to 79B on amd64 · 4f7b7748
      Travis Bischel authored
      Move currently uses mov instructions directly up to 31 bytes and then
      switches to duffcopy. Moving 31 bytes takes 4 instructions, corresponding
      to two loads and two stores (or 6 if !useSSE). Depending on the usage,
      duffcopy takes five (one or two mov, two or three lea, one call).
      
      This adds direct mov instructions for Moves of size 32, 48, and 64 with
      SSE and for only size 32 without.
      With useSSE:
      - 32 is 4 instructions (byte +/- comparison below)
      - 33 thru 48 is 6
      - 49 thru 64 is 8
      
      Without:
      - 32 is 8
      
      Note that the only platform with useSSE set to false is Plan 9. I have
      built three projects based off tip and tip with this patch, and each
      project's binary size is equal to or smaller than it was before.
      
      The basis of this change is that copying data with instructions directly
      is nearly free, whereas calling into duffcopy adds a bit of overhead.
      This is most noticeable in range statements where elements are 32+
      bytes. For code with the following pattern:
      
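      // Assumed to live in a _test.go file. s32 is not defined in the original
      // message, so a 32-byte array is assumed here.
      package move

      import (
              "strconv"
              "testing"
      )

      type s32 [32]byte

      func Benchmark32Range(b *testing.B) {
              var f s32
              for _, count := range []int{10, 100, 1000, 10000} {
                      name := strconv.Itoa(count)
                      b.Run(name, func(b *testing.B) {
                              base := make([]s32, count)
                              for i := 0; i < b.N; i++ {
                                      // Each iteration copies one 32-byte element into v.
                                      for _, v := range base {
                                              f = v
                                      }
                              }
                      })
              }
              _ = f
      }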
      
      These are the resulting benchmarks (columns: old ns/op, new ns/op, delta):
      Benchmark16Range/10-4        19.1          19.1          +0.00%
      Benchmark16Range/100-4       169           170           +0.59%
      Benchmark16Range/1000-4      1684          1691          +0.42%
      Benchmark16Range/10000-4     18147         18124         -0.13%
      Benchmark31Range/10-4        141           142           +0.71%
      Benchmark31Range/100-4       1407          1410          +0.21%
      Benchmark31Range/1000-4      14070         14074         +0.03%
      Benchmark31Range/10000-4     141781        141759        -0.02%
      Benchmark32Range/10-4        71.4          32.2          -54.90%
      Benchmark32Range/100-4       695           326           -53.09%
      Benchmark32Range/1000-4      7166          3313          -53.77%
      Benchmark32Range/10000-4     72571         35425         -51.19%
      Benchmark64Range/10-4        87.8          64.9          -26.08%
      Benchmark64Range/100-4       868           629           -27.53%
      Benchmark64Range/1000-4      9355          6907          -26.17%
      Benchmark64Range/10000-4     94463         70385         -25.49%
      Benchmark79Range/10-4        177           152           -14.12%
      Benchmark79Range/100-4       1769          1531          -13.45%
      Benchmark79Range/1000-4      17893         15532         -13.20%
      Benchmark79Range/10000-4     178947        155551        -13.07%
      Benchmark80Range/10-4        99.6          99.7          +0.10%
      Benchmark80Range/100-4       987           985           -0.20%
      Benchmark80Range/1000-4      10573         10560         -0.12%
      Benchmark80Range/10000-4     106792        106639        -0.14%
      
      For runtime's BenchmarkCopyFat* benchmarks (columns: old time/op, new time/op, delta):
      CopyFat8-4     0.40ns ± 0%  0.40ns ± 0%      ~     (all equal)
      CopyFat12-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=9+9)
      CopyFat16-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=10+8)
      CopyFat24-4    0.80ns ± 0%  0.40ns ± 0%   -50.00%  (p=0.001 n=8+9)
      CopyFat32-4    2.01ns ± 0%  0.40ns ± 0%   -80.10%  (p=0.000 n=8+8)
      CopyFat64-4    2.87ns ± 0%  0.40ns ± 0%   -86.07%  (p=0.000 n=8+10)
      CopyFat128-4   4.82ns ± 0%  4.82ns ± 0%      ~     (p=1.000 n=8+8)
      CopyFat256-4   8.83ns ± 0%  8.83ns ± 0%      ~     (p=1.000 n=8+8)
      CopyFat512-4   16.9ns ± 0%  16.9ns ± 0%      ~     (all equal)
      CopyFat520-4   14.6ns ± 0%  14.6ns ± 1%      ~     (p=0.529 n=8+9)
      CopyFat1024-4  32.9ns ± 0%  33.0ns ± 0%    +0.20%  (p=0.041 n=8+9)
      
      Function calls do not benefit as much due to how they are compiled, but
      other benchmarks I ran show that calling a function with 64-byte elements
      is marginally improved.
      
      The main downside with this change is that it may increase binary sizes
      depending on the size of the copy, but this change also decreases
      binaries for moves of 48 bytes or less.
      
      For the following code:
      package main
      
      type size [32]byte
      
      //go:noinline
      func use(t size) {
      }
      
      //go:noinline
      func get() size {
      	var z size
      	return z
      }
      
      func main() {
      	var a size
      	use(a)
      }
      
      Changing size around gives the following assembly leading up to the call
      (the initialization and actual call are removed):
      
      tip func call with 32B arg: 27B
          48 89 e7                 mov    %rsp,%rdi
          48 8d 74 24 20           lea    0x20(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 53 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 19B (-8B)
          0f 10 44 24 20           movups 0x20(%rsp),%xmm0
          0f 11 04 24              movups %xmm0,(%rsp)
          0f 10 44 24 30           movups 0x30(%rsp),%xmm0
          0f 11 44 24 10           movups %xmm0,0x10(%rsp)
      -
      tip with 47B arg: 29B
          48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
          48 8d 74 24 40           lea    0x40(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 43 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 20B (-9B)
          0f 10 44 24 40           movups 0x40(%rsp),%xmm0
          0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
          0f 10 44 24 50           movups 0x50(%rsp),%xmm0
          0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
      -
      tip with 64B arg: 27B
          48 89 e7                 mov    %rsp,%rdi
          48 8d 74 24 40           lea    0x40(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 1f ab ff ff           callq  448948 <runtime.duffcopy+0x348>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 39B [+12B]
          0f 10 44 24 40           movups 0x40(%rsp),%xmm0
          0f 11 04 24              movups %xmm0,(%rsp)
          0f 10 44 24 50           movups 0x50(%rsp),%xmm0
          0f 11 44 24 10           movups %xmm0,0x10(%rsp)
          0f 10 44 24 60           movups 0x60(%rsp),%xmm0
          0f 11 44 24 20           movups %xmm0,0x20(%rsp)
          0f 10 44 24 70           movups 0x70(%rsp),%xmm0
          0f 11 44 24 30           movups %xmm0,0x30(%rsp)
      -
      tip with 79B arg: 29B
          48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
          48 8d 74 24 60           lea    0x60(%rsp),%rsi
          48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
          48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
          e8 09 ab ff ff           callq  448948 <runtime.duffcopy+0x348>
          48 8b 6d 00              mov    0x0(%rbp),%rbp
      
      modified: 46B [+17B]
          0f 10 44 24 60           movups 0x60(%rsp),%xmm0
          0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
          0f 10 44 24 70           movups 0x70(%rsp),%xmm0
          0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
          0f 10 84 24 80 00 00     movups 0x80(%rsp),%xmm0
          00
          0f 11 44 24 2f           movups %xmm0,0x2f(%rsp)
          0f 10 84 24 90 00 00     movups 0x90(%rsp),%xmm0
          00
          0f 11 44 24 3f           movups %xmm0,0x3f(%rsp)
      
      So, at best we save 9B, at worst we gain 17. I do not think that copying
      around 65+B sized types is common enough to bloat program sizes. Using
      bincmp on the go binary itself shows a zero byte difference; there are
      gains and losses all over. One of the largest gains in binary size comes
      from cmd/go/internal/cache.(*Cache).Get, which passes around a 64 byte
      sized type -- this is one of the cases I would expect to be benefitted
      by this change.
      
      I think that this marginal improvement in struct copying for 64 byte
      structs is worth it: most data structs / work items I use in my programs
      are small, but few are smaller than 32 bytes: with one slice, the budget
      is up. The 32 rule alone would allow another 16 bytes, the 48 and 64
      rules allow another 32 and 48.
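The size budget described above can be checked with a minimal sketch. The types workItem and bigItem and the helper sizes are hypothetical illustrations, not taken from the change, and the byte counts in the comments assume a 64-bit platform such as amd64.

```go
package main

import (
	"fmt"
	"unsafe"
)

// workItem is a hypothetical small work item: one slice plus one counter.
// On a 64-bit platform the slice header alone is three words (24 bytes).
type workItem struct {
	payload []byte
	id      uint64
}

// bigItem is a hypothetical item with two slices and two counters.
type bigItem struct {
	keys   []string
	values []string
	hits   uint64
	misses uint64
}

// sizes reports the size in bytes of a slice header, workItem, and bigItem.
func sizes() (slice, small, big uintptr) {
	return unsafe.Sizeof([]byte(nil)), unsafe.Sizeof(workItem{}), unsafe.Sizeof(bigItem{})
}

func main() {
	s, w, b := sizes()
	fmt.Println("slice header:", s) // 24 on amd64
	fmt.Println("workItem:", w)     // 32 on amd64
	fmt.Println("bigItem:", b)      // 64 on amd64
}
```

Under these assumptions, a struct holding a single slice and one extra word already reaches 32 bytes, which is the smallest size the new rules cover.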
      
      Change-Id: I19a8f9190d5d41825091f17f268f4763bfc12a62
      Reviewed-on: https://go-review.googlesource.com/100718
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • test/codegen: port direct comparisons with memory tests · fc6280d4
      Alberto Donizetti authored
      And remove them from asm_test.
      
      Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
      Reviewed-on: https://go-review.googlesource.com/102115
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x · 6633bb2a
      Carlos Eduardo Seo authored
      
      This change implements faster atomics for ppc64x based on the ISA 2.07B,
      Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
      cases.
      
      Updates #21348
      
      name                                           old time/op new time/op    delta
      Cond1-16                                           955ns     856ns      -10.33%
      Cond2-16                                          2.38µs    2.03µs      -14.59%
      Cond4-16                                          5.90µs    5.44µs       -7.88%
      Cond8-16                                          12.1µs    11.1µs       -8.42%
      Cond16-16                                         27.0µs    25.1µs       -7.04%
      Cond32-16                                         59.1µs    55.5µs       -6.14%
      LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
      LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
      LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
      LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
      LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
      LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
      LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
      LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
      LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
      LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
      LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
      LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
      LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
      Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
      Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
      Range/*sync.Map-16                                2.00µs    2.01µs         ~
      AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
      AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
      AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
      AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
      AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
      AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
      UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
      ContendedSemaphore-16                              112ns      97ns      -13.77%
      MutexUncontended-16                               3.34ns    2.51ns      -24.69%
      Mutex-16                                           266ns     191ns      -28.26%
      MutexSlack-16                                      226ns     159ns      -29.55%
      MutexWork-16                                       377ns     338ns      -10.14%
      MutexWorkSlack-16                                  335ns     308ns       -8.20%
      MutexNoSpin-16                                     196ns     184ns       -5.91%
      MutexSpin-16                                       710ns     666ns       -6.21%
      Once-16                                           1.29ns    1.29ns         ~
      Pool-16                                           8.64ns    8.71ns         ~
      PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
      SemaUncontended-16                                5.39ns    4.42ns      -17.96%
      SemaSyntNonblock-16                                539ns     483ns      -10.42%
      SemaSyntBlock-16                                   413ns     354ns      -14.20%
      SemaWorkNonblock-16                                305ns     258ns      -15.36%
      SemaWorkBlock-16                                   266ns     229ns      -14.06%
      RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
      RWMutexWrite100-16                                 203ns     147ns      -27.47%
      RWMutexWrite10-16                                  177ns     119ns      -32.74%
      RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
      RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
      WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
      WaitGroupAddDone-16                                402ns     355ns      -11.54%
      WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
      WaitGroupWait-16                                  1.21ns    1.21ns         ~
      WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
      WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%
      
      Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
      Reviewed-on: https://go-review.googlesource.com/95175
      Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
    • doc: first version of new contribute guide · a3d83269
      Giovanni Bajo authored
      I've reorganized the guide and rewritten large sections.
      
      The structure is now more clear and logical, and can
      be understood and navigated using the summary displayed at
      the top of the page (before, the summary was confusing because
      the guide contained H1s that were being ignored by the summary).
      
      Both the initial onboarding process and the Gerrit
      change submission process have been reworked to
      include a concise checklist of steps that can be
      read and understood in a few seconds, for people
      that don't want or need to bother with details.
      More in-depth descriptions have been moved into
      separate sections, one per each checklist step.
      This is by far the biggest improvement, as the previous
      approach of having to read several pages just to understand
      the required steps was very intimidating for beginners, in
      addition to being harder to navigate.
      
      GitHub pull requests have been integrated as a different
      way to submit a change, suggested for first time contributors.
      
      The review process has been described in more detail,
      documenting the workflow and the conventions used.
      
      Most miscellanea have been moved into an "advanced
      topics" chapter.
      
      Paragraphs describing how to use git have been removed
      to simplify reading. This guide should focus on Go contribution,
      and not help users get familiar with git, for which many
      guides are available.
      
      Change-Id: I6f4b76583c9878b230ba1d0225745a1708fad2e8
      Reviewed-on: https://go-review.googlesource.com/93495
      Reviewed-by: Rob Pike <r@golang.org>
  2. 21 Mar, 2018 7 commits
    • compress/bzip2: remove bit-tricks · 9eb21948
      Ilya Tocar authored
      Since the compiler is now able to generate conditional moves, we can replace
      bit-tricks with a simple if/else. This even results in slightly better performance:
      
      name            old time/op    new time/op    delta
      DecodeDigits-6    13.4ms ± 4%    13.0ms ± 2%  -2.63%  (p=0.003 n=10+10)
      DecodeTwain-6     37.5ms ± 1%    36.3ms ± 1%  -3.03%  (p=0.000 n=10+9)
      DecodeRand-6      4.23ms ± 1%    4.07ms ± 1%  -3.67%  (p=0.000 n=10+9)
      
      name            old speed      new speed      delta
      DecodeDigits-6  7.47MB/s ± 4%  7.67MB/s ± 2%  +2.69%  (p=0.002 n=10+10)
      DecodeTwain-6   10.4MB/s ± 1%  10.7MB/s ± 1%  +3.25%  (p=0.000 n=10+8)
      DecodeRand-6    3.87MB/s ± 1%  4.03MB/s ± 2%  +4.08%  (p=0.000 n=10+10)
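The kind of rewrite described above can be illustrated with a minimal sketch. The functions selectMask and selectBranch, their parameters, and the choice of which value a set bit selects are hypothetical; this is not the code from huffman.go, only the general pattern of replacing a mask-based select with a branch that the compiler may lower to a conditional move.

```go
package main

import "fmt"

// selectMask returns left when bit is 1 and right when bit is 0, without a
// branch. For an unsigned value, -bit wraps to all ones when bit is 1 and
// stays all zeros when bit is 0.
func selectMask(left, right, bit uint16) uint16 {
	mask := -bit
	return (left & mask) | (right &^ mask)
}

// selectBranch computes the same result with a plain if/else, leaving it to
// the compiler to decide whether to emit a conditional move.
func selectBranch(left, right, bit uint16) uint16 {
	if bit != 0 {
		return left
	}
	return right
}

func main() {
	for _, bit := range []uint16{0, 1} {
		fmt.Println(bit, selectMask(7, 9, bit), selectBranch(7, 9, bit))
	}
}
```

Both functions agree for bit values 0 and 1; the if/else form is easier to read, and whether it is actually faster depends on the code the compiler generates.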
      diff --git a/src/compress/bzip2/huffman.go b/src/compress/bzip2/huffman.go
      
      Change-Id: Ie96ef1a9e07013b07e78f22cdccd531f3341caca
      Reviewed-on: https://go-review.googlesource.com/102015
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Joe Tsai <joetsai@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • all: enable c-shared/c-archive support for freebsd/amd64 · 88129f0c
      Tim Wright authored
      Fixes #14327
      Much of the code is based on the linux/amd64 code that implements these
      build modes, and code is shared where possible.
      
      Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
      Reviewed-on: https://go-review.googlesource.com/93875
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime,sync/atomic: replace asm BYTEs with insts for x86 · ff5cf43d
      isharipo authored
      For each replacement, a test case is added to the new 386enc.s file,
      with the exception of EMMS, SYSENTER, MFENCE and LFENCE, as they
      are already covered in amd64enc.s (same on amd64 and 386).
      
      The replacement became less obvious after go vet suggested changes.
      Before:
      	BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
      Changed to MOVQ (this form is being tested):
      	MOVQ M0, 8(SP)
      Refactored to FP-relative access (go vet advice):
      	MOVQ M0, val+4(FP)
      
      Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
      Reviewed-on: https://go-review.googlesource.com/101475
      Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
    • net/url: fix contradiction in PathUnescape docs · 65727ab5
      Ross Light authored
      Change-Id: If35e3faa738c5d7d72cf77d14b276690579180a1
      Reviewed-on: https://go-review.googlesource.com/101921
      Run-TryBot: Ross Light <light@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: parse auxv on freebsd · 2e84dc25
      Tobias Klauser authored
      Decode AT_PAGESZ to determine physPageSize on freebsd/{386,amd64,arm}
      and AT_HWCAP for hwcap and hardDiv on freebsd/arm. Also use hwcap to
      perform the FP checks in checkgoarm akin to the linux/arm
      implementation.
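As a rough illustration of the decoding step described above, the sketch below walks an auxiliary vector laid out as tag/value pairs and extracts the page size. The function parsePageSize and the sample vector are hypothetical, and the constants are assumed to follow the usual ELF convention (AT_NULL = 0, AT_PAGESZ = 6); this is not the runtime's implementation.

```go
package main

import "fmt"

// Tag values assumed from the common ELF auxiliary vector convention.
const (
	atNull   = 0 // AT_NULL: marks the end of the vector
	atPagesz = 6 // AT_PAGESZ: system page size in bytes
)

// parsePageSize scans a flattened auxiliary vector of (tag, value) pairs and
// returns the value stored under AT_PAGESZ. It stops at AT_NULL or at the end
// of the slice, whichever comes first.
func parsePageSize(auxv []uintptr) (pageSize uintptr, ok bool) {
	for i := 0; i+1 < len(auxv); i += 2 {
		tag, val := auxv[i], auxv[i+1]
		switch tag {
		case atNull:
			return 0, false
		case atPagesz:
			return val, true
		}
	}
	return 0, false
}

func main() {
	// Synthetic vector: one unrelated entry, the page size, then the terminator.
	vec := []uintptr{3, 0x400040, atPagesz, 4096, atNull, 0}
	size, ok := parsePageSize(vec)
	fmt.Println(size, ok)
}
```

A hardware-capability entry such as AT_HWCAP would be handled by one more case in the same switch, using whatever tag value the target platform defines.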
      
      Change-Id: I532810a1581efe66277e4305cb234acdc79ee91e
      Reviewed-on: https://go-review.googlesource.com/99780
      Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/doc: use empty GOPATH when running the tests · 77c3ef6f
      Daniel Martí authored
      Otherwise, a populated GOPATH might result in failures such as:
      
      	$ go test
      	[...] no buildable Go source files in [...]/gopherjs/compiler/natives/src/crypto/rand
      	exit status 1
      
      Move the initialization of the dirs walker out of the init func, so that
      we can control its behavior in the tests.
      
      Updates #24464.
      
      Change-Id: I4b26a7d3d6809bdd8e9b6b0556d566e7855f80fe
      Reviewed-on: https://go-review.googlesource.com/101836
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • cmd/trace: remove unused variable in tests · 041c5d83
      Alberto Donizetti authored
      Unused variables in closures are currently not diagnosed by the
      compiler (this is Issue #3059), while go/types catches them.
      
      One unused variable in the cmd/trace tests is causing the go/types
      test that typechecks the whole standard library to fail:
      
        FAIL: TestStdlib (8.05s)
          stdlib_test.go:223: cmd/trace/annotations_test.go:241:6: gcTime
          declared but not used
        FAIL
      
      Remove it.
      
      Updates #24464
      
      Change-Id: I0f1b9db6ae1f0130616ee649bdbfdc91e38d2184
      Reviewed-on: https://go-review.googlesource.com/101815
      Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
  3. 20 Mar, 2018 12 commits
    • go/internal/srcimporter: simplify and fix package file lookup · 5f0a9ba1
      Hiroshi Ioka authored
      The old code was a blend of (copied) code that existed before go/build,
      and incorrect adjustments made when go/build was introduced. This change
      leaves package path determination entirely to go/build and in the process
      fixes issues with relative import paths.
      
      Fixes #23092
      Fixes #24392
      
      Change-Id: I9e900538b365398751bace56964495c5440ac4ae
      Reviewed-on: https://go-review.googlesource.com/83415
      Run-TryBot: Robert Griesemer <gri@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Robert Griesemer <gri@golang.org>
    • net/http: remove extraneous call to VerifyHostname · 2638001e
      Paul Querna authored
      VerifyHostname is called by tls.Conn during Handshake and does not need to be called explicitly.
      
      Change-Id: I22b7fa137e76bb4be3d0018813a571acfb882219
      Reviewed-on: https://go-review.googlesource.com/98618
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
    • crypto/x509: support the PSS certificates that OpenSSL 1.1.0 generates. · 8a151924
      Adam Langley authored
      It serialises optional parameters as empty rather than NULL. It's
      probably technically correct, although ASN.1 has a long history of doing
      this different ways.
      
      But OpenSSL is likely common enough that we want to support this
      encoding.
      
      Fixes #23847
      
      Change-Id: I81c60f0996edfecf59467dfdf75b0cf8ba7b1efb
      Reviewed-on: https://go-review.googlesource.com/96417
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Filippo Valsorda <filippo@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile/internal/ssa: update regalloc in loops · 983dcf70
      Ilya Tocar authored
      Currently we don't lift a spill out of a loop if the loop contains a call.
      However, we often have code like this:
      
      for .. {
          if hard_case {
              call()
          }
          // simple case, without call
      }
      
      So instead of checking for any call, check for an unavoidable call.
      For #22698 cases I see:
      mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
      And:
      compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e5-6       760µs ± 1%      725µs ± 1%     -4.56%  (p=0.000 n=8+8)
      compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%      7.24ms ± 0%     -4.07%  (p=0.000 n=8+7)
      
      There are no significant changes on go1 benchmarks.
      But for cases with runtime arch checks, where we call the generic version on old hardware,
      there are respectable performance gains:
      math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
      math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)
      
      Also, on some runtime benchmarks, loops have fewer loads and higher performance:
      runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
      runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)
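The loop shape discussed in this message can be sketched in Go as follows. The names sum and slowPath and the 0x80 threshold are hypothetical and chosen only to produce a loop whose call sits on a rarely taken path; whether a particular compiler version hoists the spill here is not something this sketch verifies.

```go
package main

import "fmt"

// slowPath stands in for the rarely taken "hard case". The noinline directive
// keeps it a real call inside the loop.
//
//go:noinline
func slowPath(b byte) int {
	return int(b) * 2
}

// sum has a call inside its loop, but only on the hard-case branch; the common
// path through the loop body performs no call at all.
func sum(data []byte) int {
	total := 0
	for _, b := range data {
		if b >= 0x80 { // hard case: avoidable call
			total += slowPath(b)
			continue
		}
		total += int(b) // simple case, without call
	}
	return total
}

func main() {
	fmt.Println(sum([]byte{1, 2, 3}))
	fmt.Println(sum([]byte{1, 0x80, 2}))
}
```

Because the call is avoidable, values kept live across iterations (such as total) do not necessarily need to be spilled on every pass through the loop.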
      
      Fixes #22698
      Updates #22234
      
      Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
      Reviewed-on: https://go-review.googlesource.com/84055
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • test/codegen: port comparisons tests to codegen · be371edd
      Alberto Donizetti authored
      And delete them from asm_test.
      
      Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
      Reviewed-on: https://go-review.googlesource.com/101595
      Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
    • cmd/compile: fix regression in DWARF inlined routine variable tracking · f45c07e8
      Than McIntosh authored
      Fix a bug in the code that generates the pre-inlined variable
      declaration table used as raw material for emitting DWARF inline
      routine records. The fix for issue 23704 altered the recipe for
      assigning file/line/col to variables in one part of the compiler, but
      didn't update a similar recipe in the code for variable tracking.
      Added a new test that should catch problems of a similar nature.
      
      Fixes #24460.
      
      Change-Id: I255c036637f4151aa579c0e21d123fd413724d61
      Reviewed-on: https://go-review.googlesource.com/101676
      Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
    • cmd/compile: mark LAA and LAAG as clobbering flags on s390x · ae10914e
      Michael Munday authored
      The atomic add instructions modify the condition code and so need to
      be marked as clobbering flags.
      
      Fixes #24449.
      
      Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897
      Reviewed-on: https://go-review.googlesource.com/101455
      Run-TryBot: Michael Munday <mike.munday@ibm.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/asm: fix bug about VMOV instruction (move a vector element to another) on ARM64 · 9c312245
      Fangming.Fang authored
      This change fixes an index error when encoding the VMOV instruction whose
      pattern is VMOV Vn.<T>[index], Vd.<T>[index].
      
      Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
      Reviewed-on: https://go-review.googlesource.com/101335
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/asm: fix bug about VMOV instruction (move register to vector element) on ARM64 · 7673e305
      Fangming.Fang authored
      This change fixes an index error when encoding the VMOV instruction whose pattern is
      VMOV Rn, V.<T>[index]. For example, VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0].
      
      Fixes #24400
      Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
      Reviewed-on: https://go-review.googlesource.com/101297
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: avoid mapaccess at m[k]=append(m[k].. · c12b185a
      Vladimir Kuzmin authored
      Currently rvalue m[k] is transformed during walk into:
      
              tmp1 := *mapaccess(m, k)
              tmp2 := append(tmp1, ...)
              *mapassign(m, k) = tmp2
      
      However, this is suboptimal, as we could instead produce just:
      
              tmp := mapassign(m, k)
              *tmp = append(*tmp, ...)
      
      The optimization is possible only if, during Order, the compiler can tell
      that m[k] is exactly the same on the left and right sides of the
      assignment. It doesn't work for:
      1) m[f(k)] = append(m[f(k)], ...)
      2) sink, m[k] = sink, append(m[k]...)
      3) m[k] = append(..., m[k],...)
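At the source level, the qualifying pattern and the first excluded case listed above look like the sketch below. The helpers addDirect, addViaCall, and key are hypothetical names used only for illustration; both functions produce the same map contents, and the sketch makes no attempt to observe which runtime calls the compiler emits.

```go
package main

import "fmt"

// addDirect uses the same m[k] expression on both sides of the assignment,
// which is the form the message describes as eligible for a single mapassign.
func addDirect(m map[string][]int, k string, v int) {
	m[k] = append(m[k], v)
}

// key is a hypothetical function standing in for f(k) in excluded case 1.
func key(k string) string { return k }

// addViaCall indexes the map through a function call on each side, matching
// excluded case 1: m[f(k)] = append(m[f(k)], ...).
func addViaCall(m map[string][]int, k string, v int) {
	m[key(k)] = append(m[key(k)], v)
}

func main() {
	m := map[string][]int{}
	addDirect(m, "a", 1)
	addDirect(m, "a", 2)
	addViaCall(m, "b", 3)
	fmt.Println(m["a"], m["b"])
}
```

The two forms are semantically equivalent; per the message, only the first is recognized by the optimization.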
      
      Benchmark:
      name                           old time/op    new time/op    delta
      MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
      MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
      MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
      MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
      MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)
      
      name                           old alloc/op   new alloc/op   delta
      MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
      MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
      MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
      MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
      MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
      MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)
      
      Fixes #24364
      Updates #5147
      
      Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
      Reviewed-on: https://go-review.googlesource.com/100838
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • Cherry Zhang's avatar
      Revert "bytes: add optimized Compare for arm64" · e22d2413
      Cherry Zhang authored
      This reverts commit bfa8b6f8.
      
      Reason for revert: This depends on another CL which is not yet submitted.
      
      Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07
      Reviewed-on: https://go-review.googlesource.com/101496
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • fanzha02's avatar
      bytes: add optimized Compare for arm64 · bfa8b6f8
      fanzha02 authored
      Use LDP instructions to load 16 bytes per loop iteration when the source is long.
      Handle the 8-byte, 4-byte, and 2-byte lengths specially for better performance.
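
      The chunking strategy can be sketched in portable Go. This is only an
      illustration, not the arm64 assembly from this change: compareChunked and
      cmp are hypothetical names, and big-endian loads from encoding/binary stand
      in for LDP so that comparing the loaded integers gives the same result as
      comparing the bytes lexicographically.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// cmp returns -1, 0, or +1 for x < y, x == y, x > y.
func cmp(x, y uint64) int {
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	}
	return 0
}

// compareChunked is a hypothetical pure-Go illustration of comparing
// 16 bytes per iteration and then handling 8-, 4-, 2-, and 1-byte tails.
func compareChunked(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	// Main loop: two 8-byte words per iteration (16 bytes total).
	for ; n-i >= 16; i += 16 {
		if c := cmp(binary.BigEndian.Uint64(a[i:]), binary.BigEndian.Uint64(b[i:])); c != 0 {
			return c
		}
		if c := cmp(binary.BigEndian.Uint64(a[i+8:]), binary.BigEndian.Uint64(b[i+8:])); c != 0 {
			return c
		}
	}
	// Tails: at most one step each of 8, 4, 2, and 1 bytes.
	if n-i >= 8 {
		if c := cmp(binary.BigEndian.Uint64(a[i:]), binary.BigEndian.Uint64(b[i:])); c != 0 {
			return c
		}
		i += 8
	}
	if n-i >= 4 {
		if c := cmp(uint64(binary.BigEndian.Uint32(a[i:])), uint64(binary.BigEndian.Uint32(b[i:]))); c != 0 {
			return c
		}
		i += 4
	}
	if n-i >= 2 {
		if c := cmp(uint64(binary.BigEndian.Uint16(a[i:])), uint64(binary.BigEndian.Uint16(b[i:]))); c != 0 {
			return c
		}
		i += 2
	}
	if n-i >= 1 {
		if c := cmp(uint64(a[i]), uint64(b[i])); c != 0 {
			return c
		}
	}
	// The common prefix is equal, so the shorter slice sorts first.
	return cmp(uint64(len(a)), uint64(len(b)))
}

func main() {
	pairs := [][2]string{
		{"abc", "abd"},
		{"0123456789abcdefX", "0123456789abcdefA"},
		{"same", "same"},
		{"ab", "abc"},
	}
	for _, p := range pairs {
		got := compareChunked([]byte(p[0]), []byte(p[1]))
		want := bytes.Compare([]byte(p[0]), []byte(p[1]))
		fmt.Printf("%q vs %q: got %d, bytes.Compare %d\n", p[0], p[1], got, want)
	}
}
```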
      
      Benchmark result:
      name                           old time/op   new time/op    delta
      BytesCompare/1-8                21.0ns ± 0%    10.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2-8                11.5ns ± 0%    10.5ns ± 0%    -8.70%  (p=0.008 n=5+5)
      BytesCompare/4-8                13.5ns ± 0%    10.0ns ± 0%   -25.93%  (p=0.008 n=5+5)
      BytesCompare/8-8                28.8ns ± 0%     9.5ns ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/16-8               40.5ns ± 0%    10.5ns ± 0%   -74.07%  (p=0.008 n=5+5)
      BytesCompare/32-8               64.6ns ± 0%    12.5ns ± 0%   -80.65%  (p=0.008 n=5+5)
      BytesCompare/64-8                112ns ± 0%      16ns ± 0%   -85.27%  (p=0.008 n=5+5)
      BytesCompare/128-8               208ns ± 0%      24ns ± 0%   -88.22%  (p=0.008 n=5+5)
      BytesCompare/256-8               400ns ± 0%      50ns ± 0%   -87.62%  (p=0.008 n=5+5)
      BytesCompare/512-8               785ns ± 0%      82ns ± 0%   -89.61%  (p=0.008 n=5+5)
      BytesCompare/1024-8             1.55µs ± 0%    0.14µs ± 0%      ~     (p=0.079 n=4+5)
      BytesCompare/2048-8             3.09µs ± 0%    0.27µs ± 0%      ~     (p=0.079 n=4+5)
      CompareBytesEqual-8             39.0ns ± 0%    12.0ns ± 0%   -69.23%  (p=0.008 n=5+5)
      CompareBytesToNil-8             8.57ns ± 5%    8.23ns ± 2%    -3.99%  (p=0.016 n=5+5)
      CompareBytesEmpty-8             7.37ns ± 0%    7.36ns ± 4%      ~     (p=0.690 n=5+5)
      CompareBytesIdentical-8         7.39ns ± 0%    7.46ns ± 2%      ~     (p=0.667 n=5+5)
      CompareBytesSameLength-8        17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesDifferentLength-8   17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
      CompareBytesBigUnaligned-8      1.58ms ± 0%    0.19ms ± 0%   -88.31%  (p=0.016 n=4+5)
      CompareBytesBig-8               1.59ms ± 0%    0.19ms ± 0%   -88.27%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8      7.01ns ± 0%    6.60ns ± 3%    -5.91%  (p=0.008 n=5+5)
      
      name                           old speed     new speed      delta
      CompareBytesBigUnaligned-8     662MB/s ± 0%  5660MB/s ± 0%  +755.15%  (p=0.016 n=4+5)
      CompareBytesBig-8              661MB/s ± 0%  5636MB/s ± 0%  +752.57%  (p=0.016 n=5+4)
      CompareBytesBigIdentical-8     150TB/s ± 0%   159TB/s ± 3%    +6.27%  (p=0.008 n=5+5)
      
      Change-Id: I70332de06f873df3bc12c4a5af1028307b670046
      Reviewed-on: https://go-review.googlesource.com/90175
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  4. 19 Mar, 2018 11 commits
  5. 18 Mar, 2018 2 commits
  6. 17 Mar, 2018 1 commit
  7. 16 Mar, 2018 3 commits
    • Daniel Martí's avatar
      cmd/go: remove some unused parameters · 2767c4e2
      Daniel Martí authored
      Change-Id: I441b3045e76afc1c561914926c14efc8a116c8a7
      Reviewed-on: https://go-review.googlesource.com/101195
      Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • David Chase's avatar
      cmd/compile: enable scopes unconditionally · b30bf958
      David Chase authored
      This revives Alessandro Arzilli's CL to enable scopes
      whenever any DWARF is emitted (with optimization or not),
      and adds a test that detects this change and shows that it
      produces more truthful debugging output.
      
      It also reverts the change to the ssa/debug_test tests that was made
      when scopes were disabled during dwarflocationlist development.
      
      Also included are updates to the Delve test output (it
      had fallen out of sync; creating test output for one
      updates it for all) and minor naming changes in
      ssa/debug_test.
      
      Compile-time/space changes (relative to tip including dwarflocationlists):
      
      benchstat -geomean after.log scopes.log
      name        old time/op     new time/op     delta
      Template        182ms ± 1%      182ms ± 1%    ~     (p=0.666 n=9+9)
      Unicode        82.8ms ± 1%     86.6ms ±14%    ~     (p=0.211 n=9+10)
      GoTypes         611ms ± 1%      616ms ± 2%  +0.97%  (p=0.001 n=10+9)
      Compiler        2.95s ± 1%      2.95s ± 0%    ~     (p=0.573 n=10+8)
      SSA             6.70s ± 1%      6.81s ± 1%  +1.68%  (p=0.000 n=9+10)
      Flate           117ms ± 1%      118ms ± 1%  +0.60%  (p=0.036 n=9+8)
      GoParser        145ms ± 1%      145ms ± 1%    ~     (p=1.000 n=9+9)
      Reflect         398ms ± 1%      396ms ± 1%    ~     (p=0.053 n=9+10)
      Tar             171ms ± 1%      171ms ± 1%    ~     (p=0.356 n=9+10)
      XML             214ms ± 1%      214ms ± 1%    ~     (p=0.605 n=9+9)
      StdCmd          12.4s ± 2%      12.4s ± 1%    ~     (p=1.000 n=9+9)
      [Geo mean]      506ms           509ms       +0.71%
      
      name        old user-ns/op  new user-ns/op  delta
      Template         254M ± 4%       249M ± 6%    ~     (p=0.155 n=10+10)
      Unicode          121M ±11%       124M ± 6%    ~     (p=0.516 n=10+10)
      GoTypes          824M ± 2%       869M ± 5%  +5.49%  (p=0.001 n=8+10)
      Compiler        4.01G ± 2%      4.02G ± 1%    ~     (p=0.561 n=9+9)
      SSA             10.0G ± 2%      10.2G ± 2%  +2.29%  (p=0.000 n=9+10)
      Flate            154M ± 7%       154M ± 7%    ~     (p=0.960 n=10+9)
      GoParser         190M ± 7%       196M ± 6%    ~     (p=0.064 n=9+10)
      Reflect          528M ± 2%       517M ± 3%  -1.97%  (p=0.025 n=10+10)
      Tar              227M ± 5%       232M ± 3%    ~     (p=0.061 n=9+10)
      XML              286M ± 4%       283M ± 4%    ~     (p=0.343 n=9+9)
      [Geo mean]       502M            508M       +1.09%
      
      name        old text-bytes  new text-bytes  delta
      HelloSize        672k ± 0%       672k ± 0%  +0.01%  (p=0.000 n=10+10)
      CmdGoSize       7.21M ± 0%      7.21M ± 0%  -0.00%  (p=0.000 n=10+10)
      [Geo mean]      2.20M           2.20M       +0.00%
      
      name        old data-bytes  new data-bytes  delta
      HelloSize       9.88k ± 0%      9.88k ± 0%    ~     (all equal)
      CmdGoSize        248k ± 0%       248k ± 0%    ~     (all equal)
      [Geo mean]      49.5k           49.5k       +0.00%
      
      name        old bss-bytes   new bss-bytes   delta
      HelloSize        125k ± 0%       125k ± 0%    ~     (all equal)
      CmdGoSize        144k ± 0%       144k ± 0%  -0.04%  (p=0.000 n=10+10)
      [Geo mean]       135k            135k       -0.02%
      
      name        old exe-bytes   new exe-bytes   delta
      HelloSize       1.30M ± 0%      1.34M ± 0%  +3.15%  (p=0.000 n=10+10)
      CmdGoSize       13.5M ± 0%      13.9M ± 0%  +2.70%  (p=0.000 n=10+10)
      [Geo mean]      4.19M           4.31M       +2.92%
      
      Change-Id: Id53b8d57bd00440142ccbd39b95710e14e083fb5
      Reviewed-on: https://go-review.googlesource.com/101217
      Reviewed-by: Heschi Kreinick <heschi@google.com>
    • Ian Lance Taylor's avatar
      net: don't let cancelation of a DNS lookup affect another lookup · bd859439
      Ian Lance Taylor authored
      Updates #8602
      Updates #20703
      Fixes #22724
      
      Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
      Reviewed-on: https://go-review.googlesource.com/100840
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>