- 22 Mar, 2018 4 commits
-
-
Travis Bischel authored
Move currently uses mov instructions directly up to 31 bytes and then switches to duffcopy. Moving 31 bytes is 4 instructions corresponding to two loads and two stores, (or 6 if !useSSE) depending on the usage, duffcopy is five (one or two mov, two or three lea, one call). This adds direct mov instructions for Move's of size 32, 48, and 64 with sse and for only size 32 without. With useSSE: - 32 is 4 instructions (byte +/- comparison below) - 33 thru 48 is 6 - 49 thru 64 is 8 Without: - 32 is 8 Note that the only platform with useSSE set to false is plan 9. I have built three projects based off tip and tip with this patch and the project's byte size is equal to or less than they were prior. The basis of this change is that copying data with instructions directly is nearly free, whereas calling into duffcopy adds a bit of overhead. This is most noticeable in range statements where elements are 32+ bytes. For code with the following pattern: func Benchmark32Range(b *testing.B) { var f s32 for _, count := range []int{10, 100, 1000, 10000} { name := strconv.Itoa(count) b.Run(name, func(b *testing.B) { base := make([]s32, count) for i := 0; i < b.N; i++ { for _, v := range base { f = v } } }) } _ = f } These are the resulting benchmarks: Benchmark16Range/10-4 19.1 19.1 +0.00% Benchmark16Range/100-4 169 170 +0.59% Benchmark16Range/1000-4 1684 1691 +0.42% Benchmark16Range/10000-4 18147 18124 -0.13% Benchmark31Range/10-4 141 142 +0.71% Benchmark31Range/100-4 1407 1410 +0.21% Benchmark31Range/1000-4 14070 14074 +0.03% Benchmark31Range/10000-4 141781 141759 -0.02% Benchmark32Range/10-4 71.4 32.2 -54.90% Benchmark32Range/100-4 695 326 -53.09% Benchmark32Range/1000-4 7166 3313 -53.77% Benchmark32Range/10000-4 72571 35425 -51.19% Benchmark64Range/10-4 87.8 64.9 -26.08% Benchmark64Range/100-4 868 629 -27.53% Benchmark64Range/1000-4 9355 6907 -26.17% Benchmark64Range/10000-4 94463 70385 -25.49% Benchmark79Range/10-4 177 152 -14.12% Benchmark79Range/100-4 1769 1531 -13.45% Benchmark79Range/1000-4 17893 15532 -13.20% Benchmark79Range/10000-4 178947 155551 -13.07% Benchmark80Range/10-4 99.6 99.7 +0.10% Benchmark80Range/100-4 987 985 -0.20% Benchmark80Range/1000-4 10573 10560 -0.12% Benchmark80Range/10000-4 106792 106639 -0.14% For runtime's BenchCopyFat* benchmarks: CopyFat8-4 0.40ns ± 0% 0.40ns ± 0% ~ (all equal) CopyFat12-4 0.40ns ± 0% 0.80ns ± 0% +100.00% (p=0.000 n=9+9) CopyFat16-4 0.40ns ± 0% 0.80ns ± 0% +100.00% (p=0.000 n=10+8) CopyFat24-4 0.80ns ± 0% 0.40ns ± 0% -50.00% (p=0.001 n=8+9) CopyFat32-4 2.01ns ± 0% 0.40ns ± 0% -80.10% (p=0.000 n=8+8) CopyFat64-4 2.87ns ± 0% 0.40ns ± 0% -86.07% (p=0.000 n=8+10) CopyFat128-4 4.82ns ± 0% 4.82ns ± 0% ~ (p=1.000 n=8+8) CopyFat256-4 8.83ns ± 0% 8.83ns ± 0% ~ (p=1.000 n=8+8) CopyFat512-4 16.9ns ± 0% 16.9ns ± 0% ~ (all equal) CopyFat520-4 14.6ns ± 0% 14.6ns ± 1% ~ (p=0.529 n=8+9) CopyFat1024-4 32.9ns ± 0% 33.0ns ± 0% +0.20% (p=0.041 n=8+9) Function calls are not benefitted as much due how they are compiled, but other benchmarks I ran show that calling function with 64 byte elements is marginally improved. The main downside with this change is that it may increase binary sizes depending on the size of the copy, but this change also decreases binaries for moves of 48 bytes or less. For the following code: package main type size [32]byte //go:noinline func use(t size) { } //go:noinline func get() size { var z size return z } func main() { var a size use(a) } Changing size around gives the following assembly leading up to the call (the initialization and actual call are removed): tip func call with 32B arg: 27B 48 89 e7 mov %rsp,%rdi 48 8d 74 24 20 lea 0x20(%rsp),%rsi 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) 48 8d 6c 24 f0 lea -0x10(%rsp),%rbp e8 53 ab ff ff callq 448964 <runtime.duffcopy+0x364> 48 8b 6d 00 mov 0x0(%rbp),%rbp modified: 19B (-8B) 0f 10 44 24 20 movups 0x20(%rsp),%xmm0 0f 11 04 24 movups %xmm0,(%rsp) 0f 10 44 24 30 movups 0x30(%rsp),%xmm0 0f 11 44 24 10 movups %xmm0,0x10(%rsp) - tip with 47B arg: 29B 48 8d 7c 24 0f lea 0xf(%rsp),%rdi 48 8d 74 24 40 lea 0x40(%rsp),%rsi 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) 48 8d 6c 24 f0 lea -0x10(%rsp),%rbp e8 43 ab ff ff callq 448964 <runtime.duffcopy+0x364> 48 8b 6d 00 mov 0x0(%rbp),%rbp modified: 20B (-9B) 0f 10 44 24 40 movups 0x40(%rsp),%xmm0 0f 11 44 24 0f movups %xmm0,0xf(%rsp) 0f 10 44 24 50 movups 0x50(%rsp),%xmm0 0f 11 44 24 1f movups %xmm0,0x1f(%rsp) - tip with 64B arg: 27B 48 89 e7 mov %rsp,%rdi 48 8d 74 24 40 lea 0x40(%rsp),%rsi 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) 48 8d 6c 24 f0 lea -0x10(%rsp),%rbp e8 1f ab ff ff callq 448948 <runtime.duffcopy+0x348> 48 8b 6d 00 mov 0x0(%rbp),%rbp modified: 39B [+12B] 0f 10 44 24 40 movups 0x40(%rsp),%xmm0 0f 11 04 24 movups %xmm0,(%rsp) 0f 10 44 24 50 movups 0x50(%rsp),%xmm0 0f 11 44 24 10 movups %xmm0,0x10(%rsp) 0f 10 44 24 60 movups 0x60(%rsp),%xmm0 0f 11 44 24 20 movups %xmm0,0x20(%rsp) 0f 10 44 24 70 movups 0x70(%rsp),%xmm0 0f 11 44 24 30 movups %xmm0,0x30(%rsp) - tip with 79B arg: 29B 48 8d 7c 24 0f lea 0xf(%rsp),%rdi 48 8d 74 24 60 lea 0x60(%rsp),%rsi 48 89 6c 24 f0 mov %rbp,-0x10(%rsp) 48 8d 6c 24 f0 lea -0x10(%rsp),%rbp e8 09 ab ff ff callq 448948 <runtime.duffcopy+0x348> 48 8b 6d 00 mov 0x0(%rbp),%rbp modified: 46B [+17B] 0f 10 44 24 60 movups 0x60(%rsp),%xmm0 0f 11 44 24 0f movups %xmm0,0xf(%rsp) 0f 10 44 24 70 movups 0x70(%rsp),%xmm0 0f 11 44 24 1f movups %xmm0,0x1f(%rsp) 0f 10 84 24 80 00 00 movups 0x80(%rsp),%xmm0 00 0f 11 44 24 2f movups %xmm0,0x2f(%rsp) 0f 10 84 24 90 00 00 movups 0x90(%rsp),%xmm0 00 0f 11 44 24 3f movups %xmm0,0x3f(%rsp) So, at best we save 9B, at worst we gain 17. I do not think that copying around 65+B sized types is common enough to bloat program sizes. Using bincmp on the go binary itself shows a zero byte difference; there are gains and losses all over. One of the largest gains in binary size comes from cmd/go/internal/cache.(*Cache).Get, which passes around a 64 byte sized type -- this is one of the cases I would expect to be benefitted by this change. I think that this marginal improvement in struct copying for 64 byte structs is worth it: most data structs / work items I use in my programs are small, but few are smaller than 32 bytes: with one slice, the budget is up. The 32 rule alone would allow another 16 bytes, the 48 and 64 rules allow another 32 and 48. Change-Id: I19a8f9190d5d41825091f17f268f4763bfc12a62 Reviewed-on: https://go-review.googlesource.com/100718Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>
-
Alberto Donizetti authored
And remove them from asm_test. Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454 Reviewed-on: https://go-review.googlesource.com/102115 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
-
Carlos Eduardo Seo authored
cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x This change implements faster atomics for ppc64x based on the ISA 2.07B, Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some cases. Updates #21348 name old time/op new time/op delta Cond1-16 955ns 856ns -10.33% Cond2-16 2.38µs 2.03µs -14.59% Cond4-16 5.90µs 5.44µs -7.88% Cond8-16 12.1µs 11.1µs -8.42% Cond16-16 27.0µs 25.1µs -7.04% Cond32-16 59.1µs 55.5µs -6.14% LoadMostlyHits/*sync_test.DeepCopyMap-16 22.1ns 24.1ns +9.02% LoadMostlyHits/*sync_test.RWMutexMap-16 252ns 249ns -1.20% LoadMostlyHits/*sync.Map-16 16.2ns 16.3ns ~ LoadMostlyMisses/*sync_test.DeepCopyMap-16 22.3ns 22.6ns ~ LoadMostlyMisses/*sync_test.RWMutexMap-16 249ns 247ns -0.51% LoadMostlyMisses/*sync.Map-16 12.7ns 12.7ns ~ LoadOrStoreBalanced/*sync_test.RWMutexMap-16 1.27µs 1.17µs -7.54% LoadOrStoreBalanced/*sync.Map-16 1.12µs 1.10µs -2.35% LoadOrStoreUnique/*sync_test.RWMutexMap-16 1.75µs 1.68µs -3.84% LoadOrStoreUnique/*sync.Map-16 2.07µs 1.97µs -5.13% LoadOrStoreCollision/*sync_test.DeepCopyMap-16 15.8ns 15.9ns ~ LoadOrStoreCollision/*sync_test.RWMutexMap-16 496ns 424ns -14.48% LoadOrStoreCollision/*sync.Map-16 6.07ns 6.07ns ~ Range/*sync_test.DeepCopyMap-16 1.65µs 1.64µs ~ Range/*sync_test.RWMutexMap-16 278µs 288µs +3.75% Range/*sync.Map-16 2.00µs 2.01µs ~ AdversarialAlloc/*sync_test.DeepCopyMap-16 3.45µs 3.44µs ~ AdversarialAlloc/*sync_test.RWMutexMap-16 226ns 227ns ~ AdversarialAlloc/*sync.Map-16 1.09µs 1.07µs -2.36% AdversarialDelete/*sync_test.DeepCopyMap-16 553ns 550ns -0.57% AdversarialDelete/*sync_test.RWMutexMap-16 273ns 274ns ~ AdversarialDelete/*sync.Map-16 247ns 249ns ~ UncontendedSemaphore-16 79.0ns 65.5ns -17.11% ContendedSemaphore-16 112ns 97ns -13.77% MutexUncontended-16 3.34ns 2.51ns -24.69% Mutex-16 266ns 191ns -28.26% MutexSlack-16 226ns 159ns -29.55% MutexWork-16 377ns 338ns -10.14% MutexWorkSlack-16 335ns 308ns -8.20% MutexNoSpin-16 196ns 184ns -5.91% MutexSpin-16 710ns 666ns -6.21% Once-16 1.29ns 1.29ns ~ Pool-16 8.64ns 8.71ns ~ PoolOverflow-16 1.60µs 1.44µs -10.25% SemaUncontended-16 5.39ns 4.42ns -17.96% SemaSyntNonblock-16 539ns 483ns -10.42% SemaSyntBlock-16 413ns 354ns -14.20% SemaWorkNonblock-16 305ns 258ns -15.36% SemaWorkBlock-16 266ns 229ns -14.06% RWMutexUncontended-16 12.9ns 9.7ns -24.80% RWMutexWrite100-16 203ns 147ns -27.47% RWMutexWrite10-16 177ns 119ns -32.74% RWMutexWorkWrite100-16 435ns 403ns -7.39% RWMutexWorkWrite10-16 642ns 611ns -4.79% WaitGroupUncontended-16 4.67ns 3.70ns -20.92% WaitGroupAddDone-16 402ns 355ns -11.54% WaitGroupAddDoneWork-16 208ns 250ns +20.09% WaitGroupWait-16 1.21ns 1.21ns ~ WaitGroupWaitWork-16 5.91ns 5.87ns -0.81% WaitGroupActuallyWait-16 92.2ns 85.8ns -6.91% Updates #21348 Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3 Reviewed-on: https://go-review.googlesource.com/95175 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
-
Giovanni Bajo authored
I've reorganized the guide and rewritten large sections. The structure is now more clear and logical, and can be understood and navigated using the summary displayed at the top of the page (before, the summary was confusing because the guide contained H1s that were being ignored by the summary). Both the initial onboarding process and the Gerrit change submission process have been reworked to include a concise checklist of steps that can be read and understood in a few seconds, for people that don't want or need to bother with details. More in-depth descriptions have been moved into separate sections, one per each checklist step. This is by far the biggest improvement, as the previous approach of having to read several pages just to understand the requires steps was very scaring for beginners, in addition of being harder to navigate. GitHub pull requests have been integrated as a different way to submit a change, suggested for first time contributors. The review process has been described in more details, documenting the workflow and the used conventions. Most miscellanea have been moved into an "advanced topics" chapter. Paragraphs describing how to use git have been removed to simplify reading. This guide should focus on Go contribution, and not help users getting familiar with git, for which many guides are available. Change-Id: I6f4b76583c9878b230ba1d0225745a1708fad2e8 Reviewed-on: https://go-review.googlesource.com/93495Reviewed-by: Rob Pike <r@golang.org>
-
- 21 Mar, 2018 7 commits
-
-
Ilya Tocar authored
Since compiler is now able to generate conditional moves, we can replace bit-tricks with simple if/else. This even results in slightly better performance: name old time/op new time/op delta DecodeDigits-6 13.4ms ± 4% 13.0ms ± 2% -2.63% (p=0.003 n=10+10) DecodeTwain-6 37.5ms ± 1% 36.3ms ± 1% -3.03% (p=0.000 n=10+9) DecodeRand-6 4.23ms ± 1% 4.07ms ± 1% -3.67% (p=0.000 n=10+9) name old speed new speed delta DecodeDigits-6 7.47MB/s ± 4% 7.67MB/s ± 2% +2.69% (p=0.002 n=10+10) DecodeTwain-6 10.4MB/s ± 1% 10.7MB/s ± 1% +3.25% (p=0.000 n=10+8) DecodeRand-6 3.87MB/s ± 1% 4.03MB/s ± 2% +4.08% (p=0.000 n=10+10) diff --git a/src/compress/bzip2/huffman.go b/src/compress/bzip2/huffman.go Change-Id: Ie96ef1a9e07013b07e78f22cdccd531f3341caca Reviewed-on: https://go-review.googlesource.com/102015 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Joe Tsai <joetsai@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Tim Wright authored
Fixes #14327 Much of the code is based on the linux/amd64 code that implements these build modes, and code is shared where possible. Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332 Reviewed-on: https://go-review.googlesource.com/93875 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
isharipo authored
For each replacement, test case is added to new 386enc.s file with exception of EMMS, SYSENTER, MFENCE and LFENCE as they are already covered in amd64enc.s (same on amd64 and 386). The replacement became less obvious after go vet suggested changes Before: BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08 Changed to MOVQ (this form is being tested): MOVQ M0, 8(SP) Refactored to FP-relative access (go vet advice): MOVQ M0, val+4(FP) Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818 Reviewed-on: https://go-review.googlesource.com/101475 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
-
Ross Light authored
Change-Id: If35e3faa738c5d7d72cf77d14b276690579180a1 Reviewed-on: https://go-review.googlesource.com/101921 Run-TryBot: Ross Light <light@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Tobias Klauser authored
Decode AT_PAGESZ to determine physPageSize on freebsd/{386,amd64,arm} and AT_HWCAP for hwcap and hardDiv on freebsd/arm. Also use hwcap to perform the FP checks in checkgoarm akin to the linux/arm implementation. Change-Id: I532810a1581efe66277e4305cb234acdc79ee91e Reviewed-on: https://go-review.googlesource.com/99780 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Daniel Martí authored
Otherwise, a populated GOPATH might result in failures such as: $ go test [...] no buildable Go source files in [...]/gopherjs/compiler/natives/src/crypto/rand exit status 1 Move the initialization of the dirs walker out of the init func, so that we can control its behavior in the tests. Updates #24464. Change-Id: I4b26a7d3d6809bdd8e9b6b0556d566e7855f80fe Reviewed-on: https://go-review.googlesource.com/101836 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
-
Alberto Donizetti authored
Unused variables in closures are currently not diagnosed by the compiler (this is Issue #3059), while go/types catches them. One unused variable in the cmd/trace tests is causing the go/types test that typechecks the whole standard library to fail: FAIL: TestStdlib (8.05s) stdlib_test.go:223: cmd/trace/annotations_test.go:241:6: gcTime declared but not used FAIL Remove it. Updates #24464 Change-Id: I0f1b9db6ae1f0130616ee649bdbfdc91e38d2184 Reviewed-on: https://go-review.googlesource.com/101815Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
-
- 20 Mar, 2018 12 commits
-
-
Hiroshi Ioka authored
The old code was a blend of (copied) code that existed before go/build, and incorrect adjustments made when go/build was introduced. This change leaves package path determination entirely to go/build and in the process fixes issues with relative import paths. Fixes #23092 Fixes #24392 Change-Id: I9e900538b365398751bace56964495c5440ac4ae Reviewed-on: https://go-review.googlesource.com/83415 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
-
Paul Querna authored
VerifyHostname is called by tls.Conn during Handshake and does not need to be called explicitly. Change-Id: I22b7fa137e76bb4be3d0018813a571acfb882219 Reviewed-on: https://go-review.googlesource.com/98618 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Filippo Valsorda <filippo@golang.org>
-
Adam Langley authored
It serialises optional parameters as empty rather than NULL. It's probably technically correct, although ASN.1 has a long history of doing this different ways. But OpenSSL is likely common enough that we want to support this encoding. Fixes #23847 Change-Id: I81c60f0996edfecf59467dfdf75b0cf8ba7b1efb Reviewed-on: https://go-review.googlesource.com/96417Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Filippo Valsorda <filippo@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Ilya Tocar authored
Currently we don't lift spill out of loop if loop contains call. However often we have code like this: for .. { if hard_case { call() } // simple case, without call } So instead of checking for any call, check for unavoidable call. For #22698 cases I see: mime/quotedprintable/Writer-6 10.9µs ± 4% 9.2µs ± 3% -15.02% (p=0.000 n=8+8) And: compress/flate/Encode/Twain/Huffman/1e4-6 99.4µs ± 6% 90.9µs ± 0% -8.57% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e5-6 760µs ± 1% 725µs ± 1% -4.56% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e6-6 7.55ms ± 0% 7.24ms ± 0% -4.07% (p=0.000 n=8+7) There are no significant changes on go1 benchmarks. But for cases with runtime arch checks, where we call generic version on old hardware, there are respectable performance gains: math/RoundToEven-6 1.43ns ± 0% 1.25ns ± 0% -12.59% (p=0.001 n=7+7) math/bits/OnesCount64-6 1.60ns ± 1% 1.42ns ± 1% -11.32% (p=0.000 n=8+8) Also on some runtime benchmarks loops have less loads and higher performance: runtime/RuneIterate/range1/ASCII-6 15.6ns ± 1% 13.9ns ± 1% -10.74% (p=0.000 n=7+8) runtime/ArrayEqual-6 3.22ns ± 0% 2.86ns ± 2% -11.06% (p=0.000 n=7+8) Fixes #22698 Updates #22234 Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a Reviewed-on: https://go-review.googlesource.com/84055 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
-
Alberto Donizetti authored
And delete them from asm_test. Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab Reviewed-on: https://go-review.googlesource.com/101595 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
-
Than McIntosh authored
Fix a bug in the code that generates the pre-inlined variable declaration table used as raw material for emitting DWARF inline routine records. The fix for issue 23704 altered the recipe for assigning file/line/col to variables in one part of the compiler, but didn't update a similar recipe in the code for variable tracking. Added a new test that should catch problems of a similar nature. Fixes #24460. Change-Id: I255c036637f4151aa579c0e21d123fd413724d61 Reviewed-on: https://go-review.googlesource.com/101676Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Michael Munday authored
The atomic add instructions modify the condition code and so need to be marked as clobbering flags. Fixes #24449. Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897 Reviewed-on: https://go-review.googlesource.com/101455 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
Fangming.Fang authored
This change fixes index error when encoding VMOV instruction which pattern is vmov Vn.<T>[index], Vd.<T>[index] Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d Reviewed-on: https://go-review.googlesource.com/101335Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Fangming.Fang authored
This change fixes index error when encoding VMOV instruction which pattern is VMOV Rn, V.<T>[index]. For example VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0] Fixes #24400 Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb Reviewed-on: https://go-review.googlesource.com/101297Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Vladimir Kuzmin authored
Currently rvalue m[k] is transformed during walk into: tmp1 := *mapaccess(m, k) tmp2 := append(tmp1, ...) *mapassign(m, k) = tmp2 However, this is suboptimal, as we could instead produce just: tmp := mapassign(m, k) *tmp := append(*tmp, ...) Optimization is possible only if during Order it may tell that m[k] is exactly the same at left and right part of assignment. It doesn't work: 1) m[f(k)] = append(m[f(k)], ...) 2) sink, m[k] = sink, append(m[k]...) 3) m[k] = append(..., m[k],...) Benchmark: name old time/op new time/op delta MapAppendAssign/Int32/256-8 33.5ns ± 3% 22.4ns ±10% -33.24% (p=0.000 n=16+18) MapAppendAssign/Int32/65536-8 68.2ns ± 6% 48.5ns ±29% -28.90% (p=0.000 n=20+20) MapAppendAssign/Int64/256-8 34.3ns ± 4% 23.3ns ± 5% -32.23% (p=0.000 n=17+18) MapAppendAssign/Int64/65536-8 65.9ns ± 7% 61.2ns ±19% -7.06% (p=0.002 n=18+20) MapAppendAssign/Str/256-8 116ns ±12% 79ns ±16% -31.70% (p=0.000 n=20+19) MapAppendAssign/Str/65536-8 134ns ±15% 111ns ±45% -16.95% (p=0.000 n=19+20) name old alloc/op new alloc/op delta MapAppendAssign/Int32/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=19+18) MapAppendAssign/Int32/65536-8 27.0B ± 0% 20.7B ±30% -23.33% (p=0.000 n=20+20) MapAppendAssign/Int64/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=20+17) MapAppendAssign/Int64/65536-8 27.0B ± 0% 27.0B ± 0% ~ (all equal) MapAppendAssign/Str/256-8 94.0B ± 0% 78.0B ± 0% -17.02% (p=0.000 n=20+16) MapAppendAssign/Str/65536-8 54.0B ± 0% 54.0B ± 0% ~ (all equal) Fixes #24364 Updates #5147 Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409 Reviewed-on: https://go-review.googlesource.com/100838Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Cherry Zhang authored
This reverts commit bfa8b6f8. Reason for revert: This depends on another CL which is not yet submitted. Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07 Reviewed-on: https://go-review.googlesource.com/101496Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
fanzha02 authored
Use LDP instructions to load 16 bytes per loop when the source length is long. Specially process the 8 bytes length, 4 bytes length and 2 bytes length to get a better performance. Benchmark result: name old time/op new time/op delta BytesCompare/1-8 21.0ns ± 0% 10.5ns ± 0% ~ (p=0.079 n=4+5) BytesCompare/2-8 11.5ns ± 0% 10.5ns ± 0% -8.70% (p=0.008 n=5+5) BytesCompare/4-8 13.5ns ± 0% 10.0ns ± 0% -25.93% (p=0.008 n=5+5) BytesCompare/8-8 28.8ns ± 0% 9.5ns ± 0% ~ (p=0.079 n=4+5) BytesCompare/16-8 40.5ns ± 0% 10.5ns ± 0% -74.07% (p=0.008 n=5+5) BytesCompare/32-8 64.6ns ± 0% 12.5ns ± 0% -80.65% (p=0.008 n=5+5) BytesCompare/64-8 112ns ± 0% 16ns ± 0% -85.27% (p=0.008 n=5+5) BytesCompare/128-8 208ns ± 0% 24ns ± 0% -88.22% (p=0.008 n=5+5) BytesCompare/256-8 400ns ± 0% 50ns ± 0% -87.62% (p=0.008 n=5+5) BytesCompare/512-8 785ns ± 0% 82ns ± 0% -89.61% (p=0.008 n=5+5) BytesCompare/1024-8 1.55µs ± 0% 0.14µs ± 0% ~ (p=0.079 n=4+5) BytesCompare/2048-8 3.09µs ± 0% 0.27µs ± 0% ~ (p=0.079 n=4+5) CompareBytesEqual-8 39.0ns ± 0% 12.0ns ± 0% -69.23% (p=0.008 n=5+5) CompareBytesToNil-8 8.57ns ± 5% 8.23ns ± 2% -3.99% (p=0.016 n=5+5) CompareBytesEmpty-8 7.37ns ± 0% 7.36ns ± 4% ~ (p=0.690 n=5+5) CompareBytesIdentical-8 7.39ns ± 0% 7.46ns ± 2% ~ (p=0.667 n=5+5) CompareBytesSameLength-8 17.0ns ± 0% 10.5ns ± 0% -38.24% (p=0.008 n=5+5) CompareBytesDifferentLength-8 17.0ns ± 0% 10.5ns ± 0% -38.24% (p=0.008 n=5+5) CompareBytesBigUnaligned-8 1.58ms ± 0% 0.19ms ± 0% -88.31% (p=0.016 n=4+5) CompareBytesBig-8 1.59ms ± 0% 0.19ms ± 0% -88.27% (p=0.016 n=5+4) CompareBytesBigIdentical-8 7.01ns ± 0% 6.60ns ± 3% -5.91% (p=0.008 n=5+5) name old speed new speed delta CompareBytesBigUnaligned-8 662MB/s ± 0% 5660MB/s ± 0% +755.15% (p=0.016 n=4+5) CompareBytesBig-8 661MB/s ± 0% 5636MB/s ± 0% +752.57% (p=0.016 n=5+4) CompareBytesBigIdentical-8 150TB/s ± 0% 159TB/s ± 3% +6.27% (p=0.008 n=5+5) Change-Id: I70332de06f873df3bc12c4a5af1028307b670046 Reviewed-on: https://go-review.googlesource.com/90175Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 19 Mar, 2018 11 commits
-
-
fanzha02 authored
Current ARM64 assembler has no check for the invalid value of both shift amount and post-index immediate offset of LD1/ST1. This patch adds the check. This patch also fixes the printing error of register number equals to 31, which should be printed as ZR instead of R31. Test cases are also added. Change-Id: I476235f3ab3a3fc91fe89c5a3149a4d4529c05c7 Reviewed-on: https://go-review.googlesource.com/100255Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
-
Andrew Bonventre authored
This reverts commit 4b06d9d7. Reason for revert: It's a reference to a legendary article from the Journal of Irreproducible Results. Updates golang/go#24451 Change-Id: I0288177f4e286bd6ace5774f2e5e0acb02370305 Reviewed-on: https://go-review.googlesource.com/101495Reviewed-by: Andrew Bonventre <andybons@golang.org>
-
Vlad Krasnov authored
Instead shifting the accumulator every iteration of the loop, shift once in the end. This significantly improves performance on arm64. On arm64: name old time/op new time/op delta RSA2048Decrypt 3.33ms ± 0% 2.63ms ± 0% -20.94% (p=0.000 n=11+11) RSA2048Sign 4.22ms ± 0% 3.55ms ± 0% -15.89% (p=0.000 n=11+11) 3PrimeRSA2048Decrypt 1.95ms ± 0% 1.59ms ± 0% -18.59% (p=0.000 n=11+11) On Skylake: name old time/op new time/op delta RSA2048Decrypt-8 1.73ms ± 2% 1.55ms ± 2% -10.19% (p=0.000 n=10+10) RSA2048Sign-8 2.17ms ± 2% 2.00ms ± 2% -7.93% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 1.10ms ± 2% 0.96ms ± 2% -13.03% (p=0.000 n=10+9) Change-Id: I5786191a1a09e4217fdb1acfd90880d35c5855f7 Reviewed-on: https://go-review.googlesource.com/99838 Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
-
jimmyfrasche authored
Updates #24352. Change-Id: Ie7c1de905154a483b7f7748de28e35b484cce6ea Reviewed-on: https://go-review.googlesource.com/101279Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Andrew Bonventre authored
Fixes golang/go#24451 Change-Id: Id9b4cbd1a1ff032f1cc4606e9734ddcc64892ae5 Reviewed-on: https://go-review.googlesource.com/101457Reviewed-by: Filippo Valsorda <filippo@golang.org>
-
Alberto Donizetti authored
One could expect calls like z.mant.shl(z.mant, shiftAmount) (or higher-level-functions calls that use lhs/rhs) to be almost free when shiftAmount = 0; and expect calls like z.mant.shl(x.mant, 0) to have the same cost of a x.mant -> z.mant copy. Neither of this things are currently true. For an 800 words nat, the first kind of calls cost ~800ns for rigth shifts and ~3.5µs for left shift; while the second kind of calls are doing more work than necessary by calling shlVU/shrVU. This change makes the first kind of calls ({Shl,Shr}Same) almost free, and the second kind of calls ({Shl,Shr}) about 30% faster. name old time/op new time/op delta ZeroShifts/Shl-4 3.64µs ± 3% 2.49µs ± 1% -31.55% (p=0.000 n=10+10) ZeroShifts/ShlSame-4 3.65µs ± 1% 0.01µs ± 1% -99.85% (p=0.000 n=9+9) ZeroShifts/Shr-4 3.65µs ± 1% 2.49µs ± 1% -31.91% (p=0.000 n=10+10) ZeroShifts/ShrSame-4 825ns ± 0% 6ns ± 1% -99.33% (p=0.000 n=9+10) During go test math/big, the shl zeroshift fastpath is triggered 1380 times; while the shr fastpath is triggered 153334 times(!). Change-Id: I5f92b304a40638bd8453a86c87c58e54b337bcdf Reviewed-on: https://go-review.googlesource.com/87660 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
-
jimmyfrasche authored
Per #11257 all examples should be in external test files. Additionally, doing so makes this example playable. Updates #24352. (Albeit tangentially). Change-Id: I77ab4655107f61db2e9d21a608b73ace3a230fb2 Reviewed-on: https://go-review.googlesource.com/101285Reviewed-by: Robert Griesemer <gri@golang.org>
-
Brad Fitzpatrick authored
Add accessors that handle nil without crashing. Fixes #24330 Change-Id: If5fbbb6015ca8d65f620a06bad6e52de8cd896ad Reviewed-on: https://go-review.googlesource.com/101315 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
-
Alberto Donizetti authored
And delete them from asm_test. Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363 Reviewed-on: https://go-review.googlesource.com/101395 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
-
Neven Sajko authored
Change-Id: If9ad8ae663f007efe43cc35631713565fa754e93 Reviewed-on: https://go-review.googlesource.com/93237 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
quasilyte authored
Replaces " \t" code indentation with "\t". Issues like this are easy to spot with editor that prints whitespace charecters. Change-Id: Ia82877e7c99121bf369fa76e46ba52dff84f36bf Reviewed-on: https://go-review.googlesource.com/101355Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 18 Mar, 2018 2 commits
-
-
Alberto Donizetti authored
And delete them from asm_test. Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea Reviewed-on: https://go-review.googlesource.com/101138 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
-
Ian Gudger authored
Updates to x/net git rev 24dd378 for CL 100055 Fixes #10622 Updates #16218 Change-Id: I99e26da7b908b36585a0379d9381030c01819b54 Reviewed-on: https://go-review.googlesource.com/101278 Run-TryBot: Ian Gudger <igudger@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
- 17 Mar, 2018 1 commit
-
-
Ian Lance Taylor authored
Fixes #24425 Change-Id: I2aacbced8cd14da67fe9a4cbd62b434c18b5fce2 Reviewed-on: https://go-review.googlesource.com/101215 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Filippo Valsorda <filippo@golang.org>
-
- 16 Mar, 2018 3 commits
-
-
Daniel Martí authored
Change-Id: I441b3045e76afc1c561914926c14efc8a116c8a7 Reviewed-on: https://go-review.googlesource.com/101195 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-
David Chase authored
This revives Alessandro Arzilli's CL to enable scopes whenever any dwarf is emitted (with optimization or not), adds a test that detects this changes and shows that it creates more truthful debugging output. Reverted change to ssa/debug_test tests made when scopes were disabled during dwarflocationlist development. Also included are updates to the Delve test output (it had fallen out of sync; creating test output for one updates it for all) and minor naming changes in ssa/debug_test. Compile-time/space changes (relative to tip including dwarflocationlists): benchstat -geomean after.log scopes.log name old time/op new time/op delta Template 182ms ± 1% 182ms ± 1% ~ (p=0.666 n=9+9) Unicode 82.8ms ± 1% 86.6ms ±14% ~ (p=0.211 n=9+10) GoTypes 611ms ± 1% 616ms ± 2% +0.97% (p=0.001 n=10+9) Compiler 2.95s ± 1% 2.95s ± 0% ~ (p=0.573 n=10+8) SSA 6.70s ± 1% 6.81s ± 1% +1.68% (p=0.000 n=9+10) Flate 117ms ± 1% 118ms ± 1% +0.60% (p=0.036 n=9+8) GoParser 145ms ± 1% 145ms ± 1% ~ (p=1.000 n=9+9) Reflect 398ms ± 1% 396ms ± 1% ~ (p=0.053 n=9+10) Tar 171ms ± 1% 171ms ± 1% ~ (p=0.356 n=9+10) XML 214ms ± 1% 214ms ± 1% ~ (p=0.605 n=9+9) StdCmd 12.4s ± 2% 12.4s ± 1% ~ (p=1.000 n=9+9) [Geo mean] 506ms 509ms +0.71% name old user-ns/op new user-ns/op delta Template 254M ± 4% 249M ± 6% ~ (p=0.155 n=10+10) Unicode 121M ±11% 124M ± 6% ~ (p=0.516 n=10+10) GoTypes 824M ± 2% 869M ± 5% +5.49% (p=0.001 n=8+10) Compiler 4.01G ± 2% 4.02G ± 1% ~ (p=0.561 n=9+9) SSA 10.0G ± 2% 10.2G ± 2% +2.29% (p=0.000 n=9+10) Flate 154M ± 7% 154M ± 7% ~ (p=0.960 n=10+9) GoParser 190M ± 7% 196M ± 6% ~ (p=0.064 n=9+10) Reflect 528M ± 2% 517M ± 3% -1.97% (p=0.025 n=10+10) Tar 227M ± 5% 232M ± 3% ~ (p=0.061 n=9+10) XML 286M ± 4% 283M ± 4% ~ (p=0.343 n=9+9) [Geo mean] 502M 508M +1.09% name old text-bytes new text-bytes delta HelloSize 672k ± 0% 672k ± 0% +0.01% (p=0.000 n=10+10) CmdGoSize 7.21M ± 0% 7.21M ± 0% -0.00% (p=0.000 n=10+10) [Geo mean] 2.20M 2.20M +0.00% name old data-bytes new data-bytes delta HelloSize 9.88k ± 0% 9.88k ± 0% ~ (all equal) CmdGoSize 248k ± 0% 248k ± 0% ~ (all equal) [Geo mean] 49.5k 49.5k +0.00% name old bss-bytes new bss-bytes delta HelloSize 125k ± 0% 125k ± 0% ~ (all equal) CmdGoSize 144k ± 0% 144k ± 0% -0.04% (p=0.000 n=10+10) [Geo mean] 135k 135k -0.02% name old exe-bytes new exe-bytes delta HelloSize 1.30M ± 0% 1.34M ± 0% +3.15% (p=0.000 n=10+10) CmdGoSize 13.5M ± 0% 13.9M ± 0% +2.70% (p=0.000 n=10+10) [Geo mean] 4.19M 4.31M +2.92% Change-Id: Id53b8d57bd00440142ccbd39b95710e14e083fb5 Reviewed-on: https://go-review.googlesource.com/101217Reviewed-by: Heschi Kreinick <heschi@google.com>
-
Ian Lance Taylor authored
Updates #8602 Updates #20703 Fixes #22724 Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d Reviewed-on: https://go-review.googlesource.com/100840 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
-