- 23 Apr, 2019 1 commit
-
-
David Chase authored
This was originally Revert "cmd/link: fix up debug_range for dsymutil (revert CL 72371)" which has the effect of no longer using Base Address Selection Entries in DWARF. However, the build-time costs of that are about 2%, so instead the hacky fixup that generated technically incorrect DWARF was removed from the linker, and the choice is instead made in the compiler, dependent on platform, but also under control of a flag so that we can report this bug against LLDB/dsymutil/dwarfdump (really, the LLVM dwarf libraries). This however does not solve #31188; debugging still fails, but dwarfdump no longer complains. There are at least two LLDB bugs involved, and this change will at allow us to report them without them being rejected because our now-obsolete workaround for the first bug creates not-quite-DWARF. Updates #31188. Change-Id: I5300c51ad202147bab7333329ebe961623d2b47d Reviewed-on: https://go-review.googlesource.com/c/go/+/170638 Run-TryBot: David Chase <drchase@google.com> Reviewed-by:
Heschi Kreinick <heschi@google.com>
-
- 15 Apr, 2019 1 commit
-
-
Michael Munday authored
This CL adds a new attribute, TOPFRAME, which can be used to mark functions that should be treated as being at the top of the call stack. The function `runtime.goexit` has been marked this way on architectures that use a link register. This will stop programs that use DWARF to unwind the call stack from unwinding past `runtime.goexit` on architectures that use a link register. For example, it eliminates "corrupt stack?" warnings when generating a backtrace that hits `runtime.goexit` in GDB on s390x. Similar code should be added for non-link-register architectures (i.e. amd64, 386). They mark the top of the call stack slightly differently to link register architectures so I haven't added that code (they need to mark "rip" as undefined). Fixes #24385. Change-Id: I15b4c69ac75b491daa0acf0d981cb80eb06488de Reviewed-on: https://go-review.googlesource.com/c/go/+/169726 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 11 Jan, 2019 1 commit
-
-
Austin Clements authored
Currently, obj.Ctxt's symbol table does not distinguish between ABI0 and ABIInternal symbols. This is *almost* okay, since a given symbol name in the final object file is only going to belong to one ABI or the other, but it requires that the compiler mark a Sym as being a function symbol before it retrieves its LSym. If it retrieves the LSym first, that LSym will be created as ABI0, and later marking the Sym as a function symbol won't change the LSym's ABI. Marking a Sym as a function symbol before looking up its LSym sounds easy, except Syms have a dual purpose: they are used just as interned strings (every function, variable, parameter, etc with the same textual name shares a Sym), and *also* to store state for whatever package global has that name. As a result, it's easy to slip up and look up an LSym when a Sym is serving as the name of a local variable, and then later mark it as a function when it's serving as the global with the name. In general, we were careful to avoid this, but #29610 demonstrates one case where we messed up. Because of on-demand importing from indexed export data, it's possible to compile a method wrapper for a type imported from another package before importing an init function from that package. If the argument of the method is named "init", the "init" LSym will be created as a data symbol when compiling the wrapper, before it gets marked as a function symbol. To fix this, we separate obj.Ctxt's symbol tables for ABI0 and ABIInternal symbols. This way, the compiler will simply get a different LSym once the Sym takes on its package-global meaning as a function. This fixes the above ordering issue, and means we no longer need to go out of our way to create the "init" function early and mark it as a function symbol. Fixes #29610. Updates #27539. Change-Id: Id9458b40017893d46ef9e4a3f9b47fc49e1ce8df Reviewed-on: https://go-review.googlesource.com/c/157017 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Robert Griesemer <gri@golang.org>
-
- 28 Dec, 2018 1 commit
-
-
Keith Randall authored
Work involved in getting a stack trace is divided between runtime.Callers and runtime.CallersFrames. Before this CL, runtime.Callers returns a pc per runtime frame. runtime.CallersFrames is responsible for expanding a runtime frame into potentially multiple user frames. After this CL, runtime.Callers returns a pc per user frame. runtime.CallersFrames just maps those to user frame info. Entries in the result of runtime.Callers are now pcs of the calls (or of the inline marks), not of the instruction just after the call. Fixes #29007 Fixes #28640 Update #26320 Change-Id: I1c9567596ff73dc73271311005097a9188c3406f Reviewed-on: https://go-review.googlesource.com/c/152537 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 11 Dec, 2018 1 commit
-
-
Keith Randall authored
When functions are inlined, for instructions in the inlined body, does -S print the location of the call, or the location of the body? Right now, we do the former. I'd like to do the latter by default, it makes much more sense when reading disassembly. With mid-stack inlining enabled in more cases, this quandry will come up more often. The original behavior is still available with -S=2. Some tests use this mode (so they can find assembly generated by a particular source line). This helped me with understanding what the compiler was doing while fixing #29007. Change-Id: Id14a3a41e1b18901e7c5e460aa4caf6d940ed064 Reviewed-on: https://go-review.googlesource.com/c/153241 Reviewed-by:
David Chase <drchase@google.com>
-
- 27 Nov, 2018 1 commit
-
-
Clément Chigot authored
This commit changes the code generated for addressing symbols on AIX operating system. On AIX, every symbol accesses must be done via another symbol near the TOC, named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol address. During Progedit function, when a symbol access is detected, its instructions are modified to create a load on its TOC anchor and retrieve the symbol. Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797 Reviewed-on: https://go-review.googlesource.com/c/151039 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Ian Lance Taylor <iant@golang.org>
-
- 12 Nov, 2018 2 commits
-
-
Austin Clements authored
This repurposes the "version" field of a symbol reference in the Go object file format to be an ABI field. Currently, this is just 0 or 1 depending on whether the symbol is static (the linker turns it into a different internal version number), so it's already only tenuously a symbol version. We change this to be -1 for static symbols and otherwise by the ABI number. This also adds a separate list of ABI alias symbols to be recorded in the object file. The ABI aliases must be a separate list and not just part of the symbol definitions because it's possible to have a symbol defined in one package and the alias "defined" in a different package. For example, this can happen if a symbol is defined in assembly in one package and stubbed in a different package. The stub triggers the generation of the ABI alias, but in a different package from the definition. For #27539. Change-Id: I015c9fe54690c027de6ef77e22b5585976a01587 Reviewed-on: https://go-review.googlesource.com/c/147157 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
Austin Clements authored
This doesn't yet do anything with this information. For #27539. Change-Id: Ia12c905812aa1ed425eedd6ab2f55ec75d81c0ce Reviewed-on: https://go-review.googlesource.com/c/147099 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 03 Nov, 2018 1 commit
-
-
Austin Clements authored
Currently, liveness produces a distinct obj.LSym for each GC bitmap for each function. These are then named by content hash and only ultimately deduplicated by WriteObjFile. For various reasons (see next commit), we want to remove this deduplication behavior from WriteObjFile. Furthermore, it's inefficient to produce these duplicate symbols in the first place. GC bitmaps are the only source of duplicate symbols in the compiler. This commit eliminates these duplicate symbols by declaring them in the Ctxt symbol hash just like every other obj.LSym. As a result, all GC bitmaps with the same content now refer to the same obj.LSym. The next commit will remove deduplication from WriteObjFile. For #27539. Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59 Reviewed-on: https://go-review.googlesource.com/c/146557 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 23 Oct, 2018 1 commit
-
-
Lynn Boger authored
This adds support for an alignment directive that can be used within Go asm to indicate preferred code alignment for ppc64x. This is intended to be used with loops to improve performance. This change only adds the directive and aligns the code based on it. Follow up changes will modify asm functions for ppc64x that benefit from preferred alignment. Fixes #14935 Here is one example of the improvement in memmove when the directive is used on the loops in the code: Memmove/64 8.74ns ± 0% 8.64ns ± 0% -1.19% (p=0.000 n=8+8) Memmove/128 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.000 n=8+8) Memmove/256 23.0ns ± 0% 15.3ns ± 0% -33.48% (p=0.000 n=8+8) Memmove/512 31.7ns ± 0% 31.8ns ± 0% +0.32% (p=0.000 n=8+8) Memmove/1024 52.3ns ± 0% 43.9ns ± 0% -16.10% (p=0.000 n=8+8) Memmove/2048 93.2ns ± 0% 76.2ns ± 0% -18.24% (p=0.000 n=8+8) Memmove/4096 174ns ± 0% 141ns ± 0% -18.97% (p=0.000 n=8+8) Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf Reviewed-on: https://go-review.googlesource.com/c/144218 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 03 Oct, 2018 1 commit
-
-
Keith Randall authored
Rework how the compiler+runtime handles stack-allocated variables whose address is taken. Direct references to such variables work as before. References through pointers, however, use a new mechanism. The new mechanism is more precise than the old "ambiguously live" mechanism. It computes liveness at runtime based on the actual references among objects on the stack. Each function records all of its address-taken objects in a FUNCDATA. These are called "stack objects". The runtime then uses that information while scanning a stack to find all of the stack objects on a stack. It then does a mark phase on the stack objects, using all the pointers found on the stack (and ancillary structures, like defer records) as the root set. Only stack objects which are found to be live during this mark phase will be scanned and thus retain any heap objects they point to. A subsequent CL will remove all the "ambiguously live" logic from the compiler, so that the stack object tracing will be required. For this CL, the stack tracing is all redundant with the current ambiguously live logic. Update #22350 Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f Reviewed-on: https://go-review.googlesource.com/c/134155 Reviewed-by:
Austin Clements <austin@google.com>
-
- 22 May, 2018 2 commits
-
-
Austin Clements authored
This adds FUNCDATA and PCDATA that records the register maps much like the existing live arguments maps and live locals maps. The register map is indexed independently from the argument and locals maps since changes in register liveness tend not to correlate with changes to argument and local liveness. This is the final CL toward adding safe-points everywhere. The following CLs will optimize liveness analysis to bring down the cost. The effect of this CL is: name old time/op new time/op delta Template 195ms ± 2% 197ms ± 1% ~ (p=0.136 n=9+9) Unicode 98.4ms ± 2% 99.7ms ± 1% +1.39% (p=0.004 n=10+10) GoTypes 685ms ± 1% 700ms ± 1% +2.06% (p=0.000 n=9+9) Compiler 3.28s ± 1% 3.34s ± 0% +1.71% (p=0.000 n=9+8) SSA 7.79s ± 1% 7.91s ± 1% +1.55% (p=0.000 n=10+9) Flate 133ms ± 2% 133ms ± 2% ~ (p=0.190 n=10+10) GoParser 161ms ± 2% 164ms ± 3% +1.83% (p=0.015 n=10+10) Reflect 450ms ± 1% 457ms ± 1% +1.62% (p=0.000 n=10+10) Tar 183ms ± 2% 185ms ± 1% +0.91% (p=0.008 n=9+10) XML 234ms ± 1% 238ms ± 1% +1.60% (p=0.000 n=9+9) [Geo mean] 411ms 417ms +1.40% name old exe-bytes new exe-bytes delta HelloSize 1.47M ± 0% 1.51M ± 0% +2.79% (p=0.000 n=10+10) Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps is: name old time/op new time/op delta Template 185ms ± 2% 197ms ± 1% +6.42% (p=0.000 n=10+9) Unicode 96.3ms ± 3% 99.7ms ± 1% +3.60% (p=0.000 n=10+10) GoTypes 658ms ± 0% 700ms ± 1% +6.37% (p=0.000 n=10+9) Compiler 3.14s ± 1% 3.34s ± 0% +6.53% (p=0.000 n=9+8) SSA 7.41s ± 2% 7.91s ± 1% +6.71% (p=0.000 n=9+9) Flate 126ms ± 1% 133ms ± 2% +6.15% (p=0.000 n=10+10) GoParser 153ms ± 1% 164ms ± 3% +6.89% (p=0.000 n=10+10) Reflect 437ms ± 1% 457ms ± 1% +4.59% (p=0.000 n=10+10) Tar 178ms ± 1% 185ms ± 1% +4.18% (p=0.000 n=10+10) XML 223ms ± 1% 238ms ± 1% +6.39% (p=0.000 n=10+9) [Geo mean] 394ms 417ms +5.78% name old alloc/op new alloc/op delta Template 34.5MB ± 0% 38.0MB ± 0% +10.19% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 30.3MB ± 0% +3.56% (p=0.000 n=8+9) GoTypes 113MB ± 0% 125MB ± 0% +10.89% (p=0.000 n=10+10) Compiler 510MB ± 0% 575MB ± 0% +12.79% (p=0.000 n=10+10) SSA 1.46GB ± 0% 1.64GB ± 0% +12.40% (p=0.000 n=10+10) Flate 23.9MB ± 0% 25.9MB ± 0% +8.56% (p=0.000 n=10+10) GoParser 28.0MB ± 0% 30.8MB ± 0% +10.08% (p=0.000 n=10+10) Reflect 77.6MB ± 0% 84.3MB ± 0% +8.63% (p=0.000 n=10+10) Tar 34.1MB ± 0% 37.0MB ± 0% +8.44% (p=0.000 n=10+10) XML 42.7MB ± 0% 47.2MB ± 0% +10.75% (p=0.000 n=10+10) [Geo mean] 76.0MB 83.3MB +9.60% name old allocs/op new allocs/op delta Template 321k ± 0% 337k ± 0% +4.98% (p=0.000 n=10+10) Unicode 337k ± 0% 340k ± 0% +1.04% (p=0.000 n=10+9) GoTypes 1.13M ± 0% 1.18M ± 0% +4.85% (p=0.000 n=10+10) Compiler 4.67M ± 0% 4.96M ± 0% +6.25% (p=0.000 n=10+10) SSA 11.7M ± 0% 12.3M ± 0% +5.69% (p=0.000 n=10+10) Flate 216k ± 0% 226k ± 0% +4.52% (p=0.000 n=10+9) GoParser 271k ± 0% 283k ± 0% +4.52% (p=0.000 n=10+10) Reflect 927k ± 0% 972k ± 0% +4.78% (p=0.000 n=10+10) Tar 318k ± 0% 333k ± 0% +4.56% (p=0.000 n=10+10) XML 376k ± 0% 395k ± 0% +5.04% (p=0.000 n=10+10) [Geo mean] 730k 764k +4.61% name old exe-bytes new exe-bytes delta HelloSize 1.46M ± 0% 1.51M ± 0% +3.66% (p=0.000 n=10+10) For #24543. Change-Id: I91e003dc64151916b384274884bf02a2d6862547 Reviewed-on: https://go-review.googlesource.com/109353 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Keith Randall <khr@golang.org>
-
isharipo authored
- Uncomment tests for AVX512 encoder - Permit instruction suffixes for x86 - Permit limited reg list [reg-reg] syntax for x86 for multi-source ops - EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.) - optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216) Note: suffix formatting implemented with updated CConv function. Now arch asm backend should register formatting function by calling RegisterOpSuffix. Updates #22779 Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8 Reviewed-on: https://go-review.googlesource.com/113315 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 04 May, 2018 1 commit
-
-
Richard Musiol authored
This commit adds the wasm architecture to the compile command. A later commit will contain the corresponding linker changes. Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4 The following files are generated: - src/cmd/compile/internal/ssa/opGen.go - src/cmd/compile/internal/ssa/rewriteWasm.go - src/cmd/internal/obj/wasm/anames.go Updates #18892 Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b Reviewed-on: https://go-review.googlesource.com/103295 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 30 Apr, 2018 1 commit
-
-
Wei Xiao authored
Add a compiler intrinsic for getcallerpc on arm64 for better code generation. Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318 Reviewed-on: https://go-review.googlesource.com/109276 Reviewed-by:
Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 04 Apr, 2018 1 commit
-
-
David Chase authored
To improve debugging, instructions should be annotated with DWARF is_stmt. The DWARF default before was is_stmt=1, and to remove "jumpy" stepping the optimizer was tagging instructions with a no-position position, which interferes with the accuracy of profiling information. This allows that to be corrected, and also allows more "jumpy" positions to be annotated with is_stmt=0 (these changes were not made for 1.10 because of worries about further messing up profiling). The is_stmt values are placed in a pc-encoded table and passed through a symbol derived from the name of the function and processed in the linker alongside its processing of each function's pc/line tables. The only change in binary size is in the .debug_line tables measured with "objdump -h --section=.debug_line go1.test" For go1.test, these are 2614 bytes larger, or 0.72% of the size of .debug_line, or 0.025% of the file size. This will increase in proportion to how much the is_stmt flag is used (toggled). Change-Id: Ic1f1aeccff44591ad0494d29e1a0202a3c506a7a Reviewed-on: https://go-review.googlesource.com/93664 Run-TryBot: David Chase <drchase@google.com> Reviewed-by:
Heschi Kreinick <heschi@google.com>
-
- 10 Mar, 2018 1 commit
-
-
Daniel Martí authored
The tool has gotten better over time, so re-generating the files brings some advantages like fewer objects, dropping the use of fmt, and dropping unnecessary bounds checks. While at it, add the missing go:generate line for obj.AddrType. Change-Id: I120c9795ee8faddf5961ff0384b9dcaf58d831ff Reviewed-on: https://go-review.googlesource.com/100015 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
- 04 Dec, 2017 1 commit
-
-
Than McIntosh authored
The DWARF inline info generation hooks weren't properly handling unused auto vars in certain cases, triggering an assert (now fixed). Also with this change, introduce a new autom "flavor" to use for autom entries that are added to insure that a specific auto type makes it into the linker (this is a follow-on to the fix for 22941). Fixes #22962. Change-Id: I7a2d8caf47f6ca897b12acb6a6de0eb25f5cac8f Reviewed-on: https://go-review.googlesource.com/81557 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 30 Nov, 2017 1 commit
-
-
Than McIntosh authored
Compiler and linker changes to support DWARF inlined instances, see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md for design details. This functionality is gated via the cmd/compile option -gendwarfinl=N, where N={0,1,2}, where a value of 0 disables dwarf inline generation, a value of 1 turns on dwarf generation without tracking of formal/local vars from inlined routines, and a value of 2 enables inlines with variable tracking. Updates #22080 Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e Reviewed-on: https://go-review.googlesource.com/75550 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
David Chase <drchase@google.com>
-
- 02 Nov, 2017 1 commit
-
-
isharipo authored
Current AllowedOpCodes is 1024, which is not enough for modern x86. Changed limit to 2048 (though AVX512 will exceed this). Additional Z-cases and ytab tables are added to make it possible to handle missing AVX1 and AVX2 instructions. This CL is required by x86avxgen to work properly: https://go-review.googlesource.com/c/arch/+/66972 Change-Id: I290214bbda554d2cba53349f50dcd34014fe4cee Reviewed-on: https://go-review.googlesource.com/70650 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Ilya Tocar <ilya.tocar@intel.com>
-
- 13 Oct, 2017 1 commit
-
-
Wei Xiao authored
Some ARM64-specific instructions (such as SIMD instructions) are not supported. This patch adds support for the following: 1. Extended register, e.g.: ADD Rm.<ext>[<<amount], Rn, Rd <ext> can have the following values: UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX 2. Arrangement for SIMD instructions, e.g.: VADDP Vm.<T>, Vn.<T>, Vd.<T> <T> can have the following values: B8, B16, H4, H8, S2, S4 and D2 3. Width specifier and element index for SIMD instructions, e.g.: VMOV Vn.<T>[index], Rd // MOV(to general register) <T> can have the following values: S and D 4. Register List, e.g.: VLD1 (Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>] 5. Register offset variant, e.g.: VLD1.P (Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register 6. Go assembly for ARM64 reference manual new added instructions are required to have according explanation items in the manual and items for existed instructions will be added incrementally For more information about the refinement background, please refer to the discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU) This patch only adds syntax and doesn't break any assembly that already exists. Change-Id: I34e90b7faae032820593a0e417022c354a882008 Reviewed-on: https://go-review.googlesource.com/41654 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 11 Oct, 2017 1 commit
-
-
Keith Randall authored
If the stack frame is too large, abort immediately. We used to generate code first, then abort. In issue 22200, generating code raised a panic so we got an ICE instead of an error message. Change the max frame size to 1GB (from 2GB). Stack frames between 1.1GB and 2GB didn't used to work anyway, the pcln table generation would have failed and generated an ICE. Fixes #22200 Change-Id: I1d918ab27ba6ebf5c87ec65d1bccf973f8c8541e Reviewed-on: https://go-review.googlesource.com/69810 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by:
Ian Lance Taylor <iant@golang.org>
-
- 15 Sep, 2017 1 commit
-
-
isharipo authored
This change makes it easier to express instructions with arbitrary number of operands. Rationale: previous approach with operand "hiding" does not scale well, AVX and especially AVX512 have many instructions with 3+ operands. x86 asm backend is updated to handle up to 6 explicit operands. It also fixes issue with 4-th immediate operand type checks. All `ytab` tables are updated accordingly. Changes to non-x86 backends only include these patterns: `p.From3 = X` => `p.SetFrom3(X)` `p.From3.X = Y` => `p.GetFrom3().X = Y` Over time, other backends can adapt Prog.RestArgs and reduce the amount of workarounds. -- Performance -- x/benchmark/build: $ benchstat upstream.bench patched.bench name old time/op new time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old binary-size new binary-size delta Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal) name old build-time/op new build-time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old build-peak-RSS-bytes new build-peak-RSS-bytes delta Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10) name old build-user+sys-time/op new build-user+sys-time/op delta Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10) Microbenchmark shows a slight slowdown. name old time/op new time/op delta AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15) func BenchmarkAMD64asm(b *testing.B) { for i := 0; i < b.N; i++ { TestAMD64EndToEnd(nil) TestAMD64Encoder(nil) } } Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183 Reviewed-on: https://go-review.googlesource.com/63490 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Matthew Dempsky <mdempsky@google.com>
-
- 27 Jul, 2017 1 commit
-
-
Heschi Kreinick authored
Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by:
Josh Bleecher Snyder <josharian@gmail.com>
-
- 18 May, 2017 1 commit
-
-
Alessandro Arzilli authored
Change compiler and linker to emit DWARF lexical blocks in .debug_info section when compiling with -N -l. Version of debug_info is updated from DWARF v2 to DWARF v3 since version 2 does not allow lexical blocks with discontinuous PC ranges. Remaining open problems: - scope information is removed from inlined functions - variables records do not have DW_AT_start_scope attributes so a variable will shadow other variables with the same name as soon as its containing scope begins, even before its declaration. Updates #6913. Updates #12899. Change-Id: Idc6808788512ea20e7e45bcf782453acb416fb49 Reviewed-on: https://go-review.googlesource.com/40095 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Matthew Dempsky <mdempsky@google.com> Reviewed-by:
Josh Bleecher Snyder <josharian@gmail.com>
-
- 27 Apr, 2017 1 commit
-
-
Josh Bleecher Snyder authored
This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by:
Matthew Dempsky <mdempsky@google.com> Reviewed-by:
Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 20 Apr, 2017 2 commits
-
-
Josh Bleecher Snyder authored
There were only two versions, 0 and 1, and the only user of version 1 was the assembler, to indicate that a symbol was static. Rename LSym.Version to Static, and add it to LSym.Attributes. Simplify call-sites. Passes toolstash-check. Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f Reviewed-on: https://go-review.googlesource.com/41201 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Matthew Dempsky <mdempsky@google.com>
-
Josh Bleecher Snyder authored
Though LSym.Version is an int, it can only have the value 0 or 1. Using that, split Link.hash into two maps, one for version 0 (which is far more common) and one for version 1. This lets use just the name for lookups, which is both faster and more compact. This matters because Link.hash map lookups are frequent, and will be contended once the backend is concurrent. name old time/op new time/op delta Template 194ms ± 3% 192ms ± 5% -1.46% (p=0.000 n=47+49) Unicode 84.5ms ± 3% 83.8ms ± 3% -0.81% (p=0.011 n=50+49) GoTypes 543ms ± 2% 545ms ± 4% ~ (p=0.566 n=46+49) Compiler 2.48s ± 2% 2.48s ± 3% ~ (p=0.706 n=47+50) SSA 5.94s ± 3% 5.98s ± 2% +0.55% (p=0.040 n=49+50) Flate 119ms ± 6% 119ms ± 4% ~ (p=0.681 n=48+47) GoParser 145ms ± 4% 145ms ± 3% ~ (p=0.662 n=47+49) Reflect 348ms ± 3% 344ms ± 3% -1.17% (p=0.000 n=47+47) Tar 105ms ± 4% 104ms ± 3% ~ (p=0.155 n=50+47) XML 197ms ± 2% 197ms ± 3% ~ (p=0.666 n=49+49) [Geo mean] 332ms 331ms -0.37% name old user-time/op new user-time/op delta Template 230ms ±10% 226ms ±10% -1.85% (p=0.041 n=50+50) Unicode 104ms ± 6% 103ms ± 5% ~ (p=0.076 n=49+49) GoTypes 707ms ± 4% 705ms ± 5% ~ (p=0.521 n=50+50) Compiler 3.30s ± 3% 3.33s ± 4% +0.76% (p=0.003 n=50+49) SSA 8.17s ± 4% 8.23s ± 3% +0.66% (p=0.030 n=50+49) Flate 139ms ± 6% 138ms ± 8% ~ (p=0.184 n=49+48) GoParser 174ms ± 5% 172ms ± 6% ~ (p=0.107 n=48+49) Reflect 431ms ± 8% 420ms ± 5% -2.57% (p=0.000 n=50+46) Tar 119ms ± 6% 118ms ± 7% -0.95% (p=0.033 n=50+49) XML 236ms ± 4% 236ms ± 4% ~ (p=0.935 n=50+48) [Geo mean] 410ms 407ms -0.67% name old alloc/op new alloc/op delta Template 38.7MB ± 0% 38.6MB ± 0% -0.29% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.7MB ± 0% -0.24% (p=0.008 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% -0.29% (p=0.008 n=5+5) Compiler 462MB ± 0% 462MB ± 0% -0.12% (p=0.008 n=5+5) SSA 1.27GB ± 0% 1.27GB ± 0% -0.05% (p=0.008 n=5+5) Flate 25.2MB ± 0% 25.1MB ± 0% -0.37% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 31.6MB ± 0% ~ (p=0.056 n=5+5) Reflect 77.5MB ± 0% 77.2MB ± 0% -0.38% (p=0.008 n=5+5) Tar 26.4MB ± 0% 26.3MB ± 0% ~ (p=0.151 n=5+5) XML 41.9MB ± 0% 41.9MB ± 0% -0.20% (p=0.032 n=5+5) [Geo mean] 74.5MB 74.3MB -0.23% name old allocs/op new allocs/op delta Template 378k ± 1% 377k ± 1% ~ (p=0.690 n=5+5) Unicode 321k ± 0% 322k ± 0% ~ (p=0.595 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.310 n=5+5) Compiler 4.25M ± 0% 4.25M ± 0% ~ (p=0.151 n=5+5) SSA 9.84M ± 0% 9.84M ± 0% ~ (p=0.841 n=5+5) Flate 232k ± 1% 232k ± 0% ~ (p=0.690 n=5+5) GoParser 315k ± 1% 315k ± 1% ~ (p=0.841 n=5+5) Reflect 970k ± 0% 970k ± 0% ~ (p=0.841 n=5+5) Tar 248k ± 0% 248k ± 1% ~ (p=0.841 n=5+5) XML 389k ± 0% 389k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 724k 724k +0.01% Updates #15756 Change-Id: I2646332e89f0444ca9d5a41d7172537d904ed636 Reviewed-on: https://go-review.googlesource.com/41050 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Matthew Dempsky <mdempsky@google.com>
-
- 19 Apr, 2017 1 commit
-
-
Matthew Dempsky authored
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line. There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations. objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress. Fixes #15165. Fixes #20026. Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c Reviewed-on: https://go-review.googlesource.com/40972 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Josh Bleecher Snyder <josharian@gmail.com>
-
- 18 Apr, 2017 3 commits
-
-
Matthew Dempsky authored
Automated refactoring using github.com/mdempsky/unbed (to rewrite s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo field to just Func). Passes toolstash-check -all. Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a Reviewed-on: https://go-review.googlesource.com/40670 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Josh Bleecher Snyder <josharian@gmail.com>
-
Josh Bleecher Snyder authored
A prior CL eliminated the last reference to Ctxt.Hash from the compiler. Change-Id: Ic97ff84ed1a14e0c93fb0e8ec0b2617c3397c0e8 Reviewed-on: https://go-review.googlesource.com/40699 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
The compiler handled gcargs and gclocals LSyms unusually. It generated placeholder symbols (makefuncdatasym), filled them in, and then renamed them for content-addressability. This is an important binary size optimization; the same locals information occurs over and over. This CL continues to treat these LSyms unusually, but in a slightly more explicit way, and importantly for concurrent compilation, in a way that does not require concurrent modification of Ctxt.Hash. Instead of creating gcargs and gclocals in the usual way, by creating a types.Sym and then an obj.LSym, we add them directly to obj.FuncInfo, initialize them in obj.InitTextSym, and deduplicate and add them to ctxt.Data at the end. Then the backend's job is simply to fill them in and rename them appropriately. Updates #15756 name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.7MB ± 0% -0.22% (p=0.016 n=5+5) Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.690 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% -0.24% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.24GB ± 0% -0.39% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.2MB ± 0% -0.43% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 31.7MB ± 0% -0.22% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 77.6MB ± 0% -0.80% (p=0.008 n=5+5) Tar 26.6MB ± 0% 26.3MB ± 0% -0.85% (p=0.008 n=5+5) XML 42.4MB ± 0% 41.9MB ± 0% -1.04% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 378k ± 0% 377k ± 1% ~ (p=0.151 n=5+5) Unicode 321k ± 1% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% -0.47% (p=0.016 n=5+5) SSA 9.71M ± 0% 9.67M ± 0% -0.33% (p=0.008 n=5+5) Flate 233k ± 1% 232k ± 1% ~ (p=0.151 n=5+5) GoParser 316k ± 0% 315k ± 0% -0.49% (p=0.016 n=5+5) Reflect 979k ± 0% 972k ± 0% -0.75% (p=0.008 n=5+5) Tar 250k ± 0% 247k ± 1% -0.92% (p=0.008 n=5+5) XML 392k ± 1% 389k ± 0% -0.67% (p=0.008 n=5+5) Change-Id: Idc36186ca9d2f8214b5f7720bbc27b6bb22fdc48 Reviewed-on: https://go-review.googlesource.com/40697 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Matthew Dempsky <mdempsky@google.com>
-
- 13 Apr, 2017 1 commit
-
-
Josh Bleecher Snyder authored
Follow-up to review feedback from mdempsky on CL 40507. Reduces mutex contention by about 1%. Change-Id: I540ea6772925f4a59e58f55a3458eff15880c328 Reviewed-on: https://go-review.googlesource.com/40575 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
- 12 Apr, 2017 5 commits
-
-
Josh Bleecher Snyder authored
Prior to this CL, flags such as NOSPLIT on ATEXT Progs were stored in From3.Offset. Some but not all of those flags were also duplicated into From.Sym.Attribute. This CL migrates all of those flags into From.Sym.Attribute and stops creating a From3. A side-effect of this is that printing an ATEXT Prog can no longer simply dump From3.Offset. That's kind of good, since the raw flag value wasn't very informative anyway, but it did necessitate a bunch of updates to the cmd/asm tests. The reason I'm doing this work now is that avoiding storing flags in both From.Sym and From3.Offset simplifies some other changes to fix the data race first described in CL 40254. This CL almost passes toolstash-check -all. The only changes are in cases where the assembler has decided that a function's flags may be altered, e.g. to make a function with no calls in it NOSPLIT. Prior to this CL, that information was not printed. Sample before: "".Ctz64 t=1 size=63 args=0x10 locals=0x0 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), $0-16 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB) Sample after: "".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), NOSPLIT, $0-16 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB) Observe the additional "nosplit" in the first line and the additional "NOSPLIT" in the second line. Updates #15756 Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13 Reviewed-on: https://go-review.googlesource.com/40495 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Josh Bleecher Snyder authored
Move it to the x86 package, matching our handling of deferreturn in x86 and arm. While we're here, improve the concurrency safety of both Plan9privates and deferreturn by eagerly initializing them in instinit. Updates #15756 Change-Id: If3b1995c1e4ec816a5443a18f8d715631967a8b1 Reviewed-on: https://go-review.googlesource.com/40408 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
Josh Bleecher Snyder authored
It is zeroed pointlessly and never read. Change-Id: I65390501a878f545122ec558cb621b91e394a538 Reviewed-on: https://go-review.googlesource.com/40406 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
Josh Bleecher Snyder authored
It is only used once and never written to. Switch to a local constant instead. Change-Id: Icdd84e47b81f0de44ad9ed56ab5f4f91df22e6b6 Reviewed-on: https://go-review.googlesource.com/40405 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
Josh Bleecher Snyder authored
These are unused after CLs 39922, 40252, 40370, 40371, and 40372. Change-Id: I76f9276c581067a8cb555de761550d960f6e39b8 Reviewed-on: https://go-review.googlesource.com/40404 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-
- 11 Apr, 2017 2 commits
-
-
Josh Bleecher Snyder authored
CL 39922 made the arm assembler concurrency-safe. This CL does the same, but for s390x. The approach is similar: introduce ctxtz to hold function-local state and thread it through the assembler as necessary. One race remains after this CL, similar to CL 40252. That race is conceptually unrelated to this refactoring, and will be addressed in a separate CL. Passes toolstash-check -all. Updates #15756 Change-Id: Iabf17aa242b70c0b078c2e85dae3d93a5e512372 Reviewed-on: https://go-review.googlesource.com/40371 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Michael Munday <munday@ca.ibm.com>
-
Josh Bleecher Snyder authored
CL 39922 made the arm assembler concurrency-safe. This CL does the same, but for arm64. The approach is similar: introduce ctxt7 to hold function-local state and thread it through the assembler as necessary. One race remains after this CL, deep in aclass, in the check that a Prog does not take the address of a TLS variable. That race is conceptually unrelated to this refactoring, and will be addressed in a separate CL. Passes toolstash-check -all. Updates #15756 Change-Id: Icab1ef70008468f9a5b8bf728a77c4520bbcb67d Reviewed-on: https://go-review.googlesource.com/40252 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Cherry Zhang <cherryyz@google.com>
-