1. 30 Apr, 2018 15 commits
  2. 29 Apr, 2018 14 commits
    • cmd/trace: use different colors for tasks · af5143e3
      Hana Kim authored
      and assign the same colors to the spans that belong to those tasks
      (sadly, the trace viewer changes the saturation/lightness
      of asynchronous slices, so exact color mapping is impossible,
      but I hope the colors are not too far from each other).
      
      Change-Id: Idaaf0828a1e0dac8012d336dcefa1c6572ddca2e
      Reviewed-on: https://go-review.googlesource.com/109338
      Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Heschi Kreinick <heschi@google.com>
    • cmd/compile: better formatting for ssa phases options doc · 8a958bb8
      Alberto Donizetti authored
      Change the help doc of
      
        go tool compile -d=ssa/help
      
      from this:
      
        compile: GcFlag -d=ssa/<phase>/<flag>[=<value>|<function_name>]
        <phase> is one of:
        check, all, build, intrinsics, early_phielim, early_copyelim
        early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
        opt_deadcode, generic_cse, phiopt, nilcheckelim, prove, loopbce
        decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
        fuse, dse, writebarrier, insert_resched_checks, tighten, lower
        lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
        late_phielim, late_copyelim, phi_tighten, late_deadcode, critical
        likelyadjust, layout, schedule, late_nilcheck, flagalloc, regalloc
        loop_rotate, stackframe, trim
        <flag> is one of on, off, debug, mem, time, test, stats, dump
        <value> defaults to 1
        <function_name> is required for "dump", specifies name of function to dump after <phase>
        Except for dump, output is directed to standard out; dump appears in a file.
        Phase "all" supports flags "time", "mem", and "dump".
        Phases "intrinsics" supports flags "on", "off", and "debug".
        Interpretation of the "debug" value depends on the phase.
        Dump files are named <phase>__<function_name>_<seq>.dump.
      
      To this:
      
        compile: PhaseOptions usage:
      
            go tool compile -d=ssa/<phase>/<flag>[=<value>|<function_name>]
      
        where:
      
        - <phase> is one of:
            check, all, build, intrinsics, early_phielim, early_copyelim
            early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
            opt_deadcode, generic_cse, phiopt, nilcheckelim, prove
            decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
            branchelim, fuse, dse, writebarrier, insert_resched_checks, lower
            lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
            late_phielim, late_copyelim, tighten, phi_tighten, late_deadcode
            critical, likelyadjust, layout, schedule, late_nilcheck, flagalloc
            regalloc, loop_rotate, stackframe, trim
      
        - <flag> is one of:
            on, off, debug, mem, time, test, stats, dump
      
        - <value> defaults to 1
      
        - <function_name> is required for the "dump" flag, and specifies the
          name of function to dump after <phase>
      
        Phase "all" supports flags "time", "mem", and "dump".
        Phase "intrinsics" supports flags "on", "off", and "debug".
      
        If the "dump" flag is specified, the output is written on a file named
        <phase>__<function_name>_<seq>.dump; otherwise it is directed to stdout.
      
      Also add a few examples at the bottom.
      
      Fixes #20349
      
      Change-Id: I334799e951e7b27855b3ace5d2d966c4d6ec4cff
      Reviewed-on: https://go-review.googlesource.com/110062
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
    • cmd/compile: simplify shifts using bounds from prove pass · 9eb4590a
      Josh Bleecher Snyder authored
      The prove pass sometimes has bounds information
      that later rewrite passes do not.
      
      Use this information to mark shifts as bounded,
      and then use that information to generate better code on amd64.
      It may prove to be helpful on other architectures, too.
      
      While here, coalesce the existing shift lowering rules.
      
      This triggers 35 times building std+cmd. The full list is below.
      
      Here's an example from runtime.heapBitsSetType:
      
      			if nb < 8 {
      				b |= uintptr(*p) << nb
      				p = add1(p)
      			} else {
      				nb -= 8
      			}
      
      We now generate better code on amd64 for that left shift.
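
      As a rough illustration of the pattern, here is a minimal, self-contained
      sketch. The function name and types are hypothetical and not taken from
      the runtime; the point is only that the shift count is used on a branch
      where a comparison has already bounded it below the operand width.

```go
package main

import "fmt"

// accumulate mirrors the shape of the runtime snippet above: the shift
// count nb is only used on the branch where nb < 8 is known to hold.
// The name and signature are illustrative, not runtime code.
func accumulate(b uint64, v uint8, nb uint) uint64 {
	if nb < 8 {
		// On this branch the shift count is known to be smaller than the
		// operand width (64 bits), so the "count >= width yields 0" case
		// that Go's shift semantics require cannot occur here.
		b |= uint64(v) << nb
	}
	return b
}

func main() {
	fmt.Printf("%#x\n", accumulate(0x1, 0xff, 4)) // 0xff1
}
```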
      
      Updates #25087
      
      vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded
      runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded
      runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded
      syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded
      syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded
      syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded
      compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded
      vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded
      compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded
      math/big/prime.go:50:25: Proved Lsh64x64 bounded
      vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded
      net/ip.go:87:21: Proved Rsh8Ux64 bounded
      cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded
      cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded
      cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded
      cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded
      cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded
      cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded
      cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded
      go/types/expr.go:222:36: Proved Lsh64x64 bounded
      crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded
      cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded
      net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded
      net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded
      net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded
      cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded
      cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded
      cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded
      
      Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142
      Reviewed-on: https://go-review.googlesource.com/109776
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
    • cmd/compile: pass arguments to convt2E/I integer functions by value · 22ff9521
      ChrisALiles authored
      The motivation is to avoid generating a pointer to the data being
      converted, so that the data can be garbage collected.
      The change also slightly reduces binary size by shrinking call sites.
      
      Fixes #24286
      
      Benchmark results:
      name                   old time/op  new time/op  delta
      ConvT2ESmall-4         2.86ns ± 0%  2.80ns ± 1%  -2.12%  (p=0.000 n=29+28)
      ConvT2EUintptr-4       2.88ns ± 1%  2.88ns ± 0%  -0.20%  (p=0.002 n=28+30)
      ConvT2ELarge-4         19.6ns ± 0%  20.4ns ± 1%  +4.22%  (p=0.000 n=19+30)
      ConvT2ISmall-4         3.01ns ± 0%  2.85ns ± 0%  -5.32%  (p=0.000 n=24+28)
      ConvT2IUintptr-4       3.00ns ± 1%  2.87ns ± 0%  -4.44%  (p=0.000 n=29+25)
      ConvT2ILarge-4         20.4ns ± 1%  21.3ns ± 1%  +4.41%  (p=0.000 n=30+26)
      ConvT2Ezero/zero/16-4  2.84ns ± 1%  2.99ns ± 0%  +5.38%  (p=0.000 n=30+25)
      ConvT2Ezero/zero/32-4  2.83ns ± 2%  3.00ns ± 0%  +5.91%  (p=0.004 n=27+3)
      
      Change-Id: I65016ec94c53f97c52113121cab582d0c342b7a8
      Reviewed-on: https://go-review.googlesource.com/102636
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: teach prove to handle expressions like len(s)-delta · e0d37a33
      Giovanni Bajo authored
      When a loop has bound len(s)-delta, findIndVar detected it and
      returned len(s) as (conservative) upper bound. This little lie
      allowed loopbce to drop bound checks.
      
      It is obviously more generic to teach prove about relations like
      x+d<w for non-constant "w"; we already handled the case for
      constant "w", so we just want to learn that if d<0, then x+d<w
      proves that x<w.
      
      To be able to remove the code from findIndVar, we also need
      to teach prove that len() and cap() are always non-negative.
      
      This CL allows proving 633 more checks in cmd+std. Most
      of them are cases where the code was already testing before
      accessing a slice but the compiler didn't know it. For instance,
      take strings.HasSuffix:
      
          func HasSuffix(s, suffix string) bool {
              return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
          }
      
      When suffix is a literal string, the compiler now understands
      that the explicit check is enough to not emit a slice check.
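
      A small sketch of the literal-suffix case, using a hypothetical helper
      (isGoFile is not part of the standard library). The explicit length
      comparison establishes that len(name)-len(suffix) lies between 0 and
      len(name), which is the fact the slice expression relies on. Whether a
      particular compiler version drops the check can be inspected with
      -gcflags=-d=ssa/check_bce/debug=1; no specific output is claimed here.

```go
package main

import "fmt"

// isGoFile is a hypothetical helper with the same shape as
// strings.HasSuffix, specialised to a constant suffix.
func isGoFile(name string) bool {
	const suffix = ".go"
	// The first operand guards the slice expression in the second one:
	// it is only evaluated when len(name) >= len(suffix).
	return len(name) >= len(suffix) && name[len(name)-len(suffix):] == suffix
}

func main() {
	fmt.Println(isGoFile("main.go"), isGoFile("go")) // true false
}
```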
      
      I also found a loopbce test that was incorrectly
      written to detect an overflow but had an off-by-one error (on the
      conservative side), so it unexpectedly passed with this CL; I
      changed it to really trigger the overflow as intended.
      
      Change-Id: Ib5abade337db46b8811425afebad4719b6e46c4a
      Reviewed-on: https://go-review.googlesource.com/105635
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: in prove, detect loops with negative increments · 6d379add
      Giovanni Bajo authored
      To be effective, this also requires being able to relax constraints
      on min/max bound inclusiveness; they are now exposed through flags,
      and prove has been updated to handle them correctly.
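
      For illustration, a loop with a negative increment might look like the
      sketch below. lastIndex is a hypothetical helper; the induction variable
      starts at len(s)-1 (an inclusive maximum) and the loop stops below 0 (an
      inclusive minimum of 0), so every s[i] inside the body is in range. This
      only shows the loop shape and makes no claim about the code any given
      compiler version generates.

```go
package main

import "fmt"

// lastIndex scans s from the end and returns the index of the last
// occurrence of c, or -1 if c is not present.
func lastIndex(s []byte, c byte) int {
	// i decreases by one per iteration and stays within [0, len(s)-1].
	for i := len(s) - 1; i >= 0; i-- {
		if s[i] == c {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(lastIndex([]byte("a/b/c"), '/')) // 3
}
```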
      
      Change-Id: I3490e54461b7b9de8bc4ae40d3b5e2fa2d9f0556
      Reviewed-on: https://go-review.googlesource.com/104041
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: improve testing of induction variables · 980fdb8d
      Giovanni Bajo authored
      Test both the minimum and maximum bounds, and prepare the
      formatting for more advanced tests (inclusive / exclusive bounds).
      
      Change-Id: Ibe432916d9c938343bc07943798bc9709ad71845
      Reviewed-on: https://go-review.googlesource.com/104040
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: remove loopbce pass · f49369b6
      Giovanni Bajo authored
      prove now is able to do what loopbce used to do.
      
      Passes toolstash -cmp.
      
      Compilebench of the whole series (master 9967582f):
      
      name       old time/op     new time/op     delta
      Template       208ms ±18%      198ms ± 4%    ~     (p=0.690 n=5+5)
      Unicode       99.1ms ±19%     96.5ms ± 4%    ~     (p=0.548 n=5+5)
      GoTypes        623ms ± 1%      633ms ± 1%    ~     (p=0.056 n=5+5)
      Compiler       2.94s ± 2%      3.02s ± 4%    ~     (p=0.095 n=5+5)
      SSA            6.77s ± 1%      7.11s ± 2%  +4.94%  (p=0.008 n=5+5)
      Flate          129ms ± 1%      136ms ± 0%  +4.87%  (p=0.016 n=5+4)
      GoParser       152ms ± 3%      156ms ± 1%    ~     (p=0.095 n=5+5)
      Reflect        380ms ± 2%      392ms ± 1%  +3.30%  (p=0.008 n=5+5)
      Tar            185ms ± 6%      184ms ± 2%    ~     (p=0.690 n=5+5)
      XML            223ms ± 2%      228ms ± 3%    ~     (p=0.095 n=5+5)
      StdCmd         26.8s ± 2%      28.0s ± 5%  +4.46%  (p=0.032 n=5+5)
      
      name       old user-ns/op  new user-ns/op  delta
      Template        252M ± 5%       248M ± 3%    ~     (p=1.000 n=5+5)
      Unicode         118M ± 7%       121M ± 4%    ~     (p=0.548 n=5+5)
      GoTypes         790M ± 2%       793M ± 2%    ~     (p=0.690 n=5+5)
      Compiler       3.78G ± 3%      3.91G ± 4%    ~     (p=0.056 n=5+5)
      SSA            8.98G ± 2%      9.52G ± 3%  +6.08%  (p=0.008 n=5+5)
      Flate           155M ± 1%       160M ± 0%  +3.47%  (p=0.016 n=5+4)
      GoParser        185M ± 4%       187M ± 2%    ~     (p=0.310 n=5+5)
      Reflect         469M ± 1%       481M ± 1%  +2.52%  (p=0.016 n=5+5)
      Tar             222M ± 4%       222M ± 2%    ~     (p=0.841 n=5+5)
      XML             269M ± 1%       274M ± 2%  +1.88%  (p=0.032 n=5+5)
      
      name       old text-bytes  new text-bytes  delta
      HelloSize       664k ± 0%       664k ± 0%    ~     (all equal)
      CmdGoSize      7.23M ± 0%      7.22M ± 0%  -0.06%  (p=0.008 n=5+5)
      
      name       old data-bytes  new data-bytes  delta
      HelloSize       134k ± 0%       134k ± 0%    ~     (all equal)
      CmdGoSize       390k ± 0%       390k ± 0%    ~     (all equal)
      
      name       old exe-bytes   new exe-bytes   delta
      HelloSize      1.39M ± 0%      1.39M ± 0%    ~     (all equal)
      CmdGoSize      14.4M ± 0%      14.4M ± 0%  -0.06%  (p=0.008 n=5+5)
      
      Go1 of the whole series:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-16              5.40s ± 6%     5.38s ± 4%     ~     (p=1.000 n=12+10)
      Fannkuch11-16                4.04s ± 3%     3.81s ± 3%   -5.70%  (p=0.000 n=11+11)
      FmtFprintfEmpty-16          60.7ns ± 2%    60.2ns ± 3%     ~     (p=0.136 n=11+10)
      FmtFprintfString-16          115ns ± 2%     114ns ± 4%     ~     (p=0.175 n=11+10)
      FmtFprintfInt-16             118ns ± 2%     125ns ± 2%   +5.76%  (p=0.000 n=11+10)
      FmtFprintfIntInt-16          196ns ± 2%     204ns ± 3%   +4.42%  (p=0.000 n=10+11)
      FmtFprintfPrefixedInt-16     207ns ± 2%     214ns ± 2%   +3.23%  (p=0.000 n=10+11)
      FmtFprintfFloat-16           364ns ± 3%     357ns ± 2%   -1.88%  (p=0.002 n=11+11)
      FmtManyArgs-16               773ns ± 2%     775ns ± 1%     ~     (p=0.457 n=11+10)
      GobDecode-16                11.2ms ± 4%    11.0ms ± 3%   -1.51%  (p=0.022 n=10+9)
      GobEncode-16                9.91ms ± 6%    9.81ms ± 5%     ~     (p=0.699 n=11+11)
      Gzip-16                      339ms ± 1%     338ms ± 1%     ~     (p=0.438 n=11+11)
      Gunzip-16                   64.4ms ± 1%    65.2ms ± 1%   +1.28%  (p=0.001 n=10+11)
      HTTPClientServer-16          157µs ± 7%     160µs ± 5%     ~     (p=0.133 n=11+11)
      JSONEncode-16               22.3ms ± 4%    23.2ms ± 4%   +3.79%  (p=0.000 n=11+11)
      JSONDecode-16               96.7ms ± 3%    96.6ms ± 1%     ~     (p=0.562 n=11+11)
      Mandelbrot200-16            6.42ms ± 1%    6.40ms ± 1%     ~     (p=0.365 n=11+11)
      GoParse-16                  5.59ms ± 7%    5.42ms ± 5%   -3.07%  (p=0.020 n=11+10)
      RegexpMatchEasy0_32-16       113ns ± 2%     113ns ± 3%     ~     (p=0.968 n=11+10)
      RegexpMatchEasy0_1K-16       417ns ± 1%     416ns ± 3%     ~     (p=0.742 n=11+10)
      RegexpMatchEasy1_32-16       106ns ± 1%     107ns ± 3%     ~     (p=0.223 n=11+11)
      RegexpMatchEasy1_1K-16       654ns ± 2%     657ns ± 1%     ~     (p=0.672 n=11+8)
      RegexpMatchMedium_32-16      176ns ± 3%     177ns ± 1%     ~     (p=0.664 n=11+9)
      RegexpMatchMedium_1K-16     56.3µs ± 3%    56.7µs ± 3%     ~     (p=0.171 n=11+11)
      RegexpMatchHard_32-16       2.83µs ± 5%    2.83µs ± 4%     ~     (p=0.735 n=11+11)
      RegexpMatchHard_1K-16       82.7µs ± 2%    82.7µs ± 2%     ~     (p=0.853 n=10+10)
      Revcomp-16                   679ms ± 9%     782ms ±29%  +15.16%  (p=0.031 n=9+11)
      Template-16                  118ms ± 1%     109ms ± 2%   -7.49%  (p=0.000 n=11+11)
      TimeParse-16                 474ns ± 1%     462ns ± 1%   -2.59%  (p=0.000 n=11+11)
      TimeFormat-16                482ns ± 1%     494ns ± 1%   +2.49%  (p=0.000 n=10+11)
      
      name                      old speed      new speed      delta
      GobDecode-16              68.7MB/s ± 4%  69.8MB/s ± 3%   +1.52%  (p=0.022 n=10+9)
      GobEncode-16              77.6MB/s ± 6%  78.3MB/s ± 5%     ~     (p=0.699 n=11+11)
      Gzip-16                   57.2MB/s ± 1%  57.3MB/s ± 1%     ~     (p=0.428 n=11+11)
      Gunzip-16                  301MB/s ± 2%   298MB/s ± 1%   -1.07%  (p=0.007 n=11+11)
      JSONEncode-16             86.9MB/s ± 4%  83.7MB/s ± 4%   -3.63%  (p=0.000 n=11+11)
      JSONDecode-16             20.1MB/s ± 3%  20.1MB/s ± 1%     ~     (p=0.529 n=11+11)
      GoParse-16                10.4MB/s ± 6%  10.7MB/s ± 4%   +3.12%  (p=0.020 n=11+10)
      RegexpMatchEasy0_32-16     282MB/s ± 2%   282MB/s ± 3%     ~     (p=0.756 n=11+10)
      RegexpMatchEasy0_1K-16    2.45GB/s ± 1%  2.46GB/s ± 2%     ~     (p=0.705 n=11+10)
      RegexpMatchEasy1_32-16     299MB/s ± 1%   297MB/s ± 2%     ~     (p=0.151 n=11+11)
      RegexpMatchEasy1_1K-16    1.56GB/s ± 2%  1.56GB/s ± 1%     ~     (p=0.717 n=11+8)
      RegexpMatchMedium_32-16   5.67MB/s ± 4%  5.63MB/s ± 1%     ~     (p=0.538 n=11+9)
      RegexpMatchMedium_1K-16   18.2MB/s ± 3%  18.1MB/s ± 3%     ~     (p=0.156 n=11+11)
      RegexpMatchHard_32-16     11.3MB/s ± 5%  11.3MB/s ± 4%     ~     (p=0.711 n=11+11)
      RegexpMatchHard_1K-16     12.4MB/s ± 1%  12.4MB/s ± 2%     ~     (p=0.535 n=9+10)
      Revcomp-16                 370MB/s ± 5%   332MB/s ±24%     ~     (p=0.062 n=8+11)
      Template-16               16.5MB/s ± 1%  17.8MB/s ± 2%   +8.11%  (p=0.000 n=11+11)
      
      Change-Id: I41e46f375ee127785c6491f7ef5bd35581261ae6
      Reviewed-on: https://go-review.googlesource.com/104039
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: implement loop BCE in prove · 7ec25d0a
      Giovanni Bajo authored
      Reuse findIndVar to discover induction variables, and then
      register the facts we know about them into the facts table
      when entering the loop block.
      
      Moreover, handle "x+delta > w" while updating the facts table,
      to be able to prove accesses to slices with constant offsets
      such as slice[i-10].
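
      A minimal sketch of the constant-offset case, with a hypothetical
      function name. Inside the loop, i is an induction variable with
      10 <= i < len(s), so i-10 is non-negative and smaller than len(s).
      The example shows the source pattern only, not compiler output.

```go
package main

import "fmt"

// laggedSum adds up s[i-10] for every i in [10, len(s)).
func laggedSum(s []int) int {
	sum := 0
	for i := 10; i < len(s); i++ {
		// From 10 <= i and i < len(s) it follows that 0 <= i-10 < len(s).
		sum += s[i-10]
	}
	return sum
}

func main() {
	s := make([]int, 12)
	for i := range s {
		s[i] = i
	}
	fmt.Println(laggedSum(s)) // 1 (s[0] + s[1])
}
```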
      
      Change-Id: I2a63d050ed58258136d54712ac7015b25c893d71
      Reviewed-on: https://go-review.googlesource.com/104038
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: in prove, infer unsigned relations while branching · 29162ec9
      Giovanni Bajo authored
      When a branch is followed, we apply the relation as described
      in the domain relation table. In case the relation is in the
      positive domain, we can also infer an unsigned relation if,
      by that point, we know that both operands are non-negative.
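
      As a sketch of where this could matter (pick is a hypothetical function,
      and the reasoning in the comments is an illustration rather than a
      description of compiler output): a slice bounds check is naturally an
      unsigned comparison of the index against the length, while the guards a
      programmer writes are usually signed.

```go
package main

import "fmt"

// pick returns s[i] when i and j describe a valid, non-empty range
// inside s, and -1 otherwise.
func pick(s []int, i, j int) int {
	if 0 <= i && i < j && j <= len(s) {
		// Here i and j are known to be non-negative, so the signed
		// facts i < j and j <= len(s) also hold as unsigned facts,
		// which is the form an index bounds check needs.
		return s[i]
	}
	return -1
}

func main() {
	fmt.Println(pick([]int{7, 8, 9}, 1, 3)) // 8
}
```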
      
      Fixes #20393
      
      Change-Id: Ieaf0c81558b36d96616abae3eb834c788dd278d5
      Reviewed-on: https://go-review.googlesource.com/100278
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: in prove, add transitive closure of relations · 5c402109
      Giovanni Bajo authored
      Implement it through a partial order data structure, which
      keeps the relations between SSA values in a forest of DAGs
      and is able to discover contradictions.
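
      To make the idea concrete, here is a deliberately simplified sketch of a
      partial order kept as a directed acyclic graph. It is not the compiler's
      implementation: the type order, its methods, and the use of string names
      for values are all assumptions made for this example. It records strict
      "a < b" facts, answers queries by transitivity, and rejects a new fact
      that contradicts the ones already recorded.

```go
package main

import "fmt"

// order records strict "a < b" facts between named values as edges of a
// directed graph. Hypothetical sketch, not the compiler's data structure.
type order struct {
	less map[string][]string // less[a] lists every b with a recorded a < b
}

func newOrder() *order { return &order{less: make(map[string][]string)} }

// reaches reports whether b can be reached from a by following recorded
// edges, using a depth-first search that visits each node at most once.
func (o *order) reaches(a, b string, seen map[string]bool) bool {
	if seen[a] {
		return false
	}
	seen[a] = true
	for _, next := range o.less[a] {
		if next == b || o.reaches(next, b, seen) {
			return true
		}
	}
	return false
}

// Less reports whether a < b is implied by the facts added so far.
func (o *order) Less(a, b string) bool {
	return o.reaches(a, b, make(map[string]bool))
}

// Add records a < b. It returns false and leaves the graph unchanged when
// the new fact would contradict what is already known (a == b, or b < a).
func (o *order) Add(a, b string) bool {
	if a == b || o.Less(b, a) {
		return false
	}
	o.less[a] = append(o.less[a], b)
	return true
}

func main() {
	o := newOrder()
	o.Add("i", "n")
	o.Add("n", "len")
	fmt.Println(o.Less("i", "len")) // true, by transitivity
	fmt.Println(o.Add("len", "i"))  // false, contradicts i < len
}
```

      Because contradictory facts are refused, the graph stays acyclic, which
      is what lets a failed Add be read as "this branch is unreachable" in a
      prover built on top of such a structure.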
      
      In make.bash, this patch is able to prove hundreds of conditions
      which were not proved before.
      
      Compilebench:
      
      name       old time/op       new time/op       delta
      Template         371ms ± 2%        368ms ± 1%    ~     (p=0.222 n=5+5)
      Unicode          203ms ± 6%        199ms ± 3%    ~     (p=0.421 n=5+5)
      GoTypes          1.17s ± 4%        1.18s ± 1%    ~     (p=0.151 n=5+5)
      Compiler         5.54s ± 2%        5.59s ± 1%    ~     (p=0.548 n=5+5)
      SSA              12.9s ± 2%        13.2s ± 1%  +2.96%  (p=0.032 n=5+5)
      Flate            245ms ± 2%        247ms ± 3%    ~     (p=0.690 n=5+5)
      GoParser         302ms ± 6%        302ms ± 4%    ~     (p=0.548 n=5+5)
      Reflect          764ms ± 4%        773ms ± 3%    ~     (p=0.095 n=5+5)
      Tar              354ms ± 6%        361ms ± 3%    ~     (p=0.222 n=5+5)
      XML              434ms ± 3%        429ms ± 1%    ~     (p=0.421 n=5+5)
      StdCmd           22.6s ± 1%        22.9s ± 1%  +1.40%  (p=0.032 n=5+5)
      
      name       old user-time/op  new user-time/op  delta
      Template         436ms ± 8%        426ms ± 5%    ~     (p=0.579 n=5+5)
      Unicode          219ms ±15%        219ms ±12%    ~     (p=1.000 n=5+5)
      GoTypes          1.47s ± 6%        1.53s ± 6%    ~     (p=0.222 n=5+5)
      Compiler         7.26s ± 4%        7.40s ± 2%    ~     (p=0.389 n=5+5)
      SSA              17.7s ± 4%        18.5s ± 4%  +4.13%  (p=0.032 n=5+5)
      Flate            257ms ± 5%        268ms ± 9%    ~     (p=0.333 n=5+5)
      GoParser         354ms ± 6%        348ms ± 6%    ~     (p=0.913 n=5+5)
      Reflect          904ms ± 2%        944ms ± 4%    ~     (p=0.056 n=5+5)
      Tar              398ms ±11%        430ms ± 7%    ~     (p=0.079 n=5+5)
      XML              501ms ± 7%        489ms ± 5%    ~     (p=0.444 n=5+5)
      
      name       old text-bytes    new text-bytes    delta
      HelloSize        670kB ± 0%        670kB ± 0%  +0.00%  (p=0.008 n=5+5)
      CmdGoSize       7.22MB ± 0%       7.21MB ± 0%  -0.07%  (p=0.008 n=5+5)
      
      name       old data-bytes    new data-bytes    delta
      HelloSize       9.88kB ± 0%       9.88kB ± 0%    ~     (all equal)
      CmdGoSize        248kB ± 0%        248kB ± 0%  -0.06%  (p=0.008 n=5+5)
      
      name       old bss-bytes     new bss-bytes     delta
      HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
      CmdGoSize        145kB ± 0%        144kB ± 0%  -0.20%  (p=0.008 n=5+5)
      
      name       old exe-bytes     new exe-bytes     delta
      HelloSize       1.43MB ± 0%       1.43MB ± 0%    ~     (all equal)
      CmdGoSize       14.5MB ± 0%       14.5MB ± 0%  -0.06%  (p=0.008 n=5+5)
      
      Fixes #19714
      Updates #20393
      
      Change-Id: Ia090f5b5dc1bcd274ba8a39b233c1e1ace1b330e
      Reviewed-on: https://go-review.googlesource.com/100277
      Run-TryBot: Giovanni Bajo <rasky@develer.com>
      Reviewed-by: David Chase <drchase@google.com>
    • runtime: iterate over set bits in adjustpointers · 5af0b28a
      Josh Bleecher Snyder authored
      There are several things combined in this change.
      
      First, eliminate the gobitvector type in favor
      of adding a ptrbit method to bitvector.
      In non-performance-critical code, use that method.
      In performance critical code, though, load the bitvector data
      one byte at a time and iterate only over set bits.
      To support that, add and use sys.Ctz8.
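
      The byte-at-a-time technique can be sketched outside the runtime as
      follows. sys.Ctz8 lives in an internal package, so this example uses
      math/bits.TrailingZeros8 as a stand-in; setBits is a hypothetical
      function, not runtime code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// setBits returns the indexes of the set bits in bitmap. Each byte is
// loaded once, and clear bits are skipped rather than tested one by one.
func setBits(bitmap []byte) []int {
	var out []int
	for i, b := range bitmap {
		for b != 0 {
			j := bits.TrailingZeros8(b) // position of the lowest set bit
			out = append(out, i*8+j)
			b &= b - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	fmt.Println(setBits([]byte{0x05, 0x80})) // [0 2 15]
}
```

      The inner loop runs once per set bit, so sparse bitmaps cost far fewer
      iterations than a loop over all eight positions of every byte.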
      
      name                old time/op  new time/op  delta
      StackCopyPtr-8      81.8ms ± 5%  78.9ms ± 3%   -3.58%  (p=0.000 n=97+96)
      StackCopy-8         65.9ms ± 3%  62.8ms ± 3%   -4.67%  (p=0.000 n=96+92)
      StackCopyNoCache-8   105ms ± 3%   102ms ± 3%   -3.38%  (p=0.000 n=96+95)
      
      Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74
      Reviewed-on: https://go-review.googlesource.com/109716
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: add fast version of getArgInfo · 13cd0061
      Josh Bleecher Snyder authored
      getArgInfo is called a lot during stack copying.
      In the common case it doesn't do much work,
      but it cannot be inlined.
      
      This change works around that.
      
      name                old time/op  new time/op  delta
      StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
      StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
      StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)
      
      Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
      Reviewed-on: https://go-review.googlesource.com/108945
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Austin Clements <austin@google.com>
    • runtime: use entry stack map at function entry · 0fd427fd
      Austin Clements authored
      Currently, when the runtime looks up the stack map for a frame, it
      uses frame.continpc - 1 unless continpc is the function entry PC, in
      which case it uses frame.continpc. As a result, if continpc is the
      function entry point (which happens for deferred frames), it will
      actually look up the stack map *following* the first instruction.
      
      I think, though I am not positive, that this is always okay today
      because the first instruction of a function can never change the stack
      map. It's usually not a CALL, so it doesn't have PCDATA. Or, if it is
      a CALL, it has to have the entry stack map.
      
      But we're about to start emitting stack maps at every instruction that
      changes them, which means the first instruction can have PCDATA
      (notably, in leaf functions that don't have a prologue).
      
      To prepare for this, tweak how the runtime looks up stack map indexes
      so that if continpc is the function entry point, it directly uses the
      entry stack map.
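
      The difference between the two lookup rules can be modelled with a toy
      table. Everything below is a hypothetical model and not runtime code:
      pcValue, lookup, and stackMapIndex are made-up names, and treating
      index 0 as the entry stack map is an assumption of this sketch.

```go
package main

import "fmt"

// pcValue is a toy stand-in for a PCDATA entry: stack map index idx is in
// effect from address pc until the next entry in the table.
type pcValue struct {
	pc  uintptr
	idx int
}

// lookup returns the index in effect at targetpc, or -1 if the table has
// no entry at or before that address.
func lookup(table []pcValue, targetpc uintptr) int {
	idx := -1
	for _, v := range table {
		if v.pc > targetpc {
			break
		}
		idx = v.idx
	}
	return idx
}

// stackMapIndex models the adjusted rule: a continuation PC equal to the
// function entry maps straight to the entry stack map (index 0 in this
// model); any other continuation PC is looked up at continpc-1.
func stackMapIndex(table []pcValue, entry, continpc uintptr) int {
	if continpc == entry {
		return 0
	}
	return lookup(table, continpc-1)
}

func main() {
	// A function whose very first instruction already carries an entry.
	table := []pcValue{{pc: 0x100, idx: 1}}
	fmt.Println(lookup(table, 0x100))               // 1: looking up the entry PC itself
	fmt.Println(stackMapIndex(table, 0x100, 0x100)) // 0: entry stack map
	fmt.Println(stackMapIndex(table, 0x100, 0x108)) // 1: lookup at continpc-1
}
```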
      
      For #24543.
      
      Change-Id: I85aa818041cd26aff416f7b1fba186e9c8ca6568
      Reviewed-on: https://go-review.googlesource.com/109349
      Reviewed-by: Rick Hudson <rlh@golang.org>
  3. 28 Apr, 2018 2 commits
  4. 27 Apr, 2018 9 commits
    • misc/wasm: wasm_exec: non-zero exit code on compile error · adb52cff
      Richard Musiol authored
      Return a non-zero exit code if the WebAssembly host fails to compile
      the WebAssembly bytecode to machine code.
      
      Change-Id: I774309db2872b6a2de77a1b0392608058414160d
      Reviewed-on: https://go-review.googlesource.com/110097
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile: optimize ARM64 with shifted register indexed load/store · aaf73c6d
      Ben Shi authored
      ARM64 has efficient instructions that combine a shift, an addition, and a
      load or store, such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".
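
      For context, the Go source patterns that compute such addresses are
      ordinary slice indexing operations. The sketch below uses hypothetical
      function names: indexing a []uint64 addresses base + i<<3, and indexing
      a []uint32 addresses base + i<<2, which matches the shift amounts in the
      two instructions quoted above. Which instructions are actually emitted
      depends on the compiler version and target, so none are asserted here.

```go
package main

import "fmt"

// load64 reads p[i]; the element address is the base pointer plus i
// scaled by 8 (a left shift by 3).
func load64(p []uint64, i int) uint64 {
	return p[i]
}

// store32 writes v to p[i]; the element address is the base pointer plus
// i scaled by 4 (a left shift by 2).
func store32(p []uint32, i int, v uint32) {
	p[i] = v
}

func main() {
	words := []uint64{10, 20, 30}
	halves := make([]uint32, 4)
	store32(halves, 2, 7)
	fmt.Println(load64(words, 2), halves[2]) // 30 7
}
```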
      
      This CL teaches the compiler to emit these instructions. Some test data
      follows.
      
      1. binary size before/after
      binary                 size change
      pkg/linux_arm64        +80.1KB
      pkg/tool/linux_arm64   +121.9KB
      go                     -4.3KB
      gofmt                  -64KB
      
      2. go1 benchmark
      There is a big improvement for the Fannkuch11 test case, and a slight
      improvement for some others, excluding noise.
      
      name                     old time/op    new time/op    delta
      BinaryTree17-4              43.9s ± 2%     44.0s ± 2%     ~     (p=0.820 n=30+30)
      Fannkuch11-4                30.6s ± 2%     24.5s ± 3%  -19.93%  (p=0.000 n=25+30)
      FmtFprintfEmpty-4           500ns ± 0%     499ns ± 0%   -0.11%  (p=0.000 n=23+25)
      FmtFprintfString-4         1.03µs ± 0%    1.04µs ± 3%     ~     (p=0.065 n=29+30)
      FmtFprintfInt-4            1.15µs ± 3%    1.15µs ± 4%   -0.56%  (p=0.000 n=30+30)
      FmtFprintfIntInt-4         1.80µs ± 5%    1.82µs ± 0%     ~     (p=0.094 n=30+24)
      FmtFprintfPrefixedInt-4    2.17µs ± 5%    2.20µs ± 0%     ~     (p=0.100 n=30+23)
      FmtFprintfFloat-4          3.08µs ± 3%    3.09µs ± 4%     ~     (p=0.123 n=30+30)
      FmtManyArgs-4              7.41µs ± 4%    7.17µs ± 1%   -3.26%  (p=0.000 n=30+23)
      GobDecode-4                93.7ms ± 0%    94.7ms ± 4%     ~     (p=0.685 n=24+30)
      GobEncode-4                78.7ms ± 7%    77.1ms ± 0%     ~     (p=0.729 n=30+23)
      Gzip-4                      4.01s ± 0%     3.97s ± 5%   -1.11%  (p=0.037 n=24+30)
      Gunzip-4                    389ms ± 4%     384ms ± 0%     ~     (p=0.155 n=30+23)
      HTTPClientServer-4          536µs ± 1%     537µs ± 1%     ~     (p=0.236 n=30+30)
      JSONEncode-4                179ms ± 1%     182ms ± 6%     ~     (p=0.763 n=24+30)
      JSONDecode-4                843ms ± 0%     839ms ± 6%   -0.42%  (p=0.003 n=25+30)
      Mandelbrot200-4            46.5ms ± 0%    46.5ms ± 0%   +0.02%  (p=0.000 n=26+26)
      GoParse-4                  44.3ms ± 6%    43.3ms ± 0%     ~     (p=0.067 n=30+27)
      RegexpMatchEasy0_32-4      1.07µs ± 7%    1.07µs ± 4%     ~     (p=0.835 n=30+30)
      RegexpMatchEasy0_1K-4      5.51µs ± 0%    5.49µs ± 0%   -0.35%  (p=0.000 n=23+26)
      RegexpMatchEasy1_32-4      1.01µs ± 0%    1.02µs ± 4%   +0.96%  (p=0.014 n=24+30)
      RegexpMatchEasy1_1K-4      7.43µs ± 0%    7.18µs ± 0%   -3.41%  (p=0.000 n=23+24)
      RegexpMatchMedium_32-4     1.78µs ± 0%    1.81µs ± 4%   +1.47%  (p=0.012 n=23+30)
      RegexpMatchMedium_1K-4      547µs ± 1%     542µs ± 3%   -0.90%  (p=0.003 n=24+30)
      RegexpMatchHard_32-4       30.4µs ± 0%    29.7µs ± 0%   -2.15%  (p=0.000 n=19+23)
      RegexpMatchHard_1K-4        913µs ± 0%     915µs ± 6%   +0.25%  (p=0.012 n=24+30)
      Revcomp-4                   6.32s ± 1%     6.42s ± 4%     ~     (p=0.342 n=25+30)
      Template-4                  868ms ± 6%     878ms ± 6%   +1.15%  (p=0.000 n=30+30)
      TimeParse-4                4.57µs ± 4%    4.59µs ± 3%   +0.65%  (p=0.010 n=29+30)
      TimeFormat-4               4.51µs ± 0%    4.50µs ± 0%   -0.27%  (p=0.000 n=27+24)
      [Geo mean]                  695µs          689µs        -0.92%
      
      name                     old speed      new speed      delta
      GobDecode-4              8.19MB/s ± 0%  8.12MB/s ± 4%     ~     (p=0.680 n=24+30)
      GobEncode-4              9.76MB/s ± 7%  9.96MB/s ± 0%     ~     (p=0.616 n=30+23)
      Gzip-4                   4.84MB/s ± 0%  4.89MB/s ± 4%   +1.16%  (p=0.030 n=24+30)
      Gunzip-4                 49.9MB/s ± 4%  50.6MB/s ± 0%     ~     (p=0.162 n=30+23)
      JSONEncode-4             10.9MB/s ± 1%  10.7MB/s ± 6%     ~     (p=0.575 n=24+30)
      JSONDecode-4             2.30MB/s ± 0%  2.32MB/s ± 5%   +0.72%  (p=0.003 n=22+30)
      GoParse-4                1.31MB/s ± 6%  1.34MB/s ± 0%   +2.26%  (p=0.002 n=30+27)
      RegexpMatchEasy0_32-4    30.0MB/s ± 6%  30.0MB/s ± 4%     ~     (p=1.000 n=30+30)
      RegexpMatchEasy0_1K-4     186MB/s ± 0%   187MB/s ± 0%   +0.35%  (p=0.000 n=23+26)
      RegexpMatchEasy1_32-4    31.8MB/s ± 0%  31.5MB/s ± 4%   -0.92%  (p=0.012 n=25+30)
      RegexpMatchEasy1_1K-4     138MB/s ± 0%   143MB/s ± 0%   +3.53%  (p=0.000 n=23+24)
      RegexpMatchMedium_32-4    560kB/s ± 0%   553kB/s ± 4%   -1.19%  (p=0.005 n=23+30)
      RegexpMatchMedium_1K-4   1.87MB/s ± 0%  1.89MB/s ± 3%   +1.04%  (p=0.002 n=24+30)
      RegexpMatchHard_32-4     1.05MB/s ± 0%  1.08MB/s ± 0%   +2.40%  (p=0.000 n=19+23)
      RegexpMatchHard_1K-4     1.12MB/s ± 0%  1.12MB/s ± 5%   +0.12%  (p=0.006 n=25+30)
      Revcomp-4                40.2MB/s ± 1%  39.6MB/s ± 4%     ~     (p=0.242 n=25+30)
      Template-4               2.24MB/s ± 6%  2.21MB/s ± 6%   -1.15%  (p=0.000 n=30+30)
      [Geo mean]               7.87MB/s       7.91MB/s        +0.44%
      
      Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
      Reviewed-on: https://go-review.googlesource.com/108735
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      aaf73c6d
    • Cherry Zhang's avatar
      cmd/link: fix plugin on linux/arm64 · ceda47d0
      Cherry Zhang authored
      The init function and runtime.addmoduledata were not added when
      building a plugin, so the runtime could not find the module.
      
      Testplugin is still not enabled on linux/arm64
      (https://go.googlesource.com/go/+/master/src/cmd/dist/test.go#948)
      because the gold linker on the builder is too old, which fails
      with an internal error (see issue #17138). I tested locally and
      it passes.
      
      Fixes #24940.
      Updates #17138.
      
      Change-Id: I26aebca6c38a3443af0949471fa12b6d550e8c6c
      Reviewed-on: https://go-review.googlesource.com/109917
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      ceda47d0
    • Milan Knezevic's avatar
      cmd/compile: add softfloat support to mips64{,le} · 2959128d
      Milan Knezevic authored
      mips64 softfloat support is based on the mips implementation and
      introduces a new environment variable, GOMIPS64.
      
      GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
      hard-float and soft-float. Valid values are 'hardfloat' (default) and
      'softfloat'. It is passed to the assembler as
      'GOMIPS64_{hardfloat,softfloat}'.
      
      Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
      Reviewed-on: https://go-review.googlesource.com/108475
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      2959128d
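The GOMIPS64 rule described above (two valid values, 'hardfloat' as the default, and a 'GOMIPS64_{hardfloat,softfloat}' symbol passed to the assembler) can be sketched as follows. This is only an illustration: `parseGOMIPS64` and `asmDefine` are hypothetical helpers written for this example, not the toolchain's actual code, and treating an empty value as the default is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// parseGOMIPS64 is a hypothetical helper. It accepts the two values named in
// the commit message and treats an empty string as the default, 'hardfloat'.
func parseGOMIPS64(v string) (string, error) {
	switch v {
	case "":
		return "hardfloat", nil
	case "hardfloat", "softfloat":
		return v, nil
	}
	return "", fmt.Errorf("invalid GOMIPS64 value %q: must be hardfloat or softfloat", v)
}

// asmDefine is a hypothetical helper that builds the symbol name the commit
// message says is passed to the assembler.
func asmDefine(mode string) string {
	return "GOMIPS64_" + mode
}

func main() {
	mode, err := parseGOMIPS64(os.Getenv("GOMIPS64"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(asmDefine(mode))
}
```

Rejecting unknown values early, rather than silently falling back to the default, keeps a mistyped setting from producing a hard-float binary by accident.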
    • Josh Bleecher Snyder's avatar
      cmd/internal/obj: convert unicode C to ASCII C · 62adf6fc
      Josh Bleecher Snyder authored
      Hex before: d0 a1
      Hex after: 43
      
      Not sure where that came from.
      
      Change-Id: I189e7e21f8faf480ba72846b956a149976f720f8
      Reviewed-on: https://go-review.googlesource.com/109777
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      62adf6fc
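The two hex dumps above are consistent with a Cyrillic capital letter Es (U+0421), which looks identical to a Latin C in most fonts but encodes to two bytes in UTF-8. The sketch below prints both encodings; `utf8Hex` is a helper name made up for this example, and the identification of the stray character as U+0421 is inferred from the bytes `d0 a1`, not stated in the commit.

```go
package main

import "fmt"

// utf8Hex returns the UTF-8 encoding of s as space-separated hex bytes.
func utf8Hex(s string) string {
	return fmt.Sprintf("% x", []byte(s))
}

func main() {
	cyrillic := "\u0421" // CYRILLIC CAPITAL LETTER ES, visually similar to C
	latin := "C"         // LATIN CAPITAL LETTER C

	fmt.Println(utf8Hex(cyrillic)) // d0 a1
	fmt.Println(utf8Hex(latin))    // 43
}
```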
    • Zhou Peng's avatar
      testing: fix typo mistake · 1f56499d
      Zhou Peng authored
      Change-Id: I561640768c43491288e7f5bd1a34247787793dab
      Reviewed-on: https://go-review.googlesource.com/109935
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      1f56499d
    • Yasuhiro Matsumoto's avatar
      os: make Stat("*.txt") fail on windows · e656aebb
      Yasuhiro Matsumoto authored
      Fixes #24999
      
      Change-Id: Ie0bb6a6e0fa3992cdd272d42347af65ae7c95463
      Reviewed-on: https://go-review.googlesource.com/108755
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
      e656aebb
    • Daniel Martí's avatar
      cmd/compile: add initial README · a835c739
      Daniel Martí authored
      As a follow-up to the first README for cmd/compile/internal/ssa.
      
      Since this is the parent package for all the compiler packages, this
      README serves as an overview of the compiler and its packages. As more
      READMEs are added for specific parts with more detail, such as ssa's,
      they can be linked from this one.
      
      Thanks to Iskander Sharipov, Josh Bleecher Snyder, Matthew Dempsky,
      Alberto Donizetti, and Robert Griesemer for helping with all the details
      in this document.
      
      Change-Id: I820a535e25dce86ccc667ce1c6e92b75fc32f3af
      Reviewed-on: https://go-review.googlesource.com/103935
      Reviewed-by: Martin Möhrmann <moehrmann@google.com>
      Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      a835c739
    • Josh Bleecher Snyder's avatar
      cmd/compile: increase initial allocation of LSym.R · a76249c3
      Josh Bleecher Snyder authored
      Not a big win, but cheap.
      
      name        old alloc/op      new alloc/op      delta
      Template         34.4MB ± 0%       34.4MB ± 0%  -0.20%  (p=0.000 n=15+15)
      Unicode          29.2MB ± 0%       29.3MB ± 0%  +0.17%  (p=0.000 n=15+15)
      GoTypes           113MB ± 0%        113MB ± 0%  -0.22%  (p=0.000 n=15+15)
      Compiler          509MB ± 0%        508MB ± 0%  -0.11%  (p=0.000 n=15+14)
      SSA              1.46GB ± 0%       1.46GB ± 0%  -0.08%  (p=0.000 n=14+15)
      Flate            23.8MB ± 0%       23.7MB ± 0%  -0.22%  (p=0.000 n=15+15)
      GoParser         27.9MB ± 0%       27.8MB ± 0%  -0.21%  (p=0.000 n=14+15)
      Reflect          77.2MB ± 0%       77.0MB ± 0%  -0.27%  (p=0.000 n=14+15)
      Tar              34.0MB ± 0%       33.9MB ± 0%  -0.21%  (p=0.000 n=13+15)
      XML              42.6MB ± 0%       42.5MB ± 0%  -0.15%  (p=0.000 n=15+15)
      [Geo mean]       75.8MB            75.7MB       -0.15%
      
      name        old allocs/op     new allocs/op     delta
      Template           322k ± 0%         320k ± 0%  -0.60%  (p=0.000 n=15+15)
      Unicode            337k ± 0%         336k ± 0%  -0.23%  (p=0.000 n=12+15)
      GoTypes           1.13M ± 0%        1.12M ± 0%  -0.58%  (p=0.000 n=15+14)
      Compiler          4.67M ± 0%        4.65M ± 0%  -0.38%  (p=0.000 n=14+15)
      SSA               11.7M ± 0%        11.6M ± 0%  -0.25%  (p=0.000 n=15+15)
      Flate              216k ± 0%         214k ± 0%  -0.67%  (p=0.000 n=15+15)
      GoParser           271k ± 0%         270k ± 0%  -0.57%  (p=0.000 n=15+15)
      Reflect            927k ± 0%         920k ± 0%  -0.72%  (p=0.000 n=13+14)
      Tar                318k ± 0%         316k ± 0%  -0.57%  (p=0.000 n=15+15)
      XML                376k ± 0%         375k ± 0%  -0.46%  (p=0.000 n=14+14)
      [Geo mean]         731k              727k       -0.50%
      
      Change-Id: I1417c5881e866fb3efe62a3d0fbe1134275da31a
      Reviewed-on: https://go-review.googlesource.com/109755
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      a76249c3
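The general idea behind a larger initial allocation can be sketched as follows: a slice that starts with some capacity needs fewer reallocations as elements are appended, which is where the small drop in allocs/op comes from. This is a generic illustration, not the compiler's code; the `reloc` type, the `growths` helper, and the capacity of 4 are all assumptions chosen for the example.

```go
package main

import "fmt"

// reloc is a hypothetical stand-in for a relocation record.
type reloc struct {
	off, siz int32
}

// growths appends n elements to a slice that starts with the given capacity
// and returns how many times append had to reallocate the backing array.
func growths(initialCap, n int) int {
	var s []reloc
	if initialCap > 0 {
		s = make([]reloc, 0, initialCap)
	}
	count := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, reloc{off: int32(i)})
		if cap(s) != before {
			count++ // capacity changed, so append allocated a new array
		}
	}
	return count
}

func main() {
	fmt.Println("no preallocation:", growths(0, 4))
	fmt.Println("preallocated to 4:", growths(4, 4))
}
```

The trade-off is that symbols with few or no entries pay for unused capacity, which may explain why one benchmark (Unicode) shows slightly more bytes allocated while the allocation count still drops.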