1. 28 Aug, 2016 5 commits
    • hash/crc32: fix nil Castagnoli table problem · 8c15a172
      Radu Berinde authored
      When SSE is available, we don't need the Table. However, it is
      returned as a handle by MakeTable. Fix this to always generate
      the table.
      
      Further cleanup is discussed in #16909.
      
      Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc
      Reviewed-on: https://go-review.googlesource.com/27934
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
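
For context, a minimal sketch of how a caller uses the handle returned by MakeTable. The helper name castagnoliSum is hypothetical; only the public hash/crc32 API is used, and whether the SSE4.2 path or the software table runs underneath is not visible to the caller.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// castagnoliSum is a hypothetical helper: it obtains the Castagnoli table
// handle from MakeTable and uses it to checksum data.
func castagnoliSum(data []byte) uint32 {
	table := crc32.MakeTable(crc32.Castagnoli)
	return crc32.Checksum(data, table)
}

func main() {
	// "123456789" is the conventional CRC check input.
	fmt.Printf("%#x\n", castagnoliSum([]byte("123456789")))
}
```

The table value is passed around purely as a handle here, which is why MakeTable has to return a usable table even on machines where the hardware instruction does the work.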
    • cmd/compile: fix noopt build · 0c6c3d1d
      Keith Randall authored
      Atomic add rules were depending on CSE to combine duplicate atomic ops.
      With -N, CSE doesn't run.
      
      Redo the rules for atomic add so there's only one atomic op.
      Introduce an add-to-first-part-of-tuple pseudo-op to make the atomic add result correct.
      
      Change-Id: Ib132247051abe5f80fefad6c197db8df8ee06427
      Reviewed-on: https://go-review.googlesource.com/27991
      Run-TryBot: Keith Randall <khr@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/compile: intrinsify the rest of runtime/internal/atomic for amd64 · 84aac622
      Keith Randall authored
      Atomic swap, add/and/or, compare and swap.
      
      Also works on amd64p32.
      
      Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
      Reviewed-on: https://go-review.googlesource.com/27813
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
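
The operations named above can be illustrated from user code. This is only a sketch: runtime/internal/atomic is not importable, so it uses the public sync/atomic package as a stand-in, and demoAtomics is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// demoAtomics exercises swap, add, and compare-and-swap on one word.
// It uses sync/atomic (an assumption for illustration), not the
// runtime-internal package that the commit intrinsifies.
func demoAtomics() (old, sum uint32, swapped bool, final uint32) {
	var v uint32 = 10
	old = atomic.SwapUint32(&v, 20)                    // stores 20, returns the previous value
	sum = atomic.AddUint32(&v, 5)                      // adds 5, returns the new value
	swapped = atomic.CompareAndSwapUint32(&v, 25, 100) // replaces 25 with 100 if v is still 25
	final = atomic.LoadUint32(&v)
	return old, sum, swapped, final
}

func main() {
	fmt.Println(demoAtomics())
}
```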
    • time: always use $GOROOT/lib/time/zoneinfo.zip with genzabbrs.go · e2e2d10b
      Alex Brainman authored
      genzabbrs.go uses whatever zoneinfo database is available on the system,
      so its output changes from system to system. Adjust the
      go:generate line to always use $GOROOT/lib/time/zoneinfo.zip, so it
      does not matter who runs the command.
      
      Also move go:generate line into zoneinfo.go, so it can be run
      on Unix (see #16368 for details).
      
      Fixes #15802.
      
      Change-Id: I8ae4818aaf40795364e180d7bb4326ad7c07c370
      Reviewed-on: https://go-review.googlesource.com/27832
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • hash/crc32: improve the AMD64 implementation using SSE4.2 · 90c3cf4b
      Radu Berinde authored
      The algorithm is explained in the comments. The improvement in
      throughput is about 1.4x for buffers between 500 B and 4 KB and
      2.5x-2.6x for larger buffers.
      
      Additionally, we no longer initialize the software tables if SSE4.2 is
      available.
      
      Adding a test for the SSE implementation (restricted to amd64 and
      amd64p32).
      
      Benchmarks on a Haswell i5-4670 @ 3.4 GHz:
      
      name                           old time/op    new time/op     delta
      CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
      CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
      CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
      CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
      CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
      CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
      CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
      CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
      CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
      CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
      CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
      CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%
      
      name                           old speed      new speed       delta
      CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
      CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
      CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
      CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
      CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
      CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
      CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
      CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
      CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
      CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
      CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
      CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%
      
      Fixes #16107.
      
      Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
      Reviewed-on: https://go-review.googlesource.com/27931
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. 27 Aug, 2016 4 commits
    • crypto/tls: add KeyLogWriter for debugging · 320bd562
      Joonas Kuorilehto authored
      Add support for writing TLS client random and master secret
      in NSS key log format.
      
      https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Key_Log_Format
      
      Normally this is enabled by a developer debugging TLS based
      applications, especially HTTP/2, by setting the KeyLogWriter
      to an open file. The keys negotiated in handshake are then
      logged and can be used to decrypt TLS sessions e.g. in Wireshark.
      
      Applications may choose to add support similar to NSS where this
      is enabled by environment variable, but no such mechanism is
      built in to Go. Instead, each application must enable it explicitly.
      
      Fixes #13057.
      
      Change-Id: If6edd2d58999903e8390b1674ba4257ecc747ae1
      Reviewed-on: https://go-review.googlesource.com/27434
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • Revert "hash/crc32: improve the AMD64 implementation using SSE4.2" · 3427f166
      Keith Randall authored
      This reverts commit 54d7de7d.
      
      It was breaking non-amd64 builds.
      
      Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97
      Reviewed-on: https://go-review.googlesource.com/27925
      Reviewed-by: Keith Randall <khr@golang.org>
    • hash/crc32: improve the AMD64 implementation using SSE4.2 · 54d7de7d
      Radu Berinde authored
      The algorithm is explained in the comments. The improvement in
      throughput is about 1.4x for buffers between 500 B and 4 KB and
      2.5x-2.6x for larger buffers.
      
      Additionally, we no longer initialize the software tables if SSE4.2 is
      available.
      
      Benchmarks on a Haswell i5-4670 @ 3.4 GHz:
      
      name                           old time/op    new time/op     delta
      CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
      CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
      CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
      CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
      CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
      CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
      CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
      CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
      CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
      CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
      CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
      CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%
      
      name                           old speed      new speed       delta
      CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
      CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
      CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
      CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
      CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
      CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
      CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
      CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
      CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
      CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
      CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
      CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%
      
      Fixes #16107.
      
      Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3
      Reviewed-on: https://go-review.googlesource.com/24471
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: make dumpdepth a global again · 0d23c285
      Robert Griesemer authored
      Fixes indenting in debug output like -W.
      
      Change-Id: Ia16b0bad47428cee71fe036c297731e841ec9ca0
      Reviewed-on: https://go-review.googlesource.com/27924
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
      Run-TryBot: Matthew Dempsky <mdempsky@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  3. 26 Aug, 2016 23 commits
  4. 25 Aug, 2016 8 commits
    • net/http, cmd/compile: minor vet fixes · f9acd391
      Josh Bleecher Snyder authored
      Updates #11041
      
      Change-Id: Ia0151723e3bc0d163cc687a02bfc5e0285d95ffa
      Reviewed-on: https://go-review.googlesource.com/27810
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: inline atomics from runtime/internal/atomic on amd64 · 320ddcf8
      Keith Randall authored
      Inline atomic reads and writes on amd64.  There's no reason
      to pay the overhead of a call for these.
      
      To keep atomic loads from being reordered, we make them
      return a <value,memory> tuple.
      
      Change the meaning of resultInArg0 for tuple-generating ops
      to mean the first part of the result tuple, not the second.
      This means we can always put the store part of the tuple last,
      matching how arguments are laid out.  This requires reordering
      the outputs of add32carry and sub32carry and their descendants
      in various architectures.
      
      benchmark                    old ns/op     new ns/op     delta
      BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
      BenchmarkAtomicStore64-8     7.54          5.72          -24.14%
      
      TBD (in a different CL): Cas, Or8, ...
      
      Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
      Reviewed-on: https://go-review.googlesource.com/27641
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • all: fix assembly vet issues · 71ab9fa3
      Josh Bleecher Snyder authored
      Add missing function prototypes.
      Fix function prototypes.
      Use FP references instead of SP references.
      Fix variable names.
      Update comments.
      Clean up whitespace. (Not for vet.)
      
      All fairly minor fixes to make vet happy.
      
      Updates #11041
      
      Change-Id: Ifab2cdf235ff61cdc226ab1d84b8467b5ac9446c
      Reviewed-on: https://go-review.googlesource.com/27713
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • archive/tar: isolate regular and sparse file handling as methods · 6af7639a
      Joe Tsai authored
      Factor out the regular file handling logic into handleRegularFile
      from nextHeader. We will need to reuse this logic when fixing #15573
      in a future CL.
      
      Factor out the sparse file handling logic into handleSparseFile.
      Currently this logic is split between nextHeader (for GNU sparse
      files) and Next (for PAX sparse files). Instead, we move this
      related code into a single method.
      
      There is no overall logic change. Thus, no unit tests.
      
      Updates #15573 #15564
      
      Change-Id: I3b8270d8b4e080e77d6c0df6a123d677c82cc466
      Reviewed-on: https://go-review.googlesource.com/27454
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • net/http: send Content-Range if no byte range overlaps · aa9b3d70
      Sina Siadat authored
      RFC 7233, section 4.4 says:
      >>>
      For byte ranges, failing to overlap the current extent means that the
      first-byte-pos of all of the byte-range-spec values were greater than the
      current length of the selected representation.  When this status code is
      generated in response to a byte-range request, the sender SHOULD generate a
      Content-Range header field specifying the current length of the selected
      representation
      <<<
      
      Thus, we should send the Content-Range only if none of the ranges
      overlap.
      
      Fixes #15798.
      
      Change-Id: Ic9a3e1b3a8730398b4bdff877a8f2fd2e30149e3
      Reviewed-on: https://go-review.googlesource.com/24212
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
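
The behaviour described above can be observed with an in-memory request, sketched here under the assumption that the content is served through http.ServeContent. The helper requestRange and the name "demo.txt" are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// requestRange serves body through http.ServeContent and issues a GET with
// the given Range header. It returns the status code and the Content-Range
// response header. The function name and "demo.txt" are hypothetical.
func requestRange(body, rangeHeader string) (int, string) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "demo.txt", time.Time{}, strings.NewReader(body))
	})
	req := httptest.NewRequest(http.MethodGet, "/demo.txt", nil)
	req.Header.Set("Range", rangeHeader)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	res := rec.Result()
	defer res.Body.Close()
	return res.StatusCode, res.Header.Get("Content-Range")
}

func main() {
	// The body is 5 bytes long, so first-byte-pos 100 cannot overlap it.
	status, contentRange := requestRange("hello", "bytes=100-200")
	fmt.Println(status, contentRange)
}
```

With a Go release that includes this change, the unsatisfiable request should yield status 416 together with a Content-Range of the form "bytes */<length>".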
    • cmd/compile: when inlining ==, don’t take the address of the values · 0bc94a88
      Josh Bleecher Snyder authored
      This CL reworks walkcompare for clarity and concision.
      It also makes one significant functional change.
      (The functional change is hard to separate cleanly
      from the cleanup, so I just did them together.)
      When inlining and unrolling an equality comparison
      for a small struct or array, compare the elements like:
      
      a[0] == b[0] && a[1] == b[1]
      
      rather than
      
      pa := &a
      pb := &b
      pa[0] == pb[0] && pa[1] == pb[1]
      
      The result is the same, but taking the address
      and working through the indirect
      forces the backends to generate less efficient code.
      
      This is only an improvement with the SSA backend.
      However, every port but s390x now has a working
      SSA backend, and switching to the SSA backend
      by default everywhere is a priority for Go 1.8.
      It thus seems reasonable to start to prioritize
      SSA performance over the old backend.
      
      Updates #15303
      
      
      Sample code:
      
      type T struct {
      	a, b int8
      }
      
      func g(a T) bool {
      	return a == T{1, 2}
      }
      
      
      SSA before:
      
      "".g t=1 size=80 args=0x10 locals=0x8
      	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
      	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
      	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
      	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
      	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
      	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
      	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
      	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
      	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
      	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
      	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
      	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
      	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
      	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
      	0x0031 00049 (badeq.go:8)	JNE	70
      	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
      	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
      	0x003a 00058 (badeq.go:8)	SETEQ	AL
      	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
      	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
      	0x0045 00069 (badeq.go:8)	RET
      	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
      	0x0048 00072 (badeq.go:8)	JMP	61
      
      SSA after:
      
      "".g t=1 size=32 args=0x10 locals=0x0
      	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
      	0x0000 00000 (badeq.go:7)	NOP
      	0x0000 00000 (badeq.go:7)	NOP
      	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
      	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
      	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
      	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
      	0x0007 00007 (badeq.go:8)	JNE	25
      	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
      	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
      	0x0011 00017 (badeq.go:8)	SETEQ	AL
      	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
      	0x0018 00024 (badeq.go:8)	RET
      	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
      	0x001b 00027 (badeq.go:8)	JMP	20
      
      
      Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
      Reviewed-on: https://go-review.googlesource.com/22277
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Matthew Dempsky <mdempsky@google.com>
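
A small sketch of the equivalence the change relies on: for a small struct, == gives the same answer as comparing the fields one by one. The functions eqBuiltin and eqFields are hypothetical names; the element-wise form is written by hand here only to mirror what the compiler is described as generating.

```go
package main

import "fmt"

// T matches the sample type in the commit message.
type T struct {
	a, b int8
}

// eqBuiltin uses the == operator, which the compiler may inline and unroll.
func eqBuiltin(x, y T) bool { return x == y }

// eqFields spells out the element-wise comparison by hand.
func eqFields(x, y T) bool { return x.a == y.a && x.b == y.b }

func main() {
	pairs := [][2]T{
		{{1, 2}, {1, 2}},
		{{1, 2}, {1, 3}},
		{{0, 2}, {1, 2}},
	}
	for _, p := range pairs {
		fmt.Println(p[0], p[1], eqBuiltin(p[0], p[1]), eqFields(p[0], p[1]))
	}
}
```

To inspect the generated code for a function like g, the compiler's assembly listing can be requested with go build -gcflags=-S.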
    • path/filepath: don't return SkipDir at top · 157fc454
      Ian Lance Taylor authored
      If the walker function called on a top-level file returns SkipDir,
      then (before this change) Walk would return SkipDir, which the
      documentation implies will not happen.
      
      Fixes #16280.
      
      Change-Id: I37d63bdcef7af4b56e342b624cf0d4b42e65c297
      Reviewed-on: https://go-review.googlesource.com/24780
      Run-TryBot: Ian Lance Taylor <iant@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • cmd/compile/internal/obj/x86: clean up "is leaf?" check · 307de654
      Josh Bleecher Snyder authored
      Minor code cleanup. No functional changes.
      
      Change-Id: I2e631b43b122174302a182a1a286c0f873851ce6
      Reviewed-on: https://go-review.googlesource.com/24813
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>