• Josh Bleecher Snyder's avatar
    cmd/internal/obj/x86: make function prologue more predictable · f4b48de3
    Josh Bleecher Snyder authored
    Static branch prediction guesses that forward branches aren't taken.
    Since stacks are rarely grown, make the forward branch mean grow.
    
    Sample disassembly for
    
    func f() {
    	_ = [128]byte{}
    }
    
    Before:
    
    TEXT main.f(SB) x.go
    	x.go:3	0x2000	65488b0c25a0080000	GS MOVQ GS:0x8a0, CX
    	x.go:3	0x2009	483b6110		CMPQ 0x10(CX), SP
    	x.go:3	0x200d	7707			JA 0x2016
    	x.go:3	0x200f	e88c410400		CALL runtime.morestack_noctxt(SB)
    	x.go:3	0x2014	ebea			JMP main.f(SB)
    	x.go:3	0x2016	4881ec80000000		SUBQ $0x80, SP
    	x.go:4	0x201d	488d3c24		LEAQ 0(SP), DI
    	x.go:4	0x2021	31c0			XORL AX, AX
    	x.go:4	0x2023	e8cc640400		CALL 0x484f4
    	x.go:5	0x2028	4881c480000000		ADDQ $0x80, SP
    	x.go:5	0x202f	c3			RET
    
    After:
    
    TEXT main.f(SB) x.go
    	x.go:3	0x2000	65488b0c25a0080000	GS MOVQ GS:0x8a0, CX
    	x.go:3	0x2009	483b6110		CMPQ 0x10(CX), SP
    	x.go:3	0x200d	761a			JBE 0x2029
    	x.go:3	0x200f	4881ec80000000		SUBQ $0x80, SP
    	x.go:4	0x2016	488d3c24		LEAQ 0(SP), DI
    	x.go:4	0x201a	31c0			XORL AX, AX
    	x.go:4	0x201c	e813740400		CALL 0x49434
    	x.go:5	0x2021	4881c480000000		ADDQ $0x80, SP
    	x.go:5	0x2028	c3			RET
    	x.go:3	0x2029	e8224f0400		CALL runtime.morestack_noctxt(SB)
    	x.go:3	0x202e	ebd0			JMP main.f(SB)
    
    Updates #10587.
    
    Sample benchmarks on a 2.8 GHz Intel Core i7:
    
    package sort
    
    name            old mean              new mean              delta
    SearchWrappers   134ns × (0.99,1.01)   132ns × (0.99,1.01)  -1.73% (p=0.000 n=15+14)
    SortString1K     215µs × (0.99,1.01)   213µs × (0.99,1.01)  -0.61% (p=0.020 n=14+15)
    StableString1K   311µs × (0.99,1.02)   309µs × (0.99,1.02)    ~    (p=0.077 n=14+15)
    SortInt1K        103µs × (0.99,1.02)   100µs × (0.98,1.01)  -3.34% (p=0.000 n=15+15)
    StableInt1K      102µs × (0.99,1.01)    98µs × (0.97,1.04)  -3.53% (p=0.000 n=15+15)
    SortInt64K      10.1ms × (0.98,1.02)   9.7ms × (0.99,1.01)  -3.86% (p=0.000 n=14+15)
    StableInt64K    8.70ms × (0.99,1.01)  8.44ms × (0.99,1.03)  -2.93% (p=0.000 n=14+15)
    Sort1e2         51.2µs × (1.00,1.01)  48.9µs × (0.99,1.02)  -4.48% (p=0.000 n=13+15)
    Stable1e2        100µs × (0.99,1.02)    99µs × (0.99,1.01)  -1.15% (p=0.000 n=14+13)
    Sort1e4         11.1ms × (0.99,1.02)  10.4ms × (0.99,1.01)  -6.02% (p=0.000 n=15+14)
    Stable1e4       30.6ms × (0.99,1.01)  30.3ms × (0.99,1.02)  -1.02% (p=0.001 n=15+14)
    Sort1e6          1.75s × (0.99,1.02)   1.66s × (0.98,1.03)  -4.95% (p=0.000 n=14+15)
    Stable1e6        6.31s × (0.99,1.01)   6.26s × (0.99,1.01)  -0.79% (p=0.002 n=15+15)
    
    package regexp
    
    name                          old mean              new mean              delta
    Literal                        131ns × (0.99,1.01)   130ns × (0.99,1.03)  -1.07% (p=0.004 n=14+15)
    NotLiteral                    2.13µs × (0.99,1.01)  2.01µs × (0.99,1.03)  -5.71% (p=0.000 n=14+14)
    MatchClass                    3.15µs × (0.99,1.01)  3.04µs × (0.99,1.02)  -3.40% (p=0.000 n=15+15)
    MatchClass_InRange            2.92µs × (0.99,1.01)  2.77µs × (0.99,1.02)  -5.05% (p=0.000 n=13+15)
    ReplaceAll                    2.17µs × (0.99,1.02)  2.06µs × (0.99,1.01)  -5.19% (p=0.000 n=15+13)
    AnchoredLiteralShortNonMatch   116ns × (0.99,1.02)   113ns × (0.99,1.01)  -2.75% (p=0.000 n=15+14)
    AnchoredLiteralLongNonMatch    125ns × (0.99,1.01)   127ns × (0.98,1.02)  +1.49% (p=0.000 n=15+15)
    AnchoredShortMatch             178ns × (0.99,1.02)   175ns × (0.99,1.01)  -1.62% (p=0.000 n=15+13)
    AnchoredLongMatch              328ns × (0.99,1.00)   341ns × (0.99,1.01)  +3.73% (p=0.000 n=12+15)
    OnePassShortA                  773ns × (0.99,1.02)   752ns × (0.99,1.01)  -2.78% (p=0.000 n=15+13)
    NotOnePassShortA               794ns × (0.99,1.03)   780ns × (0.99,1.02)  -1.75% (p=0.001 n=15+15)
    OnePassShortB                  608ns × (0.99,1.01)   591ns × (0.99,1.02)  -2.86% (p=0.000 n=15+14)
    NotOnePassShortB               576ns × (0.99,1.01)   571ns × (0.99,1.02)  -0.74% (p=0.035 n=15+15)
    OnePassLongPrefix              131ns × (0.99,1.02)   130ns × (0.99,1.02)  -1.32% (p=0.003 n=15+15)
    OnePassLongNotPrefix           503ns × (0.99,1.02)   481ns × (0.99,1.01)  -4.34% (p=0.000 n=15+13)
    MatchEasy0_32                  102ns × (0.98,1.01)   101ns × (0.99,1.02)    ~    (p=0.907 n=15+14)
    MatchEasy0_1K                  617ns × (0.99,1.02)   634ns × (0.98,1.02)  +2.77% (p=0.000 n=15+15)
    MatchEasy0_32K                10.9µs × (0.99,1.01)  11.1µs × (0.99,1.01)  +1.59% (p=0.000 n=15+15)
    MatchEasy0_1M                  406µs × (0.99,1.02)   410µs × (0.99,1.02)  +1.01% (p=0.000 n=14+15)
    MatchEasy0_32M                13.4ms × (0.99,1.01)  13.7ms × (0.99,1.02)  +1.64% (p=0.000 n=12+15)
    MatchEasy1_32                 83.7ns × (0.98,1.02)  83.0ns × (0.98,1.02)    ~    (p=0.190 n=15+15)
    MatchEasy1_1K                 1.46µs × (0.99,1.02)  1.39µs × (0.99,1.02)  -4.83% (p=0.000 n=15+15)
    MatchEasy1_32K                49.4µs × (0.99,1.01)  49.4µs × (0.99,1.01)    ~    (p=0.205 n=15+15)
    MatchEasy1_1M                 1.72ms × (0.99,1.02)  1.75ms × (0.99,1.01)  +1.34% (p=0.000 n=15+15)
    MatchEasy1_32M                55.5ms × (0.99,1.01)  56.1ms × (0.99,1.02)  +1.10% (p=0.002 n=15+15)
    MatchMedium_32                1.37µs × (0.99,1.04)  1.33µs × (0.99,1.01)  -2.87% (p=0.000 n=15+15)
    MatchMedium_1K                41.1µs × (0.99,1.02)  40.4µs × (0.99,1.02)  -1.59% (p=0.000 n=15+15)
    MatchMedium_32K               1.71ms × (0.99,1.01)  1.75ms × (0.99,1.02)  +2.36% (p=0.000 n=14+15)
    MatchMedium_1M                54.5ms × (0.99,1.01)  56.1ms × (0.99,1.01)  +2.94% (p=0.000 n=13+15)
    MatchMedium_32M                1.75s × (0.99,1.01)   1.80s × (0.99,1.01)  +2.77% (p=0.000 n=15+15)
    MatchHard_32                  2.12µs × (0.99,1.02)  2.06µs × (0.99,1.01)  -2.60% (p=0.000 n=15+14)
    MatchHard_1K                  64.4µs × (0.98,1.02)  62.2µs × (0.99,1.01)  -3.33% (p=0.000 n=15+15)
    MatchHard_32K                 2.74ms × (0.99,1.01)  2.75ms × (0.99,1.01)    ~    (p=0.310 n=15+14)
    MatchHard_1M                  87.1ms × (0.99,1.02)  88.2ms × (0.99,1.01)  +1.36% (p=0.000 n=14+15)
    MatchHard_32M                  2.79s × (0.99,1.02)   2.83s × (0.99,1.02)  +1.26% (p=0.004 n=15+14)
    
    go1 benchmarks
    
    name                   old time/op    new time/op    delta
    BinaryTree17              3.34s ± 3%     3.28s ± 2%  -1.86%  (p=0.000 n=67+66)
    Fannkuch11                2.50s ± 1%     2.51s ± 1%  +0.24%  (p=0.016 n=63+66)
    FmtFprintfEmpty          50.3ns ± 1%    50.2ns ± 2%  -0.30%  (p=0.001 n=62+67)
    FmtFprintfString          178ns ± 1%     166ns ± 1%  -7.10%  (p=0.000 n=62+59)
    FmtFprintfInt             168ns ± 1%     161ns ± 2%  -4.41%  (p=0.000 n=66+64)
    FmtFprintfIntInt          292ns ± 1%     282ns ± 2%  -3.55%  (p=0.000 n=62+60)
    FmtFprintfPrefixedInt     245ns ± 2%     239ns ± 2%  -2.24%  (p=0.000 n=66+65)
    FmtFprintfFloat           338ns ± 2%     326ns ± 1%  -3.42%  (p=0.000 n=64+59)
    FmtManyArgs              1.14µs ± 1%    1.10µs ± 2%  -3.55%  (p=0.000 n=62+62)
    GobDecode                8.88ms ± 2%    8.74ms ± 1%  -1.55%  (p=0.000 n=66+62)
    GobEncode                6.84ms ± 2%    6.61ms ± 2%  -3.32%  (p=0.000 n=61+67)
    Gzip                      356ms ± 2%     352ms ± 2%  -1.07%  (p=0.000 n=67+66)
    Gunzip                   90.6ms ± 2%    89.8ms ± 1%  -0.83%  (p=0.000 n=65+64)
    HTTPClientServer         82.6µs ± 2%    82.5µs ± 2%    ~     (p=0.832 n=65+63)
    JSONEncode               17.5ms ± 2%    16.8ms ± 2%  -3.77%  (p=0.000 n=63+63)
    JSONDecode               63.3ms ± 2%    59.0ms ± 2%  -6.85%  (p=0.000 n=64+63)
    Mandelbrot200            3.85ms ± 1%    3.85ms ± 1%    ~     (p=0.127 n=65+62)
    GoParse                  3.75ms ± 2%    3.66ms ± 2%  -2.39%  (p=0.000 n=66+64)
    RegexpMatchEasy0_32       100ns ± 2%     100ns ± 1%  -0.65%  (p=0.000 n=62+64)
    RegexpMatchEasy0_1K       342ns ± 1%     341ns ± 1%  -0.43%  (p=0.000 n=65+64)
    RegexpMatchEasy1_32      82.8ns ± 2%    82.8ns ± 2%    ~     (p=0.977 n=63+64)
    RegexpMatchEasy1_1K       511ns ± 2%     506ns ± 2%  -1.01%  (p=0.000 n=63+64)
    RegexpMatchMedium_32      139ns ± 1%     134ns ± 3%  -3.27%  (p=0.000 n=59+60)
    RegexpMatchMedium_1K     41.8µs ± 2%    40.5µs ± 2%  -3.05%  (p=0.000 n=62+64)
    RegexpMatchHard_32       2.13µs ± 1%    2.09µs ± 1%  -2.22%  (p=0.000 n=60+65)
    RegexpMatchHard_1K       64.4µs ± 3%    62.8µs ± 2%  -2.58%  (p=0.000 n=65+59)
    Revcomp                   531ms ± 2%     529ms ± 1%  -0.28%  (p=0.022 n=61+61)
    Template                 73.2ms ± 1%    73.1ms ± 1%    ~     (p=0.794 n=66+63)
    TimeParse                 369ns ± 1%     352ns ± 1%  -4.68%  (p=0.000 n=65+66)
    TimeFormat                374ns ± 2%     348ns ± 2%  -7.01%  (p=0.000 n=66+64)
    
    Change-Id: Ib190b5bb48a3e9087711d9e3383621d3103dd342
    Reviewed-on: https://go-review.googlesource.com/10367Reviewed-by: default avatarRuss Cox <rsc@golang.org>
    f4b48de3
obj6.go 29.7 KB