    encoding/json: fix performance regression in the decoder
    Daniel Martí authored
    In golang.org/cl/145218, a feature was added where the JSON decoder
    would keep track of the entire path to a field when reporting an
    UnmarshalTypeError.
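
    For context, here is a hedged sketch of what that feature surfaces to
    callers (the types and input are illustrative, not from the CL's
    tests): the UnmarshalTypeError now carries the full dotted path to the
    offending field.

    	package main

    	import (
    		"encoding/json"
    		"fmt"
    	)

    	type Inner struct{ N int }
    	type Outer struct{ In Inner }

    	func main() {
    		var o Outer
    		err := json.Unmarshal([]byte(`{"In": {"N": "oops"}}`), &o)
    		if ute, ok := err.(*json.UnmarshalTypeError); ok {
    			// With the path tracking, Field holds the nested path,
    			// e.g. "In.N" rather than just "N".
    			fmt.Println(ute.Struct, ute.Field, ute.Type)
    		}
    	}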
    
    However, we all failed to check whether this affected the benchmarks -
    myself included, as a reviewer. Below are the numbers comparing the CL's
    parent against the CL itself, once it was merged:
    
    name           old time/op    new time/op    delta
    CodeDecoder-8    12.9ms ± 1%    28.2ms ± 2%   +119.33%  (p=0.002 n=6+6)
    
    name           old speed      new speed      delta
    CodeDecoder-8   151MB/s ± 1%    69MB/s ± 3%    -54.40%  (p=0.002 n=6+6)
    
    name           old alloc/op   new alloc/op   delta
    CodeDecoder-8    2.74MB ± 0%  109.39MB ± 0%  +3891.83%  (p=0.002 n=6+6)
    
    name           old allocs/op  new allocs/op  delta
    CodeDecoder-8     77.5k ± 0%    168.5k ± 0%   +117.30%  (p=0.004 n=6+5)
    
    The decoder got twice as slow because it now allocated roughly 40x as
    much memory per operation, which puts a lot of pressure on the garbage
    collector.
    
    The culprit is that the CL concatenated strings every time a nested field
    was decoded. In other words, practically every decoded field generated
    garbage. This is hugely wasteful, especially considering that the vast
    majority of JSON decoding inputs never result in an UnmarshalTypeError.
    
    Instead, use a stack of fields, and make sure to always use the same
    backing array, to ensure we only need to grow the slice to the maximum
    depth once.
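
    A minimal sketch of that idea (the type and method names below are
    illustrative, not the decoder's actual internals): the backing array
    grows to the maximum nesting depth once and is then reused, and the
    dotted path is only built if an error is actually reported.

    	package main

    	import (
    		"fmt"
    		"strings"
    	)

    	// fieldStack holds the path of struct fields currently being decoded.
    	type fieldStack struct{ fields []string }

    	// push appends a field name, reusing the slice's backing array once
    	// it has grown to the maximum depth.
    	func (s *fieldStack) push(name string) { s.fields = append(s.fields, name) }

    	// truncate restores the stack to a previously saved length; the
    	// backing array is kept for reuse.
    	func (s *fieldStack) truncate(n int) { s.fields = s.fields[:n] }

    	// path builds the dotted field path, paying for the allocation only
    	// when an UnmarshalTypeError is actually being reported.
    	func (s *fieldStack) path() string { return strings.Join(s.fields, ".") }

    	func main() {
    		var s fieldStack
    		orig := len(s.fields)
    		s.push("Ts")
    		s.push("Y")
    		fmt.Println(s.path()) // Ts.Y
    		s.truncate(orig)      // reset once the field has been decoded
    		fmt.Println(s.path()) // (empty)
    	}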
    
    The original CL also introduced a bug: the field path string wasn't
    reset to its original state when reaching "d.opcode == scanEndObject",
    so the last field in a decoded struct could leak. For example, an added
    test decodes a list of structs, and encoding/json before this CL would
    fail:
    
    	got:  cannot unmarshal string into Go struct field T.Ts.Y.Y.Y of type int
    	want: cannot unmarshal string into Go struct field T.Ts.Y of type int
    
    To fix that, simply reset the stack after decoding every field, even if
    it's the last.
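
    A runnable sketch of that reset discipline (plain slices stand in for
    the decoder's internal state here, purely for illustration): the saved
    length is restored after every field, so the last field of one struct
    cannot leak into the error path of the next.

    	package main

    	import "fmt"

    	func main() {
    		// Path accumulated so far, e.g. while decoding an element of T.Ts.
    		stack := []string{"Ts"}
    		orig := len(stack)

    		for _, name := range []string{"X", "Y"} {
    			stack = append(stack, name)
    			// ... decode the field's value here ...
    			stack = stack[:orig] // reset even when this is the last field
    		}

    		// Without the reset for the last field, "Y" would remain on the
    		// stack and show up in later error paths (hence T.Ts.Y.Y.Y above).
    		fmt.Println(stack) // [Ts]
    	}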
    
    Below is the original performance versus this CL. There's a tiny
    performance hit, probably due to the append for every decoded field, but
    at least we're back to the usual ~150MB/s.
    
    name           old time/op    new time/op    delta
    CodeDecoder-8    12.9ms ± 1%    13.0ms ± 1%  +1.25%  (p=0.009 n=6+6)
    
    name           old speed      new speed      delta
    CodeDecoder-8   151MB/s ± 1%   149MB/s ± 1%  -1.24%  (p=0.009 n=6+6)
    
    name           old alloc/op   new alloc/op   delta
    CodeDecoder-8    2.74MB ± 0%    2.74MB ± 0%  +0.00%  (p=0.002 n=6+6)
    
    name           old allocs/op  new allocs/op  delta
    CodeDecoder-8     77.5k ± 0%     77.5k ± 0%  +0.00%  (p=0.002 n=6+6)
    
    Finally, make all of these benchmarks report allocs by default. The
    decoder ones are pretty sensitive to generated garbage, so ReportAllocs
    would have made the performance regression more obvious.
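
    For reference, this is the usual shape of a Go benchmark with
    allocation reporting enabled (the struct and data below are
    placeholders, not the code.json fixture that CodeDecoder uses):

    	package json_test

    	import (
    		"bytes"
    		"encoding/json"
    		"testing"
    	)

    	func BenchmarkDecodeSmall(b *testing.B) {
    		b.ReportAllocs() // report allocs/op and B/op even without -benchmem
    		data := []byte(`{"name":"go","size":42}`)
    		type obj struct {
    			Name string `json:"name"`
    			Size int    `json:"size"`
    		}
    		b.SetBytes(int64(len(data)))
    		for i := 0; i < b.N; i++ {
    			var o obj
    			if err := json.NewDecoder(bytes.NewReader(data)).Decode(&o); err != nil {
    				b.Fatal(err)
    			}
    		}
    	}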
    
    Change-Id: I67b50f86b2e72f55539429450c67bfb1a9464b67
    Reviewed-on: https://go-review.googlesource.com/c/go/+/167978
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
    TryBot-Result: Gobot Gobot <gobot@golang.org>