    internal/bytealg: add SIMD byte count implementation for s390x · b6245cef
    Michael Munday authored
    Add a 'single lane' SIMD implementation of the single-byte count
    function for use on machines that support the vector facility. This
    allows up to 16 bytes to be counted per loop iteration.
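The commit does not spell out the assembly's instruction sequence, so the following is only a rough, portable illustration of the "16 bytes per iteration" idea, not the s390x code. It assumes SWAR (SIMD-within-a-register) arithmetic on two uint64 words stands in for one 16-byte vector register. The names countByte, countWord and the constants are hypothetical and are not part of internal/bytealg.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// Hypothetical SWAR constants (not from internal/bytealg).
const (
	lo7  = 0x7F7F7F7F7F7F7F7F // low 7 bits of every byte
	hi1  = 0x8080808080808080 // top bit of every byte
	ones = 0x0101010101010101 // 0x01 in every byte, used to replicate c
)

// countWord returns how many bytes of w equal the byte replicated in pattern.
func countWord(w, pattern uint64) int {
	v := w ^ pattern           // bytes equal to c become 0x00
	t := ((v & lo7) + lo7) | v // top bit of each byte set iff that byte is nonzero (no cross-byte carry)
	return bits.OnesCount64(^t & hi1)
}

// countByte is a hypothetical portable analogue of a single-lane loop:
// each main-loop iteration examines 16 bytes (two 8-byte words).
func countByte(b []byte, c byte) int {
	pattern := uint64(c) * ones
	n := 0
	for len(b) >= 16 {
		n += countWord(binary.LittleEndian.Uint64(b), pattern)
		n += countWord(binary.LittleEndian.Uint64(b[8:]), pattern)
		b = b[16:]
	}
	// Scalar tail for the final 0 to 15 bytes, like the generic version.
	for _, x := range b {
		if x == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countByte([]byte("hello, world! hello, gophers! lll"), 'l')) // prints 8
}
```

Byte order does not affect a count, so LittleEndian is used here even though s390x is big-endian; only the grouping of 16 bytes per iteration carries over from the commit's description.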
    
    We can probably improve performance further by adding more 'lanes'
    (i.e. counting more bytes in parallel); however, this would increase
    the complexity of the function, so I'm not sure it is worth doing
    yet.
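To make the trade-off concrete, here is a hypothetical pure-Go sketch (not the s390x code, and not something this commit implements) of what "more lanes" could look like: 32 bytes per iteration split across four independent accumulators, each holding eight byte-sized counters. It assumes SWAR words as stand-ins for vector registers. The extra bookkeeping (flushing the counters every 255 iterations before a byte can overflow, then a horizontal sum) hints at the added complexity the message mentions. All names here are illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical SWAR constants (not from internal/bytealg).
const (
	lo7  = 0x7F7F7F7F7F7F7F7F
	hi1  = 0x8080808080808080
	ones = 0x0101010101010101
)

// matchMask returns 0x01 in every byte lane of w equal to the byte
// replicated in pattern, and 0x00 in every other lane.
func matchMask(w, pattern uint64) uint64 {
	v := w ^ pattern           // matching bytes become 0x00
	t := ((v & lo7) + lo7) | v // top bit of each byte set iff that byte is nonzero
	return (^t & hi1) >> 7
}

// hsum adds together the eight byte-sized counters packed into acc.
func hsum(acc uint64) int {
	n := 0
	for i := 0; i < 8; i++ {
		n += int((acc >> (8 * i)) & 0xFF)
	}
	return n
}

// countByteMultiLane counts c in b, 32 bytes per iteration, using four
// independent accumulators so the additions do not form one dependency chain.
func countByteMultiLane(b []byte, c byte) int {
	pattern := uint64(c) * ones
	n := 0
	for len(b) >= 32 {
		var a0, a1, a2, a3 uint64 // each byte of each word is a separate counter
		// A byte counter grows by at most 1 per iteration, so flushing
		// every 255 iterations keeps every counter at or below 255.
		for i := 0; i < 255 && len(b) >= 32; i++ {
			a0 += matchMask(binary.LittleEndian.Uint64(b[0:]), pattern)
			a1 += matchMask(binary.LittleEndian.Uint64(b[8:]), pattern)
			a2 += matchMask(binary.LittleEndian.Uint64(b[16:]), pattern)
			a3 += matchMask(binary.LittleEndian.Uint64(b[24:]), pattern)
			b = b[32:]
		}
		n += hsum(a0) + hsum(a1) + hsum(a2) + hsum(a3)
	}
	// Scalar tail for the final 0 to 31 bytes.
	for _, x := range b {
		if x == c {
			n++
		}
	}
	return n
}

func main() {
	data := make([]byte, 10000)
	for i := range data {
		if i%3 == 0 {
			data[i] = 'x'
		}
	}
	fmt.Println(countByteMultiLane(data, 'x')) // prints 3334
}
```

Deferring the horizontal sum is what makes the wider loop cheap per byte, but it also introduces the overflow limit and the extra flush loop, which is the kind of complexity the message suggests may not yet be worth it.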
    
    name                old speed      new speed       delta
    pkg:strings goos:linux goarch:s390x
    CountByte/10         789MB/s ± 0%   1131MB/s ± 0%    +43.44%  (p=0.000 n=9+9)
    CountByte/32         936MB/s ± 0%   3236MB/s ± 0%   +245.87%  (p=0.000 n=8+9)
    CountByte/4096      1.06GB/s ± 0%  21.26GB/s ± 0%  +1907.07%  (p=0.000 n=10+10)
    CountByte/4194304   1.06GB/s ± 0%  20.54GB/s ± 0%  +1838.50%  (p=0.000 n=10+10)
    CountByte/67108864  1.06GB/s ± 0%  18.31GB/s ± 0%  +1629.51%  (p=0.000 n=10+10)
    pkg:bytes goos:linux goarch:s390x
    CountSingle/10       800MB/s ± 0%    986MB/s ± 0%    +23.21%  (p=0.000 n=9+10)
    CountSingle/32       925MB/s ± 0%   2744MB/s ± 0%   +196.55%  (p=0.000 n=9+10)
    CountSingle/4K      1.26GB/s ± 0%  19.44GB/s ± 0%  +1445.59%  (p=0.000 n=10+10)
    CountSingle/4M      1.26GB/s ± 0%  20.28GB/s ± 0%  +1510.26%  (p=0.000 n=8+10)
    CountSingle/64M     1.23GB/s ± 0%  17.78GB/s ± 0%  +1350.67%  (p=0.000 n=9+10)
    
    Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
    Reviewed-on: https://go-review.googlesource.com/c/go/+/199979
    Run-TryBot: Michael Munday <mike.munday@ibm.com>
    Reviewed-by: Keith Randall <khr@golang.org>