cmd/compile/internal/ssa: combine byte stores on amd64

On amd64 we optimize encoding/binary.BigEndian.PutUint{16,32,64} into
bswap + single store, but strangely enough not LittleEndian.PutUint{16,32}.
We have similar rules, but they use 64-bit shifts everywhere, and fail
for the 16/32-bit cases. Add rules that match LittleEndian.PutUint{16,32},
and relevant tests.

Performance results:

name                     old time/op    new time/op    delta
LittleEndianPutUint16-6    1.43ns ± 0%    1.07ns ± 0%   -25.17%  (p=0.000 n=9+9)
LittleEndianPutUint32-6    2.14ns ± 0%    0.94ns ± 0%   -56.07%  (p=0.019 n=6+8)

name                     old speed      new speed      delta
LittleEndianPutUint16-6  1.40GB/s ± 0%  1.87GB/s ± 0%   +33.24%  (p=0.000 n=9+9)
LittleEndianPutUint32-6  1.87GB/s ± 0%  4.26GB/s ± 0%  +128.54%  (p=0.000 n=8+8)

Discovered while looking at ethereum_ethash from community benchmarks.

Change-Id: Id86d5443687ecddd2803edf3203dbdd1246f61fe
Reviewed-on: https://go-review.googlesource.com/95475
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>