cmd/compile: optimize ARM64 code with CMN/TST
Use CMN/TST to simplify comparisons. This can reduce the register
pressure by removing single def/use registers, for example:
  ADDW R0, R1, R8 -> CMNW R1, R0  ; CMN is an alias of ADDS.
  CBZW R8, label  -> BEQ label    ; single def/use of R8 removed.

Little change in performance of go1 benchmark on Amberwing:

name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       247ns ± 0%     246ns ± 0%  -0.40%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K       581ns ± 0%     580ns ± 0%    ~     (p=0.079 n=4+5)
RegexpMatchEasy1_32       244ns ± 0%     243ns ± 0%  -0.41%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K       804ns ± 0%     806ns ± 0%  +0.25%  (p=0.016 n=5+4)
RegexpMatchMedium_32      313ns ± 0%     311ns ± 0%  -0.64%  (p=0.008 n=5+5)
RegexpMatchMedium_1K     52.2µs ± 0%    51.9µs ± 0%  -0.51%  (p=0.008 n=5+5)
RegexpMatchHard_32       2.76µs ± 3%    2.74µs ± 0%    ~     (p=0.683 n=5+5)
RegexpMatchHard_1K       78.8µs ± 0%    78.9µs ± 0%  +0.04%  (p=0.008 n=5+5)
FmtFprintfEmpty          58.6ns ± 0%    57.7ns ± 0%  -1.54%  (p=0.008 n=5+5)
FmtFprintfString          118ns ± 0%     115ns ± 0%  -2.54%  (p=0.008 n=5+5)
FmtFprintfInt             119ns ± 0%     119ns ± 0%    ~     (all equal)
FmtFprintfIntInt          192ns ± 0%     192ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt     224ns ± 0%     205ns ± 0%  -8.48%  (p=0.008 n=5+5)
FmtFprintfFloat           336ns ± 0%     333ns ± 1%    ~     (p=0.683 n=5+5)
FmtManyArgs               779ns ± 1%     760ns ± 1%  -2.41%  (p=0.008 n=5+5)
Gzip                      437ms ± 0%     436ms ± 0%  -0.27%  (p=0.008 n=5+5)
HTTPClientServer         90.1µs ± 1%    91.1µs ± 0%  +1.19%  (p=0.008 n=5+5)
JSONEncode               20.1ms ± 0%    20.2ms ± 1%    ~     (p=0.690 n=5+5)
JSONDecode               94.5ms ± 1%    94.1ms ± 1%    ~     (p=0.095 n=5+5)
Mandelbrot200            5.37ms ± 0%    5.37ms ± 0%    ~     (p=0.421 n=5+5)
TimeParse                 450ns ± 0%     446ns ± 0%  -0.89%  (p=0.000 n=5+4)
TimeFormat                483ns ± 1%     473ns ± 0%  -2.19%  (p=0.008 n=5+5)
Template                 90.6ms ± 0%    89.7ms ± 0%  -0.93%  (p=0.008 n=5+5)
GoParse                  5.97ms ± 0%    6.01ms ± 0%  +0.65%  (p=0.008 n=5+5)
BinaryTree17              11.8s ± 0%     11.7s ± 0%  -0.28%  (p=0.016 n=5+5)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.222 n=5+5)
Fannkuch11                3.28s ± 0%     3.34s ± 0%  +1.72%  (p=0.016 n=4+5)
[Geo mean]               46.6µs         46.3µs       -0.74%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     129MB/s ± 0%   130MB/s ± 0%  +0.32%  (p=0.016 n=5+4)
RegexpMatchEasy0_1K    1.76GB/s ± 0%  1.76GB/s ± 0%  +0.13%  (p=0.016 n=4+5)
RegexpMatchEasy1_32     131MB/s ± 0%   132MB/s ± 0%  +0.32%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  -0.24%  (p=0.016 n=5+4)
RegexpMatchMedium_32   3.19MB/s ± 0%  3.21MB/s ± 0%  +0.63%  (p=0.008 n=5+5)
RegexpMatchMedium_1K   19.6MB/s ± 0%  19.7MB/s ± 0%  +0.51%  (p=0.029 n=4+4)
RegexpMatchHard_32     11.6MB/s ± 2%  11.7MB/s ± 0%    ~     (p=1.000 n=5+5)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.079 n=4+5)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.27%  (p=0.008 n=5+5)
JSONEncode             96.4MB/s ± 0%  96.2MB/s ± 1%    ~     (p=0.579 n=5+5)
JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 1%    ~     (p=0.111 n=5+5)
Template               21.4MB/s ± 0%  21.6MB/s ± 0%  +0.94%  (p=0.008 n=5+5)
GoParse                9.70MB/s ± 0%  9.63MB/s ± 0%  -0.68%  (p=0.016 n=4+5)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.222 n=5+5)
[Geo mean]             55.3MB/s       55.4MB/s       +0.23%

Change-Id: I2e5338138991d9bc984e67b51212aa5d1b0f2a6b
Reviewed-on: https://go-review.googlesource.com/97335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
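
For readers who want to see the kind of source pattern the commit message describes, here is a minimal sketch in Go. The function names sumIsZero and shareBits are hypothetical and do not come from the change itself. Each function computes a value (a sum or a bitwise AND) only to compare it with zero, so the intermediate result has a single definition and a single use. Whether a given compiler version actually selects CMNW or TST for these functions is an assumption; it depends on the compiler release and on the surrounding code.

```go
package main

import "fmt"

// sumIsZero compares a+b against zero. The sum is used only by the
// comparison, so a flag-setting add (CMNW on ARM64) could stand in
// for ADDW followed by a compare-and-branch. (Hypothetical example.)
func sumIsZero(a, b int32) bool {
	return a+b == 0
}

// shareBits reports whether a and b have any set bit in common. The
// AND result is used only by the comparison, which is the shape a
// TST instruction covers. (Hypothetical example.)
func shareBits(a, b uint64) bool {
	return a&b != 0
}

func main() {
	fmt.Println(sumIsZero(3, -3))          // true
	fmt.Println(sumIsZero(3, 4))           // false
	fmt.Println(shareBits(0b1010, 0b0010)) // true
	fmt.Println(shareBits(0b1010, 0b0100)) // false
}
```

To check what the compiler emits on a particular toolchain, the generated assembly can be printed with `GOARCH=arm64 go build -gcflags=-S`. Small functions like these may be inlined into their callers, so the instructions of interest can appear in the listing for main rather than under the function's own symbol.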