math/bits: faster Reverse, ReverseBytes
- moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m This permits use of the same constant m twice (*) which may be better for machines that can't use large immediate constants directly with an AND instruction and have to load them explicitly. *) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m) - simplified returns This improves the generated code because the compiler recognizes x>>k | x<<k as ROT when k is the bitsize of x. The 8-bit versions of these instructions can be significantly faster still if they are replaced with table lookups, as long as the table is in cache. If the table is not in cache, table-lookup is probably slower, hence the choice of an explicit register-only implementation for now. BenchmarkReverse-8 8.50 6.86 -19.29% BenchmarkReverse8-8 2.17 1.74 -19.82% BenchmarkReverse16-8 2.89 2.34 -19.03% BenchmarkReverse32-8 3.55 2.95 -16.90% BenchmarkReverse64-8 6.81 5.57 -18.21% BenchmarkReverseBytes-8 3.49 2.48 -28.94% BenchmarkReverseBytes16-8 0.93 0.62 -33.33% BenchmarkReverseBytes32-8 1.55 1.13 -27.10% BenchmarkReverseBytes64-8 2.47 2.47 +0.00% Reverse-8 8.50ns ± 0% 6.86ns ± 0% ~ (p=1.000 n=1+1) Reverse8-8 2.17ns ± 0% 1.74ns ± 0% ~ (p=1.000 n=1+1) Reverse16-8 2.89ns ± 0% 2.34ns ± 0% ~ (p=1.000 n=1+1) Reverse32-8 3.55ns ± 0% 2.95ns ± 0% ~ (p=1.000 n=1+1) Reverse64-8 6.81ns ± 0% 5.57ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes-8 3.49ns ± 0% 2.48ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes16-8 0.93ns ± 0% 0.62ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes32-8 1.55ns ± 0% 1.13ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes64-8 2.47ns ± 0% 2.47ns ± 0% ~ (all samples are equal) Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d Reviewed-on: https://go-review.googlesource.com/37215 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
Showing
Please register or sign in to comment