crypto/elliptic: improve P256 implementation on amd64 a bit
Minor modifications to the optimized amd64 implementation:
* Reduce the window size, which shrinks the lookup tables by 40%
* Revise the scalar inversion formula so that it uses fewer operations
* Make the field squaring function loop internally, saving call overhead

This change will serve as a basis for an arm64 implementation.

Performance results on a Skylake MacBook Pro:

name              old time/op    new time/op    delta
pkg:crypto/elliptic goos:darwin goarch:amd64
BaseMultP256        17.8µs ± 1%    17.5µs ± 1%  -1.41%  (p=0.003 n=10+10)
ScalarMultP256      70.7µs ± 1%    68.9µs ± 2%  -2.57%  (p=0.000 n=9+9)
pkg:crypto/ecdsa goos:darwin goarch:amd64
SignP256            32.7µs ± 1%    31.4µs ± 1%  -3.96%  (p=0.000 n=10+8)
VerifyP256          95.1µs ± 1%    93.5µs ± 2%  -1.73%  (p=0.001 n=10+9)

name              old alloc/op   new alloc/op   delta
pkg:crypto/elliptic goos:darwin goarch:amd64
BaseMultP256          288B ± 0%      288B ± 0%    ~     (all equal)
ScalarMultP256        256B ± 0%      256B ± 0%    ~     (all equal)
pkg:crypto/ecdsa goos:darwin goarch:amd64
SignP256            2.90kB ± 0%    2.90kB ± 0%    ~     (all equal)
VerifyP256            976B ± 0%      976B ± 0%    ~     (all equal)

name              old allocs/op  new allocs/op  delta
pkg:crypto/elliptic goos:darwin goarch:amd64
BaseMultP256          6.00 ± 0%      6.00 ± 0%    ~     (all equal)
ScalarMultP256        5.00 ± 0%      5.00 ± 0%    ~     (all equal)
pkg:crypto/ecdsa goos:darwin goarch:amd64
SignP256              34.0 ± 0%      34.0 ± 0%    ~     (all equal)
VerifyP256            17.0 ± 0%      17.0 ± 0%    ~     (all equal)

Change-Id: I3f0e2e197a54e7bc7916dedc5dbf085e2c4aea24
Reviewed-on: https://go-review.googlesource.com/99622
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>