crypto: x86/chacha20 - Add a 4-block AVX-512VL variant
This version uses the same principle as the AVX2 version by scheduling the operations for two block pairs in parallel. It benefits from the AVX-512VL rotate instructions and the more efficient partial block handling using "vmovdqu8", resulting in a speedup of the raw block function of ~20%. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Showing
Please register or sign in to comment