arch/x86/crypto/chacha20-avx512vl-x86_64.S · 180def6c4ad139ae6f97953ae810092ace295d5b · Kirill Smelkov / linux

crypto: x86/chacha20 - Add a 4-block AVX-512VL variant · 180def6c

Martin Willi authored Nov 20, 2018

This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
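
The two points the commit message credits for the speedup can be illustrated with a scalar C sketch. This is not the kernel's assembly: it only models, one 32-bit word and one byte at a time, what the vector code does per lane. The function names (rotl32, chacha_quarter_round, tail_mask, xor_tail) are hypothetical and chosen for illustration. The assumption is that each rotate in the quarter-round maps to a single AVX-512VL rotate instruction, where AVX2 needs a shift/shift/or sequence or a byte shuffle, and that a byte-masked "vmovdqu8" load/store touches only the bytes selected by a bit mask, which removes the need for a bounce buffer on a partial block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). With AVX-512VL this
 * is one instruction per vector; the scalar form is shown here. */
static inline uint32_t rotl32(uint32_t v, unsigned n)
{
	return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter-round on four state words. Each step is an add, an
 * xor and a rotate, with rotate counts 16, 12, 8 and 7. */
void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d = rotl32(*d ^ *a, 16);
	*c += *d; *b = rotl32(*b ^ *c, 12);
	*a += *b; *d = rotl32(*d ^ *a, 8);
	*c += *d; *b = rotl32(*b ^ *c, 7);
}

/* Build a byte mask with the low len bits set, i.e. (1 << len) - 1.
 * Illustrative stand-in for the value loaded into a mask register. */
uint64_t tail_mask(unsigned len)
{
	assert(len < 64);
	return (1ULL << len) - 1;
}

/* Scalar model of a masked load, xor and masked store: only the bytes
 * whose mask bit is set are read from src and written to dst. */
void xor_tail(uint8_t *dst, const uint8_t *src, const uint8_t *keystream,
	      uint64_t mask)
{
	for (size_t i = 0; i < 64; i++) {
		if (mask & (1ULL << i))
			dst[i] = src[i] ^ keystream[i];
	}
}
```

For a trailing fragment of n bytes (n < 64), xor_tail(dst, src, keystream, tail_mask(n)) leaves every byte at index n and above untouched, which is the property that makes the masked form attractive for partial blocks.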


chacha20-avx512vl-x86_64.S 19.9 KB
