• Martin Willi's avatar
    crypto: x86/chacha20 - Add a 8-block AVX-512VL variant · cee7a36e
    Martin Willi authored
    This variant is similar to the AVX2 version, but benefits from the AVX-512
    rotate instructions and the additional registers, so it can operate without
    any data on the stack. It uses ymm registers only to avoid the massive core
    throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
    improvement compared to the AVX2 variant for random encryption lengths.
    
    The AVX2 version uses "rep movsb" for partial block XORing via the stack.
    With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
    associated "kmov" instructions to work with dynamic masks is not part of
    the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
    that the major AVX-512VL architectures provide AVX-512BW and this extension
    does not affect core clocking, this seems to be no problem at least for
    now.
    Signed-off-by: default avatarMartin Willi <martin@strongswan.org>
    Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
    cee7a36e
chacha20_glue.c 5.35 KB