    crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation · aa2197f5
    Eric Biggers authored
    Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs
    with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL
    extensions.  This implementation uses zmm registers to operate on four
    AES blocks at a time.  The assembly code is instantiated using a macro
    so that most of the source code is shared with other implementations.
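    
    As a rough illustration of the "four AES blocks at a time" arithmetic
    (a 64-byte zmm register holds four 16-byte AES blocks), the sketch below
    splits an input length into full 64-byte vectors, leftover whole blocks,
    and a partial tail.  The struct and function names are hypothetical and
    do not come from the patch; this is not how the assembly schedules its
    loop.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE  16                       /* bytes per AES block */
#define ZMM_BYTES       64                       /* bytes per zmm register */
#define BLOCKS_PER_ZMM  (ZMM_BYTES / AES_BLOCK_SIZE)

_Static_assert(BLOCKS_PER_ZMM == 4, "one zmm register holds four AES blocks");

/* Hypothetical helper type: how an input of a given length divides up. */
struct xts_split {
    size_t vectors;  /* full 64-byte groups of four blocks */
    size_t blocks;   /* whole 16-byte blocks left after the full vectors */
    size_t tail;     /* bytes of a partial final block, if any */
};

/* Split len bytes into full vectors, leftover whole blocks, and tail bytes. */
static struct xts_split xts_split_len(size_t len)
{
    struct xts_split s;
    size_t rem = len % ZMM_BYTES;

    s.vectors = len / ZMM_BYTES;
    s.blocks = rem / AES_BLOCK_SIZE;
    s.tail = rem % AES_BLOCK_SIZE;
    return s;
}
```

    Under these assumptions, a 4096-byte input is 64 full vectors and a
    512-byte input is 8, with no leftover blocks in either case.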
    
    To avoid downclocking on older Intel CPU models, an exclusion list is
    used to prevent this 512-bit implementation from being used by default
    on some CPU models.  They will use xts-aes-vaes-avx10_256 instead.  For
    now, this exclusion list is simply coded into aesni-intel_glue.c.  It
    may make sense to eventually move it into a more central location.
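    
    A minimal standalone sketch of the exclusion-list idea follows.  It is
    not the code in aesni-intel_glue.c: the struct, the function names, and
    the table entries are hypothetical placeholders, and it assumes the
    256-bit implementation is available whenever the 512-bit one is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU identifier; the kernel has its own matching types. */
struct cpu_id {
    unsigned int family;
    unsigned int model;
};

/* Placeholder entries only; these are NOT the models excluded by the patch. */
static const struct cpu_id zmm_exclusions[] = {
    { 6, 0x01 },
    { 6, 0x02 },
};

/* Return true if the CPU appears in the exclusion table. */
static bool cpu_is_excluded(const struct cpu_id *cpu)
{
    size_t n = sizeof(zmm_exclusions) / sizeof(zmm_exclusions[0]);

    for (size_t i = 0; i < n; i++) {
        if (zmm_exclusions[i].family == cpu->family &&
            zmm_exclusions[i].model == cpu->model)
            return true;
    }
    return false;
}

/*
 * Pick the default implementation name.  has_512 says whether the CPU
 * supports the features the 512-bit implementation requires.
 */
static const char *pick_default_xts(const struct cpu_id *cpu, bool has_512)
{
    if (has_512 && !cpu_is_excluded(cpu))
        return "xts-aes-vaes-avx10_512";
    return "xts-aes-vaes-avx10_256";
}
```

    In this sketch an excluded CPU still supports the 512-bit code; the
    list only changes which implementation is chosen by default.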
    
    xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on
    some current CPUs.  E.g., on AMD Zen 4, AES-256-XTS decryption
    throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte
    inputs.  On Intel Sapphire Rapids, AES-256-XTS decryption throughput
    increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs.
    
    Future CPUs may provide stronger 512-bit support, in which case a larger
    benefit should be seen.
    
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aes-xts-avx-x86_64.S 24.6 KB