    crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation · ee63fea0
    Author: Eric Biggers
    Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
    with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
    extensions.  This implementation avoids using zmm registers, instead
    using ymm registers to operate on two AES blocks at a time.  The
    assembly code is instantiated using a macro so that most of the source
    code is shared with other implementations.
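    As a rough illustration of what operating on "two AES blocks at a time"
    means for the XTS tweaks, the scalar C sketch below models one ymm
    register as two 128-bit lanes that hold the tweaks of consecutive
    blocks j and j + 1. It assumes the IEEE Std 1619 conventions (tweak
    bytes in little-endian order, multiplication by x modulo
    x^128 + x^7 + x^2 + x + 1). The names tweak128, tweak_pair,
    gf128_mul_x, first_pair, next_pair and pairs_match_sequential are
    hypothetical; this is a model of the arithmetic, not the kernel's
    assembly code.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit XTS tweak as two 64-bit halves: lo holds tweak bytes 0..7 and
 * hi holds bytes 8..15, each read as a little-endian integer. */
typedef struct {
    uint64_t lo, hi;
} tweak128;

/* Multiply by x in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1: shift the
 * 128-bit value left one bit and, if bit 127 was set, XOR 0x87 into the
 * low byte. */
static tweak128 gf128_mul_x(tweak128 t)
{
    uint64_t carry = t.hi >> 63;
    tweak128 r;

    r.hi = (t.hi << 1) | (t.lo >> 63);
    r.lo = (t.lo << 1) ^ (carry * 0x87);
    return r;
}

/* Hypothetical model of one ymm register: two 128-bit lanes holding the
 * tweaks of blocks j (lane 0) and j + 1 (lane 1). */
typedef struct {
    tweak128 lane[2];
} tweak_pair;

/* Tweaks for blocks 0 and 1, given t0 = the encrypted IV. */
static tweak_pair first_pair(tweak128 t0)
{
    tweak_pair p = { { t0, gf128_mul_x(t0) } };
    return p;
}

/* Move both lanes forward by two blocks (multiply each lane by x^2). */
static tweak_pair next_pair(tweak_pair p)
{
    for (int i = 0; i < 2; i++)
        p.lane[i] = gf128_mul_x(gf128_mul_x(p.lane[i]));
    return p;
}

/* Compare the two-lane schedule with the one-block-at-a-time schedule
 * for the first nblocks blocks (nblocks assumed even). */
static bool pairs_match_sequential(tweak128 t0, int nblocks)
{
    tweak128 seq = t0;
    tweak_pair p = first_pair(t0);

    for (int j = 0; j < nblocks; j += 2) {
        for (int i = 0; i < 2; i++) {
            if (p.lane[i].lo != seq.lo || p.lane[i].hi != seq.hi)
                return false;
            seq = gf128_mul_x(seq);
        }
        p = next_pair(p);
    }
    return true;
}
```

    Advancing each lane by x^2 keeps the lanes in step: after one call to
    next_pair they hold the tweaks for blocks j + 2 and j + 3, which is what
    pairs_match_sequential checks against the sequential computation.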
    
    This is the optimal implementation on CPUs that support VAES and AVX512
    but on which the zmm registers should not be used because of
    downclocking effects, for example Intel's Ice Lake.  It should also be
    the optimal implementation on future CPUs that support AVX10/256 but
    not AVX10/512.
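    The choice described above can be condensed into a hypothetical C
    function. In the kernel the choice is made through the crypto API,
    which prefers the registered implementation with the highest priority,
    so struct cpu_features and pick_xts_impl are illustrations only. The
    requirements for xts-aes-vaes-avx10_256 come from this commit message;
    the requirements shown for xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
    are assumptions, and NULL stands for falling back to an older, non-VAES
    implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the CPU properties the commit message cares about. */
struct cpu_features {
    bool vaes, vpclmulqdq, avx2;
    bool avx10_256, avx10_512;  /* AVX10 with 256-bit or 512-bit vectors */
    bool avx512bw, avx512vl;
    bool zmm_downclocks;        /* zmm use lowers clock speed, e.g. Ice Lake */
};

/* Returns the preferred VAES-based XTS implementation name, or NULL if none
 * applies and an older implementation (not modelled here) should be used. */
static const char *pick_xts_impl(const struct cpu_features *f)
{
    if (!f->vaes || !f->vpclmulqdq)
        return NULL;

    /* With AVX512BW + AVX512VL, EVEX-encoded instructions work on both
     * ymm (including ymm16-ymm31) and zmm registers. */
    bool evex_256 = f->avx10_256 || f->avx10_512 ||
                    (f->avx512bw && f->avx512vl);
    bool evex_512 = f->avx10_512 || (f->avx512bw && f->avx512vl);

    if (evex_512 && !f->zmm_downclocks)
        return "xts-aes-vaes-avx10_512";
    if (evex_256)
        return "xts-aes-vaes-avx10_256";
    if (f->avx2)
        return "xts-aes-vaes-avx2";
    return NULL;
}
```

    The Ice Lake case from the text is an AVX512BW + AVX512VL CPU with
    zmm_downclocks set, which lands on xts-aes-vaes-avx10_256, as does a
    CPU that has AVX10/256 but not AVX10/512.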
    
    The performance is slightly better than that of xts-aes-vaes-avx2, which
    uses the same 256-bit vector length, due to factors such as being able
    to use ymm16-ymm31 to cache the AES round keys, and being able to use
    the vpternlogd instruction to do XORs more efficiently.  For example, on
    Ice Lake, the throughput of decrypting 4096-byte messages with
    AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
    xts-aes-vaes-avx2.  Although this is a small improvement, providing
    xts-aes-vaes-avx10_256 is straightforward given that xts-aes-vaes-avx2
    and xts-aes-vaes-avx10_512 are provided anyway, because of the way the
    _aes_xts_crypt macro is structured.
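    The vpternlogd point can be made concrete with a scalar C model of the
    instruction's truth-table semantics, following Intel's description of
    VPTERNLOGD. ternlog32 and xor3 are hypothetical helpers for
    illustration, not kernel code. A three-input XOR arises naturally in
    XTS at the start of each block: the data is XORed with its tweak and
    then, as the first step of AES, with round key 0.

```c
#include <stdint.h>

/* Scalar model of VPTERNLOGD on one 32-bit element.  For each bit position,
 * the bits of a (the destination/first operand), b and c form a 3-bit index
 * with a's bit most significant, and the result bit is bit <index> of imm8.
 * Any 3-input Boolean function therefore fits in one instruction. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;

    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2) |
                       (((b >> bit) & 1u) << 1) |
                        ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}

/* imm8 = 0x96 is the truth table of a ^ b ^ c, so a three-input XOR takes
 * one VPTERNLOGD instead of two VPXOR instructions. */
static uint32_t xor3(uint32_t a, uint32_t b, uint32_t c)
{
    return ternlog32(a, b, c, 0x96);
}
```

    Other immediates select other functions; for example 0xF0, 0xCC and
    0xAA simply return a, b and c respectively.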
    
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>