  05 Apr, 2024 5 commits
    • crypto: jitter - Replace http with https · 4ad27a8b
      Thorsten Blum authored
      The PDF is also available via https.
      Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • 8fa5f4f0
      Thorsten Blum authored
    • crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation · aa2197f5
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs
      with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL
      extensions.  This implementation uses zmm registers to operate on four
      AES blocks at a time.  The assembly code is instantiated using a macro
      so that most of the source code is shared with other implementations.
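
      As a rough illustration, a single VAES instruction performs one AES
      round on all four blocks packed into a zmm register.  A minimal
      sketch using compiler intrinsics (the actual implementation is
      hand-written assembly generated from the shared macro; the helper
      name is hypothetical):

        #include <immintrin.h>

        /* One AES encryption round on four 16-byte blocks packed into a
         * 512-bit register.  Requires VAES + AVX512F; build with e.g.
         * gcc -mvaes -mavx512f. */
        static __m512i aes_round_x4(__m512i blocks, __m512i round_keys)
        {
                /* vaesenc applies ShiftRows, SubBytes, MixColumns, and
                 * the round-key XOR to each 128-bit lane independently. */
                return _mm512_aesenc_epi128(blocks, round_keys);
        }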
      
      To avoid downclocking on older Intel CPU models, an exclusion list is
      used to prevent this 512-bit implementation from being used by default
      on some CPU models.  They will use xts-aes-vaes-avx10_256 instead.  For
      now, this exclusion list is simply coded into aesni-intel_glue.c.  It
      may make sense to eventually move it into a more central location.
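
      A hypothetical sketch of that exclusion-list pattern, using the
      kernel's x86_match_cpu() helper (the real list and registration
      logic live in aesni-intel_glue.c; the model and helper names shown
      here are only illustrative):

        #include <asm/cpu_device_id.h>
        #include <asm/intel-family.h>

        /* CPU models known to downclock when 512-bit instructions are
         * used; the entry here is only an example. */
        static const struct x86_cpu_id zmm_exclusion_list[] = {
                { .vendor = X86_VENDOR_INTEL, .family = 6,
                  .model = INTEL_FAM6_SKYLAKE_X },
                /* ... */
                {},
        };

        static void aes_xts_check_zmm(void)  /* hypothetical helper */
        {
                /* Demote the 512-bit algorithm's priority so the crypto
                 * API selects xts-aes-vaes-avx10_256 on listed CPUs. */
                if (x86_match_cpu(zmm_exclusion_list))
                        aes_xts_alg_vaes_avx10_512.base.cra_priority = 1;
        }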
      
      xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on
      some current CPUs.  E.g., on AMD Zen 4, AES-256-XTS decryption
      throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte
      inputs.  On Intel Sapphire Rapids, AES-256-XTS decryption throughput
      increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs.
      
      Future CPUs may provide stronger 512-bit support, in which case a larger
      benefit should be seen.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation · ee63fea0
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
      with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
      extensions.  This implementation avoids using zmm registers, instead
      using ymm registers to operate on two AES blocks at a time.  The
      assembly code is instantiated using a macro so that most of the source
      code is shared with other implementations.
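
      Sketched with intrinsics (helper names hypothetical; the real code
      is assembly from the shared macro), both the XTS pre-whitening and
      each AES round touch two blocks per instruction when the data is
      packed into a ymm register:

        #include <immintrin.h>

        /* XTS pre-whitening P xor T for two blocks at once. */
        static __m256i xts_whiten_x2(__m256i blocks, __m256i tweaks)
        {
                return _mm256_xor_si256(blocks, tweaks);
        }

        /* One AES round on both 128-bit lanes; requires VAES (build
         * with e.g. gcc -mvaes -mavx512vl). */
        static __m256i aes_round_x2(__m256i blocks, __m256i round_keys)
        {
                return _mm256_aesenc_epi128(blocks, round_keys);
        }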
      
      This is the optimal implementation on CPUs that support VAES and AVX512
      but where the zmm registers should not be used due to downclocking
      effects, for example Intel's Ice Lake.  It should also be the optimal
      implementation on future CPUs that support AVX10/256 but not AVX10/512.
      
      The performance is slightly better than that of xts-aes-vaes-avx2, which
      uses the same 256-bit vector length, due to factors such as being able
      to use ymm16-ymm31 to cache the AES round keys, and being able to use
      the vpternlogd instruction to do XORs more efficiently.  For example, on
      Ice Lake, the throughput of decrypting 4096-byte messages with
      AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
      xts-aes-vaes-avx2.  While this is a small improvement, it is
      straightforward to provide this implementation (xts-aes-vaes-avx10_256)
      as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
      anyway, due to the way the _aes_xts_crypt macro is structured.
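
      The vpternlogd point can be illustrated with intrinsics: the
      instruction evaluates an arbitrary three-input boolean function
      per bit, and truth table 0x96 is a ^ b ^ c, so two dependent XORs
      collapse into one instruction (a sketch; helper names are
      hypothetical):

        #include <immintrin.h>

        /* Three-way XOR in one vpternlogd (AVX512F + AVX512VL). */
        static __m256i xor3(__m256i a, __m256i b, __m256i c)
        {
                return _mm256_ternarylogic_epi32(a, b, c, 0x96);
        }

        /* The plain-AVX2 code path needs two dependent vpxor
         * instructions for the same result. */
        static __m256i xor3_avx2(__m256i a, __m256i b, __m256i c)
        {
                return _mm256_xor_si256(_mm256_xor_si256(a, b), c);
        }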
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aes-xts - wire up VAES + AVX2 implementation · e787060b
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with
      the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10.
      This implementation uses ymm registers to operate on two AES blocks at a
      time.  The assembly code is instantiated using a macro so that most of
      the source code is shared with other implementations.
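
      For reference, the per-block tweak update that these
      implementations vectorize (computing several tweaks at a time,
      with carryless multiplication in the real assembly) is
      multiplication by x in GF(2^128) with reduction polynomial
      x^128 + x^7 + x^2 + x + 1.  A minimal scalar sketch:

        #include <stdint.h>

        /* Advance the XTS tweak by one block: t = t * x in GF(2^128).
         * t[0] holds the low 64 bits, t[1] the high 64 bits. */
        static void xts_next_tweak(uint64_t t[2])
        {
                uint64_t carry = t[1] >> 63;  /* bit carried out of bit 127 */

                t[1] = (t[1] << 1) | (t[0] >> 63);
                t[0] = (t[0] << 1) ^ (carry * 0x87);
        }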
      
      This is the optimal implementation on AMD Zen 3.  It should also be the
      optimal implementation on Intel Alder Lake, which similarly supports
      VAES but not AVX512.  Comparing to xts-aes-aesni-avx on Zen 3,
      xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput
      with 4096-byte messages, or 23% higher with 512-byte messages.
      
      A large improvement is also seen with CPUs that do support AVX512 (e.g.,
      98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte
      messages), though the following patches add AVX512 optimized
      implementations to get a bit more performance on those CPUs.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>