1. 08 Jan, 2021 2 commits
    • crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper · 2481104f
      Ard Biesheuvel authored
      The AES-NI driver implements XTS via the glue helper, which consumes
      a struct with sets of function pointers which are invoked on chunks
      of input data of the appropriate size, as annotated in the struct.
      
      Let's get rid of this indirection, so that we can perform direct calls
      to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
      provide a helper which can consume inputs of any size, provided that the
      penultimate, full block is passed via the last call if ciphertext stealing
      needs to be applied.
      
      This also allows us to enable the XTS mode for i386.
      
      Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
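
The calling convention described in this commit (the penultimate full block must travel with the last call when ciphertext stealing applies) can be illustrated with a small length-splitting sketch. This is not the kernel's glue code: `struct xts_split` and `xts_split_for_cts()` are hypothetical names introduced only for illustration, and the sketch assumes 16-byte AES blocks and an input of at least one full block.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical illustration, not the kernel's implementation. */
struct xts_split {
    size_t bulk;  /* bytes for the earlier call(s): whole blocks only */
    size_t final; /* bytes for the last call; 0 if no stealing is needed */
};

/*
 * Split an XTS request of `len` bytes so that, when the length is not a
 * multiple of the block size, the last call receives the penultimate
 * full block plus the partial tail.
 */
static struct xts_split xts_split_for_cts(size_t len)
{
    struct xts_split s;
    size_t tail = len % AES_BLOCK_SIZE;

    /* XTS needs at least one full block of input. */
    assert(len >= AES_BLOCK_SIZE);

    if (tail == 0) {
        /* Whole blocks only: no ciphertext stealing, no special last call. */
        s.bulk = len;
        s.final = 0;
    } else {
        /* Last call: one full block followed by the partial tail. */
        s.final = AES_BLOCK_SIZE + tail;
        s.bulk = len - s.final; /* always a multiple of AES_BLOCK_SIZE */
    }
    return s;
}
```

For a 100-byte input, for example, this yields an 80-byte bulk portion and a 20-byte final call (one full block plus a 4-byte tail).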
    • crypto: x86/aes-ni-xts - use direct calls to and 4-way stride · 86ad60a6
      Ard Biesheuvel authored
      The XTS asm helper arrangement is a bit odd: the 8-way stride helper
      consists of back-to-back calls to the 4-way core transforms, which
      are called indirectly, based on a boolean that indicates whether we
      are performing encryption or decryption.
      
      Given how costly indirect calls are on x86, let's switch to direct
      calls, and given how the 8-way stride doesn't really add anything
      substantial, use a 4-way stride instead, and make the asm core
      routine deal with any multiple of 4 blocks. Since 512 byte sectors
      or 4 KB blocks are the typical quantities XTS operates on, increase
      the stride exported to the glue helper to 512 bytes as well.
      
      As a result, the number of indirect calls is reduced from 3 per 64 bytes
      of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
      when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).
      
      Fixes: 9697fa39 ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
      Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
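
The call-count figures quoted in this commit message can be worked through for a concrete buffer size. The sketch below is only a simplified model: `indirect_calls()` is a hypothetical helper, and it assumes the number of indirect calls scales linearly with the number of chunks, using the per-chunk figures from the message (3 per 64 bytes before, 1 per 512 bytes after).

```c
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: number of indirect calls needed
 * to process `len` bytes when `calls_per_chunk` indirect calls are made
 * for every `chunk_bytes` of input (a partial chunk counts as a full one).
 */
static size_t indirect_calls(size_t len, size_t calls_per_chunk,
                             size_t chunk_bytes)
{
    size_t chunks = (len + chunk_bytes - 1) / chunk_bytes; /* round up */

    return chunks * calls_per_chunk;
}
```

Under this model a 1 KB block costs `indirect_calls(1024, 3, 64)` = 48 indirect calls before the change and `indirect_calls(1024, 1, 512)` = 2 afterwards.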
  2. 02 Jan, 2021 38 commits