1. 03 Feb, 2017 8 commits
  2. 02 Feb, 2017 2 commits
  3. 23 Jan, 2017 16 commits
  4. 13 Jan, 2017 9 commits
  5. 12 Jan, 2017 5 commits
    • crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64 · 1abee99e
      Ard Biesheuvel authored
      This is a reimplementation of the NEON version of the bit-sliced AES
      algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
      for ARM, which is also available in the kernel. This is an alternative for
      the existing NEON implementation for arm64 authored by me, which suffers
      from poor performance due to its reliance on the pathologically slow four
      register variant of the tbl/tbx NEON instruction.
      
      This version is about 30% (*) faster than the generic C code, but only in
      cases where the input can be 8x interleaved (this is a fundamental property
      of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
      are implemented. (The significance of ECB is that it could potentially be
      used by other chaining modes.)
      
      * Measured on Cortex-A57. Note that this is still an order of magnitude
        slower than the implementations that use the dedicated AES instructions
        introduced in ARMv8, but those are part of an optional extension, and so
        it is good to have a fallback.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
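The 8x interleaving requirement mentioned above can be illustrated with a toy bit-slicing sketch in plain C. It is not taken from the commit and does not use NEON: the function names `bitslice8`, `unbitslice8` and `sliced_xor_shr1` are hypothetical, and the sample operation `x ^ (x >> 1)` is only a stand-in for the Boolean circuits a real bit-sliced AES would evaluate. The point it shows is that once one byte from each of eight blocks is transposed into eight bit planes, a single bitwise instruction on a plane works on all eight blocks at once.

```c
#include <stdint.h>

/*
 * Transpose one byte from each of eight blocks into eight bit planes.
 * After the call, bit j of planes[b] equals bit b of in[j].
 */
void bitslice8(const uint8_t in[8], uint8_t planes[8])
{
    for (int b = 0; b < 8; b++) {
        uint8_t p = 0;
        for (int j = 0; j < 8; j++)
            p |= (uint8_t)(((in[j] >> b) & 1u) << j);
        planes[b] = p;
    }
}

/* Inverse transposition: rebuild the eight bytes from the bit planes. */
void unbitslice8(const uint8_t planes[8], uint8_t out[8])
{
    for (int j = 0; j < 8; j++) {
        uint8_t v = 0;
        for (int b = 0; b < 8; b++)
            v |= (uint8_t)(((planes[b] >> j) & 1u) << b);
        out[j] = v;
    }
}

/*
 * Stand-in operation: compute x ^ (x >> 1) for all eight bytes at once.
 * Result bit b is x_b ^ x_(b+1) for b < 7, and bit 7 is unchanged, so
 * each output plane is one XOR of two input planes.
 */
void sliced_xor_shr1(uint8_t planes[8])
{
    for (int b = 0; b < 7; b++)
        planes[b] ^= planes[b + 1];
}
```

With fewer than eight blocks available, some lanes of every plane would be idle, which is one way to see why only modes that can process blocks independently benefit.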
    • crypto: arm/aes - replace scalar AES cipher · 81edb426
      Ard Biesheuvel authored
      This replaces the scalar AES cipher that originates in the OpenSSL project
      with a new implementation that is ~15% (*) faster (on modern cores), and
      reuses the lookup tables and the key schedule generation routines from the
      generic C implementation (which is usually compiled in anyway due to
      networking and other subsystems depending on it).
      
      Note that the bit sliced NEON code for AES still depends on the scalar cipher
      that this patch replaces, so it is not removed entirely yet.
      
      * On Cortex-A57, the cost drops from 17.0 to 14.9 cycles per byte
        for 128-bit keys.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes - add scalar implementation · bed593c0
      Ard Biesheuvel authored
      This adds a scalar implementation of AES, based on the precomputed tables
      that are exposed by the generic AES code. Since rotates are cheap on arm64,
      this implementation only uses the 4 core tables (of 1 KB each), and avoids
      the prerotated ones, reducing the D-cache footprint by 75%.
      
      On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster
      than the generic C code. (Note that this is still >13x slower than the code
      that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per
      byte.)
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
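The table-footprint argument in this commit message can be sketched in portable C. This is an illustration only, not the arm64 assembly from the commit: the table contents are arbitrary stand-in values rather than real AES tables, the names `fill_demo_table`, `derive_prerotated`, `column_prerotated` and `column_rotated` are hypothetical, and the assumption that each prerotated table is the base table rotated left by a multiple of 8 bits is made here for the sake of the example. Under that assumption, one lookup plus a rotate gives the same word as a lookup in a prerotated table, so only one table out of each group of four needs to stay in the D-cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left by n bits; n == 0 is handled separately to avoid a 32-bit shift. */
uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Fill a 256-entry base table with arbitrary stand-in values (NOT real AES data). */
void fill_demo_table(uint32_t t0[256])
{
    for (uint32_t i = 0; i < 256; i++)
        t0[i] = i * 0x9e3779b1u + 0x7f4a7c15u;
}

/* Assumed layout: prerotated table k is the base table rotated left by 8*k bits. */
void derive_prerotated(const uint32_t t0[256], uint32_t t[4][256])
{
    for (unsigned k = 0; k < 4; k++)
        for (unsigned i = 0; i < 256; i++)
            t[k][i] = rol32(t0[i], 8 * k);
}

/* One output column using four tables (4 KB of lookup data). */
uint32_t column_prerotated(uint32_t t[4][256],
                           uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return t[0][b0] ^ t[1][b1] ^ t[2][b2] ^ t[3][b3];
}

/* The same column using only the base table (1 KB) plus rotates. */
uint32_t column_rotated(const uint32_t t0[256],
                        uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return t0[b0] ^ rol32(t0[b1], 8) ^ rol32(t0[b2], 16) ^ rol32(t0[b3], 24);
}

/* Bytes of lookup data for a given number of 256-entry, 32-bit tables. */
size_t table_bytes(size_t ntables)
{
    return ntables * 256 * sizeof(uint32_t);
}
```

Here `table_bytes(16)` is 16384 and `table_bytes(4)` is 4096, which matches the 75% reduction quoted in the message when 4 core tables replace 16.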
    • crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well · 293614ce
      Ard Biesheuvel authored
      In addition to wrapping the AES-CTR cipher into the async SIMD wrapper,
      which exposes it as an async skcipher that defers processing to process
      context, expose our AES-CTR implementation directly as a synchronous cipher
      as well, but with a lower priority.
      
      This makes the AES-CTR transform usable in places where synchronous
      transforms are required, such as the MAC802.11 encryption code, which
      executes in softirq context, where SIMD processing is allowed on arm64.
      Users of the async transform will keep the existing behavior.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/chacha20 - implement NEON version based on SSE3 code · afaf712e
      Ard Biesheuvel authored
      This is a straight port to ARM/NEON of the x86 SSE3 implementation
      of the ChaCha20 stream cipher. It uses the new skcipher walksize
      attribute to process the input in strides of 4x the block size.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
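The idea of processing input in strides of 4x the block size can be sketched as a simple length split. This is not the kernel's skcipher walk code: `struct stride_plan` and `plan_strides` are hypothetical names, the constants are defined locally, and the fallback of handling leftover full blocks one at a time is an assumption made for the illustration. ChaCha20 works on 64-byte blocks, so one stride here is 256 bytes.

```c
#include <stddef.h>

#define CHACHA20_BLOCK_SIZE 64  /* ChaCha20 block size in bytes */
#define STRIDE_BLOCKS       4   /* blocks handled per wide (SIMD-style) call */

/* How a request of a given length would be split up (illustrative only). */
struct stride_plan {
    size_t wide_calls;    /* calls that consume 4 blocks at once */
    size_t single_calls;  /* calls that consume 1 full block */
    size_t tail_bytes;    /* final partial block, if any */
};

struct stride_plan plan_strides(size_t nbytes)
{
    struct stride_plan p = { 0, 0, 0 };
    const size_t stride = (size_t)STRIDE_BLOCKS * CHACHA20_BLOCK_SIZE;

    /* Consume as many full 4-block strides as possible first. */
    while (nbytes >= stride) {
        p.wide_calls++;
        nbytes -= stride;
    }
    /* Assumed fallback: remaining full blocks are handled one by one. */
    while (nbytes >= CHACHA20_BLOCK_SIZE) {
        p.single_calls++;
        nbytes -= CHACHA20_BLOCK_SIZE;
    }
    p.tail_bytes = nbytes;
    return p;
}
```

For a 1000-byte request this gives three wide calls, three single-block calls and a 40-byte tail.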