1. 29 Nov, 2018 3 commits
    • crypto: x86/chacha20 - Add a 4-block AVX-512VL variant · 180def6c
      Martin Willi authored
      This version uses the same principle as the AVX2 version by scheduling the
      operations for two block pairs in parallel. It benefits from the AVX-512VL
      rotate instructions and the more efficient partial block handling using
      "vmovdqu8", resulting in a speedup of the raw block function of ~20%.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/chacha20 - Add a 2-block AVX-512VL variant · 29a47b54
      Martin Willi authored
      This version uses the same principle as the AVX2 version. It benefits
      from the AVX-512VL rotate instructions and the more efficient partial
      block handling using "vmovdqu8", resulting in a speedup of ~20%.
      
      Unlike the AVX2 version, it is faster than the single block SSSE3 version
      to process a single block. Hence we engage that function for (partial)
      single block lengths as well.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/chacha20 - Add a 8-block AVX-512VL variant · cee7a36e
      Martin Willi authored
      This variant is similar to the AVX2 version, but benefits from the AVX-512
      rotate instructions and the additional registers, so it can operate without
      any data on the stack. It uses ymm registers only to avoid the massive core
      throttling on Skylake-X platforms. Nonetheless, it brings a ~30% speed
      improvement compared to the AVX2 variant for random encryption lengths.
      
      The AVX2 version uses "rep movsb" for partial block XORing via the stack.
      With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
      associated "kmov" instructions to work with dynamic masks are not part of
      the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
      that the major AVX-512VL architectures provide AVX-512BW and this extension
      does not affect core clocking, this seems to be no problem at least for
      now.
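
The masked partial-block handling described above can be modelled in a few lines. This is only a sketch of the idea, not the kernel's assembly: the helper names are hypothetical, and the byte-wise loops stand in for what a masked "vmovdqu8" load or store does in a single instruction with a mask register set up through "kmov".

```python
def mask_for_tail(nbytes: int, width: int = 32) -> int:
    """Build a lane mask with the low `nbytes` bits set (one bit per byte lane)."""
    if not 0 <= nbytes <= width:
        raise ValueError("tail length must fit in one vector register")
    return (1 << nbytes) - 1


def masked_load(buf, offset: int, mask: int, width: int = 32) -> bytes:
    """Zero-masking load: inactive lanes read as 0 and their memory is never touched."""
    return bytes(buf[offset + i] if (mask >> i) & 1 else 0 for i in range(width))


def masked_store(buf: bytearray, offset: int, data: bytes, mask: int, width: int = 32) -> None:
    """Masked store: only active lanes are written back."""
    for i in range(width):
        if (mask >> i) & 1:
            buf[offset + i] = data[i]


def xor_partial_block(dst: bytearray, src: bytes, offset: int,
                      keystream: bytes, nbytes: int) -> None:
    """XOR the last `nbytes` of input with keystream without a stack bounce buffer."""
    mask = mask_for_tail(nbytes)
    chunk = masked_load(src, offset, mask)
    mixed = bytes(a ^ b for a, b in zip(chunk, keystream))
    masked_store(dst, offset, mixed, mask)
```

Because the mask is computed from the remaining length at run time, the tail never has to be copied to the stack first, which is the step the AVX2 version performs with "rep movsb".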
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. 20 Nov, 2018 18 commits
    • crypto: adiantum - add Adiantum support · 059c2a4d
      Eric Biggers authored
      Add support for the Adiantum encryption mode.  Adiantum was designed by
      Paul Crowley and is specified by our paper:
      
          Adiantum: length-preserving encryption for entry-level processors
          (https://eprint.iacr.org/2018/720.pdf)
      
      See our paper for full details; this patch only provides an overview.
      
      Adiantum is a tweakable, length-preserving encryption mode designed for
      fast and secure disk encryption, especially on CPUs without dedicated
      crypto instructions.  Adiantum encrypts each sector using the XChaCha12
      stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
      function, and an invocation of the AES-256 block cipher on a single
      16-byte block.  On CPUs without AES instructions, Adiantum is much
      faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
      Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
      and decryption about 5 times faster.
      
      Adiantum is a specialization of the more general HBSH construction.  Our
      earlier proposal, HPolyC, was also a HBSH specialization, but it used a
      different εA∆U hash function, one based on Poly1305 only.  Adiantum's
      εA∆U hash function, which is based primarily on the "NH" hash function
      like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
      consequently, Adiantum is about 20% faster than HPolyC.
      
      This speed comes with no loss of security: Adiantum is provably just as
      secure as HPolyC, in fact slightly *more* secure.  Like HPolyC,
      Adiantum's security is reducible to that of XChaCha12 and AES-256,
      subject to a security bound.  XChaCha12 itself has a security reduction
      to ChaCha12.  Therefore, one need not "trust" Adiantum; one need only
      trust ChaCha12 and AES-256.  Note that the εA∆U hash function is only
      used for its proven combinatorial properties, so it cannot be "broken".
      
      Adiantum is also a true wide-block encryption mode, so flipping any
      plaintext bit in the sector scrambles the entire ciphertext, and vice
      versa.  No other such mode is available in the kernel currently; doing
      the same with XTS scrambles only 16 bytes.  Adiantum also supports
      arbitrary-length tweaks and naturally supports any length input >= 16
      bytes without needing "ciphertext stealing".
      
      For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
      order to make encryption feasible on the widest range of devices.
      Although the 20-round variant is quite popular, the best known attacks
      on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
      security margin; in fact, larger than AES-256's.  12-round Salsa20 is
      also the eSTREAM recommendation.  For the block cipher, Adiantum uses
      AES-256, despite it having a lower security margin than XChaCha12 and
      needing table lookups, due to AES's extensive adoption and analysis
      making it the obvious first choice.  Nevertheless, for flexibility this
      patch also permits the "adiantum" template to be instantiated with
      XChaCha20 and/or with an alternate block cipher.
      
      We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
      where currently the only other suitable options are block cipher modes
      such as AES-XTS.  A big problem with this is that many low-end mobile
      devices (e.g. Android Go phones sold primarily in developing countries,
      as well as some smartwatches) still have CPUs that lack AES
      instructions, e.g. ARM Cortex-A7.  Sadly, AES-XTS encryption is much too
      slow to be viable on these devices.  We did find that some "lightweight"
      block ciphers are fast enough, but these suffer from problems such as
      not having much cryptanalysis or being too controversial.
      
      The ChaCha stream cipher has excellent performance but is insecure to
      use directly for disk encryption, since each sector's IV is reused each
      time it is overwritten.  Even restricting the threat model to offline
      attacks only isn't enough, since modern flash storage devices don't
      guarantee that "overwrites" are really overwrites, due to wear-leveling.
      Adiantum avoids this problem by constructing a
      "tweakable super-pseudorandom permutation"; this is the strongest
      possible security model for length-preserving encryption.
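
A small sketch of why reusing a sector's IV with a plain stream cipher is unsafe. The keystream generator below is a SHA-256 stand-in chosen only to keep the example self-contained; it is not ChaCha, and the key, IV and sector contents are made up. The leak it shows applies to any cipher that XORs a keystream determined by (key, IV).

```python
import hashlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in keystream (NOT ChaCha): fully determined by key and IV."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:length]


def stream_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; encryption and decryption are the same operation."""
    ks = toy_keystream(key, iv, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes(32)
sector_iv = (7).to_bytes(8, "little")   # same sector number => same IV

old = b"balance: 0000100"
new = b"balance: 9999999"
c_old = stream_encrypt(key, sector_iv, old)
c_new = stream_encrypt(key, sector_iv, new)

# An attacker holding both ciphertexts learns old XOR new without the key.
leak = xor(c_old, c_new)
```

Here `leak` equals `xor(old, new)`: unchanged bytes show up as zeros and changed bytes reveal their difference. A wide-block mode avoids this because any change to the plaintext changes the entire ciphertext.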
      
      Of course, storing random nonces along with the ciphertext would be the
      ideal solution.  But doing that with existing hardware and filesystems
      runs into major practical problems; in most cases it would require data
      journaling (like dm-integrity) which severely degrades performance.
      Thus, for now length-preserving encryption is still needed.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305 · 16aae359
      Eric Biggers authored
      Add an ARM NEON implementation of NHPoly1305, an ε-almost-∆-universal
      hash function used in the Adiantum encryption mode.  For now, only the
      NH portion is actually NEON-accelerated; the Poly1305 part is less
      performance-critical so is just implemented in C.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: nhpoly1305 - add NHPoly1305 support · 26609a21
      Eric Biggers authored
      Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash
      function used in the Adiantum encryption mode.
      
      CONFIG_NHPOLY1305 is not selectable by itself since there won't be any
      real reason to enable it without also enabling Adiantum support.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: poly1305 - add Poly1305 core API · 1b6fd3d5
      Eric Biggers authored
      Expose a low-level Poly1305 API which implements the
      ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305 MAC
      and supports block-aligned inputs only.
      
      This is needed for Adiantum hashing, which builds an εA∆U hash function
      from NH and a polynomial evaluation in GF(2^{130}-5); this polynomial
      evaluation is identical to the one the Poly1305 MAC does.  However, the
      crypto_shash Poly1305 API isn't very appropriate for this because its
      calling convention assumes it is used as a MAC, with a 32-byte "one-time
      key" provided for every digest.
      
      But by design, in Adiantum hashing the performance of the polynomial
      evaluation isn't nearly as critical as NH.  So it suffices to just have
      some C helper functions.  Thus, this patch adds such functions.
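
For illustration, the block-aligned polynomial evaluation can be sketched with Python integers. The function names and the default `hibit` handling are assumptions made for this example and are not claimed to match the kernel's helper signatures; the clamp mask is the standard Poly1305 one.

```python
P = (1 << 130) - 5                              # the Poly1305 prime 2^130 - 5
R_CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF    # standard Poly1305 clamp for "r"


def core_setkey(raw_key: bytes) -> int:
    """Load the 16-byte "r" key as a little-endian integer and clamp it."""
    if len(raw_key) != 16:
        raise ValueError("r key must be 16 bytes")
    return int.from_bytes(raw_key, "little") & R_CLAMP


def core_blocks(acc: int, r: int, data: bytes, hibit: int = 1) -> int:
    """Absorb whole 16-byte blocks into the accumulator using Horner's rule.

    There is no one-time "s" key and no final addition here: this is only
    the hash underneath the MAC, restricted to block-aligned input.
    """
    if len(data) % 16:
        raise ValueError("input must be a multiple of 16 bytes")
    for i in range(0, len(data), 16):
        m = int.from_bytes(data[i:i + 16], "little") + (hibit << 128)
        acc = ((acc + m) * r) % P
    return acc
```

Since the accumulator is carried explicitly, a caller can feed data in several calls, for example `core_blocks(core_blocks(0, r, a), r, b)`, and obtain the same value as hashing `a + b` at once.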
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: poly1305 - use structures for key and accumulator · 878afc35
      Eric Biggers authored
      In preparation for exposing a low-level Poly1305 API which implements
      the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305
      MAC and supports block-aligned inputs only, create structures
      poly1305_key and poly1305_state which hold the limbs of the Poly1305
      "r" key and accumulator, respectively.
      
      These structures could actually have the same type (e.g. poly1305_val),
      but different types are preferable, to prevent misuse.
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/chacha - add XChaCha12 support · bdb063a7
      Eric Biggers authored
      Now that the 32-bit ARM NEON implementation of ChaCha20 and XChaCha20
      has been refactored to support varying the number of rounds, add support
      for XChaCha12.  This is identical to XChaCha20 except for the number of
      rounds, which is 12 instead of 20.
      
      XChaCha12 is faster than XChaCha20 but has a lower security margin,
      though still greater than AES-256's since the best known attacks make it
      through only 7 rounds.  See the patch "crypto: chacha - add XChaCha12
      support" for more details about why we need XChaCha12 support.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/chacha20 - refactor to allow varying number of rounds · 3cc21519
      Eric Biggers authored
      In preparation for adding XChaCha12 support, rename/refactor the NEON
      implementation of ChaCha20 to support different numbers of rounds.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/chacha20 - add XChaCha20 support · d97a9430
      Eric Biggers authored
      Add an XChaCha20 implementation that is hooked up to the ARM NEON
      implementation of ChaCha20.  This is needed for use in the Adiantum
      encryption mode; see the generic code patch,
      "crypto: chacha20-generic - add XChaCha20 support", for more details.
      
      We also update the NEON code to support HChaCha20 on one block, so we
      can use that in XChaCha20 rather than calling the generic HChaCha20.
      This required factoring the permutation out into its own macro.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm/chacha20 - limit the preemption-disabled section · be2830b1
      Eric Biggers authored
      To improve responsiveness, disable preemption for each step of the walk
      (which is at most PAGE_SIZE) rather than for the entire
      encryption/decryption operation.
      Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: chacha - add XChaCha12 support · aa762409
      Eric Biggers authored
      Now that the generic implementation of ChaCha20 has been refactored to
      allow varying the number of rounds, add support for XChaCha12, which is
      the XSalsa construction applied to ChaCha12.  ChaCha12 is one of the
      three ciphers specified by the original ChaCha paper
      (https://cr.yp.to/chacha/chacha-20080128.pdf: "ChaCha, a variant of
      Salsa20"), alongside ChaCha8 and ChaCha20.  ChaCha12 is faster than
      ChaCha20 but has a lower, but still large, security margin.
      
      We need XChaCha12 support so that it can be used in the Adiantum
      encryption mode, which enables disk/file encryption on low-end mobile
      devices where AES-XTS is too slow as the CPUs lack AES instructions.
      
      We'd prefer XChaCha20 (the more popular variant), but it's too slow on
      some of our target devices, so at least in some cases we do need the
      XChaCha12-based version.  In more detail, the problem is that Adiantum
      is still much slower than we're happy with, and encryption still has a
      quite noticeable effect on the feel of low-end devices.  Users and
      vendors push back hard against encryption that degrades the user
      experience, which always risks encryption being disabled entirely.  So
      we need to choose the fastest option that gives us a solid margin of
      security, and here that's XChaCha12.  The best known attack on ChaCha
      breaks only 7 rounds and has 2^235 time complexity, so ChaCha12's
      security margin is still better than AES-256's.  Much has been learned
      about cryptanalysis of ARX ciphers since Salsa20 was originally designed
      in 2005, and it now seems we can be comfortable with a smaller number of
      rounds.  The eSTREAM project also suggests the 12-round version of
      Salsa20 as providing the best balance among the different variants:
      combining very good performance with a "comfortable margin of security".
      
      Note that it would be trivial to add vanilla ChaCha12 in addition to
      XChaCha12.  However, it's unneeded for now and therefore is omitted.
      
      As discussed in the patch that introduced XChaCha20 support, I
      considered splitting the code into separate chacha-common, chacha20,
      xchacha20, and xchacha12 modules, so that these algorithms could be
      enabled/disabled independently.  However, since nearly all the code is
      shared anyway, I ultimately decided there would have been little benefit
      to the added complexity.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: chacha20-generic - refactor to allow varying number of rounds · 1ca1b917
      Eric Biggers authored
      In preparation for adding XChaCha12 support, rename/refactor
      chacha20-generic to support different numbers of rounds.  The
      justification for needing XChaCha12 support is explained in more detail
      in the patch "crypto: chacha - add XChaCha12 support".
      
      The only difference between ChaCha{8,12,20} is the number of rounds;
      all other parts of the algorithm are the same.  Therefore,
      remove the "20" from all definitions, structures, functions, files, etc.
      that will be shared by all ChaCha versions.
      
      Also make ->setkey() store the round count in the chacha_ctx (previously
      chacha20_ctx).  The generic code then passes the round count through to
      chacha_block().  There will be a ->setkey() function for each explicitly
      allowed round count; the encrypt/decrypt functions will be the same.  I
      decided not to do it the opposite way (same ->setkey() function for all
      round counts, with different encrypt/decrypt functions) because that
      would have required more boilerplate code in architecture-specific
      implementations of ChaCha and XChaCha.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: chacha20-generic - add XChaCha20 support · de61d7ae
      Eric Biggers authored
      Add support for the XChaCha20 stream cipher.  XChaCha20 is the
      application of the XSalsa20 construction
      (https://cr.yp.to/snuffle/xsalsa-20081128.pdf) to ChaCha20 rather than
      to Salsa20.  XChaCha20 extends ChaCha20's nonce length from 64 bits (or
      96 bits, depending on convention) to 192 bits, while provably retaining
      ChaCha20's security.  XChaCha20 uses the ChaCha20 permutation to map the
      key and first 128 nonce bits to a 256-bit subkey.  Then, it does the
      ChaCha20 stream cipher with the subkey and remaining 64 bits of nonce.
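
The two-step construction in the paragraph above can be sketched as follows. This is an illustrative model, not the kernel code: all function names are hypothetical, and passing the stream position as a separate 64-bit block counter (rather than as part of one IV buffer) is an assumption made for readability.

```python
import struct

MASK = 0xFFFFFFFF
CONSTANTS = struct.unpack("<4I", b"expand 32-byte k")


def rotl32(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def quarter_round(x, a, b, c, d):
    x[a] = (x[a] + x[b]) & MASK; x[d] = rotl32(x[d] ^ x[a], 16)
    x[c] = (x[c] + x[d]) & MASK; x[b] = rotl32(x[b] ^ x[c], 12)
    x[a] = (x[a] + x[b]) & MASK; x[d] = rotl32(x[d] ^ x[a], 8)
    x[c] = (x[c] + x[d]) & MASK; x[b] = rotl32(x[b] ^ x[c], 7)


def init_state(key, last16):
    """Words 0-3: constants, 4-11: key, 12-15: counter/nonce material."""
    return (list(CONSTANTS) + list(struct.unpack("<8I", key))
            + list(struct.unpack("<4I", last16)))


def chacha_permute(state, nrounds):
    x = list(state)
    for _ in range(nrounds // 2):
        for idx in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)):
            quarter_round(x, *idx)      # column round
        for idx in ((0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            quarter_round(x, *idx)      # diagonal round
    return x


def chacha_block(key, last16, nrounds=20):
    s = init_state(key, last16)
    x = chacha_permute(s, nrounds)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(x, s)))


def hchacha_block(key, nonce16, nrounds=20):
    x = chacha_permute(init_state(key, nonce16), nrounds)
    return struct.pack("<8I", *(x[0:4] + x[12:16]))


def xchacha_xor(key, nonce24, data, counter=0, nrounds=20):
    """Derive a subkey from the first 128 nonce bits, then run plain ChaCha."""
    subkey = hchacha_block(key, nonce24[:16], nrounds)
    out = bytearray()
    for i in range(0, len(data), 64):
        tail = struct.pack("<Q", counter + i // 64) + nonce24[16:]
        ks = chacha_block(subkey, tail, nrounds)
        out += bytes(a ^ b for a, b in zip(data[i:i + 64], ks))
    return bytes(out)
```

Calling `xchacha_xor(key, nonce24, data, nrounds=12)` gives the 12-round variant with no other change, and a non-zero `counter` starts at a later 64-byte block of the stream.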
      
      We need XChaCha support in order to add support for the Adiantum
      encryption mode.  Note that to meet our performance requirements, we
      actually plan to primarily use the variant XChaCha12.  But we believe
      it's wise to first add XChaCha20 as a baseline with a higher security
      margin, in case there are any situations where it can be used.
      Supporting both variants is straightforward.
      
      Since XChaCha20's subkey differs for each request, XChaCha20 can't be a
      template that wraps ChaCha20; that would require re-keying the
      underlying ChaCha20 for every request, which wouldn't be thread-safe.
      Instead, we make XChaCha20 its own top-level algorithm which calls the
      ChaCha20 streaming implementation internally.
      
      Similar to the existing ChaCha20 implementation, we define the IV to be
      the nonce and stream position concatenated together.  This allows users
      to seek to any position in the stream.
      
      I considered splitting the code into separate chacha20-common, chacha20,
      and xchacha20 modules, so that chacha20 and xchacha20 could be
      enabled/disabled independently.  However, since nearly all the code is
      shared anyway, I ultimately decided there would have been little benefit
      to the added complexity of separate modules.
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: chacha20-generic - don't unnecessarily use atomic walk · 5e04542a
      Eric Biggers authored
      chacha20-generic doesn't use SIMD instructions or otherwise disable
      preemption, so passing atomic=true to skcipher_walk_virt() is
      unnecessary.
      Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: chacha20-generic - add HChaCha20 library function · dd333449
      Eric Biggers authored
      Refactor the unkeyed permutation part of chacha20_block() into its own
      function, then add hchacha20_block() which is the ChaCha equivalent of
      HSalsa20 and is an intermediate step towards XChaCha20 (see
      https://cr.yp.to/snuffle/xsalsa-20081128.pdf).  HChaCha20 skips the
      final addition of the initial state, and outputs only certain words of
      the state.  It should not be used for streaming directly.
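
A minimal model of the refactoring described above, assuming the usual ChaCha state layout (4 constant words, 8 key words, 4 counter/nonce words). The Python names are hypothetical and are not the kernel's function signatures; the point is only to show the shared permutation and the two differences between the block function and HChaCha.

```python
import struct

MASK = 0xFFFFFFFF
CONSTANTS = struct.unpack("<4I", b"expand 32-byte k")


def rotl32(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def quarter_round(x, a, b, c, d):
    x[a] = (x[a] + x[b]) & MASK; x[d] = rotl32(x[d] ^ x[a], 16)
    x[c] = (x[c] + x[d]) & MASK; x[b] = rotl32(x[b] ^ x[c], 12)
    x[a] = (x[a] + x[b]) & MASK; x[d] = rotl32(x[d] ^ x[a], 8)
    x[c] = (x[c] + x[d]) & MASK; x[b] = rotl32(x[b] ^ x[c], 7)


def init_state(key, last16):
    return (list(CONSTANTS) + list(struct.unpack("<8I", key))
            + list(struct.unpack("<4I", last16)))


def chacha_permute(state, nrounds=20):
    """The unkeyed permutation shared by the block function and HChaCha."""
    x = list(state)
    for _ in range(nrounds // 2):
        for idx in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)):
            quarter_round(x, *idx)      # column round
        for idx in ((0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            quarter_round(x, *idx)      # diagonal round
    return x


def chacha_block(key, last16, nrounds=20):
    """Stream-cipher block: permute, then add the initial state back in."""
    s = init_state(key, last16)
    x = chacha_permute(s, nrounds)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(x, s)))


def hchacha_block(key, nonce16, nrounds=20):
    """HChaCha: permute, skip the final addition, keep words 0-3 and 12-15."""
    x = chacha_permute(init_state(key, nonce16), nrounds)
    return struct.pack("<8I", *(x[0:4] + x[12:16]))
```

The 32-byte result of `hchacha_block` is meant to be used as a subkey for a subsequent ChaCha invocation, not as keystream.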
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations · 3d234b33
      Eric Biggers authored
      'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC
      in the mask to crypto_alloc_shash() has no effect.  Many users therefore
      already don't pass it, but some still do.  This inconsistency can cause
      confusion, especially since the way the 'mask' argument works is
      somewhat counterintuitive.
      
      Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.
      
      This patch shouldn't change any actual behavior.
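
As a rough illustration of why the 'mask' argument is easy to misread, the sketch below models algorithm lookup as a flag comparison. The matching rule and the flag value are assumptions written from memory of the kernel sources, and the algorithm names are made up, so treat this as a model rather than a reference.

```python
CRYPTO_ALG_ASYNC = 0x00000080   # assumed flag value, for illustration only


def alg_matches(cra_flags: int, type_: int, mask: int) -> bool:
    """Assumed rule: flags must agree with `type_` on every bit set in `mask`.

    A bit in `mask` therefore means "this flag matters", and the same bit
    in `type_` gives the value it must have.
    """
    return ((cra_flags ^ type_) & mask) == 0


def lookup(algs: dict, type_: int, mask: int) -> list:
    """Return the names of all algorithms accepted by (type_, mask)."""
    return sorted(n for n, flags in algs.items() if alg_matches(flags, type_, mask))


# Hypothetical registry: one synchronous and one asynchronous implementation.
mixed = {"example-sync": 0, "example-async": CRYPTO_ALG_ASYNC}

# type=0, mask=ASYNC means "the ASYNC bit must be clear": only sync matches.
only_sync = lookup(mixed, 0, CRYPTO_ALG_ASYNC)

# type=0, mask=0 means "no flag matters": both match.
anything = lookup(mixed, 0, 0)

# If every registered algorithm is synchronous, the mask changes nothing.
all_sync = {"hash-a": 0, "hash-b": 0}
same_result = lookup(all_sync, 0, CRYPTO_ALG_ASYNC) == lookup(all_sync, 0, 0)
```

Under this model, setting the bit in the mask while leaving it clear in the type requests synchronous algorithms only, which is the opposite of what the flag name suggests at first glance.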
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations · 1ad0f160
      Eric Biggers authored
      'cipher' algorithms (single block ciphers) are always synchronous, so
      passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
      effect.  Many users therefore already don't pass it, but some still do.
      This inconsistency can cause confusion, especially since the way the
      'mask' argument works is somewhat counterintuitive.
      
      Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: remove useless initializations of cra_list · d4165590
      Eric Biggers authored
      Some algorithms initialize their .cra_list prior to registration.
      But this is unnecessary since crypto_register_alg() will overwrite
      .cra_list when adding the algorithm to the 'crypto_alg_list'.
      Apparently the useless assignment has just been copy+pasted around.
      
      So, remove the useless assignments.
      
      Exception: paes_s390.c uses cra_list to check whether the algorithm is
      registered or not, so I left that as-is for now.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: inside-secure - remove useless setting of type flags · 2b78aeb3
      Eric Biggers authored
      Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
      Commit 2c95e6d9 ("crypto: skcipher - remove useless setting of type
      flags") took care of this everywhere else, but a few more instances made
      it into the tree at about the same time.  Squash them before they get
      copy+pasted around again.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  3. 16 Nov, 2018 19 commits