1. 21 Sep, 2018 9 commits
    • crypto: chacha20 - Fix chacha20_block() keystream alignment (again) · a5e9f557
      Eric Biggers authored
      In commit 9f480fae ("crypto: chacha20 - Fix keystream alignment for
      chacha20_block()"), I had missed that chacha20_block() can be called
      directly on the buffer passed to get_random_bytes(), which can have any
      alignment.  So, while my commit didn't break anything, it didn't fully
      solve the alignment problems.
      
      Revert my solution and just update chacha20_block() to use
      put_unaligned_le32(), so the output buffer need not be aligned.
      This is simpler, and on many CPUs it's the same speed.
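      
      A minimal sketch of the new serialization loop (put_unaligned_le32()
      comes from <asm/unaligned.h>; the 20 ChaCha rounds over x[16] are
      elided):
      
      	for (i = 0; i < ARRAY_SIZE(x); i++)
      		put_unaligned_le32(x[i] + state[i],
      				   &stream[i * sizeof(u32)]);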
      
      But, I kept the 'tmp' buffers in extract_crng_user() and
      _get_random_bytes() 4-byte aligned, since that alignment is actually
      needed for _crng_backtrack_protect() too.
      Reported-by: Stephan Müller <smueller@chronox.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: xts - Drop use of auxiliary buffer · 78105c7e
      Ondrej Mosnacek authored
      Since commit acb9b159 ("crypto: gf128mul - define gf128mul_x_* in
      gf128mul.h"), the gf128mul_x_*() functions are very fast and therefore
      caching the computed XTS tweaks has only negligible advantage over
      computing them twice.
      
      In fact, since the current caching implementation limits the size of
      the calls to the child ecb(...) algorithm to PAGE_SIZE (usually 4096 B),
      it is often actually slower than the simple recomputing implementation.
      
      This patch simplifies the XTS template to recompute the XTS tweaks from
      scratch in the second pass and thus also removes the need to allocate a
      dynamic buffer using kmalloc().
      
      As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.
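      
      The recomputation is cheap because each tweak follows from its
      predecessor via one multiplication by x in GF(2^128), for which
      gf128mul_x_ble() is now an inline helper.  A minimal sketch of the
      per-pass tweak walk (all names except gf128mul_x_ble() and
      le128_xor() are illustrative):
      
      	le128 t = iv_tweak;		/* T_0, derived from the IV */
      
      	for (i = 0; i < nblocks; i++) {
      		le128_xor(&block[i], &block[i], &t);	/* whitening */
      		gf128mul_x_ble(&t, &t);	/* T_{i+1} = T_i * x */
      	}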
      
      PERFORMANCE RESULTS
      I measured time to encrypt/decrypt a memory buffer of varying sizes with
      xts(ecb-aes-aesni) using a tool I wrote ([2]) and the results suggest
      that after this patch the performance is either better or comparable for
      both small and large buffers. Note that there is a lot of noise in the
      measurements, but the overall difference is easy to see.
      
      Old code:
             ALGORITHM KEY (b)        DATA (B)   TIME ENC (ns)   TIME DEC (ns)
              xts(aes)     256              64             331             328
              xts(aes)     384              64             332             333
              xts(aes)     512              64             338             348
              xts(aes)     256             512             889             920
              xts(aes)     384             512            1019             993
              xts(aes)     512             512            1032             990
              xts(aes)     256            4096            2152            2292
              xts(aes)     384            4096            2453            2597
              xts(aes)     512            4096            3041            2641
              xts(aes)     256           16384            9443            8027
              xts(aes)     384           16384            8536            8925
              xts(aes)     512           16384            9232            9417
              xts(aes)     256           32768           16383           14897
              xts(aes)     384           32768           17527           16102
              xts(aes)     512           32768           18483           17322
      
      New code:
             ALGORITHM KEY (b)        DATA (B)   TIME ENC (ns)   TIME DEC (ns)
              xts(aes)     256              64             328             324
              xts(aes)     384              64             324             319
              xts(aes)     512              64             320             322
              xts(aes)     256             512             476             473
              xts(aes)     384             512             509             492
              xts(aes)     512             512             531             514
              xts(aes)     256            4096            2132            1829
              xts(aes)     384            4096            2357            2055
              xts(aes)     512            4096            2178            2027
              xts(aes)     256           16384            6920            6983
              xts(aes)     384           16384            8597            7505
              xts(aes)     512           16384            7841            8164
              xts(aes)     256           32768           13468           12307
              xts(aes)     384           32768           14808           13402
              xts(aes)     512           32768           15753           14636
      
      [1] https://lkml.org/lkml/2018/8/23/1315
      [2] https://gitlab.com/omos/linux-crypto-bench
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes-blk - improve XTS mask handling · 2e5d2f33
      Ard Biesheuvel authored
      The Crypto Extension instantiation of the aes-modes.S collection of
      skciphers uses only 15 NEON registers for the round key array, whereas
      the pure NEON flavor uses 16 NEON registers for the AES S-box.
      
      This means we have a spare register available that we can use to hold
      the XTS mask vector, removing the need to reload it at every iteration
      of the inner loop.
      
      Since the pure NEON version does not permit this optimization, tweak
      the macros so we can factor out this functionality. Also, replace the
      literal load with a short sequence to compose the mask vector.
      
      On Cortex-A53, this results in a ~4% speedup.
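      
      In scalar terms, the mask vector implements the branchless
      multiply-by-x step of the XTS tweak schedule; an illustrative C
      sketch of the same computation (the asm does this on a NEON
      register instead):
      
      	/* advance a 128-bit XTS tweak: T' = T * x in GF(2^128) */
      	static void xts_next_tweak(unsigned long long t[2])
      	{
      		/* all-ones if the top bit of the tweak is set */
      		unsigned long long carry =
      			(unsigned long long)((long long)t[1] >> 63);
      
      		t[1] = (t[1] << 1) | (t[0] >> 63);
      		t[0] = (t[0] << 1) ^ (carry & 0x87);	/* reduce */
      	}
      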
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes-blk - add support for CTS-CBC mode · dd597fb3
      Ard Biesheuvel authored
      Currently, we rely on the generic CTS chaining mode wrapper to
      instantiate the cts(cbc(aes)) skcipher. Due to the high performance
      of the ARMv8 Crypto Extensions AES instructions (~1 cycle per byte),
      any overhead in the chaining mode layers is amplified, and so it pays
      off considerably to fold the CTS handling into the SIMD routines.
      
      On Cortex-A53, this results in a ~50% speedup for smaller input sizes.
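      
      For reference, the final-block handling that cts(cbc(aes))
      implements (CBC with ciphertext stealing, variant CS3) can be
      sketched in plain C; encrypt_block() and all other names here are
      illustrative, not the kernel's:
      
      	#include <string.h>
      
      	#define BLK 16
      
      	typedef void (*encrypt_block_t)(unsigned char out[BLK],
      					const unsigned char in[BLK]);
      
      	/*
      	 * Encrypt the last two blocks of a CBC message with ciphertext
      	 * stealing.  len is in (BLK, 2*BLK]; cprev is the previous CBC
      	 * ciphertext block (or the IV for two-block messages).
      	 */
      	static void cbc_cs3_final(encrypt_block_t encrypt_block,
      				  unsigned char *out, const unsigned char *in,
      				  size_t len, const unsigned char cprev[BLK])
      	{
      		unsigned char tmp[BLK], cn1[BLK], last[BLK] = { 0 };
      		size_t d = len - BLK;	/* bytes in the short final block */
      		size_t i;
      
      		/* ordinary CBC step for the last full plaintext block */
      		for (i = 0; i < BLK; i++)
      			tmp[i] = in[i] ^ cprev[i];
      		encrypt_block(cn1, tmp);
      
      		/* zero-pad the short block, XOR with cn1 and encrypt;
      		 * this full block goes in the second-to-last position */
      		memcpy(last, in + BLK, d);
      		for (i = 0; i < BLK; i++)
      			last[i] ^= cn1[i];
      		encrypt_block(out, last);
      
      		/* "steal" the first d bytes of cn1 as the truncated tail */
      		memcpy(out + BLK, cn1, d);
      	}
      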
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes-blk - revert NEON yield for skciphers · 6e7de6af
      Ard Biesheuvel authored
      The reasoning of commit f10dc56c ("crypto: arm64 - revert NEON yield
      for fast AEAD implementations") applies equally to skciphers: the walk
      API already guarantees that the input size of each call into the NEON
      code is bounded to the size of a page, and so there is no need for an
      additional TIF_NEED_RESCHED flag check inside the inner loop. So revert
      the skcipher changes to aes-modes.S (but retain the MAC ones).
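      
      For context, the typical arm64 skcipher walk loop already drops the
      NEON context between page-sized steps, giving the scheduler a
      preemption point; a minimal sketch (the asm entry point and context
      fields are illustrative):
      
      	err = skcipher_walk_virt(&walk, req, false);
      
      	while (walk.nbytes > 0) {
      		kernel_neon_begin();
      		aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
      				ctx->key_enc, rounds,
      				walk.nbytes / AES_BLOCK_SIZE);
      		kernel_neon_end();	/* preemption possible here */
      		err = skcipher_walk_done(&walk,
      					 walk.nbytes % AES_BLOCK_SIZE);
      	}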
      
      This partially reverts commit 0c8f838a.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes-blk - remove pointless (u8 *) casts · 557ecb45
      Ard Biesheuvel authored
      For some reason, the asmlinkage prototypes of the NEON routines take
      u8[] arguments for the round key arrays, while the actual round keys
      are arrays of u32, and so passing them into those routines requires
      u8* casts at each occurrence. Fix that.
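      
      In before/after terms (a representative sketch; the exact prototypes
      in the driver may differ slightly):
      
      	/* before: round keys passed as u8[], forcing casts */
      	asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[],
      					u8 const rk[], int rounds, int blocks);
      	aes_ecb_encrypt(dst, src, (u8 *)ctx->key_enc, rounds, blocks);
      
      	/* after: the prototype matches the u32[] round key array */
      	asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[],
      					u32 const rk[], int rounds, int blocks);
      	aes_ecb_encrypt(dst, src, ctx->key_enc, rounds, blocks);
      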
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: cavium/nitrox - use dma_pool_zalloc() · 718f608c
      Srikanth Jampala authored
      Use dma_pool_zalloc() instead of dma_pool_alloc() with the __GFP_ZERO
      flag, and rename the crypto DMA pool to "nitrox-context".
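      
      An illustrative sketch of the call-site change (pool and variable
      names are assumptions):
      
      	/* before */
      	vaddr = dma_pool_alloc(ndev->ctx_pool, GFP_KERNEL | __GFP_ZERO,
      			       &dma);
      
      	/* after: zeroing is folded into the allocation helper */
      	vaddr = dma_pool_zalloc(ndev->ctx_pool, GFP_KERNEL, &dma);
      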
      Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 910e3ca1
      Herbert Xu authored
      Merge crypto-2.6 to resolve caam conflict with skcipher conversion.
    • crypto: caam/jr - fix ablkcipher_edesc pointer arithmetic · 13cc6f48
      Horia Geantă authored
      In some cases, the zero-length hw_desc array at the end of the
      ablkcipher_edesc struct requires 4 bytes of tail padding.
      
      Due to this tail padding and the way the pointers to the S/G table
      and the IV are computed:
      	edesc->sec4_sg = (void *)edesc + sizeof(struct ablkcipher_edesc) +
      			 desc_bytes;
      	iv = (u8 *)edesc->hw_desc + desc_bytes + sec4_sg_bytes;
      the first 4 bytes of the IV are overwritten by the S/G table.
      
      Update the computation of the S/G table pointer to rely on the offset
      of the hw_desc member rather than on the sizeof() operator.
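      
      The pitfall can be shown with a simplified struct (illustrative, not
      the actual caam layout):
      
      	#include <stddef.h>
      
      	struct edesc {
      		long long map;		/* 8 B; forces 8-byte alignment */
      		int desc_words;		/* 4 B                          */
      		unsigned int hw_desc[0];/* flexible tail at offset 12   */
      	};
      
      	/*
      	 * sizeof(struct edesc) == 16 (4 bytes of tail padding), while
      	 * offsetof(struct edesc, hw_desc) == 12.  An S/G pointer
      	 * derived from sizeof() is thus shifted 4 bytes forward, so
      	 * the table's last 4 bytes land on the first 4 bytes of the
      	 * IV, which is derived from edesc->hw_desc.  Computing both
      	 * from offsetof(struct edesc, hw_desc) keeps them consistent.
      	 */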
      
      Cc: <stable@vger.kernel.org> # 4.13+
      Fixes: 115957bb ("crypto: caam - fix IV DMA mapping and updating")
      Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. 14 Sep, 2018 5 commits
  3. 13 Sep, 2018 1 commit
    • crypto: ccp - add timeout support in the SEV command · 3702a058
      Brijesh Singh authored
      Currently, the CCP driver assumes that the SEV command issued to the PSP
      will always return (i.e. it will never hang).  But recently, firmware bugs
      have shown that a command can hang.  Since some of the SEV commands are used
      in probe routines, this can cause boot hangs and/or loss of virtualization
      capabilities.
      
      To protect against firmware bugs, add a timeout in the SEV command
      execution flow.  If a command does not complete within the specified
      timeout, then return -ETIMEDOUT and stop the driver from executing any
      further commands since the state of the SEV firmware is unknown.
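      
      The waiting side of such a flow can be sketched with
      wait_event_timeout(); the field and register names below are
      assumptions for illustration:
      
      	static int sev_wait_cmd_ioc(struct psp_device *psp,
      				    unsigned int *reg, unsigned int timeout)
      	{
      		int ret;
      
      		/* returns 0 on timeout, remaining jiffies otherwise */
      		ret = wait_event_timeout(psp->sev_int_queue,
      					 psp->sev_int_rcvd, timeout * HZ);
      		if (!ret)
      			return -ETIMEDOUT;
      
      		*reg = ioread32(psp->io_regs + PSP_CMDRESP);
      		return 0;
      	}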
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  4. 04 Sep, 2018 24 commits
  5. 02 Sep, 2018 1 commit