19 Apr, 2024 (7 commits)
    • crypto: x86/aes-xts - optimize size of instructions operating on lengths · 543ea178
      Eric Biggers authored
      x86_64 has the "interesting" property that the instruction size is
      generally a bit shorter for instructions that operate on the 32-bit (or
      less) part of registers, or registers that are in the original set of 8.
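
      As a hedged illustration (operands arbitrary; the byte counts follow
      from the x86-64 encoding rules, where a 64-bit operand size or a
      register outside the original eight costs an extra REX prefix byte):

          sub  $16, %eax    # 83 E8 10       3 bytes: 32-bit op, legacy reg
          sub  $16, %rax    # 48 83 E8 10    4 bytes: REX.W for 64-bit size
          sub  $16, %r10d   # 41 83 EA 10    4 bytes: REX.B to reach %r10d
          sub  $16, %r10    # 49 83 EA 10    4 bytes: REX.W plus REX.B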
      
      This patch adjusts the AES-XTS code to take advantage of that property
      by changing the LEN parameter from size_t to unsigned int (which is all
      that's needed and is what the non-AVX implementation uses) and using the
      %eax register for KEYLEN.
      
      This decreases the size of aes-xts-avx-x86_64.o by 1.2%.
      
      Note that changing the kmovq to kmovd was going to be needed anyway to
      make the AVX10/256 code really work on CPUs that don't support 512-bit
      vectors (since the AVX10 spec says that 64-bit opmask instructions will
      only be supported on processors that support 512-bit vectors).
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aes-xts - eliminate a few more instructions · e619723a
      Eric Biggers authored
      - For conditionally subtracting 16 from LEN when decrypting a message
        whose length isn't a multiple of 16, use the cmovnz instruction (see
        the sketch below).
      
      - Fold the addition of 4*VL to LEN into the sub of VL or 16 from LEN.
      
      - Remove an unnecessary test instruction.
      
      This results in slightly shorter code, both source and binary.
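
      A minimal sketch of the cmov-based conditional subtract (registers
      illustrative, not the actual code):

          test   $15, %rax          # ZF=1 iff LEN is a multiple of 16
          lea    -16(%rax), %rcx    # speculatively compute LEN - 16
          cmovnz %rcx, %rax         # keep it only for a partial final block
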
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aes-xts - handle AES-128 and AES-192 more efficiently · 2717e01f
      Eric Biggers authored
      Decrease the amount of code specific to the different AES variants by
      "right-aligning" the sequence of round keys, and for AES-128 and AES-192
      just skipping irrelevant rounds at the beginning.
      
      This shrinks the size of aes-xts-avx-x86_64.o by 13.3%, and it improves
      the efficiency of AES-128 and AES-192.  The tradeoff is that for AES-256
      some additional not-taken conditional jumps are now executed.  But these
      are predicted well and are cheap on x86.
      
      Note that the ARMv8 CE based AES-XTS implementation uses a similar
      strategy to handle the different AES variants.
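
      A rough sketch of the dispatch, assuming the round keys are stored
      "right-aligned" so the last round key sits at a fixed offset (labels
      and register illustrative):

          cmp  $24, %eax          # key length in bytes: 16, 24, or 32
          jl   .Laes128_entry     # AES-128 (10 rounds): skip rounds 1-4
          je   .Laes192_entry     # AES-192 (12 rounds): skip rounds 1-2
                                  # AES-256 (14 rounds): falls through

      For AES-256 neither jump is taken, which is where the extra not-taken
      conditional jumps mentioned above come from.
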
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aesni-xts - deduplicate aesni_xts_enc() and aesni_xts_dec() · ea9459ef
      Eric Biggers authored
      Since aesni_xts_enc() and aesni_xts_dec() are very similar, generate
      them from a macro that's passed an argument enc=1 or enc=0.  This
      reduces the length of aesni-intel_asm.S by 112 lines while still
      producing the exact same object file in both 32-bit and 64-bit mode.
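
      A sketch of the pattern (macro name and register aliases illustrative):

          .macro  _aesni_xts_crypt enc
              ...
              .if \enc
                  aesenc  KEY, STATE      # encryption round
              .else
                  aesdec  KEY, STATE      # decryption round
              .endif
              ...
          .endm

          SYM_FUNC_START(aesni_xts_enc)
              _aesni_xts_crypt 1
          SYM_FUNC_END(aesni_xts_enc)

          SYM_FUNC_START(aesni_xts_dec)
              _aesni_xts_crypt 0
          SYM_FUNC_END(aesni_xts_dec)

      The assembler expands the macro twice at build time, which is why the
      emitted object code can stay byte-for-byte identical.
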
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: x86/aes-xts - handle CTS encryption more efficiently · 1d27e1f5
      Eric Biggers authored
      When encrypting a message whose length isn't a multiple of 16 bytes,
      encrypt the last full block in the main loop.  This works because only
      decryption uses the last two tweaks in reverse order, not encryption.
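
      Roughly, the encryption path can now be structured like this (labels
      and the LEN register alias are illustrative):

          .Lenc_loop:
              # Encrypt one full block with tweak T(i).  Encryption
              # consumes tweaks strictly in order, so the final full
              # block fits here even when a partial block follows.
              ...
              sub   $16, LEN
              cmp   $16, LEN
              jae   .Lenc_loop
              test  LEN, LEN
              jz    .Ldone           # length was a multiple of 16
              # CTS tail: encrypt the partial block by stealing
              # ciphertext from the last full block, using the next
              # tweak in sequence.
              ...

      Decryption can't be restructured the same way, since its last full
      block needs the final tweak and the partial block the one before it,
      i.e. the last two tweaks in reverse order.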
      
      This improves the performance of encrypting messages whose length isn't
      a multiple of the AES block length, shrinks the size of
      aes-xts-avx-x86_64.o by 5.0%, and eliminates two instructions (a test
      and a not-taken conditional jump) when encrypting a message whose length
      *is* a multiple of the AES block length.
      
      While it's not super useful to optimize for ciphertext stealing given
      that it's rarely needed in practice, the other two benefits mentioned
      above make this optimization worthwhile.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: stm32/hash - add full DMA support for stm32mpx · 3525fe47
      Maxime Méré authored
      Due to a lack of alignment in the data sent by requests, the current
      DMA support of the STM32 hash driver only works for digest calls.
      This patch, based on the algorithm used in the omap-sham.c driver,
      allows DMA to be used in any situation.
      
      It has been functionally tested on STM32MP15, STM32MP13 and STM32MP25.
      
      Benchmarking this new driver with OpenSSL gave the following results:
      
      Performance:
      
      (datasize: 4096 bytes, number of hashes performed in 10s)
      
      |type   |no DMA    |DMA support|software  |
      |-------|----------|-----------|----------|
      |md5    |13873.56k |10958.03k  |71163.08k |
      |sha1   |13796.15k |10729.47k  |39670.58k |
      |sha224 |13737.98k |10775.76k  |22094.64k |
      |sha256 |13655.65k |10872.01k  |22075.39k |
      
      CPU Usage:
      
      (algorithm used: sha256, computation time: 20s, measurement taken at
      ~10s)
      
      |datasize  |no DMA |DMA  | software |
      |----------|-------|-----|----------|
      |  2048    | 56%   | 49% | 50%      |
      |  4096    | 54%   | 46% | 50%      |
      |  8192    | 53%   | 40% | 50%      |
      | 16384    | 53%   | 33% | 50%      |
      
      Note: this update doesn't change the driver performance without DMA.
      
      As shown, throughput with DMA is slightly lower than without it, but
      in most cases DMA saves CPU time.
      Signed-off-by: Maxime Méré <maxime.mere@foss.st.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: qat - improve error logging to be consistent across features · d281a28b
      Adam Guerin authored
      Improve error logging in the rate limiting feature, staying consistent
      with the error logging found in the telemetry feature.
      
      Fixes: d9fb8408 ("crypto: qat - add rate limiting feature to qat_4xxx")
      Signed-off-by: Adam Guerin <adam.guerin@intel.com>
      Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>