Commits · 67ec8cdf29971677b2fb4b6d92871eb5d5e95597 · Kirill Smelkov / linux

26 May, 2024 1 commit

hwrng: core - Remove add_early_randomness · 67ec8cdf

Herbert Xu authored May 22, 2024

A potential deadlock was reported with the config file at

https://web.archive.org/web/20240522052129/https://0x0.st/XPN_.txt

In this particular configuration, the deadlock doesn't exist because
the warning triggered at a point before modules were even available.
However, the deadlock can be real because any module loaded would
invoke async_synchronize_full.

The issue is spurious for software crypto algorithms which aren't
themselves involved in async probing. However, it would be hard to
avoid for a PCI crypto driver using async probing.

In this particular call trace, the problem is easily avoided because
the only reason the module is being requested during probing is the
add_early_randomness call in the hwrng core. This feature is
vestigial since there is now a kernel thread dedicated to doing
exactly this.

So remove add_early_randomness as it is no longer needed.
Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 1b6d7f9e ("tpm: add session encryption protection to tpm2_get_random()")
Link: https://lore.kernel.org/r/119dc5ed-f159-41be-9dda-1a056f29888d@notapiano/Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

67ec8cdf

17 May, 2024 2 commits

crypto: ecc - Prevent ecc_digits_from_bytes from reading too many bytes · c6ab5c91

Stefan Berger authored May 09, 2024

Prevent ecc_digits_from_bytes from reading too many bytes from the input
byte array in case an insufficient number of bytes is provided to fill the
output digit array of ndigits. Therefore, initialize the most significant
digits with 0 to avoid trying to read too many bytes later on. Convert the
function into a regular function since it is getting too big for an inline
function.

If too many bytes are provided on the input byte array the extra bytes
are ignored since the input variable 'ndigits' limits the number of digits
that will be filled.

Fixes: d67c96fb ("crypto: ecdsa - Convert byte arrays with key coordinates to digits")
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c6ab5c91

crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak · d3b17c6d

Herbert Xu authored May 08, 2024

Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e097 ("crypto: qat - resolve race condition during AER recovery")
Cc: <stable@vger.kernel.org> #6.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d3b17c6d

10 May, 2024 13 commits

crypto: atmel-sha204a - provide the otp content · 13909a0c

Lothar Rubusch authored May 03, 2024

Set up sysfs for the Atmel SHA204a. Provide the content of the otp zone as
an attribute field on the sysfs entry. Thereby make sure that if the chip
is locked, not connected or trouble with the i2c bus, the sysfs device is
not set up. This is mostly already handled in atmel-i2c.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

13909a0c

crypto: atmel-sha204a - add reading from otp zone · e05ce444

Lothar Rubusch authored May 03, 2024

Provide a read function reading the otp zone. The otp zone can be used for
storing serial numbers. The otp zone, as also data zone, are only
accessible if the chip was locked before. Locking the chip is a post
production customization and has to be done manually i.e. not by this
driver. Without this step the chip is pretty much not usable, where
putting or not putting data into the otp zone is optional.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e05ce444

crypto: atmel-i2c - rename read function · 3f5f7461

Lothar Rubusch authored May 03, 2024

Make the memory read function name more specific to the read memory zone.
The Atmel SHA204 chips provide config, otp and data zone. The implemented
read function in fact only reads some fields in zone config. The function
renaming allows for a uniform naming scheme when reading from other memory
zones.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3f5f7461

crypto: atmel-i2c - add missing arg description · e228b41a

Lothar Rubusch authored May 03, 2024

Add missing description for argument hwrng.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e228b41a

crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy() · bfbe27ba

Thorsten Blum authored May 02, 2024

Fixes the following two Coccinelle/coccicheck warnings reported by
memdup.cocci:

	iaa_crypto_main.c:350:19-26: WARNING opportunity for kmemdup
	iaa_crypto_main.c:358:18-25: WARNING opportunity for kmemdup
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

bfbe27ba

crypto: sahara - use 'time_left' variable with wait_for_completion_timeout() · e02ea6f9

Wolfram Sang authored Apr 30, 2024

There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_timeout() causing patterns like:

	timeout = wait_for_completion_timeout(...)
	if (!timeout) return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e02ea6f9

crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout() · 98f9e447

Wolfram Sang authored Apr 30, 2024

There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_killable_timeout() causing patterns like:

	timeout = wait_for_completion_killable_timeout(...)
	if (!timeout) return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

98f9e447

crypto: caam - i.MX8ULP donot have CAAM page0 access · d2835701

Pankaj Gupta authored Apr 29, 2024

iMX8ULP have a secure-enclave hardware IP called EdgeLock Enclave(ELE),
that control access to caam controller's register page, i.e., page0.

At all, if the ELE release access to CAAM controller's register page,
it will release to secure-world only.

Clocks are turned on automatically for iMX8ULP. There exists the caam
clock gating bit, but it is not advised to gate the clock at linux, as
optee-os or any other entity might be using it.
Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Horia Geanta <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d2835701

crypto: caam - init-clk based on caam-page0-access · 61444368

Pankaj Gupta authored Apr 29, 2024

CAAM clock initializat is done based on the basis of soc specific
info stored in struct caam_imx_data:
- caam-page0-access flag
- num_clks

CAAM driver needs to be aware of access rights to CAAM control page
i.e., page0, to do things differently.
Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

61444368

crypto: starfive - Use fallback for unaligned dma access · f8c423ba

Jia Jie Ho authored Apr 29, 2024

Dma address mapping fails on unaligned scatterlist offset. Use sw
fallback for these cases.
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f8c423ba

crypto: starfive - Do not free stack buffer · d7f01649

Jia Jie Ho authored Apr 29, 2024

RSA text data uses variable length buffer allocated in software stack.
Calling kfree on it causes undefined behaviour in subsequent operations.

Cc: <stable@vger.kernel.org> #6.7+
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d7f01649

crypto: starfive - Skip unneeded fallback allocation · 25ca4a85

Jia Jie Ho authored Apr 29, 2024

Skip sw fallback allocation if RSA module failed to get device handle.
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

25ca4a85

crypto: starfive - Skip dma setup for zeroed message · 3d12d90e

Jia Jie Ho authored Apr 29, 2024

Skip dma setup and mapping for AES driver if plaintext is empty.
Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3d12d90e

03 May, 2024 3 commits

crypto: hisilicon/sec2 - fix for register offset · 6117af86

Wenkai Lin authored Apr 23, 2024

The offset of SEC_CORE_ENABLE_BITMAP should be 0 instead of 32,
it cause a kasan shift-out-bounds warning, fix it.
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6117af86

crypto: hisilicon/debugfs - mask the unnecessary info from the dump · 15f112f9

Chenghai Huang authored Apr 23, 2024

Some information showed by the dump function is invalid. Mask
the unnecessary information from the dump file.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

15f112f9

crypto: qat - specify firmware files for 402xx · a3dc1f2b

Giovanni Cabiddu authored Apr 22, 2024

The 4xxx driver can probe 4xxx and 402xx devices. However, the driver
only specifies the firmware images required for 4xxx.
This might result in external tools missing these binaries, if required,
in the initramfs.

Specify the firmware image used by 402xx with the MODULE_FIRMWARE()
macros in the 4xxx driver.

Fixes: a3e8c919 ("crypto: qat - add support for 402xx devices")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a3dc1f2b

26 Apr, 2024 14 commits

crypto: x86/aes-gcm - simplify GCM hash subkey derivation · ed265f7f

Eric Biggers authored Apr 20, 2024

Remove a redundant expansion of the AES key, and use rodata for zeroes.
Also rename rfc4106_set_hash_subkey() to aes_gcm_derive_hash_subkey()
because it's used for both versions of AES-GCM, not just RFC4106.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ed265f7f

crypto: x86/aes-gcm - delete unused GCM assembly code · a0bbb1c1

Eric Biggers authored Apr 19, 2024

Delete aesni_gcm_enc() and aesni_gcm_dec() because they are unused.
Only the incremental AES-GCM functions (aesni_gcm_init(),
aesni_gcm_enc_update(), aesni_gcm_finalize()) are actually used.

This saves 17 KB of object code.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a0bbb1c1

crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath() · 6a805864

Eric Biggers authored Apr 19, 2024

Since the total length processed by the loop in xts_crypt_slowpath() is
a multiple of AES_BLOCK_SIZE, just round the length down to
AES_BLOCK_SIZE even on the last step. This doesn't change behavior, as
the last step will process a multiple of AES_BLOCK_SIZE regardless.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6a805864

hwrng: stm32 - repair clock handling · c819d7b8

Marek Vasut authored Apr 19, 2024

The clock management in this driver does not seem to be correct. The
struct hwrng .init callback enables the clock, but there is no matching
.cleanup callback to disable the clock. The clock get disabled as some
later point by runtime PM suspend callback.

Furthermore, both runtime PM and sleep suspend callbacks access registers
first and disable clock which are used for register access second. If the
IP is already in RPM suspend and the system enters sleep state, the sleep
callback will attempt to access registers while the register clock are
already disabled. This bug has been fixed once before already in commit
9bae5494 ("hwrng: stm32 - fix pm_suspend issue"), and regressed in
commit ff4e4610 ("hwrng: stm32 - rework power management sequences") .

Fix this slightly differently, disable register clock at the end of .init
callback, this way the IP is disabled after .init. On every access to the
IP, which really is only stm32_rng_read(), do pm_runtime_get_sync() which
is already done in stm32_rng_read() to bring the IP from RPM suspend, and
pm_runtime_mark_last_busy()/pm_runtime_put_sync_autosuspend() to put it
back into RPM suspend.

Change sleep suspend/resume callbacks to enable and disable register clock
around register access, as those cannot use the RPM suspend/resume callbacks
due to slightly different initialization in those sleep callbacks. This way,
the register access should always be performed with clock surely enabled.

Fixes: ff4e4610 ("hwrng: stm32 - rework power management sequences")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c819d7b8

hwrng: stm32 - put IP into RPM suspend on failure · da62ed5c

Marek Vasut authored Apr 19, 2024

In case of an irrecoverable failure, put the IP into RPM suspend
to avoid RPM imbalance. I did not trigger this case, but it seems
it should be done based on reading the code.

Fixes: b17bc6eb ("hwrng: stm32 - rework error handling in stm32_rng_read()")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

da62ed5c

hwrng: stm32 - use logical OR in conditional · 31b57788

Marek Vasut authored Apr 19, 2024

The conditional is used to check whether err is non-zero OR whether
reg variable is non-zero after clearing bits from it. This should be
done using logical OR, not bitwise OR, fix it.

Fixes: 6b85a7e1 ("hwrng: stm32 - implement STM32MP13x support")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

31b57788

crypto: ecdh - Initialize ctx->private_key in proper byte order · 01474b70

Stefan Berger authored Apr 18, 2024

The private key in ctx->private_key is currently initialized in reverse
byte order in ecdh_set_secret and whenever the key is needed in proper
byte order the variable priv is introduced and the bytes from
ctx->private_key are copied into priv while being byte-swapped
(ecc_swap_digits). To get rid of the unnecessary byte swapping initialize
ctx->private_key in proper byte order and clean up all functions that were
previously using priv or were called with ctx->private_key:

- ecc_gen_privkey: Directly initialize the passed ctx->private_key with
  random bytes filling all the digits of the private key. Get rid of the
  priv variable. This function only has ecdh_set_secret as a caller to
  create NIST P192/256/384 private keys.

- crypto_ecdh_shared_secret: Called only from ecdh_compute_value with
  ctx->private_key. Get rid of the priv variable and work with the passed
  private_key directly.

- ecc_make_pub_key: Called only from ecdh_compute_value with
  ctx->private_key. Get rid of the priv variable and work with the passed
  private_key directly.

Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

01474b70

crypto: ecdh - Pass private key in proper byte order to check valid key · bd955a4e

Stefan Berger authored Apr 18, 2024

ecc_is_key_valid expects a key with the most significant digit in the last
entry of the digit array. Currently ecdh_set_secret passes a reversed key
to ecc_is_key_valid that then passes the rather simple test checking
whether the private key is in range [2, n-3]. For all current ecdh-
supported curves (NIST P192/256/384) the 'n' parameter is a rather large
number, therefore easily passing this test.

Throughout the ecdh and ecc codebase the variable 'priv' is used for a
private_key holding the bytes in proper byte order. Therefore, introduce
priv in ecdh_set_secret and copy the bytes from ctx->private_key into
priv in proper byte order by using ecc_swap_digits. Pass priv to
ecc_is_valid_key.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

bd955a4e

crypto: tegra - Fix some error codes · 5ae6d3f5

Dan Carpenter authored Apr 17, 2024

Return negative -ENOMEM, instead of positive ENOMEM.

Fixes: 0880bb3b ("crypto: tegra - Add Tegra Security Engine driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5ae6d3f5

dt-bindings: crypto: starfive: Restore sort order · ee2615fa

Geert Uytterhoeven authored Apr 16, 2024

Restore alphabetical sort order of the list of supported compatible
values.

Fixes: 2ccf7a5d ("dt-bindings: crypto: starfive: Add jh8100 support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ee2615fa

crypto: qat - validate slices count returned by FW · 483fd65c

Lucas Segarra Fernandez authored Apr 16, 2024

The function adf_send_admin_tl_start() enables the telemetry (TL)
feature on a QAT device by sending the ICP_QAT_FW_TL_START message to
the firmware. This triggers the FW to start writing TL data to a DMA
buffer in memory and returns an array containing the number of
accelerators of each type (slices) supported by this HW.
The pointer to this array is stored in the adf_tl_hw_data data
structure called slice_cnt.

The array slice_cnt is then used in the function tl_print_dev_data()
to report in debugfs only statistics about the supported accelerators.
An incorrect value of the elements in slice_cnt might lead to an out
of bounds memory read.
At the moment, there isn't an implementation of FW that returns a wrong
value, but for robustness validate the slice count array returned by FW.

Fixes: 69e7649f ("crypto: qat - add support for device telemetry")
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

483fd65c

crypto: aead,cipher - zeroize key buffer after use · 23e4099b

Hailey Mothershead authored Apr 15, 2024

I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
cryptographic information should be zeroized once they are no longer
needed. Accomplish this by using kfree_sensitive for buffers that
previously held the private key.
Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

23e4099b

crypto: arm64/aes-ce - Simplify round key load sequence · 571e557c

Ard Biesheuvel authored Apr 15, 2024

Tweak the round key logic so that they can be loaded using a single
branchless sequence using overlapping loads. This is shorter and
simpler, and puts the conditional branches based on the key size further
apart, which might benefit microarchitectures that cannot record taken
branches at every instruction. For these branches, use test-bit-branch
instructions that don't clobber the condition flags.

Note that none of this has any impact on performance, positive or
otherwise (and the branch prediction benefit would only benefit AES-192
which nobody uses). It does make for nicer code, though.

While at it, use \@ to generate the labels inside the macros, which is
more robust than using fixed numbers, which could clash inadvertently.
Also, bring aes-neon.S in line with these changes, including the switch
to test-and-branch instructions, to avoid surprises in the future when
we might start relying on the condition flags being preserved in the
chaining mode wrappers in aes-modes.S
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

571e557c

crypto: tegra - Convert to platform remove callback returning void · 3f4d1482

Uwe Kleine-König authored Apr 15, 2024

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Fixes: 0880bb3b ("crypto: tegra - Add Tegra Security Engine driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3f4d1482

19 Apr, 2024 7 commits

crypto: x86/aes-xts - optimize size of instructions operating on lengths · 543ea178

Eric Biggers authored Apr 12, 2024

x86_64 has the "interesting" property that the instruction size is
generally a bit shorter for instructions that operate on the 32-bit (or
less) part of registers, or registers that are in the original set of 8.

This patch adjusts the AES-XTS code to take advantage of that property
by changing the LEN parameter from size_t to unsigned int (which is all
that's needed and is what the non-AVX implementation uses) and using the
%eax register for KEYLEN.

This decreases the size of aes-xts-avx-x86_64.o by 1.2%.

Note that changing the kmovq to kmovd was going to be needed anyway to
make the AVX10/256 code really work on CPUs that don't support 512-bit
vectors (since the AVX10 spec says that 64-bit opmask instructions will
only be supported on processors that support 512-bit vectors).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

543ea178

crypto: x86/aes-xts - eliminate a few more instructions · e619723a

Eric Biggers authored Apr 12, 2024

- For conditionally subtracting 16 from LEN when decrypting a message
  whose length isn't a multiple of 16, use the cmovnz instruction.

- Fold the addition of 4*VL to LEN into the sub of VL or 16 from LEN.

- Remove an unnecessary test instruction.

This results in slightly shorter code, both source and binary.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e619723a

crypto: x86/aes-xts - handle AES-128 and AES-192 more efficiently · 2717e01f

Eric Biggers authored Apr 12, 2024

Decrease the amount of code specific to the different AES variants by
"right-aligning" the sequence of round keys, and for AES-128 and AES-192
just skipping irrelevant rounds at the beginning.

This shrinks the size of aes-xts-avx-x86_64.o by 13.3%, and it improves
the efficiency of AES-128 and AES-192.  The tradeoff is that for AES-256
some additional not-taken conditional jumps are now executed.  But these
are predicted well and are cheap on x86.

Note that the ARMv8 CE based AES-XTS implementation uses a similar
strategy to handle the different AES variants.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2717e01f

crypto: x86/aesni-xts - deduplicate aesni_xts_enc() and aesni_xts_dec() · ea9459ef

Eric Biggers authored Apr 12, 2024

Since aesni_xts_enc() and aesni_xts_dec() are very similar, generate
them from a macro that's passed an argument enc=1 or enc=0.  This
reduces the length of aesni-intel_asm.S by 112 lines while still
producing the exact same object file in both 32-bit and 64-bit mode.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ea9459ef

crypto: x86/aes-xts - handle CTS encryption more efficiently · 1d27e1f5

Eric Biggers authored Apr 12, 2024

When encrypting a message whose length isn't a multiple of 16 bytes,
encrypt the last full block in the main loop.  This works because only
decryption uses the last two tweaks in reverse order, not encryption.

This improves the performance of decrypting messages whose length isn't
a multiple of the AES block length, shrinks the size of
aes-xts-avx-x86_64.o by 5.0%, and eliminates two instructions (a test
and a not-taken conditional jump) when encrypting a message whose length
*is* a multiple of the AES block length.

While it's not super useful to optimize for ciphertext stealing given
that it's rarely needed in practice, the other two benefits mentioned
above make this optimization worthwhile.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1d27e1f5

crypto: stm32/hash - add full DMA support for stm32mpx · 3525fe47

Maxime Méré authored Apr 12, 2024

Due to a lack of alignment in the data sent by requests, the actual DMA
support of the STM32 hash driver is only working with digest calls.
This patch, based on the algorithm used in the driver omap-sham.c,
allows for the usage of DMA in any situation.

It has been functionally tested on STM32MP15, STM32MP13 and STM32MP25.

By checking the performance of this new driver with OpenSSL, the
following results were found:

Performance:

(datasize: 4096, number of hashes performed in 10s)

|type   |no DMA    |DMA support|software  |
|-------|----------|-----------|----------|
|md5    |13873.56k |10958.03k  |71163.08k |
|sha1   |13796.15k |10729.47k  |39670.58k |
|sha224 |13737.98k |10775.76k  |22094.64k |
|sha256 |13655.65k |10872.01k  |22075.39k |

CPU Usage:

(algorithm used: sha256, computation time: 20s, measurement taken at
~10s)

|datasize  |no DMA |DMA  | software |
|----------|-------|-----|----------|
|  2048    | 56%   | 49% | 50%      |
|  4096    | 54%   | 46% | 50%      |
|  8192    | 53%   | 40% | 50%      |
| 16384    | 53%   | 33% | 50%      |

Note: this update doesn't change the driver performance without DMA.

As shown, performance with DMA is slightly lower than without, but in
most cases, it will save CPU time.
Signed-off-by: Maxime Méré <maxime.mere@foss.st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3525fe47

crypto: qat - improve error logging to be consistent across features · d281a28b

Adam Guerin authored Apr 12, 2024

Improve error logging in rate limiting feature. Staying consistent with
the error logging found in the telemetry feature.

Fixes: d9fb8408 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d281a28b