Commits · 040279e84d4e94d65cd5a9f95f9821e26c72d26e · Kirill Smelkov / linux

12 Apr, 2024 28 commits

crypto: hisilicon/sgl - Delete redundant parameter verification · 040279e8

Chenghai Huang authored Apr 07, 2024

The input parameter check in acc_get_sgl is redundant. The
caller has been verified once. When the check is performed for
multiple times, the performance deteriorates.

So the redundant parameter verification is deleted, and the
index verification is changed to the module entry function for
verification.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

040279e8

crypto: hisilicon/debugfs - Fix debugfs uninit process issue · 8be09133

Chenghai Huang authored Apr 07, 2024

During the zip probe process, the debugfs failure does not stop
the probe. When debugfs initialization fails, jumping to the
error branch will also release regs, in addition to its own
rollback operation.

As a result, it may be released repeatedly during the regs
uninit process. Therefore, the null check needs to be added to
the regs uninit process.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8be09133

crypto: hisilicon/sec - Add the condition for configuring the sriov function · 5307147b

Chenghai Huang authored Apr 07, 2024

When CONFIG_PCI_IOV is disabled, the SRIOV configuration
function is not required. An error occurs if this function is
incorrectly called.

Consistent with other modules, add the condition for
configuring the sriov function of sec_pci_driver.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5307147b

crypto: x86/sha512-avx2 - add missing vzeroupper · 6a24fdfe

Eric Biggers authored Apr 05, 2024

Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: e01d69cb ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6a24fdfe

crypto: x86/sha256-avx2 - add missing vzeroupper · 57ce8a4e

Eric Biggers authored Apr 05, 2024

Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: d34a4600 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

57ce8a4e

crypto: x86/nh-avx2 - add missing vzeroupper · 4ad096cc

Eric Biggers authored Apr 05, 2024

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it.  This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4ad096cc

crypto: iaa - Use cpumask_weight() when rebalancing · 8f0e0cf7

Tom Zanussi authored Apr 05, 2024

If some cpus are offlined, or if the node mask is smaller than
expected, the 'nonexistent cpu' warning in rebalance_wq_table() may be
erroneously triggered.

Use cpumask_weight() to make sure we only iterate over the exact
number of cpus in the mask.

Also use num_possible_cpus() instead of num_online_cpus() to make sure
all slots in the wq table are initialized.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8f0e0cf7

crypto: x509 - Add OID for NIST P521 and extend parser for it · 3ba2ae36

Stefan Berger authored Apr 04, 2024

Enable the x509 parser to accept NIST P521 certificates and add the
OID for ansip521r1, which is the identifier for NIST P521.

Cc: David Howells <dhowells@redhat.com>
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3ba2ae36

crypto: asymmetric_keys - Adjust signature size calculation for NIST P521 · 4dc50330

Stefan Berger authored Apr 04, 2024

Adjust the calculation of the maximum signature size for support of
NIST P521. While existing curves may prepend a 0 byte to their coordinates
(to make the number positive), NIST P521 will not do this since only the
first bit in the most significant byte is used.

If the encoding of the x & y coordinates requires at least 128 bytes then
an additional byte is needed for the encoding of the length. Take this into
account when calculating the maximum signature size.
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4dc50330

crypto: ecdsa - Register NIST P521 and extend test suite · a7d45ba7

Stefan Berger authored Apr 04, 2024

Register NIST P521 as an akcipher and extend the testmgr with
NIST P521-specific test vectors.

Add a module alias so the module gets automatically loaded by the crypto
subsystem when the curve is needed.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a7d45ba7

crypto: ecdsa - Rename keylen to bufsize where necessary · 703ca5cd

Stefan Berger authored Apr 04, 2024

In cases where 'keylen' was referring to the size of the buffer used by
a curve's digits, it does not reflect the purpose of the variable anymore
once NIST P521 is used. What it refers to then is the size of the buffer,
which may be a few bytes larger than the size a coordinate of a key.
Therefore, rename keylen to bufsize where appropriate.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

703ca5cd

crypto: ecdsa - Replace ndigits with nbits where precision is needed · dee45a60

Stefan Berger authored Apr 04, 2024

Replace the usage of ndigits with nbits where precise space calculations
are needed, such as in ecdsa_max_size where the length of a coordinate is
determined.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dee45a60

crypto: ecc - Add NIST P521 curve parameters · 288b46c5

Stefan Berger authored Apr 04, 2024

Add the parameters for the NIST P521 curve and define a new curve ID
for it. Make the curve available in ecc_get_curve.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

288b46c5

crypto: ecc - Add special case for NIST P521 in ecc_point_mult · 114e8043

Stefan Berger authored Apr 04, 2024

In ecc_point_mult use the number of bits of the NIST P521 curve + 2. The
change is required specifically for NIST P521 to pass mathematical tests
on the public key.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

114e8043

crypto: ecc - Implement vli_mmod_fast_521 for NIST p521 · e7fb0627

Stefan Berger authored Apr 04, 2024

Implement vli_mmod_fast_521 following the description for how to calculate
the modulus for NIST P521 in the NIST publication "Recommendations for
Discrete Logarithm-Based Cryptography: Elliptic Curve Domain Parameters"
section G.1.4.

NIST p521 requires 9 64bit digits, so increase the ECC_MAX_DIGITS so that
the vli digit array provides enough elements to fit the larger integers
required by this curve.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e7fb0627

crypto: ecc - Add nbits field to ecc_curve structure · c0d6bd1f

Stefan Berger authored Apr 04, 2024

Add the number of bits a curve has to the ecc_curve definition to be able
to derive the number of bytes a curve requires for its coordinates from it.
It also allows one to identify a curve by its particular size. Set the
number of bits on all curve definitions.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c0d6bd1f

crypto: ecdsa - Extend res.x mod n calculation for NIST P521 · 48e8d3a5

Stefan Berger authored Apr 04, 2024

res.x has been calculated by ecc_point_mult_shamir, which uses
'mod curve_prime' on res.x and therefore p > res.x with 'p' being the
curve_prime. Further, it is true that for the NIST curves p > n with 'n'
being the 'curve_order' and therefore the following may be true as well:
p > res.x >= n.

If res.x >= n then res.x mod n can be calculated by iteratively sub-
tracting n from res.x until res.x < n. For NIST P192/256/384 this can be
done in a single subtraction. This can also be done in a single
subtraction for NIST P521.

The mathematical reason why a single subtraction is sufficient is due to
the values of 'p' and 'n' of the NIST curves where the following holds
true:

   note: max(res.x) = p - 1

   max(res.x) - n < n
       p - 1  - n < n
       p - 1      < 2n  => holds true for the NIST curves
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

48e8d3a5

crypto: ecdsa - Adjust tests on length of key parameters · dcee6068

Stefan Berger authored Apr 04, 2024

In preparation for support of NIST P521, adjust the basic tests on the
length of the provided key parameters to only ensure that the length of the
x plus y coordinates parameter array is not an odd number and that each
coordinate fits into an array of 'ndigits' digits. Mathematical tests on
the key's parameters are then done in ecc_is_pubkey_valid_full rejecting
invalid keys.

The change is necessary since NIST P521 keys do not have keys with
coordinates that each require 'full' digits (= all bits in u64 used).
NIST P521 only requires 2 bytes (9 bits) in the most significant digit
unlike NIST P192/256/384 that each require multiple 'full' digits.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dcee6068

crypto: ecdsa - Convert byte arrays with key coordinates to digits · d67c96fb

Stefan Berger authored Apr 04, 2024

For NIST P192/256/384 the public key's x and y parameters could be copied
directly from a given array since both parameters filled 'ndigits' of
digits (a 'digit' is a u64). For support of NIST P521 the key parameters
need to have leading zeros prepended to the most significant digit since
only 2 bytes of the most significant digit are provided.

Therefore, implement ecc_digits_from_bytes to convert a byte array into an
array of digits and use this function in ecdsa_set_pub_key where an input
byte array needs to be converted into digits.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d67c96fb

crypto: ecc - Use ECC_CURVE_NIST_P192/256/384_DIGITS where possible · 526d23fc

Stefan Berger authored Apr 04, 2024

Replace hard-coded numbers with ECC_CURVE_NIST_P192/256/384_DIGITS where
possible.
Tested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

526d23fc

crypto: tegra - Add Tegra Security Engine driver · 0880bb3b

Akhil R authored Apr 03, 2024

Add support for Tegra Security Engine which can accelerate various
crypto algorithms. The Engine has two separate instances within for
AES and HASH algorithms respectively.

The driver registers two crypto engines - one for AES and another for
HASH algorithms and these operate independently and both uses the host1x
bus. Additionally, it provides  hardware-assisted key protection for up
to 15 symmetric keys which it can use for the cipher operations.
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0880bb3b

gpu: host1x: Add Tegra SE to SID table · cc370ff8

Akhil R authored Apr 03, 2024

Add Tegra Security Engine details to the SID table in host1x driver.
These entries are required to be in place to configure the stream ID
for SE. Register writes to stream ID registers fail otherwise.
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Acked-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cc370ff8

dt-bindings: crypto: Add Tegra Security Engine · 17048b22

Akhil R authored Apr 03, 2024

Add DT binding document for Tegra Security Engine.
The AES and HASH algorithms are handled independently by separate
engines within the Security Engine. These engines are registered
as two separate crypto engine drivers.
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

17048b22

padata: Disable BH when taking works lock on MT path · 58329c43

Herbert Xu authored Apr 03, 2024

As the old padata code can execute in softirq context, disable
softirqs for the new padata_do_mutithreaded code too as otherwise
lockdep will get antsy.

Reported-by: syzbot+0cb5bb0f4bf9e79db3b3@syzkaller.appspotmail.com
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

58329c43

crypto: ccp - drop platform ifdef checks · 42c2d7d0

Arnd Bergmann authored Apr 03, 2024

When both ACPI and OF are disabled, the dev_vdata variable is unused:

drivers/crypto/ccp/sp-platform.c:33:34: error: unused variable 'dev_vdata' [-Werror,-Wunused-const-variable]

This is not a useful configuration, and there is not much point in saving
a few bytes when only one of the two is enabled, so just remove all
these ifdef checks and rely on of_match_node() and acpi_match_device()
returning NULL when these subsystems are disabled.

Fixes: 6c506343 ("crypto: ccp - Add ACPI support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

42c2d7d0

crypto: qat - Fix spelling mistake "Invalide" -> "Invalid" · f5c2cf9d

Colin Ian King authored Apr 02, 2024

There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f5c2cf9d

crypto: algboss - remove NULL check in cryptomgr_schedule_probe() · ea32d547

Roman Smirnov authored Apr 01, 2024

The for loop will be executed at least once, so i > 0. If the loop
is interrupted before i is incremented (e.g., when checking len for NULL),
i will not be checked.

Found by Linux Verification Center (linuxtesting.org) with Svace.
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ea32d547

crypto: ecc - remove checks in crypto_ecdh_shared_secret() and ecc_make_pub_key() · 233e7505

Roman Smirnov authored Apr 01, 2024

With the current state of the code, ecc_get_curve() cannot return
NULL in crypto_ecdh_shared_secret() and ecc_make_pub_key(). This is
conditioned by the fact that they are only called from ecdh_compute_value(),
which implements the kpp_alg::{generate_public_key,compute_shared_secret}()
methods. Also ecdh implements the kpp_alg::init() method, which is called
before the other methods and sets ecdh_ctx::curve_id to a valid value.
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

233e7505

05 Apr, 2024 12 commits

crypto: jitter - Replace http with https · 4ad27a8b

Thorsten Blum authored Mar 29, 2024

The PDF is also available via https.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4ad27a8b

crypto: jitter - Remove duplicate word in comment · 8fa5f4f0

Thorsten Blum authored Mar 29, 2024

s/in//
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8fa5f4f0

crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation · aa2197f5

Eric Biggers authored Mar 29, 2024

Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs
with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL
extensions. This implementation uses zmm registers to operate on four
AES blocks at a time. The assembly code is instantiated using a macro
so that most of the source code is shared with other implementations.

To avoid downclocking on older Intel CPU models, an exclusion list is
used to prevent this 512-bit implementation from being used by default
on some CPU models. They will use xts-aes-vaes-avx10_256 instead. For
now, this exclusion list is simply coded into aesni-intel_glue.c. It
may make sense to eventually move it into a more central location.

xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on
some current CPUs. E.g., on AMD Zen 4, AES-256-XTS decryption
throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte
inputs. On Intel Sapphire Rapids, AES-256-XTS decryption throughput
increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs.

Future CPUs may provide stronger 512-bit support, in which case a larger
benefit should be seen.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

aa2197f5

crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation · ee63fea0

Eric Biggers authored Mar 29, 2024

Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
extensions. This implementation avoids using zmm registers, instead
using ymm registers to operate on two AES blocks at a time. The
assembly code is instantiated using a macro so that most of the source
code is shared with other implementations.

This is the optimal implementation on CPUs that support VAES and AVX512
but where the zmm registers should not be used due to downclocking
effects, for example Intel's Ice Lake. It should also be the optimal
implementation on future CPUs that support AVX10/256 but not AVX10/512.

The performance is slightly better than that of xts-aes-vaes-avx2, which
uses the same 256-bit vector length, due to factors such as being able
to use ymm16-ymm31 to cache the AES round keys, and being able to use
the vpternlogd instruction to do XORs more efficiently. For example, on
Ice Lake, the throughput of decrypting 4096-byte messages with
AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
xts-aes-vaes-avx2. While this is a small improvement, it is
straightforward to provide this implementation (xts-aes-vaes-avx10_256)
as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
anyway, due to the way the _aes_xts_crypt macro is structured.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ee63fea0

crypto: x86/aes-xts - wire up VAES + AVX2 implementation · e787060b

Eric Biggers authored Mar 29, 2024

Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with
the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10.
This implementation uses ymm registers to operate on two AES blocks at a
time. The assembly code is instantiated using a macro so that most of
the source code is shared with other implementations.

This is the optimal implementation on AMD Zen 3. It should also be the
optimal implementation on Intel Alder Lake, which similarly supports
VAES but not AVX512. Comparing to xts-aes-aesni-avx on Zen 3,
xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput
with 4096-byte messages, or 23% higher with 512-byte messages.

A large improvement is also seen with CPUs that do support AVX512 (e.g.,
98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte
messages), though the following patches add AVX512 optimized
implementations to get a bit more performance on those CPUs.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e787060b

crypto: x86/aes-xts - wire up AESNI + AVX implementation · 996f4dcb

Eric Biggers authored Mar 29, 2024

Add an AES-XTS implementation "xts-aes-aesni-avx" for x86_64 CPUs that
have the AES-NI and AVX extensions but not VAES.  It's similar to the
existing xts-aes-aesni in that uses xmm registers to operate on one AES
block at a time.  It differs from xts-aes-aesni in the following ways:

- It uses the VEX-coded (non-destructive) instructions from AVX.
  This improves performance slightly.
- It incorporates some additional optimizations such as interleaving the
  tweak computation with AES en/decryption, handling single-page
  messages more efficiently, and caching the first round key.
- It supports only 64-bit (x86_64).
- It's generated by an assembly macro that will also be used to generate
  VAES-based implementations.

The performance improvement over xts-aes-aesni varies from small to
large, depending on the CPU and other factors such as the size of the
messages en/decrypted.  For example, the following increases in
AES-256-XTS decryption throughput are seen on the following CPUs:

                          | 4096-byte messages | 512-byte messages |
    ----------------------+--------------------+-------------------+
    Intel Skylake         |        6%          |       31%         |
    Intel Cascade Lake    |        4%          |       26%         |
    AMD Zen 1             |        61%         |       73%         |
    AMD Zen 2             |        36%         |       59%         |

(The above CPUs don't support VAES, so they can't use VAES instead.)

While this isn't as large an improvement as what VAES provides, this
still seems worthwhile.  This implementation is fairly easy to provide
based on the assembly macro that's needed for VAES anyway, and it will
be the best implementation on a large number of CPUs (very roughly, the
CPUs launched by Intel and AMD from 2011 to 2018).

This makes the existing xts-aes-aesni *mostly* obsolete.  For now, leave
it in place to support 32-bit kernels and also CPUs like Intel Westmere
that support AES-NI but not AVX.  (We could potentially remove it anyway
and just rely on the indirect acceleration via ecb-aes-aesni in those
cases, but that change will need to be considered separately.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

996f4dcb

crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs · d6371688

Eric Biggers authored Mar 29, 2024

Add an assembly file aes-xts-avx-x86_64.S which contains a macro that
expands into AES-XTS implementations for x86_64 CPUs that support at
least AES-NI and AVX, optionally also taking advantage of VAES,
VPCLMULQDQ, and AVX512 or AVX10.

This patch doesn't expand the macro at all.  Later patches will do so,
adding each implementation individually so that the motivation and use
case for each individual implementation can be fully presented.

The file also provides a function aes_xts_encrypt_iv() which handles the
encryption of the IV (tweak), using AES-NI and AVX.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d6371688

x86: add kconfig symbols for assembler VAES and VPCLMULQDQ support · 7d4700d1

Eric Biggers authored Mar 29, 2024

Add config symbols AS_VAES and AS_VPCLMULQDQ that expose whether the
assembler supports the vector AES and carryless multiplication
cryptographic extensions.
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7d4700d1

crypto: ecdh - explicitly zeroize private_key · 73e5984e

Joachim Vandersmissen authored Mar 28, 2024

private_key is overwritten with the key parameter passed in by the
caller (if present), or alternatively a newly generated private key.
However, it is possible that the caller provides a key (or the newly
generated key) which is shorter than the previous key. In that
scenario, some key material from the previous key would not be
overwritten. The easiest solution is to explicitly zeroize the entire
private_key array first.

Note that this patch slightly changes the behavior of this function:
previously, if the ecc_gen_privkey failed, the old private_key would
remain. Now, the private_key is always zeroized. This behavior is
consistent with the case where params.key is set and ecc_is_key_valid
fails.
Signed-off-by: Joachim Vandersmissen <git@jvdsn.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

73e5984e

crypto: fips - Remove the now superfluous sentinel element from ctl_table array · 5adf213c

Joel Granados authored Mar 28, 2024

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from crypto_sysctl_table
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5adf213c

crypto: jitter - Use kvfree_sensitive() to fix Coccinelle warning · 6e61ee1c

Thorsten Blum authored Mar 27, 2024

Replace memzero_explicit() and kvfree() with kvfree_sensitive() to fix
the following Coccinelle/coccicheck warning reported by
kfree_sensitive.cocci:

	WARNING opportunity for kfree_sensitive/kvfree_sensitive
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6e61ee1c

dt-bindings: crypto: ti,omap-sham: Convert to dtschema · a00dce05

Animesh Agarwal authored Mar 27, 2024

Convert the OMAP SoC SHA crypto Module bindings to DT Schema.
Signed-off-by: Animesh Agarwal <animeshagarwal28@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a00dce05