Commits · 2481104fe98d5b016fdd95d649b1235f21e491ba · Kirill Smelkov / linux

08 Jan, 2021 2 commits

crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper · 2481104f

Ard Biesheuvel authored Dec 31, 2020

The AES-NI driver implements XTS via the glue helper, which consumes
a struct with sets of function pointers which are invoked on chunks
of input data of the appropriate size, as annotated in the struct.

Let's get rid of this indirection, so that we can perform direct calls
to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
provide a helper which can consume inputs of any size, provided that the
penultimate, full block is passed via the last call if ciphertext stealing
needs to be applied.

This also allows us to enable the XTS mode for i386.

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2481104f

crypto: x86/aes-ni-xts - use direct calls to and 4-way stride · 86ad60a6

Ard Biesheuvel authored Dec 31, 2020

The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.

Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.

As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU)

Fixes: 9697fa39 ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

86ad60a6

02 Jan, 2021 38 commits

crypto: picoxcell - Remove PicoXcell driver · fecff3b9

Rob Herring authored Dec 10, 2020

PicoXcell has had nothing but treewide cleanups for at least the last 8
years and no signs of activity. The most recent activity is a yocto vendor
kernel based on v3.0 in 2015.

Cc: Jamie Iles <jamie@jamieiles.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

fecff3b9

crypto: arm/blake2b - add NEON-accelerated BLAKE2b · 1862eb00

Eric Biggers authored Dec 23, 2020

Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1.  It is also almost three times
as fast as the generic implementation of BLAKE2b:

	Algorithm            Cycles per byte (on 4096-byte messages)
	===================  =======================================
	blake2b-256-neon     14.0
	sha1-neon            16.3
	blake2s-256-arm      18.8
	sha1-asm             20.8
	blake2s-256-generic  26.0
	sha256-neon	     28.9
	sha256-asm	     32.0
	blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP).  It does
only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it.  The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet.  However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1862eb00

crypto: blake2b - update file comment · 0cdc438e

Eric Biggers authored Dec 23, 2020

The file comment for blake2b_generic.c makes it sound like it's the
reference implementation of BLAKE2b with only minor changes.  But it's
actually been changed a lot.  Update the comment to make this clearer.
Reviewed-by: David Sterba <dsterba@suse.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0cdc438e

crypto: blake2b - sync with blake2s implementation · 28dcca4c

Eric Biggers authored Dec 23, 2020

Sync the BLAKE2b code with the BLAKE2s code as much as possible:

- Move a lot of code into new headers <crypto/blake2b.h> and
  <crypto/internal/blake2b.h>, and adjust it to be like the
  corresponding BLAKE2s code, i.e. like <crypto/blake2s.h> and
  <crypto/internal/blake2s.h>.

- Rename constants, e.g. BLAKE2B_*_DIGEST_SIZE => BLAKE2B_*_HASH_SIZE.

- Use a macro BLAKE2B_ALG() to define the shash_alg structs.

- Export blake2b_compress_generic() for use as a fallback.

This makes it much easier to add optimized implementations of BLAKE2b,
as optimized implementations can use the helper functions
crypto_blake2b_{setkey,init,update,final}() and
blake2b_compress_generic().  The ARM implementation will use these.

But this change is also helpful because it eliminates unnecessary
differences between the BLAKE2b and BLAKE2s code, so that the same
improvements can easily be made to both.  (The two algorithms are
basically identical, except for the word size and constants.)  It also
makes it straightforward to add a library API for BLAKE2b in the future
if/when it's needed.

This change does make the BLAKE2b code slightly more complicated than it
needs to be, as it doesn't actually provide a library API yet.  For
example, __blake2b_update() doesn't really need to exist yet; it could
just be inlined into crypto_blake2b_update().  But I believe this is
outweighed by the benefits of keeping the code in sync.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

28dcca4c

wireguard: Kconfig: select CRYPTO_BLAKE2S_ARM · a64bfe7a

Eric Biggers authored Dec 23, 2020

When available, select the new implementation of BLAKE2s for 32-bit ARM.
This is faster than the generic C implementation.
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a64bfe7a

crypto: arm/blake2s - add ARM scalar optimized BLAKE2s · 5172d322

Eric Biggers authored Dec 23, 2020

Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help.  Each NEON instruction would depend on the
previous one, resulting in poor performance.

With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation get runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API as well as
optionally register the algorithms with the shash API.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5172d322

crypto: blake2s - include <linux/bug.h> instead of <asm/bug.h> · bbda6e0f

Eric Biggers authored Dec 23, 2020

Address the following checkpatch warning:

	WARNING: Use #include <linux/bug.h> instead of <asm/bug.h>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

bbda6e0f

crypto: blake2s - adjust include guard naming · 8786841b

Eric Biggers authored Dec 23, 2020

Use the full path in the include guards for the BLAKE2s headers to avoid
ambiguity and to match the convention for most files in include/crypto/.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8786841b

crypto: blake2s - add comment for blake2s_state fields · 7d87131f

Eric Biggers authored Dec 23, 2020

The first three fields of 'struct blake2s_state' are used in assembly
code, which isn't immediately obvious, so add a comment to this effect.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7d87131f

crypto: blake2s - optimize blake2s initialization · 42ad8cf8

Eric Biggers authored Dec 23, 2020

If no key was provided, then don't waste time initializing the block
buffer, as its initial contents won't be used.

Also, make crypto_blake2s_init() and blake2s() call a single internal
function __blake2s_init() which treats the key as optional, rather than
conditionally calling blake2s_init() or blake2s_init_key(). This
reduces the compiled code size, as previously both blake2s_init() and
blake2s_init_key() were being inlined into these two callers, except
when the key size passed to blake2s() was a compile-time constant.

These optimizations aren't that significant for BLAKE2s. However, the
equivalent optimizations will be more significant for BLAKE2b, as
everything is twice as big in BLAKE2b. And it's good to keep things
consistent rather than making optimizations for BLAKE2b but not BLAKE2s.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

42ad8cf8

crypto: blake2s - share the "shash" API boilerplate code · 8c4a93a1

Eric Biggers authored Dec 23, 2020

Add helper functions for shash implementations of BLAKE2s to
include/crypto/internal/blake2s.h, taking advantage of
__blake2s_update() and __blake2s_final() that were added by the previous
patch to share more code between the library and shash implementations.

crypto_blake2s_setkey() and crypto_blake2s_init() are usable as
shash_alg::setkey and shash_alg::init directly, while
crypto_blake2s_update() and crypto_blake2s_final() take an extra
'blake2s_compress_t' function pointer parameter.  This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.

The new functions are inline functions (similar to those in sha1_base.h,
sha256_base.h, and sm3_base.h) because this avoids needing to add a new
module blake2s_helpers.ko, they aren't *too* long, and this avoids
indirect calls which are expensive these days.  Note that they can't go
in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S
from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.

Finally, use these new helper functions in the x86 implementation of
BLAKE2s.  (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8c4a93a1

crypto: blake2s - move update and final logic to internal/blake2s.h · 057edc9c

Eric Biggers authored Dec 23, 2020

Move most of blake2s_update() and blake2s_final() into new inline
functions __blake2s_update() and __blake2s_final() in
include/crypto/internal/blake2s.h so that this logic can be shared by
the shash helper functions.  This will avoid duplicating this logic
between the library and shash implementations.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

057edc9c

crypto: blake2s - remove unneeded includes · df412e7e

Eric Biggers authored Dec 23, 2020

It doesn't make sense for the generic implementation of BLAKE2s to
include <crypto/internal/simd.h> and <linux/jump_label.h>, as these are
things that would only be useful in an architecture-specific
implementation.  Remove these unnecessary includes.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

df412e7e

crypto: x86/blake2s - define shash_alg structs using macros · 1aa90f4c

Eric Biggers authored Dec 23, 2020

The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1aa90f4c

crypto: blake2s - define shash_alg structs using macros · 0d396058

Eric Biggers authored Dec 23, 2020

The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0d396058

hwrng: ingenic - Fix a resource leak in an error handling path · c4ff41b9

Christophe JAILLET authored Dec 19, 2020

In case of error, we should call 'clk_disable_unprepare()' to undo a
previous 'clk_prepare_enable()' call, as already done in the remove
function.

Fixes: 406346d2 ("hwrng: ingenic - Add hardware TRNG for Ingenic X1830")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c4ff41b9

hwrng: iproc-rng200 - Move enable/disable in separate function · 256693a3

Matthias Brugger authored Dec 18, 2020

We are calling the same code for enable and disable the block in various
parts of the driver. Put that code into a new function to reduce code
duplication.
Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

256693a3

hwrng: iproc-rng200 - Fix disable of the block. · 96a6af54

Matthias Brugger authored Dec 18, 2020

When trying to disable the block we bitwise or the control
register with value zero. This is confusing as using bitwise or with
value zero doesn't have any effect at all. Drop this as we already set
the enable bit to zero by appling inverted RNG_RBGEN_MASK.
Signed-off-by: Matthias Brugger <mbrugger@suse.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

96a6af54

crypto: arm64/aes-ctr - improve tail handling · 5318d3db

Ard Biesheuvel authored Dec 17, 2020

Counter mode is a stream cipher chaining mode that is typically used
with inputs that are of arbitrarily length, and so a tail block which
is smaller than a full AES block is rule rather than exception.

The current ctr(aes) implementation for arm64 always makes a separate
call into the assembler routine to process this tail block, which is
suboptimal, given that it requires reloading of the AES round keys,
and prevents us from handling this tail block using the 5-way stride
that we use for better performance on deep pipelines.

So let's update the assembler routine so it can handle any input size,
and uses NEON permutation instructions and overlapping loads and stores
to handle the tail block. This results in a ~16% speedup for 1420 byte
blocks on cores with deep pipelines such as ThunderX2.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5318d3db

crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled · 15deb433

Ard Biesheuvel authored Dec 17, 2020

Commit 69b6f2e8 ("crypto: arm64/aes-neon - limit exposed routines if
faster driver is enabled") intended to hide modes from the plain NEON
driver that are also implemented by the faster bit sliced NEON one if
both are enabled. However, the defined() CPP function does not detect
if the bit sliced NEON driver is enabled as a module. So instead, let's
use IS_ENABLED() here.

Fixes: 69b6f2e8 ("crypto: arm64/aes-neon - limit exposed routines if ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

15deb433

MAINTAINERS: Add maintainers for Keem Bay OCS HCU driver · 5a5a27b3

Daniele Alessandrelli authored Dec 16, 2020

Add maintainers for the Intel Keem Bay Offload Crypto Subsystem (OCS)
Hash Control Unit (HCU) crypto driver.
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5a5a27b3

crypto: keembay-ocs-hcu - Add optional support for sha224 · b46f8036

Daniele Alessandrelli authored Dec 16, 2020

Add optional support of sha224 and hmac(sha224).
Co-developed-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b46f8036

crypto: keembay-ocs-hcu - Add HMAC support · ae832e32

Daniele Alessandrelli authored Dec 16, 2020

Add HMAC support to the Keem Bay OCS HCU driver, thus making it provide
the following additional transformations:
- hmac(sha256)
- hmac(sha384)
- hmac(sha512)
- hmac(sm3)

The Keem Bay OCS HCU hardware does not allow "context-switch" for HMAC
operations, i.e., it does not support computing a partial HMAC, save its
state and then continue it later. Therefore, full hardware acceleration
is provided only when possible (e.g., when crypto_ahash_digest() is
called); in all other cases hardware acceleration is only partial (OPAD
and IPAD calculation is done in software, while hashing is hardware
accelerated).
Co-developed-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ae832e32

crypto: keembay - Add Keem Bay OCS HCU driver · 472b0444

Declan Murphy authored Dec 16, 2020

Add support for the Hashing Control Unit (HCU) included in the Offload
Crypto Subsystem (OCS) of the Intel Keem Bay SoC, thus enabling
hardware-accelerated hashing on the Keem Bay SoC for the following
algorithms:
- sha256
- sha384
- sha512
- sm3

The driver is composed of two files:

- 'ocs-hcu.c' which interacts with the hardware and abstracts it by
  providing an API following the usual paradigm used in hashing drivers
  / libraries (e.g., hash_init(), hash_update(), hash_final(), etc.).
  NOTE: this API can block and sleep, since completions are used to wait
  for the HW to complete the hashing.

- 'keembay-ocs-hcu-core.c' which exports the functionality provided by
  'ocs-hcu.c' as a ahash crypto driver. The crypto engine is used to
  provide asynchronous behavior. 'keembay-ocs-hcu-core.c' also takes
  care of the DMA mapping of the input sg list.

The driver passes crypto manager self-tests, including the extra tests
(CRYPTO_MANAGER_EXTRA_TESTS=y).
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Co-developed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

472b0444

dt-bindings: crypto: Add Keem Bay OCS HCU bindings · 33ff6488

Declan Murphy authored Dec 16, 2020

Add device-tree bindings for the Intel Keem Bay Offload Crypto Subsystem
(OCS) Hashing Control Unit (HCU) crypto driver.
Signed-off-by: Declan Murphy <declan.murphy@intel.com>
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

33ff6488

crypto: sun4i-ss - add SPDX header and remove blank lines · 44122cc6

Corentin Labbe authored Dec 14, 2020

This patchs fixes some remaining style issue.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

44122cc6

crypto: sun4i-ss - enabled stats via debugfs · b1f578b8

Corentin Labbe authored Dec 14, 2020

This patch enable to access usage stats for each algorithm.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b1f578b8

crypto: sun4i-ss - fix kmap usage · 9bc3dd24

Corentin Labbe authored Dec 14, 2020

With the recent kmap change, some tests which were conditional on
CONFIG_DEBUG_HIGHMEM now are enabled by default.
This permit to detect a problem in sun4i-ss usage of kmap.

sun4i-ss uses two kmap via sg_miter (one for input, one for output), but
using two kmap at the same time is hard:
"the ordering has to be correct and with sg_miter that's probably hard to get
right." (quoting Tlgx)

So the easiest solution is to never have two sg_miter/kmap open at the same time.
After each use of sg_miter, I store the current index, for being able to
resume sg_miter to the right place.

Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

9bc3dd24

crypto: sun4i-ss - initialize need_fallback · 4ec8977b

Corentin Labbe authored Dec 14, 2020

The need_fallback is never initialized and seem to be always true at runtime.
So all hardware operations are always bypassed.

Fixes: 0ae1f46c ("crypto: sun4i-ss - fallback when length is not multiple of blocksize")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4ec8977b

crypto: sun4i-ss - handle BigEndian for cipher · 5ab6177f

Corentin Labbe authored Dec 14, 2020

Ciphers produce invalid results on BE.
Key and IV need to be written in LE.

Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5ab6177f

crypto: sun4i-ss - IV register does not work on A10 and A13 · b756f1c8

Corentin Labbe authored Dec 14, 2020

Allwinner A10 and A13 SoC have a version of the SS which produce
invalid IV in IVx register.

Instead of adding a variant for those, let's convert SS to produce IV
directly from data.
Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b756f1c8

crypto: sun4i-ss - checking sg length is not sufficient · 7bdcd851

Corentin Labbe authored Dec 14, 2020

The optimized cipher function need length multiple of 4 bytes.
But it get sometimes odd length.
This is due to SG data could be stored with an offset.

So the fix is to check also if the offset is aligned with 4 bytes.
Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7bdcd851

crypto: sun4i-ss - linearize buffers content must be kept · 58351351

Corentin Labbe authored Dec 14, 2020

When running the non-optimized cipher function, SS produce partial random
output.
This is due to linearize buffers being reseted after each loop.

For preserving stack, instead of moving them back to start of function,
I move them in sun4i_ss_ctx.

Fixes: 8d3bcb99 ("crypto: sun4i-ss - reduce stack usage")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

58351351

crypto: inside-secure - fix platform_get_irq.cocci warnings · 7334a4be

Tian Tao authored Dec 14, 2020

Remove dev_err() messages after platform_get_irq*() failures.
drivers/crypto/inside-secure/safexcel.c: line 1161 is redundant
because platform_get_irq() already prints an error

Generated by: scripts/coccinelle/api/platform_get_irq.cocci
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7334a4be

crypto: remove cipher routines from public crypto API · 0eb76ba2

Ard Biesheuvel authored Dec 11, 2020

The cipher routines in the crypto API are mostly intended for templates
implementing skcipher modes generically in software, and shouldn't be
used outside of the crypto subsystem. So move the prototypes and all
related definitions to a new header file under include/crypto/internal.
Also, let's use the new module namespace feature to move the symbol
exports into a new namespace CRYPTO_INTERNAL.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0eb76ba2

chcr_ktls: use AES library for single use cipher · a3b01ffd

Ard Biesheuvel authored Dec 11, 2020

Allocating a cipher via the crypto API only to free it again after using
it to encrypt a single block is unnecessary in cases where the algorithm
is known at compile time. So replace this pattern with a call to the AES
library.

Cc: Ayush Sawal <ayush.sawal@chelsio.com>
Cc: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Cc: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a3b01ffd

crypto: ccree - remove unused including <linux/version.h> · bbfd06c7

Tian Tao authored Dec 11, 2020

Remove including <linux/version.h> that don't need it.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

bbfd06c7

crypto: sahara - Remove unused .id_table support · c4dc99e1

Fabio Estevam authored Dec 09, 2020

Since 5.10-rc1 i.MX is a devicetree-only platform and the existing
.id_table support in this driver was only useful for old non-devicetree
platforms.

Remove the unused .id_table support.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c4dc99e1