Commit 0f1d6dfe authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.10:

  API:
   - add skcipher walk interface
   - add asynchronous compression (acomp) interface
   - fix algif_aed AIO handling of zero buffer

  Algorithms:
   - fix unaligned access in poly1305
   - fix DRBG output to large buffers

  Drivers:
   - add support for iMX6UL to caam
   - fix givenc descriptors (used by IPsec) in caam
   - accelerated SHA256/SHA512 for ARM64 from OpenSSL
   - add SSE CRCT10DIF and CRC32 to ARM/ARM64
   - add AEAD support to Chelsio chcr
   - add Armada 8K support to omap-rng"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits)
  crypto: testmgr - fix overlap in chunked tests again
  crypto: arm/crc32 - accelerated support based on x86 SSE implementation
  crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
  crypto: arm/crct10dif - port x86 SSE implementation to ARM
  crypto: arm64/crct10dif - port x86 SSE implementation to arm64
  crypto: testmgr - add/enhance test cases for CRC-T10DIF
  crypto: testmgr - avoid overlap in chunked tests
  crypto: chcr - checking for IS_ERR() instead of NULL
  crypto: caam - check caam_emi_slow instead of re-lookup platform
  crypto: algif_aead - fix AIO handling of zero buffer
  crypto: aes-ce - Make aes_simd_algs static
  crypto: algif_skcipher - set error code when kcalloc fails
  crypto: caam - make aamalg_desc a proper module
  crypto: caam - pass key buffers with typesafe pointers
  crypto: arm64/aes-ce-ccm - Fix AEAD decryption length
  MAINTAINERS: add crypto headers to crypto entry
  crypt: doc - remove misleading mention of async API
  crypto: doc - fix header file name
  crypto: api - fix comment typo
  crypto: skcipher - Add separate walker for AEAD decryption
  ..
parents d05c5f7b 04b46fbd
...@@ -44,12 +44,9 @@ one block while the former can operate on an arbitrary amount of data, ...@@ -44,12 +44,9 @@ one block while the former can operate on an arbitrary amount of data,
subject to block size requirements (i.e., non-stream ciphers can only subject to block size requirements (i.e., non-stream ciphers can only
process multiples of blocks). process multiples of blocks).
Support for hardware crypto devices via an asynchronous interface is
under development.
Here's an example of how to use the API: Here's an example of how to use the API:
#include <crypto/ahash.h> #include <crypto/hash.h>
#include <linux/err.h> #include <linux/err.h>
#include <linux/scatterlist.h> #include <linux/scatterlist.h>
......
...@@ -123,6 +123,9 @@ PROPERTIES ...@@ -123,6 +123,9 @@ PROPERTIES
EXAMPLE EXAMPLE
iMX6QDL/SX requires four clocks
crypto@300000 { crypto@300000 {
compatible = "fsl,sec-v4.0"; compatible = "fsl,sec-v4.0";
fsl,sec-era = <2>; fsl,sec-era = <2>;
...@@ -139,6 +142,23 @@ EXAMPLE ...@@ -139,6 +142,23 @@ EXAMPLE
clock-names = "mem", "aclk", "ipg", "emi_slow"; clock-names = "mem", "aclk", "ipg", "emi_slow";
}; };
iMX6UL does only require three clocks
crypto: caam@2140000 {
compatible = "fsl,sec-v4.0";
#address-cells = <1>;
#size-cells = <1>;
reg = <0x2140000 0x3c000>;
ranges = <0 0x2140000 0x3c000>;
interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&clks IMX6UL_CLK_CAAM_MEM>,
<&clks IMX6UL_CLK_CAAM_ACLK>,
<&clks IMX6UL_CLK_CAAM_IPG>;
clock-names = "mem", "aclk", "ipg";
};
===================================================================== =====================================================================
Job Ring (JR) Node Job Ring (JR) Node
......
OMAP SoC HWRNG Module OMAP SoC and Inside-Secure HWRNG Module
Required properties: Required properties:
...@@ -6,11 +6,13 @@ Required properties: ...@@ -6,11 +6,13 @@ Required properties:
RNG versions: RNG versions:
- "ti,omap2-rng" for OMAP2. - "ti,omap2-rng" for OMAP2.
- "ti,omap4-rng" for OMAP4, OMAP5 and AM33XX. - "ti,omap4-rng" for OMAP4, OMAP5 and AM33XX.
- "inside-secure,safexcel-eip76" for SoCs with EIP76 IP block
Note that these two versions are incompatible. Note that these two versions are incompatible.
- ti,hwmods: Name of the hwmod associated with the RNG module - ti,hwmods: Name of the hwmod associated with the RNG module
- reg : Offset and length of the register set for the module - reg : Offset and length of the register set for the module
- interrupts : the interrupt number for the RNG module. - interrupts : the interrupt number for the RNG module.
Only used for "ti,omap4-rng". Used for "ti,omap4-rng" and "inside-secure,safexcel-eip76"
- clocks: the trng clock source
Example: Example:
/* AM335x */ /* AM335x */
...@@ -20,3 +22,11 @@ rng: rng@48310000 { ...@@ -20,3 +22,11 @@ rng: rng@48310000 {
reg = <0x48310000 0x2000>; reg = <0x48310000 0x2000>;
interrupts = <111>; interrupts = <111>;
}; };
/* SafeXcel IP-76 */
trng: rng@f2760000 {
compatible = "inside-secure,safexcel-eip76";
reg = <0xf2760000 0x7d>;
interrupts = <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cpm_syscon0 1 25>;
};
...@@ -137,6 +137,7 @@ infineon Infineon Technologies ...@@ -137,6 +137,7 @@ infineon Infineon Technologies
inforce Inforce Computing inforce Inforce Computing
ingenic Ingenic Semiconductor ingenic Ingenic Semiconductor
innolux Innolux Corporation innolux Innolux Corporation
inside-secure INSIDE Secure
intel Intel Corporation intel Intel Corporation
intercontrol Inter Control Group intercontrol Inter Control Group
invensense InvenSense Inc. invensense InvenSense Inc.
......
...@@ -3470,6 +3470,7 @@ F: arch/*/crypto/ ...@@ -3470,6 +3470,7 @@ F: arch/*/crypto/
F: crypto/ F: crypto/
F: drivers/crypto/ F: drivers/crypto/
F: include/crypto/ F: include/crypto/
F: include/linux/crypto*
CRYPTOGRAPHIC RANDOM NUMBER GENERATOR CRYPTOGRAPHIC RANDOM NUMBER GENERATOR
M: Neil Horman <nhorman@tuxdriver.com> M: Neil Horman <nhorman@tuxdriver.com>
...@@ -5086,6 +5087,14 @@ F: include/linux/fb.h ...@@ -5086,6 +5087,14 @@ F: include/linux/fb.h
F: include/uapi/video/ F: include/uapi/video/
F: include/uapi/linux/fb.h F: include/uapi/linux/fb.h
FREESCALE CAAM (Cryptographic Acceleration and Assurance Module) DRIVER
M: Horia Geantă <horia.geanta@nxp.com>
M: Dan Douglass <dan.douglass@nxp.com>
L: linux-crypto@vger.kernel.org
S: Maintained
F: drivers/crypto/caam/
F: Documentation/devicetree/bindings/crypto/fsl-sec4.txt
FREESCALE DIU FRAMEBUFFER DRIVER FREESCALE DIU FRAMEBUFFER DRIVER
M: Timur Tabi <timur@tabi.org> M: Timur Tabi <timur@tabi.org>
L: linux-fbdev@vger.kernel.org L: linux-fbdev@vger.kernel.org
......
...@@ -88,9 +88,9 @@ config CRYPTO_AES_ARM ...@@ -88,9 +88,9 @@ config CRYPTO_AES_ARM
config CRYPTO_AES_ARM_BS config CRYPTO_AES_ARM_BS
tristate "Bit sliced AES using NEON instructions" tristate "Bit sliced AES using NEON instructions"
depends on KERNEL_MODE_NEON depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM select CRYPTO_AES_ARM
select CRYPTO_ABLK_HELPER select CRYPTO_BLKCIPHER
select CRYPTO_SIMD
help help
Use a faster and more secure NEON based implementation of AES in CBC, Use a faster and more secure NEON based implementation of AES in CBC,
CTR and XTS modes CTR and XTS modes
...@@ -104,8 +104,8 @@ config CRYPTO_AES_ARM_BS ...@@ -104,8 +104,8 @@ config CRYPTO_AES_ARM_BS
config CRYPTO_AES_ARM_CE config CRYPTO_AES_ARM_CE
tristate "Accelerated AES using ARMv8 Crypto Extensions" tristate "Accelerated AES using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI select CRYPTO_BLKCIPHER
select CRYPTO_ABLK_HELPER select CRYPTO_SIMD
help help
Use an implementation of AES in CBC, CTR and XTS modes that uses Use an implementation of AES in CBC, CTR and XTS modes that uses
ARMv8 Crypto Extensions ARMv8 Crypto Extensions
...@@ -120,4 +120,14 @@ config CRYPTO_GHASH_ARM_CE ...@@ -120,4 +120,14 @@ config CRYPTO_GHASH_ARM_CE
that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64) that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
that is part of the ARMv8 Crypto Extensions that is part of the ARMv8 Crypto Extensions
config CRYPTO_CRCT10DIF_ARM_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
depends on KERNEL_MODE_NEON && CRC_T10DIF
select CRYPTO_HASH
config CRYPTO_CRC32_ARM_CE
tristate "CRC32(C) digest algorithm using CRC and/or PMULL instructions"
depends on KERNEL_MODE_NEON && CRC32
select CRYPTO_HASH
endif endif
...@@ -13,6 +13,8 @@ ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o ...@@ -13,6 +13,8 @@ ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_CRC32_ARM_CE) += crc32-arm-ce.o
ifneq ($(ce-obj-y)$(ce-obj-m),) ifneq ($(ce-obj-y)$(ce-obj-m),)
ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y) ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
...@@ -36,6 +38,8 @@ sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o ...@@ -36,6 +38,8 @@ sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o
sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o
aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
quiet_cmd_perl = PERL $@ quiet_cmd_perl = PERL $@
cmd_perl = $(PERL) $(<) > $(@) cmd_perl = $(PERL) $(<) > $(@)
......
This diff is collapsed.
This diff is collapsed.
/*
* Accelerated CRC32(C) using ARM CRC, NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*
* Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
* calculation.
* CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
* PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
* at:
* http://www.intel.com/products/processor/manuals/
* Intel(R) 64 and IA-32 Architectures Software Developer's Manual
* Volume 2B: Instruction Set Reference, N-Z
*
* Authors: Gregory Prestas <Gregory_Prestas@us.xyratex.com>
* Alexander Boyko <Alexander_Boyko@xyratex.com>
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.text
.align 6
.arch armv8-a
.arch_extension crc
.fpu crypto-neon-fp-armv8
.Lcrc32_constants:
/*
* [x4*128+32 mod P(x) << 32)]' << 1 = 0x154442bd4
* #define CONSTANT_R1 0x154442bd4LL
*
* [(x4*128-32 mod P(x) << 32)]' << 1 = 0x1c6e41596
* #define CONSTANT_R2 0x1c6e41596LL
*/
.quad 0x0000000154442bd4
.quad 0x00000001c6e41596
/*
* [(x128+32 mod P(x) << 32)]' << 1 = 0x1751997d0
* #define CONSTANT_R3 0x1751997d0LL
*
* [(x128-32 mod P(x) << 32)]' << 1 = 0x0ccaa009e
* #define CONSTANT_R4 0x0ccaa009eLL
*/
.quad 0x00000001751997d0
.quad 0x00000000ccaa009e
/*
* [(x64 mod P(x) << 32)]' << 1 = 0x163cd6124
* #define CONSTANT_R5 0x163cd6124LL
*/
.quad 0x0000000163cd6124
.quad 0x00000000FFFFFFFF
/*
* #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
*
* Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
* = 0x1F7011641LL
* #define CONSTANT_RU 0x1F7011641LL
*/
.quad 0x00000001DB710641
.quad 0x00000001F7011641
.Lcrc32c_constants:
.quad 0x00000000740eef02
.quad 0x000000009e4addf8
.quad 0x00000000f20c0dfe
.quad 0x000000014cd00bd6
.quad 0x00000000dd45aab8
.quad 0x00000000FFFFFFFF
.quad 0x0000000105ec76f0
.quad 0x00000000dea713f1
dCONSTANTl .req d0
dCONSTANTh .req d1
qCONSTANT .req q0
BUF .req r0
LEN .req r1
CRC .req r2
qzr .req q9
/**
* Calculate crc32
* BUF - buffer
* LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pmull_le(unsigned char const *buffer,
* size_t len, uint crc32)
*/
ENTRY(crc32_pmull_le)
adr r3, .Lcrc32_constants
b 0f
ENTRY(crc32c_pmull_le)
adr r3, .Lcrc32c_constants
0: bic LEN, LEN, #15
vld1.8 {q1-q2}, [BUF, :128]!
vld1.8 {q3-q4}, [BUF, :128]!
vmov.i8 qzr, #0
vmov.i8 qCONSTANT, #0
vmov dCONSTANTl[0], CRC
veor.8 d2, d2, dCONSTANTl
sub LEN, LEN, #0x40
cmp LEN, #0x40
blt less_64
vld1.64 {qCONSTANT}, [r3]
loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40
vmull.p64 q5, d3, dCONSTANTh
vmull.p64 q6, d5, dCONSTANTh
vmull.p64 q7, d7, dCONSTANTh
vmull.p64 q8, d9, dCONSTANTh
vmull.p64 q1, d2, dCONSTANTl
vmull.p64 q2, d4, dCONSTANTl
vmull.p64 q3, d6, dCONSTANTl
vmull.p64 q4, d8, dCONSTANTl
veor.8 q1, q1, q5
vld1.8 {q5}, [BUF, :128]!
veor.8 q2, q2, q6
vld1.8 {q6}, [BUF, :128]!
veor.8 q3, q3, q7
vld1.8 {q7}, [BUF, :128]!
veor.8 q4, q4, q8
vld1.8 {q8}, [BUF, :128]!
veor.8 q1, q1, q5
veor.8 q2, q2, q6
veor.8 q3, q3, q7
veor.8 q4, q4, q8
cmp LEN, #0x40
bge loop_64
less_64: /* Folding cache line into 128bit */
vldr dCONSTANTl, [r3, #16]
vldr dCONSTANTh, [r3, #24]
vmull.p64 q5, d3, dCONSTANTh
vmull.p64 q1, d2, dCONSTANTl
veor.8 q1, q1, q5
veor.8 q1, q1, q2
vmull.p64 q5, d3, dCONSTANTh
vmull.p64 q1, d2, dCONSTANTl
veor.8 q1, q1, q5
veor.8 q1, q1, q3
vmull.p64 q5, d3, dCONSTANTh
vmull.p64 q1, d2, dCONSTANTl
veor.8 q1, q1, q5
veor.8 q1, q1, q4
teq LEN, #0
beq fold_64
loop_16: /* Folding rest buffer into 128bit */
subs LEN, LEN, #0x10
vld1.8 {q2}, [BUF, :128]!
vmull.p64 q5, d3, dCONSTANTh
vmull.p64 q1, d2, dCONSTANTl
veor.8 q1, q1, q5
veor.8 q1, q1, q2
bne loop_16
fold_64:
/* perform the last 64 bit fold, also adds 32 zeroes
* to the input stream */
vmull.p64 q2, d2, dCONSTANTh
vext.8 q1, q1, qzr, #8
veor.8 q1, q1, q2
/* final 32-bit fold */
vldr dCONSTANTl, [r3, #32]
vldr d6, [r3, #40]
vmov.i8 d7, #0
vext.8 q2, q1, qzr, #4
vand.8 d2, d2, d6
vmull.p64 q1, d2, dCONSTANTl
veor.8 q1, q1, q2
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
vldr dCONSTANTl, [r3, #48]
vldr dCONSTANTh, [r3, #56]
vand.8 q2, q1, q3
vext.8 q2, qzr, q2, #8
vmull.p64 q2, d5, dCONSTANTh
vand.8 q2, q2, q3
vmull.p64 q2, d4, dCONSTANTl
veor.8 q1, q1, q2
vmov r0, s5
bx lr
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)
.macro __crc32, c
subs ip, r2, #8
bmi .Ltail\c
tst r1, #3
bne .Lunaligned\c
teq ip, #0
.Laligned8\c:
ldrd r2, r3, [r1], #8
ARM_BE8(rev r2, r2 )
ARM_BE8(rev r3, r3 )
crc32\c\()w r0, r0, r2
crc32\c\()w r0, r0, r3
bxeq lr
subs ip, ip, #8
bpl .Laligned8\c
.Ltail\c:
tst ip, #4
beq 2f
ldr r3, [r1], #4
ARM_BE8(rev r3, r3 )
crc32\c\()w r0, r0, r3
2: tst ip, #2
beq 1f
ldrh r3, [r1], #2
ARM_BE8(rev16 r3, r3 )
crc32\c\()h r0, r0, r3
1: tst ip, #1
bxeq lr
ldrb r3, [r1]
crc32\c\()b r0, r0, r3
bx lr
.Lunaligned\c:
tst r1, #1
beq 2f
ldrb r3, [r1], #1
subs r2, r2, #1
crc32\c\()b r0, r0, r3
tst r1, #2
beq 0f
2: ldrh r3, [r1], #2
subs r2, r2, #2
ARM_BE8(rev16 r3, r3 )
crc32\c\()h r0, r0, r3
0: subs ip, r2, #8
bpl .Laligned8\c
b .Ltail\c
.endm
.align 5
ENTRY(crc32_armv8_le)
__crc32
ENDPROC(crc32_armv8_le)
.align 5
ENTRY(crc32c_armv8_le)
__crc32 c
ENDPROC(crc32c_armv8_le)
/*
* Accelerated CRC32(C) using ARM CRC, NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/crc32.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <crypto/internal/hash.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#define PMULL_MIN_LEN 64L /* minimum size of buffer
* for crc32_pmull_le_16 */
#define SCALE_F 16L /* size of NEON register */
asmlinkage u32 crc32_pmull_le(const u8 buf[], u32 len, u32 init_crc);
asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], u32 len);
asmlinkage u32 crc32c_pmull_le(const u8 buf[], u32 len, u32 init_crc);
asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], u32 len);
static u32 (*fallback_crc32)(u32 init_crc, const u8 buf[], u32 len);
static u32 (*fallback_crc32c)(u32 init_crc, const u8 buf[], u32 len);
static int crc32_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = 0;
return 0;
}
static int crc32c_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = ~0;
return 0;
}
static int crc32_setkey(struct crypto_shash *hash, const u8 *key,
unsigned int keylen)
{
u32 *mctx = crypto_shash_ctx(hash);
if (keylen != sizeof(u32)) {
crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
*mctx = le32_to_cpup((__le32 *)key);
return 0;
}
static int crc32_init(struct shash_desc *desc)
{
u32 *mctx = crypto_shash_ctx(desc->tfm);
u32 *crc = shash_desc_ctx(desc);
*crc = *mctx;
return 0;
}
static int crc32_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
*crc = crc32_armv8_le(*crc, data, length);
return 0;
}
static int crc32c_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
*crc = crc32c_armv8_le(*crc, data, length);
return 0;
}
static int crc32_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(*crc, out);
return 0;
}
static int crc32c_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(~*crc, out);
return 0;
}
static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if (may_use_simd()) {
if ((u32)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u32)data % SCALE_F));
*crc = fallback_crc32(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN) {
l = round_down(length, SCALE_F);
kernel_neon_begin();
*crc = crc32_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
}
if (length > 0)
*crc = fallback_crc32(*crc, data, length);
return 0;
}
static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if (may_use_simd()) {
if ((u32)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u32)data % SCALE_F));
*crc = fallback_crc32c(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN) {
l = round_down(length, SCALE_F);
kernel_neon_begin();
*crc = crc32c_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
}
if (length > 0)
*crc = fallback_crc32c(*crc, data, length);
return 0;
}
static struct shash_alg crc32_pmull_algs[] = { {
.setkey = crc32_setkey,
.init = crc32_init,
.update = crc32_update,
.final = crc32_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32_cra_init,
.base.cra_name = "crc32",
.base.cra_driver_name = "crc32-arm-ce",
.base.cra_priority = 200,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
}, {
.setkey = crc32_setkey,
.init = crc32_init,
.update = crc32c_update,
.final = crc32c_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32c_cra_init,
.base.cra_name = "crc32c",
.base.cra_driver_name = "crc32c-arm-ce",
.base.cra_priority = 200,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
} };
static int __init crc32_pmull_mod_init(void)
{
if (elf_hwcap2 & HWCAP2_PMULL) {
crc32_pmull_algs[0].update = crc32_pmull_update;
crc32_pmull_algs[1].update = crc32c_pmull_update;
if (elf_hwcap2 & HWCAP2_CRC32) {
fallback_crc32 = crc32_armv8_le;
fallback_crc32c = crc32c_armv8_le;
} else {
fallback_crc32 = crc32_le;
fallback_crc32c = __crc32c_le;
}
} else if (!(elf_hwcap2 & HWCAP2_CRC32)) {
return -ENODEV;
}
return crypto_register_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
static void __exit crc32_pmull_mod_exit(void)
{
crypto_unregister_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
module_init(crc32_pmull_mod_init);
module_exit(crc32_pmull_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("crc32");
MODULE_ALIAS_CRYPTO("crc32c");
This diff is collapsed.
/*
* Accelerated CRC-T10DIF using ARM NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/crc-t10dif.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <crypto/internal/hash.h>
#include <asm/neon.h>
#include <asm/simd.h>
#define CRC_T10DIF_PMULL_CHUNK_SIZE 16U
asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u32 len);
static int crct10dif_init(struct shash_desc *desc)
{
u16 *crc = shash_desc_ctx(desc);
*crc = 0;
return 0;
}
static int crct10dif_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u16 *crc = shash_desc_ctx(desc);
unsigned int l;
if (!may_use_simd()) {
*crc = crc_t10dif_generic(*crc, data, length);
} else {
if (unlikely((u32)data % CRC_T10DIF_PMULL_CHUNK_SIZE)) {
l = min_t(u32, length, CRC_T10DIF_PMULL_CHUNK_SIZE -
((u32)data % CRC_T10DIF_PMULL_CHUNK_SIZE));
*crc = crc_t10dif_generic(*crc, data, l);
length -= l;
data += l;
}
if (length > 0) {
kernel_neon_begin();
*crc = crc_t10dif_pmull(*crc, data, length);
kernel_neon_end();
}
}
return 0;
}
static int crct10dif_final(struct shash_desc *desc, u8 *out)
{
u16 *crc = shash_desc_ctx(desc);
*(u16 *)out = *crc;
return 0;
}
static struct shash_alg crc_t10dif_alg = {
.digestsize = CRC_T10DIF_DIGEST_SIZE,
.init = crct10dif_init,
.update = crct10dif_update,
.final = crct10dif_final,
.descsize = CRC_T10DIF_DIGEST_SIZE,
.base.cra_name = "crct10dif",
.base.cra_driver_name = "crct10dif-arm-ce",
.base.cra_priority = 200,
.base.cra_blocksize = CRC_T10DIF_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
};
static int __init crc_t10dif_mod_init(void)
{
if (!(elf_hwcap2 & HWCAP2_PMULL))
return -ENODEV;
return crypto_register_shash(&crc_t10dif_alg);
}
static void __exit crc_t10dif_mod_exit(void)
{
crypto_unregister_shash(&crc_t10dif_alg);
}
module_init(crc_t10dif_mod_init);
module_exit(crc_t10dif_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("crct10dif");
...@@ -164,6 +164,14 @@ cpm_i2c1: i2c@701100 { ...@@ -164,6 +164,14 @@ cpm_i2c1: i2c@701100 {
clocks = <&cpm_syscon0 1 21>; clocks = <&cpm_syscon0 1 21>;
status = "disabled"; status = "disabled";
}; };
cpm_trng: trng@760000 {
compatible = "marvell,armada-8k-rng", "inside-secure,safexcel-eip76";
reg = <0x760000 0x7d>;
interrupts = <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cpm_syscon0 1 25>;
status = "okay";
};
}; };
cpm_pcie0: pcie@f2600000 { cpm_pcie0: pcie@f2600000 {
......
...@@ -164,6 +164,14 @@ cps_i2c1: i2c@701100 { ...@@ -164,6 +164,14 @@ cps_i2c1: i2c@701100 {
clocks = <&cps_syscon0 1 21>; clocks = <&cps_syscon0 1 21>;
status = "disabled"; status = "disabled";
}; };
cps_trng: trng@760000 {
compatible = "marvell,armada-8k-rng", "inside-secure,safexcel-eip76";
reg = <0x760000 0x7d>;
interrupts = <GIC_SPI 312 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cps_syscon0 1 25>;
status = "okay";
};
}; };
cps_pcie0: pcie@f4600000 { cps_pcie0: pcie@f4600000 {
......
sha256-core.S
sha512-core.S
...@@ -8,6 +8,14 @@ menuconfig ARM64_CRYPTO ...@@ -8,6 +8,14 @@ menuconfig ARM64_CRYPTO
if ARM64_CRYPTO if ARM64_CRYPTO
config CRYPTO_SHA256_ARM64
tristate "SHA-224/SHA-256 digest algorithm for arm64"
select CRYPTO_HASH
config CRYPTO_SHA512_ARM64
tristate "SHA-384/SHA-512 digest algorithm for arm64"
select CRYPTO_HASH
config CRYPTO_SHA1_ARM64_CE config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)" tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
depends on ARM64 && KERNEL_MODE_NEON depends on ARM64 && KERNEL_MODE_NEON
...@@ -23,6 +31,16 @@ config CRYPTO_GHASH_ARM64_CE ...@@ -23,6 +31,16 @@ config CRYPTO_GHASH_ARM64_CE
depends on ARM64 && KERNEL_MODE_NEON depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH select CRYPTO_HASH
config CRYPTO_CRCT10DIF_ARM64_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
depends on KERNEL_MODE_NEON && CRC_T10DIF
select CRYPTO_HASH
config CRYPTO_CRC32_ARM64_CE
tristate "CRC32 and CRC32C digest algorithms using PMULL instructions"
depends on KERNEL_MODE_NEON && CRC32
select CRYPTO_HASH
config CRYPTO_AES_ARM64_CE config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions" tristate "AES core cipher using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON depends on ARM64 && KERNEL_MODE_NEON
...@@ -40,17 +58,18 @@ config CRYPTO_AES_ARM64_CE_BLK ...@@ -40,17 +58,18 @@ config CRYPTO_AES_ARM64_CE_BLK
depends on ARM64 && KERNEL_MODE_NEON depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_CE select CRYPTO_AES_ARM64_CE
select CRYPTO_ABLK_HELPER select CRYPTO_SIMD
config CRYPTO_AES_ARM64_NEON_BLK config CRYPTO_AES_ARM64_NEON_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions" tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
depends on ARM64 && KERNEL_MODE_NEON depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER select CRYPTO_BLKCIPHER
select CRYPTO_AES select CRYPTO_AES
select CRYPTO_ABLK_HELPER select CRYPTO_SIMD
config CRYPTO_CRC32_ARM64 config CRYPTO_CRC32_ARM64
tristate "CRC32 and CRC32C using optional ARMv8 instructions" tristate "CRC32 and CRC32C using optional ARMv8 instructions"
depends on ARM64 depends on ARM64
select CRYPTO_HASH select CRYPTO_HASH
endif endif
...@@ -17,6 +17,12 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o ...@@ -17,6 +17,12 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
...@@ -29,6 +35,12 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o ...@@ -29,6 +35,12 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
aes-neon-blk-y := aes-glue-neon.o aes-neon.o aes-neon-blk-y := aes-glue-neon.o aes-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM64) += sha256-arm64.o
sha256-arm64-y := sha256-glue.o sha256-core.o
obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
sha512-arm64-y := sha512-glue.o sha512-core.o
AFLAGS_aes-ce.o := -DINTERLEAVE=4 AFLAGS_aes-ce.o := -DINTERLEAVE=4
AFLAGS_aes-neon.o := -DINTERLEAVE=4 AFLAGS_aes-neon.o := -DINTERLEAVE=4
...@@ -40,3 +52,14 @@ CFLAGS_crc32-arm64.o := -mcpu=generic+crc ...@@ -40,3 +52,14 @@ CFLAGS_crc32-arm64.o := -mcpu=generic+crc
$(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
$(call if_changed_rule,cc_o_c) $(call if_changed_rule,cc_o_c)
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)
$(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl
$(call cmd,perlasm)
$(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl
$(call cmd,perlasm)
.PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
*/ */
#include <linux/linkage.h> #include <linux/linkage.h>
#include <asm/assembler.h>
.text .text
.arch armv8-a+crypto .arch armv8-a+crypto
...@@ -19,7 +20,7 @@ ...@@ -19,7 +20,7 @@
*/ */
ENTRY(ce_aes_ccm_auth_data) ENTRY(ce_aes_ccm_auth_data)
ldr w8, [x3] /* leftover from prev round? */ ldr w8, [x3] /* leftover from prev round? */
ld1 {v0.2d}, [x0] /* load mac */ ld1 {v0.16b}, [x0] /* load mac */
cbz w8, 1f cbz w8, 1f
sub w8, w8, #16 sub w8, w8, #16
eor v1.16b, v1.16b, v1.16b eor v1.16b, v1.16b, v1.16b
...@@ -31,7 +32,7 @@ ENTRY(ce_aes_ccm_auth_data) ...@@ -31,7 +32,7 @@ ENTRY(ce_aes_ccm_auth_data)
beq 8f /* out of input? */ beq 8f /* out of input? */
cbnz w8, 0b cbnz w8, 0b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
1: ld1 {v3.2d}, [x4] /* load first round key */ 1: ld1 {v3.16b}, [x4] /* load first round key */
prfm pldl1strm, [x1] prfm pldl1strm, [x1]
cmp w5, #12 /* which key size? */ cmp w5, #12 /* which key size? */
add x6, x4, #16 add x6, x4, #16
...@@ -41,17 +42,17 @@ ENTRY(ce_aes_ccm_auth_data) ...@@ -41,17 +42,17 @@ ENTRY(ce_aes_ccm_auth_data)
mov v5.16b, v3.16b mov v5.16b, v3.16b
b 4f b 4f
2: mov v4.16b, v3.16b 2: mov v4.16b, v3.16b
ld1 {v5.2d}, [x6], #16 /* load 2nd round key */ ld1 {v5.16b}, [x6], #16 /* load 2nd round key */
3: aese v0.16b, v4.16b 3: aese v0.16b, v4.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
4: ld1 {v3.2d}, [x6], #16 /* load next round key */ 4: ld1 {v3.16b}, [x6], #16 /* load next round key */
aese v0.16b, v5.16b aese v0.16b, v5.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
5: ld1 {v4.2d}, [x6], #16 /* load next round key */ 5: ld1 {v4.16b}, [x6], #16 /* load next round key */
subs w7, w7, #3 subs w7, w7, #3
aese v0.16b, v3.16b aese v0.16b, v3.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
ld1 {v5.2d}, [x6], #16 /* load next round key */ ld1 {v5.16b}, [x6], #16 /* load next round key */
bpl 3b bpl 3b
aese v0.16b, v4.16b aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */ subs w2, w2, #16 /* last data? */
...@@ -60,7 +61,7 @@ ENTRY(ce_aes_ccm_auth_data) ...@@ -60,7 +61,7 @@ ENTRY(ce_aes_ccm_auth_data)
ld1 {v1.16b}, [x1], #16 /* load next input block */ ld1 {v1.16b}, [x1], #16 /* load next input block */
eor v0.16b, v0.16b, v1.16b /* xor with mac */ eor v0.16b, v0.16b, v1.16b /* xor with mac */
bne 1b bne 1b
6: st1 {v0.2d}, [x0] /* store mac */ 6: st1 {v0.16b}, [x0] /* store mac */
beq 10f beq 10f
adds w2, w2, #16 adds w2, w2, #16
beq 10f beq 10f
...@@ -79,7 +80,7 @@ ENTRY(ce_aes_ccm_auth_data) ...@@ -79,7 +80,7 @@ ENTRY(ce_aes_ccm_auth_data)
adds w7, w7, #1 adds w7, w7, #1
bne 9b bne 9b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
st1 {v0.2d}, [x0] st1 {v0.16b}, [x0]
10: str w8, [x3] 10: str w8, [x3]
ret ret
ENDPROC(ce_aes_ccm_auth_data) ENDPROC(ce_aes_ccm_auth_data)
...@@ -89,27 +90,27 @@ ENDPROC(ce_aes_ccm_auth_data) ...@@ -89,27 +90,27 @@ ENDPROC(ce_aes_ccm_auth_data)
* u32 rounds); * u32 rounds);
*/ */
ENTRY(ce_aes_ccm_final) ENTRY(ce_aes_ccm_final)
ld1 {v3.2d}, [x2], #16 /* load first round key */ ld1 {v3.16b}, [x2], #16 /* load first round key */
ld1 {v0.2d}, [x0] /* load mac */ ld1 {v0.16b}, [x0] /* load mac */
cmp w3, #12 /* which key size? */ cmp w3, #12 /* which key size? */
sub w3, w3, #2 /* modified # of rounds */ sub w3, w3, #2 /* modified # of rounds */
ld1 {v1.2d}, [x1] /* load 1st ctriv */ ld1 {v1.16b}, [x1] /* load 1st ctriv */
bmi 0f bmi 0f
bne 3f bne 3f
mov v5.16b, v3.16b mov v5.16b, v3.16b
b 2f b 2f
0: mov v4.16b, v3.16b 0: mov v4.16b, v3.16b
1: ld1 {v5.2d}, [x2], #16 /* load next round key */ 1: ld1 {v5.16b}, [x2], #16 /* load next round key */
aese v0.16b, v4.16b aese v0.16b, v4.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
aese v1.16b, v4.16b aese v1.16b, v4.16b
aesmc v1.16b, v1.16b aesmc v1.16b, v1.16b
2: ld1 {v3.2d}, [x2], #16 /* load next round key */ 2: ld1 {v3.16b}, [x2], #16 /* load next round key */
aese v0.16b, v5.16b aese v0.16b, v5.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
aese v1.16b, v5.16b aese v1.16b, v5.16b
aesmc v1.16b, v1.16b aesmc v1.16b, v1.16b
3: ld1 {v4.2d}, [x2], #16 /* load next round key */ 3: ld1 {v4.16b}, [x2], #16 /* load next round key */
subs w3, w3, #3 subs w3, w3, #3
aese v0.16b, v3.16b aese v0.16b, v3.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
...@@ -120,47 +121,47 @@ ENTRY(ce_aes_ccm_final) ...@@ -120,47 +121,47 @@ ENTRY(ce_aes_ccm_final)
aese v1.16b, v4.16b aese v1.16b, v4.16b
/* final round key cancels out */ /* final round key cancels out */
eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */ eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */
st1 {v0.2d}, [x0] /* store result */ st1 {v0.16b}, [x0] /* store result */
ret ret
ENDPROC(ce_aes_ccm_final) ENDPROC(ce_aes_ccm_final)
.macro aes_ccm_do_crypt,enc .macro aes_ccm_do_crypt,enc
ldr x8, [x6, #8] /* load lower ctr */ ldr x8, [x6, #8] /* load lower ctr */
ld1 {v0.2d}, [x5] /* load mac */ ld1 {v0.16b}, [x5] /* load mac */
rev x8, x8 /* keep swabbed ctr in reg */ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
0: /* outer loop */ 0: /* outer loop */
ld1 {v1.1d}, [x6] /* load upper ctr */ ld1 {v1.8b}, [x6] /* load upper ctr */
prfm pldl1strm, [x1] prfm pldl1strm, [x1]
add x8, x8, #1 add x8, x8, #1
rev x9, x8 rev x9, x8
cmp w4, #12 /* which key size? */ cmp w4, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */ sub w7, w4, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */ ins v1.d[1], x9 /* no carry in lower ctr */
ld1 {v3.2d}, [x3] /* load first round key */ ld1 {v3.16b}, [x3] /* load first round key */
add x10, x3, #16 add x10, x3, #16
bmi 1f bmi 1f
bne 4f bne 4f
mov v5.16b, v3.16b mov v5.16b, v3.16b
b 3f b 3f
1: mov v4.16b, v3.16b 1: mov v4.16b, v3.16b
ld1 {v5.2d}, [x10], #16 /* load 2nd round key */ ld1 {v5.16b}, [x10], #16 /* load 2nd round key */
2: /* inner loop: 3 rounds, 2x interleaved */ 2: /* inner loop: 3 rounds, 2x interleaved */
aese v0.16b, v4.16b aese v0.16b, v4.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
aese v1.16b, v4.16b aese v1.16b, v4.16b
aesmc v1.16b, v1.16b aesmc v1.16b, v1.16b
3: ld1 {v3.2d}, [x10], #16 /* load next round key */ 3: ld1 {v3.16b}, [x10], #16 /* load next round key */
aese v0.16b, v5.16b aese v0.16b, v5.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
aese v1.16b, v5.16b aese v1.16b, v5.16b
aesmc v1.16b, v1.16b aesmc v1.16b, v1.16b
4: ld1 {v4.2d}, [x10], #16 /* load next round key */ 4: ld1 {v4.16b}, [x10], #16 /* load next round key */
subs w7, w7, #3 subs w7, w7, #3
aese v0.16b, v3.16b aese v0.16b, v3.16b
aesmc v0.16b, v0.16b aesmc v0.16b, v0.16b
aese v1.16b, v3.16b aese v1.16b, v3.16b
aesmc v1.16b, v1.16b aesmc v1.16b, v1.16b
ld1 {v5.2d}, [x10], #16 /* load next round key */ ld1 {v5.16b}, [x10], #16 /* load next round key */
bpl 2b bpl 2b
aese v0.16b, v4.16b aese v0.16b, v4.16b
aese v1.16b, v4.16b aese v1.16b, v4.16b
...@@ -177,14 +178,14 @@ ENDPROC(ce_aes_ccm_final) ...@@ -177,14 +178,14 @@ ENDPROC(ce_aes_ccm_final)
eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
st1 {v1.16b}, [x0], #16 /* write output block */ st1 {v1.16b}, [x0], #16 /* write output block */
bne 0b bne 0b
rev x8, x8 CPU_LE( rev x8, x8 )
st1 {v0.2d}, [x5] /* store mac */ st1 {v0.16b}, [x5] /* store mac */
str x8, [x6, #8] /* store lsb end of ctr (BE) */ str x8, [x6, #8] /* store lsb end of ctr (BE) */
5: ret 5: ret
6: eor v0.16b, v0.16b, v5.16b /* final round mac */ 6: eor v0.16b, v0.16b, v5.16b /* final round mac */
eor v1.16b, v1.16b, v5.16b /* final round enc */ eor v1.16b, v1.16b, v5.16b /* final round enc */
st1 {v0.2d}, [x5] /* store mac */ st1 {v0.16b}, [x5] /* store mac */
add w2, w2, #16 /* process partial tail block */ add w2, w2, #16 /* process partial tail block */
7: ldrb w9, [x1], #1 /* get 1 byte of input */ 7: ldrb w9, [x1], #1 /* get 1 byte of input */
umov w6, v1.b[0] /* get top crypted ctr byte */ umov w6, v1.b[0] /* get top crypted ctr byte */
......
...@@ -11,9 +11,9 @@ ...@@ -11,9 +11,9 @@
#include <asm/neon.h> #include <asm/neon.h>
#include <asm/unaligned.h> #include <asm/unaligned.h>
#include <crypto/aes.h> #include <crypto/aes.h>
#include <crypto/algapi.h>
#include <crypto/scatterwalk.h> #include <crypto/scatterwalk.h>
#include <crypto/internal/aead.h> #include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h> #include <linux/module.h>
#include "aes-ce-setkey.h" #include "aes-ce-setkey.h"
...@@ -149,12 +149,7 @@ static int ccm_encrypt(struct aead_request *req) ...@@ -149,12 +149,7 @@ static int ccm_encrypt(struct aead_request *req)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead); struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
struct blkcipher_desc desc = { .info = req->iv }; struct skcipher_walk walk;
struct blkcipher_walk walk;
struct scatterlist srcbuf[2];
struct scatterlist dstbuf[2];
struct scatterlist *src;
struct scatterlist *dst;
u8 __aligned(8) mac[AES_BLOCK_SIZE]; u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen; u32 len = req->cryptlen;
...@@ -172,27 +167,19 @@ static int ccm_encrypt(struct aead_request *req) ...@@ -172,27 +167,19 @@ static int ccm_encrypt(struct aead_request *req)
/* preserve the original iv for the final round */ /* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE); memcpy(buf, req->iv, AES_BLOCK_SIZE);
src = scatterwalk_ffwd(srcbuf, req->src, req->assoclen); err = skcipher_walk_aead_encrypt(&walk, req, true);
dst = src;
if (req->src != req->dst)
dst = scatterwalk_ffwd(dstbuf, req->dst, req->assoclen);
blkcipher_walk_init(&walk, dst, src, len);
err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
AES_BLOCK_SIZE);
while (walk.nbytes) { while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE; u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == len) if (walk.nbytes == walk.total)
tail = 0; tail = 0;
ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc, walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv); num_rounds(ctx), mac, walk.iv);
len -= walk.nbytes - tail; err = skcipher_walk_done(&walk, tail);
err = blkcipher_walk_done(&desc, &walk, tail);
} }
if (!err) if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
...@@ -203,7 +190,7 @@ static int ccm_encrypt(struct aead_request *req) ...@@ -203,7 +190,7 @@ static int ccm_encrypt(struct aead_request *req)
return err; return err;
/* copy authtag to end of dst */ /* copy authtag to end of dst */
scatterwalk_map_and_copy(mac, dst, req->cryptlen, scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
crypto_aead_authsize(aead), 1); crypto_aead_authsize(aead), 1);
return 0; return 0;
...@@ -214,12 +201,7 @@ static int ccm_decrypt(struct aead_request *req) ...@@ -214,12 +201,7 @@ static int ccm_decrypt(struct aead_request *req)
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead); struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
unsigned int authsize = crypto_aead_authsize(aead); unsigned int authsize = crypto_aead_authsize(aead);
struct blkcipher_desc desc = { .info = req->iv }; struct skcipher_walk walk;
struct blkcipher_walk walk;
struct scatterlist srcbuf[2];
struct scatterlist dstbuf[2];
struct scatterlist *src;
struct scatterlist *dst;
u8 __aligned(8) mac[AES_BLOCK_SIZE]; u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen - authsize; u32 len = req->cryptlen - authsize;
...@@ -237,27 +219,19 @@ static int ccm_decrypt(struct aead_request *req) ...@@ -237,27 +219,19 @@ static int ccm_decrypt(struct aead_request *req)
/* preserve the original iv for the final round */ /* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE); memcpy(buf, req->iv, AES_BLOCK_SIZE);
src = scatterwalk_ffwd(srcbuf, req->src, req->assoclen); err = skcipher_walk_aead_decrypt(&walk, req, true);
dst = src;
if (req->src != req->dst)
dst = scatterwalk_ffwd(dstbuf, req->dst, req->assoclen);
blkcipher_walk_init(&walk, dst, src, len);
err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
AES_BLOCK_SIZE);
while (walk.nbytes) { while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE; u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == len) if (walk.nbytes == walk.total)
tail = 0; tail = 0;
ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc, walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv); num_rounds(ctx), mac, walk.iv);
len -= walk.nbytes - tail; err = skcipher_walk_done(&walk, tail);
err = blkcipher_walk_done(&desc, &walk, tail);
} }
if (!err) if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
...@@ -268,7 +242,8 @@ static int ccm_decrypt(struct aead_request *req) ...@@ -268,7 +242,8 @@ static int ccm_decrypt(struct aead_request *req)
return err; return err;
/* compare calculated auth tag with the stored one */ /* compare calculated auth tag with the stored one */
scatterwalk_map_and_copy(buf, src, req->cryptlen - authsize, scatterwalk_map_and_copy(buf, req->src,
req->assoclen + req->cryptlen - authsize,
authsize, 0); authsize, 0);
if (crypto_memneq(mac, buf, authsize)) if (crypto_memneq(mac, buf, authsize))
...@@ -287,6 +262,7 @@ static struct aead_alg ccm_aes_alg = { ...@@ -287,6 +262,7 @@ static struct aead_alg ccm_aes_alg = {
.cra_module = THIS_MODULE, .cra_module = THIS_MODULE,
}, },
.ivsize = AES_BLOCK_SIZE, .ivsize = AES_BLOCK_SIZE,
.chunksize = AES_BLOCK_SIZE,
.maxauthsize = AES_BLOCK_SIZE, .maxauthsize = AES_BLOCK_SIZE,
.setkey = ccm_setkey, .setkey = ccm_setkey,
.setauthsize = ccm_setauthsize, .setauthsize = ccm_setauthsize,
......
...@@ -47,24 +47,24 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) ...@@ -47,24 +47,24 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
kernel_neon_begin_partial(4); kernel_neon_begin_partial(4);
__asm__(" ld1 {v0.16b}, %[in] ;" __asm__(" ld1 {v0.16b}, %[in] ;"
" ld1 {v1.2d}, [%[key]], #16 ;" " ld1 {v1.16b}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;" " cmp %w[rounds], #10 ;"
" bmi 0f ;" " bmi 0f ;"
" bne 3f ;" " bne 3f ;"
" mov v3.16b, v1.16b ;" " mov v3.16b, v1.16b ;"
" b 2f ;" " b 2f ;"
"0: mov v2.16b, v1.16b ;" "0: mov v2.16b, v1.16b ;"
" ld1 {v3.2d}, [%[key]], #16 ;" " ld1 {v3.16b}, [%[key]], #16 ;"
"1: aese v0.16b, v2.16b ;" "1: aese v0.16b, v2.16b ;"
" aesmc v0.16b, v0.16b ;" " aesmc v0.16b, v0.16b ;"
"2: ld1 {v1.2d}, [%[key]], #16 ;" "2: ld1 {v1.16b}, [%[key]], #16 ;"
" aese v0.16b, v3.16b ;" " aese v0.16b, v3.16b ;"
" aesmc v0.16b, v0.16b ;" " aesmc v0.16b, v0.16b ;"
"3: ld1 {v2.2d}, [%[key]], #16 ;" "3: ld1 {v2.16b}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;" " subs %w[rounds], %w[rounds], #3 ;"
" aese v0.16b, v1.16b ;" " aese v0.16b, v1.16b ;"
" aesmc v0.16b, v0.16b ;" " aesmc v0.16b, v0.16b ;"
" ld1 {v3.2d}, [%[key]], #16 ;" " ld1 {v3.16b}, [%[key]], #16 ;"
" bpl 1b ;" " bpl 1b ;"
" aese v0.16b, v2.16b ;" " aese v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;" " eor v0.16b, v0.16b, v3.16b ;"
...@@ -92,24 +92,24 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) ...@@ -92,24 +92,24 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
kernel_neon_begin_partial(4); kernel_neon_begin_partial(4);
__asm__(" ld1 {v0.16b}, %[in] ;" __asm__(" ld1 {v0.16b}, %[in] ;"
" ld1 {v1.2d}, [%[key]], #16 ;" " ld1 {v1.16b}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;" " cmp %w[rounds], #10 ;"
" bmi 0f ;" " bmi 0f ;"
" bne 3f ;" " bne 3f ;"
" mov v3.16b, v1.16b ;" " mov v3.16b, v1.16b ;"
" b 2f ;" " b 2f ;"
"0: mov v2.16b, v1.16b ;" "0: mov v2.16b, v1.16b ;"
" ld1 {v3.2d}, [%[key]], #16 ;" " ld1 {v3.16b}, [%[key]], #16 ;"
"1: aesd v0.16b, v2.16b ;" "1: aesd v0.16b, v2.16b ;"
" aesimc v0.16b, v0.16b ;" " aesimc v0.16b, v0.16b ;"
"2: ld1 {v1.2d}, [%[key]], #16 ;" "2: ld1 {v1.16b}, [%[key]], #16 ;"
" aesd v0.16b, v3.16b ;" " aesd v0.16b, v3.16b ;"
" aesimc v0.16b, v0.16b ;" " aesimc v0.16b, v0.16b ;"
"3: ld1 {v2.2d}, [%[key]], #16 ;" "3: ld1 {v2.16b}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;" " subs %w[rounds], %w[rounds], #3 ;"
" aesd v0.16b, v1.16b ;" " aesd v0.16b, v1.16b ;"
" aesimc v0.16b, v0.16b ;" " aesimc v0.16b, v0.16b ;"
" ld1 {v3.2d}, [%[key]], #16 ;" " ld1 {v3.16b}, [%[key]], #16 ;"
" bpl 1b ;" " bpl 1b ;"
" aesd v0.16b, v2.16b ;" " aesd v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;" " eor v0.16b, v0.16b, v3.16b ;"
...@@ -173,7 +173,12 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, ...@@ -173,7 +173,12 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
u32 *rki = ctx->key_enc + (i * kwords); u32 *rki = ctx->key_enc + (i * kwords);
u32 *rko = rki + kwords; u32 *rko = rki + kwords;
#ifndef CONFIG_CPU_BIG_ENDIAN
rko[0] = ror32(aes_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0]; rko[0] = ror32(aes_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0];
#else
rko[0] = rol32(aes_sub(rki[kwords - 1]), 8) ^ (rcon[i] << 24) ^
rki[0];
#endif
rko[1] = rko[0] ^ rki[1]; rko[1] = rko[0] ^ rki[1];
rko[2] = rko[1] ^ rki[2]; rko[2] = rko[1] ^ rki[2];
rko[3] = rko[2] ^ rki[3]; rko[3] = rko[2] ^ rki[3];
......
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
*/ */
#include <linux/linkage.h> #include <linux/linkage.h>
#include <asm/assembler.h>
#define AES_ENTRY(func) ENTRY(ce_ ## func) #define AES_ENTRY(func) ENTRY(ce_ ## func)
#define AES_ENDPROC(func) ENDPROC(ce_ ## func) #define AES_ENDPROC(func) ENDPROC(ce_ ## func)
......
This diff is collapsed.
...@@ -386,7 +386,8 @@ AES_ENDPROC(aes_ctr_encrypt) ...@@ -386,7 +386,8 @@ AES_ENDPROC(aes_ctr_encrypt)
.endm .endm
.Lxts_mul_x: .Lxts_mul_x:
.word 1, 0, 0x87, 0 CPU_LE( .quad 1, 0x87 )
CPU_BE( .quad 0x87, 1 )
AES_ENTRY(aes_xts_encrypt) AES_ENTRY(aes_xts_encrypt)
FRAME_PUSH FRAME_PUSH
......
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
*/ */
#include <linux/linkage.h> #include <linux/linkage.h>
#include <asm/assembler.h>
#define AES_ENTRY(func) ENTRY(neon_ ## func) #define AES_ENTRY(func) ENTRY(neon_ ## func)
#define AES_ENDPROC(func) ENDPROC(neon_ ## func) #define AES_ENDPROC(func) ENDPROC(neon_ ## func)
...@@ -83,13 +84,13 @@ ...@@ -83,13 +84,13 @@
.endm .endm
.macro do_block, enc, in, rounds, rk, rkp, i .macro do_block, enc, in, rounds, rk, rkp, i
ld1 {v15.16b}, [\rk] ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16 add \rkp, \rk, #16
mov \i, \rounds mov \i, \rounds
1111: eor \in\().16b, \in\().16b, v15.16b /* ^round key */ 1111: eor \in\().16b, \in\().16b, v15.16b /* ^round key */
tbl \in\().16b, {\in\().16b}, v13.16b /* ShiftRows */ tbl \in\().16b, {\in\().16b}, v13.16b /* ShiftRows */
sub_bytes \in sub_bytes \in
ld1 {v15.16b}, [\rkp], #16 ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1 subs \i, \i, #1
beq 2222f beq 2222f
.if \enc == 1 .if \enc == 1
...@@ -229,7 +230,7 @@ ...@@ -229,7 +230,7 @@
.endm .endm
.macro do_block_2x, enc, in0, in1 rounds, rk, rkp, i .macro do_block_2x, enc, in0, in1 rounds, rk, rkp, i
ld1 {v15.16b}, [\rk] ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16 add \rkp, \rk, #16
mov \i, \rounds mov \i, \rounds
1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */ 1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */
...@@ -237,7 +238,7 @@ ...@@ -237,7 +238,7 @@
sub_bytes_2x \in0, \in1 sub_bytes_2x \in0, \in1
tbl \in0\().16b, {\in0\().16b}, v13.16b /* ShiftRows */ tbl \in0\().16b, {\in0\().16b}, v13.16b /* ShiftRows */
tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */ tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */
ld1 {v15.16b}, [\rkp], #16 ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1 subs \i, \i, #1
beq 2222f beq 2222f
.if \enc == 1 .if \enc == 1
...@@ -254,7 +255,7 @@ ...@@ -254,7 +255,7 @@
.endm .endm
.macro do_block_4x, enc, in0, in1, in2, in3, rounds, rk, rkp, i .macro do_block_4x, enc, in0, in1, in2, in3, rounds, rk, rkp, i
ld1 {v15.16b}, [\rk] ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16 add \rkp, \rk, #16
mov \i, \rounds mov \i, \rounds
1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */ 1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */
...@@ -266,7 +267,7 @@ ...@@ -266,7 +267,7 @@
tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */ tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */
tbl \in2\().16b, {\in2\().16b}, v13.16b /* ShiftRows */ tbl \in2\().16b, {\in2\().16b}, v13.16b /* ShiftRows */
tbl \in3\().16b, {\in3\().16b}, v13.16b /* ShiftRows */ tbl \in3\().16b, {\in3\().16b}, v13.16b /* ShiftRows */
ld1 {v15.16b}, [\rkp], #16 ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1 subs \i, \i, #1
beq 2222f beq 2222f
.if \enc == 1 .if \enc == 1
...@@ -306,12 +307,16 @@ ...@@ -306,12 +307,16 @@
.text .text
.align 4 .align 4
.LForward_ShiftRows: .LForward_ShiftRows:
.byte 0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3 CPU_LE( .byte 0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3 )
.byte 0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb CPU_LE( .byte 0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb )
CPU_BE( .byte 0xb, 0x6, 0x1, 0xc, 0x7, 0x2, 0xd, 0x8 )
CPU_BE( .byte 0x3, 0xe, 0x9, 0x4, 0xf, 0xa, 0x5, 0x0 )
.LReverse_ShiftRows: .LReverse_ShiftRows:
.byte 0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb CPU_LE( .byte 0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb )
.byte 0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3 CPU_LE( .byte 0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3 )
CPU_BE( .byte 0x3, 0x6, 0x9, 0xc, 0xf, 0x2, 0x5, 0x8 )
CPU_BE( .byte 0xb, 0xe, 0x1, 0x4, 0x7, 0xa, 0xd, 0x0 )
.LForward_Sbox: .LForward_Sbox:
.byte 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5 .byte 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5
......
/*
* Accelerated CRC32(C) using arm64 CRC, NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*
* Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
* calculation.
* CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
* PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
* at:
* http://www.intel.com/products/processor/manuals/
* Intel(R) 64 and IA-32 Architectures Software Developer's Manual
* Volume 2B: Instruction Set Reference, N-Z
*
* Authors: Gregory Prestas <Gregory_Prestas@us.xyratex.com>
* Alexander Boyko <Alexander_Boyko@xyratex.com>
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.text
.align 6
.cpu generic+crypto+crc
.Lcrc32_constants:
/*
* [x4*128+32 mod P(x) << 32)]' << 1 = 0x154442bd4
* #define CONSTANT_R1 0x154442bd4LL
*
* [(x4*128-32 mod P(x) << 32)]' << 1 = 0x1c6e41596
* #define CONSTANT_R2 0x1c6e41596LL
*/
.octa 0x00000001c6e415960000000154442bd4
/*
* [(x128+32 mod P(x) << 32)]' << 1 = 0x1751997d0
* #define CONSTANT_R3 0x1751997d0LL
*
* [(x128-32 mod P(x) << 32)]' << 1 = 0x0ccaa009e
* #define CONSTANT_R4 0x0ccaa009eLL
*/
.octa 0x00000000ccaa009e00000001751997d0
/*
* [(x64 mod P(x) << 32)]' << 1 = 0x163cd6124
* #define CONSTANT_R5 0x163cd6124LL
*/
.quad 0x0000000163cd6124
.quad 0x00000000FFFFFFFF
/*
* #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
*
* Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
* = 0x1F7011641LL
* #define CONSTANT_RU 0x1F7011641LL
*/
.octa 0x00000001F701164100000001DB710641
.Lcrc32c_constants:
.octa 0x000000009e4addf800000000740eef02
.octa 0x000000014cd00bd600000000f20c0dfe
.quad 0x00000000dd45aab8
.quad 0x00000000FFFFFFFF
.octa 0x00000000dea713f10000000105ec76f0
vCONSTANT .req v0
dCONSTANT .req d0
qCONSTANT .req q0
BUF .req x0
LEN .req x1
CRC .req x2
vzr .req v9
/**
* Calculate crc32
* BUF - buffer
* LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pmull_le(unsigned char const *buffer,
* size_t len, uint crc32)
*/
ENTRY(crc32_pmull_le)
adr x3, .Lcrc32_constants
b 0f
ENTRY(crc32c_pmull_le)
adr x3, .Lcrc32c_constants
0: bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi vzr.16b, #0
fmov dCONSTANT, CRC
eor v1.16b, v1.16b, vCONSTANT.16b
sub LEN, LEN, #0x40
cmp LEN, #0x40
b.lt less_64
ldr qCONSTANT, [x3]
loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull2 v6.1q, v2.2d, vCONSTANT.2d
pmull2 v7.1q, v3.2d, vCONSTANT.2d
pmull2 v8.1q, v4.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
pmull v2.1q, v2.1d, vCONSTANT.1d
pmull v3.1q, v3.1d, vCONSTANT.1d
pmull v4.1q, v4.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
ld1 {v5.16b}, [BUF], #0x10
eor v2.16b, v2.16b, v6.16b
ld1 {v6.16b}, [BUF], #0x10
eor v3.16b, v3.16b, v7.16b
ld1 {v7.16b}, [BUF], #0x10
eor v4.16b, v4.16b, v8.16b
ld1 {v8.16b}, [BUF], #0x10
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
eor v3.16b, v3.16b, v7.16b
eor v4.16b, v4.16b, v8.16b
cmp LEN, #0x40
b.ge loop_64
less_64: /* Folding cache line into 128bit */
ldr qCONSTANT, [x3, #16]
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v2.16b
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v3.16b
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v4.16b
cbz LEN, fold_64
loop_16: /* Folding rest buffer into 128bit */
subs LEN, LEN, #0x10
ld1 {v2.16b}, [BUF], #0x10
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v2.16b
b.ne loop_16
fold_64:
/* perform the last 64 bit fold, also adds 32 zeroes
* to the input stream */
ext v2.16b, v1.16b, v1.16b, #8
pmull2 v2.1q, v2.2d, vCONSTANT.2d
ext v1.16b, v1.16b, vzr.16b, #8
eor v1.16b, v1.16b, v2.16b
/* final 32-bit fold */
ldr dCONSTANT, [x3, #32]
ldr d3, [x3, #40]
ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v2.16b
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
ldr qCONSTANT, [x3, #48]
and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8
pmull2 v2.1q, v2.2d, vCONSTANT.2d
and v2.16b, v2.16b, v3.16b
pmull v2.1q, v2.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1]
ret
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)
.macro __crc32, c
0: subs x2, x2, #16
b.mi 8f
ldp x3, x4, [x1], #16
CPU_BE( rev x3, x3 )
CPU_BE( rev x4, x4 )
crc32\c\()x w0, w0, x3
crc32\c\()x w0, w0, x4
b.ne 0b
ret
8: tbz x2, #3, 4f
ldr x3, [x1], #8
CPU_BE( rev x3, x3 )
crc32\c\()x w0, w0, x3
4: tbz x2, #2, 2f
ldr w3, [x1], #4
CPU_BE( rev w3, w3 )
crc32\c\()w w0, w0, w3
2: tbz x2, #1, 1f
ldrh w3, [x1], #2
CPU_BE( rev16 w3, w3 )
crc32\c\()h w0, w0, w3
1: tbz x2, #0, 0f
ldrb w3, [x1]
crc32\c\()b w0, w0, w3
0: ret
.endm
.align 5
ENTRY(crc32_armv8_le)
__crc32
ENDPROC(crc32_armv8_le)
.align 5
ENTRY(crc32c_armv8_le)
__crc32 c
ENDPROC(crc32c_armv8_le)
/*
* Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/cpufeature.h>
#include <linux/crc32.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <crypto/internal/hash.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/unaligned.h>
#define PMULL_MIN_LEN 64L /* minimum size of buffer
* for crc32_pmull_le_16 */
#define SCALE_F 16L /* size of NEON register */
asmlinkage u32 crc32_pmull_le(const u8 buf[], u64 len, u32 init_crc);
asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], size_t len);
asmlinkage u32 crc32c_pmull_le(const u8 buf[], u64 len, u32 init_crc);
asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], size_t len);
static u32 (*fallback_crc32)(u32 init_crc, const u8 buf[], size_t len);
static u32 (*fallback_crc32c)(u32 init_crc, const u8 buf[], size_t len);
static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = 0;
return 0;
}
static int crc32c_pmull_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = ~0;
return 0;
}
static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
unsigned int keylen)
{
u32 *mctx = crypto_shash_ctx(hash);
if (keylen != sizeof(u32)) {
crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
*mctx = le32_to_cpup((__le32 *)key);
return 0;
}
static int crc32_pmull_init(struct shash_desc *desc)
{
u32 *mctx = crypto_shash_ctx(desc->tfm);
u32 *crc = shash_desc_ctx(desc);
*crc = *mctx;
return 0;
}
static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if ((u64)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
*crc = fallback_crc32(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN) {
l = round_down(length, SCALE_F);
kernel_neon_begin_partial(10);
*crc = crc32_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
if (length > 0)
*crc = fallback_crc32(*crc, data, length);
return 0;
}
static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if ((u64)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
*crc = fallback_crc32c(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN) {
l = round_down(length, SCALE_F);
kernel_neon_begin_partial(10);
*crc = crc32c_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
if (length > 0) {
*crc = fallback_crc32c(*crc, data, length);
}
return 0;
}
static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(*crc, out);
return 0;
}
static int crc32c_pmull_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(~*crc, out);
return 0;
}
static struct shash_alg crc32_pmull_algs[] = { {
.setkey = crc32_pmull_setkey,
.init = crc32_pmull_init,
.update = crc32_pmull_update,
.final = crc32_pmull_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32_pmull_cra_init,
.base.cra_name = "crc32",
.base.cra_driver_name = "crc32-arm64-ce",
.base.cra_priority = 200,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
}, {
.setkey = crc32_pmull_setkey,
.init = crc32_pmull_init,
.update = crc32c_pmull_update,
.final = crc32c_pmull_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32c_pmull_cra_init,
.base.cra_name = "crc32c",
.base.cra_driver_name = "crc32c-arm64-ce",
.base.cra_priority = 200,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
} };
static int __init crc32_pmull_mod_init(void)
{
if (elf_hwcap & HWCAP_CRC32) {
fallback_crc32 = crc32_armv8_le;
fallback_crc32c = crc32c_armv8_le;
} else {
fallback_crc32 = crc32_le;
fallback_crc32c = __crc32c_le;
}
return crypto_register_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
static void __exit crc32_pmull_mod_exit(void)
{
crypto_unregister_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
module_cpu_feature_match(PMULL, crc32_pmull_mod_init);
module_exit(crc32_pmull_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
This diff is collapsed.
/*
* Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/cpufeature.h>
#include <linux/crc-t10dif.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <crypto/internal/hash.h>
#include <asm/neon.h>
#define CRC_T10DIF_PMULL_CHUNK_SIZE 16U
asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
static int crct10dif_init(struct shash_desc *desc)
{
u16 *crc = shash_desc_ctx(desc);
*crc = 0;
return 0;
}
static int crct10dif_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u16 *crc = shash_desc_ctx(desc);
unsigned int l;
if (unlikely((u64)data % CRC_T10DIF_PMULL_CHUNK_SIZE)) {
l = min_t(u32, length, CRC_T10DIF_PMULL_CHUNK_SIZE -
((u64)data % CRC_T10DIF_PMULL_CHUNK_SIZE));
*crc = crc_t10dif_generic(*crc, data, l);
length -= l;
data += l;
}
if (length > 0) {
kernel_neon_begin_partial(14);
*crc = crc_t10dif_pmull(*crc, data, length);
kernel_neon_end();
}
return 0;
}
static int crct10dif_final(struct shash_desc *desc, u8 *out)
{
u16 *crc = shash_desc_ctx(desc);
*(u16 *)out = *crc;
return 0;
}
static struct shash_alg crc_t10dif_alg = {
.digestsize = CRC_T10DIF_DIGEST_SIZE,
.init = crct10dif_init,
.update = crct10dif_update,
.final = crct10dif_final,
.descsize = CRC_T10DIF_DIGEST_SIZE,
.base.cra_name = "crct10dif",
.base.cra_driver_name = "crct10dif-arm64-ce",
.base.cra_priority = 200,
.base.cra_blocksize = CRC_T10DIF_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
};
static int __init crc_t10dif_mod_init(void)
{
return crypto_register_shash(&crc_t10dif_alg);
}
static void __exit crc_t10dif_mod_exit(void)
{
crypto_unregister_shash(&crc_t10dif_alg);
}
module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
module_exit(crc_t10dif_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
...@@ -29,8 +29,8 @@ ...@@ -29,8 +29,8 @@
* struct ghash_key const *k, const char *head) * struct ghash_key const *k, const char *head)
*/ */
ENTRY(pmull_ghash_update) ENTRY(pmull_ghash_update)
ld1 {SHASH.16b}, [x3] ld1 {SHASH.2d}, [x3]
ld1 {XL.16b}, [x1] ld1 {XL.2d}, [x1]
movi MASK.16b, #0xe1 movi MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
shl MASK.2d, MASK.2d, #57 shl MASK.2d, MASK.2d, #57
...@@ -74,6 +74,6 @@ CPU_LE( rev64 T1.16b, T1.16b ) ...@@ -74,6 +74,6 @@ CPU_LE( rev64 T1.16b, T1.16b )
cbnz w0, 0b cbnz w0, 0b
st1 {XL.16b}, [x1] st1 {XL.2d}, [x1]
ret ret
ENDPROC(pmull_ghash_update) ENDPROC(pmull_ghash_update)
...@@ -78,7 +78,7 @@ ENTRY(sha1_ce_transform) ...@@ -78,7 +78,7 @@ ENTRY(sha1_ce_transform)
ld1r {k3.4s}, [x6] ld1r {k3.4s}, [x6]
/* load state */ /* load state */
ldr dga, [x0] ld1 {dgav.4s}, [x0]
ldr dgb, [x0, #16] ldr dgb, [x0, #16]
/* load sha1_ce_state::finalize */ /* load sha1_ce_state::finalize */
...@@ -144,7 +144,7 @@ CPU_LE( rev32 v11.16b, v11.16b ) ...@@ -144,7 +144,7 @@ CPU_LE( rev32 v11.16b, v11.16b )
b 1b b 1b
/* store new state */ /* store new state */
3: str dga, [x0] 3: st1 {dgav.4s}, [x0]
str dgb, [x0, #16] str dgb, [x0, #16]
ret ret
ENDPROC(sha1_ce_transform) ENDPROC(sha1_ce_transform)
...@@ -85,7 +85,7 @@ ENTRY(sha2_ce_transform) ...@@ -85,7 +85,7 @@ ENTRY(sha2_ce_transform)
ld1 {v12.4s-v15.4s}, [x8] ld1 {v12.4s-v15.4s}, [x8]
/* load state */ /* load state */
ldp dga, dgb, [x0] ld1 {dgav.4s, dgbv.4s}, [x0]
/* load sha256_ce_state::finalize */ /* load sha256_ce_state::finalize */
ldr w4, [x0, #:lo12:sha256_ce_offsetof_finalize] ldr w4, [x0, #:lo12:sha256_ce_offsetof_finalize]
...@@ -148,6 +148,6 @@ CPU_LE( rev32 v19.16b, v19.16b ) ...@@ -148,6 +148,6 @@ CPU_LE( rev32 v19.16b, v19.16b )
b 1b b 1b
/* store new state */ /* store new state */
3: stp dga, dgb, [x0] 3: st1 {dgav.4s, dgbv.4s}, [x0]
ret ret
ENDPROC(sha2_ce_transform) ENDPROC(sha2_ce_transform)
This diff is collapsed.
/*
* Linux/arm64 port of the OpenSSL SHA256 implementation for AArch64
*
* Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/internal/hash.h>
#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <linux/string.h>
MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash for arm64");
MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("sha224");
MODULE_ALIAS_CRYPTO("sha256");
asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
unsigned int num_blks);
asmlinkage void sha256_block_neon(u32 *digest, const void *data,
unsigned int num_blks);
static int sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
}
static int sha256_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
return sha256_base_finish(desc, out);
}
static int sha256_final(struct shash_desc *desc, u8 *out)
{
return sha256_finup(desc, NULL, 0, out);
}
static struct shash_alg algs[] = { {
.digestsize = SHA256_DIGEST_SIZE,
.init = sha256_base_init,
.update = sha256_update,
.final = sha256_final,
.finup = sha256_finup,
.descsize = sizeof(struct sha256_state),
.base.cra_name = "sha256",
.base.cra_driver_name = "sha256-arm64",
.base.cra_priority = 100,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA256_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
}, {
.digestsize = SHA224_DIGEST_SIZE,
.init = sha224_base_init,
.update = sha256_update,
.final = sha256_final,
.finup = sha256_finup,
.descsize = sizeof(struct sha256_state),
.base.cra_name = "sha224",
.base.cra_driver_name = "sha224-arm64",
.base.cra_priority = 100,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA224_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
} };
static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/*
* Stacking and unstacking a substantial slice of the NEON register
* file may significantly affect performance for small updates when
* executing in interrupt context, so fall back to the scalar code
* in that case.
*/
if (!may_use_simd())
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_neon);
kernel_neon_end();
return 0;
}
static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (!may_use_simd()) {
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
} else {
kernel_neon_begin();
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_neon);
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_neon);
kernel_neon_end();
}
return sha256_base_finish(desc, out);
}
static int sha256_final_neon(struct shash_desc *desc, u8 *out)
{
return sha256_finup_neon(desc, NULL, 0, out);
}
static struct shash_alg neon_algs[] = { {
.digestsize = SHA256_DIGEST_SIZE,
.init = sha256_base_init,
.update = sha256_update_neon,
.final = sha256_final_neon,
.finup = sha256_finup_neon,
.descsize = sizeof(struct sha256_state),
.base.cra_name = "sha256",
.base.cra_driver_name = "sha256-arm64-neon",
.base.cra_priority = 150,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA256_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
}, {
.digestsize = SHA224_DIGEST_SIZE,
.init = sha224_base_init,
.update = sha256_update_neon,
.final = sha256_final_neon,
.finup = sha256_finup_neon,
.descsize = sizeof(struct sha256_state),
.base.cra_name = "sha224",
.base.cra_driver_name = "sha224-arm64-neon",
.base.cra_priority = 150,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA224_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
} };
static int __init sha256_mod_init(void)
{
int ret = crypto_register_shashes(algs, ARRAY_SIZE(algs));
if (ret)
return ret;
if (elf_hwcap & HWCAP_ASIMD) {
ret = crypto_register_shashes(neon_algs, ARRAY_SIZE(neon_algs));
if (ret)
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
}
return ret;
}
static void __exit sha256_mod_fini(void)
{
if (elf_hwcap & HWCAP_ASIMD)
crypto_unregister_shashes(neon_algs, ARRAY_SIZE(neon_algs));
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
}
module_init(sha256_mod_init);
module_exit(sha256_mod_fini);
This diff is collapsed.
This diff is collapsed.
/*
* Linux/arm64 port of the OpenSSL SHA512 implementation for AArch64
*
* Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
#include <crypto/internal/hash.h>
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <linux/string.h>
#include <crypto/sha.h>
#include <crypto/sha512_base.h>
#include <asm/neon.h>
MODULE_DESCRIPTION("SHA-384/SHA-512 secure hash for arm64");
MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("sha384");
MODULE_ALIAS_CRYPTO("sha512");
asmlinkage void sha512_block_data_order(u32 *digest, const void *data,
unsigned int num_blks);
static int sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
return sha512_base_do_update(desc, data, len,
(sha512_block_fn *)sha512_block_data_order);
}
static int sha512_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (len)
sha512_base_do_update(desc, data, len,
(sha512_block_fn *)sha512_block_data_order);
sha512_base_do_finalize(desc,
(sha512_block_fn *)sha512_block_data_order);
return sha512_base_finish(desc, out);
}
static int sha512_final(struct shash_desc *desc, u8 *out)
{
return sha512_finup(desc, NULL, 0, out);
}
static struct shash_alg algs[] = { {
.digestsize = SHA512_DIGEST_SIZE,
.init = sha512_base_init,
.update = sha512_update,
.final = sha512_final,
.finup = sha512_finup,
.descsize = sizeof(struct sha512_state),
.base.cra_name = "sha512",
.base.cra_driver_name = "sha512-arm64",
.base.cra_priority = 150,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA512_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
}, {
.digestsize = SHA384_DIGEST_SIZE,
.init = sha384_base_init,
.update = sha512_update,
.final = sha512_final,
.finup = sha512_finup,
.descsize = sizeof(struct sha512_state),
.base.cra_name = "sha384",
.base.cra_driver_name = "sha384-arm64",
.base.cra_priority = 150,
.base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.base.cra_blocksize = SHA384_BLOCK_SIZE,
.base.cra_module = THIS_MODULE,
} };
static int __init sha512_mod_init(void)
{
return crypto_register_shashes(algs, ARRAY_SIZE(algs));
}
static void __exit sha512_mod_fini(void)
{
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
}
module_init(sha512_mod_init);
module_exit(sha512_mod_fini);
...@@ -9,7 +9,7 @@ obj-$(CONFIG_CRYPTO_MD5_PPC) += md5-ppc.o ...@@ -9,7 +9,7 @@ obj-$(CONFIG_CRYPTO_MD5_PPC) += md5-ppc.o
obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
obj-$(CONFIG_CRYPT_CRC32C_VPMSUM) += crc32c-vpmsum.o obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
md5-ppc-y := md5-asm.o md5-glue.o md5-ppc-y := md5-asm.o md5-glue.o
......
This diff is collapsed.
...@@ -11,143 +11,186 @@ ...@@ -11,143 +11,186 @@
* *
*/ */
#include <crypto/algapi.h> #include <crypto/internal/skcipher.h>
#include <linux/err.h> #include <linux/err.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/crypto.h>
#include <asm/fpu/api.h> #include <asm/fpu/api.h>
struct crypto_fpu_ctx { struct crypto_fpu_ctx {
struct crypto_blkcipher *child; struct crypto_skcipher *child;
}; };
static int crypto_fpu_setkey(struct crypto_tfm *parent, const u8 *key, static int crypto_fpu_setkey(struct crypto_skcipher *parent, const u8 *key,
unsigned int keylen) unsigned int keylen)
{ {
struct crypto_fpu_ctx *ctx = crypto_tfm_ctx(parent); struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(parent);
struct crypto_blkcipher *child = ctx->child; struct crypto_skcipher *child = ctx->child;
int err; int err;
crypto_blkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK); crypto_skcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
crypto_blkcipher_set_flags(child, crypto_tfm_get_flags(parent) & crypto_skcipher_set_flags(child, crypto_skcipher_get_flags(parent) &
CRYPTO_TFM_REQ_MASK); CRYPTO_TFM_REQ_MASK);
err = crypto_blkcipher_setkey(child, key, keylen); err = crypto_skcipher_setkey(child, key, keylen);
crypto_tfm_set_flags(parent, crypto_blkcipher_get_flags(child) & crypto_skcipher_set_flags(parent, crypto_skcipher_get_flags(child) &
CRYPTO_TFM_RES_MASK); CRYPTO_TFM_RES_MASK);
return err; return err;
} }
static int crypto_fpu_encrypt(struct blkcipher_desc *desc_in, static int crypto_fpu_encrypt(struct skcipher_request *req)
struct scatterlist *dst, struct scatterlist *src,
unsigned int nbytes)
{ {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;
SKCIPHER_REQUEST_ON_STACK(subreq, child);
int err; int err;
struct crypto_fpu_ctx *ctx = crypto_blkcipher_ctx(desc_in->tfm);
struct crypto_blkcipher *child = ctx->child; skcipher_request_set_tfm(subreq, child);
struct blkcipher_desc desc = { skcipher_request_set_callback(subreq, 0, NULL, NULL);
.tfm = child, skcipher_request_set_crypt(subreq, req->src, req->dst, req->cryptlen,
.info = desc_in->info, req->iv);
.flags = desc_in->flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
};
kernel_fpu_begin(); kernel_fpu_begin();
err = crypto_blkcipher_crt(desc.tfm)->encrypt(&desc, dst, src, nbytes); err = crypto_skcipher_encrypt(subreq);
kernel_fpu_end(); kernel_fpu_end();
skcipher_request_zero(subreq);
return err; return err;
} }
static int crypto_fpu_decrypt(struct blkcipher_desc *desc_in, static int crypto_fpu_decrypt(struct skcipher_request *req)
struct scatterlist *dst, struct scatterlist *src,
unsigned int nbytes)
{ {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;
SKCIPHER_REQUEST_ON_STACK(subreq, child);
int err; int err;
struct crypto_fpu_ctx *ctx = crypto_blkcipher_ctx(desc_in->tfm);
struct crypto_blkcipher *child = ctx->child; skcipher_request_set_tfm(subreq, child);
struct blkcipher_desc desc = { skcipher_request_set_callback(subreq, 0, NULL, NULL);
.tfm = child, skcipher_request_set_crypt(subreq, req->src, req->dst, req->cryptlen,
.info = desc_in->info, req->iv);
.flags = desc_in->flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
};
kernel_fpu_begin(); kernel_fpu_begin();
err = crypto_blkcipher_crt(desc.tfm)->decrypt(&desc, dst, src, nbytes); err = crypto_skcipher_decrypt(subreq);
kernel_fpu_end(); kernel_fpu_end();
skcipher_request_zero(subreq);
return err; return err;
} }
static int crypto_fpu_init_tfm(struct crypto_tfm *tfm) static int crypto_fpu_init_tfm(struct crypto_skcipher *tfm)
{ {
struct crypto_instance *inst = crypto_tfm_alg_instance(tfm); struct skcipher_instance *inst = skcipher_alg_instance(tfm);
struct crypto_spawn *spawn = crypto_instance_ctx(inst); struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_fpu_ctx *ctx = crypto_tfm_ctx(tfm); struct crypto_skcipher_spawn *spawn;
struct crypto_blkcipher *cipher; struct crypto_skcipher *cipher;
cipher = crypto_spawn_blkcipher(spawn); spawn = skcipher_instance_ctx(inst);
cipher = crypto_spawn_skcipher(spawn);
if (IS_ERR(cipher)) if (IS_ERR(cipher))
return PTR_ERR(cipher); return PTR_ERR(cipher);
ctx->child = cipher; ctx->child = cipher;
return 0; return 0;
} }
static void crypto_fpu_exit_tfm(struct crypto_tfm *tfm) static void crypto_fpu_exit_tfm(struct crypto_skcipher *tfm)
{
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
crypto_free_skcipher(ctx->child);
}
static void crypto_fpu_free(struct skcipher_instance *inst)
{ {
struct crypto_fpu_ctx *ctx = crypto_tfm_ctx(tfm); crypto_drop_skcipher(skcipher_instance_ctx(inst));
crypto_free_blkcipher(ctx->child); kfree(inst);
} }
static struct crypto_instance *crypto_fpu_alloc(struct rtattr **tb) static int crypto_fpu_create(struct crypto_template *tmpl, struct rtattr **tb)
{ {
struct crypto_instance *inst; struct crypto_skcipher_spawn *spawn;
struct crypto_alg *alg; struct skcipher_instance *inst;
struct crypto_attr_type *algt;
struct skcipher_alg *alg;
const char *cipher_name;
int err; int err;
err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_BLKCIPHER); algt = crypto_get_attr_type(tb);
if (IS_ERR(algt))
return PTR_ERR(algt);
if ((algt->type ^ (CRYPTO_ALG_INTERNAL | CRYPTO_ALG_TYPE_SKCIPHER)) &
algt->mask)
return -EINVAL;
if (!(algt->mask & CRYPTO_ALG_INTERNAL))
return -EINVAL;
cipher_name = crypto_attr_alg_name(tb[1]);
if (IS_ERR(cipher_name))
return PTR_ERR(cipher_name);
inst = kzalloc(sizeof(*inst) + sizeof(*spawn), GFP_KERNEL);
if (!inst)
return -ENOMEM;
spawn = skcipher_instance_ctx(inst);
crypto_set_skcipher_spawn(spawn, skcipher_crypto_instance(inst));
err = crypto_grab_skcipher(spawn, cipher_name, CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL | CRYPTO_ALG_ASYNC);
if (err) if (err)
return ERR_PTR(err); goto out_free_inst;
alg = crypto_get_attr_alg(tb, CRYPTO_ALG_TYPE_BLKCIPHER,
CRYPTO_ALG_TYPE_MASK);
if (IS_ERR(alg))
return ERR_CAST(alg);
inst = crypto_alloc_instance("fpu", alg);
if (IS_ERR(inst))
goto out_put_alg;
inst->alg.cra_flags = alg->cra_flags;
inst->alg.cra_priority = alg->cra_priority;
inst->alg.cra_blocksize = alg->cra_blocksize;
inst->alg.cra_alignmask = alg->cra_alignmask;
inst->alg.cra_type = alg->cra_type;
inst->alg.cra_blkcipher.ivsize = alg->cra_blkcipher.ivsize;
inst->alg.cra_blkcipher.min_keysize = alg->cra_blkcipher.min_keysize;
inst->alg.cra_blkcipher.max_keysize = alg->cra_blkcipher.max_keysize;
inst->alg.cra_ctxsize = sizeof(struct crypto_fpu_ctx);
inst->alg.cra_init = crypto_fpu_init_tfm;
inst->alg.cra_exit = crypto_fpu_exit_tfm;
inst->alg.cra_blkcipher.setkey = crypto_fpu_setkey;
inst->alg.cra_blkcipher.encrypt = crypto_fpu_encrypt;
inst->alg.cra_blkcipher.decrypt = crypto_fpu_decrypt;
out_put_alg:
crypto_mod_put(alg);
return inst;
}
static void crypto_fpu_free(struct crypto_instance *inst) alg = crypto_skcipher_spawn_alg(spawn);
{
crypto_drop_spawn(crypto_instance_ctx(inst)); err = crypto_inst_setname(skcipher_crypto_instance(inst), "fpu",
&alg->base);
if (err)
goto out_drop_skcipher;
inst->alg.base.cra_flags = CRYPTO_ALG_INTERNAL;
inst->alg.base.cra_priority = alg->base.cra_priority;
inst->alg.base.cra_blocksize = alg->base.cra_blocksize;
inst->alg.base.cra_alignmask = alg->base.cra_alignmask;
inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
inst->alg.base.cra_ctxsize = sizeof(struct crypto_fpu_ctx);
inst->alg.init = crypto_fpu_init_tfm;
inst->alg.exit = crypto_fpu_exit_tfm;
inst->alg.setkey = crypto_fpu_setkey;
inst->alg.encrypt = crypto_fpu_encrypt;
inst->alg.decrypt = crypto_fpu_decrypt;
inst->free = crypto_fpu_free;
err = skcipher_register_instance(tmpl, inst);
if (err)
goto out_drop_skcipher;
out:
return err;
out_drop_skcipher:
crypto_drop_skcipher(spawn);
out_free_inst:
kfree(inst); kfree(inst);
goto out;
} }
static struct crypto_template crypto_fpu_tmpl = { static struct crypto_template crypto_fpu_tmpl = {
.name = "fpu", .name = "fpu",
.alloc = crypto_fpu_alloc, .create = crypto_fpu_create,
.free = crypto_fpu_free,
.module = THIS_MODULE, .module = THIS_MODULE,
}; };
......
This diff is collapsed.
...@@ -114,7 +114,7 @@ static inline void sha1_init_digest(uint32_t *digest) ...@@ -114,7 +114,7 @@ static inline void sha1_init_digest(uint32_t *digest)
} }
static inline uint32_t sha1_pad(uint8_t padblock[SHA1_BLOCK_SIZE * 2], static inline uint32_t sha1_pad(uint8_t padblock[SHA1_BLOCK_SIZE * 2],
uint32_t total_len) uint64_t total_len)
{ {
uint32_t i = total_len & (SHA1_BLOCK_SIZE - 1); uint32_t i = total_len & (SHA1_BLOCK_SIZE - 1);
......
...@@ -125,7 +125,7 @@ struct sha1_hash_ctx { ...@@ -125,7 +125,7 @@ struct sha1_hash_ctx {
/* error flag */ /* error flag */
int error; int error;
uint32_t total_length; uint64_t total_length;
const void *incoming_buffer; const void *incoming_buffer;
uint32_t incoming_buffer_length; uint32_t incoming_buffer_length;
uint8_t partial_block_buffer[SHA1_BLOCK_SIZE * 2]; uint8_t partial_block_buffer[SHA1_BLOCK_SIZE * 2];
......
...@@ -115,7 +115,7 @@ inline void sha256_init_digest(uint32_t *digest) ...@@ -115,7 +115,7 @@ inline void sha256_init_digest(uint32_t *digest)
} }
inline uint32_t sha256_pad(uint8_t padblock[SHA256_BLOCK_SIZE * 2], inline uint32_t sha256_pad(uint8_t padblock[SHA256_BLOCK_SIZE * 2],
uint32_t total_len) uint64_t total_len)
{ {
uint32_t i = total_len & (SHA256_BLOCK_SIZE - 1); uint32_t i = total_len & (SHA256_BLOCK_SIZE - 1);
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment