• Jason A. Donenfeld's avatar
    crypto: x86/curve25519 - replace with formally verified implementation · 07b586fe
    Jason A. Donenfeld authored
    This comes from INRIA's HACL*/Vale. It implements the same algorithm and
    implementation strategy as the code it replaces, only this code has been
    formally verified, sans the base point multiplication, which uses code
    similar to prior, only it uses the formally verified field arithmetic
    alongside reproducable ladder generation steps. This doesn't have a
    pure-bmi2 version, which means haswell no longer benefits, but the
    increased (doubled) code complexity is not worth it for a single
    generation of chips that's already old.
    
    Performance-wise, this is around 1% slower on older microarchitectures,
    and slightly faster on newer microarchitectures, mainly 10nm ones or
    backports of 10nm to 14nm. This implementation is "everest" below:
    
    Xeon E5-2680 v4 (Broadwell)
    
         armfazh: 133340 cycles per call
         everest: 133436 cycles per call
    
    Xeon Gold 5120 (Sky Lake Server)
    
         armfazh: 112636 cycles per call
         everest: 113906 cycles per call
    
    Core i5-6300U (Sky Lake Client)
    
         armfazh: 116810 cycles per call
         everest: 117916 cycles per call
    
    Core i7-7600U (Kaby Lake)
    
         armfazh: 119523 cycles per call
         everest: 119040 cycles per call
    
    Core i7-8750H (Coffee Lake)
    
         armfazh: 113914 cycles per call
         everest: 113650 cycles per call
    
    Core i9-9880H (Coffee Lake Refresh)
    
         armfazh: 112616 cycles per call
         everest: 114082 cycles per call
    
    Core i3-8121U (Cannon Lake)
    
         armfazh: 113202 cycles per call
         everest: 111382 cycles per call
    
    Core i7-8265U (Whiskey Lake)
    
         armfazh: 127307 cycles per call
         everest: 127697 cycles per call
    
    Core i7-8550U (Kaby Lake Refresh)
    
         armfazh: 127522 cycles per call
         everest: 127083 cycles per call
    
    Xeon Platinum 8275CL (Cascade Lake)
    
         armfazh: 114380 cycles per call
         everest: 114656 cycles per call
    
    Achieving these kind of results with formally verified code is quite
    remarkable, especialy considering that performance is favorable for
    newer chips.
    Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
    07b586fe
curve25519-x86_64.c 60.6 KB