    crypto: x86/crct10dif-pcl - cleanup and optimizations · 0974037f
    Eric Biggers authored
    The x86, arm, and arm64 asm implementations of crct10dif are very
    difficult to understand, partly because many of the comments, labels,
    and macro names are wrong: the lengths they mention are usually off by
    a factor of two from the actual code.  Many other things are
    unnecessarily convoluted as well, e.g. there are many more fold
    constants than actually needed and some aren't fully reduced.
    
    This series therefore cleans up all these implementations to be much
    more maintainable.  I also made some small optimizations where I saw
    opportunities, resulting in slightly better performance.
    
    This patch cleans up the x86 version.
    
    As part of this, I removed support for len < 16 from the x86 assembly;
    now the glue code falls back to the generic table-based implementation
    in this case.  Due to the overhead of kernel_fpu_begin(), this actually
    significantly improves performance on these lengths.  (And even if
    kernel_fpu_begin() were free, the generic code is still faster for about
    len < 11.)  This removal also eliminates error-prone special cases and
    makes the x86, arm32, and arm64 ports of the code match more closely.
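
The length-based fallback described above can be sketched in userspace C. This is an illustration only, not the kernel's glue code: the names `t10dif_init_table`, `crc_t10dif_table`, `crc_t10dif_simd_stub`, `crc_t10dif_update`, and `SIMD_MIN_LEN` are hypothetical, and the SIMD path is replaced by a stand-in that reuses the table routine. It assumes the standard CRC-T10DIF parameters (polynomial 0x8bb7, zero initial value, no bit reflection).

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY  0x8bb7  /* CRC-T10DIF generator polynomial */
#define SIMD_MIN_LEN 16      /* hypothetical name for the len < 16 cutoff */

static uint16_t t10dif_table[256];

/* Build the 256-entry lookup table for the byte-at-a-time (MSB-first) CRC. */
static void t10dif_init_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint16_t crc = (uint16_t)(i << 8);

        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
        t10dif_table[i] = crc;
    }
}

/* Generic table-based implementation: one table lookup per input byte. */
static uint16_t crc_t10dif_table(uint16_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ t10dif_table[(crc >> 8) ^ buf[i]]);
    return crc;
}

/*
 * Stand-in for the PCLMULQDQ assembly routine (assumption: the real one is
 * only ever called with len >= 16 and is wrapped in kernel_fpu_begin() /
 * kernel_fpu_end()).  Here it simply reuses the table code so the sketch
 * stays runnable anywhere.
 */
static uint16_t crc_t10dif_simd_stub(uint16_t crc, const uint8_t *buf,
                                     size_t len)
{
    return crc_t10dif_table(crc, buf, len);
}

/* Glue-style dispatch: short inputs never enter the SIMD path. */
static uint16_t crc_t10dif_update(uint16_t crc, const uint8_t *buf, size_t len)
{
    if (len >= SIMD_MIN_LEN)
        return crc_t10dif_simd_stub(crc, buf, len);
    return crc_t10dif_table(crc, buf, len);
}
```

Call `t10dif_init_table()` once before the first `crc_t10dif_update()`. Keeping the cutoff in the glue layer means the accelerated routine needs no special cases for short buffers, which is the simplification the message describes.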
    
    Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crct10dif-pcl-asm_64.S 10.9 KB