    crypto: cast5-avx - tune assembler code for more performance · ddaea786
    Jussi Kivilinna authored
    This patch replaces 'movb' instructions with 'movzbl' to break false
    register dependencies, interleaves instructions better for out-of-order
    scheduling, and merges the constant 16-bit rotation with the round-key
    variable rotation.
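
The rotation merge described above rests on a simple identity: rotating a 32-bit word by a round-key amount and then by a constant 16 bits gives the same result as one rotation by the combined amount. The following is a minimal C sketch of that arithmetic only, not the patch's AVX code. The names `rol32`, `rotate_two_step`, `rotate_merged`, and `check_merged_rotation` are hypothetical, and the sketch assumes the round-key rotation `kr` is a 5-bit value, as in CAST5.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is reduced modulo 32). */
uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Two-step form: variable round-key rotation, then a constant 16-bit one. */
uint32_t rotate_two_step(uint32_t x, unsigned kr)
{
    return rol32(rol32(x, kr & 31), 16);
}

/*
 * Merged form: fold the constant into the rotation amount once.
 * Because kr is a 5-bit value, kr ^ 16 equals (kr + 16) mod 32,
 * so a single rotation replaces the two above.
 */
uint32_t rotate_merged(uint32_t x, unsigned kr)
{
    return rol32(x, (kr & 31) ^ 16);
}

/* Asserts that both forms agree; returns the number of pairs checked. */
unsigned check_merged_rotation(void)
{
    static const uint32_t samples[] = {
        0x00000001u, 0x80000000u, 0x12345678u, 0xdeadbeefu
    };
    unsigned checked = 0;

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        for (unsigned kr = 0; kr < 32; kr++) {
            assert(rotate_two_step(samples[i], kr) ==
                   rotate_merged(samples[i], kr));
            checked++;
        }
    }
    return checked;
}
```

Folding the constant into the key-dependent amount ahead of time means the per-round work needs one rotation rather than two, which is the kind of saving the description refers to.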
    
    tcrypt ECB results (128-bit key):
    
    Intel Core i5-2450M:
    
    size    old-vs-new      new-vs-generic  old-vs-generic
            enc     dec     enc     dec     enc     dec
    256     1.18x   1.18x   2.45x   2.47x   2.08x   2.10x
    1k      1.20x   1.20x   2.73x   2.73x   2.28x   2.28x
    8k      1.20x   1.19x   2.73x   2.73x   2.28x   2.29x
    
    [v2]
     - Do instruction interleaving another way to avoid adding new FPU<=>CPU
       register moves, as these cause a performance drop on Bulldozer.
     - Improvements to round-key variable rotation handling.
     - Further interleaving improvements for better out-of-order scheduling.
    
    Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cast5-avx-x86_64-asm_64.S 8.94 KB