    md/raid6: use faster multiplication for ARM NEON delta syndrome · 35129dde
    Authored by Ard Biesheuvel
    The P/Q left side optimization in the delta syndrome simply involves
    repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
    that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
    can accelerate this substantially by performing up to 4 such operations
    at once, using the NEON instructions for polynomial multiplication.
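    The following is a scalar C model of a single byte lane that illustrates the x^4
    trick. It is a sketch, not the kernel code. It assumes the RAID-6 field polynomial
    x^8 + x^4 + x^3 + x^2 + 1 (0x11d), so that x^8 reduces to 0x1d. The helpers
    clmul8(), gf_mul_x(), gf_mul_xk() and mul_xk_matches_repeated() are hypothetical.
    clmul8() stands in for one lane of NEON's 8-bit polynomial multiply (the vmulq_p8
    intrinsic). Multiplying by x^k splits the byte into the bits that stay in range
    (v << k) and the k bits shifted out. For k <= 4, the shifted-out part has degree
    at most 3, so multiplying it by 0x1d (degree 4) gives degree at most 7. One 8-bit
    carry-less multiply therefore completes the reduction, which is why at most 4
    steps can be merged.

```c
#include <stdint.h>

/* Assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d):
 * x^8 is congruent to 0x1d, the low byte of the polynomial. */
#define GF_POLY_LOW 0x1d

/* Hypothetical scalar stand-in for one lane of NEON's 8-bit polynomial
 * multiply: carry-less (XOR) product, truncated to the low 8 bits. */
static uint8_t clmul8(uint8_t a, uint8_t b)
{
	uint8_t r = 0;
	for (int i = 0; i < 8; i++)
		if (b & (1u << i))
			r ^= (uint8_t)(a << i);
	return r;
}

/* Baseline: multiply by x once (shift left, reduce if bit 7 fell out). */
static uint8_t gf_mul_x(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? GF_POLY_LOW : 0));
}

/* Multiply by x^k (1 <= k <= 4) in one step. The k bits shifted out
 * form a polynomial of degree < k; times 0x1d it stays below degree 8,
 * so a single truncating carry-less multiply is an exact reduction. */
static uint8_t gf_mul_xk(uint8_t v, unsigned k)
{
	uint8_t lo = (uint8_t)(v << k);
	uint8_t hi = (uint8_t)(v >> (8 - k));
	return lo ^ clmul8(hi, GF_POLY_LOW);
}

/* Returns 1 if gf_mul_xk() agrees with k repeated gf_mul_x() steps
 * for every byte value and every k in 1..4, else 0. */
static int mul_xk_matches_repeated(void)
{
	for (unsigned v = 0; v < 256; v++) {
		uint8_t slow = (uint8_t)v;
		for (unsigned k = 1; k <= 4; k++) {
			slow = gf_mul_x(slow); /* slow now holds v * x^k */
			if (gf_mul_xk((uint8_t)v, k) != slow)
				return 0;
		}
	}
	return 1;
}
```

    For example, 0x80 times x^4 is x^11 = x^3 * x^8, which reduces to 0x1d << 3 = 0xe8,
    the same value that four single-step multiplications produce
    (0x80, 0x1d, 0x3a, 0x74, 0xe8).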
    
    Results on a Cortex-A57 running in 64-bit mode:
    
      Before:
      -------
      raid6: neonx1   xor()  1680 MB/s
      raid6: neonx2   xor()  2286 MB/s
      raid6: neonx4   xor()  3162 MB/s
      raid6: neonx8   xor()  3389 MB/s
    
      After:
      ------
      raid6: neonx1   xor()  2281 MB/s
      raid6: neonx2   xor()  3362 MB/s
      raid6: neonx4   xor()  3787 MB/s
      raid6: neonx8   xor()  4239 MB/s
    
    While we're at it, simplify MASK() by using a signed shift rather than
    a vector compare involving a temp register.
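    The MASK() change can be modelled the same way. This is again a scalar sketch of
    one lane, not the actual neon.uc code. mask_compare() mimics a signed compare
    against a zero vector, which needs a register holding zero. mask_shift() instead
    replicates the sign bit with an arithmetic right shift by 7. NEON offers this
    directly as a signed shift by immediate, for example the vshrq_n_s8 intrinsic.
    mask_compare(), mask_shift() and gf_mul_x_masked() are hypothetical names. The
    scalar version assumes that the compiler converts bytes >= 0x80 to negative int8_t
    values and implements >> on negative values arithmetically. GCC and Clang both
    behave this way, but ISO C leaves each implementation-defined.

```c
#include <stdint.h>

/* Compare-style mask: a byte is "less than zero" as a signed value
 * exactly when its top bit is set; yield 0xff then, else 0x00. */
static uint8_t mask_compare(uint8_t v)
{
	return ((int8_t)v < 0) ? 0xff : 0x00;
}

/* Shift-style mask: an arithmetic right shift by 7 copies the sign bit
 * into every bit position, giving 0xff or 0x00 with no zero register.
 * Assumes arithmetic >> on negative values (GCC/Clang behaviour). */
static uint8_t mask_shift(uint8_t v)
{
	return (uint8_t)((int8_t)v >> 7);
}

/* Single multiply-by-x step using the mask: shift left, then XOR in
 * the 0x1d reduction constant only where bit 7 was set. */
static uint8_t gf_mul_x_masked(uint8_t v)
{
	return (uint8_t)((uint8_t)(v << 1) ^ (mask_shift(v) & 0x1d));
}
```

    gf_mul_x_masked() shows how such a mask is typically consumed. It selects the 0x1d
    reduction constant only for bytes whose top bit was set, so no branch is needed.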

    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>