    x86/csum: clean up `csum_partial' further · a476aae3
    Linus Torvalds authored
    Commit 688eb819 ("x86/csum: Improve performance of `csum_partial`")
    ended up improving the code generation for the IP csum calculations, and
    in particular special-casing the 40-byte case that is a hot case for
    IPv6 headers.
    
    It then had _another_ special case for the 64-byte unrolled loop, which
    did two chains of 32-byte blocks, which allows modern CPUs to improve
    performance by doing the chains in parallel thanks to renaming the carry
    flag.
    
    This just unifies the special cases and combines them into a single
    helper for the 40-byte csum case, and replaces the 64-byte case with an
    80-byte case that just runs that helper twice.  It avoids having
    all these different versions of inline assembly, and actually improved
    performance further in my tests.
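
    As a rough illustration of the shape described above, here is a portable
    C sketch.  It is not the code from this commit: the real helper is x86
    inline assembly built on an add/adc chain, whereas this version emulates
    the carry in plain C.  All function names here are hypothetical, and the
    sketch only handles lengths that are a multiple of 40 bytes (tail
    handling is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement add: fold the carry-out back into bit 0. */
static uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
    a += b;
    return a + (a < b);   /* a < b means the addition wrapped */
}

/* Hypothetical 40-byte helper: one dependent chain over five 64-bit words. */
static uint64_t csum_40b_sketch(uint64_t sum, const unsigned char *p)
{
    uint64_t w[5];

    memcpy(w, p, sizeof(w));          /* avoids unaligned-access issues */
    for (int i = 0; i < 5; i++)
        sum = add64_with_carry(sum, w[i]);
    return sum;
}

/*
 * Hypothetical 80-byte loop: the same helper twice per iteration, feeding
 * two independent accumulators so the two chains do not depend on each
 * other.  Assumes len is a multiple of 40.
 */
static uint64_t csum_blocks_sketch(const unsigned char *buf, size_t len)
{
    uint64_t a = 0, b = 0;

    assert(len % 40 == 0);
    while (len >= 80) {
        a = csum_40b_sketch(a, buf);
        b = csum_40b_sketch(b, buf + 40);
        buf += 80;
        len -= 80;
    }
    if (len == 40)
        a = csum_40b_sketch(a, buf);
    return add64_with_carry(a, b);    /* merge the two chains */
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold_to_16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Straightforward 16-bit-at-a-time reference, for cross-checking only. */
static uint16_t ref_sum16(const unsigned char *buf, size_t len)
{
    uint64_t s = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t w;

        memcpy(&w, buf + i, sizeof(w));
        s += w;
    }
    return fold_to_16(s);
}
```

    Folding the result of csum_blocks_sketch() with fold_to_16() should
    agree with ref_sum16() on the same buffer, since merging the two
    accumulators at the end is just one more ones' complement addition.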
    
    There was never anything magical about the 64-byte unrolled case, even
    though it happens to be a common size (and typically is the cacheline
    size).
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
csum-partial_64.c 2.98 KB