    powerpc/net: Implement powerpc specific csum_shift() to remove branch · 3af722cb
    Christophe Leroy authored
    The current generic implementation of csum_shift() branches on the
    parity of 'offset':
    
    	000002f8 <csum_block_add>:
    	     2f8:	70 a5 00 01 	andi.   r5,r5,1
    	     2fc:	41 a2 00 08 	beq     304 <csum_block_add+0xc>
    	     300:	54 84 c0 3e 	rotlwi  r4,r4,24
    	     304:	7c 63 20 14 	addc    r3,r3,r4
    	     308:	7c 63 01 94 	addze   r3,r3
    	     30c:	4e 80 00 20 	blr
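
    For reference, here is a userspace sketch of the branching helper behind
    the listing above. It assumes the generic csum_shift()/csum_block_add()
    semantics (rotate csum2 right by 8 when 'offset' is odd, then do a 32-bit
    ones' complement add). Plain uint32_t stands in for __wsum. The names
    ror32_sketch(), csum_add_sketch(), csum_shift_generic() and
    csum_block_add_generic() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for ror32(); well defined for n == 0. */
static inline uint32_t ror32_sketch(uint32_t x, unsigned int n)
{
	return (x >> (n & 31)) | (x << ((32 - n) & 31));
}

/* 32-bit ones' complement add with end-around carry (the addc/addze pair). */
static inline uint32_t csum_add_sketch(uint32_t a, uint32_t b)
{
	uint32_t r = a + b;

	return r + (r < b);	/* fold the carry back in */
}

/* Generic form: the 'if' is what compiles to andi./beq on powerpc. */
static inline uint32_t csum_shift_generic(uint32_t sum, int offset)
{
	/*
	 * An odd offset means csum2 was summed with its bytes in the other
	 * half of each 16-bit word; rotating by 8 realigns it.
	 */
	if (offset & 1)
		return ror32_sketch(sum, 8);
	return sum;
}

static inline uint32_t csum_block_add_generic(uint32_t csum, uint32_t csum2,
					      int offset)
{
	return csum_add_sketch(csum, csum_shift_generic(csum2, offset));
}
```

    For example, csum_block_add_generic(0x00010000, 0x11223344, 1) rotates
    0x11223344 to 0x44112233 and returns 0x44122233.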
    
    Use the lowest bit of 'offset' directly to compute the rotation amount
    instead of branching:
    
    	000002f8 <csum_block_add>:
    	     2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
    	     2fc:	20 a5 00 20 	subfic  r5,r5,32
    	     300:	5c 84 28 3e 	rotlw   r4,r4,r5
    	     304:	7c 63 20 14 	addc    r3,r3,r4
    	     308:	7c 63 01 94 	addze   r3,r3
    	     30c:	4e 80 00 20 	blr
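
    A sketch of this intermediate, branchless form in C, under the same
    assumptions as a userspace model (uint32_t for __wsum, hypothetical
    helper names). (offset & 1) << 3 is either 0 or 8, which is what the
    rlwinm computes. powerpc only has a rotate-left instruction, so a
    right rotate by n becomes a left rotate by 32 - n, hence the subfic.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for ror32(); well defined for n == 0. */
static inline uint32_t ror32_sketch(uint32_t x, unsigned int n)
{
	return (x >> (n & 31)) | (x << ((32 - n) & 31));
}

/* Branchless: the rotation amount comes straight from bit 0 of 'offset'. */
static inline uint32_t csum_shift_ror(uint32_t sum, int offset)
{
	/* 0 for an even offset (no-op rotate), 8 for an odd one */
	return ror32_sketch(sum, (offset & 1) << 3);
}
```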
    
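    Then rotate left instead of right, which drops one more instruction
    (the subfic). This has no impact on the final folded sum: a left
    rotation by 8 differs from a right rotation by 8 only by a 16-bit
    rotation, which swaps the two halfwords and leaves their 16-bit ones'
    complement sum unchanged.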
    
    	000002f8 <csum_block_add>:
    	     2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
    	     2fc:	5c 84 28 3e 	rotlw   r4,r4,r5
    	     300:	7c 63 20 14 	addc    r3,r3,r4
    	     304:	7c 63 01 94 	addze   r3,r3
    	     308:	4e 80 00 20 	blr
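
    The final form can be sketched as below, together with a small check of
    the "no impact on the final sum" claim. This is a userspace model, not
    the kernel code: rol32_sketch(), ror32_sketch(), csum_add_sketch(),
    csum_shift_rol() and fold16() are hypothetical names, and fold16()
    assumes csum_fold()-style folding without the final inversion.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for rol32()/ror32(); safe for n == 0. */
static inline uint32_t rol32_sketch(uint32_t x, unsigned int n)
{
	return (x << (n & 31)) | (x >> ((32 - n) & 31));
}

static inline uint32_t ror32_sketch(uint32_t x, unsigned int n)
{
	return (x >> (n & 31)) | (x << ((32 - n) & 31));
}

/* 32-bit ones' complement add with end-around carry (addc/addze). */
static inline uint32_t csum_add_sketch(uint32_t a, uint32_t b)
{
	uint32_t r = a + b;

	return r + (r < b);
}

/* Final form: rotate left by 0 or 8, so no subfic is needed. */
static inline uint32_t csum_shift_rol(uint32_t sum, int offset)
{
	return rol32_sketch(sum, (offset & 1) << 3);
}

/* Fold a 32-bit ones' complement sum into 16 bits (no final inversion). */
static inline uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* at most 0x1fffe */
	sum = (sum & 0xffff) + (sum >> 16);	/* now fits in 16 bits */
	return (uint16_t)sum;
}
```

    rol32(x, 8) equals ror32(x, 8) rotated by a further 16 bits. After
    csum_add() and folding, both give the same 16-bit result. For example,
    with csum = 0x00010000 and csum2 = 0x11223344, both paths fold to
    0x6645.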
    
    Only powerpc seems to benefit from a branchless implementation. Other
    major architectures such as ARM and x86 get better code from the
    generic implementation with its branch.
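
    Because only powerpc gets the branchless version, the generic helper
    has to be overridable per architecture. One common kernel pattern for
    this is sketched below in a single file. It assumes an arch header
    that defines a guard macro (called HAVE_ARCH_CSUM_SHIFT here; treat the
    name as an assumption and check the actual headers) plus its own
    csum_shift(), and a generic header that provides the branching default
    only when the macro is absent.

```c
#include <stdint.h>

/* "Arch header" part (assumed layout): opt in to the branchless form. */
#define HAVE_ARCH_CSUM_SHIFT
static inline uint32_t csum_shift(uint32_t sum, int offset)
{
	unsigned int n = (offset & 1) << 3;	/* 0 or 8 */

	/* rotate left by n; the & 31 keeps n == 0 well defined */
	return (sum << n) | (sum >> ((32 - n) & 31));
}

/* "Generic header" part: default used only if the arch did not opt in. */
#ifndef HAVE_ARCH_CSUM_SHIFT
static inline uint32_t csum_shift(uint32_t sum, int offset)
{
	if (offset & 1)
		return (sum >> 8) | (sum << 24);	/* rotate right by 8 */
	return sum;
}
#endif
```

    Architectures that compile the branch efficiently simply leave the
    guard undefined and keep the generic code.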

    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: David S. Miller <davem@davemloft.net>