    powerpc/64: optimises from64to32() · 55a0edf0
    Christophe Leroy authored
    The current implementation of from64to32() compiles to poor code,
    taking eight instructions before the return:
    
    0000000000000270 <.from64to32>:
     270:	38 00 ff ff 	li      r0,-1
     274:	78 69 00 22 	rldicl  r9,r3,32,32
     278:	78 00 00 20 	clrldi  r0,r0,32
     27c:	7c 60 00 38 	and     r0,r3,r0
     280:	7c 09 02 14 	add     r0,r9,r0
     284:	78 09 00 22 	rldicl  r9,r0,32,32
     288:	7c 00 4a 14 	add     r0,r0,r9
     28c:	78 03 00 20 	clrldi  r3,r0,32
     290:	4e 80 00 20 	blr
    
    This patch modifies from64to32() to operate in the same
    spirit as csum_fold().
    
    It swaps the two 32-bit halves of the sum, then adds the result to the
    unswapped sum. If adding the two 32-bit halves produces a carry, it
    propagates from the lower half into the upper half, so the upper half
    holds the correct folded sum.
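
The folding described above can be sketched in portable C. This is an illustrative reconstruction, not the code from the patch: the names fold_two_step(), fold_rotate() and check_fold() are hypothetical, and the two-step variant is only assumed to match what the old disassembly computes (mask, add, then add the carry back in).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the old approach: add the two 32-bit halves,
 * then fold the possible carry (bit 32) back into the low half. */
static uint32_t fold_two_step(uint64_t x)
{
	x = (x & 0xffffffffULL) + (x >> 32);	/* 32 + 32 -> up to 33 bits */
	x = (x & 0xffffffffULL) + (x >> 32);	/* add the carry back in */
	return (uint32_t)x;
}

/* Rotate-and-add approach: swap the halves, add to the unswapped
 * value, and read the result from the upper half. A carry out of the
 * lower half lands in the upper half, which is the end-around carry. */
static uint32_t fold_rotate(uint64_t x)
{
	uint64_t swapped = (x << 32) | (x >> 32);	/* rotate by 32 bits */

	return (uint32_t)((x + swapped) >> 32);
}

/* Hypothetical helper: both variants should agree on any input. */
static void check_fold(uint64_t x)
{
	assert(fold_two_step(x) == fold_rotate(x));
}
```

For example, with x = 0x00000001ffffffff the halves are 1 and 0xffffffff; their sum overflows 32 bits, and both variants return 1 once the carry is folded in.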
    
    The resulting code is:
    
    0000000000000260 <.from64to32>:
     260:	78 60 00 02 	rotldi  r0,r3,32
     264:	7c 60 1a 14 	add     r3,r0,r3
     268:	78 63 00 22 	rldicl  r3,r3,32,32
     26c:	4e 80 00 20 	blr
    
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
checksum.h 5.77 KB