    powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch · b3f271e8
    Anton Blanchard authored
    Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
    instructions.
    
    This is a copy of the POWER7 optimised copy_to_user/copy_from_user
    loop. Detailed implementation and performance details can be found in
    commit a66086b8 (powerpc: POWER7 optimised
    copy_to_user/copy_from_user using VMX).
    
    I noticed memcpy issues when profiling a RAID6 workload:
    
    	.memcpy
    	.async_memcpy
    	.async_copy_data
    	.__raid_run_ops
    	.handle_stripe
    	.raid5d
    	.md_thread
    
    I created a simplified testcase by building a RAID6 array from four
    1GB ramdisks (booting with brd.rd_size=1048576):
    
    # mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]
    
    I then timed how long it took to write to the entire array:
    
    # dd if=/dev/zero of=/dev/md0 bs=1M
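
For a rough sense of how much data that dd run writes, the sizes can be worked out as below. This is only a sketch and not part of the patch: it assumes brd.rd_size is expressed in KiB, it ignores md superblock and data-offset overhead, and the helper name `raid6_usable_bytes` is hypothetical.

```python
KIB = 1024

def raid6_usable_bytes(n_devices: int, device_bytes: int) -> int:
    """Approximate usable capacity: RAID6 spends two devices' worth on parity (P and Q)."""
    if n_devices < 4:
        raise ValueError("RAID6 here is assumed to need at least 4 devices")
    return (n_devices - 2) * device_bytes

rd_size_kib = 1048576                 # value passed as brd.rd_size (assumed to be KiB)
device_bytes = rd_size_kib * KIB      # size of one ramdisk in bytes
usable = raid6_usable_bytes(4, device_bytes)

print(device_bytes // 2**30, "GiB per ramdisk")
print(usable // 2**30, "GiB usable (approximate, metadata ignored)")
```

Under those assumptions each ramdisk is 1 GiB and the array holds about 2 GiB of data, so the timed write covers roughly that much before dd hits the end of the device.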
    
    Before: 892 MB/s
    After:  999 MB/s
    
    A 12% improvement.
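
The 12% figure follows directly from the two throughput numbers above. A minimal check, with the function name `percent_improvement` chosen here purely for illustration:

```python
def percent_improvement(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

before_mb_s = 892.0   # "Before" throughput quoted in the commit message
after_mb_s = 999.0    # "After" throughput quoted in the commit message

gain = percent_improvement(before_mb_s, after_mb_s)
print(f"{gain:.1f}% faster")
```

The gain is measured relative to the "before" number, which is the usual convention for a speedup of this kind.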
    
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
memcpy_64.S 3.64 KB