    btrfs: add xxhash to fast checksum implementations · efcfcbc6
    David Sterba authored
    The XXHASH implementation is currently CPU-only but still fast enough to
    be considered for synchronous checksumming, like non-generic crc32c.
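
One way to picture the decision described above is the sketch below. It is a standalone userspace illustration, not the code from this commit: the enum, the function name `csum_impl_is_fast`, and the driver-name strings used with it are hypothetical, and it assumes "non-generic" can be detected by checking the driver name for the substring "generic".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the checksum type identifiers. */
enum csum_type { CSUM_CRC32C, CSUM_XXHASH, CSUM_OTHER };

/*
 * Decide whether a checksum implementation is fast enough to run
 * synchronously. driver_name is whatever name the implementation reports
 * (an assumption of this sketch).
 */
static bool csum_impl_is_fast(enum csum_type type, const char *driver_name)
{
	switch (type) {
	case CSUM_CRC32C:
		/* Only an accelerated (non-generic) crc32c counts as fast. */
		return strstr(driver_name, "generic") == NULL;
	case CSUM_XXHASH:
		/* CPU-only, but treated as fast regardless of the driver. */
		return true;
	default:
		return false;
	}
}
```

With example names such as "crc32c-generic" and "crc32c-intel", the first is rejected and the second accepted, while XXHASH is accepted whatever name is passed.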
    
    A userspace benchmark comparing it to various implementations (patched
    hash-speedtest from btrfs-progs):
    
      Block size:     4096
      Iterations:     1000000
      Implementation: builtin
      Units:          CPU cycles
    
            NULL-NOP: cycles:     73384294, cycles/i       73
         NULL-MEMCPY: cycles:    228033868, cycles/i      228,    61664.320 MiB/s
          CRC32C-ref: cycles:  24758559416, cycles/i    24758,      567.950 MiB/s
           CRC32C-NI: cycles:   1194350470, cycles/i     1194,    11773.433 MiB/s
      CRC32C-ADLERSW: cycles:   6150186216, cycles/i     6150,     2286.372 MiB/s
      CRC32C-ADLERHW: cycles:    626979180, cycles/i      626,    22427.453 MiB/s
          CRC32C-PCL: cycles:    466746732, cycles/i      466,    30126.699 MiB/s
              XXHASH: cycles:    860656400, cycles/i      860,    16338.188 MiB/s
    
    The benchmark compares the purely software implementation (ref), the
    current but outdated one accelerated with the crc32q instruction (NI),
    the optimized implementations by M. Adler
    (https://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software/17646775#17646775),
    and the best one, taken from the kernel, which uses the PCLMULQDQ
    instruction (PCL).
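
For reading the table, the relation between the cycle counts and the MiB/s column can be sketched as follows. The CPU clock rate is not stated in the message; the 3.6 GHz used in the usage note is an assumption that approximately reproduces the reported figures, and both function names are hypothetical.

```c
#include <stdint.h>

/* Cycles per iteration, truncated as in the cycles/i column. */
static uint64_t cycles_per_iter(uint64_t total_cycles, uint64_t iterations)
{
	return total_cycles / iterations;
}

/*
 * Throughput in MiB/s for a run that processed `iterations` blocks of
 * `block_size` bytes in `total_cycles` cycles. cpu_hz is an assumed clock
 * rate, not a value taken from the benchmark output.
 */
static double throughput_mib_s(uint64_t total_cycles, uint64_t iterations,
			       uint64_t block_size, double cpu_hz)
{
	double seconds = (double)total_cycles / cpu_hz;
	double bytes = (double)iterations * (double)block_size;

	return bytes / seconds / (1024.0 * 1024.0);
}
```

For the XXHASH row, `cycles_per_iter(860656400, 1000000)` gives 860, and `throughput_mib_s(860656400, 1000000, 4096, 3.6e9)` comes out at roughly 16,339 MiB/s, close to the reported 16338.188 MiB/s.
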
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: David Sterba <dsterba@suse.com>
disk-io.c 139 KB