George Spelvin authored
There's no need for the K_table to be made of 64-bit words. For some
reason, the original authors didn't fully reduce the values modulo the
CRC32C polynomial, and so had some 33-bit values in there. They can
all be reduced to 32 bits.

Doing that cuts the table size in half. Since the code depends on both
pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq
to fetch it in the correct format.

This adds (measured on Ivy Bridge) 1 cycle per main loop iteration
(CRC of up to 3K bytes), less than 0.2%. The hope is that the reduced
D-cache footprint will make up the loss in other code.

Two other related fixes:
* K_table is read-only, so belongs in .rodata, and
* There's no need for more than 8-byte alignment

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
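
For illustration only, here is a portable C model of the fetch described
above: two packed 32-bit table entries are widened into two 64-bit lanes
with zeroed upper halves, which is what pmovzxdq does on a 64-bit memory
operand. This is a sketch, not the kernel code. The names k_pair and
zero_extend_pair are hypothetical, and the constants are placeholders
rather than real K_table values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit table entries; placeholder values only,
 * not taken from the real K_table. */
static const uint32_t k_pair[2] = { 0x89abcdefu, 0x00000001u };

/* Portable model of pmovzxdq on a 64-bit memory operand: each packed
 * 32-bit value becomes one 64-bit lane whose upper 32 bits are zero
 * (zero extension, not sign extension). */
static void zero_extend_pair(const uint32_t in[2], uint64_t out[2])
{
    assert(in != 0 && out != 0);
    out[0] = (uint64_t)in[0];
    out[1] = (uint64_t)in[1];
}
```

Stored as 32-bit words the pair occupies 8 bytes, against 16 bytes as
64-bit words. That is the halving of the table size, and an 8-byte pair
needs no more than 8-byte alignment.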
473946e6