-
Roger He authored
This can improve performance for some cases. v2 (chk): handle all sizes, simplify the patch quite a bit v3 (chk): adjust dw estimation as well v4 (chk): use single loop, make end mask 64bit Signed-off-by: Roger He <Hongbo.He@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Roger He <Hongbo.He@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6849d47c