-
Xi Ruoyao authored
LA464 and LA664 can do 32-bit/64-bit integer multiplication with a latency of 4 cycles and a throughput of 2 ops per cycle. It is comparable to the mainstream x86 and arm64 cores, so we can select ARCH_HAS_FAST_MULTIPLIER like them. It speeds up __sw_hweight32() in lib/hweight.c for about 14% on LA464 and 11% on LA664, while __sw_hweight64() for about 30% on LA464 and 33% on LA664. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2cce9059