    math/big: Replace RCLQ + ANDQ with SETCS in unrolled arithmetic assembly. · baf426f1
    Christopher Swenson authored
    benchmark             old ns/op    new ns/op    delta
    BenchmarkAddVW_1              8            8   +0.60%
    BenchmarkAddVW_2             10            9   -8.64%
    BenchmarkAddVW_3             10           10   -4.63%
    BenchmarkAddVW_4             10           11   +3.67%
    BenchmarkAddVW_5             11           12   +5.98%
    BenchmarkAddVW_1e1           18           20   +6.38%
    BenchmarkAddVW_1e2          129          115  -10.85%
    BenchmarkAddVW_1e3         1270         1089  -14.25%
    BenchmarkAddVW_1e4        13376        12145   -9.20%
    BenchmarkAddVW_1e5       130392       125260   -3.94%
    
    benchmark              old MB/s     new MB/s  speedup
    BenchmarkAddVW_1        7709.10      7661.92    0.99x
    BenchmarkAddVW_2       12451.10     13604.00    1.09x
    BenchmarkAddVW_3       17727.81     18721.54    1.06x
    BenchmarkAddVW_4       23552.64     22708.81    0.96x
    BenchmarkAddVW_5       27411.40     25816.22    0.94x
    BenchmarkAddVW_1e1     34063.19     32023.06    0.94x
    BenchmarkAddVW_1e2     49529.97     55360.55    1.12x
    BenchmarkAddVW_1e3     50380.44     58764.18    1.17x
    BenchmarkAddVW_1e4     47843.59     52696.10    1.10x
    BenchmarkAddVW_1e5     49082.60     51093.66    1.04x
    
    R=gri, rsc, r
    CC=golang-dev
    https://golang.org/cl/6480063
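
For readers unfamiliar with the routine being benchmarked, here is a minimal portable sketch of what AddVW computes: a multi-word value plus a single word, with the carry propagated word by word and returned as 0 or 1. It is not the assembly from this change. The name `addVW`, the `uint64` word type, and the use of `math/bits.Add64` are assumptions made for illustration; the carry-out returned by `bits.Add64` stands in for the 0/1 value that the commit title says is now obtained with SETCS instead of RCLQ + ANDQ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW is a hypothetical pure-Go sketch of the operation, not the
// assembly in arith_amd64.s. It sets z = x + y, where x is a
// little-endian slice of 64-bit words and y is a single word, and
// returns the final carry (always 0 or 1 when z is non-empty).
// It assumes len(x) >= len(z).
func addVW(z, x []uint64, y uint64) (c uint64) {
	c = y // the single word enters as the initial "carry"
	for i := range z {
		// bits.Add64 returns the sum and a carry-out of 0 or 1.
		z[i], c = bits.Add64(x[i], c, 0)
	}
	return c
}

func main() {
	x := []uint64{^uint64(0), ^uint64(0), 1}
	z := make([]uint64, len(x))
	c := addVW(z, x, 1)
	fmt.Println(z, c) // prints: [0 0 2] 0
}
```

In the example, adding 1 overflows the two low words, so the carry ripples into the third word and the final carry is 0.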
arith_amd64.s 7.37 KB