    math: use SIMD to accelerate additional scalar math functions on s390x · 88672de7
    Bill O'Farrell authored
    Where necessary, the math functions were restructured to use stubs so
    that they can be accelerated with assembly on any platform.
    
    The technique used was minimax polynomial approximation, driven by
    tables of polynomial coefficients, combined with argument range
    reduction.
    
    Benchmark         New     Old     Speedup
    BenchmarkAcos     12.2    47.5    3.89
    BenchmarkAcosh    18.5    56.2    3.04
    BenchmarkAsin     13.1    40.6    3.10
    BenchmarkAsinh    19.4    62.8    3.24
    BenchmarkAtan     10.1    23      2.28
    BenchmarkAtanh    19.1    53.2    2.79
    BenchmarkAtan2    16.5    33.9    2.05
    BenchmarkCbrt     14.8    58      3.92
    BenchmarkErf      10.8    20.1    1.86
    BenchmarkErfc     11.2    23.5    2.10
    BenchmarkExp      8.77    53.8    6.13
    BenchmarkExpm1    10.1    38.3    3.79
    BenchmarkLog      13.1    40.1    3.06
    BenchmarkLog1p    12.7    38.3    3.02
    BenchmarkPowInt   31.7    40.5    1.28
    BenchmarkPowFrac  33.1    141     4.26
    BenchmarkTan      11.5    30      2.61
    
    Accuracy was tested against a high-precision reference function to
    determine the maximum error.
    Note: ulperr is the error measured in "units in the last place".
    
    Func  Max ulperr
    Acos  1.15
    Acosh 1.07
    Asin  2.22
    Asinh 1.72
    Atan  1.41
    Atanh 3.00
    Atan2 1.45
    Cbrt  1.18
    Erf   1.29
    Erfc  4.82
    Exp   1.00
    Expm1 2.26
    Log   0.94
    Log1p 2.39
    Tan   3.14
    
    For reasonable inputs that produce numeric (non-Inf, non-NaN) results,
    Pow returns the correctly rounded result 99.99% of the time.
    
    Change-Id: I850e8cf7b70426e8b54ec49d74acd4cddc8c6cb2
    Reviewed-on: https://go-review.googlesource.com/38585
    Reviewed-by: Michael Munday <munday@ca.ibm.com>
    Run-TryBot: Michael Munday <munday@ca.ibm.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
expm1_s390x.s 5.6 KB