    cmd/compile: add fma intrinsic for amd64 · 7a6da218
    smasher164 authored
    To permit SSA-level optimization, this change introduces an amd64 intrinsic
    that generates the VFMADD231SD instruction for the fused multiply-add
    operation on systems that support it. System support is detected via
    cpu.X86.HasFMA. A rewrite rule then translates the generic SSA intrinsic
    ("Fma") to VFMADD231SD.
    
    The benchmark compares the software implementation (old) with the intrinsic
    (new).
    
    name   old time/op  new time/op  delta
    Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)
    
    Updates #25819.
    
    Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
    Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
    Run-TryBot: Keith Randall <khr@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
builtin.go 15.6 KB