The first illustration of our verification methodology is a proof of correctness of a double-precision floating-point multiplier that supports both the binary FMUL instruction and the ternary FMA. In a typical implementation of the latter, addition is combined with multiplication in a single pipeline by inserting the addend into the multiplier’s compression tree as an additional partial product. The resulting FMA latency is somewhat greater than that of a simple multiplication but less than two successive operations. One drawback of this integrated approach is that the computation cannot be initiated until all three operands are available. Another is that in order to conserve area, hardware is typically shared with the operations of pure multiplication and addition, each of which is implemented as a degenerate case of FMA, resulting in increased latencies.
- 17.Lutz, D. R.: Fused multiply-add architecture comprising separate early-normalizing multiply and add pipelines. In: 20th IEEE Symposium on Computer Arithmetic (2011). Available at https://www.computer.org/csdl/proceedings/arith/2011/9457/00/05992117.pdf
- 20.Mentor Graphics Corp.: Sequential logic equivalence checker. https://www.mentor.com/products/fv/questa-slec