Binary Floating-Point Unit Design

The fused multiply-add dataflow

High-Performance Energy-Efficient Microprocessor Design

Part of the book series: Series on Integrated Circuits and Systems ((ICIR))

Abstract

Since 1990, many floating-point units have been designed using a fused multiply-add dataflow. This type of design has a large performance advantage over a separate multiplier and adder: with one compound operation, effectively two dependent operations per cycle can be achieved. Even though a fused multiply-add dataflow is common in today’s microprocessors, there are many details that have never been discussed in papers. This chapter shows the implementation of the different parts of the fused multiply-add dataflow, including the counter tree, suppression of sign extension encoding, leading zero anticipation, and end-around-carry adder design. It illustrates algorithms and implementation details used in today’s floating-point units that have been passed down from designer to designer, becoming the folklore of floating-point unit design.
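The compound operation computes a × b + c as one step, with a single rounding of the final result instead of one rounding after the multiply and another after the add. The Python sketch below is not taken from the chapter; it is a small software model, assuming finite double-precision inputs, that contrasts the two behaviours. The function names `separate_multiply_add` and `fused_multiply_add` are hypothetical, and the fused version emulates the hardware by evaluating the expression exactly with `fractions.Fraction` and rounding once at the end.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a separate multiplier and adder: two roundings."""
    product = a * b          # product is rounded to double precision here
    return product + c       # the sum is rounded a second time


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Software model of a fused multiply-add: one rounding.

    Assumes finite inputs. The exact rational value of a*b + c is formed
    first, then converted to the nearest double in a single step.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    # a*b is exactly 1 - 2**-60, which does not fit in a 53-bit significand.
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    print(separate_multiply_add(a, b, c))  # the small term is lost when a*b is rounded
    print(fused_multiply_add(a, b, c))     # the small term survives the single rounding
```

With these inputs the separate path rounds the product to exactly 1.0 before the add, so the result is zero, while the fused model keeps the low-order term. A hardware unit reaches the same result without exact rational arithmetic by carrying the full-width product into the adder.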

Copyright information

© 2006 Springer

About this chapter

Cite this chapter

Schwarz, E.M. (2006). Binary Floating-Point Unit Design. In: Oklobdzija, V.G., Krishnamurthy, R.K. (eds) High-Performance Energy-Efficient Microprocessor Design. Series on Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-34047-0_8

  • DOI: https://doi.org/10.1007/978-0-387-34047-0_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28594-8

  • Online ISBN: 978-0-387-34047-0

  • eBook Packages: Engineering, Engineering (R0)
