Floating-Point Formats and Environment

Handbook of Floating-Point Arithmetic

Abstract

Our main focus in this chapter is the IEEE 754-2008 Standard for Floating-Point Arithmetic [267], a revision and merge of the earlier IEEE 754-1985 [12] and IEEE 854-1987 [13] standards. A paper written in 1981 by Kahan, Why Do We Need a Floating-Point Standard? [315], depicts the rather messy situation of floating-point arithmetic before the 1980s. Anybody who takes the view that the current standard is too constraining, and that circuit and system manufacturers could build much more efficient machines without it, should read that paper and think about it. Even if there were at that time a few reasonably good environments, the various systems available then were so different that writing portable yet reasonably efficient numerical software was extremely difficult. For instance, as pointed out in [553], sometimes a programmer had to insert multiplications by 1.0 to make a program work reliably.


Notes

  1.

    IEEE is an acronym for the Institute of Electrical and Electronics Engineers. For more details, see https://www.ieee.org/about/index.html.

  2.

    Platforms may exchange character strings or binary data. In the latter case, endianness problems (see Section 3.1.1.5) must be considered.
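As an illustration (not taken from the text), the endianness problem for exchanged binary data can be observed with Python's `struct` module: the same binary64 value produces two byte streams that are reversals of each other, and decoding with the wrong byte order silently yields a different value rather than an error.

```python
import struct

# Pack the binary64 value 1.0 with explicit byte orders.
big    = struct.pack('>d', 1.0)   # big-endian
little = struct.pack('<d', 1.0)   # little-endian

# 1.0 is 0x3FF0000000000000 in binary64, so the two byte
# streams are reversals of each other.
print(big.hex())     # 3ff0000000000000
print(little.hex())  # 000000000000f03f

# Decoding with the wrong byte order silently yields a
# completely different value, not an error.
wrong = struct.unpack('>d', little)[0]
print(wrong == 1.0)  # False
```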

  3.

    Those of the form 01x11x111x, 10x11x111x, or 11x11x111x.

  4.

    The case where these floating-point numbers both have an odd least significant significand digit (this can occur only in precision 1, for instance when converting a number such as 9.5 into a decimal string) was overlooked in the standard; for the next revision, it has been proposed—see http://speleotrove.com/misc/IEEE754-errata.html—to deliver the one larger in magnitude.

  5.

    This implies that the language standards must specify what that literal meaning is: order of operations, destination formats of operations, etc.

  6.

    This rule (which may help in implementing complex functions [317]) may seem strange, but the most important point is that any sequence of exact computations on real numbers will give the correct result, even when \(\sqrt{-0}\) is involved. Also, let us recall that −0 is regarded as a null value, not a negative number.
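As a quick illustration (a sketch in Python, not from the text), the rule \(\sqrt{-0} = -0\) and the "null value" status of −0 can both be checked directly:

```python
import math

# IEEE 754 specifies sqrt(-0) = -0: the result is a null value
# carrying the sign of its input, so exact computations stay exact.
r = math.sqrt(-0.0)
print(r)                      # -0.0
print(math.copysign(1.0, r))  # -1.0 (the sign bit is set)

# -0 compares equal to +0: it is a signed zero, not a negative number.
print(-0.0 == 0.0)            # True

# Any genuinely negative input is invalid; Python maps the
# invalid operation to an exception rather than returning a NaN.
try:
    math.sqrt(-1.0)
except ValueError as e:
    print("invalid:", e)
```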

  7.

    In most cases, x ⋅ β n is exactly representable so that there is no rounding at all, but requiring correct rounding is the simplest way of defining what should be returned if the result is outside the normal range.
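For radix 2, this scaling operation is available in Python as `math.ldexp` (a hedged illustration, not from the text): `ldexp(x, n)` computes x · 2ⁿ with correct rounding, which is exact whenever the result stays in the normal range, and only rounds when the result falls into the subnormal range or beyond.

```python
import math

# scaleB(x, n) in radix 2 is ldexp: x * 2**n, correctly rounded.
# Inside the normal range the scaling is exact:
print(math.ldexp(0.75, 4))    # 12.0  (0.75 * 2**4, no rounding)

# Scaling down into the subnormal range can lose bits, which is
# where the correct-rounding requirement actually matters:
tiny = math.ldexp(1.0, -1074)  # smallest positive subnormal, 2**-1074
print(tiny)                    # 5e-324
print(math.ldexp(tiny, -1))    # 0.0: 2**-1075 rounds to zero (ties to even)
```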

  8.

    This problem was already present in the 1985 version of the standard. The choice of not giving a clearer specification in IEEE 754-2008 probably results from the desire to keep existing implementations conforming to the standard.

  9.

    There are a few exceptions to this rule (but not for the basic arithmetic operations): for instance, it is recommended that pow(+1, y) be 1 even if y is a quiet NaN (this partial function being constant).
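This exception is visible in practice (a Python illustration, not from the text): C's `pow`, which Python's `math.pow` wraps, returns 1 for a constant partial function even when the other argument is a quiet NaN.

```python
import math

nan = float("nan")

# NaN normally propagates through arithmetic:
print(nan + 1.0)           # nan

# but pow is one of the exceptions: x**y with x = 1 is the
# constant function 1, so a quiet NaN exponent is ignored.
print(math.pow(1.0, nan))  # 1.0

# Similarly, pow(x, 0) is 1 even for a NaN base:
print(math.pow(nan, 0.0))  # 1.0
```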

  10.

    These operations regard qNaN as missing data, but not sNaN.

  11.

    This means that in the case where a NaN is not regarded as missing data, qNaN must propagate even when the partial function is constant. For instance, the minimum function applied to (qNaN, −∞) will return qNaN instead of −∞.

  12.

    But it is the default on recent systems only.

  13.

    Which is not very realistic but suffices to get a rough estimate of the frequency of occurrences of double roundings.
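The double-rounding phenomenon itself is easy to exhibit (a hedged sketch using Python's `decimal` module, not an example from the text): rounding first to an intermediate, wider precision and then to the target precision can give a different result than rounding directly, because the intermediate rounding can manufacture a tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = Decimal("0.1251")

# Direct rounding to two fractional digits:
direct = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(direct)   # 0.13

# Rounding in two steps: first to three digits, then to two.
step1  = x.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)   # 0.125
double = step1.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(double)   # 0.12 -- the intermediate rounding created a tie,
                # which then broke the "wrong" way

print(direct == double)  # False
```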

  14.

    Warning! The instructions called FMADDs and so on in SPARC64 V, which share their names and encodings with those of SPARC64 VI, are not real FMA instructions, as they perform two roundings [208, page 56].

  15.

    These instructions also suppress (“silence”) exceptions, which induces similar problems in the case of out-of-order execution (although their correct management in hardware must be implemented anyway).

  16.

    The performance of AVX instructions can be degraded on some computations involving subnormals.

  17.

    CVE-2006-6499 / https://bugzilla.mozilla.org/show_bug.cgi?id=358569.

References

  1. Advanced Micro Devices. 128-bit SSE5 instruction set, 2007. Available at http://pdinda.org/icsclass/doc/AMD_ARCH_MANUALS/AMD64_128_Bit_SSE5_Instrs.pdf.

  2. B. Akbarpour, A. T. Abdel-Hamid, S. Tahar, and J. Harrison. Verifying a synthesized implementation of IEEE-754 floating-point exponential function using HOL. The Computer Journal, 53(4):465, 2010.

  3. American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Standard 754–1985, 1985.

  4. American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE Standard for Radix Independent Floating-Point Arithmetic. ANSI/IEEE Standard 854–1987, 1987.

  5. A. Blot, J.-M. Muller, and L. Théry. Formal correctness of comparison algorithms between binary64 and decimal64 floating-point numbers. In 10th International Workshop on Numerical Software Verification (NSV), pages 25–37, 2017.

  6. S. Boldo, J.-H. Jourdan, X. Leroy, and G. Melquiond. Verified compilation of floating-point computations. Journal of Automated Reasoning, 54(2):135–163, 2015.

  7. R. T. Boute. The Euclidean definition of the functions div and mod. ACM Transactions on Programming Languages and Systems, 14(2):127–144, 1992.

  8. N. Brisebarre, C. Lauter, M. Mezzarobba, and J.-M. Muller. Comparison between binary and decimal floating-point numbers. IEEE Transactions on Computers, 65(7):2032–2044, 2016.

  9. N. Brunie, F. de Dinechin, and B. Dupont de Dinechin. A mixed-precision fused multiply and add. In 45th Asilomar Conference on Signals, Systems, and Computers, November 2011.

  10. N. Brunie, F. de Dinechin, and B. Dupont de Dinechin. Mixed-precision merged multiplication and addition operator. Patent WO/2012/175828, December 2012.

  11. T. C. Chen and I. T. Ho. Storage-efficient representation of decimal data. Communications of the ACM, 18(1):49–52, 1975.

  12. W. J. Cody. MACHAR: a subroutine to dynamically determine machine parameters. ACM Transactions on Mathematical Software, 14(4):301–311, 1988.

  13. W. J. Cody and W. Waite. Software Manual for the Elementary Functions. Prentice-Hall, Englewood Cliffs, NJ, 1980.

  14. T. Coe and P. T. P. Tang. It takes six ones to reach a flaw. In 12th IEEE Symposium on Computer Arithmetic (ARITH-12), pages 140–146, July 1995.

  15. S. Collange, M. Daumas, and D. Defour. État de l’intégration de la virgule flottante dans les processeurs graphiques. Technique et Science Informatiques, 27(6):719–733, 2008. In French.

  16. M. Cornea, C. Anderson, J. Harrison, P. T. P. Tang, E. Schneider, and C. Tsen. A software implementation of the IEEE 754R decimal floating-point arithmetic using the binary encoding format. In 18th IEEE Symposium on Computer Arithmetic (ARITH-18), pages 29–37, June 2007.

  17. M. Cornea, J. Harrison, C. Anderson, P. T. P. Tang, E. Schneider, and E. Gvozdev. A software implementation of the IEEE 754R decimal floating-point arithmetic using the binary encoding format. IEEE Transactions on Computers, 58(2):148–162, 2009.

  18. M. Cornea, J. Harrison, and P. T. P. Tang. Scientific Computing on Itanium®-based Systems. Intel Press, Hillsboro, OR, 2002.

  19. M. F. Cowlishaw. Decimal floating-point: algorism for computers. In 16th IEEE Symposium on Computer Arithmetic (ARITH-16), pages 104–111, June 2003.

  20. A. Edelman. The mathematics of the Pentium division bug. SIAM Review, 39(1):54–67, 1997.

  21. L. Eisen, J. W. Ward, H. W. Tast, N. Mäding, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough. IBM POWER6 accelerators: VMX and DFU. IBM Journal of Research and Development, 51(6):1–21, 2007.

  22. M. D. Ercegovac and T. Lang. Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, Boston, MA, 1994.

  23. M. D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann Publishers, San Francisco, CA, 2004.

  24. M. A. Erle, M. J. Schulte, and B. J. Hickmann. Decimal floating-point multiplication. IEEE Transactions on Computers, 58(7):902–916, 2009.

  25. Fujitsu. SPARC64™ VI Extensions. Fujitsu Limited, 1.3 edition, March 2007.

  26. D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, 23(1):5–48, 1991. An edited reprint is available at https://docs.oracle.com/cd/E19059-01/fortec6u2/806-7996/806-7996.pdf from Sun’s Numerical Computation Guide; it contains an addendum Differences Among IEEE 754 Implementations, also available at http://www.validlab.com/goldberg/addendum.html.

  27. J. Harrison. Formal verification of floating-point trigonometric functions. In 3rd International Conference on Formal Methods in Computer-Aided Design (FMCAD), number 1954 in Lecture Notes in Computer Science, pages 217–233, Austin, TX, USA, 2000.

  28. J. Harrison. Formal verification of IA-64 division algorithms. In 13th International Conference on Theorem Proving in Higher Order Logics (TPHOLs), volume 1869 of Lecture Notes in Computer Science, pages 233–251, 2000.

  29. IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic. IEEE Standard 754-2008, August 2008. Available at http://ieeexplore.ieee.org/servlet/opac?punumber=4610933.

  30. Intel Corporation. Intrinsics Guide. Available at https://software.intel.com/sites/landingpage/IntrinsicsGuide/. Accessed in June 2017.

  31. Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, March 2017. Order Number: 325462-062US.

  32. International Organization for Standardization. ISO/IEC/IEEE Standard 60559:2011: Information technology – Microprocessor Systems – Floating-Point arithmetic. International Electrotechnical Commission, 1st edition, 2011.

  33. W. Kahan. Why do we need a floating-point standard? Technical report, Computer Science, UC Berkeley, 1981. Available at http://www.cs.berkeley.edu/~wkahan/ieee754status/why-ieee.pdf.

  34. W. Kahan. Branch cuts for complex elementary functions. In The State of the Art in Numerical Analysis, pages 165–211, 1987.

  35. W. Kahan. Lecture notes on the status of IEEE-754. Available at http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF, 1997.

  36. R. Karpinsky. PARANOIA: a floating-point benchmark. BYTE, 10(2), 1985.

  37. D. E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, Reading, MA, 3rd edition, 1998.

  38. D. Lutz. Fused multiply-add microarchitecture comprising separate early-normalizing multiply and add pipelines. In 20th IEEE Symposium on Computer Arithmetic (ARITH-20), pages 123–128, 2011.

  39. D. Lutz and N. Burgess. Overcoming double-rounding errors under IEEE 754-2008 using software. In 44th Asilomar Conference on Signals, Systems, and Computers, pages 1399–1401, November 2010.

  40. É. Martin-Dorel, G. Melquiond, and J.-M. Muller. Some issues related to double rounding. BIT Numerical Mathematics, 53(4):897–924, 2013.

  41. D. Monniaux. The pitfalls of verifying floating-point computations. ACM TOPLAS, 30(3):1–41, 2008. A preliminary version is available at http://hal.archives-ouvertes.fr/hal-00128124.

  42. J. S. Moore, T. Lynch, and M. Kaufmann. A mechanically checked proof of the correctness of the kernel of the AMD5K86 floating point division algorithm. IEEE Transactions on Computers, 47(9):913–926, 1998.

  43. R. E. Moore. Interval analysis. Prentice Hall, 1966.

  44. J.-M. Muller. Algorithmes de division pour microprocesseurs: illustration à l’aide du “bug” du pentium. Technique et Science Informatiques, 14(8), 1995.

  45. S. F. Oberman and M. J. Flynn. Division algorithms and implementations. IEEE Transactions on Computers, 46(8):833–854, 1997.

  46. S. F. Oberman and M. Y. Siu. A high-performance area-efficient multifunction interpolator. In 17th IEEE Symposium on Computer Arithmetic (ARITH-17), June 2005.

  47. M. Parks. Number-theoretic test generation for directed roundings. IEEE Transactions on Computers, 49(7):651–658, 2000.

  48. D. M. Russinoff. A mechanically checked proof of IEEE compliance of a register-transfer-level specification of the AMD-K7 floating-point multiplication, division, and square root instructions. LMS Journal of Computation and Mathematics, 1:148–200, 1998.

  49. D. M. Russinoff. A mechanically checked proof of correctness of the AMD K5 floating point square root microcode. Formal Methods in System Design, 14(1):75–125, 1999.

  50. D. M. Russinoff. A case study in formal verification of register-transfer logic with ACL2: The floating point adder of the AMD Athlon processor. Lecture Notes in Computer Science, 1954:3–36, 2000.

  51. E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw. Decimal floating-point support on the IBM system Z10 processor. IBM Journal of Research and Development, 53(1):36–45, 2009.

  52. C. Severance. IEEE 754: An interview with William Kahan. Computer, 31(3):114–115, 1998.

  53. Sun Microsystems. Interval arithmetic in high performance technical computing. Technical report, 2002.

  54. N. Whitehead and A. Fit-Florea. Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs. Technical report, NVIDIA, 2011. Available at http://developer.download.nvidia.com/devzone/devcenter/cuda/files/NVIDIA-CUDA-Floating-Point.pdf.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Muller, J.-M., et al. (2018). Floating-Point Formats and Environment. In: Handbook of Floating-Point Arithmetic. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-76526-6_3
