Abstract
Our main focus in this chapter is the IEEE 754-2008 Standard for Floating-Point Arithmetic [267], a revision and merge of the earlier IEEE 754-1985 [12] and IEEE 854-1987 [13] standards. A paper written in 1981 by Kahan, Why Do We Need a Floating-Point Standard? [315], depicts the rather messy situation of floating-point arithmetic before the 1980s. Anybody who takes the view that the current standard is too constraining, and that circuit and system manufacturers could build much more efficient machines without it, should read that paper and think about it. Even if there were at that time a few reasonably good environments, the various systems available then were so different that writing portable yet reasonably efficient numerical software was extremely difficult. For instance, as pointed out in [553], sometimes a programmer had to insert multiplications by 1.0 to make a program work reliably.
Notes
1. IEEE is an acronym for the Institute of Electrical and Electronics Engineers. For more details, see https://www.ieee.org/about/index.html.
2. Platforms may exchange character strings or binary data. In the latter case, endianness problems (see Section 3.1.1.5) must be considered.
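The endianness issue in note 2 is easy to observe with Python's struct module: the same binary64 datum has its bytes in opposite orders on little-endian and big-endian platforms, and decoding with the wrong convention silently yields a completely different value. (A sketch; the format codes `<d` and `>d` denote little- and big-endian binary64.)

```python
import struct

x = 1.0
little = struct.pack("<d", x)   # binary64 encoding of 1.0, little-endian byte order
big = struct.pack(">d", x)      # same datum, big-endian byte order
assert little == big[::-1]      # identical bits, reversed byte order

# Decoding with the wrong endianness silently produces a different number:
wrong = struct.unpack("<d", big)[0]
assert wrong != x
```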
3. Those of the form 01x11x111x, 10x11x111x, or 11x11x111x.
4. The case where these floating-point numbers both have an odd least significant significand digit (this can occur in precision 1 only, for instance when converting a number such as 9.5 into a decimal string) was forgotten in the standard, but for the next revision it has been proposed (see http://speleotrove.com/misc/IEEE754-errata.html) to deliver the one larger in magnitude.
5. This implies that the language standards must specify what that literal meaning is: order of operations, destination formats of operations, etc.
6. This rule (which may help in implementing complex functions [317]) may seem strange, but the most important point is that any sequence of exact computations on real numbers will give the correct result, even when \(\sqrt{-0}\) is involved. Also, let us recall that −0 is regarded as a null value, not a negative number.
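Both points of note 6 can be checked from any language exposing IEEE 754 arithmetic; a short Python check (assuming the platform's binary64 arithmetic follows the standard, as virtually all current ones do):

```python
import math

r = math.sqrt(-0.0)
assert r == 0.0                       # -0 compares equal to +0: it is a null value
assert math.copysign(1.0, r) == -1.0  # yet sqrt(-0) returns -0, preserving the sign
```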
7. In most cases, x ⋅ β^n is exactly representable, so that there is no rounding at all, but requiring correct rounding is the simplest way of defining what should be returned if the result is outside the normal range.
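For the binary formats, Python's math.ldexp plays the role of this scaling operation and shows both situations described in note 7 (a sketch; the rounding behavior in the subnormal range relies on the underlying C library rounding correctly):

```python
import math

# Usual case: x * 2**n is exactly representable, so no rounding occurs.
assert math.ldexp(0.75, 4) == 12.0
assert math.ldexp(1.0, -1074) == 5e-324   # smallest positive binary64 subnormal, still exact

# Below the normal range, the scaled value may not be representable,
# and a single (correct) rounding is applied:
tiny = math.ldexp(1.0 + 2.0**-52, -1074)  # rounds to a nearby subnormal
```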
8. This problem was already present in the 1985 version of the standard. The choice of not giving a clearer specification in IEEE 754-2008 probably results from the desire to keep existing implementations conforming to the standard.
9. There are a few exceptions to this rule (but not with the basic arithmetic operations): for instance, it is recommended that pow(+1, y) be 1 even if y is a quiet NaN (this partial function being constant).
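C99 (Annex F) and, following it, Python's math.pow implement this recommendation, while the basic operations propagate NaNs as usual:

```python
import math

nan = float("nan")
assert math.pow(1.0, nan) == 1.0   # pow(+1, y) is 1 even for a quiet NaN exponent
assert math.isnan(1.0 * nan)       # the basic operations, in contrast, propagate NaN
assert math.isnan(1.0 + nan)
```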
10. These operations regard a qNaN as missing data, but not an sNaN.
11. This means that, in the case where a NaN is not regarded as missing data, a qNaN must propagate even when the partial function is constant. For instance, the minimum function on (qNaN, −∞) will return qNaN instead of −∞.
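The "missing data" semantics of note 10 can be sketched in Python (min_num is a hypothetical helper name mimicking the behavior of IEEE 754-2008 minNum; Python floats only carry quiet NaNs, so signaling NaNs are not modeled here):

```python
import math

def min_num(x, y):
    """Sketch of IEEE 754-2008 minNum: a quiet NaN is treated as missing data."""
    if math.isnan(x):
        return y if not math.isnan(y) else x
    if math.isnan(y):
        return x
    if x == y:
        # min(+0, -0) may return either zero; this sketch prefers -0
        return x if math.copysign(1.0, x) <= math.copysign(1.0, y) else y
    return x if x < y else y

# The qNaN is ignored, and the numerical operand is returned:
assert min_num(float("nan"), float("-inf")) == float("-inf")
assert min_num(3.0, float("nan")) == 3.0
```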
12. But it is the default on recent systems only.
13. Which is not very realistic, but suffices to get a rough estimate of the frequency of occurrences of double roundings.
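A double rounding (rounding first to an intermediate, wider precision, then to the destination precision) can differ from a single direct rounding; the decimal module makes this easy to exhibit with round-to-nearest-even:

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = Decimal("0.24451")
direct = x.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)     # one rounding
step = x.quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)      # wider precision first...
double = step.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)  # ...then the destination

assert direct == Decimal("0.245")  # 0.24451 is closer to 0.245
assert double == Decimal("0.244")  # 0.2445 is a tie, broken to the even digit
```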
14. Warning! The instructions called FMADDs and so on from SPARC64 V, which share the same names and encodings as those of SPARC64 VI, are not real FMA instructions, as they perform two roundings [208, page 56].
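The distinction matters: a genuine FMA rounds a·b + c once, whereas a multiply followed by an add rounds twice. The single-rounding result can be emulated in Python with exact rational arithmetic to show the discrepancy (a sketch for binary64):

```python
from fractions import Fraction

a = 1.0 + 2.0**-30          # exactly representable in binary64
c = -(1.0 + 2.0**-29)       # exactly representable in binary64

# Two roundings: a*a = 1 + 2**-29 + 2**-60 is rounded (losing the 2**-60 term),
# then the addition rounds again; everything cancels.
two_roundings = a * a + c
assert two_roundings == 0.0

# One rounding: compute a*a + c exactly, then round once to binary64, as an FMA would.
fused = float(Fraction(a) * Fraction(a) + Fraction(c))
assert fused == 2.0**-60
```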
15. These instructions also suppress (“silence”) exceptions, which induces similar problems in case of out-of-order execution (although their correct management in hardware must be implemented anyway).
16. The performance of AVX instructions can be degraded on some computations involving subnormals.
17. CVE-2006-6499 / https://bugzilla.mozilla.org/show_bug.cgi?id=358569.
References
Advanced Micro Devices. 128-bit SSE5 instruction set, 2007. Available at http://pdinda.org/icsclass/doc/AMD_ARCH_MANUALS/AMD64_128_Bit_SSE5_Instrs.pdf.
B. Akbarpour, A. T. Abdel-Hamid, S. Tahar, and J. Harrison. Verifying a synthesized implementation of IEEE-754 floating-point exponential function using HOL. The Computer Journal, 53(4):465, 2010.
American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Standard 754–1985, 1985.
American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE Standard for Radix Independent Floating-Point Arithmetic. ANSI/IEEE Standard 854–1987, 1987.
A. Blot, J.-M. Muller, and L. Théry. Formal correctness of comparison algorithms between binary64 and decimal64 floating-point numbers. In 10th International Workshop on Numerical Software Verification (NSV), pages 25–37, 2017.
S. Boldo, J.-H. Jourdan, X. Leroy, and G. Melquiond. Verified compilation of floating-point computations. Journal of Automated Reasoning, 54(2):135–163, 2015.
R. T. Boute. The Euclidean definition of the functions div and mod. ACM Transactions on Programming Languages and Systems, 14(2):127–144, 1992.
N. Brisebarre, C. Lauter, M. Mezzarobba, and J.-M. Muller. Comparison between binary and decimal floating-point numbers. IEEE Transactions on Computers, 65(7):2032–2044, 2016.
N. Brunie, F. de Dinechin, and B. Dupont de Dinechin. A mixed-precision fused multiply and add. In 45th Asilomar Conference on Signals, Systems, and Computers, November 2011.
N. Brunie, F. de Dinechin, and B. Dupont de Dinechin. Mixed-precision merged multiplication and addition operator. Patent WO/2012/175828, December 2012.
T. C. Chen and I. T. Ho. Storage-efficient representation of decimal data. Communications of the ACM, 18(1):49–52, 1975.
W. J. Cody. MACHAR: a subroutine to dynamically determine machine parameters. ACM Transactions on Mathematical Software, 14(4):301–311, 1988.
W. J. Cody and W. Waite. Software Manual for the Elementary Functions. Prentice-Hall, Englewood Cliffs, NJ, 1980.
T. Coe and P. T. P. Tang. It takes six ones to reach a flaw. In 12th IEEE Symposium on Computer Arithmetic (ARITH-12), pages 140–146, July 1995.
S. Collange, M. Daumas, and D. Defour. État de l’intégration de la virgule flottante dans les processeurs graphiques. Technique et Science Informatiques, 27(6):719–733, 2008. In French.
M. Cornea, C. Anderson, J. Harrison, P. T. P. Tang, E. Schneider, and C. Tsen. A software implementation of the IEEE 754R decimal floating-point arithmetic using the binary encoding format. In 18th IEEE Symposium on Computer Arithmetic (ARITH-18), pages 29–37, June 2007.
M. Cornea, J. Harrison, C. Anderson, P. T. P. Tang, E. Schneider, and E. Gvozdev. A software implementation of the IEEE 754R decimal floating-point arithmetic using the binary encoding format. IEEE Transactions on Computers, 58(2):148–162, 2009.
M. Cornea, J. Harrison, and P. T. P. Tang. Scientific Computing on Itanium®-based Systems. Intel Press, Hillsboro, OR, 2002.
M. F. Cowlishaw. Decimal floating-point: algorism for computers. In 16th IEEE Symposium on Computer Arithmetic (ARITH-16), pages 104–111, June 2003.
A. Edelman. The mathematics of the Pentium division bug. SIAM Review, 39(1):54–67, 1997.
L. Eisen, J. W. Ward, H. W. Tast, N. Mäding, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough. IBM POWER6 accelerators: VMX and DFU. IBM Journal of Research and Development, 51(6):1–21, 2007.
M. D. Ercegovac and T. Lang. Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, Boston, MA, 1994.
M. D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann Publishers, San Francisco, CA, 2004.
M. A. Erle, M. J. Schulte, and B. J. Hickmann. Decimal floating-point multiplication. IEEE Transactions on Computers, 58(7):902–916, 2009.
Fujitsu. SPARC64™ VI Extensions. Fujitsu Limited, 1.3 edition, March 2007.
D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, 23(1):5–48, 1991. An edited reprint is available at https://docs.oracle.com/cd/E19059-01/fortec6u2/806-7996/806-7996.pdf from Sun’s Numerical Computation Guide; it contains an addendum Differences Among IEEE 754 Implementations, also available at http://www.validlab.com/goldberg/addendum.html.
J. Harrison. Formal verification of floating-point trigonometric functions. In 3rd International Conference on Formal Methods in Computer-Aided Design (FMCAD), number 1954 in Lecture Notes in Computer Science, pages 217–233, Austin, TX, USA, 2000.
J. Harrison. Formal verification of IA-64 division algorithms. In 13th International Conference on Theorem Proving in Higher Order Logics (TPHOLs), volume 1869 of Lecture Notes in Computer Science, pages 233–251, 2000.
IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic. IEEE Standard 754-2008, August 2008. Available at http://ieeexplore.ieee.org/servlet/opac?punumber=4610933.
Intel Corporation. Intrinsics Guide. Available at https://software.intel.com/sites/landingpage/IntrinsicsGuide/. Accessed in June 2017.
Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, March 2017. Order Number: 325462-062US.
International Organization for Standardization. ISO/IEC/IEEE Standard 60559:2011: Information technology – Microprocessor Systems – Floating-Point arithmetic. International Electrotechnical Commission, 1st edition, 2011.
W. Kahan. Why do we need a floating-point standard? Technical report, Computer Science, UC Berkeley, 1981. Available at http://www.cs.berkeley.edu/~wkahan/ieee754status/why-ieee.pdf.
W. Kahan. Branch cuts for complex elementary functions. In The State of the Art in Numerical Analysis, pages 165–211, 1987.
W. Kahan. Lecture notes on the status of IEEE-754. Available at http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF, 1997.
R. Karpinsky. PARANOIA: a floating-point benchmark. BYTE, 10(2), 1985.
D. E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, Reading, MA, 3rd edition, 1998.
D. Lutz. Fused multiply-add microarchitecture comprising separate early-normalizing multiply and add pipelines. In 20th IEEE Symposium on Computer Arithmetic (ARITH-20), pages 123–128, 2011.
D. Lutz and N. Burgess. Overcoming double-rounding errors under IEEE 754-2008 using software. In 44th Asilomar Conference on Signals, Systems, and Computers, pages 1399–1401, November 2010.
É. Martin-Dorel, G. Melquiond, and J.-M. Muller. Some issues related to double rounding. BIT Numerical Mathematics, 53(4):897–924, 2013.
D. Monniaux. The pitfalls of verifying floating-point computations. ACM TOPLAS, 30(3):1–41, 2008. A preliminary version is available at http://hal.archives-ouvertes.fr/hal-00128124.
J. S. Moore, T. Lynch, and M. Kaufmann. A mechanically checked proof of the correctness of the kernel of the AMD5K86 floating point division algorithm. IEEE Transactions on Computers, 47(9):913–926, 1998.
R. E. Moore. Interval analysis. Prentice Hall, 1966.
J.-M. Muller. Algorithmes de division pour microprocesseurs: illustration à l’aide du “bug” du Pentium. Technique et Science Informatiques, 14(8), 1995. In French.
S. F. Oberman and M. J. Flynn. Division algorithms and implementations. IEEE Transactions on Computers, 46(8):833–854, 1997.
S. F. Oberman and M. Y. Siu. A high-performance area-efficient multifunction interpolator. In 17th IEEE Symposium on Computer Arithmetic (ARITH-17), June 2005.
M. Parks. Number-theoretic test generation for directed roundings. IEEE Transactions on Computers, 49(7):651–658, 2000.
D. M. Russinoff. A mechanically checked proof of IEEE compliance of a register-transfer-level specification of the AMD-K7 floating-point multiplication, division, and square root instructions. LMS Journal of Computation and Mathematics, 1:148–200, 1998.
D. M. Russinoff. A mechanically checked proof of correctness of the AMD K5 floating point square root microcode. Formal Methods in System Design, 14(1):75–125, 1999.
D. M. Russinoff. A case study in formal verification of register-transfer logic with ACL2: The floating point adder of the AMD Athlon processor. Lecture Notes in Computer Science, 1954:3–36, 2000.
E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw. Decimal floating-point support on the IBM system Z10 processor. IBM Journal of Research and Development, 53(1):36–45, 2009.
C. Severance. IEEE 754: An interview with William Kahan. Computer, 31(3):114–115, 1998.
Sun Microsystems. Interval arithmetic in high performance technical computing. Technical report, 2002.
N. Whitehead and A. Fit-Florea. Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs. Technical report, NVIDIA, 2011. Available at http://developer.download.nvidia.com/devzone/devcenter/cuda/files/NVIDIA-CUDA-Floating-Point.pdf.
© 2018 Springer International Publishing AG, part of Springer Nature
Muller, J.-M. et al. (2018). Floating-Point Formats and Environment. In: Handbook of Floating-Point Arithmetic. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-76526-6_3
Print ISBN: 978-3-319-76525-9
Online ISBN: 978-3-319-76526-6