Definitions and Basic Notions

  • Jean-Michel Muller
  • Nicolas Brunie
  • Florent de Dinechin
  • Claude-Pierre Jeannerod
  • Mioara Joldes
  • Vincent Lefèvre
  • Guillaume Melquiond
  • Nathalie Revol
  • Serge Torres


As stated in the introduction, a radix-β floating-point number x is, roughly speaking, a number of the form
$$\displaystyle{m \cdot \beta ^{e},}$$
where β is the radix of the floating-point system, m is the significand of x and satisfies | m | < β, and e is the exponent of x.
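
As a small illustration of this form in radix β = 2, the following Python sketch splits a finite nonzero double into a significand m and an exponent e such that x = m · 2^e. The helper name `decompose` is hypothetical, and the normalization 1 ≤ | m | < 2 is an assumption made for this sketch; the text above only requires | m | < β.

```python
import math
from typing import Tuple


def decompose(x: float) -> Tuple[float, int]:
    """Return (m, e) with x == m * 2**e and 1 <= |m| < 2.

    Hypothetical helper for illustration only; the normalization
    1 <= |m| < 2 is an assumption of this sketch.
    """
    if x == 0.0 or not math.isfinite(x):
        raise ValueError("x must be finite and nonzero")
    # math.frexp returns (f, k) with x == f * 2**k and 0.5 <= |f| < 1.
    f, k = math.frexp(x)
    # Rescale so that the significand lies in [1, 2): both steps are exact.
    return f * 2.0, k - 1


if __name__ == "__main__":
    m, e = decompose(6.0)
    print(m, e)              # 1.5 2, since 6 = 1.5 * 2**2
    print(math.ldexp(m, e))  # rebuilds x as m * 2**e
```

Rebuilding the value with `math.ldexp(m, e)` returns the original x exactly, because multiplying by a power of the radix changes only the exponent and leaves the significand untouched.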



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jean-Michel Muller (1)
  • Nicolas Brunie (2)
  • Florent de Dinechin (3)
  • Claude-Pierre Jeannerod (4)
  • Mioara Joldes (5)
  • Vincent Lefèvre (4)
  • Guillaume Melquiond (6)
  • Nathalie Revol (4)
  • Serge Torres (7)

  1. CNRS - LIP, Lyon, France
  2. Kalray, Grenoble, France
  3. INSA-Lyon - CITI, Villeurbanne, France
  4. Inria - LIP, Lyon, France
  5. CNRS - LAAS, Toulouse, France
  6. Inria - LRI, Orsay, France
  7. ENS-Lyon - LIP, Lyon, France
