Basic Properties and Algorithms

  • Jean-Michel Muller
  • Nicolas Brunie
  • Florent de Dinechin
  • Claude-Pierre Jeannerod
  • Mioara Joldes
  • Vincent Lefèvre
  • Guillaume Melquiond
  • Nathalie Revol
  • Serge Torres


In this chapter, we present some short yet useful algorithms and some basic properties that can be derived from specifications of floating-point arithmetic systems, such as the ones given in the successive IEEE 754 standards. Thanks to these standards, we now have an accurate definition of floating-point formats and operations. The behavior of a sequence of operations becomes at least partially for more details on this). We therefore can build algorithms and proofs that refer to these specifications.


  1. [15]
    S. F. Anderson, J. G. Earle, R. E. Goldschmidt, and D. M. Powers. The IBM 360/370 model 91: floating-point execution unit. IBM Journal of Research and Development, 1967. Reprinted in [583].Google Scholar
  2. [16]
    M. Andrysco, R. Jhala, and S. Lerner. Printing floating-point numbers: A faster, always correct method. In 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 555–567, 2016.Google Scholar
  3. [41]
    G. Bohlender, W. Walter, P. Kornerup, and D. W. Matula. Semantics for exact floating point operations. In 10th IEEE Symposium on Computer Arithmetic (ARITH-10), pages 22–26, June 1991.Google Scholar
  4. [42]
    S. Boldo. Pitfalls of a full floating-point proof: example on the formal proof of the Veltkamp/Dekker algorithms. In 3rd International Joint Conference on Automated Reasoning (IJCAR), volume 4130 of Lecture Notes in Computer Science, pages 52–66, Seattle, WA, USA, 2006.CrossRefGoogle Scholar
  5. [46]
    S. Boldo and M. Daumas. Representable correcting terms for possibly underflowing floating point operations. In 16th IEEE Symposium on Computer Arithmetic (ARITH-16), pages 79–86, Santiago de Compostela, Spain, 2003.Google Scholar
  6. [48]
    S. Boldo, M. Daumas, C. Moreau-Finot, and L. Théry. Computer validated proofs of a toolset for adaptable arithmetic. Technical report, École Normale Supérieure de Lyon, 2001. Available at
  7. [49]
    S. Boldo, S. Graillat, and J.-M. Muller. On the robustness of the 2Sum and Fast2Sum algorithms. ACM Transactions on Mathematical Software, 44(1):4:1–4:14, 2017.MathSciNetCrossRefGoogle Scholar
  8. [51]
    S. Boldo, J.-H. Jourdan, X. Leroy, and G. Melquiond. Verified compilation of floating-point computations. Journal of Automated Reasoning, 54(2):135–163, 2015.MathSciNetCrossRefGoogle Scholar
  9. [52]
    S. Boldo and G. Melquiond. Emulation of FMA and correctly rounded sums: proved algorithms using rounding to odd. IEEE Transactions on Computers, 57(4):462–471, 2008.MathSciNetCrossRefGoogle Scholar
  10. [54]
    S. Boldo and G. Melquiond. Computer Arithmetic and Formal Proofs. ISTE Press – Elsevier, 2017.Google Scholar
  11. [55]
    S. Boldo and J.-M. Muller. Exact and approximated error of the FMA. IEEE Transactions on Computers, 60(2):157–164, 2011.MathSciNetCrossRefGoogle Scholar
  12. [57]
    A. D. Booth. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, 4(2):236–240, 1951. Reprinted in [583].Google Scholar
  13. [74]
    N. Brisebarre and J.-M. Muller. Correctly rounded multiplication by arbitrary precision constants. IEEE Transactions on Computers, 57(2):165–174, 2008.MathSciNetCrossRefGoogle Scholar
  14. [86]
    R. G. Burger and R. K. Dybvig. Printing floating-point numbers quickly and accurately. In SIGPLAN’96 Conference on Programming Languages Design and Implementation (PLDI), pages 108–116, June 1996.Google Scholar
  15. [101]
    W. D. Clinger. How to read floating-point numbers accurately. ACM SIGPLAN Notices, 25(6):92–101, 1990.CrossRefGoogle Scholar
  16. [102]
    W. D. Clinger. Retrospective: how to read floating-point numbers accurately. ACM SIGPLAN Notices, 39(4):360–371, 2004.CrossRefGoogle Scholar
  17. [118]
    M. Cornea, J. Harrison, and P. T. P. Tang. Scientific Computing on Itanium ®; -based Systems. Intel Press, Hillsboro, OR, 2002.Google Scholar
  18. [120]
    M. A. Cornea-Hasegan, R. A. Golliver, and P. Markstein. Correctness proofs outline for Newton–Raphson based floating-point divide and square root algorithms. In 14th IEEE Symposium on Computer Arithmetic (ARITH-14), pages 96–105, April 1999.Google Scholar
  19. [128]
    A. Dahan-Dalmedico and J. Pfeiffer. Histoire des Mathématiques. Editions du Seuil, Paris, 1986. In French.Google Scholar
  20. [129]
    C. Daramy-Loirat, D. Defour, F. de Dinechin, M. Gallet, N. Gast, C. Q. Lauter, and J.-M. Muller. CR-LIBM, a library of correctly-rounded elementary functions in double-precision. Technical report, LIP Laboratory, Arenaire team, December 2006. Available at
  21. [130]
    D. Das Sarma and D. W. Matula. Measuring the accuracy of ROM reciprocal tables. IEEE Transactions on Computers, 43(8):932–940, 1994.CrossRefGoogle Scholar
  22. [131]
    D. Das Sarma and D. W. Matula. Faithful bipartite ROM reciprocal tables. In 12th IEEE Symposium on Computer Arithmetic (ARITH-12), pages 17–28, June 1995.Google Scholar
  23. [132]
    D. Das Sarma and D. W. Matula. Faithful interpolation in reciprocal tables. In 13th IEEE Symposium on Computer Arithmetic (ARITH-13), pages 82–91, July 1997.Google Scholar
  24. [142]
    F. de Dinechin, A. V. Ershov, and N. Gast. Towards the post-ultimate libm. In 17th IEEE Symposium on Computer Arithmetic (ARITH-17), pages 288–295, 2005.Google Scholar
  25. [156]
    F. de Dinechin and A. Tisserand. Multipartite table methods. IEEE Transactions on Computers, 54(3):319–330, 2005.CrossRefGoogle Scholar
  26. [158]
    T. J. Dekker. A floating-point technique for extending the available precision. Numerische Mathematik, 18(3):224–242, 1971.MathSciNetCrossRefGoogle Scholar
  27. [186]
    M. D. Ercegovac and T. Lang. Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, Boston, MA, 1994.zbMATHGoogle Scholar
  28. [187]
    M. D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann Publishers, San Francisco, CA, 2004.Google Scholar
  29. [189]
    M. Ercegovac, J.-M. Muller, and A. Tisserand. Simple seed architectures for reciprocal and square root reciprocal. In 39th Asilomar Conference on Signals, Systems, and Computers, November 2005.Google Scholar
  30. [197]
    G. Even, P.-M. Seidel, and W. E. Ferguson. A parametric error analysis of Goldschmidt’s division algorithm. Journal of Computer and System Sciences, 70(1):118–139, 2005.MathSciNetCrossRefGoogle Scholar
  31. [199]
    W. E. Ferguson, Jr. Exact computation of a sum or difference with applications to argument reduction. In 12th IEEE Symposium on Computer Arithmetic (ARITH-12), pages 216–221, Bath, UK, July 1995.Google Scholar
  32. [205]
    D. Fowler and E. Robson. Square root approximations in old Babylonian mathematics: YBC 7289 in context. Historia Mathematica, 25:366–378, 1998.MathSciNetCrossRefGoogle Scholar
  33. [211]
    D. M. Gay. Correctly-rounded binary-decimal and decimal-binary conversions. Technical Report Numerical Analysis Manuscript 90–10, ATT & Bell Laboratories (Murray Hill, NJ), November 1990.Google Scholar
  34. [212]
    W. M. Gentleman and S. B. Marovitch. More on algorithms that reveal properties of floating-point arithmetic units. Communications of the ACM, 17(5):276–277, 1974.CrossRefGoogle Scholar
  35. [215]
    I. B. Goldberg. 27 bits are not enough for 8-digit accuracy. Commun. ACM, 10(2):105–106, 1967.CrossRefGoogle Scholar
  36. [217]
    R. E. Goldschmidt. Applications of division by convergence. Master’s thesis, Dept. of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA, June 1964.Google Scholar
  37. [249]
    J. R. Hauser. Handling floating-point exceptions in numeric programs. ACM Transactions on Programming Languages and Systems, 18(2):139–174, 1996.CrossRefGoogle Scholar
  38. [267]
    IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic. IEEE Standard 754-2008, August 2008. Available at
  39. [313]
    W. Kahan. Pracniques: further remarks on reducing truncation errors. Communications of the ACM, 8(1):40, 1965.CrossRefGoogle Scholar
  40. [329]
    R. Karpinsky. PARANOIA: a floating-point benchmark. BYTE, 10(2), 1985.Google Scholar
  41. [342]
    D. E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, Reading, MA, 3rd edition, 1998.Google Scholar
  42. [347]
    P. Kornerup, V. Lefèvre, N. Louvet, and J.-M. Muller. On the computation of correctly rounded sums. IEEE Transactions on Computers, 61(3):289–298, 2012.MathSciNetCrossRefGoogle Scholar
  43. [350]
    P. Kornerup and J.-M. Muller. Choosing starting values for certain Newton–Raphson iterations. Theoretical Computer Science, 351(1):101–110, 2006.MathSciNetCrossRefGoogle Scholar
  44. [391]
    S. Linnainmaa. Software for doubled-precision floating-point computations. ACM Transactions on Mathematical Software, 7(3):272–283, 1981.MathSciNetCrossRefGoogle Scholar
  45. [394]
    F. Loitsch. Printing floating-point numbers quickly and accurately with integers. In 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ‘10), pages 233–243, 2010.Google Scholar
  46. [403]
    M. A. Malcolm. Algorithms to reveal properties of floating-point arithmetic. Communications of the ACM, 15(11):949–951, 1972.CrossRefGoogle Scholar
  47. [405]
    P. Markstein. Computation of elementary functions on the IBM RISC System/6000 processor. IBM Journal of Research and Development, 34(1):111–119, 1990.MathSciNetCrossRefGoogle Scholar
  48. [406]
    P. Markstein. IA-64 and Elementary Functions: Speed and Precision. Hewlett-Packard Professional Books. Prentice-Hall, Englewood Cliffs, NJ, 2000.Google Scholar
  49. [411]
    D. W. Matula. In-and-out conversions. Communications of the ACM, 11(1):47–50, 1968.MathSciNetCrossRefGoogle Scholar
  50. [422]
    O. Møller. Quasi double-precision in floating-point addition. BIT, 5:37–50, 1965.MathSciNetCrossRefGoogle Scholar
  51. [440]
    J.-M. Muller. Avoiding double roundings in scaled Newton-Raphson division. In 47th Asilomar Conference on Signals, Systems, and Computers, pages 396–399, November 2013.Google Scholar
  52. [442]
    J.-M. Muller. Elementary Functions, Algorithms and Implementation. Birkhäuser Boston, MA, 3rd edition, 2016.CrossRefGoogle Scholar
  53. [458]
    I. Newton. Methodus Fluxionum et Serierum Infinitarum. 1664–1671.Google Scholar
  54. [481]
    A. Panhaleux. Génération d’itérations de type Newton-Raphson pour la division de deux flottants à l’aide d’un FMA. Master’s thesis, École Normale Supérieure de Lyon, Lyon, France, 2008. In French.Google Scholar
  55. [482]
    B. Parhami. On the complexity of table lookup for iterative division. IEEE Transactions on Computers, C-36(10):1233–1236, 1987.MathSciNetCrossRefGoogle Scholar
  56. [492]
    J. A. Pineiro and J. D. Bruguera. High-speed double-precision computation of reciprocal, division, square root, and inverse square root. IEEE Transactions on Computers, 51(12):1377–1388, 2002.MathSciNetCrossRefGoogle Scholar
  57. [496]
    D. M. Priest. On Properties of Floating-Point Arithmetics: Numerical Stability and the Cost of Accurate Computations. Ph.D. thesis, University of California at Berkeley, 1992.Google Scholar
  58. [517]
    S. M. Rump. Solving algebraic problems with high accuracy (Habilitationsschrift). In A New Approach to Scientific Computation, pages 51–120, 1983.Google Scholar
  59. [531]
    S. M. Rump, T. Ogita, and S. Oishi. Accurate floating-point summation part I: Faithful rounding. SIAM Journal on Scientific Computing, 31(1):189–224, 2008.MathSciNetCrossRefGoogle Scholar
  60. [555]
    J. R. Shewchuk. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Discrete Computational Geometry, 18:305–363, 1997.MathSciNetCrossRefGoogle Scholar
  61. [558]
    T. Simpson. Essays on several curious and useful subjects in speculative and mix’d mathematicks, illustrated by a variety of examples. London, 1740.Google Scholar
  62. [566]
    G. L. Steele, Jr. and J. L. White. How to print floating-point numbers accurately. ACM SIGPLAN Notices, 25(6):112–126, 1990.CrossRefGoogle Scholar
  63. [567]
    G. L. Steele, Jr. and J. L. White. Retrospective: how to print floating-point numbers accurately. ACM SIGPLAN Notices, 39(4):372–389, 2004.CrossRefGoogle Scholar
  64. [572]
    P. H. Sterbenz. Floating-Point Computation. Prentice-Hall, Englewood Cliffs, NJ, 1974.Google Scholar
  65. [574]
    J. E. Stine and M. J. Schulte. The symmetric table addition method for accurate function approximation. Journal of VLSI Signal Processing, 21:167–177, 1999.CrossRefGoogle Scholar
  66. [616]
    G. W. Veltkamp. ALGOL procedures voor het berekenen van een inwendig product in dubbele precisie. Technical Report 22, RC-Informatie, Technishe Hogeschool Eindhoven, 1968.Google Scholar
  67. [617]
    G. W. Veltkamp. ALGOL procedures voor het rekenen in dubbele lengte. Technical Report 21, RC-Informatie, Technishe Hogeschool Eindhoven, 1969.Google Scholar
  68. [639]
    T. J. Ypma. Historical development of the Newton-Raphson method. SIAM Rev., 37(4):531–551, 1995.MathSciNetCrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jean-Michel Muller
    • 1
  • Nicolas Brunie
    • 2
  • Florent de Dinechin
    • 3
  • Claude-Pierre Jeannerod
    • 4
  • Mioara Joldes
    • 5
  • Vincent Lefèvre
    • 4
  • Guillaume Melquiond
    • 6
  • Nathalie Revol
    • 4
  • Serge Torres
    • 7
  1. 1.CNRS - LIPLyonFrance
  2. 2.KalrayGrenobleFrance
  3. 3.INSA-Lyon - CITIVilleurbanneFrance
  4. 4.Inria - LIPLyonFrance
  5. 5.CNRS - LAASToulouseFrance
  6. 6.Inria - LRIOrsayFrance
  7. 7.ENS-Lyon - LIPLyonFrance

Personalised recommendations