Improving Floating-Point Numbers: A Lazy Approach to Adaptive Accuracy Refinement for Numerical Computations
Abstract
Numerical computation using floating-point numbers is prone to accuracy degradation due to round-off errors and cancellation of significant digits. Although multiple-precision arithmetic might alleviate this problem, it is difficult to statically decide the optimal degree of precision for each operation in a program. This paper presents a solution to this problem: the partial results in floating-point representations are incrementally improved using adaptive control. This process of adaptive accuracy refinement is implemented using lazy lists, each one containing a sequence of floating-point numbers with gradually improving accuracies. The computation process is driven by the propagation of demand for more accurate results. The concept of this improving floating-point number (IFN) mechanism was experimentally implemented in two ways: as a Haskell library and as a pure C library. Despite the simple approach, the results for numerical problems demonstrated the effectiveness of this mechanism.
Keywords
Improving floating-point numbers · Accurate numerical computation · Lazy evaluation · Haskell library
1 Introduction
Obtaining accurate results from numerical computation is not an easy task. Since real values cannot be represented exactly using a fixed number of digits, ordinary numerical computation is carried out using approximated representations of numbers. Programmers have to be fully aware that computation based on approximated numbers can easily and unexpectedly degrade the accuracy of the result.
In floating-point representation using base 2, each value is denoted as \(s \times m \times 2^e\), where s, m, and e are the sign, mantissa, and exponent, respectively. The mantissa (also called the significant digits) is particularly important since it affects the precision of each number. There are many cases in which the mantissa of a resultant value has few meaningful digits, or even no digit, due to the accumulation of round-off errors or catastrophic cancellation [1].
Multiple-precision floating-point arithmetic such as that defined in IEEE 754 [2] has been used to avoid catastrophic cancellation, and some processors have been designed to support multiple-precision arithmetic. In addition, arbitrary-precision arithmetic operations can be performed by using libraries such as the GNU MPFR library [3].
Next, let us slightly change the value of b. The results of (1) with \(a = 77617\) and \(b=33095\) are \(y=4.7833916866560586\times 10^{32}\) for double-precision and \(y=4.783391686660554\times 10^{32}\) for a 122-bit length mantissa; there is no substantial difference between them.
This small example illustrates the inherent difficulties of floatingpoint computation with a fixed number of digits.

It is difficult to determine a suitable number of bits for each operation.

It is difficult to detect cancellation of significant digits from the obtained result.

It is difficult to pinpoint the operations in a program that caused cancellation.

It is difficult to predict the effect of changes in the data.
To obtain results that satisfy the required accuracy, it is necessary to properly estimate the mantissa length for each floating-point operation. Unfortunately, it is quite difficult to statically decide the optimal precision for each operation. Thus, we take a dynamic approach; the partial results in floating-point representations are incrementally improved using adaptive control. This adaptive accuracy refinement process requires tracing of accumulated errors and application of suitable strategies to remove inaccurate values.
In this paper, we present a mechanism for adaptive accuracy refinement of computational results. It uses lazy lists, each one containing an infinite sequence of specially designed floating-point numbers with gradually improving accuracies. These numbers are approximations of the same real value and are ordered on the basis of what we call “accuracy.” In computation using this mechanism, “referring to the next value” corresponds to obtaining a better (more accurate) value via recomputation of subexpressions, which are propagated to dependent operands automatically. In principle, computation using this improving floating-point number (IFN) mechanism is applicable to any numerical algorithm.
Once we bind the result of rump that uses IFN computation to the variable \(\textit{qs}\), we can obtain the resultant value at any desired accuracy by giving that accuracy to the library function accurateValue.
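To make this concrete, the following is a toy model of our own (not the library's actual datatype): an IFN is viewed as an infinite lazy list of (approximation, accuracy) pairs, where accuracy n means the absolute error is below \(2^{-n}\), and accurateValue simply forces elements until the demanded accuracy is reached.

```haskell
import Data.Ratio

-- Toy model (our simplification, not the library's real representation): an
-- IFN is an infinite lazy list of (approximation, accuracy) pairs, where
-- accuracy n means the absolute error is below 2^(-n).
type Approx = (Rational, Int)

ac :: Approx -> Int
ac = snd

-- Demand-driven extraction: force elements until the demanded accuracy is met.
accurateValue :: Int -> [Approx] -> Rational
accurateValue n = fst . head . dropWhile ((< n) . ac)

-- An IFN for 1/3: binary truncations with strictly improving accuracy.
oneThird :: [Approx]
oneThird = [ (fromInteger (2 ^ n `div` 3) % (2 ^ n), n) | n <- [1 ..] ]
```

Because the list is lazy, only as many elements as the demanded accuracy requires are ever computed.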

Our proposed IFN mechanism achieves adaptive accuracy refinement for the computation of each subexpression in the entire expression. This enables the user to easily obtain a computational result with the desired accuracy without cancellation of significant digits. In contrast to computation using a fixed number of bits such as with functions in the MPFR library, computation using the IFN mechanism can be done using only a sufficient number of bits for each subexpression. Note that “sufficient number of bits” varies with the subexpression; the number of bits for each subexpression is automatically adjusted.

We formalize an IFN as a list of specially represented floating-point numbers that approximate the same real value and for which the accuracies are improved. On the basis of this concept, we define unary/binary operators and basic mathematical functions on IFNs, the results of which are also IFNs.

An experimental IFN library was implemented as both a Haskell library and a pure C library. The lazy evaluation facility of Haskell facilitated implementation of the Haskell library in a quite natural manner because computation using IFNs proceeds in response to demands for “more accurate” values of the IFN of interest. A C version of the library was developed to cope with the performance problem of the Haskell library. Application of these libraries to numerical problems for which it is difficult to obtain precise answers by using Double numbers demonstrated the effectiveness of the proposed IFN mechanism.
The organization of this paper is as follows. Section 2 outlines the concept of our IFN mechanism and the adaptive refinement of subexpressions using IFNs. In Sect. 3, we describe the details of IFNs, floatingpoint numbers for IFNs, and arithmetic operations for IFNs. In Sect. 4, support for logical expressions is discussed. In Sect. 5, issues related to implementing IFN libraries are considered. Section 6 describes several numerical examples demonstrating the effectiveness and applicability of our IFN mechanism. In Sect. 7, we discuss a few IFNrelated issues. Related work is covered in Sect. 8. Finally, we conclude with a brief summary in Sect. 9.
Throughout this paper, we use Haskell [5] with some extra typesetting features to describe the design of IFN libraries. Though some datatypes in Sects. 3 and 4 are defined naively for the sake of conciseness, our practical IFN Haskell library uses more efficient implementation by means of Haskell’s foreign function interface (FFI), as described in Sect. 5.
2 Improving Floating-Point Numbers
To improve the accuracy of floatingpoint computations of an expression by adaptive and appropriate accuracy refinements of its subexpressions, we introduce the concept of improving floatingpoint numbers. Intuitively speaking, an IFN is an infinite sequence of floatingpoint values, each of which approximates the same real value v, i.e., the ideal result of the computation. The further to the right the value is in the sequence, the closer it is to v.
The basic idea of IFNs came from the notion of “improving sequences” [6, 7]: a finite monotonic sequence of approximations of a final value that are gradually improved in accordance with an ordering relation. However, IFNs differ from improving sequences in that IFNs are infinite sequences.
The floatingpoint values in an IFN are forced from left to right. If all \(q_i\)’s (\(i \le k\)) have been forced and \(q_j\)’s (\(k<j\)) have not been forced yet, the current value of this IFN is \(q_k\), and its current accuracy is \(\textit{ac}\;q_k\). If the current accuracy is unsatisfactory, \(q_{k+1}\) is forced, and the current value is set to \(q_{k+1}\). This process is repeated until the desired accuracy is obtained.
To see how the computation in terms of IFNs proceeds, let us consider a simple example: addition of v and w. Suppose that v corresponds to \(\textit{ps}= [p_0, p_1, \ldots , p_h, \ldots ]\) where its current value is \(p_h\), and w corresponds to \(\textit{qs}= [q_0, q_1, \ldots , q_k, \ldots ]\) where its current value is \(q_k\). Also suppose that \(v + w\) corresponds to \(\textit{rs}= [r_0, r_1, \ldots , r_l, \ldots ]\) where we have already computed \(r_0, r_1, \ldots , r_l\) by using \(p_0, p_1, \ldots , p_h, q_0, q_1, \ldots , q_k\). If we are not satisfied with \(\textit{ac}\;r_l\), we want to compute \(r_{l+1}\) to obtain better accuracy. If the next values of \(\textit{ps}\) and \(\textit{qs}\) (\(p_{h+1}\) and \(q_{k+1}\)) are judged to be necessary, we force them and compute \(r_{l+1}\) by using both \(p_{h+1}\) and \(q_{k+1}\). It is worth noting that the computation of \(r_{l+1}\) has to produce a value with better accuracy than \(r_l\). If \(\textit{ps}\) is the result of another computation, say multiplication of \(\textit{ps}'\) and \(\textit{ps}''\), forcing \(p_{h+1}\) induces other floating-point values in \(\textit{ps}'\) and \(\textit{ps}''\) to be forced. In this way, the entire computation is driven by the propagation of the demands for the next values in IFNs. This kind of demand-driven computation can be naturally described on the basis of lazy evaluation. We thus used Haskell to build a prototype IFN library. As we describe in later sections, there are several design choices in a practical implementation.
3 Adaptive Accuracy Refinement with IFNs
3.1 Floating-Point Datatype
where sign, mantissa, and expo are the sign, mantissa, and exponent, respectively; sign is either 1 or \(-1\), and mantissa is a finite list for which the elements are either 0 or 1. We assume that q is an instance of Q.
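A naive sketch of such a datatype can be written as follows. The field names, the exact denotation \(\langle q \rangle = \textit{sign} \times 0.b_1 b_2 \ldots b_n \times 2^{\textit{expo}}\), and the accuracy definition \(\textit{ac}\;q = \textit{lenM}\;q - \textit{expo}\;q\) (i.e., the truncation error is below \(2^{\textit{expo} - \textit{lenM}}\)) are our reading of the text, stated as assumptions; the practical library uses a more efficient representation via the FFI.

```haskell
-- Naive sketch of the datatype Q (names and accuracy definition are our
-- assumptions, not the library's actual code).
data Q = Q { qSign :: Int, qMantissa :: [Int], qExpo :: Int }
  deriving (Eq, Show)

lenM :: Q -> Int
lenM = length . qMantissa

-- ac q = lenM q - expo q: the truncation error is below 2^(expo - lenM).
acQ :: Q -> Int
acQ q = lenM q - qExpo q

-- The denoted rational value <q> = sign * 0.b1 b2 ... bn * 2^expo.
denote :: Q -> Rational
denote (Q s bs e) =
  fromIntegral s * 2 ^^ e
    * sum [ fromIntegral b * 2 ^^ negate i | (i, b) <- zip [1 ..] bs ]
```

For instance, \(q = Q\;1\;[1,0,1]\;2\) denotes \(0.101_2 \times 2^2 = 5/2\) with \(\textit{ac}\;q = 3 - 2 = 1\).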
Example 1
Consider the following three floatingpoint numbers q, \(q'\), and \(q''\). For q and \(q'\), equations \(\langle q \rangle =\langle q' \rangle \) and \(\textit{ac}\;q=\textit{ac}\;q'\) hold. In addition, for any real v, \(v\leftarrowtail q \Longleftrightarrow v\leftarrowtail q'\). However, for q and \(q''\), \(\langle q \rangle =\langle q'' \rangle \), but \(\textit{ac}\;q < \textit{ac}\;q''\). For any real v, \(v\leftarrowtail q'' \Longrightarrow v\leftarrowtail q\) but \(v\leftarrowtail q \not \Longrightarrow v\leftarrowtail q''\).
Left-shifting \(q_1\) by m digits leads to \(q_2\), where \(\textit{lenM}\;q_2=n-m\), and right-shifting \(q_1\) by m digits leads to \(q_3\), where \(\textit{lenM}\;q_3=n+m\). Note that right-shifting does not change real values associated with Q, i.e., \(\langle q_1 \rangle =\langle q_3 \rangle \) and \(\{q_1\} = \{q_3\}\). If \(b_1 = b_2 = \cdots = b_m = 0\) and \(b_{m+1}\ne 0\), left-shifting \(q_1\) by m digits removes all leading zeros of the mantissa of \(q_1\). We call the removal of leading zeros of the mantissa by left-shifting normalization. Normalization also does not change real values associated with Q. In Example 1, q and \(q'\) have the same representation under normalization.
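On a naive list-based model of Q (our own simplified encoding, redefined here so the fragment stands alone), shifting and normalization can be sketched as follows; both operations leave the denoted value unchanged.

```haskell
-- Naive list-based model of Q (our encoding, not the library's).
data Q = Q { qSign :: Int, qMantissa :: [Int], qExpo :: Int }
  deriving (Eq, Show)

denote :: Q -> Rational
denote (Q s bs e) =
  fromIntegral s * 2 ^^ e
    * sum [ fromIntegral b * 2 ^^ negate i | (i, b) <- zip [1 ..] bs ]

-- Right-shifting by m prepends m zeros to the mantissa: lenM grows by m and
-- the denoted value is unchanged.
shiftRQ :: Int -> Q -> Q
shiftRQ m (Q s bs e) = Q s (replicate m 0 ++ bs) (e + m)

-- Normalization: left-shift away all leading zeros of the mantissa, which
-- also leaves the denoted value unchanged.
normalizeQ :: Q -> Q
normalizeQ (Q s bs e) = Q s (drop z bs) (e - z)
  where z = length (takeWhile (== 0) bs)
```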
3.2 Definition of IFNs

\(\textit{ac}\;q_i < \textit{ac}\;q_{i+1}\) holds for all \(i \ge 0\), and

there is a real value v that satisfies \(v \leftarrowtail q_i\) for all \(i \ge 0\).^{1}
If every element of an IFN \(\textit{qs}\) properly represents a real v, i.e., \(v\leftarrowtail q_i\) for all \(i \ge 0\), we say that \(\textit{qs}\) is an IFN with respect to v and write \(v \leftarrowtail \textit{qs}\) by overloading the relation symbol \(\leftarrowtail \). Since IFNs are infinite lists, \(v \leftarrowtail \textit{qs} \wedge w \leftarrowtail \textit{qs} \Longrightarrow v = w\).
The function \({fromString}:\!\!\!:{Int}\rightarrow {String}\rightarrow {Q}\) generates an instance of Q that approximates a given number in a string representation with a designated mantissa length. initN and diffN are respectively the initial length of the mantissa and the difference between the lengths of the mantissas of consecutive elements in the returned IFN. As will be described in detail in Sect. 5.2, the values of initN and diffN and the way in which the IFNs are generated greatly affect the performance of the IFN library.
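In our toy (value, accuracy) model, generating an IFN for a literal with an initial accuracy and a fixed increment can be sketched as below; the names literalIFN and truncAt are hypothetical, and an element of accuracy n is the value truncated so that its absolute error is below \(2^{-n}\).

```haskell
import Data.Ratio

-- Toy model of literal-IFN generation (hypothetical names): accuracies start
-- at initN and grow by diffN (assumed positive), so the accuracies in the
-- returned list are strictly increasing.
type Approx = (Rational, Int)

-- Truncate x so that the absolute error is below 2^(-n).
truncAt :: Int -> Rational -> Approx
truncAt n x = (fromInteger (floor (x * 2 ^ n)) % (2 ^ n), n)

literalIFN :: Int -> Int -> Rational -> [Approx]
literalIFN initN diffN x = [ truncAt n x | n <- iterate (+ diffN) initN ]
```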
We call functions that accept IFNs as operands and return an IFN “IFN operators.” When a program is constructed by composition of IFN operators, results at any level of accuracy can be obtained by using accurateValue.
3.3 Floating-Point Arithmetic Operators for Q
We have to be careful in defining binary operators for Q because simple definitions would not satisfy the above conditions.
Example 2
Then, rounding of \(q_a\) at the r-th digit is computed as \({simpleAddQ}\; q_b\; q_c\).
Then, \(r_0 = p_0 \sqcup _0 q_0\) is an instance of Q such that among all qs that satisfy both \(\langle p_0 \rangle \leftarrowtail q\) and \(\langle q_0 \rangle \leftarrowtail q\), \(r_0\) gives the maximum of \(\textit{ac}\;q\) for most cases. The function \(\sqcup _0\) is defined as follows.
 Case \(\langle p_0 \rangle > \langle q_0 \rangle \ge 0\) or \(\langle p_0 \rangle < \langle q_0 \rangle \le 0\): Suppose the following expressions hold, as shown in Fig. 2:$$ {\left\{ \begin{array}{ll} b_i^{(p)} = b_i^{(q)},&{}\;\;\;i = 1,\ldots ,d-1\\ b_i^{(p)} \ne b_i^{(q)},&{}\;\;\;i = d. \end{array}\right. } $$Note that \((b_d^{(p)},b_d^{(q)})=(1,0)\). Here, we suppose \(d\ge 3\), assuming that \(p_0\) and \(q_0\) are appropriately shifted beforehand. Now, if \(d<n\) and \((b_{d+1}^{(p)},b_{d+1}^{(q)})=(0,1)\), then \(r_0 = \textit{roundQ}\; p_0\; (j_d-2)\), where \(j_d\) is the smallest integer satisfying \(j_d > d+1\) and \((b_{j_d}^{(p)},b_{j_d}^{(q)})\ne (0,1)\). If \(d=n\) or \((b_{d+1}^{(p)},b_{d+1}^{(q)}) \ne (0,1)\), then \(r_0 = \textit{roundQ}\; p_0\; (d-2)\).

Case \(\langle q_0 \rangle > \langle p_0 \rangle \ge 0\) or \(\langle q_0 \rangle < \langle p_0 \rangle \le 0\): Exchange \(p_0\) and \(q_0\) and apply the previous case.

Case \(|\langle p_0 \rangle | = |\langle q_0 \rangle | > 0\): If \(\langle p_0 \rangle = \langle q_0 \rangle \), \(r_0 = p_0 = q_0\). If \(\langle p_0 \rangle = -\langle q_0 \rangle \), \(r_0 = \textit{roundQ}\; p_0\; (d-2)\), where the first \(d-1\) digits of \(\textit{mantissa}\; p_0\) are zeros and the d-th digit is 1.

Case \(\langle p_0 \rangle = \langle q_0 \rangle = 0\): \(r_0 = p_0 = q_0\).
We can also define divQ in a similar manner; its definition is omitted due to space limitation.
As described above, binary Q operators such as addQ and mulQ are constructed using raw arithmetic operators such as simpleAddQ and simpleMulQ and a binary operator \(\sqcup \) for accuracy management. The raw operators are expected to be built with highly tuned variable-precision libraries such as MPFR.
3.4 Unary IFN Operator: Negation
3.5 Binary IFN Operators
The function \({{addIFN'}}\) keeps the current accuracy (the accuracy of the current value) in its first argument and uses it to ensure that the accuracy of the next value in the resultant IFN is improved.^{2}
One may wonder whether recursive application of addIFN' while obtaining “the next value” might fail to terminate. However, \({\textit{addIFN}}\) can be shown to work properly as follows.

If \(\textit{ac}\;p_j < \textit{ac}\;q_k\), the candidate for \(r_{i+1}\) is \({addQ}\; p_{j+1}\; q_k\). In this case, the inequality \({min}\; (\textit{ac}\;p_j)\; (\textit{ac}\;q_k)<{min}\; (\textit{ac}\;p_{j+1})\; (\textit{ac}\;q_k)\) holds. If \(\textit{ac}\;r_i < \textit{ac}\;(\textit{addQ}\; p_{j+1}\; q_k)\), then \(r_{i+1}\) is readily available as \({addQ}\; p_{j+1}\; q_k\). On the other hand, if \(\textit{ac}\;r_i \ge \textit{ac}\;(\textit{addQ}\; p_{j+1}\; q_k)\), searching for appropriate operands continues.

If \(\textit{ac}\;p_j > \textit{ac}\;q_k\), the candidate for \(r_{i+1}\) is \({addQ}\; p_{j}\; q_{k+1}\). Here, \({min}\; (\textit{ac}\;p_j)\; (\textit{ac}\;q_k) < {min}\; (\textit{ac}\;p_{j})\; (\textit{ac}\;q_{k+1})\) holds. If \(\textit{ac}\;r_i \ge \textit{ac}\;({addQ}\; p_{j}\; q_{k+1})\), searching for appropriate operands continues.

If \(\textit{ac}\;p_j = \textit{ac}\;q_k\), the candidate for \(r_{i+1}\) is \({addQ}\; p_{j+1}\; q_{k+1}\). In this case, \({min}\; (\textit{ac}\;p_j)\; (\textit{ac}\;q_k)<\textit{min}\; (\textit{ac}\;p_{j+1})\; (\textit{ac}\;q_{k+1})\) holds. If \(\textit{ac}\;r_i \ge \textit{ac}\;(\textit{addQ}\; p_{j+1}\; q_{k+1})\), searching for appropriate operands continues.
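The three cases above can be sketched in our toy (value, accuracy) model as follows; this is a simplification of our own, not the library's code. Adding two operands loses at most one bit of accuracy, and the search advances whichever operand currently has the lower accuracy until a candidate strictly improves on the current accuracy.

```haskell
import Data.Ratio

-- Toy model: an IFN element is (value, accuracy), absolute error < 2^(-accuracy).
type Approx = (Rational, Int)

ac :: Approx -> Int
ac = snd

-- Addition of two approximations loses at most one bit of accuracy.
addA :: Approx -> Approx -> Approx
addA (x, a) (y, b) = (x + y, min a b - 1)

-- Operand search: advance the operand with the lower accuracy (or both when
-- equal) and emit a candidate only when it strictly improves the accuracy.
addIFN :: [Approx] -> [Approx] -> [Approx]
addIFN pps@(p : _) qqs@(q : _) = r0 : go (ac r0) pps qqs
  where
    r0 = addA p q
    go cur ps@(p' : pt) qs@(q' : qt) =
      let (nps, nqs)
            | ac p' < ac q' = (pt, qs)   -- advance the left operand
            | ac p' > ac q' = (ps, qt)   -- advance the right operand
            | otherwise     = (pt, qt)   -- advance both
          c = addA (head nps) (head nqs)
      in if ac c > cur then c : go (ac c) nps nqs
                       else go cur nps nqs

-- A literal IFN used for testing: binary truncations of x.
ifn :: Rational -> [Approx]
ifn x = [ (fromInteger (floor (x * 2 ^ n)) % (2 ^ n), n) | n <- [1 ..] ]
```

The termination argument mirrors the cases above: the minimum of the operand accuracies strictly increases with each step, so a candidate eventually exceeds the current accuracy.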
To show that mulIFN works properly, first we show that mulIFN' searches for appropriate operands for use in calculating “the next value” in such a way that the value of \(f\; p\; q =\textit{ac}\;p+\textit{ac}\;q-\textit{max}\;(\textit{lenM}\;p)\;(\textit{lenM}\;q)\) increases, where p and q are instances of Q that are used to calculate \({mulQ}\;p\;q\).
 If \(\textit{lenM}\;p_j < \textit{lenM}\;q_k\), the candidate for \(r_{i+1}\) is \({mulQ}\; p_{j+1}\; q_k\). Let us examine the value of \(f\; p_{j+1}\; q_k - f\; p_{j}\; q_k\). If \(\textit{lenM}\;p_{j+1} > \textit{lenM}\;q_k\), then$$\begin{aligned} f\; p_{j+1}\; q_k - f\; p_{j}\; q_k= & {} \textit{ac}\;p_{j+1}-\textit{ac}\;p_j -\textit{lenM}\;p_{j+1}+ \textit{lenM}\;q_k \\= & {} (\textit{expo}\; p_{j}-\textit{expo}\; p_{j+1}) + (\textit{lenM}\;q_k-\textit{lenM}\;p_{j}) > 0 \end{aligned}$$since \(\textit{expo}\; p_{j}\ge \textit{expo}\; p_{j+1}\). On the other hand, if \(\textit{lenM}\;p_{j+1} \le \textit{lenM}\;q_k\), then$$\begin{aligned} f\; p_{j+1}\; q_k - f\; p_{j}\; q_k= & {} \textit{ac}\;p_{j+1}-\textit{ac}\;p_j -{\textit{lenM}\;q_{k}}+ {\textit{lenM}\;q_k} > 0. \end{aligned}$$If \(\textit{ac}\;r_i < \textit{ac}\;(\textit{mulQ}\; p_{j+1}\; q_k)\), then \(r_{i+1}\) is readily available as \({mulQ}\; p_{j+1}\; q_k\). If \(\textit{ac}\;r_i \ge \textit{ac}\;({mulQ}\; p_{j+1}\; q_k)\), searching for appropriate operands continues.

If \(\textit{lenM}\;p_j > \textit{lenM}\;q_k\), \(f\; p_{j}\; q_{k+1} > f\; p_{j}\; q_k\) can be shown similarly. If \(\textit{ac}\;r_i \ge \textit{ac}\;(\textit{mulQ}\; p_{j}\; q_{k+1})\), searching for appropriate operands continues.

If \(\textit{lenM}\;p_j = \textit{lenM}\;q_k\), \(f\; p_{j+1}\; q_{k+1} > f\; p_{j}\; q_k\) can be shown similarly. If \(\textit{ac}\;r_i \ge \textit{ac}\;(\textit{mulQ}\; p_{j+1}\; q_{k+1})\), searching for appropriate operands continues.
Division operator divIFN can be essentially constructed in the same way.
3.6 Basic Mathematical Functions for IFN
where simpleExpQ corresponds to the exponential function of MPFR. Note that \(\{q\}=\langle d \rangle \).
The functions described here are unary. The definitions of monotonic IFN functions are straightforward; e.g., \(\textit{expIFN} = \textit{map}\;\textit{expQ}\).
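As a toy counterpart of this elementwise lifting: for a monotonic function whose derivative is bounded in magnitude by \(2^k\) on the relevant domain, an input of accuracy a yields an output of accuracy at least \(a-k\). The model and names below are ours; the library's expQ works on Q via MPFR instead.

```haskell
import Data.Ratio

-- Toy model: an IFN element is (value, accuracy), absolute error < 2^(-accuracy).
type Approx = (Rational, Int)

-- Elementwise lifting (our sketch): if |f'| <= 2^k on the domain of interest,
-- an input with error below 2^(-a) maps to an output with error below
-- 2^(-(a-k)), so accuracies remain strictly increasing.
mapIFN :: Int -> (Rational -> Rational) -> [Approx] -> [Approx]
mapIFN k f = map (\(x, a) -> (f x, a - k))

-- A literal IFN for testing: binary truncations of x.
ifn :: Rational -> [Approx]
ifn x = [ (fromInteger (floor (x * 2 ^ n)) % (2 ^ n), n) | n <- [1 ..] ]
```

For example, \(f(x)=3x\) has \(|f'| = 3 \le 2^2\), so \(k=2\) is a valid bound.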
Nonmonotonic functions such as trigonometric functions are not as easy to implement as monotonic ones. The implementation of continuous functions such as sinQ and cosQ requires special treatment for the maximum and/or minimum values in the range of each Q. If the accuracy of an input value is too low, the corresponding range could be \((-1, 1)\). Other trigonometric functions such as tanQ can be composed of other functions. Although such special treatment has to be taken into consideration, nonmonotonic functions can be implemented by using MPFR functions (although we have not yet implemented them in the current IFN library).
3.7 Precision and Accuracy
In ordinary floating-point computations, the word “precision” usually denotes the number of (meaningful) digits in the mantissa. Precision thus implies the relative error of each floating-point number. In contrast, we use “accuracy” for the index of preciseness of each floating-point number of Q. As defined in Sect. 3.1, the accuracy of a floating-point number q of type Q, \(\textit{ac}\;q\), is an indicator of the absolute error of q. In the strict sense, precision and accuracy are different. However, phrases used in the context of numerical computation such as “precision improvement” and “accuracy improvement” imply the same meaning. Hereafter, we use “precision” and “accuracy” interchangeably unless otherwise stated.
4 Logical Expressions with IFNs
4.1 Treatment of Logical Expressions
The preciseness of the results of numerical computations does not depend only on the accuracy of the arithmetic operations. For some numerical methods including iterative solving and truncated summation of an infinite series, control transfers are required to terminate the computation. Thus, correct evaluation of branch conditions, i.e., logical expressions, dependent on the results of floating-point arithmetic is crucial.
Logical expressions have type Bool. Since there is no “more precise True,” or “True but very close to False,” a logical expression should return a simple Boolean value. Boolean values should not change even when adaptive accuracy refinement is applied to arithmetic expressions. This means that accurate comparisons of floating-point values have to be done immediately.
Zero-testing for a floating-point value is not as easy as one might think. It might not be enough to simply look at the mantissa of floating-point numbers. Recall that every floating-point number is simply an approximation of a real value. In fact, algorithms dependent on the result of equality checking are considered improper when writing numerical programs. Nevertheless, two numbers frequently have to be compared.

True zero is separately treated throughout the computation.

Zero-testing is done in accordance with an auxiliary parameter.
4.2 Introduction of True Zeros
Here we introduce the idea of true zeros. A true zero is generated when a literal constant zero appears in a program. In addition, a true zero can be the result of an operation on Q. For example, we can extend mulQ so that \({mulQ}\; q\; z = z\), where z is a true zero. A true zero is never generated from an operation when all operands are not true zeros.
We let Q include zeroFlag, which represents whether the value is a true zero. Notice that we do not care about the values of sign, mantissa, and expo for a true zero because any operation on Q can be defined without them.
As for IFN operators, if we define \(\textit{ac}\;z=\infty \) and \(\textit{lenM}\;z=\infty \) for a true zero z, no modification for the IFN operators defined in Sects. 3.4 and 3.5 is required. In fact, when a true zero appears as the current value in an IFN, the next value need not be accessed because there should be no better value.
As a result of this separate handling of true zeros, a true zero can appear only as the first element in a lazy list, and the subsequent elements are never accessed. This property can be used for optimization in the implementation of IFN operators. For example, \({addIFN}\;{ ps}\;{ qs}\) can be defined to return \({ ps}\) immediately if the first element of \({ qs}\) is a true zero.
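This short-cut behavior can be sketched as follows in a toy model of our own, where the constructor TZ models an element whose zeroFlag is set; all names are ours, and the non-zero fallback is a naive elementwise addition rather than the library's operand search.

```haskell
-- Toy model (our names): an IFN element is either a true zero (TZ, modeling
-- zeroFlag) or an ordinary approximation with its accuracy.
data V = TZ | Val Rational Int
  deriving (Eq, Show)

-- A true zero can appear only as the first element, so addition with a
-- true-zero operand returns the other operand unchanged.
addIFN :: [V] -> [V] -> [V]
addIFN ps (TZ : _) = ps
addIFN (TZ : _) qs = qs
addIFN ps qs       = zipWith addV ps qs
  where
    addV (Val x a) (Val y b) = Val (x + y) (min a b - 1)
    addV _ _                 = TZ   -- unreachable: true zeros are heads only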
4.3 ZeroTesting and Equality Testing
Besides true zero, it is practically convenient to capture very small computational results as “approximated zero.” Here, tol is an integer called tolerance that is used to judge whether to treat a value as zero.
\({zeroQ}\; q\;\textit{tol} = 0\) indicates that \(\langle q \rangle \) may be 0 but \(\textit{expo}\; q\) is too large to treat q as zero.
where subIFN is an IFN operator defined in Sect. 3.5.
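A hedged sketch of tolerance-based equality testing in our toy (value, accuracy) model follows; subIFN here is a naive elementwise subtraction, not the library's operator, and the decision logic (refine the difference until either its magnitude certainly excludes zero or its accuracy reaches tol) is our reading of the scheme.

```haskell
import Data.Ratio

-- Toy model: an IFN element is (value, accuracy), absolute error < 2^(-accuracy).
type Approx = (Rational, Int)

-- Naive elementwise subtraction (a simplification of the subIFN of Sect. 3.5).
subIFN :: [Approx] -> [Approx] -> [Approx]
subIFN = zipWith (\(x, a) (y, b) -> (x - y, min a b - 1))

-- Refine the difference until either its magnitude certainly excludes zero,
-- or its accuracy reaches the tolerance tol (treat the values as equal then).
eqIFN :: Int -> [Approx] -> [Approx] -> Bool
eqIFN tol ps qs = go (subIFN ps qs)
  where
    go ((d, a) : rest)
      | abs d > 2 ^^ negate a = False   -- the true difference is nonzero
      | a >= tol              = True    -- close enough: treat as equal
      | otherwise             = go rest

-- A literal IFN for testing: binary truncations of x.
ifn :: Rational -> [Approx]
ifn x = [ (fromInteger (floor (x * 2 ^ n)) % (2 ^ n), n) | n <- [1 ..] ]
```

As in the text, the parameter tol decides how small a difference is still accepted as zero.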
4.4 Comparison Between Two IFNs
5 Detailed Design and Implementation of Primitives
5.1 Adaptive Accuracy Refinement for IFN Computations
Control of Adaptive Accuracy Refinement.
In previous sections, we described the basic design of IFNs. An IFN does not know whether its current value is sufficiently accurate; this can be judged only at the root of the computation tree. If the root judges that the accuracy of the IFN’s current value at the root is unsatisfactory, it issues a demand for “recomputation for a more accurate value” to its child (or children) on the basis of the definition of the operation (e.g., addIFN) at the root. This recomputation demand is propagated down to the (part of the) leaves. Every node that receives the demand produces the next more accurate value in the IFN stream it returns. The entire computation proceeds as a repetition of this accuracy refinement process.
The accuracy refinement process, illustrated in Fig. 3, starts only if the resulting precision does not meet the user’s requirement. The demand for recomputation is propagated all the way to one or more leaves, and the values at the nodes on the paths from the leaves to the root are updated. This process is iterated until the resultant value at the root is sufficiently accurate. Even though recomputation is done only at nodes of a subtree in each iteration, the repetition process is inherently inefficient.
In fact, the accuracy refinement process could be performed too many times because each node in the computation tree is not informed of the required accuracies of the value it produces (in the IFN stream). Our approach to reducing this inefficiency is to advise each node of the required accuracy of the value it produces when propagating the demand for a more accurate value. Although the required accuracy at the root is not actually required at other nodes, we use the accuracy given by the user as the lower limit at each node of the computation tree. Although this is a simple heuristic approach, in our experience it works well in many cases.
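The effect of this heuristic can be illustrated in our toy model (hypothetical names throughout): when the user's required accuracy is handed down as a lower limit, a literal node starts directly at that accuracy instead of being refined step by step from a small initial accuracy.

```haskell
import Data.Ratio

-- Toy model: an IFN element is (value, accuracy), absolute error < 2^(-accuracy).
type Approx = (Rational, Int)

-- Truncate x so that the absolute error is below 2^(-n).
truncAt :: Int -> Rational -> Approx
truncAt n x = (fromInteger (floor (x * 2 ^ n)) % (2 ^ n), n)

-- A literal node that honors a lower limit on accuracy: the first element
-- already has at least that accuracy (8 is an arbitrary default).
literalIFN :: Int -> Rational -> [Approx]
literalIFN lowerLimit x = [ truncAt n x | n <- iterate (* 2) (max 8 lowerLimit) ]

-- Number of elements a consumer must force to reach accuracy n.
forcedElems :: Int -> [Approx] -> Int
forcedElems n qs = length (takeWhile ((< n) . snd) qs) + 1
```

With the lower limit set to the demanded accuracy, a single forced element suffices, instead of one refinement round trip per schedule step.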
Although the types of primitives change, the whole course of computation is consistently replaced, and the numerical programs prepared by the user do not need to be modified.
Conversion of an IFN-based library to an IFNgen-based library causes subtle but non-negligible problems. When computation trees are constructed on the basis of IFNgen, the nodes of the tree are functions of type IFNgen that generate specific IFNs at runtime. Therefore, even though a generator of type IFNgen is referred to by multiple IFNgen operators as the generator of their operands, the IFNs generated at runtime are not shared. The use of non-shared IFNs means that complicated numerical programs that utilize iterative algorithms and/or matrix computations are infeasible because of the blow-up of duplicated computations.
5.2 Configuration of IFNs

The accuracy of each IFN’s initial element for literal values can be set higher than the user’s required accuracy for the total results.

The difference in accuracy between the elements of each IFN for literal values can be increased, or even the sequence can be defined on the basis of a geometric series.

A lower limit on accuracy refinement can be imposed for each IFN operator.
In our experience, these modifications and parameter settings greatly affect the performance of numerical programs. However, it is difficult to determine the optimal configuration. If the accuracy of each IFN’s initial element for literals is too high, useless computations to obtain overly accurate values by variable-length floating-point operations might take a very long time. However, if it is too low, there could be a huge number of iterations for adaptive accuracy refinement, resulting in a great amount of time to obtain accurate results. The settings for the other items in the list above also affect computational cost. Deciding on an appropriate configuration remains for future work.
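The two kinds of refinement schedules mentioned in the list above can be written as plain accuracy sequences; this is a sketch with names of our own (the constants 64 and 2 match the configurations examined later in the experiments).

```haskell
-- Accuracy schedules for IFN elements (sketch; names are ours).
-- Arithmetic schedule: initial accuracy n0, common difference d.
arithSchedule :: Int -> Int -> [Int]
arithSchedule n0 d = iterate (+ d) n0

-- Geometric schedule: initial accuracy n0, common ratio 2.
geomSchedule :: Int -> [Int]
geomSchedule n0 = iterate (* 2) n0
```

The geometric schedule reaches a high demanded accuracy in logarithmically many refinement steps, at the cost of possibly overshooting the needed precision.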
5.3 Structure of Datatype Q and Design of Q Operators
We used Boehm’s GC library for memory management of C objects. By using Haskell’s foreign function interface (FFI) and the Foreign and Foreign.Concurrent modules, we made Haskell’s GC cooperate with Boehm’s GC.
5.4 Implementation in C Without General-Purpose Garbage Collectors
The adaptive accuracy refinement functionality of the IFN library relies greatly on the lazy evaluation of infinite lists. However, because the dynamic structure constructed at runtime is fairly simple, general-purpose garbage collectors are not necessary (or may be unsuitable). Objects created at runtime do not reference each other cyclically, and the point of destruction of an IFN’s elements in the course of execution is predeterminable. Thus, an implementation of the IFN library in C with its own memory management, such as collection by reference counting, may outperform the awkward cooperation of Haskell’s and Boehm’s GC facilities described in Sect. 5.3, especially for complicated problems.
On the basis of this perspective, we implemented a version of the IFN library completely in C without general-purpose garbage collectors. IFN cells and other objects such as instances of Q and their mantissas are managed by using a set of ring buffers.
6 Numerical Experiments
Examples presented in this section demonstrate the potential and applicability of adaptive accuracy refinement using IFNs. The results are compared with those of programs using other computable real arithmetic libraries, namely Exact Real Arithmetic (ERA) (version 1.0) [8] and iRRAM (version 2013_01) [9].
Although the implementation is done in quite a small number of lines, ERA is thought to be the fastest among computable real libraries implemented in Haskell [10]. ERA is not based on the idea of adaptive refinement of accuracy — it does not need it. However, to obtain a final result with userdefined accuracy, much computation on integers may be required.
iRRAM is a C++ library for computable real numbers [9]. Each variable of type REAL is constructed with a multiple-precision number and information on its absolute error. Errors are accumulated during the course of computation, and if the error in the resultant value exceeds the user’s request, the entire computation is repeated with a significantly better precision. iRRAM uses external multiple-precision libraries such as MPFR. It was the “clear winner” of a competition among several systems for exact arithmetic held at CCA 2000 [11].
Note that IFN libraries are still prototypes and have much room for optimization. Although the comparison among the libraries reported here was done mainly on the basis of performance, the aim was not to determine a “winner” but to clarify the characteristics of the libraries.
6.1 Environment
All programs were run on a MacBook Pro (Intel Core i7 3 GHz with 16GB memory) running OS X 10.10.3. We used two versions of the IFN library. One was implemented in Haskell (IFNH), described in Sects. 5.1 and 5.3, and the other was implemented in C (IFNC), described in Sect. 5.4. Both used MPFR [3].
Application programs were written in Haskell for IFNH and ERA, in C for IFNC, and in C++ for iRRAM. The Haskell programs given to IFNH and ERA were the same, and the C and C++ programs were equivalent to the Haskell programs.
6.2 Example 1: Simple Expression
The elapsed time to perform one evaluation of the program against the required accuracy is plotted in Fig. 4. All the curves depict essentially the same trend, indicating that the performance of IFN is comparable to those of the others.
IFNC outperformed IFNH and ERA for a higher range of required accuracy. However, when used for obtaining moderately accurate results with maximum absolute errors of, say, less than \(2^{-1024}\), IFNH performed better than IFNC. Since the behaviors of the numerical computations of IFNC and IFNH were the same, it should be possible to raise the performance of IFNC to the level of IFNH.
6.3 Example 2: Solving a Sensitive Problem
As shown in Fig. 5, the behavior of IFNC was similar to that of iRRAM. Although clear differences are apparent, IFNC can be said to be comparable to iRRAM. IFNH performed as well as IFNC. However, there was substantial performance degradation of IFNH for large problems. This is probably because the cost of garbage collection increased with the problem size.
The IFN libraries have three configuration items:
1. The accuracy of the first element in each IFN for literal values was set as either (a) the accuracy required for the result or (b) twice the required accuracy.
2. The pattern used to increase the accuracy of each IFN for literal values was set as either (a) an arithmetic sequence with a common difference of 64 or (b) a geometric sequence with a common ratio of 2.
3. The minimum increase in accuracy imposed between consecutive elements in all IFNs was set as either (a) 4 or (b) 32.
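The three items can be summarized in a small sketch (the struct and function names below are ours, not the libraries' actual interfaces, and accuracies are taken to be in bits):

```c
/* Hypothetical sketch of an IFN accuracy schedule; the names are
 * illustrative only, not the IFN libraries' real API. */
struct schedule {
    int double_initial; /* item 1: 0 = required accuracy (a), 1 = twice it (b) */
    int geometric;      /* item 2: 0 = +64 per step (a), 1 = *2 per step (b)   */
    int min_step;       /* item 3: minimum increase, 4 (a) or 32 (b)           */
};

int first_accuracy(const struct schedule *s, int required) {
    return s->double_initial ? 2 * required : required;
}

int next_accuracy(const struct schedule *s, int current) {
    int next = s->geometric ? 2 * current : current + 64;
    if (next - current < s->min_step)       /* enforce the minimum increase */
        next = current + s->min_step;
    return next;
}
```

Under pattern (b, b, b), for example, a literal's IFN starts at twice the required accuracy and doubles it at each refinement step.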
We describe each configuration using triples; for example, (a, b, a) denotes that the choices for items 1, 2, and 3 are (a), (b), and (a), respectively. The results for IFNC and IFNH shown in Figs. 4 and 5 were obtained with a configuration pattern of (b, b, b).
The results with patterns (a, a, a), (b, a, b), (a, b, b), and (b, b, b) are shown in Fig. 6. The results for iRRAM are also shown for reference. The performance with (b, b, b) was about two orders of magnitude better than that with (a, a, a). The curves for (b, a, b) and (b, b, b) were almost the same when the required accuracy was greater than 1,024 bits (required maximum absolute error less than \(2^{-1025}\)). In fact, all executions with (b, _, _), which were configured with large initial accuracy values for literals, did not perform any recomputation in this range of requirements, including (b, b, a), which is not shown in Fig. 6. When moderate accuracy was required, items 2 and 3 both had to be (b). Although not shown in Fig. 6, the result with (b, b, a) was very close to that of (b, a, b).
The performance with configuration pattern (b, b, b) was the best. This seems to be mainly due to the reduced number of invocations of recomputation. However, the most suitable configuration may greatly depend on the application. For example, there could be cases where reducing the number of computations increases the total elapsed time because of the cost of overly accurate variable-precision floating-point operations. Detailed analysis of the behavior of the IFN libraries for other applications is left for future work.
7 Discussion
7.1 Usability
In principle, adaptive accuracy refinement using IFNs is applicable to almost all numerical computations. The major task in implementing an IFN version of a numerical program is basically to replace the floating-point operators in the original program with their corresponding IFN operators (or IFNgen operators). As for the Haskell library we developed, since the IFN datatype is declared as an instance of number-related type classes such as Num and Fractional, the user can use the normal arithmetic operators such as \(+\) and \(*\) without any knowledge of the internal details of IFNs.
7.2 Applicability
In principle, IFNs can be applied to any type of numerical problem. As demonstrated in Sect. 6, both the evaluation of complex expressions and matrix computations can be carried out.
Using IFNs eliminates only those errors caused by fixed-length floating-point representations. When IFNs are used for programs derived from an approximated modeling process, such as truncation, discretization, or any style of simplification, the accuracy of the results is not guaranteed. In addition, because the accuracy of zero-testing depends on an external parameter, the accuracy of the entire result can also depend on that parameter. This applies to iterative algorithms such as the Newton-Raphson method, which require comparisons of (computed) real values to control the execution.
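The dependence on an external comparison parameter can be seen in an ordinary double-precision Newton-Raphson sketch of our own: the loop is controlled by a tolerance `eps`, and with IFNs the analogous zero-test needs an accuracy parameter in the same way.

```c
#include <math.h>

/* Newton-Raphson for sqrt(a): the iteration is controlled by
 * comparing computed real values against an external tolerance.
 * With IFNs the comparison likewise needs an accuracy parameter,
 * so the result inherits a dependence on it. */
double newton_sqrt(double a, double eps) {
    double x = a > 1.0 ? a : 1.0;            /* initial guess */
    for (;;) {
        double next = 0.5 * (x + a / x);
        if (fabs(next - x) < eps)            /* tolerance-based stopping test */
            return next;
        x = next;
    }
}
```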
7.3 Performance
Compared to arithmetic based on a fixed number of bits, such as Double arithmetic, IFN arithmetic has inherent and significant overhead caused by the operators on Q (e.g., addQ), the operators on IFNs (e.g., addIFN), and the control of demand-driven computation.
The properties of IFN operators enable the user to obtain computational results for expressions composed of basic arithmetic operations with the desired accuracies. However, expressions that would cause catastrophic cancellation of significant digits (if Double numbers were used) can require long computation times with IFNs.
The results of our experimental implementation of IFN libraries (Sect. 6) show the potential of IFNs. Although current versions of IFN libraries cannot be used to solve large problems, we think there is much room for improvement in terms of computational speed. Rearrangement of the many bitwise operations required for postprocessing of \(\sqcup \) in each Q operator may be one way to reduce computational time.
8 Related Work
The basic idea of IFNs came from the notion of improving sequences [6, 7]. An improving sequence is a finite monotonic sequence of approximation values of a final value that are improved gradually in accordance with an ordering relation. Programs constructed with improving sequences offer many opportunities to eliminate overly accurate computation. The effectiveness of improving sequences has been demonstrated for combinatorial optimization problems. IFNs differ from improving sequences in that IFNs are specialized, infinite streams representing real numbers.
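The pruning enabled by improving sequences can be illustrated with a small example of our own: each row's total cost is approached from below by its monotonically increasing partial sums (a finite improving sequence), and a row is abandoned as soon as its bound exceeds the best complete total found so far.

```c
#include <math.h>
#include <stddef.h>

/* Improving-sequence-style pruning (toy example): all entries are
 * assumed nonnegative, so a row's partial sums form a monotonically
 * increasing sequence of lower bounds on its total (len <= 4). */
double min_total(const double rows[][4], size_t nrows, size_t len) {
    double best = HUGE_VAL;
    for (size_t r = 0; r < nrows; r++) {
        double partial = 0.0;            /* current lower bound */
        size_t k = 0;
        for (; k < len; k++) {
            partial += rows[r][k];
            if (partial >= best)         /* prune: the bound only grows */
                break;
        }
        if (k == len && partial < best)  /* fully evaluated and better */
            best = partial;
    }
    return best;
}
```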
Dynamic detection of catastrophic cancellation of significant digits can be realized by monitoring each normalization process of the resultant values in floatingpoint arithmetic (an example of a vector processor with cancellation detectors is presented elsewhere [12]).
Sophisticated tools for detecting precision degradation have been proposed [13, 14]. They use shadow values calculated using higher-precision arithmetic, so the shadow results are presumably more accurate. With these tools, the occurrence of cancellation can be detected and the causes of errors can be analyzed. However, the accuracy of the results is never guaranteed by such tools.
An option other than floating-point arithmetic for carrying out precise numerical computation is classical rational arithmetic. Each value is represented by a pair of integers, i.e., a numerator and a denominator. While it can be used to evaluate simple expressions like Rump’s example (Sect. 1), it may not be suitable for complicated programs due to the huge computational cost; approximation by some kind of rounding may be needed.
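A minimal sketch of such rational arithmetic (using fixed-width integers, which overflow quickly in real programs; exact implementations use bignums, and the growth of numerators and denominators is one face of the cost):

```c
/* Exact rational arithmetic: each value is a reduced pair
 * numerator/denominator. */
typedef struct { long long num, den; } rat;

static long long gcd_ll(long long a, long long b) {
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a < 0 ? -a : a;
}

static rat reduce(rat r) {
    long long g = gcd_ll(r.num, r.den);
    if (g != 0) { r.num /= g; r.den /= g; }
    if (r.den < 0) { r.num = -r.num; r.den = -r.den; }
    return r;
}

rat rat_add(rat a, rat b) {
    rat r = { a.num * b.den + b.num * a.den, a.den * b.den };
    return reduce(r);
}

rat rat_mul(rat a, rat b) {
    rat r = { a.num * b.num, a.den * b.den };
    return reduce(r);
}
```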
Interval arithmetic, in which each value is represented by upper and lower bounds, is another tool for accuracy-aware numerical computation [1]. Analyses of complicated functions and linear systems based on special facilities for precise inner product computation have been carried out [15]. Our floating-point representation of Q can be considered a virtual interval representation. The feasibility of adaptive refinement with interval arithmetic based on lazy evaluation has been indirectly confirmed by our research.
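A minimal sketch of interval arithmetic; a production library would additionally round every lower bound downward and every upper bound upward so that the enclosure is guaranteed, which is omitted here for brevity.

```c
/* Minimal interval arithmetic sketch (no directed rounding). */
typedef struct { double lo, hi; } interval;

interval iv_add(interval a, interval b) {
    interval r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

interval iv_mul(interval a, interval b) {
    /* the product's bounds are the min and max of the four corner
     * products, since either factor may contain negative values */
    double p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    interval r = { p[0], p[0] };
    for (int i = 1; i < 4; i++) {
        if (p[i] < r.lo) r.lo = p[i];
        if (p[i] > r.hi) r.hi = p[i];
    }
    return r;
}
```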
Exact real computer arithmetic has been studied for decades [16]. To deal with reals, a number of representations have been considered, such as continued fraction representation [17, 18], linear fractional transformations for exact arithmetic [19], radix representations with negative digits, non-integral or irrational bases, and nested sequences of rational intervals [20]. In scaled-integer representation [21], reals are represented by functions, and each arithmetic operation consists of the application and construction of functions with rational computation for coefficients. Some of these approaches adopt a lazy stream implementation in which each infinite stream represents a real. Although we have not examined them in detail, several libraries have been implemented [8, 9, 22, 23, 24]. Compared to previous approaches, ours is simple and thus leaves many design choices open for the details. The most significant difference between IFNs and the others is that each approximate value is represented as a precision-guaranteed floating-point number. In that sense, an IFN can be seen as a sequence of intervals. It can also be seen as a kind of Cauchy sequence, although it differs from the ordinary definitions of Cauchy sequences used for computable real arithmetic.
9 Conclusion
We have demonstrated a means of adaptive refinement based on lazy evaluation for accurate floating-point numerical computations. We presented the idea of improving floating-point numbers (IFNs) to encapsulate adaptive refinement processes into lazy lists. Computation using IFNs was described in detail. Numerical results based on implementations in Haskell and C demonstrated the effectiveness of the approach.
Future work includes optimization, the design of supporting tools, and larger-scale numerical evaluation on complicated problems.
Footnotes
1. The condition is weaker than the following: \(\langle q_i \rangle - \{q_i\} \le \langle q_{i+1} \rangle - \{q_{i+1}\} < \langle q_{i+1} \rangle + \{q_{i+1}\} \le \langle q_i \rangle + \{q_i\}\); i.e., each successive floating-point number does not necessarily denote a subinterval of the previous one.
2. The inequality \(a' \le a\) in the definition of addIFN’ can be modified to enhance performance; see Sect. 5.2 for details.
Acknowledgments
The first author thanks Toshiaki Kitamura and Mizuki Yokoyama for their useful discussions. The authors thank the anonymous reviewers for their detailed comments. This work was supported in part by an HCU Grant for Special Academic Research (General Studies) under Grant No. 1030301.
References
1. Knuth, D.E.: The Art of Computer Programming, Sect. 4.2.2, 3rd edn. Addison-Wesley, Boston (1997)
2. Microprocessor Standards Committee of the IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. IEEE Standard 754 (2008)
3. The GNU MPFR Library. http://www.mpfr.org
4. Rump, S.M.: Algorithms for verified inclusion. In: Moore, R. (ed.) Reliability in Computing, Perspectives in Computing, pp. 109–126. Academic Press, New York (1988)
5. Bird, R.: Introduction to Functional Programming Using Haskell, 2nd edn. Prentice Hall, Englewood Cliffs (1998)
6. Morimoto, T., Takano, Y., Iwasaki, H.: Instantly turning a naive exhaustive search into three efficient searches with pruning. In: Hanus, M. (ed.) PADL 2007. LNCS, vol. 4354, pp. 65–79. Springer, Heidelberg (2007)
7. Iwasaki, H., Morimoto, T., Takano, Y.: Pruning with improving sequences in lazy functional programs. Higher-Order Symbolic Comput. 24, 281–309 (2011)
8. Lester, D.: ERA: Exact Real Arithmetic, version 1.0 (2000). http://hackage.haskell.org/package/numbers-3000.2.0.1/docs/Data-Number-CReal.html
9. Müller, N.T.: The iRRAM: exact arithmetic in C++. In: Blanck, J., Brattka, V., Hertling, P. (eds.) CCA 2000. LNCS, vol. 2064, p. 222. Springer, Heidelberg (2001)
10. Haskell wiki, Applications and Libraries / Mathematics, Sect. 3.2.2.2. http://www.haskell.org/haskellwiki/Applications_and_libraries/Mathematics#Dynamic_precision_by_lazy_evaluation
11. Blanck, J.: Exact real arithmetic systems: results of competition. In: Blanck, J., Brattka, V., Hertling, P. (eds.) CCA 2000. LNCS, vol. 2064, p. 389. Springer, Heidelberg (2001)
12. Aniya, S., Kitamura, T.: A performance improvement for floating-point arithmetic unit with precision degradation detection. In: The 17th Workshop on Synthesis And System Integration of Mixed Information Technologies (SASIMI 2012), pp. 490–491 (2012)
13. Jeffrey, K.H., Lam, M.O., Stewart, G.W.: Dynamic floating-point cancellation detection. In: WHIST 2011 (2011)
14. Benz, F., Hildebrandt, A., Hack, S.: A dynamic program analysis to find floating-point accuracy problems. In: SIGPLAN Notices, PLDI 2012, vol. 47, no. 6, pp. 453–462 (2012)
15. Jansen, P., Weidner, P.: High-accuracy arithmetic software – some tests of the ACRITH problem-solving routines. ACM Trans. Math. Softw. 12(1), 62–70 (1986)
16. Gowland, P., Lester, D.R.: A survey of exact arithmetic implementations. In: Blanck, J., Brattka, V., Hertling, P. (eds.) CCA 2000. LNCS, vol. 2064, pp. 30–47. Springer, Heidelberg (2001)
17. Gosper, W.: Continued fractions (1972). http://www.inwap.com/pdp10/hbaker/hakmem/cf.html
18. Vuillemin, J.: Exact real computer arithmetic with continued fractions. IEEE Trans. Comput. 39, 1087–1105 (1990)
19. Potts, P.: Exact real arithmetic using Möbius transformations. Ph.D. thesis, Department of Computing, Imperial College of Science, Technology and Medicine, University of London. http://www.doc.ic.ac.uk/%7Eae/papers.html
20. Escardó, M.: Introduction to exact numerical computation. Notes for a tutorial at ISSAC (2000). http://www.cs.bham.ac.uk/%7Emhe/issac
21. Boehm, H.J., Cartwright, R., Riggle, M., O’Donnell, M.J.: Exact real arithmetic: a case study in higher order programming. In: ACM Symposium on Lisp and Functional Programming, pp. 162–173 (1986)
22. Guy, M.: bignum / BigFloat (2007). http://bignum.sourceforge.net/
23. Edalat, A.: Exact real number computation using linear fractional transformations. Final Report on EPSRC grant GR/L43077/01 (2001)
24. Lambov, B.: RealLib: an efficient implementation of exact real arithmetic. Math. Struct. Comput. Sci. 17, 81–98 (2007)