1 Introduction

The solution of an ill-posed problem often requires the solution of a large, sparse linear system \(A{\varvec{x}}={{\varvec{b}}}\), where \(A\in \mathbb {C}^{n \times n}\) is non-Hermitian and nearly singular, and \({{\varvec{b}}}\in \mathbb {C}^n\) with \({{\varvec{b}}}\in {{\mathrm{range}}}(A)\) [1]. We assume throughout that \(A\) is diagonalizable since, although possible, analysis using the Jordan canonical form is more complicated. The near-singularity of \(A\) is reflected in the presence of a number of small eigenvalues.

In many cases \({{\varvec{b}}}\) is unknown and we instead possess a noisy vector \({{\varvec{b}}}_\delta \), where \(\Vert {{\varvec{b}}} - {{\varvec{b}}}_{\delta }\Vert _2 = \delta \). This is problematic since the ill-conditioning of \(A\) means that \(A^{-1}{{\varvec{b}}}_{\delta }\) may be a poor approximation of \({\varvec{x}}\). Consequently, it is necessary to regularize, i.e., to solve

$$\begin{aligned} A_\delta {\varvec{x}}_{\delta } = {{\varvec{b}}}_{\delta }. \end{aligned}$$
(1)

The Generalized Minimal Residual method [2] (GMRES) is an iterative method for solving (1) that, given an initial guess \({\varvec{x}}_0\) which we assume for simplicity is the zero vector, selects at the \(k\)th step the iterate \({\varvec{x}}_k\) for which the residual \({\varvec{r}}_k= {{\varvec{b}}}_\delta - A_\delta {\varvec{x}}_k\) satisfies

$$\begin{aligned} \Vert {\varvec{r}}_k\Vert _2 = \min _{\begin{array}{c} q\in \varPi _k\\ q(0)=1 \end{array}} \Vert q(A_\delta ){{{\varvec{b}}}_\delta }\Vert _{2}, \end{aligned}$$
(2)

where \(\varPi _k\) is the set of polynomials of degree at most \(k\). When GMRES is used to solve (1) it can sometimes give good approximations to \({\varvec{x}}_\delta \), as long as the method is terminated after the correct number of iterations, i.e., GMRES itself can have a regularizing effect [3, 4]. Alternatively, regularization may be achieved by preconditioning [4–6]. In either case it is important to understand the behaviour of GMRES applied to nearly singular systems. Eldén and Simoncini [4] used the Schur decomposition to show that when the right-hand side has leading components in the direction of eigenvectors associated with large eigenvalues, the initial convergence is related to a reduction in the sizes of these components. Here we provide a complementary analysis involving the eigenvalue-eigenvector decomposition and the simple bounds in Titley-Peloquin, Pestana and Wathen [7]. Similarly to Eldén and Simoncini, we find that the first phase of convergence is related to large eigenvalues. We additionally observe that the stagnation typically observed in the second phase, known as semi-convergence, is attributable to the remaining small eigenvalues.
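To make the characterization (2) concrete, note that the minimization over residual polynomials is equivalent to a linear least-squares problem over the Krylov subspace. The following Python sketch (with a hypothetical small test matrix, and a naive power basis in place of the Arnoldi process used in practice) illustrates that no admissible polynomial can beat the step-\(k\) GMRES residual.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # small, well-conditioned test matrix
b = rng.standard_normal(n)

# GMRES minimizes ||b - A y||_2 over y in span{b, Ab, ..., A^{k-1} b}.
# Writing y = K c, this is a linear least-squares problem in c.
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
r_k = np.linalg.norm(b - A @ K @ c)               # ||r_k||_2 in (2)

# Any polynomial q with q(0) = 1 and deg q <= k gives q(A) b = b - A K c'
# for some coefficient vector c', so random choices of c' can only do worse.
for _ in range(100):
    c_rand = rng.standard_normal(k)
    assert r_k <= np.linalg.norm(b - A @ K @ c_rand) + 1e-12
```

The power basis is adequate for illustration only; in practice GMRES builds an orthonormal Krylov basis via the Arnoldi process for numerical stability.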

2 Structure of Nearly Singular Systems

Let \(A_\delta \) have diagonalization \(A_\delta = Z\varLambda Z^{-1}\), \(\varLambda = {{\mathrm{diag}}}(\lambda _i)\) and \(Z\in \mathbb {C}^{n \times n}\), where without loss of generality \(|\lambda _1| \ge |\lambda _2|\ge \cdots \ge |\lambda _n|\). We wish to separate the spectrum of \(A_\delta \) into \(p\) large eigenvalues and the remaining small eigenvalues. The matrix \(A_\delta \) may have two distinct sets of eigenvalues, for example, when a preconditioner is applied. In other cases, however, there is no obvious separation. In this situation we find that a division on the order of \(\delta \) is a reasonable choice.

Given these two sets of eigenvalues we partition \(A_\delta \) as

$$ A_\delta = \begin{bmatrix}Z_1&Z_2\end{bmatrix}\begin{bmatrix}\varLambda _1&\\&\varLambda _2\end{bmatrix} \begin{bmatrix}Y_1^{*}\\Y_2^{*}\end{bmatrix},$$

where \(\varLambda _1\in \mathbb {C}^{p\times p}\), \(\varLambda _2\in \mathbb {C}^{(n-p)\times (n-p)}\), \(Z_1, Y_1 \in \mathbb {C}^{n \times p}\) and \(Z_2,Y_2\in \mathbb {C}^{n\times (n-p)}\). We assume that \(\Vert Y_2^*{{\varvec{b}}}\Vert _2 = \epsilon \) is small, i.e., that the true right-hand side vector \({{\varvec{b}}}\) is mainly associated with the low-frequency components of \(A_\delta \); otherwise the ill-posed problem is intractable.

Integral to our bounds are the co-ordinates of \({{\varvec{b}}}_\delta \) in the eigenvector basis

$$\begin{aligned} {\varvec{w}}= Z^{-1}{{\varvec{b}}}_\delta /\Vert {{\varvec{b}}}_\delta \Vert _2=\begin{bmatrix}{\varvec{w}}_1\\{\varvec{w}}_2\end{bmatrix} = \frac{1}{\Vert {{\varvec{b}}}_\delta \Vert _2}\begin{bmatrix}Y_1^*{{\varvec{b}}}_\delta \\ Y_2^*{{\varvec{b}}}_\delta \end{bmatrix} \end{aligned}$$
(3)

and, in particular, \({\varvec{w}}_2 = (Y_2^*{{\varvec{b}}} + Y_2^*({{\varvec{b}}}_\delta - {{\varvec{b}}}))/{\Vert {{\varvec{b}}}_\delta \Vert _2},\) the norm of which is bounded by

$$\begin{aligned} \Vert {\varvec{w}}_2\Vert _2 \le (\epsilon + \delta \Vert Y_2\Vert _2)/\Vert {{\varvec{b}}}_\delta \Vert _2. \end{aligned}$$
(4)

To give some idea of typical spectra, and to show the difference between the components of \({\varvec{w}}_1\) and \({\varvec{w}}_2\), we compute these quantities for the baart and wing test problems from the Matlab toolbox Regularization Tools [8, 9]. The problems are described in more detail in Sect. 4. We add Gaussian noise to the true right-hand side vectors with \(\delta = 10^{-7}\), \(10^{-5}\) and \(10^{-3}\). For baart, \(\Vert {{\varvec{b}}}_\delta \Vert _2 \approx 2.9\), \(\Vert Y_2\Vert _2= 64\) and \(\epsilon = 10\delta \) when \(p = 5\). Thus, (4) gives \(\Vert {\varvec{w}}_2\Vert _2 \le 26 \delta \) for baart. For wing, \(\Vert {{\varvec{b}}}_\delta \Vert _2\approx 0.15\) and \(\Vert Y_2\Vert _2 = 158\) with \(p = 3\). We find that when \(\delta \) is \(10^{-7}\), \(10^{-5}\) and \(10^{-3}\), \(\epsilon \) is \(1 \times 10^{-5}\), \(3.6 \times 10^{-4}\) and \(9\times 10^{-3}\), so that the right-hand side of (4) is \(2\times 10^{-4}\), \(0.01\) and \(1\), respectively.
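These quantities are easy to compute for any diagonalizable matrix. As a rough illustration, the following Python sketch builds a synthetic nearly singular matrix (the sizes, spectrum and noise level here are hypothetical, not those of baart or wing), forms the coordinates (3) and verifies the bound (4).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5
lam = np.logspace(0, -10, n)                                    # decaying eigenvalues
Z = np.eye(n) + rng.standard_normal((n, n)) / (4 * np.sqrt(n))  # mildly non-normal
Zinv = np.linalg.inv(Z)                                         # rows of Zinv are Y^*
A_delta = Z @ np.diag(lam) @ Zinv

# True right-hand side mainly associated with the large eigenvalues.
b = Z[:, :p] @ rng.standard_normal(p)
delta = 1e-5
noise = rng.standard_normal(n)
b_delta = b + delta * noise / np.linalg.norm(noise)   # ||b - b_delta||_2 = delta

w = Zinv @ b_delta / np.linalg.norm(b_delta)          # coordinates (3)
w2 = w[p:]
eps = np.linalg.norm(Zinv[p:, :] @ b)                 # eps = ||Y_2^* b||_2 (here ~0)
Y2_norm = np.linalg.norm(Zinv[p:, :], 2)              # ||Y_2||_2
bound = (eps + delta * Y2_norm) / np.linalg.norm(b_delta)   # bound (4)
assert np.linalg.norm(w2) <= bound + 1e-12
```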

Fig. 1. Magnitudes of eigenvalues (\(*\)) of \(A_\delta \) and of the corresponding components of \({\varvec{w}}\) for \(\delta = 10^{-7}\) (solid line), \(\delta = 10^{-5}\) (dashed line) and \(\delta = 10^{-3}\) (dot-dashed line).

Figure 1 shows that for both problems, as expected, the eigenvalues decay and a number of very small eigenvalues are present. The components of \({\varvec{w}}\) associated with large eigenvalues are relatively large in magnitude. Once the eigenvalues decrease to around the level of the noise, the components of \({\varvec{w}}\) stay constant in magnitude at a level that depends on \(\Vert {{\varvec{b}}}_\delta \Vert _2\), the amount of noise and the conditioning of the eigenvectors associated with small eigenvalues. This level is, consequently, higher for wing than for baart. The structure of these two systems is typical of ill-posed linear systems and is exploited in the next section to analyse the convergence of GMRES.

3 GMRES Bounds

Our interest is in explaining the behaviour of GMRES applied to (1). To this end, we apply the bounds from Sect. 2 of Titley-Peloquin et al. [7], the first of which is cast in terms of a weighted least squares problem.

Theorem 1

Suppose that \(A_\delta \) has diagonalization \(A_\delta =Z\varLambda Z^{-1}\), \(\varLambda = {{\mathrm{diag}}}(\lambda _i)\), and let \({\varvec{w}}_1=W_1{\varvec{e}}\) and \({\varvec{w}}_2 = W_2 {\varvec{e}}\), where \({\varvec{w}}_1\) and \({\varvec{w}}_2\) are as in (3), \(W_1\) and \(W_2\) are the diagonal blocks of \(W ={{\mathrm{diag}}}(w_i)\) conforming to the partition of \({\varvec{w}}\), and \({\varvec{e}}=[1,\dots ,1]^T\) is the vector of ones of appropriate dimension. Then the GMRES residuals satisfy

$$\begin{aligned} \frac{ \Vert {\varvec{r}}_k\Vert _2 }{ \Vert {{\varvec{b}}}_\delta \Vert _2 } \le \Vert Z\Vert _2\min _{\begin{array}{c} q\in \varPi _k\\ q(0)=1 \end{array}} \left\| \begin{bmatrix}W_1q(\varLambda _1)&\\&W_2q(\varLambda _2)\end{bmatrix}{\varvec{e}} \right\| _2. \end{aligned}$$
(5)

For our ill-posed problem, the weights in \(W_1\) are larger in magnitude than those in \(W_2\) and the eigenvalues in \(\varLambda _1\) are all larger in magnitude than the eigenvalues in \(\varLambda _2\). Thus, GMRES will initially choose polynomials that primarily reduce the size of \(W_1q(\varLambda _1)\) to the size of \(W_2q(\varLambda _2)\). In particular, when \(\Vert {\varvec{w}}_1\Vert _2\gg \Vert {\varvec{w}}_2\Vert _2\) we would expect that for the first \(p\) steps GMRES would mainly work on reducing the components of the residual associated with \(\varLambda _1\) and \(Z_1\).
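The right-hand side of (5) can be evaluated numerically: writing \(q(z) = 1 + c_1 z + \cdots + c_k z^k\), the minimization becomes a linear least-squares problem in the coefficients \(c_j\). The Python sketch below (with hypothetical eigenvalues and weights chosen so that \(\Vert {\varvec{w}}_1\Vert _2 \gg \Vert {\varvec{w}}_2\Vert _2\)) shows the two-phase behaviour: the minimum drops rapidly for roughly \(p\) steps and then stagnates near \(\Vert {\varvec{w}}_2\Vert _2\).

```python
import numpy as np

def weighted_poly_min(lam, w, k):
    # min over q with q(0) = 1, deg q <= k of ||diag(w) q(Lambda) e||_2.
    # The i-th entry is w_i + sum_j c_j * w_i * lam_i**j, a linear LS problem.
    V = np.vander(lam, k + 1, increasing=True)[:, 1:]   # columns lam^1..lam^k
    M = w[:, None] * V
    c, *_ = np.linalg.lstsq(M, -w, rcond=None)
    return np.linalg.norm(w + M @ c)

lam = np.logspace(0, -8, 30)                          # p = 5 "large" eigenvalues
w = np.concatenate([np.ones(5), 1e-6 * np.ones(25)])  # ||w_1|| >> ||w_2||
levels = [weighted_poly_min(lam, w, k) for k in range(1, 8)]
# levels drops sharply while the 5 large eigenvalues are dealt with,
# then stagnates near ||w_2||_2 = 5e-6.
assert levels[0] > 0.1 and levels[6] < 1e-3
```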

When \(\Vert W_1q(\varLambda _1)\Vert _2\) is on the order of \(\Vert W_2q(\varLambda _2)\Vert _2\) it is common for convergence to stagnate, after which residuals may increase in norm; this is known as semi-convergence. The following theorem can help to explain why semi-convergence occurs by explicitly separating the effects of large and small eigenvalues [7].

Theorem 2

Let \(A_\delta \) have diagonalization \(A_\delta =Z\varLambda Z^{-1}\). For any subset of indices \(\mathcal{J}\) with \(|\mathcal{J}|=p\), GMRES residuals with \(k>p\) satisfy

$$\begin{aligned} \frac{ \Vert {\varvec{r}}_k\Vert _2 }{ \Vert {{\varvec{b}}}_\delta \Vert _2 } \le \Vert Z\Vert _2\min _{\begin{array}{c} q\in \varPi _{k-p}\\ q(0)=1 \end{array}} \left( \sum _{\begin{array}{c} i=1\\ i\not \in \mathcal{J} \end{array}}^n |\tilde{w}_i|^2 |q(\lambda _i)|^2 \right) ^{1/2}, \end{aligned}$$
(6)

where

$$ \tilde{w}_i = w_i \prod _{j\in \mathcal{J}} \left( 1 - \frac{\lambda _i}{\lambda _j} \right) . $$

To examine the semi-convergence phase, we choose \(\mathcal{J} = \{1,\dots ,p\}\). Then for any \(i \in [p+1,n]\) we have that \(|\lambda _i| \le |\lambda _j|\) for all \(j \in \mathcal{J}\), so that \( |\tilde{w}_i| \le \alpha ^p |w_i|, \) where \(\alpha \le 2\); moreover, \(\alpha \) is around 1 or smaller when, say, there is a decent gap between the large and small eigenvalues or when all eigenvalues have the same sign. Thus, for any \(k > p\)

$$ \frac{ \Vert {\varvec{r}}_k\Vert _2 }{ \Vert {{\varvec{b}}}_\delta \Vert _2 } \le \alpha ^p\Vert Z\Vert _2\Vert {\varvec{w}}_2\Vert _2 \min _{\begin{array}{c} q\in \varPi _{k-p}\\ q(0)=1 \end{array}} \Vert q(\varLambda _2)\Vert _2. $$
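The factor \(\alpha \) can be examined directly from the products defining \(\tilde{w}_i\). In the following Python sketch (with a hypothetical positive, decaying spectrum) every factor \(1 - \lambda _i/\lambda _j\) lies in \((0,1)\), so that \(\alpha \le 1\) and \(|\tilde{w}_i| \le |w_i|\).

```python
import numpy as np

lam = np.logspace(0, -8, 30)                          # positive, decaying eigenvalues
w = np.concatenate([np.ones(5), 1e-6 * np.ones(25)])
p = 5

# tilde_w_i = w_i * prod_{j in J} (1 - lam_i / lam_j) for i = p+1, ..., n.
factors = 1.0 - lam[p:, None] / lam[None, :p]         # shape (n - p, p)
w_tilde = w[p:] * np.prod(factors, axis=1)

# Each factor is in (0, 1) here, so alpha <= 1 and |w_tilde_i| <= |w_i|.
assert np.all((factors > 0) & (factors < 1))
assert np.all(np.abs(w_tilde) <= np.abs(w[p:]))
```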

Now, let us consider \(\Vert q(\varLambda _2)\Vert _2\). Since \(|\lambda _i|\ll 1\) for \(i = p+1,\dots ,n\), and \(q(0)=1\), it will be difficult to reduce \(\Vert q(\varLambda _2)\Vert _2\) significantly below 1. Consequently, we expect the residuals to stagnate at a level bounded by

$$\begin{aligned} \frac{ \Vert {\varvec{r}}_k\Vert _2 }{ \Vert {{\varvec{b}}}_\delta \Vert _2 } \le \alpha ^p \Vert Z\Vert _2 \Vert {\varvec{w}}_2\Vert _2. \end{aligned}$$
(7)

This, in conjunction with (4), indicates that the level of semi-convergence depends on the sizes of the large and small eigenvalues, the noise level \(\delta \), the norm of \({{\varvec{b}}}_\delta \) and the conditioning of the eigenvectors associated with small eigenvalues.
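The claim that \(\Vert q(\varLambda _2)\Vert _2\) is difficult to reduce significantly below 1 can also be illustrated numerically. In the Python sketch below (with hypothetical small eigenvalues), even the least-squares-optimal residual polynomial leaves \(\max _i |q(\lambda _i)|\) close to 1, since \(q(0)=1\) forces \(q(\lambda _i)\approx 1\) for tiny \(\lambda _i\).

```python
import numpy as np

def residual_poly_sup(lam, k):
    # Minimise sum_i |q(lam_i)|^2 over q with q(0) = 1 by least squares,
    # then report ||q(Lambda)||_2 = max_i |q(lam_i)| for the minimiser.
    V = np.vander(lam, k + 1, increasing=True)[:, 1:]   # columns lam^1..lam^k
    c, *_ = np.linalg.lstsq(V, -np.ones_like(lam), rcond=None)
    return np.max(np.abs(1.0 + V @ c))

lam2 = np.logspace(-6, -12, 20)          # small eigenvalues, as in Lambda_2
vals = [residual_poly_sup(lam2, k) for k in (1, 3, 5)]
# Driving q(lam_i) towards 0 would require coefficients of size ~1/lam_i,
# which blows up q at the other eigenvalues, so the norm stays near 1.
assert min(vals) > 0.9
```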

4 Numerical Results

We now compare the bounds (5) and (7) to the GMRES residuals for the baart and wing problems mentioned above, both of which are discretizations of Fredholm integral equations of the first kind. The integral equation for baart is

$$ \int _0^\pi e^{s\cos (t)}f(t) dt = 2\frac{\sinh (s)}{s}, \quad 0 \le s \le \frac{\pi }{2}, $$
Fig. 2. Plots of the relative GMRES residuals and (5) (\(\times \)) (left) and relative errors (right) for \(\delta = 10^{-7}\) (solid line), \(\delta = 10^{-5}\) (dashed line) and \(\delta = 10^{-3}\) (dot-dashed line).

Fig. 3. Plots of the relative GMRES residuals and (7) (\(+\)) for \(\delta = 10^{-7}\) (solid line), \(\delta = 10^{-5}\) (dashed line) and \(\delta = 10^{-3}\) (dot-dashed line).

which has the continuous solution \(f(t) = \sin (t)\). For the wing problem we solve

$$ \int _0^1 t e^{-st^2} f(t) dt = \frac{e^{-st_1^2} - e^{-st_2^2}}{2s}, \quad 0 \le s \le 1, $$

with \(t_1 = 1/3\) and \(t_2 = 2/3\). The discontinuous solution is

$$ f(t) = \left\{ \begin{array}{ll} 1 & t_1< t < t_2,\\ 0 & \text {elsewhere}. \end{array}\right. $$
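For readers without access to Regularization Tools, the baart problem can be reproduced approximately by a simple quadrature discretization of the integral equation above. The Python sketch below (a crude midpoint rule, so the matrix and the numbers differ from the Regularization Tools version used in our experiments) exhibits the same qualitative behaviour: the relative GMRES residuals drop quickly and then stagnate near the noise level.

```python
import numpy as np

n = 64
t = (np.arange(n) + 0.5) * np.pi / n               # quadrature nodes on [0, pi]
s = (np.arange(n) + 0.5) * (np.pi / 2) / n         # collocation points on [0, pi/2]
A = np.exp(np.outer(s, np.cos(t))) * (np.pi / n)   # midpoint rule for baart
b = 2.0 * np.sinh(s) / s                           # exact right-hand side

delta = 1e-5
rng = np.random.default_rng(2)
e = rng.standard_normal(n)
b_delta = b + delta * e / np.linalg.norm(e)        # ||b - b_delta||_2 = delta

def gmres_resnorms(A, b, kmax):
    # Residual norms from the Krylov least-squares problem (power basis;
    # adequate for the handful of iterations needed here).
    res, K = [], b[:, None]
    for _ in range(kmax):
        c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
        res.append(np.linalg.norm(b - A @ K @ c))
        K = np.column_stack([K, A @ K[:, -1]])
    return np.array(res)

res = gmres_resnorms(A, b_delta, 10) / np.linalg.norm(b_delta)
# The relative residuals fall sharply for a few steps, then stagnate.
assert res.min() < 0.1
```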

Figure 2 shows the relative GMRES residuals and the relative errors. For both baart and wing the relative residuals decrease before stagnating at a level related to the noise level \(\delta \). Note that the staircase-like convergence behaviour for baart is particular to this problem. It appears to be related to the harmonic Ritz values, which at the \(k\)th step of GMRES are the eigenvalues of a certain \(k\times k\) matrix, and which define the GMRES polynomial \(q\) in (2) [10]. For fast convergence it is desirable that these harmonic Ritz values are good approximations of eigenvalues of \(A_\delta \). For baart, however, at the second and fourth steps there is a harmonic Ritz value that lies between two consecutive eigenvalues of \(A_\delta \); these are precisely the steps at which there is little reduction in the relative residual norm.

Unlike the relative residuals, for both problems the norm of the error initially decreases but then starts to increase. This increase occurs during the semi-convergence phase for baart, but for the wing problem the errors increase before semi-convergence and exhibit a sawtooth-like behaviour. This highlights the importance of applying a sensible stopping criterion and the potential unsuitability of standard (unpreconditioned) GMRES for some ill-posed problems. Interestingly, for the wing problem with noisy right-hand side vectors, (5) seems to provide a better indication of when the iterations should be stopped than the onset of semi-convergence does, although we have not investigated this further.

It is clear from Fig. 2 that the bound (5) is very descriptive during the first phase of convergence. Although the bound is not quantitatively descriptive in the second phase of convergence, it accurately predicts the onset of semi-convergence. The approximation (7) is an upper bound on the relative residuals during the semi-convergence phase for both problems (see Fig. 3). Note that for both problems \(\alpha \approx 1\). Since (6) is an upper bound on (5), we cannot expect (7) to be quantitatively accurate. Nevertheless, it provides an analysis of semi-convergence and the factors that can affect the level at which residual norms stagnate.

5 Conclusions

In this paper we have applied simple bounds on GMRES convergence to the nearly singular systems that arise from ill-posed problems. We have shown that GMRES initially reduces the residual components associated with large eigenvalues. Once these components are commensurate with those associated with small eigenvalues, semi-convergence sets in, with the level at which the residuals stagnate determined by the sizes of the small eigenvalues, the noise in the right-hand side vector, the norm of \({{\varvec{b}}}_\delta \) and the conditioning of the eigenvectors associated with small eigenvalues.