1 Introduction

Non-smooth regularisation methods are popular tools in the imaging sciences. They make it possible to promote sparsity of inverse problem solutions with respect to specific representations; they can implicitly restrict the null-space of the forward operator while guaranteeing noise suppression at the same time. The most prominent representatives of this class are total variation regularisation [19] and \(\ell ^1\)-norm regularisation as used in the broader context of compressed sensing [8, 10].

In order to computationally solve convex, non-smooth regularisation problems with linear operator constraints, first-order operator splitting methods have gained increasing interest over the last decade, see [3, 9, 11, 12] to name just a few. Despite some recent extensions to certain types of non-convex problems [7, 14,15,16], to our knowledge only little progress has been made for nonlinear operator constraints [2, 22].

In this paper we are particularly interested in minimising non-smooth, convex functionals with nonlinear operator constraints. This model covers many interesting applications; one particular application that we are going to address is the joint reconstruction of the spin-proton-density and coil sensitivity maps in parallel MRI [13, 21].

The paper is structured as follows: we will introduce the generic problem formulation, then address its numerical minimisation via a generalised ADMM method with linearised operator constraints. Subsequently we will show connections to the recently proposed NL-PDHGM method (indicating a local convergence result of the proposed algorithm) and conclude with the joint spin-proton-density and coil sensitivity map estimation as a numerical example.

2 Problem Formulation

We consider the following generic constrained minimisation problem:

$$\begin{aligned} (\hat{u}, \hat{v})&= \mathop {{{\mathrm{\arg \min }}}}\limits _{u, v} \left\{ H(u) + J(v) \ \text {subject to} \ F(u, v) = c\right\} . \end{aligned}$$
(1)

Here H and J denote proper, convex and lower semi-continuous functionals, F is a nonlinear operator and c a given function. Note that for nonlinear operators of the form \(F(u, v) = G(u) - v\) and \(c = 0\) problem (1) can be written as

$$\begin{aligned} \hat{u}&= \mathop {{{\mathrm{\arg \min }}}}\limits _{u} \left\{ H(u) + J(G(u)) \right\} . \end{aligned}$$
(2)

In the following we want to propose a strategy for solving (1) that is based on simultaneous linearisation of the nonlinear operator constraint and the solution of an inexact ADMM problem.

3 Alternating Direction Method of Multipliers

We solve (1) by alternating optimisation of the augmented Lagrange function

$$\begin{aligned} \mathcal {L}_\delta (u, v; \mu ) = H(u) + J(v) + \langle \mu , F(u, v) - c \rangle + \frac{\delta }{2} \Vert F(u, v) - c\Vert _2^2. \end{aligned}$$
(3)

Alternating minimisation of (3) in u, v and subsequent maximisation of \(\mu \) via a step of gradient ascent yields the following nonlinear version of ADMM [11]:

$$\begin{aligned} u^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _u \left\{ \frac{\delta }{2} \Vert F(u, v^k) - c\Vert _2^2 + \langle \mu ^k, F(u, v^k) \rangle + H(u) \right\} , \end{aligned}$$
(4)
$$\begin{aligned} v^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _v \left\{ \frac{\delta }{2} \Vert F(u^{k + 1}, v) - c\Vert _2^2 + \langle \mu ^k, F(u^{k + 1}, v) \rangle + J(v) \right\} , \end{aligned}$$
(5)
$$\begin{aligned} \mu ^{k + 1}&= \mu ^k + \delta \left( F(u^{k + 1}, v^{k + 1}) - c \right) . \end{aligned}$$
(6)

To avoid having to deal with nonlinear subproblems, we replace \(F(u, v^k)\) and \(F(u^{k + 1}, v)\) by their Taylor linearisations around \(u^k\) and \(v^k\), i.e. \(F(u, v^k) \approx F(u^k, v^k) + \partial _u F(u^k, v^k)\left( u - u^k \right) \) and \(F(u^{k + 1}, v) \approx F(u^{k + 1}, v^k) + \partial _v F(u^{k + 1}, v^k)\left( v - v^k \right) \), respectively. The updates (4) and (5) then become

$$\begin{aligned} u^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _u \left\{ \frac{\delta }{2} \left\| A^k u - c_1^k\right\| _2^2 + \langle \mu ^k, A^k u \rangle + H(u) \right\} , \end{aligned}$$
(7)
$$\begin{aligned} v^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _v \left\{ \frac{\delta }{2} \left\| B^k v - c_2^k \right\| _2^2 + \langle \mu ^k, B^k v \rangle + J(v) \right\} , \end{aligned}$$
(8)

with \(A^k := \partial _u F(u^k, v^k)\), \(B^k := \partial _v F(u^{k + 1}, v^k)\), \(c_1^k := c + A^k u^k - F(u^k, v^k)\) and \(c_2^k := c + B^k v^k - F(u^{k + 1}, v^k)\). Note that the updates (7) and (8) are still implicit, regardless of H and J. In the following, we want to modify the updates such that they become simple proximity operations.

4 Preconditioned ADMM

Based on [23], we modify (7) and (8) by adding the surrogate terms \(\Vert u^{k + 1} - u^k \Vert _{Q^k_1}^2 / 2\) and \(\Vert v^{k + 1} - v^k \Vert _{Q^k_2}^2 / 2\), with \(\Vert w \Vert _Q := \sqrt{\langle Qw, w\rangle }\) (note that if Q is chosen to be positive definite, \(\Vert \cdot \Vert _Q\) becomes a norm). We then obtain

$$\begin{aligned} u^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _u \left\{ \frac{\delta }{2} \left\| A^k u - c_1^k\right\| _2^2 + \langle \mu ^k, A^k u \rangle + H(u) + \frac{1}{2}\Vert u - u^k \Vert _{Q_1^k}^2 \right\} , \end{aligned}$$
$$\begin{aligned} v^{k + 1}&\in \mathop {{{\mathrm{\arg \min }}}}\limits _v \left\{ \frac{\delta }{2} \left\| B^k v - c_2^k \right\| _2^2 + \langle \mu ^k, B^k v \rangle + J(v) + \frac{1}{2}\Vert v - v^k \Vert _{Q_2^k}^2 \right\} . \end{aligned}$$
(9)

If we choose \(Q_1^k := \frac{1}{\tau _1^k} I - \delta A^k {}^* A^k\) with \(\tau _1^k \delta < 1/\Vert A^k \Vert ^2\) and \(Q_2^k := \frac{1}{\tau _2^k} I - \delta B^k {}^* B^k\) with \(\tau _2^k \delta < 1/\Vert B^k \Vert ^2\), and if we define \(\overline{\mu }^k := 2\mu ^k - \mu ^{k - 1}\), we obtain

$$\begin{aligned} u^{k + 1}&= \left( I + \tau _1^k \partial H \right) ^{-1} \left( u^k - \tau _1^k A^k{}^*\overline{\mu }^k \right) , \\ v^{k + 1}&= \left( I + \tau _2^k \partial J \right) ^{-1} \left( v^k - \tau _2^k B^k{}^*\left( \mu ^k + \delta \left( F(u^{k + 1}, v^k) - c \right) \right) \right) , \end{aligned}$$
(10)

with \((I + \alpha \partial E)^{-1}(w)\) denoting the proximity or resolvent operator

$$\begin{aligned} (I + \alpha \partial E)^{-1}(w) := \mathop {{{\mathrm{\arg \min }}}}\limits _u \left\{ \frac{1}{2} \Vert u - w\Vert _2^2 + \alpha E(u) \right\} . \end{aligned}$$
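For concreteness, two resolvents of this form with simple closed-form expressions can be sketched as follows (Python/NumPy, illustration only; the function names are ours and not part of the paper): the soft-thresholding operator for \(E = \Vert \cdot \Vert _1\), closely related to the shrinkage operations appearing in Sect. 6, and the resolvent of a quadratic term.

```python
import numpy as np

def prox_l1(w, alpha):
    # (I + alpha * d||.||_1)^{-1}(w): componentwise soft-thresholding.
    return np.sign(w) * np.maximum(np.abs(w) - alpha, 0.0)

def prox_quadratic(w, alpha, f):
    # (I + alpha * dE)^{-1}(w) for E(u) = 0.5 * ||u - f||_2^2, solved in closed form.
    return (w + alpha * f) / (1.0 + alpha)
```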

The entire proposed algorithm with updates (9), (10) and (6) reads as

Algorithm 1. Preconditioned ADMM with linearised nonlinear operator constraint: starting from initial values \(u^0, v^0, \mu ^0, \overline{\mu }^0\), iterate the proximal updates (10), the multiplier update (6) and the extrapolation \(\overline{\mu }^{k+1} = 2\mu ^{k+1} - \mu ^k\).
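A minimal Python/NumPy sketch of Algorithm 1, under simplifying assumptions (variables flattened to vectors, dense Jacobians, step sizes chosen from the bounds \(\tau _i^k \delta < 1/\Vert \cdot \Vert ^2\)), could look as follows; all function names are ours and not part of the paper.

```python
import numpy as np

def preconditioned_nonlinear_admm(F, dFu, dFv, prox_H, prox_J, c,
                                  u0, v0, delta=1.0, iters=500):
    """Sketch of Algorithm 1: proximal updates (10) followed by the dual step (6).

    F(u, v)       : nonlinear constraint operator (arrays flattened to 1-D)
    dFu, dFv      : partial Jacobians of F, returned as matrices
    prox_H, prox_J: (w, tau) -> (I + tau * dH)^{-1}(w), analogously for J
    """
    u, v = u0.copy(), v0.copy()
    mu = np.zeros_like(F(u, v))
    mu_bar = mu.copy()                                  # \bar{mu}^0 = mu^0
    for _ in range(iters):
        A = dFu(u, v)                                   # A^k = d_u F(u^k, v^k)
        tau1 = 0.99 / (delta * np.linalg.norm(A, 2) ** 2)
        u = prox_H(u - tau1 * A.conj().T @ mu_bar, tau1)

        B = dFv(u, v)                                   # B^k = d_v F(u^{k+1}, v^k)
        tau2 = 0.99 / (delta * np.linalg.norm(B, 2) ** 2)
        v = prox_J(v - tau2 * B.conj().T @ (mu + delta * (F(u, v) - c)), tau2)

        mu_new = mu + delta * (F(u, v) - c)             # dual ascent step (6)
        mu_bar, mu = 2 * mu_new - mu, mu_new            # extrapolation for next sweep
    return u, v, mu
```

In practice the Jacobians would be realised as linear operators rather than dense matrices, and their norms estimated, e.g., by power iteration.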

5 Connection to NL-PDHGM

In the following we want to show how the algorithm simplifies in case the nonlinear operator constraint is nonlinear in only one variable, which is sufficient for problems of the form (2). Without loss of generality we consider constraints of the form \(F(u, v) = G(u) - v\), where G represents a nonlinear operator in u. Then we have \(A^k = {{\mathrm{\mathcal {J}}}}\!G(u^k)\) (with \({{\mathrm{\mathcal {J}}}}\!G(u^k)\) denoting the Jacobian of G at \(u^k\)), \(B^k = -I\), and if we further choose \(\tau _2^k = 1/\delta \) for all k, update (10) reads

$$\begin{aligned} v^{k + 1} = \left( I + \frac{1}{\delta } \partial J \right) ^{-1} \left( G(u^{k + 1}) + \frac{1}{\delta }\mu ^k \right) . \end{aligned}$$

Applying Moreau’s identity [18] \(b = \left( I + \frac{1}{\delta } \partial J\right) ^{-1}(b) + \frac{1}{\delta }(I + \delta \partial J^*)^{-1}(\delta b)\) yields

$$\begin{aligned} \mu ^{k + 1} = \left( I + \delta \partial J^{*} \right) ^{-1}\left( \mu ^k + \delta G(u^{k + 1}) \right) . \end{aligned}$$
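As a quick sanity check, Moreau's identity can be verified numerically; the snippet below (illustration only, not the J used in the paper) takes \(J = \Vert \cdot \Vert _1\), for which \((I + \tfrac{1}{\delta } \partial J)^{-1}\) is soft-thresholding and \((I + \delta \partial J^*)^{-1}\) is the projection onto the unit \(\ell ^\infty \)-ball.

```python
import numpy as np

delta = 2.5
b = np.random.randn(1000)

soft = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t, 0.0)  # (I + t dJ)^{-1}
proj = lambda w: np.clip(w, -1.0, 1.0)     # (I + s dJ*)^{-1}, independent of s > 0

# b = prox_{J/delta}(b) + (1/delta) * prox_{delta J*}(delta * b)
assert np.allclose(b, soft(b, 1.0 / delta) + proj(delta * b) / delta)
```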

If we further change the order of the updates, starting with the update for \(\mu \), the whole algorithm reads

$$\begin{aligned} \mu ^{k + 1}&= \left( I + \delta \partial J^{*} \right) ^{-1}\left( \mu ^k + \delta G(u^k) \right) ,\\ \overline{\mu }^{k + 1}&= 2\mu ^{k + 1} - \mu ^k,\\ u^{k + 1}&= \left( I + \tau _1^k \partial H \right) ^{-1} \left( u^k - \tau _1^k {{\mathrm{\mathcal {J}}}}\!G(u^k)^* \overline{\mu }^{k + 1} \right) . \end{aligned}$$
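A compact sketch of this special case (Python/NumPy, illustration only; `JG_adj` applies \({{\mathrm{\mathcal {J}}}}\!G(u)^*\) to a given vector and `prox_Jstar` denotes \((I + \delta \partial J^*)^{-1}\); these names are ours) could read as follows.

```python
import numpy as np

def nl_admm_special_case(G, JG_adj, prox_H, prox_Jstar, u0, delta, tau1, iters=500):
    # Simplified scheme for F(u, v) = G(u) - v with tau_2^k = 1/delta.
    u = u0.copy()
    mu = np.zeros_like(G(u))
    for _ in range(iters):
        mu_new = prox_Jstar(mu + delta * G(u), delta)       # dual update
        mu_bar = 2 * mu_new - mu                            # extrapolation on the dual
        u = prox_H(u - tau1 * JG_adj(u, mu_bar), tau1)      # primal update
        mu = mu_new
    return u, mu
```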

Note that this algorithm is almost the same as NL-PDHGM proposed in [22] for \(\theta = 1\), except that the extrapolation step is carried out on the dual variable \(\mu \) instead of the primal variable u. In the following we want to briefly sketch how to prove convergence for this algorithm in analogy to [22]. We define

$$\begin{aligned} N(\mu ^{k + 1}, u^{k + 1})&:= \left( \begin{array}{c} \partial J^*(\mu ^{k + 1}) - {{\mathrm{\mathcal {J}}}}\!G(u^k)\, u^{k + 1} - c^k\\ \partial H(u^{k + 1}) + {{\mathrm{\mathcal {J}}}}\!G(u^k)^* \mu ^{k + 1} \end{array} \right) ,\\ L^k&:= \left( \begin{array}{cc} \frac{1}{\delta } I &{} {{\mathrm{\mathcal {J}}}}\!G(u^k)\\ {{\mathrm{\mathcal {J}}}}\!G(u^k)^* &{} \frac{1}{\tau _1^k} I \end{array}\right) , \end{aligned}$$

with \(c^k := G(u^k) - {{\mathrm{\mathcal {J}}}}\!G(u^k)u^k\). Now the algorithm is: find \((\mu ^{k+1}, u^{k+1})\) such that

$$\begin{aligned} N(\mu ^{k + 1}, u^{k + 1}) + L^k(\mu ^{k + 1}-\mu ^k, u^{k + 1}-u^k) \ni 0. \end{aligned}$$

If we exchange the order of \(\mu \) and u here, i.e., reorder the rows of N as well as the rows and columns of \(L^k\), we obtain almost the “linearised” NL-PDHGM of [22]. The difference is that the sign of \({{\mathrm{\mathcal {J}}}}G\) in \(L^k\) is inverted. The only points in [22] where the exact structure of \(L^k\) (\(M_{x^k}\) therein) is used are Lemma 3.1, Lemma 3.6 and Lemma 3.10. The first two go through exactly as before with the negated structure. Reproducing Lemma 3.10 requires bounding the actual step lengths \(\Vert u^k-u^{k+1} \Vert \) and \(\Vert \mu ^k-\mu ^{k+1} \Vert \) from below near a solution, for arbitrary \(\epsilon >0\). A proof would go beyond the page limit of these proceedings. Let us just point out that this can be done, implying that the convergence results of [22] apply to this algorithm as well. This means that under somewhat technical regularity conditions, which for TV-type problems amount to Huber regularisation, local convergence in a neighbourhood of the true solution can be guaranteed.

6 Joint Estimation of the Spin-Proton Density and Coil Sensitivities in Parallel MRI

We want to demonstrate the numerical capabilities of Algorithm 1 by applying it to the nonlinear problem of joint estimation of the spin-proton density and the coil sensitivities in parallel MRI. The discrete problem of joint reconstruction from sub-sampled k-space data on a rectangular grid reads

$$\begin{aligned} \left( \begin{array}{c} \hat{u}\\ \hat{c}_1\\ \vdots \\ \hat{c}_n \end{array}\right) \in \mathop {{{\mathrm{\arg \min }}}}\limits _{\mathbf {v}=(u, c_1, \ldots , c_n)} \left\{ \frac{1}{2} \sum _{j = 1}^n \Vert S\mathcal {F}(G(\mathbf {v}))_j - f_j \Vert _2^2 + \alpha _0 R_0(u) + \sum _{j = 1}^n \alpha _j R_j(c_j) \right\} , \end{aligned}$$

where \(\mathcal {F}\) is the 2D discrete Fourier transform, \(f_j\) are the k-space measurements for each of the n coils, S is the sub-sampling operator and \(R_j\) denote appropriate regularisation functionals. The nonlinear operator G maps the unknown spin-proton density u and the different coil sensitivities \(c_j\) as follows [21]:

$$\begin{aligned} G(u, c_1, \ldots , c_n) = (u c_1, u c_2, \ldots , u c_n)^T. \end{aligned}$$
(11)
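The partial derivatives of (11) and their adjoints, which Algorithm 1 requires, are simple pointwise multiplications. The following sketch (our own illustration, assuming complex-valued NumPy arrays; the function names are not from the paper) spells them out.

```python
import numpy as np

def G(u, coils):
    # Nonlinear operator (11): the image u multiplied with each coil sensitivity.
    return np.stack([u * c for c in coils])

def JG(u, coils, du, dcoils):
    # Directional derivative of G at (u, c_1, ..., c_n) in direction (du, dc_1, ..., dc_n).
    return np.stack([du * c + u * dc for c, dc in zip(coils, dcoils)])

def JG_adjoint(u, coils, w):
    # Adjoint Jacobian applied to w = (w_1, ..., w_n).
    du = sum(np.conj(c) * wj for c, wj in zip(coils, w))
    dcoils = [np.conj(u) * wj for wj in w]
    return du, dcoils
```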

In order to compensate for sub-sampling artefacts it is common practice to use total variation as a regulariser [6, 17]. Coil sensitivities are assumed to be smooth, cf. Fig. 1, motivating a reconstruction model similar to the one proposed in [13]. We therefore choose the discrete isotropic total variation, \(R_0(u) = \Vert \nabla u \Vert _{2, 1}\), and the smooth 2-norm of the discretised gradient, i.e. \(R_j(c_j) := \Vert \nabla c_j \Vert _{2, 2}\), for all \(j > 0\), following the notation in [4]. We further introduce regularisation parameters \(\lambda _j\) in front of the data fidelities and rescale all regularisation parameters such that \(\alpha _0 + \frac{1}{n}\left( \sum _{j = 1}^n \lambda _j + \sum _{j = 1}^n \alpha _j \right) = 1\). In order to realise this model via Algorithm 1 we consider the following operator splitting strategy. We define \(F(u_0, \ldots , u_n, v_0, \ldots , v_{2n})\) as

$$\begin{aligned} F(u_0, \ldots , u_n, v_0, \ldots , v_{2n}) := \left( \begin{array}{c} G(u_0, \ldots , u_n)\\ \nabla u_0 \\ \nabla u_1 \\ \vdots \\ \nabla u_n \end{array}\right) - \left( \begin{array}{c} v_0 \\ \vdots \\ v_n \\ \vdots \\ v_{2n} \end{array}\right) , \end{aligned}$$

set \(H(u_0, \ldots , u_n) \equiv 0\), and \(J(v_0, \ldots , v_{2n}) = \sum _{j = 0}^{2n} J_j(v_j)\) with \(J_j(v_j) := \frac{\lambda _j}{2}\Vert S\mathcal {F} v_j - f_j \Vert _2^2\) for \(j \in \{0, \ldots , n - 1\}\), \(J_n(v_n) = \alpha _0 \Vert v_n \Vert _{2, 1}\) and \(J_j(v_j) = \alpha _{j - n} \Vert v_j \Vert _{2, 2} \) for \(j \in \{ n + 1, \ldots , 2n\}\). Note that with these choices of functions, all the resolvent operations can be carried out easily. In particular, we obtain

$$\begin{aligned} (I + \tau _1^k \partial H)^{-1}(w)&= w,\\ (I + \tau _2^k \partial J_j)^{-1}(w)&= \mathcal {F}^{-1} \left( \frac{\mathcal {F}w_j + \tau _2^k \lambda _j S^T f_j}{1 + \tau _2^k \lambda _j \text {diag}(S^T S) } \right) \ \text {for} \ j \in \{0, \ldots , n - 1\},\\ (I + \tau _2^k \partial J_n)^{-1}(w)&= \frac{w_n}{\Vert w_n \Vert _2}\max \left( \Vert w_n \Vert _2 - \alpha _0 \tau _2^k, 0 \right) ,\\ (I + \tau _2^k \partial J_j)^{-1}(w)&= \frac{w_j}{\Vert w_j \Vert _{2, 2}}\max \left( \Vert w_j \Vert _{2, 2} - \alpha _{j-n} \tau _2^k, 0 \right) \ \text {for} \ j \in \{n + 1, \ldots , 2n\}. \end{aligned}$$
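A possible realisation of these resolvents in Python/NumPy is sketched below (illustration only; we assume S is a binary sampling mask, so that \(S^T S\) acts as a diagonal 0/1 matrix and \(S^T f_j\) is the zero-filled data, \(\mathcal {F}\) is the unitary 2D FFT, and the \(\Vert \cdot \Vert _{2,1}\) shrinkage is applied pixelwise; function names are ours).

```python
import numpy as np

def prox_data(w, tau, lam, mask, f_zerofilled):
    # (I + tau dJ_j)^{-1} for J_j(v) = lam/2 * ||S F v - f_j||_2^2, unitary FFT.
    Fw = np.fft.fft2(w, norm="ortho")
    return np.fft.ifft2((Fw + tau * lam * f_zerofilled)
                        / (1.0 + tau * lam * mask), norm="ortho")

def prox_tv(w, tau, alpha):
    # Pixelwise shrinkage for alpha * ||.||_{2,1}; w holds the two gradient
    # components per pixel, shape (2, Ny, Nx).
    nrm = np.maximum(np.sqrt((np.abs(w) ** 2).sum(axis=0)), 1e-12)
    return w * np.maximum(nrm - alpha * tau, 0.0) / nrm

def prox_smooth_grad(w, tau, alpha):
    # Global shrinkage for alpha * ||.||_{2,2} (the norm, not its square).
    nrm = np.sqrt((np.abs(w) ** 2).sum())
    return w * max(nrm - alpha * tau, 0.0) / max(nrm, 1e-12)
```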

Moreover, as \(B^k = -I\) (and thus \(\Vert B^k \Vert = 1\)) for all k, we can simply eliminate \(\tau _2^k\) by replacing it with \(1/\delta \), similar to Sect. 5.

Fig. 1. (a) shows the brain phantom as described in Sect. 6.1. (c)–(j) show visualisations of the measured coil sensitivities of a water bottle. (b) shows the simulated, spiral-shaped sub-sampling scheme used to sub-sample the k-space data.

Fig. 2. Reconstructions from data with low noise level \(\sigma = 0.05\). Despite the sub-sampling, features of the brain phantom are very well preserved. In addition, the coil sensitivities seem to correspond well to the original ones, despite a slight loss of contrast. Note that the coil sensitivities remain at their initial value where the signal is zero.

6.1 Experimental Setup

We now describe the experimental setup. The goal is to reconstruct the synthetic brain phantom in Fig. 1a from sub-sampled k-space measurements. The numerical phantom is based on the design in [1] with a matrix size of \(190 \times 190\). It consists of several different tissue types such as cerebrospinal fluid (CSF), gray matter (GM), white matter (WM) and cortical bone. Each pixel is assigned a set of MR tissue properties: relaxation times \(\text {T}_1(x,y)\) and \(\text {T}_2(x,y)\) and spin density \(\rho (x,y)\). These parameters were also selected according to [1]. The MR signal s(x,y) in each pixel was then calculated by using the signal equation of a fluid-attenuated inversion recovery (FLAIR) sequence [5]:

$$\begin{aligned} s(x,y) = \rho (x,y)(1-2\ e^{-\text {TI}/\text {T}_1(x,y)})(1 - e^{-\text {TR}/\text {T}_1(x,y)})\ e^{-\text {TE}/\text {T}_2(x,y)}. \end{aligned}$$

The sequence parameters were selected as TR = 10000 ms and TE = 90 ms. TI was set to 1781 ms to achieve signal nulling of CSF (\(\text {TI} = \text {T}_1^\text {csf} \log (2)\) with \(\text {T}_1^\text {csf} = 2569\,\text {ms}\)).
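As a small worked check of the nulling condition (the \(\text {T}_2\) value passed below is arbitrary, since the inversion factor already vanishes for CSF):

```python
import numpy as np

TR, TE = 10000.0, 90.0                # ms
T1_csf = 2569.0                       # ms
TI = T1_csf * np.log(2)               # = 1780.7 ms, rounded to 1781 ms above

def flair_signal(rho, T1, T2):
    # FLAIR signal equation from above.
    return rho * (1 - 2 * np.exp(-TI / T1)) * (1 - np.exp(-TR / T1)) * np.exp(-TE / T2)

print(round(TI), flair_signal(1.0, T1_csf, 2000.0))   # 1781, ~0 (CSF is nulled)
```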

In order to generate artificial k-space measurements for each coil, we proceed as follows. First, we produce 8 images of the brain phantom multiplied by the measured coil sensitivity maps shown in Fig. 1c–j. The coil sensitivity maps were generated from the measurements of a water bottle with an 8-channel head coil array. Then we produce artificial k-space data by applying the 2D discrete Fourier-transform to each of those individual images. Subsequently, we sub-sample only approx. 25% of each of the k-space datasets via the spiral shown in Fig. 1b. Finally, we add Gaußian noise with standard deviation \(\sigma \) to the sub-sampled data.
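The data generation pipeline just described can be summarised by the following sketch (illustration only; whether the noise is complex-valued and restricted to the sampled locations is our assumption, and the function names are ours).

```python
import numpy as np

def simulate_kspace(phantom, coil_maps, spiral_mask, sigma, seed=0):
    # phantom: brain phantom image; coil_maps: 8 sensitivity maps;
    # spiral_mask: binary mask keeping roughly 25% of k-space.
    rng = np.random.default_rng(seed)
    data = []
    for c in coil_maps:
        k = np.fft.fft2(phantom * c) * spiral_mask   # coil image -> sub-sampled k-space
        noise = rng.normal(0.0, sigma, k.shape) + 1j * rng.normal(0.0, sigma, k.shape)
        data.append(k + noise * spiral_mask)         # noise on measured samples only
    return np.stack(data)
```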

Fig. 3.
figure 3

Reconstructions for noise with high noise level \(\sigma = 0.95\). Due to the large amount of noise, higher regularisation parameters are necessary. As a consequence, fine structures are smoothed out and in contrast to the case of little noise, compensation of sub-sampling artefacts is less successful.

6.2 Computations

For the actual computations we use two noisy versions \(f_j\) of the simulated k-space data: one with a small amount of noise (\(\sigma = 0.05\)) and one with a high amount of noise (\(\sigma = 0.95\)). As stopping criterion we simply choose a fixed number of iterations; for both the low and the high noise level dataset we fix the number of iterations to 1500. The initial values used for the algorithm are \(u_j^0 = \mathbf 1 \), with \(\mathbf 1 \in \mathbb {R}^{l \times 1}\) being the constant one-vector, for all \(j \in \{0, \ldots , n\}\). All other initial variables (\(v^0\), \(\mu ^0\), \(\overline{\mu }^0\)) are set to zero.

Low Noise Level. We have computed reconstructions from the noisy data with noise level \(\sigma = 0.05\) via Algorithm 1, with regularisation parameters set to \(\lambda _j = 0.0621\), \(\alpha _0 = 0.0062\) and \(\alpha _j = 0.9317\) for \(j \in \{1, \ldots , n\}\). We have further created a naïve reconstruction by averaging the individual inverse Fourier-transformed images obtained from zero-filling the k-space data. The modulus images of the results are visualised in Fig. 2. The PSNR value for the averaged zero-filled reconstruction is 10.2185, whereas the PSNR of the reconstruction with the proposed method is 24.5572.
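For reference, the naïve reconstruction and the PSNR evaluation could be implemented along the following lines (our sketch; the exact PSNR convention is not stated in the text, so taking the peak as the maximum modulus of the ground truth is an assumption).

```python
import numpy as np

def zero_filled_average(kspace_data):
    # Average of the inverse Fourier transforms of the zero-filled coil data.
    return np.mean([np.abs(np.fft.ifft2(k)) for k in kspace_data], axis=0)

def psnr(x, ref):
    # Assumed convention: peak equals the maximum modulus of the reference.
    mse = np.mean(np.abs(x - ref) ** 2)
    return 10.0 * np.log10(np.max(np.abs(ref)) ** 2 / mse)
```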

High Noise Level. We proceeded as in the previous section, but for noisy data with noise level \(\sigma = 0.95\). The regularisation parameters were set to \(\lambda _j = 0.0149\), \(\alpha _0 = 0.0135\) and \(\alpha _j = 0.9716\) for \(j \in \{1, \ldots , n\}\). The modulus images of the results are visualised in Fig. 3. The PSNR value for the averaged zero-filled reconstruction is 9.9621, whereas the PSNR of the reconstruction with the proposed method is 16.672.

7 Conclusions and Outlook

We have presented a novel algorithm that allows the computation of minimisers of a sum of convex functionals with a nonlinear operator constraint. We have shown the connection to the recently proposed NL-PDHGM algorithm, which implies local convergence results in analogy to those derived in [22]. Subsequently we have demonstrated the computational capabilities of the algorithm by applying it to a nonlinear joint reconstruction problem in parallel MRI.

For future work, the convergence of the algorithm in the general setting has to be verified, and possible extensions that guarantee global convergence have to be studied. Generalisations of stopping criteria, such as a linearised primal-dual gap, will also be of interest. With respect to the presented parallel MRI application, exact conditions for convergence (such as the exact norm bounds) have to be verified. The impact of the algorithm parameters as well as the regularisation parameters on the reconstruction has to be analysed, and a rigorous study with artificial and real data would also be desirable. Moreover, future research will focus on alternative regularisation functions, e.g. based on spherical harmonics motivated by [20]. Last but not least, other applications that can be modelled via (1) should be considered in future research.