1 Introduction

Dynamic cardiac MRI is considered the gold standard among imaging modalities for diagnosing heart function. However, due to its long acquisition time, its clinical use has been limited to non-time-critical applications. Recent advances in Compressed Sensing (CS) have been successfully applied to MRI [8] to reduce acquisition time. Nevertheless, CS-MRI poses a new challenge: the reconstruction time increases because an in-painting inverse problem must be solved in the frequency domain (i.e., k-space). Therefore, accelerating the reconstruction process is a top priority for adopting the CS framework in fast MRI diagnosis.

Conventional CS-MRI reconstruction methods have exploited the sparsity of the signal by applying universal sparsifying transforms such as Fourier transforms (e.g., the discrete Fourier transform (DFT) or discrete cosine transform (DCT)), Total Variation (TV), and wavelets (e.g., Haar, Daubechies). This research direction has focused on accelerating the sparsity-based energy minimization problem, with [10] or without [7] hardware support. Other strategies accelerate the minimization process itself, for example by combining TV with a nuclear norm [14] or by formulating the solver in a different sparsity domain, such as low-rank techniques [9, 12]. More recently, approaches leveraging a state-of-the-art data-driven method, namely dictionary learning (DL) [2], have been proposed to further enhance reconstruction quality [3, 5, 6, 11]. However, the existing learning-based methods suffer from the drawbacks of patch-based dictionaries, i.e., redundant atoms and long running times.

Convolutional sparse coding (CSC) is a recent learning-based sparse representation that approximates the input signal with a superposition of sparse feature maps convolved with a collection of filters. It replaces patch-based dictionary learning with an energy minimization that uses a convolution operator in the image domain, which becomes an element-wise multiplication in the frequency domain; this formulation was derived within the Alternating Direction Method of Multipliers (ADMM) framework [4], and a direct solution of the resulting inverse problem was later introduced by Wohlberg [13]. CSC can generate much more compact dictionaries owing to the shift-invariant nature of its filters, and the pixel-wise computation in the Fourier domain maps well to parallel architectures. However, such advanced machine learning approaches have not yet been fully exploited in the CS-MRI literature. Therefore, in this paper, we propose a novel CS dynamic MRI reconstruction method that exploits the compactness and efficiency of 3D CSC. The proposed 3D CSC directly encodes both spatial and temporal features from dynamic cardiac 2D MRI using a compact set of 3D atoms (i.e., filters), without regularizers enforcing temporal coherence (e.g., total variation along the time axis). We also show that the proposed method maps well to data-parallel architectures such as GPUs, which further accelerates its running time significantly, up to two orders of magnitude faster than the state-of-the-art CPU implementation of CS-MRI using patch-based dictionary learning. To the best of our knowledge, this is the first CS-MRI reconstruction method based on GPU-accelerated 3D CSC.
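As a minimal, self-contained illustration of the CSC synthesis model described above (a toy 2D example, not the proposed 3D reconstruction; all sizes, values, and variable names are chosen purely for illustration), the sparse approximation \(\sum_k d_k * x_k\) can be evaluated as element-wise products in the frequency domain:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, P = 64, 4, 9                                  # image size, number of filters, filter support

d = rng.standard_normal((K, P, P))                  # a small bank of 2D atoms
x = np.zeros((K, N, N))                             # sparse coefficient maps (mostly zero)
idx = (rng.integers(0, K, 25), rng.integers(0, N, 25), rng.integers(0, N, 25))
x[idx] = rng.standard_normal(25)

# CSC synthesis s = sum_k d_k * x_k: by the convolution theorem, the (circular)
# convolutions become element-wise products after a Fourier transform.
d_pad = np.zeros((K, N, N))
d_pad[:, :P, :P] = d                                # zero-pad atoms to the image size
s = np.real(np.fft.ifft2(np.sum(np.fft.fft2(d_pad) * np.fft.fft2(x), axis=0)))
print(s.shape)                                      # (64, 64): the synthesized approximation
```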

2 Method

Figure 1 gives a pictorial description of the proposed method. If the inverse Fourier transform is directly applied to undersampled MRI k-space data (Fig. 1a, \(\times \)4 undersampling), the reconstructed images suffer from artifacts (Fig. 1b). This zero-filling reconstruction serves as the initial guess for our iterative reconstruction process, together with randomly initialized filters, e.g., a collection of 16 atoms of size 9\(\times \)9\(\times \)9 as shown in Fig. 1d. The image and filters are then updated iteratively until they converge, as shown in Fig. 1c, e, and f.

Fig. 1. An overview of CS-MRI reconstruction using the 3D CSC method.

The proposed CS-MRI reconstruction algorithm finds s (i.e., a stack of 2D MR images over a given time duration) by solving the energy minimization problem defined as follows:

$$\begin{aligned}&\mathop {\min }\limits _{d,x,s} \,\,\,\frac{\alpha }{2}\left\| s - \sum _{k} d_k * x_k \right\| _2^2 + \lambda \sum _{k} \left\| x_k \right\| _1\, \nonumber \\&\quad s.t.: \, \left\| {R_{} \mathcal {F}_2 s - m} \right\| _2^2 < \varepsilon ^2 , \, \left\| d_k \right\| _2^2 \leqslant 1 \end{aligned}$$
(1)

where \(d_k\) is the k-th filter (or atom in the dictionary) and \(x_k\) is its corresponding sparse code for s. In Eq. (1), the first term measures the difference between s and its sparse approximation \(\sum _{k} d_k * x_k\), weighted by \(\alpha \). The second term is the sparsity regularization of \(x_k\), using an \(\ell \)1 norm with weight \(\lambda \) instead of the \(\ell \)0 norm used in [2, 5, 6]. The remaining constraints are as follows: the first enforces consistency between the undersampled measurement m and the undersampled reconstructed image, using the sampling mask R and the 2D Fourier operator \(\mathcal {F}_2\), and the second restricts the \(\ell \)2 norm of each atom \(d_k\) to at most unit length. In the following discussion, we use a simplified notation without the index k, and denote the Fourier transform of a variable by the subscript f (for example, \(d_{f}\) is shorthand for \(\mathcal {F}d\) in the 3D domain, and \(s_{f_2}\) is shorthand for \(\mathcal {F}_2s\) in the 2D spatial domain) to derive the solution of Eq. (1). Problem (1) can then be rewritten with auxiliary variables y and g for x and d as follows:

$$\begin{aligned}&\mathop {\min }\limits _{d,x,g,y,s} \,\,\,\frac{\alpha }{2}\left\| s - \sum d * x \right\| _2^2 + \lambda \left\| y \right\| _1\nonumber \\&\quad s.t.: \, x - y = 0, \, \left\| {R \mathcal {F}_2 s - m} \right\| _2^2 < \varepsilon ^2, g=\mathcal {P}roj(d), \, \left\| g \right\| _2^2 \leqslant 1 \end{aligned}$$
(2)

where g and d are related by a projection operator, a truncation followed by zero-padding, so that the dimension of g matches that of x. Since we leverage the Fourier transform to solve this problem, g must be zero-padded so that \(g_f\) has the same size as \(x_f\). The constrained problem above can be recast in an unconstrained form with dual variables u and h, where the measurement consistency and the dual differences are penalized with weights \(\gamma \), \(\rho \), and \(\sigma \), respectively:

$$\begin{aligned}&\mathop {\min }\limits _{d,x,g,y,s} \,\,\,\frac{\alpha }{2}\left\| s - \sum d * x \right\| _2^2 + \lambda \left\| y \right\| _1^{} + \frac{\gamma }{2}\left\| {R_{} \mathcal {F}_2 s - m} \right\| _2^2 \nonumber \\&\quad + \frac{\rho }{2}\left\| {x - y + u} \right\| _2^2 + \frac{\sigma }{2}\left\| {d - g + h} \right\| _2^2 \, s.t.: \, g=\mathcal {P}roj(d),\, \left\| g \right\| _2^2 \leqslant 1 \end{aligned}$$
(3)

Then we can solve problem (3) by iteratively solving the following smaller subproblems in an alternating fashion:

Solve for x:

$$\begin{aligned} \mathop {\min }\limits _{x} \,\,\,\frac{\alpha }{2}\left\| \sum d * x - s\right\| _2^2 + \frac{\rho }{2}\left\| {x - y + u} \right\| _2^2 \end{aligned}$$
(4)

Applying the Fourier transform to (4) yields:

$$\begin{aligned} \mathop {\min }\limits _{x_f} \,\,\,\frac{\alpha }{2}\left\| \sum d_fx_f - s_f\right\| _2^2 + \frac{\rho }{2}\left\| {x_f - y_f + u_f} \right\| _2^2 \end{aligned}$$
(5)

The minimizer of (5) can then be found by taking the derivative of (5) with respect to \(x_f\) and setting it to zero, which yields:

$$\begin{aligned} \left( {\alpha {D_f^H D_f } + \rho I} \right) x_f = {D_f^H s_f } + \rho \left( {y_f - u_f } \right) \end{aligned}$$
(6)

Note that \(D_f\) denotes the concatenation of the diagonalized matrices \(d_{fk}\): \(D_f = [diag(d_{f1}), ..., diag(d_{fK})]\), where K is the number of filters, and \(D_f^H\) is the complex conjugate (Hermitian) transpose of \(D_f\).
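Because \(D_f^H D_f\) is a rank-one matrix at each frequency bin, the linear system (6) decouples and can be solved independently per bin with the Sherman-Morrison formula (see also the note at the end of this section). The following NumPy sketch shows one way to vectorize that solve over all bins; the array layout and names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def solve_rank_one(d_f, s_f, y_f, u_f, alpha, rho):
    """Solve (alpha * D_f^H D_f + rho * I) x_f = D_f^H s_f + rho * (y_f - u_f)
    independently at every frequency bin via the Sherman-Morrison formula.
    d_f, y_f, u_f: complex arrays of shape (K, *dims); s_f: complex array of shape dims."""
    c = np.conj(d_f) * s_f + rho * (y_f - u_f)      # right-hand side, shape (K, *dims)
    b = np.conj(d_f)                                # per-bin rank-one factor b, D_f^H D_f = b b^H
    bHc = np.sum(np.conj(b) * c, axis=0)            # b^H c at every bin, shape dims
    bHb = np.sum(np.abs(d_f) ** 2, axis=0)          # ||d(omega)||^2 at every bin, shape dims
    return c / rho - (alpha * bHc / (rho * (rho + alpha * bHb))) * b
```

The same routine applies whenever the normal matrix is a rank-one update of a scaled identity, which is why it can be reused for the filter update below.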

Solve for y:

$$\begin{aligned} \mathop {\min }\limits _y \quad \lambda {\left\| {y} \right\| _1 } + \frac{\rho }{2}\left\| {x - y + u} \right\| _2^2 \end{aligned}$$
(7)

The \(\ell \)1 minimization problem for y can be solved in closed form with a shrinkage (soft-thresholding) operation:

$$\begin{aligned} y = \mathcal {S}_{{\lambda /\rho }} \left( {x + u} \right) \end{aligned}$$
(8)
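A sketch of this shrinkage step (assuming \(\mathcal {S}\) denotes the standard complex soft-thresholding operator, applied element-wise):

```python
import numpy as np

def shrink(v, kappa):
    """Element-wise soft-thresholding S_kappa(v): shrink each magnitude by kappa
    (to zero if it is smaller than kappa) while keeping the phase."""
    mag = np.abs(v)
    return np.maximum(1.0 - kappa / np.maximum(mag, 1e-12), 0.0) * v

# y-update of Eq. (8):  y = shrink(x + u, lam / rho)
```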

Update for u: The update rule for u can be defined as a fixed-point iteration on the difference between x and y (u converges when x and y converge to each other):

$$\begin{aligned} u = u + x - y \end{aligned}$$
(9)

Solve for d:

$$\begin{aligned} \mathop {\min }\limits _d \quad \frac{\alpha }{2}\left\| {\sum {d * x} - s} \right\| _2^2 + \frac{\sigma }{2}\left\| {d - g + h} \right\| _2^2 \end{aligned}$$
(10)

Similar to x, d can be solved in the Fourier domain:

$$\begin{aligned}&\mathop {\min }\limits _{d_f } \quad \frac{\alpha }{2}\left\| {\sum {d_f x_f } - s_f } \right\| _2^2 + \frac{\sigma }{2}\left\| {d_f - g_f + h_f } \right\| _2^2 \end{aligned}$$
(11)
$$\begin{aligned}&\left( {\alpha {X_f^H X_f } + \sigma I} \right) d_f = {X_f^H s_f } + \sigma \left( {g_f - h_f } \right) \end{aligned}$$
(12)

where \(X_f\) denotes the concatenation of the diagonalized matrices \(x_{fk}\): \(X_f = [diag(x_{f1}), ..., diag(x_{fK})]\), and \(X_f^H\) is the complex conjugate (Hermitian) transpose of \(X_f\).
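Equation (12) has exactly the same per-bin rank-one structure as (6), so the Sherman-Morrison sketch above can be reused with the roles of \(d_f\) and \(x_f\) swapped (a hypothetical call, using the illustrative helper defined earlier):

```python
# Filter update of Eq. (12): same solver, with x_f as the dictionary operand,
# sigma in place of rho, and the dual pair (g_f, h_f) in place of (y_f, u_f).
d_f = solve_rank_one(x_f, s_f, g_f, h_f, alpha, sigma)
```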

Solve for g:

$$\begin{aligned} \mathop {\min }\limits _g \quad \frac{\sigma }{2}\left\| {d - g + h} \right\| _2^2 \qquad s.t.: \qquad g=\mathcal {P}roj(d), \, \left\| g \right\| _2^2 \leqslant 1 \end{aligned}$$
(13)

g can be found by taking the inverse Fourier transform of \(d_f\), adding the dual variable h, and projecting the result: the elements outside the filter support of \(d_k\) are suppressed (set to zero), and the \(\ell \)2-norm is then constrained to at most unit length.
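A sketch of this projection for a single atom (assuming the padded filter volume stores the true \(P\times P\times P\) support at the array origin; this layout is an illustrative assumption):

```python
import numpy as np

def update_g(d_f, h, P):
    """g-update of Eq. (13): bring d back to the image domain, add the dual variable h,
    zero everything outside the P x P x P filter support, and keep the l2-norm <= 1."""
    v = np.fft.ifftn(d_f) + h                       # inverse 3D FFT of d_f, plus the dual
    g = np.zeros_like(v)
    g[:P, :P, :P] = v[:P, :P, :P]                   # suppress elements outside the support
    nrm = np.linalg.norm(g)
    return g / nrm if nrm > 1.0 else g              # project onto the unit l2 ball
```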

Update for h: Similar to u, the update rule for h can be defined as follows:

$$\begin{aligned} h = h + d - g \end{aligned}$$
(14)

Solve for s:

$$\begin{aligned} \mathop {\min }\limits _{s} \,\,\,\frac{\alpha }{2}\left\| s - \sum d * x \right\| _2^2 + \frac{\gamma }{2}\left\| {R_{} \mathcal {F}_2 s - m} \right\| _2^2 \end{aligned}$$
(15)

The objective function of (15) can be transformed into the 2D Fourier domain:

$$\begin{aligned} \mathop {\min }\limits _{s_{f_2} } \quad \frac{\alpha }{2}\left\| {s_{f_2} - \mathcal {F}_t^H \sum {d_f x_f } } \right\| _2^2 + \frac{\gamma }{2}\left\| {R s_{f_2} - m} \right\| _2^2 \end{aligned}$$
(16)

Since \(d_f\) and \(x_f\) were obtained in the 3D Fourier domain, we bring them into the same space by applying an inverse Fourier transform along the time axis, \(\mathcal {F}_t^H\). Then \(s_{f_2}\) can be found by solving the following linear system:

$$\begin{aligned} \left( {\gamma R_{}^H R + \alpha I} \right) s_{f_2} = \gamma R_{}^H m + \alpha \mathcal {F}_t^H \sum \limits _{} {d_f x_f } \end{aligned}$$
(17)
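If R is implemented as a binary k-space sampling mask (an assumption we make here; in that case \(R^H R\) is diagonal), the system (17) decouples into an element-wise division in 2D k-space. A sketch under that assumption:

```python
import numpy as np

def update_s_f2(d_f, x_f, mask, m, alpha, gamma):
    """s-update of Eq. (17), assuming R is a binary sampling mask so that R^H R is diagonal.
    d_f, x_f: (K, nx, ny, nt) arrays in the 3D Fourier domain;
    mask, m:  (nx, ny, nt) arrays in 2D k-space (m is zero where not sampled)."""
    synth_f2 = np.fft.ifft(np.sum(d_f * x_f, axis=0), axis=-1)   # F_t^H applied to sum_k d_f x_f
    return (gamma * mask * m + alpha * synth_f2) / (gamma * mask + alpha)
```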

Note that efficient solutions of (6), (12), and (17) can be obtained via the Sherman-Morrison formula for the resulting independent linear systems, as shown in [13]. Finally, after the iterations, s is obtained by applying the 2D inverse Fourier transform \(\mathcal {F}_2^H\) to \(s_{f_2}\).

Implementation Details: Since the above derivation consists only of Fourier transforms and element-wise operations, it maps well to data-parallel architectures such as GPUs. We implemented the proposed method in MATLAB with GPU support. We set \(\alpha = 1\), \(\gamma = 1\), \(\lambda = 0.1\), \(\rho = 10\), and \(\sigma = 10\), and iteratively refine the filter bank and the reconstruction until they converge.
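As an illustration of why these element-wise k-space operations map well to the GPU (a Python/CuPy sketch rather than our MATLAB implementation; shapes and values are placeholders, and a CUDA-capable GPU with CuPy installed is assumed):

```python
import numpy as np
import cupy as cp                                  # NumPy-compatible arrays on the GPU

rng = np.random.default_rng(0)
d_f = cp.asarray(rng.standard_normal((16, 64, 64, 8)) + 0j)   # 16 atoms in the 3D Fourier domain
x_f = cp.asarray(rng.standard_normal((16, 64, 64, 8)) + 0j)   # their sparse codes, same domain

synth = cp.sum(d_f * x_f, axis=0)                  # CSC synthesis: one multiply-add per voxel
s = cp.fft.ifftn(synth, axes=(0, 1, 2))            # 3D inverse FFT (cuFFT under the hood)
```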

3 Result

In order to assess the performance of the proposed method, we compared our algorithm with the state-of-the-art dictionary learning-based CS reconstruction of Caballero et al. [5] and the conventional CS reconstruction using wavelet and total variation energy of Quan et al. [10]. We used three cardiac MRI datasets from the Data Science Bowl [1]: 2-chamber view (2ch), 4-chamber view (4ch), and short-axis view (sax). Each dataset consists of 30 frames of \(256\times 256\) images across the cardiac cycle. In the experiments, we used 3D atoms of size \(9\times 9\times 9\), and the CS undersampling factor was set to \(\times 4\).
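For reference, retrospective \(\times \)4 undersampling of this kind can be simulated as below; the exact sampling pattern (a fully sampled low-frequency band plus random phase-encode lines) is an illustrative assumption, not necessarily the pattern used in our experiments:

```python
import numpy as np

def undersample_kspace(frames, accel=4, n_center=8, seed=0):
    """Retrospectively undersample a (nt, nx, ny) image stack along phase-encode lines,
    keeping a fully sampled center band plus random lines (roughly nx/accel lines per frame)."""
    rng = np.random.default_rng(seed)
    nt, nx, ny = frames.shape
    k_full = np.fft.fftshift(np.fft.fft2(frames, axes=(1, 2)), axes=(1, 2))
    mask = np.zeros((nt, nx, 1), dtype=bool)
    mask[:, nx // 2 - n_center // 2: nx // 2 + n_center // 2, :] = True   # low-frequency band
    for t in range(nt):
        picks = rng.choice(nx, size=max(nx // accel - n_center, 0), replace=False)
        mask[t, picks, 0] = True                                          # random extra lines
    return k_full * mask, mask
```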

Running Time Evaluation: To make a direct performance comparison between the proposed learning-based method and that of Caballero et al. [5], we measured the wall-clock running time of both methods on a PC equipped with an Intel i7 CPU, 16 GB of main memory, and an NVIDIA GeForce GTX Titan X GPU. Our prototype, including the GPU implementation, is written in MATLAB 2015b, and we used the author-provided MATLAB code for Caballero et al. [5]. As shown in Table 1, our CPU-based method is about 54\(\times \) to 73\(\times \) (roughly two orders of magnitude) faster than the state-of-the-art DL-based CS-MRI reconstruction method for 100 epochs (i.e., the number of learning iterations). In addition, our GPU implementation outperforms our CPU version by about 1.25\(\times \) to 3.82\(\times \), reducing the running time to a level closer to clinical applicability. We expect that the performance of our method can be improved further by using CUDA C/C++ instead of MATLAB.

Table 1. Reconstruction times of learning-based methods (100 epochs)

Quality Evaluation: Figure 2 visualizes the reconstruction error of each method with respect to the fully sampled reconstruction. As can be seen, our approach generates less error than the state-of-the-art method of [5] and the conventional CS reconstruction using wavelet and TV energy [10]. Glitches in their temporal profiles are clearly visible, since total variation along the time axis can smooth out temporal features that move quickly, especially near the heart boundary. In our case, the learned atoms are 3D with larger support, which helps capture temporal behavior even under fast motion and reduces errors in the reconstructed images. In addition, the shift-invariance of CSC yields more compact filters than the patch-based method.

Figure 3 shows the Peak Signal-to-Noise Ratios (PSNRs) measured between the CS reconstruction results and the fully sampled reconstruction. As shown in this figure, our method requires more iterations (epochs) to converge to a steady state, but its actual running time is much shorter than that of the other methods due to GPU acceleration. Moreover, our method reaches much higher PSNRs.
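For completeness, a sketch of the PSNR computation used in such comparisons, with the peak taken as the maximum magnitude of the fully sampled reference (the exact peak convention is an assumption):

```python
import numpy as np

def psnr(reference, reconstruction):
    """Peak signal-to-noise ratio (dB) of a reconstruction against a fully sampled reference."""
    mse = np.mean(np.abs(reference - reconstruction) ** 2)
    peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)
```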

Fig. 2. Error plots (red: high, blue: low) between the full reconstruction and the results from (a) the proposed method, (b) the DL-based method [5], and (c) the wavelet and total variation energy method [10]. (Color figure online)

Fig. 3. Convergence rate evaluation based on PSNRs.

4 Conclusion

In this paper, we introduced an efficient CS-MRI reconstruction method based purely on 3D convolutional sparse coding, in which shift-invariant 3D filters represent the temporal features of the MRI data. The proposed numerical solver is derived under the ADMM framework by leveraging the Fourier convolution theorem and can be effectively accelerated on GPUs. As a result, we achieved faster running times and higher PSNRs than state-of-the-art CS-MRI reconstruction methods, such as those using patch-based dictionary learning or conventional wavelet and total variation energy. In the future, we plan to conduct a properly controlled study of the tuning parameters and assess the method's feasibility in clinical applications.