1 Introduction

Magnetic Resonance Imaging (MRI) is a powerful tool for disease diagnosis and treatment. However, the clinical applications of MRI are limited by its intrinsically low imaging speed. On a clinical MRI scanner, data are typically collected below the Nyquist sampling frequency for imaging acceleration. As undersampling may introduce aliasing artifacts, image reconstruction is challenging in MRI.

Parallel imaging is a conventional approach for reconstructing images from undersampled data in MRI. This approach requires a radiofrequency coil array for collecting multi-channel images in parallel [2, 7, 9]. The coil sensitivity differences between channels may be used to remove aliasing artifacts in image reconstruction. Several parallel imaging techniques, including GRAPPA [2] and SENSE [9], have been commercialized and are being used as clinical standards. A drawback of these techniques is that coil sensitivity must be calibrated, and the calibration procedure may reduce the effective parallel imaging acceleration factor. Recently, compressed sensing has become an active area of research in MRI [6]. This approach relies on a sparsity constraint on MRI data and uses an iterative algorithm to reconstruct images. Although a calibration procedure is not needed, the computation is expensive. In addition, most compressed sensing techniques are sensitive to regularization parameters in image reconstruction, and their clinical performance is not as robust as that of parallel imaging.

Deep learning has made many breakthroughs in the field of computer vision and medical imaging. Recent years have seen a number of research works on its applications in MRI reconstruction. For example, a data-driven scheme based on a generative adversarial network (GAN) has been used in combination with compressed sensing [8]. This work, like many other deep learning studies, treats MRI reconstruction as a computer vision problem and removes aliasing from grayscale or RGB magnitude DICOM images generated by a series of pre- and post-processing procedures [5, 8, 10, 14]. However, an MRI scanner physically collects multi-channel complex data samples in Fourier space (termed k-space in the field of MRI) with both magnitude and phase information. By transforming complex k-space data into a grayscale DICOM image with pre- and post-processing, a large amount of information is lost, so the network sees only a fraction of the acquired data. As a result, image reconstruction may not take full advantage of the data-driven power of deep learning.

Our work aims to develop a deep generative model that can process multi-channel MRI raw data, so that image reconstruction may take advantage of all the information in MRI data. The work is inspired by a well-known fact in parallel imaging: multi-channel coil sensitivity introduces k-space data correlation, and every data sample may be represented by the convolution of its neighboring samples with a filter. This filter can be well modelled by convolutional neural networks in k-space. By training a GAN with multi-channel k-space raw data, our model may learn and use the parallel imaging mechanisms underlying MRI data to reconstruct images from undersampled data. In this paper, we present a multi-channel GAN model for MRI reconstruction from undersampled data. This model has the following features:

  1. The multi-channel GAN receives multi-channel complex undersampled MRI data at the input and generates multi-channel complex fully-sampled MRI data at the output.

  2. The model provides an “end-to-end” approach to MRI reconstruction, i.e., images can be generated directly from raw MRI data. No pre- or post-processing procedures are needed.

  3. A new loss function is introduced to combine adversarial and perceptual loss for improved artifact suppression in image reconstruction.

  4. The model is trained with multiple sets of multi-channel MRI data. In every training step, all the channels of the network are updated simultaneously with the multi-channel raw data.

  5. The trained model is used as a general reconstructor for all new datasets. This is different from conventional MRI methods that use a different reconstructor for different datasets.
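The k-space correlation that motivates this work follows from the convolution theorem: weighting an image by a coil sensitivity profile in image space is equivalent to convolving its k-space data with the spectrum of that profile. The following is a minimal 1D NumPy sketch of this fact, not part of the proposed model. The random test signal, the Gaussian-shaped sensitivity profile and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
pos = np.arange(n)

# Hypothetical 1D complex "image" and a smooth complex coil sensitivity profile.
image = rng.standard_normal(n) + 1j * rng.standard_normal(n)
sensitivity = np.exp(-((pos - 20) / 25.0) ** 2) * np.exp(1j * 0.05 * pos)

# k-space data seen by the coil: Fourier transform of the weighted image.
k_coil = np.fft.fft(sensitivity * image)

# The same data obtained purely in k-space, as a circular convolution of the
# image spectrum with the sensitivity spectrum (scaled by 1/n for NumPy's DFT).
k_image = np.fft.fft(image)
k_sens = np.fft.fft(sensitivity)
shift = (pos[:, None] - pos[None, :]) % n  # shift[k, m] = (k - m) mod n
k_conv = (k_sens[shift] @ k_image) / n

print(np.allclose(k_coil, k_conv))
```

Because a smooth sensitivity profile has a compact spectrum, each k-space sample depends mainly on a small neighborhood, which is the property a convolutional filter in k-space can exploit.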

Fig. 1. An overview of the proposed multi-channel GAN. Multi-channel k-space data are collected simultaneously from multiple coils and fed into the GAN to generate multi-channel outputs for MRI reconstruction.

2 Method

MRI raw data are collected in complex k-space (the Fourier spatial-frequency domain) from a multi-channel (typically 8–32 channels) radiofrequency coil array. If data are fully sampled at the Nyquist frequency, an inverse Fourier transform generates the images directly. To accelerate MRI, k-space is usually undersampled, which introduces aliasing artifacts in image space. Since multi-channel coil sensitivity introduces k-space data correlation, a parallel imaging technique (e.g., GRAPPA, SPIRiT) may estimate missing data by the linear convolution of partially collected k-space with a filter. In our work, a GAN-based network is used to model the filter used in parallel imaging for image reconstruction. In a GAN pipeline, two models are jointly trained: a generator model G, which captures the training data distribution, and a discriminator model D, which judges whether the generated data come from the distribution of the training data. Through the training process, the generator learns to estimate the underlying data manifold and to generate samples that fool the discriminator. After training, the generator alone can be used to generate new samples that are similar to real samples.
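The aliasing caused by uniform Cartesian undersampling can be illustrated with a short NumPy sketch. It assumes a single channel, a random complex test image, undersampling along the first axis, and an acceleration factor of 4 chosen so that it divides the matrix size; the function name `uniform_cartesian_mask` is hypothetical and the sketch is not the authors' data pipeline.

```python
import numpy as np

def uniform_cartesian_mask(shape, accel):
    """Keep every `accel`-th line along axis 0 and discard the rest."""
    mask = np.zeros(shape, dtype=bool)
    mask[::accel, :] = True
    return mask

rng = np.random.default_rng(0)
n, accel = 256, 4  # assumed sizes; accel divides n for a clean demonstration

# Hypothetical fully sampled complex image and its k-space data.
image = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
kspace_full = np.fft.fft2(image)

# Undersample and reconstruct by zero-filling followed by an inverse FFT.
mask = uniform_cartesian_mask(kspace_full.shape, accel)
kspace_under = mask * kspace_full
zero_filled = np.fft.ifft2(kspace_under)

# With uniform undersampling, the zero-filled image is the average of `accel`
# copies of the true image shifted by n / accel pixels along the same axis.
aliased = sum(np.roll(image, r * n // accel, axis=0) for r in range(accel)) / accel

print(np.allclose(zero_filled, aliased))
```

The overlapping shifted copies are the coherent aliasing artifacts that the reconstruction has to remove.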

Fig. 2. Multi-channel GAN structure for MRI reconstruction from undersampled k-space data.

Here a multi-channel GAN-based model (a multi-input and multi-output system) is trained to estimate complex-valued data in k-space. Given a set of data pairs, fully sampled k-space data y and undersampled zero-filled data \(x = M_Ry\), where \(M_R\) is an undersampling mask with an acceleration factor of R, a generator G is trained to generate fake samples from x, and these fake samples should be judged as real samples by the discriminator D. The adversarial loss of the generator is given by:

$$\begin{aligned} L_{adv} = \mathbb {E}_x (1-D(G(x)))^2 \end{aligned}$$
(1)

To minimize the difference between the generated data and ground truth data, a pixelwise loss is also introduced. Here an \(l_1\) cost term is used for more robust performance in noise and blurring suppression:

$$\begin{aligned} L_{l_1} = \mathbb {E}_{x,y} ||y-G(x)||_1 \end{aligned}$$
(2)

In addition, a data consistency loss is used to minimize the sum of squared difference between the acquired data samples and their estimates. This loss is formulated as:

$$\begin{aligned} L_{dc} = \mathbb {E}_{x,y} ||M_R y - M_R G(x)||^2 \end{aligned}$$
(3)

The terms above encourage the generator to generate an output that matches the ground truth at the k-space sampling positions. To suppress artifacts that cannot be quantified by Eqs. 1, 2 and 3, we introduce a perceptual loss based on high-level features extracted from pre-trained networks and combine the pixelwise loss with the perceptual loss for improved visual quality of the reconstructed image. Let \(\phi _j(\cdot )\) be the j-th layer output of a pre-trained network with shape \(C_j \times W_j \times H_j\). We can use \(\phi _j(\cdot )\) as a feature extractor which captures high-level image characteristics. The perceptual loss is formulated as:

$$\begin{aligned} L_{perc} = \mathbb {E}_{x,y}\frac{1}{C_j W_j H_j}||\phi _j(y)-\phi _j(G(x))||^2_2 \end{aligned}$$
(4)

By summing the adversarial loss, pixelwise loss, data consistency loss and perceptual loss together, the overall loss functions for generator and discriminator are formed as:

$$\begin{aligned} L_{gen}(G) = L_{adv} + \alpha L_{l_1} + \beta L_{dc} + \gamma L_{perc} \end{aligned}$$
(5)
$$\begin{aligned} L_{dis}(D) = \mathbb {E}_x (D(G(x)))^2 + \mathbb {E}_y (1-D(y))^2 \end{aligned}$$
(6)

The parameters \(\alpha \), \(\beta \) and \(\gamma \) are used to balance the adversarial loss, \(l_1\) loss, data consistency loss and perceptual loss. The generator and discriminator are trained with mini-batch stochastic gradient descent and back-propagation. The two sub-networks are updated alternately until convergence. The trained generator is then used to reconstruct images from new raw MRI data.
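As an illustration of how Eqs. 1 to 6 fit together, the following NumPy sketch evaluates each term on arrays with a leading batch axis. It is a sketch under stated assumptions rather than the training code used in this work: the discriminator scores and the pre-trained features are passed in as precomputed arrays, the function names are hypothetical, and the default weights of 1.0 are placeholders, since the values of \(\alpha \), \(\beta \) and \(\gamma \) are not specified here.

```python
import numpy as np

def _per_sample_sum(a):
    """Sum over all axes except the leading batch axis."""
    return a.reshape(a.shape[0], -1).sum(axis=1)

def adversarial_loss(d_fake):
    # Eq. 1: E_x (1 - D(G(x)))^2
    return np.mean((1.0 - d_fake) ** 2)

def l1_loss(y, g):
    # Eq. 2: E ||y - G(x)||_1 (complex magnitudes via np.abs)
    return np.mean(_per_sample_sum(np.abs(y - g)))

def data_consistency_loss(y, g, mask):
    # Eq. 3: E ||M_R y - M_R G(x)||^2, evaluated only at sampled positions
    return np.mean(_per_sample_sum(np.abs(mask * (y - g)) ** 2))

def perceptual_loss(feat_y, feat_g):
    # Eq. 4: features have shape (batch, C_j, W_j, H_j)
    c, w, h = feat_y.shape[1:]
    return np.mean(_per_sample_sum((feat_y - feat_g) ** 2) / (c * w * h))

def generator_loss(d_fake, y, g, mask, feat_y, feat_g,
                   alpha=1.0, beta=1.0, gamma=1.0):
    # Eq. 5; alpha, beta and gamma are placeholder weights (assumption).
    return (adversarial_loss(d_fake)
            + alpha * l1_loss(y, g)
            + beta * data_consistency_loss(y, g, mask)
            + gamma * perceptual_loss(feat_y, feat_g))

def discriminator_loss(d_fake, d_real):
    # Eq. 6: E_x D(G(x))^2 + E_y (1 - D(y))^2
    return np.mean(d_fake ** 2) + np.mean((1.0 - d_real) ** 2)
```

With a perfect generator output (g equal to y, identical features, and discriminator scores of 1 for generated data), every term of the generator loss vanishes, which is a convenient sanity check when implementing these terms in a deep learning framework.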

Generator Architecture. We adapt the basic architecture of the identity residual network with skip connections [3] for multi-channel MRI reconstruction. During training, multi-channel undersampled and fully sampled complex k-space data are fed into the network. Then 5 residual blocks are stacked sequentially, where each block has two convolutional layers and a skip connection from block input to output. Each convolutional layer in the residual block consists of 128 feature maps using \(3\times 3\) kernels and is followed by batch normalization and ReLU activation. The residual blocks are followed by three convolutional layers with kernel size \(1\times 1\). A VGG-16 network [11] pre-trained on ImageNet is used as the feature extractor, and the output of the relu2_2 layer is used as the perceptual feature.

Discriminator Architecture. A discriminator is connected to the generator output. The discriminator is a regular convolutional network which consists of 7 convolutional layers, each of which is followed by batch normalization and ReLU layers. We use 8, 16, 32, 64, 64, 64 feature maps for the first 6 layers, respectively. We use \(3\times 3\) kernels for the first 5 layers and \(1\times 1\) kernels for the last 2 layers. The discriminator output is a scalar between 0 and 1 measuring the estimated score of the “realness” of the generated data.

3 Experiment

3.1 Dataset and Training Details

To validate the proposed method, we perform several experiments with 170 2D multi-channel MRI images. These data are collected from the brain anatomy of different human subjects using an 8-channel coil array with T1/T2-weighted TSE sequences (axial resolution \(256\times 256\) and FOV \(240\times 240\) mm). The data are randomly split into a training set (127 images) and a test set (43 images). To simulate undersampled acquisition, we apply uniform Cartesian masks with various undersampling factors. An NVIDIA Tesla V100 GPU is used for training with a batch size of 4. The learning rate is initially set to \(1\times 10^{-5}\) and halved every 10,000 iterations. An Adam optimizer is used for optimization. The network is trained for 2,000 epochs, which takes about 1,500 min.
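The learning rate schedule described above can be written as a short function. This is a minimal sketch that assumes the rate is halved in discrete steps at every multiple of 10,000 iterations; the function name `learning_rate` and its parameter names are hypothetical.

```python
def learning_rate(iteration, base_lr=1e-5, halve_every=10_000):
    """Step schedule: start at base_lr and halve it every `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)

# Learning rate at a few selected iterations.
for it in (0, 9_999, 10_000, 25_000):
    print(it, learning_rate(it))
```

In a deep learning framework, the same behavior is usually obtained with a built-in step scheduler attached to the Adam optimizer.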

3.2 Results and Discussion

The proposed method is used to reconstruct images from undersampled data generated from the 43 test data samples, in comparison to GRAPPA [2], SPIRiT [7], compressed sensing (CS) [6] and GANCS [8]. The GRAPPA, SPIRiT and CS reconstructions use the Berkeley ESPIRiT [12] and BART [13] toolboxes. In GRAPPA and SPIRiT, additional fully-sampled \(18\times 18\) k-space areas in central k-space are used as autocalibration signals (ACS). In GANCS [8], the raw k-space data are transformed to image space and only the magnitude images are used. Figure 3 shows a few examples with an undersampling factor of 5 in reference to the ground truth images and the zero-filling (ZF) Fourier reconstruction results, which are used to show the spatial distribution of aliasing artifacts. Quantitative results including Structural SIMilarity (SSIM) and PSNR are given in Table 1. It is found that the parallel imaging methods (GRAPPA and SPIRiT) give higher PSNRs. However, they may generate blurred images with noticeable background noise, indicating that outer k-space is not reconstructed well. CS gives better image details with less noise, but at the cost of longer computation time (Table 1) and considerable artifacts near tissue boundaries. GANCS gives worse performance both qualitatively (Fig. 3) and quantitatively (Table 1). In comparison, the proposed method gives high-quality images with a low computation cost. It should be noted that this new method does not need calibration data, implying that the net undersampling factor is higher than that in parallel imaging.
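For reference, PSNR can be computed from a reconstructed magnitude image and its ground truth as sketched below. The sketch uses the standard definition, 10 log10(peak^2 / MSE), and assumes the peak is taken as the maximum of the reference image; the exact normalization used for Table 1 is not stated here, and the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(reference, reconstruction):
    """Peak signal-to-noise ratio in dB between two real-valued images."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    peak = reference.max()  # assumption: peak value taken from the reference
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a ramp image in [0, 1] with a constant error of 0.1.
reference = np.linspace(0.0, 1.0, 100).reshape(10, 10)
reconstruction = reference + 0.1
print(psnr(reference, reconstruction))
```

In the toy example, the mean squared error is 0.01 and the peak is 1, which corresponds to a PSNR of 20 dB.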

Fig. 3. Representative reconstructed images for two test samples with 5-fold uniform Cartesian undersampling. From left to right: zero-filling, GRAPPA, SPIRiT, CS, GANCS, our method and ground truth. Contrast and exposure of the images are adjusted for better visualization.

Table 1. Comparison of average PSNR (dB), SSIM and Reconstruction time (second) of different methods with 5-fold undersampling

It should be mentioned that the performance of GANCS shown in Fig. 3 is worse than that in the original paper [8]. This can be attributed to the following differences. First, in the previous study, GANCS is used to reconstruct images from radial data, whereas Cartesian data are used in this study. Compared with radial undersampling, Cartesian undersampling introduces more patterned aliasing artifacts, which cannot be effectively removed with compressed sensing. Second, the previous GANCS study uses a total of 45,300 training images, whereas this study uses only 127 training data samples. The significant reduction in training data size is likely a major factor that affects the reconstruction performance.

In this study, the proposed method is based on the same GAN structure as GANCS. However, a multi-channel architecture is used to process multi-channel complex k-space data. Because more information in MRI raw data is used, better performance can be achieved with less training data. This is an advantage of the proposed method. It should also be pointed out that the proposed method is practically useful in MRI. Most MRI protocols run with fixed parameters on a daily basis in clinical practice. Once the proposed deep learning network is trained with a certain amount of data collected from different patients using a fixed protocol, it can be directly used to reconstruct images for new patients scanned with the same protocol. This is more time-efficient than conventional parallel imaging, which always requires a calibration procedure with additional data acquisition in every scan.

Fig. 4. Average PSNR (dB) and SSIM with various undersampling factors R.

Fig. 5. Representative reconstructed images for two test samples with various undersampling factors R. From left to right: reconstructed images with \(R = 3, 4, 5, 6, 7, 8\) and ground truth images.

We also investigate the performance of the trained multi-channel GAN with different acceleration factors. As shown in Figs. 4 and 5, the reconstruction performance is not dramatically degraded until the undersampling factor is higher than 5. A previous study [15] has demonstrated that the standard 8-channel head coil used in this work has a maximal parallel imaging acceleration factor of 4 due to hardware limitations. This indicates that the proposed method can learn not only parallel imaging mechanisms but also useful k-space data features from MRI raw data, making it possible to accelerate MRI beyond the parallel imaging limit.

4 Conclusion

In this paper, we propose a multi-channel GAN model for parallel MRI reconstruction. Compared to other existing deep learning approaches, the proposed method directly uses multi-channel complex-valued k-space data. Instead of learning anatomical structure in image space, we reformulate MRI reconstruction as a data completion problem and learn the physical data relationship in k-space with a multi-channel GAN model. The experimental results demonstrate that the proposed method outperforms other state-of-the-art MRI reconstruction methods for imaging acceleration.