PET image denoising using unsupervised deep learning
Image quality of positron emission tomography (PET) is limited by various physical degradation factors. Our study aims to perform PET image denoising by utilizing prior information from the same patient. The proposed method is based on unsupervised deep learning, where no training pairs are needed.
In this method, the prior high-quality image from the patient was employed as the network input, and the noisy PET image itself was treated as the training label. Constrained by the network structure and the prior image input, the network was trained to learn the intrinsic structure information from the noisy image and to output a restored PET image. To validate the performance of the proposed method, a computer simulation study based on the BrainWeb phantom was first performed. A 68Ga-PRGD2 PET/CT dataset containing 10 patients and an 18F-FDG PET/MR dataset containing 30 patients were then used for clinical evaluation. The Gaussian filter, non-local mean (NLM) filtering using the CT/MR image as a prior, BM4D, and Deep Decoder methods were included as reference methods. Contrast-to-noise ratio (CNR) improvements were used to rank the methods based on the Wilcoxon signed-rank test.
For the simulation study, contrast recovery coefficient (CRC) vs. standard deviation (STD) curves showed that the proposed method achieved the best bias-variance tradeoff. For the clinical PET/CT dataset, the proposed method achieved the highest CNR improvement ratio (53.35% ± 21.78%), compared with the Gaussian (12.64% ± 6.15%, P = 0.002), NLM guided by CT (24.35% ± 16.30%, P = 0.002), BM4D (38.31% ± 20.26%, P = 0.002), and Deep Decoder (41.67% ± 22.28%, P = 0.002) methods. For the clinical PET/MR dataset, the CNR improvement ratio of the proposed method reached 46.80% ± 25.23%, higher than the Gaussian (18.16% ± 10.02%, P < 0.0001), NLM guided by MR (25.36% ± 19.48%, P < 0.0001), BM4D (37.02% ± 21.38%, P < 0.0001), and Deep Decoder (30.03% ± 20.64%, P < 0.0001) methods. Restored images for all the datasets demonstrate that the proposed method can effectively suppress noise while recovering image details.
The proposed unsupervised deep learning framework provides excellent image restoration, outperforming the Gaussian, NLM, BM4D, and Deep Decoder methods.
Keywords: Positron emission tomography · Denoising · Deep neural network · Unsupervised deep learning · Anatomical prior
Positron emission tomography (PET) is a powerful functional imaging modality that can detect molecular-level activity in tissue via specific tracers. It has wide applications in oncology [1, 2], cardiology [3], and neurology [4, 5], but still suffers from a low signal-to-noise ratio (SNR), which affects its detection and quantification accuracy, especially for small structures.
The noise in PET images is caused by the low coincident-photon counts detected during a given scan time and by various physical degradation factors. In addition, for longitudinal studies or scans of pediatric populations, it is desirable to reduce the dose level of PET scans, which further increases the noise level. Clinically, the Gaussian filter is routinely used for PET image denoising. However, it can smooth out important image structures during the denoising process. Other post-filtering approaches, such as adaptive diffusion filtering [6], non-local mean (NLM) filtering [7], wavelet denoising [8, 9], and HYPR processing [10], were subsequently proposed to reduce image noise while preserving structural details. As the image restoration process is ill-conditioned due to the limited information available from the noisy PET image itself, another widely adopted strategy for PET image denoising is to incorporate high-resolution anatomical priors, such as the patient's own MR or CT images, as additional regularization. One intuitive approach is to extract information from segmented prior images, assuming homogeneous tracer uptake within each segmented region [11, 12, 13]. Techniques not requiring segmentation were also developed to leverage the high-quality priors directly: Bowsher et al. [14] encouraged smoothness among nearby voxels that have similar signal in the corresponding anatomical images; Chan et al. [15] embedded CT information for PET denoising using a non-local mean (NLM) filter; Yan et al. [16] proposed an MR-based guided filtering method [17]; mutual information (MI) and joint entropy (JE) were also proposed to extract information from anatomical images [18, 19, 20, 21].
Over the past several years, deep neural networks (DNNs) have been widely and successfully applied to computer vision tasks such as image segmentation and object detection, demonstrating better performance than previous state-of-the-art methods when large training datasets are available. Recently, in the medical imaging field, with the help of DNNs, details of low-resolution images can be restored by employing high-resolution images as training labels [22, 23, 24, 25]. Furthermore, by utilizing co-registered MR images as additional network inputs, anatomical information can help synthesize high-quality PET images [26, 27]. One challenge for these DNN-based methods is that large paired training datasets are needed, which is not always feasible in clinical practice, especially for pilot clinical trials. Acquiring high-quality PET images as labels requires a longer scanning time or a higher injected dose, which does not fall within clinical routine and may raise additional safety concerns. In addition, the substantial effort required to collect and process the data is a further obstacle.
In this paper, we explore the possibility of utilizing anatomical information to perform PET denoising with a DNN through an unsupervised learning approach. Recently, Ulyanov et al. [28] proposed the deep image prior framework, which shows that DNNs can learn intrinsic structures from corrupted images without pre-training. No training pairs are needed, and random noise can be employed as the network input to generate clean images. Inspired by this work, we propose a conditional deep image prior framework for PET denoising. In this framework, CT/MR images from the same patient are employed as the network input, and the final corrected images are represented by the network output. The original noisy PET images, instead of high-quality PET images, are treated as training labels. A modified 3D U-Net [30] was adopted as the network structure, and L-BFGS was chosen as the optimization algorithm for its monotonic property and the better performance observed in our experiments.
Currently, CT/MR images of the same patient are readily available from PET/CT or PET/MR scans, so the proposed method can be easily applied for PET denoising. Contributions of this work include two aspects: (1) anatomical prior images are used as the network input to perform PET denoising, and no pre-training or training datasets are needed; (2) this is an unsupervised deep learning method which does not require any high-quality images as training labels.
Materials and methods
Conditional deep image prior
Conditional generative adversarial network (GAN) studies have shown that prediction results can be improved by using associated priors, instead of random noise, as the network input. Inspired by this, a conditional deep image prior method is proposed in this work to perform PET denoising, where the CT/MR images of the same patient are employed as the network input. To demonstrate the benefit of employing the prior image as the network input, a comparison between using random noise and using the same patient's MR prior image as the network input was performed; the results are shown in supplementary Fig. 1. With the MR prior image as the network input, more cortical details can be recovered and the noise in the white matter is much reduced. A minimal sketch of this training scheme is given below.
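The sketch is written in PyTorch for brevity (the paper's implementation used TensorFlow 1.4 with a modified 3D U-Net; the small 3D CNN, tensor shapes, and iteration counts below are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class TinyCNN3D(nn.Module):
    """Simplified stand-in for the modified 3D U-Net used in the paper."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def denoise(prior_vol, noisy_pet, iters=700):
    """prior_vol, noisy_pet: tensors of shape (1, 1, D, H, W).

    The anatomical prior is the fixed network input and the noisy PET
    volume is the training label; the denoised image is simply the
    network output after training."""
    model = TinyCNN3D()
    loss_fn = nn.MSELoss()
    # L-BFGS matches the optimizer chosen in the paper.
    opt = torch.optim.LBFGS(model.parameters(), max_iter=iters)

    def closure():
        opt.zero_grad()
        loss = loss_fn(model(prior_vol), noisy_pet)  # label = noisy PET itself
        loss.backward()
        return loss

    opt.step(closure)
    with torch.no_grad():
        return model(prior_vol)

# Toy usage with random volumes standing in for the MR prior and noisy PET:
prior = torch.randn(1, 1, 32, 32, 32)
noisy = torch.randn(1, 1, 32, 32, 32)
restored = denoise(prior, noisy, iters=50)
```

Because the label is the noisy image itself, stopping the optimization early (here, capping the number of L-BFGS iterations) is what prevents the network from eventually fitting the noise.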
To validate the proposed method, a computer simulation study based on the BrainWeb phantom (matrix size, 125 × 125 × 105; voxel dimensions, 2 × 2 × 2 mm3) [35] was first performed. The bias-variance tradeoff can be characterized in this simulation study because the ground truth is known and multiple independent and identically distributed (i.i.d.) realizations can be simulated. The simulated geometry is based on the Siemens mCT scanner. The sinogram data was generated from the last 5-min frame of a 1-h 18F-FDG scan with 1 mCi dose injection, assuming the count number in each line of response (LOR) follows a Poisson distribution. Random events and attenuation effects were modeled during the simulation; object-dependent scatter was not. The PET images were reconstructed using the maximum likelihood expectation maximization (MLEM) algorithm run for 40 iterations (a toy illustration of the MLEM update is sketched below). The corresponding T1-weighted MR image was employed as the prior image.
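The toy example below reconstructs a small image from Poisson counts with the multiplicative MLEM update; the random system matrix is only a stand-in for a real PET model encoding the scanner geometry, attenuation, and normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_lors = 64, 256
A = rng.random((n_lors, n_voxels))        # toy system matrix (LORs x voxels)
x_true = 10.0 * rng.random(n_voxels)      # "true" activity image
y = rng.poisson(A @ x_true)               # Poisson counts in each LOR

sens = A.T @ np.ones(n_lors)              # sensitivity image, A^T 1
x = np.ones(n_voxels)                     # uniform initial estimate
for _ in range(40):                       # 40 iterations, as in the simulation
    ratio = y / np.maximum(A @ x, 1e-12)  # measured / expected counts
    x *= (A.T @ ratio) / sens             # multiplicative MLEM update
```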
Two groups of real datasets acquired with different modalities and different tracers were used to evaluate the performance of the proposed method. One is a PET/CT dataset of ten lung cancer patients (8 men and 2 women). The patient information is listed in supplementary Table 1. The average patient age is 59.4 ± 10.9 years (range, 43–82 years), the average weight is 69.9 ± 13.5 kg (range, 41–84 kg), and the nominal injected dose of 68Ga-PRGD2 is 370 MBq. All patients were scanned with a Biograph 128 mCT PET/CT system (Siemens Medical Solutions, Erlangen, Germany). A low-dose CT scan (140 kV; 35 mA; pitch, 1:1; layer spacing, 3 mm; matrix, 512 × 512; voxel size, 1.52 × 1.52 × 3 mm3; FOV, 70 cm) was performed for attenuation correction. PET images (matrix size, 200 × 200 × 243; voxel dimensions, 4.0728 × 4.0728 × 3 mm3) were acquired at 60 min post injection and reconstructed using three-dimensional ordered subset expectation maximization (3D-OSEM) with 3 iterations and 21 subsets.
The other is a PET/MR dataset containing 30 patients (21 men and 9 women) with different tumor types. Patient details are shown in supplementary Table 2. The average patient age is 55.2 ± 7.7 years (range, 38–74 years), the average weight is 66.8 ± 9.9 kg (range, 45–85 kg), and the average administered dose of 18F-FDG is 350.7 ± 54.7 MBq (range, 239.8–462.9 MBq). All patients were scanned on a Biograph mMR PET/MR system (Siemens Medical Solutions, Erlangen, Germany). T1-weighted images (repetition time, 3.47 ms; echo time, 1.32 ms; flip angle, 9°; acquisition time, 19.5 s; matrix size, 260 × 320 × 256; voxel dimensions, 1.1875 × 1.1875 × 3 mm3) were acquired simultaneously. PET images (matrix size, 172 × 172 × 418; voxel dimensions, 4.1725 × 4.1725 × 2.0313 mm3) were acquired at 60 min post injection and reconstructed using 3D-OSEM.
The Gaussian filtering, NLM filtering guided by CT/MR images, BM4D, and Deep Decoder [37] methods were employed as reference methods. To evaluate the performance of the different methods quantitatively on the simulation data, curves of the contrast recovery coefficient (CRC) between the gray matter and white matter regions vs. the standard deviation (STD) calculated in the white matter region were plotted to evaluate the bias-variance tradeoff. Ten regions of interest (ROIs) were drawn on the gray matter region and thirty background ROIs were chosen on the white matter region. Thirty realizations were simulated and reconstructed to generate the CRC vs. STD curves; a sketch of how these quantities can be computed follows.
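This sketch follows the standard CRC/STD definitions used in the PET literature rather than code from the paper, and the array layout is an assumption:

```python
import numpy as np

def crc(gm_means, wm_means, gm_true, wm_true):
    """Contrast recovery coefficient, averaged over realizations.

    gm_means: (R, 10) means of the gray-matter ROIs per realization
    wm_means: (R, 30) means of the white-matter background ROIs
    gm_true, wm_true: known phantom activities in gray/white matter
    """
    a = gm_means.mean(axis=1)  # mean gray-matter uptake per realization
    b = wm_means.mean(axis=1)  # mean white-matter uptake per realization
    return ((a / b - 1.0) / (gm_true / wm_true - 1.0)).mean()

def std_background(wm_means):
    """Relative standard deviation across the R realizations,
    averaged over the background ROIs."""
    return (wm_means.std(axis=0, ddof=1) / wm_means.mean(axis=0)).mean()
```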
The Wilcoxon signed-rank test was performed on the CNR improvement ratios to compare the performance of the different methods, as sketched below. A P value of less than 0.05 was considered to indicate statistical significance.
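For illustration, the paired comparison can be run with SciPy as follows; the improvement ratios here are placeholders, not the values reported in the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

# Per-patient CNR improvement ratios (placeholder values, n = 10 patients)
proposed = np.array([0.55, 0.48, 0.61, 0.39, 0.52, 0.70, 0.44, 0.58, 0.49, 0.57])
gaussian = np.array([0.12, 0.10, 0.15, 0.08, 0.13, 0.19, 0.09, 0.14, 0.11, 0.15])

stat, p = wilcoxon(proposed, gaussian)  # paired, non-parametric test
print(f"Wilcoxon statistic = {stat}, P = {p:.4f}")  # significant if P < 0.05
```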
The parameters of the Gaussian filter (FWHM), NLM guided by CT/MR images (window size), BM4D (standard deviation of the noise), Deep Decoder (number of training epochs), and the proposed method (number of training epochs) were first tuned on one patient in each dataset (evolving curves shown in supplementary Fig. 4). Considering that PET images in the same dataset have similar structures, the optimal parameters that achieved the highest CNR for each method were then fixed when processing the remaining patient data. Hence, the CNR value also served as the stopping criterion for network training in the proposed method and the Deep Decoder method: the epoch number that led to the highest CNR was chosen as the optimal epoch number (a sketch of this selection rule follows). Based on supplementary Fig. 4, for the PET/CT dataset, the Gaussian filter with FWHM equal to 2.4 pixels, the NLM filter with window size 5 × 5 × 5, the BM4D filter with 10% noise standard deviation, the Deep Decoder method with 1800 training epochs, and the proposed method trained for 900 epochs were employed in the denoising process. For the PET/MR dataset, the Gaussian filter with FWHM equal to 1.6 pixels, the NLM filter with window size 5 × 5 × 5, the BM4D method with 8% noise standard deviation, the Deep Decoder with 2000 epochs, and the proposed method trained for 700 epochs were employed.
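The CNR-based epoch selection can be written as follows; the CNR definition matches the contrast (m_lesion − m_ref) and noise (SD_ref) quantities used in the evaluation, while the snapshot bookkeeping is an assumed detail:

```python
import numpy as np

def cnr(img, lesion_mask, ref_mask):
    """Contrast-to-noise ratio: (m_lesion - m_ref) / SD_ref."""
    m_lesion = img[lesion_mask].mean()
    m_ref = img[ref_mask].mean()
    return (m_lesion - m_ref) / img[ref_mask].std()

def pick_best_epoch(snapshots, lesion_mask, ref_mask):
    """snapshots: list of network outputs saved after each training epoch.

    Returns the epoch index (and CNR) at which CNR peaks; that epoch
    count is then reused for the remaining patients in the same dataset."""
    scores = [cnr(s, lesion_mask, ref_mask) for s in snapshots]
    best = int(np.argmax(scores))
    return best, scores[best]
```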
All network training was performed on an NVIDIA 1080 Ti graphics card using the TensorFlow 1.4 platform. For the simulation dataset, run for 200 epochs, the network training time of the proposed method was around 5 min. For the PET/CT dataset run for 900 epochs and the PET/MR dataset run for 700 epochs, the network training time of the proposed method was around 40 min in both cases.
The plot of the contrast (m_lesion − m_ref) vs. the noise inside the reference ROIs (SD_ref) for the different methods with varying parameters (supplementary Fig. 4) shows that the proposed method can maintain high contrast within the tumor region while achieving low noise in the reference region. Compared with the proposed method, the NLM method could not preserve as high a contrast at the same noise level, and the Gaussian method showed higher noise at the same contrast level. From Fig. 9, we can see that there is no significant difference between the Gaussian method and the MR-guided NLM method for the lung tumor. One explanation may be that the T1-weighted image contains few details in the lung region. However, the proposed method using the MR prior can still achieve a significantly higher CNR improvement ratio than the Gaussian and NLM methods for the lung tumor case, which demonstrates that the proposed method can make use of priors more efficiently than the NLM method.
[Table: correlations of CNR values and CNR improvement ratios with different tumor features for all scans of the PET/CT and PET/MR datasets. The extracted correlation coefficients are 0.8949 (P < 0.05), 0.8192 (P < 0.05), 0.8483 (P < 0.0001), 0.8508 (P < 0.0001), and 0.6475 (P < 0.0001); the corresponding row and column labels are not recoverable.]
In this work, we proposed an unsupervised deep learning method for PET denoising, where the patient’s prior image was employed as the network input and the original noisy PET image was treated as the training label. Evaluations based on simulation datasets as well as PET/CT and PET/MR datasets demonstrate the effectiveness of the proposed denoising method over the Gaussian, anatomically guided NLM, BM4D, and Deep Decoder methods. Future work will focus on further clinical evaluations with various tumor types as well as the detailed effects of misregistration on the proposed method.
This work was supported by the National Institutes of Health under grants 1RF1AG052653-01A1, 1P41EB022544-01A1, and NIH C06 CA059267, by the National Natural Science Foundation of China (No. U1809204, 61525106, 61427807, 61701436), by the National Key Technology Research and Development Program of China (No. 2017YFE0104000, 2016YFC1300302), and by Shenzhen Innovation Funding (No. JCYJ20170818164343304, JCYJ20170816172431715). Jianan Cui is a PhD student at Zhejiang University and was supported by the China Scholarship Council for a 2-year study period at Massachusetts General Hospital.
Compliance with ethical standards
Conflict of interest
Author Quanzheng Li has received research support from Siemens Medical Solutions. Other authors declare that they have no conflict of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
- 2.Beyer T, Townsend DW, Brun T, Kinahan PE, Charron M, Roddy R, et al. A combined PET/CT scanner for clinical oncology. J Nucl Med. 2000;41:1369–79.
- 3.Schwaiger M, Ziegler S, Nekolla SG. PET/CT: challenge for nuclear cardiology. J Nucl Med. 2005;46:1664–78.
- 7.Dutta J, Leahy RM, Li Q. Non-local means denoising of dynamic PET images. PLoS One. 2013;8:e81390. https://doi.org/10.1371/journal.pone.0081390.
- 14.Bowsher JE, Yuan H, Hedlund LW, Turkington TG, Akabani G, Badea A et al. Utilizing MRI information to estimate F18-FDG distributions in rat flank tumors. IEEE Symp Conf Rec Nucl Sci 2004. IEEE; 2004. p. 2488–92. https://doi.org/10.1109/nssmic.2004.1462760.
- 20.Nuyts J. The use of mutual information and joint entropy for anatomical priors in emission tomography. 2007 IEEE Nucl Sci Symp Conf Rec. IEEE; 2007. p. 4149–54. https://doi.org/10.1109/nssmic.2007.4437034.
- 21.Song T, Yang F, Chowdhury SR, Kim K, Johnson KA, El Fakhri G, et al. PET image deblurring and super-resolution with an MR-based joint entropy prior. IEEE Trans Comput Imaging. 2019. https://doi.org/10.1109/tci.2019.2913287.
- 22.Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, et al. Accelerating magnetic resonance imaging via deep learning. 2016 IEEE 13th Int Symp Biomed Imaging. IEEE; 2016. p. 514–7. https://doi.org/10.1109/isbi.2016.7493320.
- 24.Wu D, Kim K, El Fakhri G, Li Q. A cascaded convolutional neural network for x-ray low-dose CT image denoising. arXiv preprint; 2017.
- 25.Gong K, Guan J, Kim K, Zhang X, Yang J, Seo Y, et al. Iterative PET image reconstruction using convolutional neural network representation. IEEE Trans Med Imaging. 2018:1–8. https://doi.org/10.1109/tmi.2018.2869871.
- 27.Xiang L, Qiao Y, Nie D, An L, Wang Q, Shen D. Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI. Neurocomputing. 2017;267:406–16. https://doi.org/10.1016/j.neucom.2017.06.048.
- 28.Ulyanov D, Vedaldi A, Lempitsky V. Deep image prior. 2018 IEEE/CVF Conf Comput Vis Pattern Recognit. IEEE; 2018. p. 9446–54. https://doi.org/10.1109/cvpr.2018.00984.
- 30.Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics). 2016. pp. 424–32. https://doi.org/10.1007/978-3-319-46723-8_49.
- 31.Gong K, Kim K, Cui J, Guo N, Catana C, Qi J, et al. Learning personalized representation for inverse problems in medical imaging using deep neural network. 2018. Available from: http://arxiv.org/abs/1807.01759.
- 33.Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014. Available from: http://arxiv.org/abs/1412.6980.
- 34.Nesterov Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Dokl AN USSR. 1983;269:543–7.
- 35.Cocosco CA, Kollokian V, Kwan RK-S, Pike GB, Evans AC. BrainWeb: online interface to a 3D MRI simulated brain database. NeuroImage. 1997;5:S425.
- 37.Heckel R, Hand P. Deep decoder: concise image representations from untrained non-convolutional networks. Int Conf Learn Represent (ICLR); 2019.