1 Introduction

Diffusion magnetic resonance imaging (dMRI) is a noninvasive imaging technique that allows probing the microstructure of the brain. Recent advances in parallel imaging techniques and accelerated acquisitions have greatly reduced the inherently long scan time in dMRI. While it is known that the noise distribution found in magnitude dMRI data depends on the reconstruction algorithm used [1] and the number of channels in the receiver coils [2, 3], noise correlation effects in adjacent channels change the noise distribution from its theoretical formulation. The assumption of a Rician or, more generally, a noncentral chi distribution with degrees of freedom equal to the number of receiver coils no longer holds exactly because of these effects and of other filtering applied by the scanner. The resulting distribution usually exhibits a lower number of degrees of freedom N than the number of receiver coils and a higher noise variance \(\sigma _g^2\) depending on the spatial location [2, 4]. Correcting deviations from the theoretical noise distributions is challenging and often requires coil correlation maps or information about the complex signal combination process, which is not readily available on most scanners. While some recent algorithms for dMRI include information about the noise distribution [5, 6], there is, to the best of our knowledge, no method providing a fully automatic way to characterize the noise distribution using only information from the magnitude data itself. Because of this gap between the physical acquisition process and noise estimation theory, existing methods assume either a Rician or a noncentral chi distribution with N already known and concentrate on estimating the noise standard deviation \(\sigma _g\) [7,8,9]. We propose to estimate both \(\sigma _g\) and N from the magnitude data only by using a change of variable to a gamma distribution \(\varGamma (N, 1)\) [8], whose first moments directly depend on N. 
This makes the proposed method fast and easy to apply on existing data without additional information, while being robust to artifacts by only considering voxels adhering to the created gamma distribution.

2 Theory and Methods

Signal Distributions in Parallel MRI. To account for uncertainty in the acquisition process, the complex signal measured in k-space by the receiver coil can be modeled with separate additive zero mean Gaussian noise for each channel, assumed to have identical variance \(\sigma ^2_g\). When converted to the commonly used magnitude images, the resulting noise distribution follows a Rician or noncentral chi distribution, whose parameters depend on the employed reconstruction algorithm [2]. When signal correlations introduced by parallel imaging techniques are taken into account, the noncentral chi distribution remains valid, but with spatially varying parameters [4].

Parameter Estimation Using the Method of Moments and Maximum Likelihood. When the underlying signal intensity \(\eta \) is zero, the distribution of the magnitude signal m reduces to a Rayleigh distribution or, in the general case, to a chi distribution. The pdf of magnitude noise over zero signal is given by \(pdf(m | \eta = 0, \sigma _g, N) = (m^{2N-1}) / (2^{N-1}\sigma ^{2N}_g \varGamma (N)) \exp {\left( -m^2 / (2\sigma _g^2)\right) } dm\) where \(\varGamma (x)\) is the gamma function. With the change of variable \(t = m^2 / (2\sigma ^2_g)\), the pdf can be rewritten as a gamma distribution \(\varGamma (N, 1)\) [8]. The pdf of the gamma distribution \(\varGamma (\alpha , \beta )\) is defined as \(pdf(t|\alpha ,\beta ) = 1 / (\varGamma (\alpha ) \beta ^\alpha ) t^{\alpha -1} \exp {(-t / \beta )} dt\) and has theoretical mean \(\mu _{gamma} = \alpha \beta \) and variance \(\sigma ^2_{gamma} = \alpha \beta ^2\). For a gamma distribution \(\varGamma (N, 1)\), the mean and the variance are therefore both equal to N. Another useful identity is that the sum of independent gamma distributed variables sharing the same \(\beta \) is again gamma distributed: if \(t_i \thicksim \varGamma (\alpha _i, \beta )\), then \(\sum _{i=1}^K t_i \thicksim \varGamma (\sum _{i=1}^K\alpha _i, \beta )\). We can therefore estimate the Gaussian noise standard deviation \(\sigma _g\) and the number of degrees of freedom N from the moments of the magnitude images themselves in regions where no signal from the imaged object is present. Any method suitable for computing \(\sigma _g\) can be used, or it can be estimated from the moments with the relationship

$$\begin{aligned} \sigma _g = \frac{1}{\sqrt{2}} \sqrt{\frac{\sum _{k=1}^K m^4_k}{\sum _{k=1}^K m^2_k} - \frac{1}{K}\sum _{k=1}^K m^2_k} \end{aligned} \tag{1}$$

where \(m_k\) is the magnitude signal for voxel k and K is the number of identified noise only voxels. Once \(\sigma _g\) is known, N can be estimated from the moments with

$$\begin{aligned} N = \frac{1}{K}\sum _{k=1}^K t_k = \frac{1}{2K\sigma ^2_g}\sum _{k=1}^K m^2_k \end{aligned} \tag{2}$$

where \(t_k = m_k^2 / (2\sigma _g^2)\) is the change of variable for voxel k. Estimation based on the method of maximum likelihood yields two equations for estimating \(\alpha \) and \(\beta \). Rearranging the equations for a gamma distribution \(\varGamma (N, 1)\) [10] will give the same expression as Eq. (1) and a second implicit equation for N that is given by

$$\begin{aligned} \psi (N) = \frac{1}{K}\sum _{k=1}^K \log (m^2_k / 2\sigma _g^2) \end{aligned} \tag{3}$$

where \(\psi (x)\) is the digamma function, which can be numerically inverted using Newton’s method to obtain N.
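The estimators of Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the reference implementation: the function names, the Newton starting point, the tolerance and the step-halving safeguard that keeps N positive are all assumptions made here, and the toy noise sample is simulated as the root sum of squares of 2N real Gaussian components.

```python
import numpy as np
from scipy.special import digamma, polygamma


def sigma_from_moments(m):
    """Eq. (1): Gaussian noise standard deviation from noise-only magnitudes."""
    m2 = np.asarray(m, dtype=float) ** 2
    return np.sqrt((np.sum(m2 ** 2) / np.sum(m2) - np.mean(m2)) / 2.0)


def n_from_moments(m, sigma_g):
    """Eq. (2): N is the sample mean of t = m^2 / (2 sigma_g^2)."""
    m2 = np.asarray(m, dtype=float) ** 2
    return np.mean(m2) / (2.0 * sigma_g ** 2)


def n_from_maximum_likelihood(m, sigma_g, n_init=1.0, tol=1e-10, max_iter=100):
    """Eq. (3): solve digamma(N) = mean(log t) for N with Newton's method.

    Assumes strictly positive magnitudes so that log t is finite.
    """
    m2 = np.asarray(m, dtype=float) ** 2
    target = np.mean(np.log(m2 / (2.0 * sigma_g ** 2)))
    n = float(n_init)
    for _ in range(max_iter):
        # The derivative of digamma is the trigamma function polygamma(1, .).
        step = float((digamma(n) - target) / polygamma(1, n))
        # Safeguard (an assumption of this sketch): halve the step if it
        # would leave the domain N > 0.
        while n - step <= 0:
            step /= 2.0
        n -= step
        if abs(step) < tol:
            break
    return n


# Toy check on simulated chi noise (hypothetical values).
rng = np.random.default_rng(0)
true_sigma, true_n = 10.0, 4
m = np.sqrt(np.sum(rng.normal(0.0, true_sigma, (100_000, 2 * true_n)) ** 2, axis=1))
sigma_hat = sigma_from_moments(m)
n_mom = n_from_moments(m, sigma_hat)
n_ml = n_from_maximum_likelihood(m, sigma_hat)
```

On such a sample, both estimates of N should land close to the simulated value, which mirrors the observation that Eq. (2) and Eq. (3) give similar results.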

Estimating \(\sigma _g\) and N. For simplicity, we assume that each 2D slice with the same spatial location belongs to the same distribution throughout each 3D volume. This practical assumption allows selecting a large number of noise only voxels for computing statistics as well as discarding acquisition artifacts such as ghosting. Following a methodology similar to [8], it is possible to identify voxels belonging to the gamma distribution by checking if they fall inside a predefined probability threshold of the inverse cumulative distribution function (cdf). Taking the sum of all MRI volumes can therefore be used to separate the background signal belonging to the gamma distribution \(\varGamma (KN, 1)\), with K here being the number of summed volumes, from the rest of the volume with a rejection step using the inverse cdf. In the particular case \(\varGamma (\alpha , 1)\) at a probability level p, the inverse cdf is \(icdf(\alpha , p) = P^{-1}(\alpha , p)\) where \(P^{-1}\) is the inverse of the regularized lower incomplete gamma function. For the first iteration, initial bounds are set on the values of N and \(\sigma _g\) as they are unknown. We set a lower bound \(N_{min} = 1\) and an upper bound \(N_{max} = 12\) for the first iteration, noting that [2] reported values of N between 3 and 12 for a 32-channel receiver coil. Similar to [8], an upper bound of \(\sigma _g\) is given by \(\sigma _{g_{max}} = median / \sqrt{2\, icdf(N_{max}, 1/2)}\) where median is the median of the whole 4D dMRI dataset. From this upper bound \(\sigma _{g_{max}}\), a search interval of a values is created, where we chose \(a = 50\) as in [8]. Each point of the interval \(\varPhi = [1 \sigma _{g_{max}} / a, 2 \sigma _{g_{max}} / a, \cdots , a \sigma _{g_{max}} / a]\) is used as an initial value of \(\sigma _g\) in the change of variable \(t = m^2 / (2\sigma _g^2)\). With these initial values, an iterative search for \(\sigma _g\) and N is made as follows. 
The value of \(\varPhi \) which identifies the largest number of voxels between the lower bound given by \(\lambda _{-} = icdf(KN_{min}, p/2)\) and the upper bound given by \(\lambda _{+} = icdf(KN_{max}, 1-p/2)\) is accepted as \(\sigma _g\). From those voxels, new values of \(\sigma _g\) are computed with Eq. (1) and N with Eq. (2) or Eq. (3). For the next iteration, we set \(\varPhi = [0.95 \sigma _{g}, 0.96 \sigma _{g}, \cdots , 1.05 \sigma _{g}]\) and recompute the icdf bounds \(\lambda _{-}, \lambda _{+}\) with the new value of N. Voxels between \(\lambda _{-}\) and \(\lambda _{+}\) are considered to belong to the distribution \(\varGamma (KN, 1)\), and the selection and the estimates are recomputed until the values of \(\sigma _g\) and N converge.
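The iterative search described above can be illustrated with the following Python sketch for a single slice. Several choices are assumptions of this sketch rather than details given in the text: the data layout (one row per voxel, one column per volume), the convergence tolerance, the iteration cap, the use of Eq. (2) for N, and reading "the new value of N" as setting both \(N_{min}\) and \(N_{max}\) to the current estimate. The toy data at the end uses a crude stand-in for tissue signal and is entirely hypothetical.

```python
import numpy as np
from scipy.special import gammaincinv


def icdf(alpha, prob):
    """Inverse cdf of Gamma(alpha, 1) at probability level prob."""
    return gammaincinv(alpha, prob)


def estimate_sigma_and_n(data, p=0.05, a=50, n_min=1.0, n_max=12.0,
                         tol=1e-4, max_iter=50):
    """Iterative search on one slice; data has shape (n_voxels, n_volumes)."""
    data = np.asarray(data, dtype=float)
    n_volumes = data.shape[1]
    sum_m2 = np.sum(data ** 2, axis=1)  # per-voxel sum over all volumes

    # First iteration: grid of a candidate values up to the upper bound.
    sigma_max = np.median(data) / np.sqrt(2.0 * icdf(n_max, 0.5))
    phi = sigma_max * np.arange(1, a + 1) / a
    sigma_g, n, mask = None, None, None

    for _ in range(max_iter):
        lam_lo = icdf(n_volumes * n_min, p / 2.0)
        lam_hi = icdf(n_volumes * n_max, 1.0 - p / 2.0)
        # Sum over volumes of t = m^2 / (2 sigma^2), one row per candidate.
        t_sum = sum_m2[None, :] / (2.0 * phi[:, None] ** 2)
        inside = (t_sum >= lam_lo) & (t_sum <= lam_hi)
        best = np.argmax(inside.sum(axis=1))
        if not inside[best].any():
            break  # no voxel accepted; keep the previous estimates
        mask = inside[best]

        m2 = data[mask] ** 2
        # Eq. (1) and Eq. (2) on the accepted voxels.
        new_sigma = np.sqrt((np.sum(m2 ** 2) / np.sum(m2) - np.mean(m2)) / 2.0)
        new_n = np.mean(m2) / (2.0 * new_sigma ** 2)

        done = (sigma_g is not None
                and abs(new_sigma - sigma_g) < tol * sigma_g
                and abs(new_n - n) < tol)
        sigma_g, n = new_sigma, new_n
        if done:
            break
        # Next iteration: narrow grid around sigma_g, bounds from the new N.
        phi = sigma_g * np.linspace(0.95, 1.05, 11)
        n_min = n_max = n
    return sigma_g, n, mask


# Toy slice: 5000 background voxels (chi noise, N = 4) and 1000 bright voxels.
rng = np.random.default_rng(42)
n_vol, true_sigma, true_n = 20, 20.0, 4
noise = np.sqrt(np.sum(rng.normal(0.0, true_sigma, (5000, n_vol, 2 * true_n)) ** 2, axis=-1))
signal = 2000.0 + rng.normal(0.0, true_sigma, (1000, n_vol))
data = np.concatenate([noise, signal])
sigma_hat, n_hat, mask = estimate_sigma_and_n(data)
```

Because the rejection step acts on the per-voxel sum over volumes while Eqs. (1) and (2) act on the individual magnitudes, trimming the tails of the sum only mildly perturbs the moments, which is one plausible reason the iteration settles quickly in this toy setting.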

Synthetic Phantom Datasets. We generated synthetic datasets based on the ISBI 2013 HARDI challenge with phantomas. Two noiseless single shell phantoms with 64 gradient directions were generated at b = 1000 s/mm\(^2\) and b = 3000 s/mm\(^2\) with one b = 0 s/mm\(^2\) each. The datasets were then corrupted with Rician \((N = 1)\) and noncentral chi noise (\(N = 4\), 8 and 12), both stationary and spatially varying, at a signal-to-noise ratio (SNR) of 30. The noisy data was generated according to \(\hat{I} = \sqrt{\sum _{i=0, j=0}^{N} \left( \frac{I}{\sqrt{N}} + \tau \epsilon _i \right) ^2 + \tau \epsilon _j^2}\), where \(\hat{I}\) is the resulting noisy volume and \(\epsilon _i, \epsilon _j\) are Gaussian distributed with mean 0 and variance \(\sigma _g^2 = (mean(b0)/SNR)^2\). In the constant noise case, \(\tau \) is set to 1 so that the noise is uniform. For the spatially varying noise case, \(\tau \) follows a spherical profile, going from a value of 1 at the center up to a value of 1.75 at the edges of the phantom, thus generating a stronger noise profile outside the phantom than for the stationary (constant) noise case. This noise profile mimics receiver coils disposed around the surface of the phantom, with an increase in the noise profile near each receiver. One important observation arising from choosing a single SNR level is that the noise standard deviation \(\sigma _g\) is the same for all datasets, while the magnitude standard deviation \(\sigma _{m_N}\) depends on the value of N and we have \(\sigma _{m_N} < \sigma _g\).
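As an illustration of the noise model above, the following Python sketch corrupts a noiseless image with stationary or spatially varying noncentral chi noise. It reads the formula as a sum over N coils, each contributing a real part \(I/\sqrt{N} + \tau \epsilon _i\) and an imaginary part \(\tau \epsilon _j\) that is squared as a whole; this reading, the function names and the linear radial growth of \(\tau \) are assumptions of the sketch and are not taken from phantomas.

```python
import numpy as np


def add_noncentral_chi_noise(image, n_coils, sigma_g, tau=1.0, rng=None):
    """Corrupt a noiseless magnitude image with noncentral chi noise.

    tau is a scalar (stationary noise) or an array broadcastable to the
    image (spatially varying noise). n_coils = 1 gives Rician noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    scale = sigma_g * np.asarray(tau, dtype=float)
    noisy_sq = np.zeros_like(image)
    for _ in range(n_coils):
        real = image / np.sqrt(n_coils) + scale * rng.standard_normal(image.shape)
        imag = scale * rng.standard_normal(image.shape)
        noisy_sq += real ** 2 + imag ** 2
    return np.sqrt(noisy_sq)


def radial_tau(shape, tau_center=1.0, tau_edge=1.75):
    """Noise scaling map growing with distance from the center.

    The linear growth is an assumption; only the two end values are given.
    """
    grids = np.meshgrid(*[np.linspace(-1.0, 1.0, s) for s in shape], indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    return tau_center + (tau_edge - tau_center) * np.clip(radius, 0.0, 1.0)


# Hypothetical usage: a flat noiseless b = 0 volume at an SNR of 30.
rng = np.random.default_rng(0)
b0 = np.full((11, 11, 11), 3000.0)
sigma_g = b0.mean() / 30.0
tau = radial_tau(b0.shape)
noisy = add_noncentral_chi_noise(b0, n_coils=4, sigma_g=sigma_g, tau=tau, rng=rng)
```

With a zero image, the output reduces to chi distributed background noise whose second moment is \(2N\sigma _g^2\tau ^2\), which offers a quick sanity check of the generator.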

In Vivo Datasets. We obtained four repetitions of a freely available dMRI dataset of a single subject to assess the reproducibility of noise estimation without a priori knowledge. The acquisition was performed on a GE MR750 3T scanner at Stanford University, where a 3x slice acceleration with blipped-CAIPI shift of FOV/3 was used, partial Fourier 5/8 and a minimum TE of 81 ms. Two acquisitions were made in the anterior-posterior phase encoded direction and the two others in the posterior-anterior direction. The voxel size was 1.7 mm isotropic with 7 b = 0 s/mm\(^2\) images, 38 volumes at b = 1500 s/mm\(^2\) and 38 volumes at b = 3000 s/mm\(^2\).

Noise Estimation Algorithms for Comparison. To assess the performance of the proposed method, we used three other noise estimation algorithms [7,8,9] previously used in the context of diffusion MRI with their default parameters. The local adaptive noise estimation (LANE) algorithm [9] was designed to estimate the noise standard deviation over tissue for both Rician and noncentral chi noise while also taking into account the structure of the data for adaptive estimation. Since the method works on a 3D volume, we only used the b = 0 s/mm\(^2\) image for all of the experiments, as the signal does not vary spatially for the same type of tissue in such an image. We used the Marchenko-Pastur (MP) distribution fitting on the principal component analysis (PCA) decomposition of the diffusion data [7]. MPPCA estimates the magnitude noise standard deviation \(\sigma _{m_N}\) in small local windows by finding an optimal threshold in PCA space which separates the signal from the noise. This value of \(\sigma _{m_N}\) is slightly underestimated due to the discrete nature of the PCA decomposition. Finally, we compared our proposed method with the Probabilistic Identification and Estimation of Noise (PIESNO) [8], which originally proposed the change of variable to the gamma distribution that is at the core of our proposed method. PIESNO requires the value of N, which is used to iteratively estimate \(\sigma _g\) until convergence by removing voxels which do not belong to the distribution \(\varGamma (N, 1)\) for a given slice. For our proposed algorithm, we set the probability level at \(p = 0.05\) and initial values of \(a = 50\), \(N_{min} = 1\) and \(N_{max} = 12\). To the best of our knowledge, ours is the first method which makes it possible to estimate both \(\sigma _g\) and N jointly without requiring any information about the reconstruction process of the MRI scanner. 
Finally, we quantitatively assessed the performance of each method on the synthetic datasets by measuring the percentage error inside the phantom against the known value of \(\sigma _g\), where the error is computed as \(percentage\, error = 100 (\sigma _{g_{estimated}} - \sigma _{g_{true}}) / \sigma _{g_{true}}\). We also show the estimated values of N using our method for each dataset.
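The percentage error defined above is a signed relative error, so a negative value indicates underestimation of \(\sigma _g\). A minimal Python sketch follows; the function name, the optional mask argument standing in for "inside the phantom", and the example values are hypothetical.

```python
import numpy as np


def percentage_error(sigma_estimated, sigma_true, mask=None):
    """Signed percentage error of the estimated sigma_g, optionally in a mask."""
    sigma_estimated = np.asarray(sigma_estimated, dtype=float)
    error = 100.0 * (sigma_estimated - sigma_true) / sigma_true
    if mask is not None:
        error = error[np.asarray(mask, dtype=bool)]
    return error


# Made-up estimates for a true sigma_g of 20; the last voxel is outside the mask.
estimates = np.array([19.0, 20.0, 22.0, 40.0])
inside_phantom = np.array([True, True, True, False])
errors = percentage_error(estimates, 20.0, mask=inside_phantom)
```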

3 Results

As MPPCA and LANE are designed to estimate \(\sigma _g\) over data, we report the estimation error computed only inside a mask excluding the background for all methods. Figure 1 shows the percentage error of each method on the synthetic datasets. The correct value of N was given to both LANE and PIESNO when \(\sigma _g\) was constant. All methods performed generally well, with our proposed method and PIESNO staying below 2% error in all cases. MPPCA and LANE commit larger errors (around 5% and 20% on average, respectively) with increasing values of N, and the LANE error is largest when \(N = 12\). For the case of spatially varying \(\sigma _g\), we assumed N to be unknown and set \(N = 1\) for LANE and PIESNO. Due to this misspecification of N, PIESNO errors are several orders of magnitude larger than those of the other methods, except for the Rician noise case. MPPCA and LANE both underestimate \(\sigma _g\) (by around 20% and between 10 and 15%, respectively) while our proposed method resulted in the lowest error, which is around 10%. Figure 2 shows the values of N estimated by the proposed method for all cases of the synthetic datasets. Even when \(\sigma _g\) is underestimated, values of N are close to the real value. Estimating N using Eq. (2) or Eq. (3) gave similar results in both cases, so we used Eq. (2) in the present work. As limited information is available for the in vivo datasets, we assumed a Rician distribution for LANE and set \(N = 1\) as suggested by [9]. For PIESNO, assuming a Rician distribution with \(N = 1\) identified fewer than 10 voxels per dataset. We instead assumed \(N = 0.5\) since it corresponds to a half Gaussian distribution [2], which is the theoretical distribution closest to the one estimated by our method. Figure 3 shows the mean (and standard deviation) of \(\sigma _g\) on the in vivo datasets for each method along axial slices. The value of N as computed by our proposed method is also reported and is stable across datasets. 
All methods recovered stable average values of \(\sigma _g\) on the four repetitions of the same subject. However, LANE recovered the highest values of \(\sigma _g\) amongst all methods with a large variance, which might indicate overestimation in some areas. Figure 4 shows an axial slice around the cerebellum corrupted by acquisition artifacts likely due to parallel imaging. Voxels containing artifacts were automatically discarded by our method. The values of N and \(\sigma _g\) computed from the retained voxels also offer a better qualitative fit than assuming a Rayleigh distribution or selecting non brain data. We also timed each method to estimate \(\sigma _g\) on one of the in vivo datasets using a standard desktop computer with a 3.5 GHz Intel Xeon processor. All methods were multithreaded except PIESNO, which was single threaded. The runtime to estimate \(\sigma _g\) (and N) was around 10 s for our proposed method, 11 s for PIESNO, 3 min for MPPCA and 18 min for LANE.

Fig. 1. Percentage of error in estimating the noise standard deviation for each slice along the Z axis with the mean (solid line) and standard deviation (shaded area). In the top image, \(\sigma _g = 171\) is constant and N is known while in the bottom image \(\sigma _g\) varies spatially and N is unknown or assumed Rician distributed.

Fig. 2. Estimated value of N by the proposed method. Even for the spatially variable case where \(\sigma _g\) is slightly underestimated, the estimated values of N are stable and correspond to the real values used in the synthetic simulations in every case.

Fig. 3. At the top, estimated values of \(\sigma _g\) for the 4 in vivo datasets. For the proposed method, estimated values of N are shown in darker hues for each dataset. On the bottom, an axial slice of a b = 0 s/mm\(^2\) image from one dataset and the estimated values of \(\sigma _g\) for MPPCA and LANE. For the proposed method and PIESNO, a mask of the identified background voxels (in yellow) is overlaid on the data.

Fig. 4. An axial slice in the cerebellum from one of the in vivo datasets. Voxels identified in (A) as noise only (yellow) are free of artifacts in a single slice in (B) or along the sum of all volumes in (C). In (D), the normalized density histogram using the selected voxels from (A) (blue) is fitted well by a chi distribution with \(N = 0.47\) and \(\sigma _g = 0.11\), while assuming a Rayleigh distribution (green) or using all non brain voxels (orange) leads to a worse visual fit.

4 Discussion and Conclusion

We have shown how a change of variable to a gamma distribution \(\varGamma (N, 1)\) can be used to robustly and automatically identify background voxels. Once identified, the moments and maximum likelihood equations (Eqs. (1)–(3)) of the gamma distribution can be used iteratively to compute the number of degrees of freedom N and the Gaussian noise standard deviation \(\sigma _g\) relating to the original noise distribution. The presented equations are also fast to compute (around 10 s on in vivo data). Results on the synthetic datasets show that we can reliably estimate both parameters from the magnitude data itself. While the method we have presented assumes that each 2D slice contains a single noise distribution, N can still be computed reliably under spatially varying noise, and \(\sigma _g\) is recovered with an error between 5 and 10%, which is less than the compared methods. On the in vivo datasets, our method is stable across the four repetitions and can automatically discard voxels corrupted by acquisition artifacts due to parallel acceleration. From the identified background voxels, without any specific assumption, the recovered distribution parameters fit the histogram of the data well. This distribution is close to a half Gaussian distribution (\(N = 0.5\)) while the Rician noise assumption would not be adequate in this case. Our method is also, to the best of our knowledge, the first to identify any type of noise distribution from the magnitude data itself without requiring external information about the scanner or the reconstruction process. Interestingly, while we have shown results on dMRI datasets, the theory we presented applies to any other MRI weighting using large samples of magnitude data, e.g., functional MRI. If measurements from the scanner without any object signals are acquired (i.e., noise maps), a local window estimation of our proposed method could be used to overcome the shortcoming of assuming stationary 2D noise distributions. 
Noise map measurements could also be used for cases such as body or cardiac imaging where background voxels are usually not available in large quantities. Automatic identification of the noise distribution parameters could help multicenter studies which may not currently collect information about the acquisition and reconstruction process [4] or methods harmonizing data between different scanners and acquisition protocols [11]. Our method can also be used to provide prior knowledge beyond the textbook Rician distribution when computing local diffusion models [5, 6].