1 Introduction

Multiple sclerosis (MS) is a disease characterized by focal and diffuse inflammation, degeneration, and repair in the central nervous system [1]. Inflammatory demyelination is more common in white matter (WM) and manifests as focal WM lesions on magnetic resonance imaging (MRI). Recently, advanced MRI has also revealed substantial tissue damage in the cortical gray matter (GM), i.e., cortical lesions [2].

Automated MS lesion segmentation has been an active research topic for more than 20 years [3]. Despite significant advances in the quantitative analysis of MS lesions in MRI, some challenges remain. One aspect that makes lesion segmentation difficult is the mixture of tissues within a single voxel, known as the partial volume (PV) effect. PV particularly affects small lesions, which are of key importance for the early diagnosis and follow-up of MS patients. A relatively good correlation between the severity of cortical lesions and patient disability has recently been reported [2]. Cortical lesions are typically small and tend to appear more frequently in regions prone to strong PV effects, such as the interface between WM and GM [4].

Fig. 1. Diagram of the lesion segmentation pipeline, divided into two main steps: the first step outputs a lesion location bitmap (\(\lambda ^\mathrm{Les}\)), and the second step performs the PV estimation. \(\pi ^\mathrm{CSF}\), \(\pi ^\mathrm{GM}\), and \(\pi ^\mathrm{WM}\) are atlas-based prior probability maps for CSF, GM, and WM, respectively.

Our study is based on a set of advanced MRI sequences that have been shown to be as sensitive to WM lesions as routine sequences, but significantly more sensitive to cortical lesions [4, 5]. Some of these sequences have recently been recommended as optional sequences in clinical protocols [1]. The goal of this work is to improve MS lesion delineation and, consequently, the estimation of lesion load in cortical and subcortical areas, a clinically important biomarker. Our novel framework (see Fig. 1) combines a supervised lesion detection method with a prototype Bayesian PV estimation algorithm. The former is a k-nearest neighbors (kNN) approach that has been reported to achieve good detection of MS lesions [5, 6]. The latter is a novel method inspired by the “mixel” model originally proposed in [7]. This model, however, leads to an ill-posed estimation problem, for which [8] suggested the use of regularizing priors. Here, we further introduce spatial constraints, derived from the initial kNN detection, into the model to estimate realistic concentration maps of healthy (WM, GM) and pathological brain tissue, as well as cerebrospinal fluid (CSF). These concentration maps are used to compute lesion volumes directly, rather than correcting an initial hard tissue classification for PV effects as in previous methods [9, 10]. Furthermore, contrary to what was proposed in [11], we use the supervised approach first, to drive the unsupervised segmentation.

With this work, we thus strive to combine the advantages of supervised and unsupervised methods to yield both good lesion detection and volume estimation.

2 Method

2.1 Partial Volume Estimation

Consider a set of \(n_c\) images of a single subject, acquired with different MRI sequences and preprocessed by alignment, bias field correction, and skull stripping. Consistent with [7, 8, 12], we assume that the vector of image intensities \(\mathbf{y}_i\) at voxel i of the total intracranial mask relates to an unknown vector of tissue concentrations \(\mathbf{q}_i\), with \(\mathbf{q}_i\succeq 0\) and \(\mathbf{q}_i^{\top}\mathbf{1}=1\), through the statistical relation:

$$\begin{aligned} \mathbf{y}_i = \mathbf{M}^\top \mathbf{q}_i +\varvec{\varepsilon }_i, \qquad \varvec{\varepsilon }_i\sim N(0, \mathbf{V}), \end{aligned} \tag{1}$$

where \(n_t\) is the number of distinct tissues, \(\mathbf{M}\) is an \(n_t\times n_c\) matrix of mean tissue intensities for each channel (i.e., \(M_{tc}\) is the mean intensity of tissue t in channel c), and \(\mathbf{V}=\mathrm{diag}(\sigma _1^2,\ldots ,\sigma _{n_c}^2)\) is the noise covariance matrix, assuming stationary Gaussian white noise that is independent across channels. In this work, we consider \(n_t=4\) tissues (CSF, GM, WM, and lesions) and \(n_c=3\) channels. To regularize the recovery of the voxelwise tissue concentrations \(\mathbf{q}_i\) by Bayesian maximum a posteriori (MAP) estimation, we use a prior concentration model of the form proposed in [8]:

$$\begin{aligned} \pi (\mathbf{q}_1, \mathbf{q}_2, \ldots , \mathbf{q}_{n_v}) \propto \exp \Big [ -\frac{1}{2}\sum _i \mathbf {q}_i^\top \mathbf{A} \mathbf {q}_i -\frac{\beta }{2}\sum _{i,j\in \mathcal{N}_i}\Vert \mathbf {q}_i-\mathbf {q}_j\Vert ^{2} \Big ], \end{aligned} \tag{2}$$

where \(n_v\) is the total number of intracranial voxels, \(\mathbf{A}\) is a symmetric penalty matrix with zero diagonal and positive off-diagonal elements, \(\beta \) is a positive constant, and \(\mathcal{N}_i\) is the neighborhood of voxel i according to some discrete topology (we use 6-connectivity). The elements of \(\mathbf{A}\) and \(\beta \) are hyperparameters tuned in a learning phase (Sect. 2.4). While \(\beta \) controls the spatial smoothness of the tissue concentration maps, the purpose of \(\mathbf{A}\) is to disentangle intensity fluctuations due to noise from PV effects. Each off-diagonal element penalizes the mixing of two distinct tissues in a voxel, hence limiting spurious concentration variations when a single tissue is present. For instance, the larger \(A_{12}\), the less likely a voxel is to contain both CSF and GM.
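As a worked illustration of how Eqs. (1) and (2) combine, the sketch below evaluates twice the negative log-posterior, dropping constants; this equals the objective of the quadratic program given later in this section. It assumes, for illustration only, a regular 3D grid in which every voxel is intracranial, a diagonal \(\mathbf{V}\), and a hypothetical function name. Because the double sum over 6-neighborhoods visits each voxel pair twice, the code sums each unordered pair once and doubles it.

```python
import numpy as np

def neg_log_posterior(Y, Q, M, V, A, beta):
    """Twice the negative log-posterior of Eqs. (1)-(2), up to a constant.

    Y: (X, Y, Z, n_c) intensities; Q: (X, Y, Z, n_t) concentrations;
    M: (n_t, n_c) mean intensities; V: (n_c, n_c) diagonal noise covariance;
    A: (n_t, n_t) shared or (X, Y, Z, n_t, n_t) per-voxel penalty; beta: smoothness.
    Every grid voxel is treated as intracranial (simplifying assumption).
    """
    R = Y - Q @ M                                    # residuals y_i - M^T q_i
    data = np.sum(R ** 2 / np.diag(V))               # (y-M^T q)^T V^{-1} (y-M^T q), diagonal V
    A = np.broadcast_to(A, Q.shape[:3] + A.shape[-2:])
    prior = np.einsum("xyzs,xyzst,xyzt->", Q, A, Q)  # sum_i q_i^T A_i q_i
    # sum_{i, j in N_i} visits each unordered 6-neighbour pair twice,
    # so beta * (ordered sum) = 2 * beta * (sum over unordered pairs).
    pairs = sum(np.sum(np.diff(Q, axis=ax) ** 2) for ax in range(3))
    return data + prior + 2.0 * beta * pairs
```

For a spatially constant, noise-free field only the \(\mathbf{A}\) term survives, which makes the role of each term easy to check in isolation.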

We propose to generalize the prior model of [8] by allowing voxel-dependent penalty matrices \(\mathbf{A}_i\) with non-zero diagonal elements that penalize specific tissues locally. This is done here to avoid confusing GM and lesions, which have similar intensity signatures in the investigated image sequences. Specifically, let \(\pi ^\mathrm{GM}\) be a probabilistic atlas-based prior probability map for the GM and \(\lambda ^\mathrm{Les}\) a bitmap that indicates brain regions with lesions (see Sect. 2.2). We set the diagonal elements of \(\mathbf{A}_i\), corresponding respectively to CSF, GM, WM, and lesions, as follows:

$$\begin{aligned} A_{i,11} = 0, \quad A_{i,22} = a_2 (1-\pi _i^\mathrm{GM}), \quad A_{i,33} = 0, \quad A_{i,44} = a_4 (1-\lambda _i^\mathrm{Les}), \end{aligned}$$

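where \(a_2\) and \(a_4\) are positive factors, tuned beforehand together with the off-diagonal elements \(A_{12}, A_{13}, A_{14}, A_{23}, A_{24}, A_{34}\) and the smoothness parameter \(\beta \); in our implementation, these parameters are voxel-independent. Thus, GM is penalized where the atlas deems it unlikely, lesions are penalized outside the candidate regions of \(\lambda ^\mathrm{Les}\), and CSF and WM incur no voxel-specific penalty.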
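The construction of the voxel-dependent matrices \(\mathbf{A}_i\) can be sketched in NumPy as follows. The function name and the 0-based tissue order (0 = CSF, 1 = GM, 2 = WM, 3 = lesion, i.e., the paper's indices shifted by one) are illustrative assumptions; the usage lines plug in the tuned values reported in Sect. 2.4.

```python
import numpy as np

# Tissue order (assumption): 0 = CSF, 1 = GM, 2 = WM, 3 = lesion.

def build_penalties(pi_gm, lam_les, offdiag, a2, a4):
    """Per-voxel penalty matrices A_i with shape pi_gm.shape + (4, 4).

    pi_gm: GM atlas probability per voxel; lam_les: lesion bitmap (0/1) per voxel;
    offdiag: {(s, t): value} voxel-independent off-diagonal penalties.
    """
    base = np.zeros((4, 4))
    for (s, t), value in offdiag.items():
        base[s, t] = base[t, s] = value                     # A is symmetric
    A = np.broadcast_to(base, np.shape(pi_gm) + (4, 4)).copy()
    A[..., 1, 1] = a2 * (1.0 - np.asarray(pi_gm, float))   # GM penalized where atlas GM is unlikely
    A[..., 3, 3] = a4 * (1.0 - np.asarray(lam_les, float)) # lesion penalized outside the bitmap
    return A                                                # A_{i,11} = A_{i,33} = 0 come from `base`

# Usage with the hyperparameters of Sect. 2.4 on two example voxels.
offdiag = {(0, 1): 27.52, (0, 2): 1e10, (0, 3): 1e10,
           (1, 2): -1.42, (1, 3): 14.49, (2, 3): 3.90}
A = build_penalties(np.array([1.0, 0.2]), np.array([1, 0]), offdiag, a2=15.41, a4=158.55)
```

The first voxel (certain GM, inside the lesion bitmap) receives no diagonal penalty; the second is penalized for both GM and lesion content.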

Solving for the MAP tissue concentrations yields a quadratic programming problem:

$$\begin{aligned} \min _{\mathbf{q}_1, \ldots , \mathbf{q}_{n_v}} \sum _{i} \Big [ (\mathbf{y}_i-\mathbf{M}^\top \mathbf{q}_i)^{\top } \mathbf{V}^{-1} (\mathbf{y}_i-\mathbf{M}^\top \mathbf{q}_i) + \mathbf {q}_i^\top \mathbf{A}_i \mathbf {q}_i + \beta \sum _{j\in \mathcal{N}_i}\Vert \mathbf {q}_i-\mathbf {q}_j\Vert ^{2} \Big ], \end{aligned}$$

where each \(\mathbf{q}_i\) is constrained to the probability simplex. The solution is computed numerically with an iterative scheme that loops over the intracranial voxels and, for each voxel, solves for the concentration vector \(\mathbf{q}_i\) with all other concentration vectors held fixed, using an active set algorithm [8]. This scheme is very robust in practice and typically converges in fewer than 25 iterations.
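The voxel-by-voxel scheme can be sketched as follows. This is not the active set algorithm of [8]: as a simpler, swapped-in technique, each voxel's simplex-constrained subproblem is solved here by projected gradient descent with a sort-based Euclidean projection onto the simplex. With the neighbors held fixed, the terms of the objective above that involve \(\mathbf{q}_i\) form \(f(\mathbf{q}) = \mathbf{q}^\top\mathbf{H}\mathbf{q} - 2\mathbf{b}^\top\mathbf{q} + \mathrm{const}\), with \(\mathbf{H} = \mathbf{M}\mathbf{V}^{-1}\mathbf{M}^\top + \mathbf{A}_i + 2\beta|\mathcal{N}_i|\mathbf{I}\) and \(\mathbf{b} = \mathbf{M}\mathbf{V}^{-1}\mathbf{y}_i + 2\beta\sum_{j\in\mathcal{N}_i}\mathbf{q}_j\); the factor \(2\beta\) appears because each neighbor pair occurs twice in the double sum. Function names, iteration counts, the variance floor, and treating the whole grid as intracranial are illustrative assumptions. Plain projected gradient also slows down badly when some penalties are huge (e.g., \(A_{13}=A_{14}=10^{10}\) in Sect. 2.4), which is one reason an exact active-set solver suits this problem better.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_voxel(y, M, V_inv, A_i, q_nbrs, beta, q0, n_iter=200):
    """Projected gradient on f(q) = q^T H q - 2 b^T q over the simplex."""
    n_t = M.shape[0]
    H = M @ V_inv @ M.T + A_i + 2.0 * beta * len(q_nbrs) * np.eye(n_t)
    b = M @ V_inv @ y + 2.0 * beta * q_nbrs.sum(axis=0)
    step = 1.0 / (2.0 * np.max(np.abs(np.linalg.eigvalsh(H))))  # 1 / Lipschitz constant of grad f
    q = q0.copy()
    for _ in range(n_iter):
        q = project_to_simplex(q - step * 2.0 * (H @ q - b))    # grad f = 2 (H q - b)
    return q

NEIGHBOURS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def map_sweeps(Y, M, V, A, beta, Q, n_sweeps=25, var_floor=1e-6):
    """Sweep over voxels, updating each q_i with its 6-neighbours held fixed.

    Y: (X, Y, Z, n_c); A: (X, Y, Z, n_t, n_t); Q: (X, Y, Z, n_t), updated in place.
    """
    V_inv = np.diag(1.0 / np.maximum(np.diag(V), var_floor))  # floor keeps V^{-1} finite
    shape = Q.shape[:3]
    for _ in range(n_sweeps):
        for idx in np.ndindex(*shape):
            cand = [tuple(c + d for c, d in zip(idx, off)) for off in NEIGHBOURS_6]
            nbrs = [j for j in cand if all(0 <= c < s for c, s in zip(j, shape))]
            q_nbrs = np.array([Q[j] for j in nbrs]).reshape(-1, Q.shape[-1])
            Q[idx] = solve_voxel(Y[idx], M, V_inv, A[idx], q_nbrs, beta, Q[idx])
    return Q
```

Each voxel update warm-starts from the previous estimate, so later sweeps refine rather than restart the solution.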

2.2 Bitmap of Lesion Location

A supervised approach was used to obtain a map of candidate lesional tissue (\(\lambda ^\mathrm{Les}\)). The method is based on a kNN classifier trained on a set of features obtained from the images and from atlas-based prior probability maps of the two brain tissues and CSF (\(\pi ^\mathrm{GM}\), \(\pi ^\mathrm{WM}\), and \(\pi ^\mathrm{CSF}\)). The features used for classification were (i) image voxel intensities, (ii) spatial location coordinates in a common space, and (iii) tissue prior probabilities. The value of k was set to 15, which was empirically found to be a good trade-off between accuracy and computation time [5]. The manual segmentations described in Sect. 3 were used to train the classifier (lesion voxels were labeled 1, and voxels of all other tissues 0). Finally, to obtain the lesion location bitmap \(\lambda ^\mathrm{Les}\), the kNN output was dilated with a \(4\times 4\times 4\) cubic structuring element, enlarging the detected regions to ensure that all lesional tissue is covered. Because this map is derived from the patient's own images, using it to drive the PV segmentation makes the approach more patient-specific than relying on general tissue atlas priors alone.
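A minimal sketch of the lesion-candidate step follows. It assumes plain Euclidean distances on unscaled features and binarizes the kNN output by majority vote (vote fraction above 0.5); the section specifies neither feature scaling, the distance metric, nor how the kNN output is binarized. The \(4\times 4\times 4\) dilation is done as three separable 1D passes, since a cube is the product of three 1D segments; for an even-sized cube, which side gets the extra voxel is a convention chosen here. All function names are hypothetical.

```python
import numpy as np

def voxel_features(intensities, coords, priors):
    """Per-voxel features: (n, n_c) intensities, (n, 3) common-space coordinates,
    (n, 3) tissue priors (pi^GM, pi^WM, pi^CSF)."""
    return np.hstack([intensities, coords, priors])

def knn_lesion_vote(train_X, train_y, test_X, k=15):
    """Fraction of the k nearest training voxels labelled lesion (1) rather than 0.
    Brute-force distances: fine for a sketch, far too slow for whole brains."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def _shift(a, s, axis):
    """out[x] = a[x - s] along `axis`, zero-filled at the border."""
    out = np.zeros_like(a)
    n = a.shape[axis]
    src = [slice(None)] * a.ndim
    dst = [slice(None)] * a.ndim
    if s >= 0:
        src[axis], dst[axis] = slice(0, max(n - s, 0)), slice(s, None)
    else:
        src[axis], dst[axis] = slice(-s, None), slice(0, max(n + s, 0))
    out[tuple(dst)] = a[tuple(src)]
    return out

def dilate_cube(mask, size=4):
    """Binary dilation with a size^3 cube as three separable 1D passes.
    For size=4, a voxel at p spreads to p-2 .. p+1 along each axis."""
    out = np.asarray(mask, dtype=bool)
    for axis in range(out.ndim):
        acc = np.zeros_like(out)
        for s in range(-(size // 2), size - size // 2):
            acc |= _shift(out, s, axis)
        out = acc
    return out
```

In a full pipeline, `(knn_lesion_vote(...) > 0.5)` reshaped to the image grid would be passed to `dilate_cube` to form \(\lambda ^\mathrm{Les}\); library routines such as scikit-learn's `KNeighborsClassifier` and `scipy.ndimage.binary_dilation` would normally replace these hand-written helpers.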

2.3 Imaging Parameters

The noise covariance matrix \(\mathbf{V}\) is initialized to zero and is re-estimated iteratively by MAP, concurrently with the tissue concentrations (see Sect. 2.1), using the update rule:

$$\begin{aligned} \mathbf{V} = \mathrm{diag} \Big [ \frac{1}{n_v}\sum _i (\mathbf{y}_i-\mathbf{M}^{\top }\mathbf{q}_i)(\mathbf{y}_i-\mathbf{M}^{\top }\mathbf{q}_i)^\top \Big ], \end{aligned} \tag{3}$$

which is performed after a complete tissue concentration re-estimation loop over the intracranial voxels.
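A sketch of the update in Eq. (3) follows (hypothetical function name). Keeping only the diagonal of the averaged outer product means that each \(\sigma_c^2\) is simply the mean squared residual of channel c, so the full \(n_c\times n_c\) products never need to be formed. Because \(\mathbf{V}\) starts at zero while \(\mathbf{V}^{-1}\) enters the objective, the small variance floor is an assumption added here to keep the inverse finite; the outer loop would alternate one full concentration sweep with this update.

```python
import numpy as np

def update_noise_covariance(Y, Q, M, var_floor=1e-12):
    """Eq. (3): V = diag[(1/n_v) sum_i r_i r_i^T] with r_i = y_i - M^T q_i.

    Y: (..., n_c) intensities; Q: (..., n_t) concentrations; M: (n_t, n_c).
    The diagonal of the averaged outer product is the per-channel mean squared residual.
    """
    R = (Y - Q @ M).reshape(-1, M.shape[1])          # (n_v, n_c) residuals
    return np.diag(np.maximum(np.mean(R ** 2, axis=0), var_floor))
```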

Conversely, the matrix \(\mathbf{M}\) of mean tissue intensities is held fixed during the estimation of tissue concentrations. The mean intensities of CSF, GM, and WM were determined from the \(\pi ^\mathrm{CSF}\), \(\pi ^\mathrm{GM}\), and \(\pi ^\mathrm{WM}\) maps, respectively, using the voxels with a prior probability higher than 0.95. The mean intensity of lesional tissue was estimated from the kNN output.
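The estimation of \(\mathbf{M}\) can be sketched as below. It assumes that the lesion row is averaged over the binarized kNN output (the section does not state which kNN map is used) and reads "higher than 0.95" as a strict inequality. The function name is hypothetical.

```python
import numpy as np

def estimate_mean_intensities(Y, pi_csf, pi_gm, pi_wm, lesion_mask, thr=0.95):
    """Rows of M (CSF, GM, WM, lesion) as mean intensities over selected voxels.

    Y: (..., n_c) intensities; priors and lesion_mask share Y's spatial shape (...).
    """
    selections = (pi_csf > thr, pi_gm > thr, pi_wm > thr,
                  np.asarray(lesion_mask, dtype=bool))
    return np.vstack([Y[sel].mean(axis=0) for sel in selections])  # (n_t, n_c)
```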

2.4 Hyperparameter Tuning

A reference patient was used to tune the hyperparameters \(\mathbf{A}\) and \(\beta \) so as to minimize the Hellinger distance between the manual lesion segmentation binary mask and the lesion concentration map output by the PV estimation algorithm. Two elements of \(\mathbf{A}\) were fixed to very large values in order to proscribe mixing of CSF with WM and CSF with lesions (\(A_{13}=A_{14} = 1\times 10^{10}\)). The other parameters were optimized using Powell’s method, yielding \(A_{12} = 27.52\), \(A_{23} = -1.42\), \(A_{24} = 14.49\), \(A_{34} = 3.90\), \(a_2 = 15.41\), \(a_4 = 158.55\), and \(\beta =1.53\).
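The tuning criterion can be sketched as follows. The section does not say whether the binary mask and the concentration map are normalized before computing the Hellinger distance; the sketch assumes the unnormalized voxel-wise form \(\sqrt{\tfrac12\sum_i(\sqrt{m_i}-\sqrt{c_i})^2}\). A second helper maps the seven free values to the penalty parameters, with \(A_{13}=A_{14}=10^{10}\) held fixed. With SciPy, `scipy.optimize.minimize(objective, theta0, method="Powell")` could drive the search, where `objective` would be a hypothetical wrapper that runs the PV estimation on the reference patient and returns this distance. All names are illustrative.

```python
import numpy as np

def hellinger(mask, conc):
    """Unnormalized voxel-wise Hellinger distance between a binary mask and a
    lesion concentration map: sqrt(0.5 * sum_i (sqrt(m_i) - sqrt(c_i))^2)."""
    m = np.asarray(mask, dtype=float)
    c = np.clip(np.asarray(conc, dtype=float), 0.0, None)   # guard tiny negatives
    return float(np.sqrt(0.5 * np.sum((np.sqrt(m) - np.sqrt(c)) ** 2)))

FIXED_LARGE = 1e10  # A_13 = A_14: forbids CSF/WM and CSF/lesion mixing

def unpack(theta):
    """Map the 7 tuned values (A_12, A_23, A_24, A_34, a_2, a_4, beta) to an
    off-diagonal dict (0 = CSF, 1 = GM, 2 = WM, 3 = lesion), a_2, a_4 and beta."""
    A12, A23, A24, A34, a2, a4, beta = theta
    offdiag = {(0, 1): A12, (0, 2): FIXED_LARGE, (0, 3): FIXED_LARGE,
               (1, 2): A23, (1, 3): A24, (2, 3): A34}
    return offdiag, a2, a4, beta
```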

3 Experimental Validation

3.1 Data and Pre-processing

Thirty-nine patients (14 males, 25 females; median age 34 years, range 20–60 years) with early relapsing-remitting MS (disease duration less than 5 years from diagnosis) and an Expanded Disability Status Scale (EDSS) score between 1 and 2 (median EDSS = 1.5) were scanned on a 3T MRI system (MAGNETOM Trio, Siemens Healthcare GmbH, Erlangen, Germany) using a 32-channel head coil. The MRI protocol included: (i) high-resolution magnetization-prepared 2 rapid acquisition gradient echoes (MP2RAGE) (TR/TI1/TI2 = 5000/700/2500 ms, voxel size \(= 1 \times 1 \times 1.2\,\mathrm{mm}^3\)), (ii) 3D fluid-attenuated inversion recovery (FLAIR) (TR/TE/TI = 5000/394/1800 ms, voxel size \(= 1 \times 1 \times 1.2 \,\mathrm{mm}^3\)), and (iii) 3D double inversion recovery (DIR) (TR/TE/TI1/TI2 = 10000/218/450/3650 ms, voxel size \(= 1 \times 1 \times 1.2 \,\mathrm{mm}^3\)). All images were acquired in the same session without patient repositioning.

WM and cortical lesions were first identified and marked separately by a radiologist and a neurologist, who then reached a consensus in a follow-up reading. A trained technician subsequently delineated the lesion volumes in each image; we consider the resulting masks the ground truth for lesion load and volume. A patient with a relatively high lesion load was chosen as the reference to train the PV algorithm (see Sects. 2.3 and 2.4) and was therefore excluded from the subsequent statistical analysis.

ELASTIX [13] was used to rigidly register the different image sequences of each subject to a common space. All images were further skull-stripped using an in-house method [14] and corrected for intensity inhomogeneities using the N4 algorithm [15]. Intensity normalization was then performed with the histogram matching technique proposed in [16]; this last pre-processing step was applied only to the images used as kNN input. Fuzzy in-house templates of prior WM, GM, and CSF probabilities were non-rigidly registered onto each image volume using ELASTIX to produce the prior maps \(\pi ^\mathrm{GM}\), \(\pi ^\mathrm{WM}\), and \(\pi ^\mathrm{CSF}\) (see Sect. 2.2).

Fig. 2. Patches showing examples of manual segmentation (GT, second column) and automated segmentation before (kNN, third column) and after (kNN-PV, last column) applying the PV approach. In the ground truth, WM lesions are shown in green and cortical lesions in blue. The examples are shown on an MP2RAGE background.

3.2 Results

In line with our goal of optimizing the lesion delineation of the supervised algorithm used in the first step, we compared lesion load and voxel-wise metrics before and after applying the proposed PV algorithm to the binary kNN lesion masks. Since the PV estimation relies on the initial lesion location bitmap \(\lambda ^\mathrm{Les}\) obtained by the kNN algorithm, it is restricted to known lesion locations. Consequently, no significant improvement was found in either the lesion detection rate (DR) or the false positive rate: both before and after applying the PV algorithm, the DR was approximately \(75\%\) for WM lesions and \(55\%\) for cortical lesions.
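The section reports DR without defining it. Under a common lesion-wise convention, assumed here, each 6-connected component of the GT mask is one lesion, and it counts as detected if at least one of its voxels overlaps the automated mask. A pure-NumPy sketch with hypothetical names follows; in practice, `scipy.ndimage.label`, whose default structuring element in 3D is also 6-connected, would replace the hand-written labelling.

```python
from collections import deque

import numpy as np

STEPS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_components(mask):
    """6-connected component labelling of a 3D boolean mask (0 = background)."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                       # already part of a labelled lesion
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:                       # breadth-first flood fill
            x = queue.popleft()
            for d in STEPS_6:
                y = tuple(a + b for a, b in zip(x, d))
                if all(0 <= c < s for c, s in zip(y, mask.shape)) and mask[y] and not labels[y]:
                    labels[y] = n
                    queue.append(y)
    return labels, n

def detection_rate(gt_mask, auto_mask):
    """Fraction of GT lesions touched by at least one automated voxel."""
    labels, n = label_components(gt_mask)
    if n == 0:
        return float("nan")
    hits = sum(bool(np.any(auto_mask[labels == k])) for k in range(1, n + 1))
    return hits / n
```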

Fig. 3. Boxplots of the TLV difference between manual (ground truth, GT) and automated lesion segmentation before (kNN) and after (kNN-PV) applying the PV method. From left to right: TLV difference for all, WM, and cortical lesions. The crosses in the plot represent outliers in our cohort. N.S.: not significant.

Fig. 4. Boxplots showing the voxel-wise analysis (sensitivity and Dice) before (kNN) and after (kNN-PV) applying the PV estimation method. The crosses in the plot represent outliers in our cohort.

Figure 2 shows exemplary results (WM, cortical, and periventricular lesions) comparing the ground truth (GT) with the lesion masks before and after PV estimation. Visually, lesions appear better delineated with the PV algorithm. Figure 3 shows the differences between the total lesion volume (TLV) of the GT and the TLV computed from the binary kNN masks and from the PV lesion concentration maps, for all lesions and for WM and cortical lesions separately. Compared with the binary kNN masks, PV estimation significantly improved the agreement of the TLV with the GT (P-value \(\approx 0.004\)), mainly owing to a substantial improvement in cortical lesion segmentation (P-value \(\approx 2\mathrm {e}{-05}\)). To compare the sensitivity and Dice of the two methods, a threshold must be applied to the lesion concentration map obtained by the PV algorithm; it was chosen to maximize the Dice coefficient between the automated and the GT segmentation. As shown in Fig. 4, applying the PV estimation method yielded a significant (P-value \(\approx 2\mathrm {e}{-07}\)) improvement of \(10\%\) in sensitivity and \(6\%\) in Dice.
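The volume and overlap metrics can be sketched as below. TLV from the PV output sums the lesion concentrations, so a voxel with a lesion concentration of 0.3 contributes 0.3 voxel volumes, whereas a binary mask counts whole voxels. The \(1 \times 1 \times 1.2\,\mathrm{mm}^3\) voxel volume is the acquisition resolution from Sect. 3.1 and is an assumption here, since the analysis grid after registration is not stated. The grid of candidate thresholds searched for the Dice-maximizing threshold is also an assumption, and the function names are hypothetical.

```python
import numpy as np

VOXEL_VOLUME_MM3 = 1.0 * 1.0 * 1.2   # acquisition voxel size (assumed analysis grid)

def total_lesion_volume(lesion_map, voxel_volume=VOXEL_VOLUME_MM3):
    """TLV in mm^3 from a binary mask or a concentration map (fractional voxels)."""
    return float(np.sum(lesion_map)) * voxel_volume

def dice(a, b):
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0  # both empty -> 1

def sensitivity(gt, pred):
    gt, pred = np.asarray(gt, dtype=bool), np.asarray(pred, dtype=bool)
    return np.logical_and(gt, pred).sum() / gt.sum()

def best_dice_threshold(conc, gt, thresholds=np.linspace(0.05, 0.95, 19)):
    """Threshold on the lesion concentration map that maximizes Dice with the GT."""
    scores = [dice(conc > t, gt) for t in thresholds]
    i = int(np.argmax(scores))
    return thresholds[i], scores[i]
```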

4 Conclusion

We presented a novel framework to automatically detect and segment MS lesions in cortical and subcortical areas. Our method exploits the good lesion detection performance of a supervised kNN algorithm and extends it with a novel Bayesian PV estimation method to improve lesion load estimates. Accurate volume assessment is important because TLV is a clinically relevant marker for both diagnosis and treatment monitoring.

Our results suggest that both lesion volume estimation and lesion segmentation can be improved when PV effects are considered. Metrics like sensitivity and Dice were significantly improved when the PV estimation approach was used.

Our lesion PV estimation method can be combined with any initial lesion detection. Moreover, improving the initial lesion location bitmap would further improve lesion segmentation and volume estimation. Future work will focus on improving the initial detection by exploring other feature sets and classification techniques.