1 Introduction

Brain parcellation has become an essential tool for understanding neurological structural-functional associations at a millimeter scale. The resulting voxelwise tissue classifications are integral to identifying structural regions for connectomics, functional activations, quantitative/metabolic changes, diffusion connectivity, etc. These techniques require reproducible segmentations; however, manual delineation is time-consuming and subject to inter- and intra-operator variability, which limits its reproducibility. For these reasons, automatic brain parcellation has been widely studied [1,2,3]. Several automatic strategies have been proposed in the literature to segment brain structures, such as deformable, learning-based, and region-based methods [4,5,6]; however, most of these methods are structure-specific and do not allow segmentation of the whole brain. In contrast, atlas-based strategies provide a whole-brain parcellation when the atlases used have all the structures labelled.

In multi-atlas segmentation, a collection of atlases is registered to the target image and their labels are propagated and fused in the target image space to obtain the final segmentation. Label fusion strategies based on intensities [7,8,9,10] have been demonstrated to be robust and provide good performance when dealing with healthy subjects. However, like most other state-of-the-art methods, they are designed to segment healthy subjects, and their performance tends to degrade when segmenting brains affected by tumors and lesions, for instance, as a result of multiple sclerosis (MS) [12].

Herein, we propose a novel statistical fusion algorithm that reformulates the non-local STAPLE (NLS) [8] statistical framework to handle (anatomical) MRI visible lesions. As in NLS, our method models the registered atlases as collections of volumetric patches with intensity and label information. To complement the non-local criteria, we introduce lesion mask information to resolve the imperfect correspondences between the healthy atlases and the lesioned target that result from inaccurate registrations. Additionally, a second mask is integrated into the estimation process, which forces the voxel label assignment when the label is known beforehand. For instance, this modification is useful when segmenting brains with tumors for which sub-regions are known. Together, these innovations enable the inclusion of masks of abnormal anatomy and manually provided edits within modern statistical fusion approaches. We derive the theoretical basis governing our method and demonstrate segmentation improvement with respect to other state-of-the-art multi-atlas strategies on both simulated and MS images.

2 Theory

Consider a target gray-level image (with lesions) represented as a vector \(I\in \mathrm I\!R^{N\times 1}\). Let \(T\in \mathcal {L}^{N\times 1}\) be the latent representation of the true target segmentation, where \(\mathcal {L}=\{0,\ldots ,L-1\}\) is the set of possible labels that can be assigned to a given voxel. Let \(M\in \{0,1\}^{N\times 1}\) be a binary lesion mask indicating whether a given voxel i of the target image contains or is part of a lesion, and \(K\in \{0,1\}^{N\times 1}\) a second mask specifying whether the true label of a given voxel i of the target image is known; hence \(M_i=p\left( I_i\in lesion\right) \) and \(K_i=p\left( T_i=T_k\in \mathcal {L}^{N\times 1}\right) \). Note that both masks are optional and can be disregarded by setting all their voxels to 0. Consider a set of R registered healthy atlases with associated gray-level images, \(A\in \mathrm I\!R^{N\times R}\), and propagated label decisions, \(D\in \mathcal {L}^{N\times R}\). Let \(\theta \in [0,1]^{R\times N\times L\times L}\) be the performance level parameters of the raters (registered atlases), defined voxel-wise. Each element of \(\theta \), \(\theta _{jis's}\), represents the probability that rater j observes label \(s'\) given that the true label is s at a given voxel i and the corresponding voxel \(i^{*}\) on the associated atlas, i.e., \(\theta _{jis's}=p\left( D_{i^{*}j}=s',A_j|T_i=s,I_i,M_i,K_i,\theta _{jis's}\right) \), where \(i^{*}\) is the voxel on atlas j that corresponds to the target voxel i.

2.1 Non-local Correspondence Model

Non-local STAPLE (NLS) [8] incorporates into the STAPLE framework the concept of patch-based non-local correspondence, based on the image intensities of both the target image I and the registered atlases A. Although this concept has proven useful for matching healthy tissues to compensate for registration inaccuracies, we cannot rely on intensity similarities between the target lesion areas and the healthy atlases to rectify registration errors. Therefore, we assume that voxel correspondence inside the lesions cannot be further improved based on intensity and, hence, define the non-local weighting (\(\alpha _{ji'i}\)) between voxel i in the target image and voxel \(i'\) on the jth atlas as follows:

$$\begin{aligned} \begin{aligned} \alpha _{ji'i}=\left( \frac{1}{Z_\alpha } \exp {\left( -\frac{\Vert {\wp _{M_i}\circ \left( \wp \left( A_{i'j}\right) -\wp \left( I_i\right) \right) }\Vert ^{2}_{2}}{2\cdot {\sigma _i}^{2}\cdot \Vert \wp _{M_i}\Vert }\right) } \exp {\left( -\frac{\varepsilon _{i'i}^{2}}{2\cdot {\sigma _d}^{2}}\right) }\right) \cdot \left( 1-M_i\right) \\ +~\delta \left( i'=i\right) \cdot M_i \end{aligned} \end{aligned}$$
(1)

where \(\wp (\cdot )\) is the set of intensities in the patch neighborhood of a given intensity location. In this definition, \(\wp _{M_i}=\wp (1-M_i)\) is the masking term that excludes lesion voxels from the patch calculation and enforces the same patch neighborhood size/shape in both the atlas and the target, \(\Vert {\wp _{M_i}\circ \left( \wp \left( A_{i'j}\right) -\wp \left( I_i\right) \right) }\Vert ^{2}_{2}\) is the squared L2-norm of the masked difference between the atlas patch centered at \(i'\) and the target patch centered at i, \(\varepsilon _{i'i}^{2}\) is the squared Euclidean distance in physical space between i and \(i'\), \(\sigma _i\) and \(\sigma _d\) are the standard deviations of the intensity and distance weights, and \(Z_\alpha \) is a partition function that enforces the constraint that \(\sum _{i'\in \mathcal {N}(i)}{\alpha _{ji'i}}=1\), where \(\mathcal {N}(i)\) is the set of voxels in the search neighborhood of a given target voxel. \(\delta (i'=i)\) is the Dirac delta function (1 if \(i'=i\), 0 otherwise), and \(\Vert \wp _{M_i}\Vert \) is the number of voxels in the patch neighborhood.
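As an illustration of Eq. (1), the following minimal NumPy sketch computes the weights \(\alpha _{ji'i}\) for a single target voxel and a single atlas. It is not the authors' implementation: the function name `nonlocal_weights`, the dictionary return type, the use of voxel units for the distance term (rather than physical space), and the reading of \(\Vert \wp _{M_i}\Vert \) as the number of non-lesion voxels in the patch are all assumptions made for the example.

```python
import itertools
import numpy as np


def nonlocal_weights(target, atlas, lesion, center, patch_r=1, search_r=1,
                     sigma_i=0.25, sigma_d=1.5):
    """Sketch of Eq. (1): weights over the search neighborhood of one voxel.

    Assumes `center` lies at least patch_r + search_r voxels from every
    border and that intensities are normalized. Returns a dict that maps
    each atlas voxel index in the search neighborhood to its weight.
    """
    center = tuple(center)
    ndim = target.ndim
    offsets = itertools.product(range(-search_r, search_r + 1), repeat=ndim)
    search = [tuple(c + o for c, o in zip(center, off)) for off in offsets]

    if lesion[center]:
        # Inside a lesion the intensities are not trusted: all the weight
        # stays on the registered voxel (the delta term of Eq. 1).
        return {idx: float(idx == center) for idx in search}

    def patch(img, idx):
        window = tuple(slice(c - patch_r, c + patch_r + 1) for c in idx)
        return img[window].astype(float)

    keep = 1.0 - patch(lesion, center)   # masking term: 1 outside lesions
    n_keep = keep.sum()                  # assumed meaning of the patch norm
    target_patch = patch(target, center)

    weights = {}
    for idx in search:
        diff = keep * (patch(atlas, idx) - target_patch)
        intensity = np.sum(diff ** 2) / (2.0 * sigma_i ** 2 * n_keep)
        dist2 = sum((a - b) ** 2 for a, b in zip(idx, center))  # voxel units
        weights[idx] = float(np.exp(-intensity)
                             * np.exp(-dist2 / (2.0 * sigma_d ** 2)))

    z = sum(weights.values())            # partition function Z_alpha
    return {idx: w / z for idx, w in weights.items()}
```

In this sketch, a target voxel flagged in the lesion mask keeps only its registered correspondence, whereas a healthy voxel near a lesion still searches its neighborhood but ignores the lesion voxels inside its patch.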

2.2 The Algorithm

If the exact voxel correspondences between the target and the atlases (non-local model) were known, the lesion mask and the target and atlas intensity relationships could be ignored, and the spatial STAPLE [11] definition of \(\theta \) could be used.

$$\begin{aligned} \begin{aligned} \theta _{jis's}\equiv p\left( D_{i^{*}j}=s',A_j | T_i=s,I_i,M_i,K_i,\theta _{jis's}\right) \\ = p\left( D_{i^{*}j}=s' | T_i=s,M_i,K_i,\theta _{jis's}\right) \end{aligned} \end{aligned}$$
(2)

However, this correspondence is not known and we have to learn it with the model defined in Sect. 2.1. Note that using this model we can approximate the relationship by taking the expected value of \(p\left( D_{i^{*}j}=s',A_j | T_i=s,I_i,M_i,K_i,\theta _{jis's}\right) \) across the raters. Using an assumption of conditional independence between the labels, lesion mask and intensity, we approximate the density function as:

$$\begin{aligned} \begin{aligned} p\left( D_{i^{*}j}=s',A_j | T_i=s,I_i,M_i,K_i,\theta _{jis's}\right) \approx E\left[ p\left( D_j,A_j | T_i=s,I_i,M_i,K_i,\theta _{jis}\right) \right] \\ = E\left[ p\left( D_j | T_i=s,M_i,K_i,\theta _{jis}\right) \cdot p\left( A_j | I_i,M_i\right) \right] \\ = \underset{i'\in \mathcal {N}(i)}{\sum } p\left( D_{i^{*}j}=s' | T_i=s,M_i,K_i,\theta _{jis's}\right) \cdot p\left( A_{i'j} | I_i,M_i\right) = \underset{i'\in \mathcal {N}(i)}{\sum }\theta _{jis's}\cdot \alpha _{ji'i} \end{aligned} \end{aligned}$$
(3)

E-step. Let \(W\in \mathrm I\!R^{L\times N}\), where \(W_{si}^{(t)}\) represents the probability that the true label associated with voxel i is s at iteration t of the algorithm given the provided information and the performance level parameters.

$$\begin{aligned} \begin{aligned} W_{si}^{(t)}\equiv p\left( T_i=s|D,A,I,M,K,\theta ^{(t)}\right) \end{aligned} \end{aligned}$$
(4)

Using Bayes’ rule to separate the prior label probability (\(p\left( T_i=s\right) \)) and assuming independence among the raters, we can rewrite this equation as follows:

(5)

where \(\delta \left( s'=s\right) \) is the Dirac delta function (probability that the known label for voxel i of the truth segmentation is s). Using the non-local correspondence model and the approximated density function, we obtain:

(6)
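The displayed forms of Eqs. (5) and (6) are not reproduced in this text. One form that is consistent with the description above and with Eq. (3) is sketched below, purely as an assumption: the known-label mask \(K_i\) is taken to switch between the Bayesian posterior and the imposed label, mirroring the role of \(M_i\) in Eq. (1). The normalization index n and the convention that \(s'=D_{i'j}\) inside the sums are notational choices made for this sketch and are not taken from the original.

$$\begin{aligned} \begin{aligned} W_{si}^{(t)}\approx \left( 1-K_i\right) \cdot \frac{p\left( T_i=s\right) \prod _{j}\sum _{i'\in \mathcal {N}(i)}\theta _{jis's}^{(t)}\cdot \alpha _{ji'i}}{\sum _{n}p\left( T_i=n\right) \prod _{j}\sum _{i'\in \mathcal {N}(i)}\theta _{jis'n}^{(t)}\cdot \alpha _{ji'i}} +K_i\cdot \delta \left( s'=s\right) \end{aligned} \end{aligned}$$

Under this reading, a voxel with \(K_i=1\) receives probability 1 for its known label and 0 for every other label, whereas a voxel with \(K_i=0\) reduces to the usual non-local posterior: the prior multiplied by the product over atlases of the \(\alpha \)-weighted performance parameters, normalized over all labels.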

M-step. In this step, the calculated \(W_{si}^{(t)}\) is used to update \(\theta _{ji}^{(t+1)}\) by maximizing the expectation of the complete data log likelihood. As the complete data log likelihood is not observable, it is replaced by its conditional expectation given the observable data D, A, I, M, K using the current estimate \(\theta \).

$$\begin{aligned} \begin{aligned} \theta _{ji}^{(t+1)}=\underset{\theta _{ji}}{{{\mathrm{arg\,max}}}} \underset{i'\in \mathcal {B}_i}{\sum }E\left[ \ln \left( p\left( D_j,A_j | T_{i'},I_{i'},M_{i'},K_{i'},\theta _{ji} | D,A,I,M,K,\theta ^{(t)}\right) \right) \right] \\ =\underset{\theta _{ji}}{{{\mathrm{arg\,max}}}}\underset{i'\in \mathcal {B}_i}{\sum } \underset{s}{\sum }p\left( T_{i'}=s | D,A,I,M,K,\theta ^{(t)}\right) \cdot \ln \left( p\left( D_j,A_j | T_{i'},I_{i'},M_{i'},K_{i'},\theta _{ji}\right) \right) \\ =\underset{\theta _{ji}}{{{\mathrm{arg\,max}}}}\underset{i'\in \mathcal {B}_i}{\sum } \underset{s}{\sum }W_{si'}^{(t)}\cdot \ln \left( p\left( D_{i^{*}j}=s',A_j | T_{i'},I_{i'},M_{i'},K_{i'},\theta _{ji}\right) \right) \\ =\underset{\theta _{ji}}{{{\mathrm{arg\,max}}}}\underset{i'\in \mathcal {B}_i}{\sum } \underset{s}{\sum }W_{si'}^{(t)}\cdot \ln \left( \underset{i''\in \mathcal {N}(i'):D_{i''j=s'}}{\sum } \theta _{jis's}\cdot \alpha _{ji''i'}\right) \end{aligned} \end{aligned}$$
(7)

As each row of \(\theta \) must sum to one to be a valid probability mass function, we can maximize the performance level parameters for each element by using a Lagrange multiplier (\(\lambda \)) to formulate the constrained optimization problem.

$$\begin{aligned} \begin{aligned} 0~= \frac{\delta }{\delta \theta _{jin'n}} \left[ \underset{i'\in \mathcal {B}_i}{\sum } \underset{s}{\sum }W_{si'}^{(t)}\cdot \ln \left( \underset{i''\in \mathcal {N}(i'):D_{i''j=s'}}{\sum } \theta _{jis's}\cdot \alpha _{ji''i'}\right) +\lambda \underset{s'}{\sum } \theta _{jis's}^{(t+1)}\right] \end{aligned} \end{aligned}$$
(8)

By solving this equation, we obtain

$$\begin{aligned} \begin{aligned} \theta _{jis's}^{(t+1)} =\frac{\sum _{i'\in \mathcal {B}_i} \left( \sum _{i''\in \mathcal {N}(i'):D_{i''j=s'}}\alpha _{ji''i'}\right) \cdot W_{si'}^{(t)}}{\sum _{i'\in \mathcal {B}_i}W_{si'}^{(t)}} \end{aligned} \end{aligned}$$
(9)
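The update in Eq. (9) can be sketched for a single atlas j and a single pooling region \(\mathcal {B}_i\) as follows. This is a minimal NumPy illustration rather than the authors' code: the function name `update_theta`, the array layout (one row per voxel of the pooling region, one column per search-neighborhood voxel), and the small `eps` guard against empty labels are assumptions introduced for the example.

```python
import numpy as np


def update_theta(W, alpha, D, n_labels, eps=1e-12):
    """Sketch of Eq. (9) for one atlas and one pooling region.

    W     : (L, nB) array, W[s, b] is the current label probability of
            pooling-region voxel b.
    alpha : (nB, nN) array, non-local weights of the nN search-neighborhood
            voxels of each pooling-region voxel (rows sum to one).
    D     : (nB, nN) integer array, atlas labels at those neighbors.
    Returns theta with shape (L, L), indexed as theta[s_observed, s_true].
    """
    # votes[b, s'] = sum of alpha over the neighbors whose atlas label is s'
    onehot = (D[..., None] == np.arange(n_labels)).astype(float)
    votes = np.einsum("bk,bkl->bl", alpha, onehot)

    numerator = votes.T @ W.T        # (s', s): sum_b votes[b, s'] * W[s, b]
    denominator = W.sum(axis=1)      # (s,):   sum_b W[s, b]
    return numerator / np.maximum(denominator, eps)
```

With this indexing, the entries of `theta` that share the same true label s (one column) sum to one whenever each row of `alpha` sums to one, which matches the normalization constraint imposed through the Lagrange multiplier.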

2.3 Initialization and Priors

The voxel-wise prior \(p\left( T_i=s\right) \) was initialized using the weak log-odds majority vote, as in NLSS. The performance parameters, \(\theta _{jis's}\), were initialized assuming each atlas has high performance as: 1, if \(s=s'\); 0.95, if \(s=s'\); 0, if \(s\ne s'\); and \(\frac{0.05}{L-1}\), otherwise. The search neighborhood \(\mathcal {N}(\cdot )\) was set to \(7\times 7\times 7\), patch \(\wp (\cdot )\) dimensions to \(5\times 5\times 5\) and \(\sigma _i\) and \(\sigma _d\) were set to 0.25 and 1.5, respectively. Algorithm convergence was detected when the average change in the diagonal elements of \(\theta \) was below \(10^{-4}\).
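The stopping rule described above can be sketched as follows. The example assumes that "average change" means the mean absolute difference between consecutive iterations and that \(\theta \) is stored with its two label axes last; the function name `has_converged` is hypothetical.

```python
import numpy as np


def has_converged(theta_old, theta_new, tol=1e-4):
    """Return True when the mean absolute change of the diagonal is below tol.

    Both arrays have shape (..., L, L); the diagonal of the two trailing
    axes holds the probabilities of observing the true label.
    """
    diag_old = np.diagonal(theta_old, axis1=-2, axis2=-1)
    diag_new = np.diagonal(theta_new, axis1=-2, axis2=-1)
    return bool(np.mean(np.abs(diag_new - diag_old)) < tol)
```

Only the diagonal is monitored because it summarizes how often each atlas is believed to report the correct label; the off-diagonal entries are tied to it through the sum-to-one constraint.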

3 Experiments and Results

The atlases used in our experiments were taken from the MICCAI 2012 Grand Challenge and Workshop on Multi-Atlas Labeling database [13]. This database consists of 35 T1-w MR images, obtained from the OASIS project and labeled by Neuromorphometrics, Inc., and includes labels for the whole brain. PCA atlas selection was performed and only the 15 most similar atlases were used for segmentation. All images were histogram normalized and N4 bias field corrected before registration. All pair-wise registrations were performed using an initial affine registration (niftyreg) followed by a non-rigid (ANTs) procedure. In all the registrations performed, the lesions were masked out to prevent their intensities from interfering with the similarity metric calculation.

As benchmarks, we compare the proposed algorithm to majority vote (MV) [14], non-local STAPLE (NLS) [8], non-local Spatial STAPLE (NLSS) [9] and Joint Label Fusion (JLF) [10]. For a fair comparison, all the parameters that NLS and NLSS share with our algorithm were set to the same values. Also, JLF was executed with the same patch and neighborhood size.

3.1 Simulated Lesions

Evaluating the performance of segmentation algorithms on real lesioned images is not an easy task, since there is a lack of public databases with ground truth for both lesions and structures. For this reason, in the first experiment, we simulated two sets of artificially lesioned images: (1) 10 with uniform intensity lesions, to test the proposed theory, and (2) 15 with lesion shapes, intensities and locations obtained from an in-house MS patient database, to simulate realistic cases. All the lesions were generated on random subjects from the MICCAI 2012 database. The lesion load of the generated images ranged from 33.49 to 119.74 mm\(^3\) in the first cohort and from 3.16 to 26.96 mm\(^3\) in the second one.

We evaluated the segmentation results quantitatively using a global Dice Similarity Coefficient (DSC) across all the structures as the main measure. As lesion intensities affect not only the segmentation of the lesion area itself but also that of the surrounding tissues, two measures were calculated: (1) DSC inside the lesion mask, and (2) DSC inside a mask that additionally included three voxels around the lesion contour. Note that \(\mathcal {N}(\cdot )\) was set to \(7\times 7\times 7\).
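A masked, multi-label DSC of this kind can be sketched as below. The text does not spell out how the global DSC pools the structures, so the example assumes one common convention: overlaps and label volumes are summed over all non-background labels before the ratio is taken. The function name `global_dsc` and the choice of 0 as the background label are likewise assumptions.

```python
import numpy as np


def global_dsc(seg, ref, mask, background=0):
    """Pooled Dice coefficient over all non-background labels inside a mask.

    seg, ref : integer label arrays of identical shape.
    mask     : array of the same shape; non-zero voxels are evaluated.
    """
    inside = mask.astype(bool)
    a, b = seg[inside], ref[inside]
    fg_a = a != background
    fg_b = b != background
    # Voxels where both segmentations agree on a non-background label.
    overlap = np.sum((a == b) & fg_a)
    total = fg_a.sum() + fg_b.sum()
    return 2.0 * overlap / total if total else 1.0
```

The two measures described above would then correspond to calling the function once with the lesion mask and once with a version of that mask enlarged by three voxels.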

Figure 1(A) shows that, inside the lesion mask, our method performed significantly better than all the intensity-based strategies (JLF, NLS and NLSS) in both cohorts. However, the performance was similar to that of MV. This is because we cannot trust the intensities inside the lesions and can only rely on an accurate registration (as MV does). On the other hand, when the performance was analyzed around the lesion areas, our proposal provided the best results (similar to MV in the first cohort and to JLF in the second one). This behavior is depicted in Fig. 1(B), where JLF (b and h) misclassifies several structures inside the lesion areas, whereas in NLSS (c and i) the segmentation is also affected in the surrounding structures.

Fig. 1. (A) Global DSC and (B) qualitative segmentation results of the analyzed multi-atlas strategies on both simulated databases: (a-d) uniform intensity lesions, and (e-j) MS simulated lesions.

For the evaluation of the manual edits (K mask integration), we segmented the first dataset again, this time feeding the algorithm with the same lesion mask for both M and K. The results showed, as expected, a DSC of 1 inside the lesion areas (M/K mask), whereas the mean DSC around the lesions was \(0.7901\pm 0.0463\), very similar to that of the first execution (\(0.7919\pm 0.0457\)), which indicates that the effect on the tissues surrounding the lesions is preserved.

Fig. 2. Segmentation results of the analyzed multi-atlas strategies for the image 01038PAGU of the MICCAI 2016 Challenge database.

3.2 MSSeg 2016 Challenge

For the second experiment, we qualitatively compared the fusion results obtained by the analyzed algorithms on an MS patient database (MSSeg 2016 challenge).

Figure 2 shows the segmentations obtained with all the analyzed multi-atlas strategies. As can be observed in Fig. 2(a), MS lesions are hypo-intense in the T1-w modality, which makes their intensity profile similar to that of the gray matter (GM) and sometimes even to that of the cerebrospinal fluid (CSF), which may affect the results of intensity-based algorithms. The lesions shown in Fig. 2(b) should be classified as white matter; however, the state-of-the-art intensity-based algorithms, Fig. 2(f-h), tend to misclassify those regions as GM or CSF, whereas our method, Fig. 2(c), shows better classification results in those areas. When our method is fed with a K mask, Fig. 2(d), the voxels surrounding the lesions remain practically the same as when the K mask is not used, Fig. 2(c), whereas the segmentation result inside the lesions agrees entirely with the labels imposed by this mask, as seen in Sect. 3.1.

4 Discussion

Accurate structural volume measurements are important in MS, since the atrophy of some structures, such as the deep GM, is relevant to disease progression. However, we have shown that multi-atlas strategies based on intensities, which achieve good segmentation results on healthy subjects, are affected by lesions, which in turn corrupts the resulting measurements.

Herein, we have presented the theory to modify the non-local STAPLE framework to deal with MRI visible lesions. The experiments performed show that our proposal outperforms the state-of-the-art multi-atlas strategies in the lesion areas for both simulated and MS patient images.

MV outperformed the state-of-the-art intensity-based strategies around the lesion areas in the experiments performed on the uniform intensity lesions database. This behavior could be due to the fact that the other strategies are patch-based: they consider mean patch differences to calculate the correspondences, hence the bright voxels of the lesions could bias the mean intensity and lead to wrong atlas correspondences. Even though this is an extreme case designed to test the proposed theory, it shows, combined with the better performance of MV inside the lesion areas, the effect of the lesion intensities on the segmentation.

In this work, we have focused only on the segmentation performance in the lesion areas, since those are the ones affected by the proposed reformulation. Nonetheless, as these areas are better segmented with our strategy, the average whole-brain segmentation performance increases slightly compared to the non-local STAPLE variants. The improvement is small because lesions are small compared to the whole brain volume. For this reason, we believe that extending our theory to other methods in the literature, such as JLF, would be beneficial in terms of segmentation accuracy not only in the lesion areas but also across the whole brain.