
1 Introduction

Early detection is key to success in the treatment of brain tumors and can save human lives. Accurate segmentation, that is, the separation of brain tumors from normal brain tissues, is equally essential, as it assists medical experts in planning treatment and intervention. Manual segmentation of tumors requires plenty of time even for a well-trained expert. Fully automated segmentation and quantitative analysis of tumors is thus a highly beneficial service. However, it is also a very challenging one, because of the high variety of anatomical structures and the low contrast of current imaging techniques, which make the difference between normal regions and tumor hardly recognizable to the human eye [1].

Magnetic resonance imaging (MRI) is the preferred imaging modality in brain tumor screening, due to its good contrast and relatively fine resolution. However, it also bears difficulties such as the possible presence of intensity inhomogeneity [2], and relative intensity values that vary from device to device and from patient to patient. The MICCAI Brain Tumor Segmentation Challenge, organized yearly since 2012, has intensified research in this topic and led to several important solutions, which are usually assisted by prior information and employ various image processing and pattern recognition methodologies. Asman and Landman [3] applied a non-parametric intensity analysis in combination with a segmentation based on multiple atlases. Ghanavati et al. [4] provided a solution using the AdaBoost classifier to distinguish tumor voxels from normal ones based on intensity, texture, and symmetry features. Hamamci et al. [5] proposed a cellular automata driven method that produces segmentation based on level sets. Sachdeva et al. [6] deployed a content based active contour model relying on intensity and texture features extracted from the histogram and co-occurrence matrix of the MRI data. Njeh et al. [7] introduced a graph cut based solution that performs distribution matching and is highly efficient because it uses global rather than pixel-wise information. Zhang et al. [8] proposed a support vector machine based procedure to follow the evolution of brain tumors over time. Tustison et al. [9] combined random forests with symmetry based features to segment brain tumors. Szilágyi et al. [10] provided a semi-supervised framework for the fuzzy c-means clustering algorithm to produce accurately segmented tumors. Kanas et al. [11] combined a clustering based preprocessing with a multi-parametric random walker segmentation. Havaei et al. [12] developed an automatic brain tumor segmentation procedure based on deep neural networks that exploits both local and global contextual features simultaneously. Pereira et al. [13] proposed a convolutional neural network solution exploiting small kernels and successfully applied it to brain tumor segmentation. Menze et al. [14] combined a Gaussian mixture model with the expectation maximization (EM) algorithm to achieve an accurate segmentation. Another accurate Gaussian mixture based solution was given by Juan-Albarracín et al. [15]. Islam et al. [16] employed multifractional Brownian motion features to provide a patient-independent characterization of tumor tissues and applied the AdaBoost algorithm for tissue segmentation. Shin et al. [17] proposed deep convolutional neural networks and successfully combined them with transfer learning. Huang et al. [18] provided a brain tumor segmentation framework employing local independent projection-based classification. For further information on current brain tumor segmentation techniques, recent reviews are available [1, 19].

In a previous paper [20] we presented a preliminary study on the use of binary decision trees (BDT) in brain tumor detection and segmentation. We selected 13 multispectral MRI volumes from the MICCAI BRATS 2013 data set, trained individual BDTs and ensembles with information taken from a subset of the volumes, and tested them on the complementary subset. As a further development of our previous algorithm, in this paper we propose a random forest solution trained and tested on the whole high-grade tumor data set of MICCAI BRATS 2016, which includes 220 volumes. Our main goal is to accurately separate the whole tumor from the normal tissues in each volume. Further separating the parts of the tumor, based on the ground truth offered by the MICCAI BRATS human experts, remains out of the scope of this study.

The rest of this paper is structured as follows: Sect. 2 gives details on the proposed methodology. Section 3 exhibits and discusses the achieved results. Finally, Sect. 4 concludes the investigation.

2 Materials and Methods

Our main goal was to establish a machine learning algorithm that accurately segments tumors in MRI volumes. This paper presents preliminary results obtained using the random forest technique, combined with a neighborhood-based post-processing. The algorithm is trained to separate the whole tumor from normal (negative) tissues. A block diagram of the proposed segmentation procedure is given in Fig. 1.

2.1 BRATS Data Sets

Brain tumor image data used in this work were obtained from the MICCAI 2016 Challenge on Multimodal Brain Tumor Segmentation [21]. The challenge database contains fully anonymized images originating from four institutions. It consists of multi-contrast MR scans of 280 glioma patients, out of which 220 were acquired from high-grade and 60 from low-grade glioma patients. For each patient, multimodal (T1, T2, FLAIR, and post-Gadolinium T1) MR images are available. All volumes were linearly co-registered to the T1 contrast image, skull stripped, and interpolated to 1 mm isotropic resolution. Each record contains approximately 1.5 million true tissue voxels, each provided with an annotation produced by human experts. Besides the four observed features of each voxel, there is a strong need to extend the feature vectors with further, computed features.

Fig. 1. Block diagram of the proposed method.

2.2 Histogram Normalization

Because of the nature of MRI sensors, intensity values in MRI records are relative, so the histogram of each volume needs to be mapped onto a uniform scale. To this end, all intensity values underwent a linear transformation \(x \rightarrow \alpha x + \beta \), where the parameters \(\alpha \) and \(\beta \) were established separately for each volume and each feature, in such a way that the 25-percentile and the 75-percentile values became 600 and 800, respectively. Further on, a minimum and a maximum intensity barrier were set at 200 and 1200, respectively.
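For clarity, the normalization of one feature channel of one volume can be sketched in NumPy as follows; the function name and the convention that zero marks background or missing voxels are our assumptions, not part of the original description:

```python
import numpy as np

def normalize_channel(volume):
    # Map the 25- and 75-percentiles of the brain voxels to 600 and 800,
    # then clip the transformed intensities to the [200, 1200] range.
    # Zeros mark background/missing voxels and are left untouched.
    mask = volume > 0
    q25, q75 = np.percentile(volume[mask], [25, 75])
    alpha = (800.0 - 600.0) / (q75 - q25)   # scale factor
    beta = 600.0 - alpha * q25              # so that alpha*q25 + beta = 600
    out = np.zeros_like(volume, dtype=np.float64)
    out[mask] = np.clip(alpha * volume[mask] + beta, 200.0, 1200.0)
    return out
```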

2.3 Computed Features

Twelve computed features were added to the feature vector describing each voxel. For each of the four observed intensities (T1, T2, T1C, FLAIR), the minimum, the maximum, and the average value were extracted from the valid neighbors within the 26-neighborhood of the voxel. Neighbors were considered valid if they had nonzero observed intensity in the given channel. The 26-neighborhood of a voxel situated at coordinates \((x_0,y_0,z_0)\) consists of all other voxels whose \((x,y,z)\) coordinates satisfy \(|x-x_0|\le 1\), \(|y-y_0|\le 1\), and \(|z-z_0|\le 1\).
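A vectorized sketch of this feature extraction for one channel is given below. It shifts a zero-padded copy of the volume along all 26 offsets, so border voxels automatically see the padding as invalid neighbors; the function name and padding strategy are illustrative assumptions:

```python
import numpy as np
from itertools import product

def neighborhood_features(channel):
    # Minimum, maximum, and average intensity over the valid (nonzero)
    # 26-neighborhood of every voxel of one observed channel.
    X, Y, Z = channel.shape
    padded = np.pad(channel, 1, mode="constant")
    shifted = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        if (dx, dy, dz) == (0, 0, 0):
            continue                      # a voxel is not its own neighbor
        shifted.append(padded[1+dx:1+dx+X, 1+dy:1+dy+Y, 1+dz:1+dz+Z])
    stack = np.stack(shifted)             # shape: (26, X, Y, Z)
    valid = stack > 0                     # neighbors with nonzero intensity
    count = np.maximum(valid.sum(axis=0), 1)   # avoid division by zero
    avg = (stack * valid).sum(axis=0) / count
    mn = np.where(valid, stack, np.inf).min(axis=0)
    mx = np.where(valid, stack, -np.inf).max(axis=0)
    # Voxels with no valid neighbor yield avg=0, mn=inf, mx=-inf; they are
    # dealt with by the missing-data rule of Sect. 2.4.
    return mn, mx, avg
```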

2.4 Missing Data

Some voxels had zero valued observed features, which were interpreted as missing values. Voxels with more than one such value were excluded from further processing and took no part in the main data flow, while those with a single zero value received a value interpolated from the neighborhood of the voxel.
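Combining this rule with the neighborhood features above, the missing-data handling might look as follows. This is a hedged sketch reusing the `neighborhood_features` function from Sect. 2.3; interpolating a single missing value as the average of the valid 26-neighbors is our assumption, as the text only states that the value is interpolated from the neighborhood:

```python
import numpy as np

def handle_missing(features):
    # features: (4, X, Y, Z) array of observed channels, zero = missing.
    features = features.astype(float)
    missing = (features == 0).sum(axis=0)      # zero channels per voxel
    excluded = missing > 1                     # dropped from processing
    for c in range(features.shape[0]):
        _, _, avg = neighborhood_features(features[c])
        fill = (features[c] == 0) & (missing == 1)
        features[c][fill] = avg[fill]          # assumed: mean of valid neighbors
    return features, excluded
```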

2.5 Binary Decision Trees

Binary decision trees (BDT) can describe any hierarchy of crisp (non-fuzzy) two-way decisions [22]. Given an input data set of vectors \(\mathbf {X}=\{{\varvec{x}}_1,{\varvec{x}}_2,\dots ,{\varvec{x}}_n\}\), where \({\varvec{x}}_i = [x_{i,1},x_{i,2},\dots ,x_{i,m}]^T\), a BDT can be employed to learn the classification that corresponds to any set of labels \(\varLambda = \{\lambda _1,\lambda _2,\dots ,\lambda _n\}\). The classification learned by the BDT can be perfect if there are no identical training vectors with different labels, that is, \({\varvec{x}}_i={\varvec{x}}_j\) implies \(\lambda _i=\lambda _j\), \(\forall i,j\in \{1,2,\dots , n\}\). The BDT is built during the learning process. Initially the tree consists of a single node, the root, which has to make a decision regarding all n input vectors. If not all n vectors have the same label, which is usually the case, then the set of data is not homogeneous and needs to be separated. The decision compares a single chosen feature, the one with index k (\(1\le k \le m\)), of the input vectors against a certain threshold \(\alpha \), and this comparison separates the vectors into two subgroups: those with \(x_{i,k} < \alpha \) and those with \(x_{i,k}\ge \alpha \). The root then receives two child nodes, each corresponding to one of the possible outcomes of the above decision. The left child further classifies those \(n_1\) input vectors that satisfied the former condition, while the right child the \(n_2\) vectors that satisfied the latter condition. Obviously, \(n_1+n_2=n\). For both child nodes, the procedure is the same as for the root. When, at a certain point of the learning algorithm, all vectors being classified by a node have the same label \(\lambda _p\), the node is declared a leaf node and attributed to the class with index p. A node is also declared a leaf node when all vectors to be separated by it are identical, so that no condition can separate them. In this case, the label of the node is decided by the majority of labels, or if there is no majority, a label is chosen from the present ones. In our application, these rare leaves are labeled as tumor.

The separation of a finite set of data vectors always terminates in a finite number of steps. The maximum depth of the tree highly depends on the way of establishing the separation condition in each node. Our application uses an entropy based criterion to choose the separation condition. Whenever a node has to establish its separation criterion for a subset of vectors \(\overline{\mathbf {X}} \subseteq \mathbf {X}\) containing \(\overline{n}\) items with \(1<\overline{n}\le n\), the following algorithm is performed:

1. Find all features which have at least two different values in \(\overline{\mathbf {X}}\).

2. Find all different values of each such feature and sort them in increasing order.

3. Set a threshold candidate at the middle of the distance between each consecutive pair of values, for each feature.

4. Choose the feature and the threshold for which the entropy-based criterion

   $$\begin{aligned} E = \overline{n}_1 \log \frac{\overline{n}_1}{\overline{n}} + \overline{n}_2 \log \frac{\overline{n}_2}{\overline{n}} \end{aligned}$$   (1)

   gives the minimum value, where \(\overline{n}_1\) (\(\overline{n}_2\)) denotes the cardinality of the subset of vectors \(\overline{\mathbf {X}}_1\) (\(\overline{\mathbf {X}}_2\)) for which the value of the tested feature is less than (greater than or equal to) the tested threshold value.

Once the BDT is trained, it can be applied to classify test data vectors. A test vector is first fed to the root node, which, according to its stored condition and the feature values of the vector, decides to which child node to forward the vector. The chosen child node follows the same strategy, forwarding the vector further down the tree. The classification of a vector terminates when it reaches a leaf node, and the vector is attributed to the class indicated by the label of that leaf.
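The training and classification procedures described above can be condensed into the following Python sketch, which performs an exhaustive search over the threshold candidates of the four-step algorithm and applies criterion (1). It is written for readability rather than speed, and the class and function names are illustrative:

```python
import numpy as np

class Node:
    # A leaf carries a class label; an internal node carries a
    # (feature index, threshold) pair and two children.
    def __init__(self, label=None, feature=None, threshold=None,
                 left=None, right=None):
        self.label, self.feature, self.threshold = label, feature, threshold
        self.left, self.right = left, right

def split_criterion(n1, n2):
    # Eq. (1): E = n1*log(n1/n) + n2*log(n2/n), to be minimized.
    n = n1 + n2
    return n1 * np.log(n1 / n) + n2 * np.log(n2 / n)

def build_tree(X, y, tumor_label=1):
    # Recursively grow a BDT on vectors X (n x m) with labels y.
    if np.all(y == y[0]):
        return Node(label=int(y[0]))           # homogeneous subset: leaf
    best = None
    for k in range(X.shape[1]):
        values = np.unique(X[:, k])
        if len(values) < 2:
            continue                            # constant feature, skip
        # Threshold candidates: midpoints of consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            n1 = int(np.sum(X[:, k] < t))
            e = split_criterion(n1, len(y) - n1)
            if best is None or e < best[0]:
                best = (e, k, t)
    if best is None:                            # identical vectors, mixed labels
        counts = np.bincount(y)
        if counts.max() * 2 > len(y):           # majority label, if any
            return Node(label=int(np.argmax(counts)))
        return Node(label=tumor_label)          # no majority: label as tumor
    _, k, t = best
    mask = X[:, k] < t
    return Node(feature=k, threshold=t,
                left=build_tree(X[mask], y[mask], tumor_label),
                right=build_tree(X[~mask], y[~mask], tumor_label))

def classify(node, x):
    # Route a test vector from the root to a leaf; return the leaf label.
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label
```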

2.6 The Random Forest

Binary decision trees were trained to separate negative voxels from positive ones. In case of the BRATS high-grade tumor data set, we had a total of 276 million negative and 24 million positive voxels. As a first step, a randomly selected 90% of the negative voxels were eliminated, and the remaining 10% were kept for the training and testing process. Training data sets for various forests were created via random selection of negative and positive voxels, using the parameter \(p_N\) that stood for the ratio of negative voxels in each set. Any such learning data set contained voxels from volume records with either even or odd index, so that the forest could be tested on the complementary part of the records. Each training set consisted of \(N_S=10^6\) samples. Another parameter of each forest was the number of trees \(n_T\), which varied between 50 and 500. Each tree of a forest was trained with \(N_S/n_T\) samples randomly selected from the \(N_S\) voxels assigned to the forest in question. Those samples that were not selected for the training of any tree in the forest, approximately 360,000 voxels, acted as out-of-bag (OOB) data and were used for primary testing, as recommended by Breiman [23]. Testing on OOB data allowed us to preselect those forests that were likely to produce high accuracy and discard those prone to more misclassifications. The best performing forests achieved 95–96% accuracy in labeling the OOB data.
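The sampling scheme of one forest can be sketched as follows, reusing the `build_tree` and `classify` functions from the previous section. The majority voting rule of the trees is our assumption, as the text does not detail how the individual tree decisions are combined:

```python
import numpy as np

def forest_vote(forest, x):
    # Assumed combination rule: simple majority vote of the trees.
    votes = sum(classify(tree, x) for tree in forest)
    return int(votes * 2 > len(forest))

def train_forest(X_neg, X_pos, p_N=0.88, n_T=125, N_S=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    # Draw a training set of N_S voxels with a ratio p_N of negatives.
    n_neg = int(round(p_N * N_S))
    neg = X_neg[rng.choice(len(X_neg), n_neg, replace=False)]
    pos = X_pos[rng.choice(len(X_pos), N_S - n_neg, replace=False)]
    X = np.vstack([neg, pos])
    y = np.concatenate([np.zeros(n_neg, int), np.ones(N_S - n_neg, int)])
    # Each tree receives a random share of N_S / n_T samples.
    used = np.zeros(N_S, bool)
    forest = []
    for _ in range(n_T):
        idx = rng.choice(N_S, N_S // n_T, replace=False)
        used[idx] = True
        forest.append(build_tree(X[idx], y[idx]))
    # Samples seen by no tree (about 1/e of N_S, i.e. roughly 360,000)
    # form the out-of-bag data used for primary testing of the forest.
    oob = np.flatnonzero(~used)
    correct = sum(forest_vote(forest, X[i]) == y[i] for i in oob)
    return forest, correct / len(oob)
```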

All forests trained with data originating from volumes with even (odd) index were tested on all volumes with odd (even) index. Forests were created using a great variety of parameter values (\(p_N\) and \(n_T\)). All 220 high-grade tumor volumes were fed to all valid forests, according to the rule that any trained forest is only tested on never seen data. Finally, we established the parameter values that led to the best overall accuracy.

2.7 Post-processing

A posterior relabeling scheme was implemented as follows. The input of the post-processing step consists of the labels assigned by the random forest to all voxels of the volume. For each voxel, the number of tumor labeled neighbors (\(\nu _T\)) and the number of all neighbors (\(\nu _\mathrm {All}\)) are extracted, using a predefined cubic neighborhood. The final label of a voxel is set to tumor if and only if \(\nu _T/\nu _\mathrm {All} > \theta \). The overall optimal value of the threshold was established during testing as \(\theta = 0.4\).
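Since the counts \(\nu _T\) and \(\nu _\mathrm {All}\) are sums over a cubic window, the whole relabeling step reduces to two convolutions, as in the following sketch. The neighborhood radius is an assumption, as the text only states that the cube is predefined:

```python
import numpy as np
from scipy.ndimage import convolve

def relabel(tumor, brain, theta=0.4, radius=1):
    # tumor: boolean volume of forest-assigned tumor labels;
    # brain: boolean mask of voxels that count as neighbors;
    # radius: assumed half-size of the predefined cubic neighborhood.
    size = 2 * radius + 1
    kernel = np.ones((size, size, size))
    kernel[radius, radius, radius] = 0    # a voxel is not its own neighbor
    nu_T = convolve((tumor & brain).astype(float), kernel, mode="constant")
    nu_all = convolve(brain.astype(float), kernel, mode="constant")
    ratio = np.divide(nu_T, nu_all, out=np.zeros_like(nu_T),
                      where=nu_all > 0)
    return (ratio > theta) & brain        # final tumor labels
```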

2.8 Evaluation of Accuracy

We employed the Dice score (\(\mathrm {DS}\)) as the main indicator of accuracy, defined as \(\mathrm {DS} = \frac{2\times \mathrm {TP}}{2\times \mathrm {TP}+\mathrm {FP}+\mathrm {FN}} \in [0,1]\), where \(\mathrm {TP}\), \(\mathrm {FP}\), and \(\mathrm {FN}\) stand for the number of true positives, false positives, and false negatives, respectively. Fine accuracy is reflected by \(\mathrm {DS}\) values close to 1; in this brain tumor segmentation problem, however, DS values around 0.94 are considered ideal [21], due to inter-rater differences present in the ground truth. Further on, the sensitivity (or true positive rate, TPR) and the specificity (or true negative rate, TNR), defined as \(\mathrm {TPR} = \frac{\mathrm {TP}}{\mathrm {TP}+\mathrm {FN}}\) and \(\mathrm {TNR} = \frac{\mathrm {TN}}{\mathrm {TN}+\mathrm {FP}}\), were used as secondary accuracy indicators, where \(\mathrm {TN}\) represents the number of true negatives.
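All three indicators can be computed directly from two binary masks, as in the short sketch below (the function name is illustrative):

```python
import numpy as np

def accuracy_indicators(pred, truth):
    # pred and truth: boolean tumor masks of one volume.
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    tn = np.count_nonzero(~pred & ~truth)
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)          # TPR
    specificity = tn / (tn + fp)          # TNR
    return dice, sensitivity, specificity
```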

3 Results and Discussion

All 220 high-grade tumor volumes of BRATS 2016 were involved in the evaluation of the proposed methodology. Volumes with even (odd) index were tested on random forests trained only with data from odd (even) indexed volumes. Several random forests were trained, with the ratio of negative voxels \(p_N\) in their training data ranging between 70% and 98%. Ratios lower than 70% led to too many false positives on every test volume.

For each of the test volumes, the ideal \(p_N\) ratio, namely the one that led to the highest Dice score, was identified. The histogram of these \(p_N\) values, presented in Fig. 2(a), shows that the great majority of the volumes are best segmented at \(p_N\) ratios above 80%. Figure 2(b) exhibits the overall Dice score obtained for various values of the \(p_N\) ratio and indicates that the highest overall Dice scores are obtained for \(87\%\le p_N\le 89\%\). Varying the number of trees in each forest had little impact on accuracy; the best Dice scores were obtained with \(n_T=125\).

The proposed post-processing makes the detected positive and negative regions more compact: it eliminates small isolated homogeneous regions, either negative or positive, and thus improves the accuracy for a great majority of the test volumes by reducing the number of false positives and false negatives. Table 1 shows the most important overall accuracy measures. The overall mean Dice score rises by 6%, while the median rises by almost 7%.

Fig. 2. Finding the overall best value of the ratio of negative voxels in the training data \(p_N\): (a) histogram of the ideal \(p_N\) values extracted for each of the 220 high-grade tumor volumes; (b) the overall Dice score plotted against \(p_N\). This figure reflects the case of \(n_T=125\) decision trees in the random forest, and Dice scores obtained without post-processing.

Table 1. Main overall accuracy parameters
Fig. 3. Dice score, sensitivity, and specificity values obtained for all 220 high-grade tumor volumes in case of \(n_T=125\) and \(p_N=88\%\), sorted in increasing order. The result of the random forest is exhibited on the left, while the graph on the right shows the final result after post-processing. The specificity is well above \(97\%\) for most volumes, which is very important if we do not want to generate many false alerts. Sensitivity values are comparable with the Dice scores reported in Table 1: after post-processing, the overall mean and median sensitivity are approximately 83% and 86%, respectively.

Fig. 4. The effect of post-processing on the overall average Dice score: the neighborhood-based validation of positive and negative voxels was investigated separately. The overall Dice score is plotted against the threshold \(\theta \). The overall Dice score without post-processing can be read at the \(\theta = 0\) end of the curve validating the positives and at the \(\theta = 1\) end of the curve validating the negatives. Both curves have wide ranges of \(\theta \) that lead to an improved overall Dice score.

Figure 3 exhibits the main accuracy indicators for each individual volume, before and after post-processing. The indicator values were sorted in increasing order for better visibility. The final overall mean values of specificity and sensitivity are 98.6% and 83.1%, respectively. Approximately 10% of the volumes were segmented with lower accuracy, characterized by a Dice score below 60%, while almost two thirds of the volumes received a Dice score above the overall mean.

Figure 4 presents the separate effect of each component of the post-processing, namely the validation of positive and negative voxels after classification. Both curves plot the achieved overall mean Dice score against the threshold \(\theta \). In case of positive voxels, the mean Dice score rises with \(\theta \), reaches its maximum around \(\theta =0.4\), and rapidly drops for higher threshold values. In case of negative voxels, the mean Dice score rises rapidly with \(\theta \), reaches its maximum around \(\theta =0.35\), and slightly drops for higher threshold values. Both curves have wide ranges of \(\theta \) that lead to an improved overall Dice score. Our choice was to validate both positive and negative voxels using the threshold value \(\theta =0.4\).

Fig. 5. The effect of post-processing in case of \(n_T=125\) and \(p_N=88\%\): (a) Dice scores after post-processing plotted against Dice scores without post-processing; (b) Dice scores without post-processing plotted against actual tumor size; (c) Dice scores after post-processing plotted against actual tumor size. The straight lines in (b) and (c) indicate the linear trend of the Dice scores, extracted with linear regression.

Fig. 6. Sixteen consecutive slices of an identified tumor. Black pixels represent true positives; red and blue ones stand for false positives and false negatives, respectively. The Dice score for this volume was 0.936. (Color figure online)

Fig. 7. Sixteen consecutive slices of an identified tumor. Black pixels represent true positives; red and blue ones stand for false positives and false negatives, respectively. The Dice score for this volume was 0.75. (Color figure online)

Figure 5 presents the effect of the proposed post-processing. Figure 5(a) plots the individual Dice scores of each volume after post-processing against those before post-processing, indicating that post-processing had a significant beneficial effect in the great majority of cases; only 6% of the volumes were pushed slightly toward worse accuracy. Figures 5(b) and (c) plot the individual Dice scores obtained for each volume against the size of the tumor, without and with post-processing, respectively. The identified linear trends show that the strongest effect of post-processing occurs in case of small tumors.

Figure 6 exhibits the segmentation result of 16 consecutive slices from a high-grade tumor volume. Most tumor pixels were accurately identified in this case, as we can only see a few false negatives beside the true positives indicated by black pixels. This is one of the cases that were segmented with high accuracy. A worse, but still acceptable case is shown in Fig. 7.

The total runtime of the testing process for a single volume ranges between 30 and 45 s when executed on a single core of a PC with an i7 processor running at 3.4 GHz. Most operations can easily be implemented to run in parallel on all cores, making the processing even more efficient.

The overall mean Dice score above 80% allows us to detect the presence of the tumor in the great majority of cases. However, the accuracy indicators can be further improved in the following ways:

1. Involving further morphological features in the feature vector, to collect more information from the neighborhood of each voxel.

2. Including more sophisticated features, for example those obtained via the wavelet transform, or employing fractal features.

3. Employing an effective feature selection scheme.

4. Implementing a more complex post-processing that investigates the contiguous ensembles of detected tumor voxels and discards small ones.

An objective comparison with the existing methods enumerated in Sect. 1 is not an easily accomplishable task, as not all of them used the BRATS data set for evaluation, and even those that did were not evaluated on all 220 available volumes.

4 Conclusions

In this paper we presented an automatic tumor detection and segmentation algorithm employing random forests of binary decision trees, in its preliminary stage of implementation. The proposed methodology already reliably detects tumors of 2 cm diameter. Finer segmentation accuracy is likely to be achieved in the future by implementing some of the ideas mentioned above. We will also concentrate on differentiating among the parts of the whole tumor (edema, tumor core, necrosis, active tumor), according to the ground truth provided by the BRATS data set.