1 Introduction

Gliomas account for around 45 % of primary brain tumors. The prognosis of gliomas depends on multiple factors, such as age, histopathology, tumor size and location, extent of resection, and type of treatment. The most deadly gliomas are classified by the World Health Organization (WHO) as Grade III (anaplastic astrocytoma and anaplastic oligodendroglioma) and Grade IV (glioblastoma multiforme), according to their histopathological subtypes. These are referred to as high-grade gliomas (HGG) and are characterized by fast growth and diffuse infiltration. More importantly, HGG have a very poor prognosis, but the outcome (i.e., the overall survival time) differs significantly from case to case. This can be explained by a large variation in tumor characteristics (e.g., location and cancer cell type). Although still challenging, pre-operative prediction of HGG outcome is of great importance and is highly desired by clinicians.

Multimodal presurgical brain imaging has been gaining ground in surgical planning. In turn, this produces abundant multimodal neuroimaging information for potential HGG outcome prediction. For instance, in [1], multiple features reflecting the intensity distributions of various magnetic resonance imaging (MRI) sequences were extracted to predict patient survival time and the molecular subtype of glioblastoma. In [2], morphologic features and hemodynamic parameters, along with clinical and genomic biomarkers, were used to predict the outcome of glioblastoma patients. In [3], data mining techniques based on image attributes from MRI produced better HGG outcome prediction performance than using histopathologic information alone. Although promising, all these studies share a first key limitation: they overlooked the relationship between brain connectivity and the outcome. In other words, they mainly relied on extracting information from the tumor region (i.e., tumor and necrotic tissue) or around it (e.g., the edema region). This excludes the majority of the “normal appearing” brain tissue, which most likely has also been affected by the tumor. Our hypothesis is rooted in the fact that HGG diffuses extensively along white matter fiber tracts, thus altering the brain structural connectivity. In turn, altered structural connectivity will lead to altered functional connectivity. Moreover, the mass effect, edema and abnormal neovascularization may further change brain functional and structural connectivity. Therefore, connectomics data may provide useful information that is complementary to intensity-based survival time prediction. A second key limitation of previous studies is that none of them compared the prediction performance obtained with conventional clinical data against that obtained with advanced connectomics data from multiple modalities. We aimed to address both limitations.

Conventional neuroimaging computing methods, such as graph-theory-based complex network analysis, have demonstrated promising value in disease classification and biomarker detection [4]. However, to the best of our knowledge, no previous study has utilized the brain connectome to predict the treatment outcome for HGG patients. In this work, we hypothesize that gliomas have ‘diffusive effects’ on both structural and functional connectivity, involving both white matter and grey matter, which could alter the inherent brain connectome and lead to abnormalities in network attributes. Hence, we devise an HGG outcome prediction framework that extracts, integrates and selects the best set of advanced brain connectome features.

Specifically, we retrospectively divided the recruited HGG patients into short and long survival time groups based on the follow-up of a large number of glioma patients. Our method comprises the following key steps. First, we construct both functional and structural brain networks. Second, we extract structural and functional connectomics features using diverse network metrics. Third, we propose a novel framework that effectively reduces the dimension of the connectomics features by selecting the most discriminative features step by step in a three-stage strategy. Finally, we use a support vector machine (SVM) to predict the outcome.

2 Method

Figure 1 illustrates the proposed pipeline to automatically predict the survival time for HGG patients in three steps. In Sect. 2.1, we introduce the construction of brain networks based on resting-state functional MRI (rs-fMRI) and diffusion tensor imaging (DTI). In Sect. 2.2, we describe how to calculate network properties based on graph theory using a binary graph and a weighted graph. After adding clinical information, such as tumor location, size and histopathological type, we obtain a long stacked feature vector. In Sect. 2.3, we propose a three-stage feature selection algorithm to remove redundant features. Finally, we apply an SVM to the selected features to predict the treatment outcome.

Fig. 1. Proposed pipeline of treatment outcome prediction for high-grade glioma patients. (K: degree; L: shortest path length; C: clustering coefficient; B: betweenness centrality; \( E_{g} \): global efficiency; \( E_{l} \): local efficiency; OS: overall survival. For details, please see Sect. 2.2.)

2.1 Brain Network Construction

Subjects. A total of 147 HGG patients were originally included in this study. We excluded patients lacking either rs-fMRI or DTI data. Patients with inadequate follow-up time, or who died from other causes (e.g., a road accident), were also excluded. Those with significant image artifacts or excessive head motion, as identified during the data processing described below, were also removed. All the images were checked by three experts to assess the deformation of the brain caused by the tumor. Those judged by all three experts to have severe deformation were removed too. Finally, 34 patients who died within 650 days after surgery were labeled as the “bad” outcome group, and the remaining 34 patients who survived more than 650 days after surgery were labeled as the “good” outcome group. The reason for using 650 days as the threshold is that the two-year survival rate for malignant glioma patients was reported to be 51.7 % [5]. We slightly adjusted the threshold to balance the sample sizes of the two groups.

Imaging. In addition to the conventional clinical imaging protocols, research-dedicated whole-brain rs-fMRI and DTI data were also collected preoperatively. The rs-fMRI has TR (repetition time) = 2 s, number of acquisitions = 240 (8 min), and voxel size = \( 3.4 \times 3.4 \times 4\ {\text{mm}}^{3} \). The DTI has 20 directions, voxel size = \( 2 \times 2 \times 2\ {\text{mm}}^{3} \), and multiple acquisition = 2.

Clinical Treatment and Follow-up. All patients were treated according to the clinical guidelines for HGG, including total or sub-total resection of the tumor during craniotomy, followed by radiotherapy and chemotherapy. They were followed up on a schedule, e.g., at 3, 6, 12, 24, 36, and 48 months after discharge. Any vital event, such as death, was reported to us so that we could calculate the overall survival time.

Image Processing. SPM8 and DPARSF [6] were used to preprocess the rs-fMRI data and build functional brain networks. FSL and PANDA [7] were used to process the DTI data and build structural brain networks. Multimodal images were first co-registered within each subject and then registered to the standard space. All these steps follow commonly accepted pipelines and are thus not detailed here.

Network Construction. For each subject, two types of brain networks were constructed (see descriptions below). For each network, we calculated graph theory-based properties from both binary and weighted graphs.

  • Structural Brain Network. We parcellated each brain into 116 regions using the Automated Anatomical Labeling (AAL) atlas, by warping the AAL template to each individual brain. The parcellated ROIs in each subject were used as graph nodes. The weighted network \( N_{s}^{w} \) can be constructed by calculating the structural connectivity strength \( w_{s} (i,j) = \frac{2}{S_{i} + S_{j}} \sum\nolimits_{i,j \in N} l(f) \) for the edge connecting nodes i and j \( (i,j \in N; i \ne j) \), where \( N \) is the set of all 116 nodes in the network, \( l(f) \) represents the number of fibers linking each pair of the ROIs, and \( S_{i} \) denotes the cortical surface area of node i. The sum \( S_{i} + S_{j} \) corrects the bias caused by different ROI sizes. The binary structural network \( N_{s}^{b} \) can be generated by ranking the \( w_{s} \) in descending order and setting the weight of the top 15 % of edges to 1 and the others to 0 [8].

  • Functional Brain Network. Using the same parcellation, we extracted the mean BOLD time series \( TS_{i} (i \in N) \) of each brain region. Then, we defined the functional connectivity strength \( w_{f} (i,j) \) in the functional network as the Pearson correlation between the BOLD time series of each pair \( (i,j) \) of the 116 brain regions: \( w_{f} (i,j) = Corr( TS_{i} , TS_{j} ) \) \( (i,j \in N; i \ne j) \), thus generating a weighted functional brain network \( N_{f}^{w} \). The binary functional network \( N_{f}^{b} \) can be generated in the same way as described above.
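The two constructions above can be illustrated with a minimal sketch in plain Python. It is not the SPM8/DPARSF/PANDA pipeline used in this work: the function names are hypothetical, the toy data (10 regions, 50 time points) stand in for the 116 regions and 240 volumes, and ranking edges by signed weight (rather than absolute value) before keeping the top 15 % is an assumption.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def functional_network(ts):
    """Weighted network with w_f(i, j) = Corr(TS_i, TS_j) and a zero diagonal."""
    n = len(ts)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = pearson(ts[i], ts[j])
    return w

def structural_weight(fiber_count, s_i, s_j):
    """w_s(i, j): fiber count between two ROIs scaled by 2 / (S_i + S_j)."""
    return 2.0 * fiber_count / (s_i + s_j)

def binarize_top(w, fraction=0.15):
    """Set the top `fraction` of edges (ranked by weight) to 1, the rest to 0."""
    n = len(w)
    edges = sorted(((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    keep = round(fraction * len(edges))
    b = [[0] * n for _ in range(n)]
    for _, i, j in edges[:keep]:
        b[i][j] = b[j][i] = 1
    return b

# Toy data (assumption): 10 regions x 50 time points of Gaussian noise.
rng = random.Random(0)
ts = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(10)]
w_f = functional_network(ts)   # weighted functional network
b_f = binarize_top(w_f)        # binary functional network
n_edges = sum(map(sum, b_f)) // 2
```

With 10 regions there are 45 candidate edges, so the 15 % threshold keeps 7 of them; with 116 regions the same rule would apply to 6670 candidate edges.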

2.2 Feature Extraction

Graph theory-based complex network analysis is used to independently extract multiple features from four networks (\( N_{s}^{w} \), \( N_{s}^{b} \), \( N_{f}^{w} \), \( N_{f}^{b} \)) for each subject. Since various graph metrics can reflect different organizational properties of the networks, we calculated four types of these metrics (i.e., degree, small-world properties, network efficiency properties, and nodal centrality) [9], which are detailed below.

  • Degree. In each binary network, \( N_{s}^{b} \) and \( N_{f}^{b} \), the degree of node i, \( k_{i} \), counts the number of edges linked to it. In each of the weighted networks, \( N_{s}^{w} \) and \( N_{f}^{w} \), the node degree is defined as \( k_{i} = \sum\nolimits_{j \in N, j \ne i} w_{*} (i,j) \), where \( * \) refers to s or f.

  • Small-world property. These properties were originally used to describe small-world networks, and can also be calculated separately for each node. They include the clustering coefficient \( C_{i} \) (which measures the local interconnectivity of node i’s neighbors) and the shortest path length \( L_{i} \) (which measures the overall communication speed between node i and all other nodes). Specifically, in \( N_{s}^{b} \) and \( N_{f}^{b} \), \( C_{i} \) is calculated by dividing the number of edges connecting \( i \)’s neighbors by the number of all possible edges linking \( i \)’s neighbors (i.e., \( k_{i} (k_{i} - 1)/2 \)). On the other hand, in \( N_{s}^{w} \) and \( N_{f}^{w} \), \( C_{i} \) is calculated as a normalized sum of the mean weight of the two participating edges in all triangles with node i as a vertex. \( L_{i} \) is defined as the averaged minimum number of edges from node i to all other nodes in \( N_{s}^{b} \) or \( N_{f}^{b} \), and as the averaged minimum sum of weighted edges in \( N_{s}^{w} \) and \( N_{f}^{w} \).

  • Network efficiency. The efficiency property of a network measures how efficiently information is exchanged within a network, which gives a precise quantitative analysis of the networks’ information flow. The global efficiency, \( E_{global} (i) \), is defined as the sum of the inverse of the shortest path length between node i and all other nodes. The local efficiency, \( E_{local} (i) \), represents the global efficiency of a subgraph, which consists of all node i’s neighbors. The binary and weighted versions of shortest path length can result in binary and weighted efficiency metrics.

  • Nodal centrality. Nodal centrality, \( B_{i} \), quantifies how important node i is in the network. A node with high \( B_{i} \) acts as a hub in the network. It is calculated as \( B_{i} = \sum\nolimits_{m \ne n \ne i \in N} \frac{L_{mn} (i)}{L_{mn}} \), where \( L_{mn} \) is the total number of shortest paths from node m to node n, and \( L_{mn} (i) \) is the number of these shortest paths passing through node i. Since \( L_{mn} (i) \) and \( L_{mn} \) have both binary and weighted versions, \( B_{i} \) is calculated for each binary and weighted network.
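As an illustration of the binary versions of the six nodal metrics listed above, the following sketch computes them on a four-node toy graph in plain Python. It is not the GRETNA implementation. The function names are hypothetical, the efficiencies are normalized as means over node pairs (an assumption), unreachable nodes contribute zero efficiency, and the betweenness sum counts ordered pairs (m, n).

```python
from collections import deque

def neighbors(adj, i):
    return [j for j, a in enumerate(adj[i]) if a and j != i]

def degree(adj, i):
    """k_i: number of edges linked to node i."""
    return len(neighbors(adj, i))

def clustering(adj, i):
    """C_i: edges among i's neighbors divided by k_i (k_i - 1) / 2."""
    nb = neighbors(adj, i)
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(adj[u][v] for a, u in enumerate(nb) for v in nb[a + 1:])
    return links / (k * (k - 1) / 2)

def bfs_distances(adj, src):
    """Minimum number of edges from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors(adj, u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length(adj, i):
    """L_i: mean shortest path length from i to the other reachable nodes."""
    d = [v for node, v in bfs_distances(adj, i).items() if node != i]
    return sum(d) / len(d) if d else float("inf")

def global_efficiency(adj, i):
    """Nodal E_global: mean inverse shortest path length to all other nodes."""
    dist = bfs_distances(adj, i)
    return sum(1.0 / v for node, v in dist.items() if node != i) / (len(adj) - 1)

def local_efficiency(adj, i):
    """E_local: global efficiency of the subgraph formed by i's neighbors."""
    nb = neighbors(adj, i)
    k = len(nb)
    if k < 2:
        return 0.0
    sub = [[adj[u][v] for v in nb] for u in nb]
    total = 0.0
    for a in range(k):
        d = bfs_distances(sub, a)
        total += sum(1.0 / v for node, v in d.items() if node != a)
    return total / (k * (k - 1))

def betweenness(adj):
    """B_i for every node via Brandes' accumulation over all sources."""
    n = len(adj)
    b = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma = [0] * n   # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in neighbors(adj, u):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = [0.0] * n
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                b[v] += delta[v]
    return b

# Toy graph (assumption): triangle 0-1-2 with node 3 attached to node 2.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
```

In this toy graph node 2 is the only hub: every shortest path between node 3 and nodes 0 or 1 passes through it, so it is the only node with non-zero betweenness.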

These network metrics, which are used as connectomics features, were computed using GRETNA [8]. We also add to them 13 clinical features (age, gender, tumor size, WHO grade, histopathological type, main location, epilepsy or not, specific location in all lobes, and hemisphere of tumor tissue). Therefore, a total of 2797 features (6 metrics \( \times \) 4 networks \( \times \) 116 regions \( + \) 13 clinical features) were used for each subject. The number of features is much greater than the number of samples (68 subjects). This is problematic for machine learning-based methods because of overfitting and interference from noise. Thus, we design a three-stage feature selection framework, as specified below, to select the most relevant features for our classification (i.e., prediction) problem.
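The feature count can be checked with a short sketch that enumerates one name per feature. The naming scheme and the ordering of the stacked vector are hypothetical and serve only to make the arithmetic explicit.

```python
# Labels follow the abbreviations of Fig. 1; the name format is an assumption.
metrics = ["K", "L", "C", "B", "Eg", "El"]       # 6 nodal metrics
networks = ["Ns_w", "Ns_b", "Nf_w", "Nf_b"]      # 4 networks per subject
n_regions = 116                                  # AAL regions
n_clinical = 13                                  # clinical features

names = [f"{m}|{net}|roi{r:03d}"
         for net in networks
         for m in metrics
         for r in range(1, n_regions + 1)]
names += [f"clinical{c:02d}" for c in range(1, n_clinical + 1)]

n_features = len(names)   # 6 * 4 * 116 + 13
```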

2.3 Three-Stage Feature Selection

To identify a small number of features that are optimal for treatment outcome prediction, we propose a three-stage feature selection method to gradually select the most relevant features.

  • First stage. We coarsely select the features that significantly distinguish the two outcome groups (i.e., “bad” and “good”) using two-sample t-tests with \( p < 0.05 \).

  • Second stage. RELIEFF [10] is used to rank the remaining features \( {\mathbf{X}} \) and compute their weights. RELIEFF is an algorithm that estimates feature quality for classification. Many heuristic measures of feature quality assume that features are independent, whereas in practice they may be dependent. RELIEFF can still estimate the quality of each feature correctly when features are strongly dependent. Its main idea is to estimate how well the values of each feature distinguish between instances that are close to each other. Given a randomly selected instance \( R \), RELIEFF first searches for its k nearest neighbors from the same class as \( R \) (called nearest hits \( H \)) and its k nearest neighbors from each different class (called nearest misses \( M \)). Then, it updates the quality estimation \( {\text{W}}({\mathbf{A}}) \) for all features in the feature set \( {\mathbf{A}} \) based on the distance from \( R \) to \( H \) and the distance from \( R \) to \( M \). Finally, the features in \( {\mathbf{X}} \) are ranked in descending order of \( {\text{W}}({\mathbf{A}}) \).

  • Third stage. A sequential backward selection [11] strategy is applied to select a small group of significant features from \( {\mathbf{X}} \), with an inner SVM wrapped into the feature selection framework to evaluate the predictive accuracy of each candidate subset of features using leave-one-out cross validation. Sequential backward selection removes one feature at a time from \( {\mathbf{X}} \), proceeding from the back (lowest-ranked) to the front (highest-ranked), and the classification accuracy is recorded for each remaining subset. When no feature is left, the selection process stops and the subset of \( {\mathbf{X}} \) with the highest classification accuracy is selected.
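The three stages can be sketched on toy data as follows. This is a simplified illustration under several stated assumptions, not the implementation used in this work: stage 1 uses a Welch t statistic with a rough cutoff |t| > 2 instead of an exact p-value; stage 2 uses basic Relief with one nearest hit and one nearest miss per instance, visiting every instance rather than sampling, as a stand-in for RELIEFF; and stage 3 evaluates subsets with a nearest-centroid classifier as a stand-in for the inner SVM. All function names are hypothetical.

```python
import math
import random

def welch_t(a, b):
    """Two-sample t statistic (unequal variances) for one feature."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def stage1_filter(X, y, t_crit=2.0):
    """Keep features whose |t| exceeds a rough cutoff (stand-in for p < 0.05)."""
    keep = []
    for f in range(len(X[0])):
        a = [row[f] for row, lab in zip(X, y) if lab == 0]
        b = [row[f] for row, lab in zip(X, y) if lab == 1]
        if abs(welch_t(a, b)) > t_crit:
            keep.append(f)
    return keep

def relief_rank(X, y, feats):
    """Simplified Relief: rank features by hit/miss weight, best first."""
    lo = {f: min(r[f] for r in X) for f in feats}
    span = {f: (max(r[f] for r in X) - lo[f]) or 1.0 for f in feats}

    def diff(f, r1, r2):
        return abs(r1[f] - r2[f]) / span[f]

    def dist(r1, r2):
        return sum(diff(f, r1, r2) for f in feats)

    w = {f: 0.0 for f in feats}
    for i, r in enumerate(X):
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(len(X)) if y[j] != y[i]]
        h = min(hits, key=lambda o: dist(r, o))     # nearest hit
        m = min(misses, key=lambda o: dist(r, o))   # nearest miss
        for f in feats:
            w[f] += (diff(f, r, m) - diff(f, r, h)) / len(X)
    return sorted(feats, key=lambda f: w[f], reverse=True)

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i in range(len(X)):
        d = {}
        for c in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            cent = [sum(r[f] for r in rows) / len(rows) for f in feats]
            d[c] = sum((X[i][f] - cf) ** 2 for f, cf in zip(feats, cent))
        correct += min(d, key=d.get) == y[i]
    return correct / len(X)

def stage3_backward(X, y, ranked):
    """Drop the lowest-ranked feature one at a time; keep the best subset seen."""
    best, best_acc = list(ranked), loo_accuracy(X, y, ranked)
    for k in range(len(ranked) - 1, 0, -1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc >= best_acc:   # ties favor the smaller subset (assumption)
            best, best_acc = ranked[:k], acc
    return best, best_acc

# Toy data (assumption): 20 subjects, 6 features, features 0 and 1 informative.
rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(4.0 * lab if f < 2 else 0.0, 1.0) for f in range(6)] for lab in y]

kept = stage1_filter(X, y)
ranked = relief_rank(X, y, kept)
selected, acc = stage3_backward(X, y, ranked)
```

Removing features strictly in rank order keeps the search linear in the number of ranked features, at the cost of never revisiting a feature once it has been dropped.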

Next, the selected features are fed into an outer SVM with a leave-one-out cross validation to build the prediction model. To test which features are more useful for outcome prediction, we conducted five experiments, where different features were combined in different ways for classification (see Sect. 3).
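One reading of the inner and outer loops is a nested leave-one-out scheme, in which feature selection is rerun on the training subjects of every outer fold and the held-out subject is used only for testing. The sketch below illustrates that structure under this assumption. The `select` and `fit_predict` callbacks are hypothetical placeholders (a top-1 mean-difference selector and a nearest-centroid classifier) standing in for the three-stage selection and the outer SVM.

```python
def nested_loo(X, y, select, fit_predict):
    """Outer leave-one-out loop; selection sees only the training subjects."""
    correct, counts = 0, {}
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select(X_tr, y_tr)
        for f in feats:
            counts[f] = counts.get(f, 0) + 1   # selection frequency per feature
        correct += fit_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X), counts

def class_mean(X, y, c, f):
    vals = [row[f] for row, lab in zip(X, y) if lab == c]
    return sum(vals) / len(vals)

def select_top1(X, y):
    """Placeholder selector: the feature with the largest class-mean gap."""
    gaps = [abs(class_mean(X, y, 0, f) - class_mean(X, y, 1, f))
            for f in range(len(X[0]))]
    return [gaps.index(max(gaps))]

def nearest_centroid(X, y, x_test, feats):
    """Placeholder classifier standing in for the outer SVM."""
    d = {c: sum((x_test[f] - class_mean(X, y, c, f)) ** 2 for f in feats)
         for c in (0, 1)}
    return min(d, key=d.get)

# Toy data (assumption): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 1.0], [1.0, 0.0], [0.5, 1.0], [5.0, 0.0], [6.0, 1.0], [5.5, 0.0]]
y = [0, 0, 0, 1, 1, 1]
accuracy, counts = nested_loo(X, y, select_top1, nearest_centroid)
```

The `counts` dictionary records how often each feature is selected across the outer folds, which is one way to obtain a per-feature selection frequency.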

3 Results

The outcome prediction accuracy of our proposed framework is displayed in Table 1. Using only clinical features, the prediction accuracy reaches only 63.2 %. Notably, when using only the features from functional networks, the accuracy increases to 72 %. When we combine structural network features with functional ones, the classification rate reaches its peak (75 %, better than when using only clinical features). However, no improvement was noted when clinical features were further added, which suggests that the information contained in the clinical features is already represented to some extent in the graph-theoretical properties of the functional and structural brain networks. To test whether our results could have arisen by chance, we also ran a permutation test with 30 permutations. The p-value of the permutation test was 0.015 and the mean accuracy over the 30 permutations was 49.1 %, which suggests that our results reflect intrinsic properties of the data to some degree. The most significant features shown in Table 2 (also drawn in Fig. 2) are those that were selected by our three-stage feature selection strategy more than 60 times out of 68 trials.
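A label permutation test of this kind can be sketched as follows. The exact procedure behind the reported p-value is not described in the text, so the p-value convention used here, (1 + number of permuted scores at least as high as the observed score) / (1 + number of permutations), is an assumption, as are the function names. In practice `score_fn` would rerun the whole prediction pipeline on the given labels; here a trivial agreement score on toy labels stands in for it.

```python
import random

def permutation_test(y, score_fn, n_perm=30, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(y)
    null = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)              # break the label/feature association
        null.append(score_fn(shuffled))
    exceed = sum(s >= observed for s in null)
    p_value = (1 + exceed) / (n_perm + 1)  # assumed convention
    return observed, sum(null) / n_perm, p_value

# Toy stand-in (assumption): fixed predictions scored against the given labels.
labels = [0] * 10 + [1] * 10
predictions = labels[:]

def agreement(lab):
    return sum(p == t for p, t in zip(predictions, lab)) / len(lab)

observed, null_mean, p_value = permutation_test(labels, agreement)
```

Under this convention the smallest attainable p-value with 30 permutations is 1/31, so a larger number of permutations is needed to resolve smaller values.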

Table 1. Prediction accuracy of using different sets of features.
Table 2. The most useful features for outcome prediction.
Fig. 2. Discriminative ROIs with high predictive power in functional and structural brain networks, respectively.

As reported in many previous studies, the most useful regions for HGG outcome prediction are closely related to movement, cognition, emotion, language and memory functions. Deteriorated structural and functional connections to these regions could influence the survival time. The most frequently selected ROIs from the functional network are those in the cerebellum, which have dense functional connectivity to the neocortex and are closely associated with motor and cognitive functions. In contrast, the most frequently selected ROIs from the structural network are mostly located in the cortex and overlap less with each other, which may indicate that the structural network is easily affected by brain tumors.

4 Conclusion and Future Works

In this paper, we have shown that complex brain network analysis based on graph theory is a powerful tool for treatment outcome prediction for high-grade glioma patients. Our findings highlight the relevance of integrating functional and structural brain connectomics for HGG outcome prediction. Although the relationship between structural and functional brain networks is still poorly understood, our prediction framework benefitted markedly from the use of brain connectomics for prognosis evaluation. In future work, we will incorporate global graph properties (e.g., the averaged clustering coefficient or network efficiency across all brain regions) as new features. In this way, the individual heterogeneity of tumor characteristics can be better addressed. Also, more advanced graph metrics, e.g., assortativity, modularity, and rich-club value, can be taken into account for a more comprehensive network measurement. Moreover, intraoperatively derived features, e.g., the extent of tumor resection, can also be integrated as important prognostic predictors.