Network-based support vector machine for classification of microarray samples
Abstract
Background
The importance of network-based approaches to identifying biological markers for diagnostic classification and prognostic assessment in the context of microarray data has been increasingly recognized. To our knowledge, there have been few, if any, statistical tools that explicitly incorporate prior information on gene networks into classifier building. The main idea of this paper is to take full advantage of the biological observation that neighboring genes in a network tend to function together in biological processes and to embed this information into a formal statistical framework.
Results
We propose a network-based support vector machine for binary classification problems by constructing a penalty term from the F_{∞}-norm applied to pairs of neighboring genes, with the aim of improving predictive performance and gene selection. Simulation studies in both low- and high-dimensional data settings, as well as two real microarray applications, indicate that the proposed method is able to identify more clinically relevant genes while maintaining a sparse model with either similar or higher prediction accuracy compared with the standard and the L_{1}-penalized support vector machines.
Conclusion
The proposed network-based support vector machine has the potential to be a practically useful classification tool for microarrays and other high-dimensional data.
Keywords
Support vector machine; cancer gene; classification error; neighboring gene; heavy weight
List of abbreviations used
SVM: support vector machine
STD-SVM: standard support vector machine
L1-SVM: L_{1}-penalized support vector machine
LP: linear programming
PD: Parkinson's disease
BC: breast cancer
KEGG: Kyoto Encyclopedia of Genes and Genomes
PPI: protein-protein interaction
Background
The past two decades have witnessed rapid advances in gene expression profiling with microarray technology, which not only brighten the prospect of deciphering the complexity of disease genesis and progression at the genomic level, but also revolutionize diagnostic, therapeutic, and prognostic approaches. Until recently, diagnostic classification and prognostic assessment were based on conventional clinical and pathological risk factors, such as patient age and tumor size, many of which are believed to be secondary manifestations [1]. The advent of microarray technology allows researchers to explore primary disease mechanisms by comparing gene expression profiles of malignant and normal cells. The regularity and aberration in the expression patterns of certain genes shed light on their functions and pathological importance [2]. Studies that seek to identify gene markers to refine diagnostic classification and improve prognostic prediction in the context of gene expression data have enriched the literature [3, 4, 5]. In recent years, researchers have realized that gene markers identified from microarrays in different studies of the same disease across similar cohorts lack consistency [6, 7]. A possibly more effective means to resolve this problem is to employ a network-based approach, that is, to identify markers as gene subnetworks, defined as groups of functionally related genes based on a gene network, instead of treating individual genes as completely independent and identical a priori as in most existing approaches [1].
A novel network-based approach proposed recently [1, 8] can be summarized as follows: (1) randomly searching subnetworks and assigning to each subnetwork a score that characterizes the subnetwork-wise gene expression level; (2) identifying significant subnetworks that discriminate the clinical outcome well; (3) constructing a classifier based on the significant subnetworks with a conventional statistical tool, such as logistic regression. Essentially, such a network-based approach aggregates gene expression data at the subnetwork level and then identifies and utilizes some significant subnetworks. It has been shown that such an approach not only improves predictive performance and reproducibility, but also provides biological insight into the molecular mechanisms underlying the clinical outcome. However, the above method is largely heuristic, without a formal statistical framework; more importantly, it involves a random search over subnetworks, leading to possibly different results from different runs, with no guarantee of the optimality of the final result. Because of the ever-increasing popularity of penalization methods for high-dimensional data, we propose a novel network-based penalty to be used with the hinge loss, leading to a network-based support vector machine. While maintaining some desirable properties of the support vector machine (SVM) with the hinge loss function, the network-based penalty directly integrates a biological network to realize more effective variable selection than generic methods such as the standard SVM (STD-SVM) or the L_{1}-penalized SVM (L1-SVM).
The support vector machine (SVM) is one of the most popular supervised learning techniques, with wide-ranging applications [9, 10]. In particular, previous studies have demonstrated its superior performance in gene expression data analysis, especially its ability to handle high-dimensional data [11, 12]. Nevertheless, with categorical predictors, both the STD-SVM and the L1-SVM may have some shortcomings. Zou and Yuan [13] applied the concept of grouped variable selection and developed an F_{∞}-norm penalized SVM to realize simultaneous selection/elimination of all the features derived from the same categorical factor (or a group of variables). Their numerical examples showed that the F_{∞}-norm SVM outperformed the L1-SVM in factor-wise variable selection. We extend the idea of variable grouping to gene networks: rather than grouping all the dummy variables created from the same categorical factor, we treat two neighboring genes in a network as one group. The network-based penalty is constructed as the sum of the F_{∞}-norms applied to the groups of neighboring-gene pairs. With the hinge loss penalized by such a network-based penalty as our objective function, we obtain our network-based SVM. The remaining sections are organized as follows. We begin with a brief review of the SVM, and then introduce our proposed network-based SVM. We evaluate its performance by simulation studies in both low-dimensional and high-dimensional data settings as well as two real data applications. The last section concludes the paper with a brief summary.
Methods
Existing methods
where the subscript "+" denotes the positive part, i.e., z_{+} = max{z, 0}, $\Vert \beta \Vert_{2}^{2} = \sum_{k=1}^{p} |\beta_{k}|^{2}$, and λ is the tuning parameter. The solution to (1) is the same as that to (2).
where $\Vert \beta \Vert_{1} = \sum_{k=1}^{p} |\beta_{k}|$. The L1-SVM outperforms the STD-SVM when the true model is sparse, while the STD-SVM is preferred if there are not many redundant noise features [16].
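As a rough illustration of the two penalized objectives discussed above, the sketch below evaluates the hinge loss plus either the squared L_{2} penalty (STD-SVM) or the L_{1} penalty (L1-SVM) for a given linear classifier. The function names and the toy data are hypothetical and serve only to make the definitions concrete; class labels are assumed to be coded as -1 and +1.

```python
def hinge_loss(X, y, beta, beta0):
    """Sum over samples of [1 - y_i * (beta0 + x_i' beta)]_+ with y_i in {-1, +1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * (beta0 + sum(b * x for b, x in zip(beta, x_i)))
        total += max(1.0 - margin, 0.0)  # positive part z_+ = max{z, 0}
    return total


def std_svm_objective(X, y, beta, beta0, lam):
    """Hinge loss plus lam times the squared L2 norm of beta."""
    return hinge_loss(X, y, beta, beta0) + lam * sum(b * b for b in beta)


def l1_svm_objective(X, y, beta, beta0, lam):
    """Hinge loss plus lam times the L1 norm of beta."""
    return hinge_loss(X, y, beta, beta0) + lam * sum(abs(b) for b in beta)


# Toy data (hypothetical): three samples, two features.
X = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
y = [1, 1, -1]
beta, beta0 = [0.5, -0.5], 0.0

print(hinge_loss(X, y, beta, beta0))                  # 3.5
print(std_svm_objective(X, y, beta, beta0, lam=2.0))  # 4.5
print(l1_svm_objective(X, y, beta, beta0, lam=2.0))   # 5.5
```

Both objectives share the same loss and differ only in the penalty, which is why the choice between them comes down to how sparse the true model is expected to be.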
Zou and Yuan [13] pointed out a shortcoming of the L_{1}-norm penalty: even though it encourages parsimonious models, it does not guarantee successful models when predictors are categorical, because each dummy variable is selected independently. They applied the concept of grouped variable selection and proposed an F_{∞}-norm SVM to realize simultaneous selection/elimination of features derived from the same factor so as to accomplish automatic factor-wise variable selection. Suppose we have G factors F_{1},...,F_{G}. From each factor F_{g}, we generate a feature vector $x_{(g)} = (x_{1}^{(g)}, \cdots, x_{j}^{(g)}, \cdots, x_{n_{g}}^{(g)})^{T}$.
The most noteworthy property of the F_{∞}-norm SVM is its guarantee of sparsity at the factor level. Because of the singularity of the infinity norm, that is, because $\Vert \beta_{(g)} \Vert_{\infty}$ is not differentiable at β_{(g)} = 0, β_{(g)} will be exactly zero if the regularization parameter λ is properly chosen [13]. Therefore, the F_{∞}-norm SVM automatically eliminates factors that are completely irrelevant to the response, and thus achieves the goal of factor-wise selection. Empirical evidence shows that the F_{∞}-norm SVM often outperforms both the L1-SVM and the STD-SVM.
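For concreteness, the F_{∞}-norm penalized objective described above can be written out as follows. This is a sketch in the notation of this section, assuming that β_{(g)} collects the coefficients of the n_{g} features generated from factor F_{g}, that x_{i,(g)} is the corresponding feature vector of sample i, and that the loss is the same hinge loss used by the other SVM variants; the displayed form in [13] may differ in minor details.

```latex
\min_{\beta_0,\,\beta}\;
\sum_{i=1}^{n}\Bigl[\,1 - y_i\Bigl(\beta_0 + \sum_{g=1}^{G} x_{i,(g)}^{T}\beta_{(g)}\Bigr)\Bigr]_{+}
\;+\; \lambda \sum_{g=1}^{G} \Vert \beta_{(g)} \Vert_{\infty},
\qquad
\Vert \beta_{(g)} \Vert_{\infty} = \max_{1 \le j \le n_g} \bigl|\beta_{j}^{(g)}\bigr| .
```

Because the penalty on a group depends only on its largest absolute coefficient, driving that maximum to zero sets every coefficient in the group to zero at once, which is the factor-level sparsity noted above.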
New method
Biological observations reveal that neighboring genes in a network tend to function together in biological processes. To incorporate this prior information, we propose a network-based SVM for binary classification that is intended to produce models that extract more biological insight from gene expression data. The penalty term that characterizes the network structure is specified by embedding the F_{∞}-norm in the context of known functional interrelationships among genes, treating each pair of functionally related genes as one group.
Consider a gene network and let S denote the set of all edges, i.e., all pairs of connected genes:
S = {(j_{1}, j_{2}) : gene j_{1} and gene j_{2} are connected}.
Four properties of the penalty term are noteworthy. First, the regularization is performed at the level of grouped genes, with each group containing two neighboring genes in the network. In the case of penalized linear regression, it has been proven that this penalty achieves the goal of eliminating both $\beta_{j_{1}}$ and $\beta_{j_{2}}$ simultaneously if (j_{1}, j_{2}) ∈ S [17]. The automatic selection of grouped features is due to the singularity of the function max{a, b} [13]. This formulation reflects our assumption that neighboring genes tend to contribute (or not to contribute) to the same biological process at the same time. Second, the choice of the weight depends on the goal of shrinkage and influences the predictive performance. Consider a network comprised of several subnetworks, each with one regulator and ten target genes. Because of the singularity of the function max{a, b} at a = b, the weighted penalty, in the context of penalized regression, encourages $|\beta_{j_{1}}|/w_{j_{1}} = |\beta_{j_{2}}|/w_{j_{2}}$ [17]. Here we examine three weight functions in particular: w_{k} = 1, w_{k} = $\sqrt{d_{k}}$, and w_{k} = d_{k}, where gene k has d_{k} direct neighbors. The new method encourages $|\beta_{j_{1}}| = |\beta_{j_{2}}|$ if w_{k} = 1, $\frac{|\beta_{j_{1}}|}{\sqrt{d_{j_{1}}}} = \frac{|\beta_{j_{2}}|}{\sqrt{d_{j_{2}}}}$ if w_{k} = $\sqrt{d_{k}}$, and $\frac{|\beta_{j_{1}}|}{d_{j_{1}}} = \frac{|\beta_{j_{2}}|}{d_{j_{2}}}$ if w_{k} = d_{k}. Therefore, heavier weights (from w_{k} = 1 through w_{k} = $\sqrt{d_{k}}$ to w_{k} = d_{k}) favor larger coefficient estimates for genes with more direct neighbors; in other words, heavier weights relax the shrinkage effect for regulators, which are known to be biologically more important.
Because of this property, choosing a heavy weight is a simple strategy for alleviating the bias in the coefficient estimates introduced by penalization, and possibly for improving predictive performance. Our default weight is w_{k} = $\sqrt{d_{k}}$. The weight, considered as another tuning parameter, can be determined by cross-validation or from an independent validation data set, though we do not consider this here. Third, the penalty term, under certain conditions, tends to encourage a grouping effect, where highly correlated predictors tend to have similar coefficient estimates [17, 18, 19, 20]. Fourth, the penalty is piecewise linear, which allows the solution to be found by linear programming (LP), a computationally convenient technique.
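To make the role of the weights concrete, the sketch below evaluates a network penalty of the form suggested by the properties above, namely the sum over edges (j_{1}, j_{2}) ∈ S of max{|β_{j1}|/w_{j1}, |β_{j2}|/w_{j2}}, for the three weight choices. This form is our reading of the text rather than a quoted formula, and the function names and the toy network (one regulator with four targets) are hypothetical.

```python
import math


def degrees(edges):
    """Number of direct neighbors d_k of every gene that appears in an edge."""
    d = {}
    for j1, j2 in edges:
        d[j1] = d.get(j1, 0) + 1
        d[j2] = d.get(j2, 0) + 1
    return d


def network_penalty(beta, edges, weight="sqrt"):
    """Sum over edges of max(|beta_j1| / w_j1, |beta_j2| / w_j2).

    weight: "one" (w_k = 1), "sqrt" (w_k = sqrt(d_k)) or "degree" (w_k = d_k).
    """
    d = degrees(edges)

    def w(k):
        if weight == "one":
            return 1.0
        if weight == "sqrt":
            return math.sqrt(d[k])
        if weight == "degree":
            return float(d[k])
        raise ValueError("unknown weight: %r" % (weight,))

    return sum(max(abs(beta[j1]) / w(j1), abs(beta[j2]) / w(j2))
               for j1, j2 in edges)


# Toy network: regulator "R" connected to four targets, so d_R = 4 and d_T = 1.
edges = [("R", "T1"), ("R", "T2"), ("R", "T3"), ("R", "T4")]
beta = {"R": 4.0, "T1": 2.0, "T2": 2.0, "T3": 2.0, "T4": 2.0}

print(network_penalty(beta, edges, "one"))     # 16.0
print(network_penalty(beta, edges, "sqrt"))    # 8.0
print(network_penalty(beta, edges, "degree"))  # 8.0
```

In this toy example the regulator coefficient is twice as large as the target coefficients. With w_{k} = $\sqrt{d_{k}}$ the two scaled terms on every edge are equal (4/2 = 2/1), so the regulator's larger coefficient costs nothing extra, whereas with w_{k} = 1 it doubles the penalty. This is the sense in which heavier weights relax shrinkage for highly connected genes.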
and $\beta_{j} = \beta_{j}^{+} - \beta_{j}^{-}$, in which $\beta_{j}^{+}$ and $\beta_{j}^{-}$ denote the positive and negative parts of β_{j}. The new method can be easily implemented with the R package lpSolve, as can the L1-SVM. The R package e1071 (with a linear kernel) is used to obtain the solution to the STD-SVM.
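One standard way to cast the penalized hinge-loss problem as a linear program is sketched below. The slack variables ξ_{i} and the per-edge variables η_{j_1 j_2} are names introduced here for illustration, and the penalty is assumed to be λ times the sum over edges of max{|β_{j1}|/w_{j1}, |β_{j2}|/w_{j2}}, as suggested by the properties listed above; the authors' exact formulation may differ.

```latex
\begin{aligned}
\min\quad & \sum_{i=1}^{n} \xi_i \;+\; \lambda \sum_{(j_1, j_2) \in S} \eta_{j_1 j_2} \\
\text{subject to}\quad
& y_i\Bigl(\beta_0^{+} - \beta_0^{-} + \sum_{j=1}^{p} x_{ij}\,(\beta_j^{+} - \beta_j^{-})\Bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n, \\
& \frac{\beta_{j_1}^{+} + \beta_{j_1}^{-}}{w_{j_1}} \le \eta_{j_1 j_2},
  \qquad
  \frac{\beta_{j_2}^{+} + \beta_{j_2}^{-}}{w_{j_2}} \le \eta_{j_1 j_2},
  \qquad (j_1, j_2) \in S, \\
& \beta_j^{+} \ge 0, \quad \beta_j^{-} \ge 0, \qquad j = 0, 1, \dots, p .
\end{aligned}
```

Here the sum β_{j}^{+} + β_{j}^{-} stands in for |β_{j}|: at an optimum at most one of the two parts needs to be nonzero, so each η_{j_1 j_2} equals the larger of the two weighted absolute coefficients on its edge. The objective and all constraints are linear, which is what makes an LP solver applicable.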
Results and discussion
Simulation
We conducted several simulation studies to numerically evaluate the performance of the network-based SVM along with the STD-SVM and L1-SVM. The simulation setups were similar to those in [18]. We started from a simple network consisting of 5 subnetworks, each having a regulator gene t (t = 1,...,5) that regulated 10 target genes, leading to a total of 55 genes (p = 55). We assumed that two out of the five subnetworks were informative; that is, the coefficients of 22 genes were nonzero and thus informative to the outcome, while the remaining 33 noise genes had no effect on the outcome. We generated a simulated data set by the following steps:

Generate the expression level of regulator gene t, X_{ t }~ N (0, 1), t = 1,..., 5, independently.

Assume that the expression level of regulator gene t and each of its regulated genes follow a bivariate normal distribution with correlation 0.7. Thus, the expression level of each target gene regulated by gene t, ${X}_{l}^{(t)}$ ~ N(0.7X_{ t }, 0.51), l = 1,..., 10 and t = 1,..., 5.

Generate the outcome Y from a logistic regression model: Logit(Pr(Y = 1 | X)) = X^{T}β + β_{0}, β_{0} = 2, where X is the vector of the expression levels of all the genes, and coefficient vector $\beta = (\beta_{1}^{(1)}, \dots, \beta_{10}^{(1)}, \dots, \beta_{1}^{(5)}, \dots, \beta_{10}^{(5)})$.
 1. $\beta = \Bigl(5, \underbrace{\tfrac{5}{\sqrt{10}}, \cdots, \tfrac{5}{\sqrt{10}}}_{10}, 5, \underbrace{\tfrac{5}{\sqrt{10}}, \cdots, \tfrac{5}{\sqrt{10}}}_{10}, 0, \cdots, 0\Bigr).$
 2. $\beta = \Bigl(5, \underbrace{\tfrac{5}{\sqrt{10}}, \cdots, \tfrac{5}{\sqrt{10}}}_{10}, 3, \underbrace{\tfrac{3}{\sqrt{10}}, \cdots, \tfrac{3}{\sqrt{10}}}_{10}, 0, \cdots, 0\Bigr).$
 3. $\beta = \Bigl(5, \underbrace{\tfrac{5}{\sqrt{10}}, \cdots, \tfrac{5}{\sqrt{10}}}_{7}, -\tfrac{5}{\sqrt{10}}, -\tfrac{5}{\sqrt{10}}, -\tfrac{5}{\sqrt{10}}, 3, \underbrace{\tfrac{3}{\sqrt{10}}, \cdots, \tfrac{3}{\sqrt{10}}}_{7}, -\tfrac{3}{\sqrt{10}}, -\tfrac{3}{\sqrt{10}}, -\tfrac{3}{\sqrt{10}}, 0, \cdots, 0\Bigr).$
 4. $\beta = \Bigl(5, \underbrace{\tfrac{5}{\sqrt{10}}, \cdots, \tfrac{5}{\sqrt{10}}}_{6}, \underbrace{-\tfrac{5}{\sqrt{10}}, \cdots, -\tfrac{5}{\sqrt{10}}}_{4}, 3, \underbrace{\tfrac{3}{\sqrt{10}}, \cdots, \tfrac{3}{\sqrt{10}}}_{6}, \underbrace{-\tfrac{3}{\sqrt{10}}, \cdots, -\tfrac{3}{\sqrt{10}}}_{4}, 0, \cdots, 0\Bigr).$
This scenario was similar to, but more extreme than, scenario 3.
Simulation results for p = 55. The simulation results were averaged over 100 runs for p = 55 (22 informative and 33 noise genes).
Scenario  Method  Test Error (SE), n = 50  Test Error (SE), n = 100  # False Negatives (SE), n = 50  # False Negatives (SE), n = 100  Model Size (SE), n = 50  Model Size (SE), n = 100
1  STD  0.122 (0.002)  0.096 (0.001)  0.0 (0.0)  0.0 (0.0)  55.0 (0.0)  55.0 (0.0)
1  L1  0.134 (0.003)  0.094 (0.002)  13.1 (0.3)  10.9 (0.4)  12.3 (0.6)  15.3 (0.7)
1  New (w = 1)  0.156 (0.003)  0.105 (0.002)  9.3 (0.4)  2.4 (0.3)  17.0 (0.6)  24.3 (0.6)
1  New (w = $\sqrt{d}$)  0.111 (0.003)  0.068 (0.002)  1.0 (0.3)  0.1 (0.1)  24.7 (0.5)  25.1 (0.4)
1  New (w = d)  0.081 (0.002)  0.059 (0.002)  0.0 (0.0)  0.0 (0.0)  28.6 (0.8)  28.2 (0.8)
2  STD  0.121 (0.002)  0.099 (0.001)  0.0 (0.0)  0.0 (0.0)  55.0 (0.0)  55.0 (0.0)
2  L1  0.133 (0.003)  0.096 (0.001)  13.6 (0.3)  11.1 (0.4)  11.4 (0.5)  15.1 (0.7)
2  New (w = 1)  0.156 (0.003)  0.105 (0.002)  9.6 (0.4)  3.9 (0.3)  16.3 (0.7)  24.7 (0.6)
2  New (w = $\sqrt{d}$)  0.121 (0.003)  0.075 (0.002)  3.0 (0.4)  0.3 (0.1)  22.3 (0.6)  25.2 (0.5)
2  New (w = d)  0.083 (0.002)  0.064 (0.002)  0.0 (0.0)  0.0 (0.0)  28.6 (0.8)  29.0 (0.8)
3  STD  0.162 (0.002)  0.138 (0.001)  0.0 (0.0)  0.0 (0.0)  55.0 (0.0)  55.0 (0.0)
3  L1  0.166 (0.003)  0.131 (0.001)  13.9 (0.2)  11.0 (0.3)  11.2 (0.5)  16.6 (0.7)
3  New (w = 1)  0.177 (0.003)  0.140 (0.002)  12.4 (0.4)  7.7 (0.4)  13.5 (0.6)  19.9 (0.8)
3  New (w = $\sqrt{d}$)  0.164 (0.003)  0.127 (0.002)  4.4 (0.5)  1.2 (0.3)  21.5 (0.6)  26.3 (0.7)
3  New (w = d)  0.137 (0.003)  0.114 (0.001)  0.4 (0.2)  0.1 (0.1)  29.8 (0.9)  33.2 (0.9)
4  STD  0.189 (0.002)  0.157 (0.002)  0.0 (0.0)  0.0 (0.0)  55.0 (0.0)  55.0 (0.0)
4  L1  0.186 (0.002)  0.155 (0.002)  14.2 (0.3)  10.5 (0.3)  11.5 (0.6)  18.1 (0.8)
4  New (w = 1)  0.198 (0.003)  0.160 (0.002)  13.8 (0.3)  8.6 (0.4)  11.8 (0.5)  20.9 (0.9)
4  New (w = $\sqrt{d}$)  0.190 (0.003)  0.147 (0.002)  7.2 (0.6)  1.8 (0.4)  18.8 (0.7)  30.1 (0.9)
4  New (w = d)  0.163 (0.002)  0.139 (0.002)  0.2 (0.2)  0.03 (0.03)  32.2 (1.0)  34.8 (1.0)
Coefficient estimates of selected informative genes for p = 55 and n = 100. The mean and the standard deviation (SD) of the coefficient estimates for selected informative genes were calculated from 100 runs.
Scenario  β  L1 Mean  L1 SD  New (w = 1) Mean  New (w = 1) SD  New (w = $\sqrt{d}$) Mean  New (w = $\sqrt{d}$) SD  New (w = d) Mean  New (w = d) SD
1  $\beta_{1} = 5$  0.53  0.29  0.04  0.04  0.27  0.26  0.67  0.35
1  $\beta_{1}^{(1)} = \frac{5}{\sqrt{10}}$  0.11  0.17  0.14  0.15  0.10  0.10  0.07  0.08
1  $\beta_{2} = 5$  0.55  0.30  0.04  0.05  0.28  0.32  0.68  0.35
1  $\beta_{1}^{(2)} = \frac{5}{\sqrt{10}}$  0.08  0.15  0.18  0.15  0.11  0.09  0.08  0.08
2  $\beta_{1} = 5$  0.76  0.33  0.09  0.06  0.34  0.16  0.91  0.40
2  $\beta_{1}^{(1)} = \frac{5}{\sqrt{10}}$  0.09  0.14  0.20  0.14  0.14  0.11  0.09  0.08
2  $\beta_{2} = 3$  0.29  0.23  0.01  0.03  0.15  0.10  0.48  0.23
2  $\beta_{1}^{(2)} = \frac{3}{\sqrt{10}}$  0.08  0.12  0.11  0.13  0.07  0.08  0.04  0.04
3  $\beta_{1} = 5$  0.51  0.39  0.03  0.07  0.41  0.70  0.95  0.34
3  $\beta_{1}^{(1)} = \frac{5}{\sqrt{10}}$  0.22  0.21  0.24  0.19  0.20  0.17  0.13  0.11
3  $\beta_{8}^{(1)} = -\frac{5}{\sqrt{10}}$  0.01  0.07  0.01  0.11  0.03  0.21  0.04  0.12
3  $\beta_{2} = 3$  0.26  0.27  0.01  0.04  0.15  0.30  0.52  0.27
3  $\beta_{1}^{(2)} = \frac{3}{\sqrt{10}}$  0.09  0.13  0.13  0.16  0.12  0.16  0.07  0.11
3  $\beta_{8}^{(2)} = -\frac{3}{\sqrt{10}}$  0.001  0.07  0.004  0.06  0.01  0.05  0.01  0.07
4  $\beta_{1} = 5$  0.40  0.38  0.03  0.06  0.48  0.80  0.97  0.43
4  $\beta_{1}^{(1)} = \frac{5}{\sqrt{10}}$  0.27  0.26  0.32  0.25  0.30  0.23  0.20  0.20
4  $\beta_{7}^{(1)} = -\frac{5}{\sqrt{10}}$  0.04  0.12  0.02  0.14  0.11  0.24  0.09  0.16
4  $\beta_{2} = 3$  0.23  0.29  0.004  0.01  0.21  0.45  0.56  0.30
4  $\beta_{1}^{(2)} = \frac{3}{\sqrt{10}}$  0.15  0.20  0.16  0.19  0.17  0.19  0.09  0.13
4  $\beta_{7}^{(2)} = -\frac{3}{\sqrt{10}}$  0.03  0.08  0.002  0.10  0.05  0.18  0.06  0.15
Simulation results for p = 550 or 1,100. The simulation results were averaged over 100 runs for p = 550 or 1,100 (22 informative and either 528 or 1,078 noise genes).
Method  Test Error (SE), p = 550  Test Error (SE), p = 1,100  # False Negatives (SE), p = 550  # False Negatives (SE), p = 1,100  Model Size (SE), p = 550  Model Size (SE), p = 1,100
STD  0.305 (0.003)  0.354 (0.002)  0.0 (0.0)  0.0 (0.0)  550 (0.0)  1,100 (0.0)
L1  0.218 (0.004)  0.235 (0.004)  16.6 (0.2)  17.1 (0.2)  16.1 (1.0)  19.2 (1.2)
New (w = 1)  0.232 (0.003)  0.255 (0.004)  14.9 (0.3)  15.6 (0.3)  20.7 (1.1)  22.6 (1.4)
New (w = $\sqrt{d}$)  0.202 (0.004)  0.221 (0.004)  5.7 (0.5)  6.7 (0.6)  32.6 (1.5)  34.6 (1.9)
New (w = d)  0.170 (0.003)  0.180 (0.004)  0.7 (0.3)  1.3 (0.4)  82.6 (5.4)  98.9 (7.2)
Applications to microarray data
To evaluate its performance on real data, we applied the new method to two microarray gene expression data sets, related to Parkinson's disease (PD) [21] and breast cancer (BC) metastasis [1, 4], respectively.
Parkinson's disease
The data set includes the Parkinson's disease status and the expression levels of 22,283 genes from 105 patients (50 cases and 55 controls) [22]. We used the same network structure as [18]. The network combines 33 Kyoto Encyclopedia of Genes and Genomes (KEGG) regulatory pathways and contains a total of 1,523 genes and 6,865 edges. The data were randomly split into training (40 observations), tuning (20 observations), and test (45 observations) sets. The expression level of each gene was normalized to have mean 0 and standard deviation 1 across samples. The tuning parameter was selected on the tuning set, and the performance of each method was evaluated on the test set by the mean classification error and its standard error over 10 runs. Five methods were compared: the STD-SVM, the L1-SVM, and the network-based SVM with w = 1, w = $\sqrt{d}$, and w = d. To obtain a final model based on the new method with w = $\sqrt{d}$, we combined, for each run, the previous tuning and test data into a new tuning set of 65 observations, on which the classification errors were calculated for a wide range of values of the tuning parameter. After 10 runs, we had an averaged classification error for each tuning parameter value. The value that gave the minimal averaged error was selected to fit the final model to all the data. Note that the classification error rate from the final model was likely to be biased because the same data were used for training/tuning and testing; the main purpose of fitting the final model was to see which genes were selected in the end.
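The evaluation protocol described above (random split, per-gene standardization, and choosing the tuning parameter with the smallest averaged error) can be sketched as follows. The function names and the toy error values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which convention was used.

```python
import random
import statistics


def split_indices(n, n_train, n_tune, rng):
    """Randomly partition range(n) into training, tuning and test index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_tune], idx[n_train + n_tune:]


def standardize(column):
    """Scale one gene's expression values to mean 0 and standard deviation 1."""
    m = statistics.mean(column)
    s = statistics.stdev(column)  # sample SD; an assumption, see the note above
    return [(v - m) / s for v in column]


def pick_lambda(errors_by_lambda):
    """Return the tuning value whose classification error, averaged over runs, is smallest.

    errors_by_lambda maps each candidate value to a list with one error per run.
    """
    return min(errors_by_lambda,
               key=lambda lam: statistics.mean(errors_by_lambda[lam]))


# 105 samples split 40 / 20 / 45 as in the Parkinson's disease analysis.
train, tune, test = split_indices(105, 40, 20, random.Random(1))
print(len(train), len(tune), len(test))  # 40 20 45

# Hypothetical errors from two runs for three candidate tuning values.
errors = {0.1: [0.5, 0.4], 1.0: [0.3, 0.4], 10.0: [0.5, 0.5]}
print(pick_lambda(errors))  # 1.0
```

Repeating the split with different random seeds and averaging the test errors gives the mean error and standard error reported for each method.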
Parkinson's disease data: 1,070 genes. A total of 1,070 genes with SD of expression levels across the 105 samples ≥ 15 had network information. The classification error, number of selected disease genes, number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs. Five disease genes were UBE1, PARK2, UBB, SEPT5, and SNCAIP.
Method  Error (SE)  # Disease Genes (SE)  # Genes (SE)
STD  0.424 (0.016)  5.0 (0.0)  1,070.0 (0.0)
L1  0.464 (0.021)  0.1 (0.1)  19.2 (3.8)
New (w = 1)  0.476 (0.015)  0.1 (0.1)  24.9 (4.3)
New (w = $\sqrt{d}$)  0.480 (0.026)  0.2 (0.1)  30.6 (5.2)
New (w = d)  0.451 (0.028)  0.0 (0.0)  70.6 (14.1)
Final Model  -  1.0  75.0
First- and second-order-neighbor subnetworks of Parkinson's disease data. The classification error, number of selected disease genes, number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs. Eight disease genes were UBE1, PARK2, UBB, SEPT5, SNCAIP, GPR37, TH, and SNCA.
Network  Method  Error (SE)  # Disease Genes (SE)  # Genes (SE)
PD1nbnet  STD  0.476 (0.023)  8.0 (0.0)  16.0 (0.0)
PD1nbnet  L1  0.471 (0.017)  2.8 (0.7)  6.1 (1.5)
PD1nbnet  New (w = 1)  0.462 (0.016)  3.4 (0.8)  7.3 (1.7)
PD1nbnet  New (w = $\sqrt{d}$)  0.462 (0.014)  3.6 (0.7)  8.4 (1.5)
PD1nbnet  New (w = d)  0.482 (0.015)  3.0 (1.2)  7.5 (2.1)
PD1nbnet  Final Model  -  8.0  16.0
PD2nbnet  STD  0.444 (0.016)  8.0 (0.0)  26.0 (0.0)
PD2nbnet  L1  0.449 (0.017)  3.1 (0.5)  10.9 (2.1)
PD2nbnet  New (w = 1)  0.464 (0.022)  5.3 (0.9)  13.2 (3.2)
PD2nbnet  New (w = $\sqrt{d}$)  0.447 (0.023)  6.1 (0.8)  13.7 (2.7)
PD2nbnet  New (w = d)  0.433 (0.016)  6.2 (0.9)  20.0 (2.5)
PD2nbnet  Final Model  -  8.0  26.0
We see the gains from employing the new method when narrowing our focus to the PD1nbnet and PD2nbnet. For the PD1nbnet, w = 1 and w = $\sqrt{d}$ performed equally well: they had the smallest classification error and identified one more disease gene with a model slightly larger than the one obtained from the L1-SVM. For the PD2nbnet, the new method with w = d performed best, with the best accuracy and the most selected disease genes. The weight w = $\sqrt{d}$ ranked second in prediction accuracy while detecting 3 more disease genes with a model containing 3 more genes than that of the L1-SVM. This means that the new method was able to identify more clinically relevant genes while keeping the same number of noise genes in the model as the L1-SVM. In both subnetworks, the final models included all the genes.
Breast cancer metastasis
The breast cancer metastasis data set [1, 4] contains expression levels of 8,141 genes for 286 patients, 106 of whom developed metastasis within a 5-year follow-up after surgery. TP53, BRCA1, and BRCA2 are three human tumor suppressor genes, which are known to prevent uncontrolled cell proliferation and to play a critical role in repairing chromosomal damage. Certain mutations of these genes lead to an increased risk of breast cancer. We explored the protein-protein interaction (PPI) network previously used by [1]. The PPI network comprises 57,235 interactions among 11,203 proteins, obtained by assembling various sources of experimental data and curation of the literature [1]. We confined our analysis to two subnetworks: the direct or first-order neighbors of the three cancer genes (BC1nbnet), and a subnetwork composed of two parts (BC2nbnet), namely the direct neighbors of TP53 and the second-order neighbors of BRCA1 and BRCA2. We fit the final model and compared the four methods in terms of classification error, cancer gene selection, and model sparsity. The cancer genes are the 227 known or putative cancer genes with estimated mutation frequencies in cancer samples [1]. A total of 294 genes that fell into the BC1nbnet had observed expression levels, among which were 40 cancer genes, 7 of which (ABL1, JAK2, p53, PTEN, p14ARF, PTCH, and RB) had mutation frequencies larger than 0.10. The BC2nbnet was composed of 2,070 genes, 1,718 of them with observed expression levels, including 107 cancer genes. Besides the 7 included in BC1nbnet, 7 additional cancer genes (ACH, APC, EGFR, KIT, NICD, RAS, and CTNNB1) that had mutation frequencies larger than 0.10 belonged to BC2nbnet.
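Extracting first- and second-order neighbor subnetworks from an edge list amounts to a breadth-first search that stops after a fixed number of steps. The sketch below illustrates this on a toy network; the function name, the placeholder gene names A to E, and the choices to keep the seed genes in the result and to include closer neighbors when collecting second-order ones are assumptions, since the text does not spell out these details.

```python
from collections import deque


def neighbors_within(edges, seeds, order):
    """Return the seeds plus every gene reachable from a seed in at most `order` edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {s: 0 for s in seeds}  # shortest known distance from any seed
    queue = deque(seeds)
    while queue:
        g = queue.popleft()
        if dist[g] == order:      # do not expand beyond the requested order
            continue
        for h in adj.get(g, ()):
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return set(dist)


# Toy interaction network with placeholder genes A-E (not real PPI data).
edges = [("TP53", "A"), ("A", "B"), ("BRCA1", "C"), ("C", "D"), ("D", "E")]

first_order = neighbors_within(edges, ["TP53", "BRCA1"], 1)
mixed = neighbors_within(edges, ["TP53"], 1) | neighbors_within(edges, ["BRCA1"], 2)
print(sorted(first_order))  # ['A', 'BRCA1', 'C', 'TP53']
print(sorted(mixed))        # ['A', 'BRCA1', 'C', 'D', 'TP53']
```

The second call mirrors the two-part construction of BC2nbnet: direct neighbors for one seed and neighbors up to two steps away for the others, combined by a set union.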
Subnetworks of breast cancer data. The BC1nbnet/BC2nbnet had 294/1,718 genes in total including 40/107 cancer genes, and 7/14 cancer genes with mutation frequencies larger than 0.10. The classification error, number of selected cancer genes with mutation frequencies larger than 0.10 (CALMF), number of selected cancer genes (CA), number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs.
Network  Method  Error (SE)  # CALMF (SE)  # CA (SE)  # Genes (SE)
BC1nbnet  STD  0.371 (0.014)  7.0 (0.0)  40.0 (0.0)  294.0 (0.0)
BC1nbnet  L1  0.357 (0.014)  0.3 (0.2)  4.6 (0.8)  32.3 (4.8)
BC1nbnet  New (w = 1)  0.360 (0.014)  0.4 (0.2)  3.6 (1.1)  25.0 (7.0)
BC1nbnet  New (w = $\sqrt{d}$)  0.366 (0.012)  0.6 (0.3)  4.7 (1.2)  27.2 (5.2)
BC1nbnet  New (w = d)  0.399 (0.012)  1.2 (0.2)  7.8 (1.7)  40.2 (6.5)
BC1nbnet  Final Model  -  1.0  4.0  14.0
BC2nbnet  STD  0.351 (0.014)  14.0 (0.0)  107.0 (0.0)  1,718.0 (0.0)
BC2nbnet  L1  0.360 (0.006)  0.0 (0.0)  2.4 (0.9)  42.9 (11.8)
BC2nbnet  New (w = 1)  0.374 (0.011)  0.1 (0.1)  1.9 (0.5)  51.4 (12.6)
BC2nbnet  New (w = $\sqrt{d}$)  0.360 (0.007)  0.2 (0.1)  2.5 (0.7)  41.7 (9.2)
BC2nbnet  New (w = d)  0.385 (0.021)  0.3 (0.2)  0.7 (0.3)  34.2 (10.3)
BC2nbnet  Final Model  -  1.0  2.0  23.0
Conclusion
Advances in microarray technology have enriched the tool kit of researchers seeking to decipher the complexity of disease mechanisms at the genomic level. Many studies have sought to identify genetic markers to improve diagnostic classification and prognostic assessment, largely by ignoring biological knowledge of gene functions and treating individual genes equally and independently a priori. The downside of such an approach has been recognized; for example, gene markers identified in this way across similar patient cohorts for the same disease often lack consistency. As a viable alternative, the network-based approach has been gaining popularity. In addition to improving predictive performance and gene selection, the network-based approach extracts more biological insight from high-throughput gene expression data. Here we have proposed a network-based SVM, with a penalty term incorporating gene network information, as a practically useful classification tool for microarray data. Our simulation studies and two real data applications indicate that the proposed method is able to better identify clinically relevant genes and make accurate predictions.
Notes
Acknowledgements
YZ and WP were partially supported by NIH grants HL65462 and GM081535; XS was supported by NIH grant GM081535 and NSF grants IIS0328802 and DMS0604394. We thank Dr Hongzhe Li and Dr Trey Ideker for providing the KEGG network and PPI network data, respectively.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1
References
 1. Chuang HY, Lee EJ, Liu YT, Lee DH, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol 2007, 3: 140. 10.1038/msb4100180
 2. Frolov AE, Godwin AK, Favorova OO: Differential gene expression analysis by DNA microarray technology and its application in molecular oncology. Mol Biol 2003, 37: 486–494. 10.1023/A:1025166706481
 3. Yang TY: The simple classification of multiple cancer types using a small number of significant genes. Mol Diagn Ther 2007, 11: 265–275.
 4. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005, 365: 671–679. 10.1016/S0140-6736(05)17947-1
 5. Xiong MM, Li WJ, Zhao JY, Li J, Boerwinkle E: Feature (gene) selection in gene expression-based tumor classification. Mol Genet Metab 2001, 73: 239–247. 10.1006/mgme.2001.3193
 6. Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2005, 21: 171–178. 10.1093/bioinformatics/bth469
 7. Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 2006, 103: 5923–5928. 10.1073/pnas.0601231103
 8. Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S: Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet 2007, 3: e96. 10.1371/journal.pgen.0030096
 9. Cortes C, Vapnik V: Support-vector networks. Machine Learning 1995, 20: 273–297.
 10. Vapnik V: The Nature of Statistical Learning Theory. New York: Springer; 1995.
 11. Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97: 262–267. 10.1073/pnas.97.1.262
 12. Furey T, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000, 16: 906–914. 10.1093/bioinformatics/16.10.906
 13. Zou H, Yuan M: The F_{∞}-norm Support Vector Machine. Stat Sin 2008, 18: 379–398.
 14. Wahba G, Lin Y, Zhang H: GACV for support vector machines. In Advances in Large Margin Classifiers. Edited by: Smola A, Bartlett P, Scholkopf B, Schuurmans D. Cambridge, MA: MIT Press; 2000: 297–311.
 15. Hastie T, Tibshirani R, Friedman JH: The Elements of Statistical Learning. New York: Springer; 2001.
 16. Friedman JH, Hastie T, Rosset S, Tibshirani R, Zhu J: Discussion of boosting papers. Ann Stat 2004, 32: 102–107.
 17. Pan W, Xie B, Shen X: Incorporating predictor network in penalized regression with application to microarray data. [Manuscript submitted].
 18. Li C, Li H: Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 2008, 24: 1175–1182. 10.1093/bioinformatics/btn081
 19. Zou H, Hastie T: Regularization and variable selection via the elastic net. J R Statist Soc B 2005, 67: 301–320. 10.1111/j.1467-9868.2005.00503.x
 20. Wang L, Zhu J, Zou H: The doubly regularized support vector machine. Stat Sin 2006, 16: 589–615.
 21. Gene Expression Omnibus: GSE6613 [http://www.ncbi.nlm.nih.gov/geo/]
 22. Scherzer CR, Eklund AC, Morse LJ, Liao Z, Locascio JJ, Fefer D, Schwarzschild MA, Schlossmacher MG, Hauser MA, Vance JM, Sudarsky LR, Standaert DG, Growdon JH, Jensen RV, Gullans SR: Molecular markers of early Parkinson's disease based on gene expression in blood. Proc Natl Acad Sci USA 2007, 104: 955–960. 10.1073/pnas.0610204104
 23. KEGG: Parkinson's disease [http://cgap.nci.nih.gov/Pathways/Kegg/hsa05020]
Copyright information
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.