Gene Selection for Multiclass Prediction by Weighted Fisher Criterion
Gene expression profiling has been widely used to study the molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, an important step toward improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. A two-step gene selection method is proposed to identify informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified using a one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected from the IDGs by sequential search methods, based on their joint class separability as measured by a multidimensional wFC. The performance of the selected gene subsets for multiclass prediction is evaluated by artificial neural networks (ANNs) and/or support vector machines (SVMs). By applying the proposed IDG/JDG approach to two microarray studies, namely small round blue cell tumors (SRBCTs) and muscular dystrophies (MDs), we identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrate that the two-step gene selection method is able to identify a subset of highly discriminative genes for improved multiclass prediction.
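The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the weighting function, so the erf-based weight in `erf_weight`, the small ridge term added to the within-class covariance, the plain forward search, and all function names are hypothetical choices made for this example.

```python
import math
import numpy as np


def erf_weight(delta):
    """Assumed pairwise weight: down-weights class pairs that are already far apart."""
    return math.erf(delta / (2.0 * math.sqrt(2.0))) / (2.0 * delta ** 2)


def wfc(X, y, ridge=1e-6):
    """Weighted pairwise Fisher score of the gene subset held in the columns of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single gene is the one-dimensional case
    y = np.asarray(y)
    classes = np.unique(y)
    priors = [float(np.mean(y == c)) for c in classes]
    means = [X[y == c].mean(axis=0) for c in classes]

    # Pooled within-class covariance; the ridge (an assumption) keeps it invertible.
    s_w = ridge * np.eye(X.shape[1])
    for p, c, mu in zip(priors, classes, means):
        centered = X[y == c] - mu
        s_w += p * (centered.T @ centered) / centered.shape[0]
    s_w_inv = np.linalg.inv(s_w)

    score = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            delta_sq = float(diff @ s_w_inv @ diff)  # squared Mahalanobis distance
            if delta_sq > 0.0:
                delta = math.sqrt(delta_sq)
                score += priors[i] * priors[j] * erf_weight(delta) * delta_sq
    return score


def select_idgs(X, y, k):
    """Step 1: rank genes one at a time and keep the k highest-scoring ones."""
    scores = np.array([wfc(X[:, g], y) for g in range(X.shape[1])])
    return [int(g) for g in np.argsort(scores)[::-1][:k]]


def select_jdgs(X, y, candidates, n_select):
    """Step 2: greedy forward search over the candidates using the joint score."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda g: wfc(X[:, selected + [g]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here `X` is assumed to be a samples-by-genes expression matrix and `y` a vector of class labels. Calling `select_idgs(X, y, k)` and passing its result to `select_jdgs` yields a gene subset that could then be handed to a classifier such as an SVM for evaluation. Forward search is only one of several sequential search strategies, and the abstract does not say which variant was used.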
Keywords: Support Vector Machine; Muscular Dystrophy; Gene Selection; Molecular Diagnostics; Gene Subset
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.