Background and objective: Cancer classification from DNA microarray data has recently received a great deal of attention. We propose a simple procedure for classifying or predicting the cancer types of test samples when multiple cancer types and many genes are present.
Method: The procedure sequentially combines a gene-sort algorithm and a predictive likelihood-based classifier. Genes that have homogeneous patterns of expression measurements across cancer types are of limited interest, so the gene-sort algorithm orders genes by how strongly heterogeneous their expression patterns are across cancer types. The proposed classifier then selects the first few genes in this ordering that are sufficient to classify most training samples correctly under cross validation. Test samples are then classified using only the selected genes.
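The two-stage idea described above can be illustrated with a minimal sketch. The abstract does not specify the gene-sort statistic or the form of the predictive likelihood, so both are assumptions here: genes are ranked by a between-class to within-class sum-of-squares ratio, and a plug-in Gaussian log-likelihood with independent genes stands in for the predictive likelihood-based classifier. All function names, the `target` accuracy, the variance floor, and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance

def heterogeneity_score(values, labels):
    """Between-class / within-class sum of squares for one gene (assumed score)."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        m = mean(group)
        between += len(group) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in group)
    return between / (within + 1e-12)  # small constant avoids division by zero

def sort_genes(X, y):
    """Gene indices ordered from most to least heterogeneous across classes."""
    scores = [heterogeneity_score([row[g] for row in X], y) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: -scores[g])

def fit(X, y, genes):
    """Per-class (mean, variance) for each selected gene."""
    model = {}
    for c in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == c]
        # Variance floor of 1e-3 is an assumption to keep the likelihood finite.
        model[c] = [(mean([r[g] for r in rows]), pvariance([r[g] for r in rows]) + 1e-3)
                    for g in genes]
    return model

def predict(model, x, genes):
    """Assign x to the class with the largest Gaussian log-likelihood (stand-in)."""
    def loglik(stats):
        return sum(-0.5 * math.log(2 * math.pi * v) - (x[g] - m) ** 2 / (2 * v)
                   for g, (m, v) in zip(genes, stats))
    return max(model, key=lambda c: loglik(model[c]))

def loo_accuracy(X, y, genes):
    """Leave-one-out cross-validated accuracy on the training samples."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], genes)
        hits += predict(model, X[i], genes) == y[i]
    return hits / len(X)

def select_genes(X, y, target=0.95, max_genes=10):
    """Take the first k sorted genes that reach the target cross-validated accuracy."""
    order = sort_genes(X, y)
    limit = min(max_genes, len(order))
    for k in range(1, limit + 1):
        if loo_accuracy(X, y, order[:k]) >= target:
            return order[:k]
    return order[:limit]

# Toy data (hypothetical): 9 samples, 3 genes; only gene 1 differs across types.
X = [[5.0, 1.0, 3.0], [5.2, 1.2, 2.9], [4.9, 0.9, 3.1],
     [5.1, 4.0, 3.0], [5.0, 4.1, 2.8], [4.8, 3.9, 3.2],
     [5.1, 7.0, 3.1], [4.9, 7.2, 3.0], [5.0, 6.9, 2.9]]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

genes = select_genes(X, y)
model = fit(X, y, genes)
print(genes, predict(model, [5.0, 4.05, 3.0], genes))
```

For brevity the ranking is computed once on all training samples; re-ranking the genes inside each cross-validation fold would avoid an optimistic bias in the accuracy estimate.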
Results and conclusion: This predictive likelihood-based classifier performs well and is simple to understand. Empirical examination revealed good classification accuracy using relatively few genes.
Keywords: Support Vector Machine; Acute Lymphocytic Leukemia; Cancer Type; Acute Myelogenous Leukemia; Expression Measurement
I am very grateful to the editor and anonymous referees for their constructive comments on a draft of this paper.
This work was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) [grant no: KRF-2006-312-C00493].
The author has no conflicts of interest that are directly relevant to the content of this study.