Molecular Diagnosis & Therapy, Volume 11, Issue 4, pp 265–275

The Simple Classification of Multiple Cancer Types Using a Small Number of Significant Genes

  • Toe Young Yang
Original Research Article


Background and objective: The problems involved in the classification of cancers have recently received a great deal of attention in the context of DNA microarrays. We propose a simple procedure for classifying or predicting the cancer types of test samples when multiple cancer types and many genes are present.

Method: The procedure applies a gene-sort algorithm and then a predictive likelihood-based classifier. Genes whose expression measurements are homogeneous across cancer types are of limited interest, so the gene-sort algorithm orders genes by how strongly heterogeneous their expression patterns are. The classifier then selects the first few genes in this ordering, choosing as many as are sufficient to classify most training samples correctly under cross validation. Test samples are classified using only the selected genes.
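The two-stage idea described in the Method paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure from the paper: a one-way ANOVA F statistic stands in for the gene-sort algorithm, a Gaussian likelihood with pooled per-gene variance stands in for the predictive likelihood-based classifier, all function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def f_statistics(X, y):
    """One-way ANOVA F statistic per gene (column of X) across class labels y."""
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def rank_genes(X, y):
    """Gene indices ordered from most to least heterogeneous across classes."""
    return np.argsort(-f_statistics(X, y))

def fit(X, y):
    """Per-class gene means plus a pooled per-gene variance (assumed model)."""
    classes = np.unique(y)
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]
    var = (resid ** 2).sum(axis=0) / max(X.shape[0] - classes.size, 1) + 1e-6
    return classes, means, var

def predict(model, X):
    """Assign each sample to the class with the largest Gaussian log-likelihood."""
    classes, means, var = model
    # Log-likelihood up to a constant that is the same for every class.
    loglik = -0.5 * (((X[:, None, :] - means[None, :, :]) ** 2) / var).sum(axis=2)
    return classes[np.argmax(loglik, axis=1)]

def loo_errors(X, y, n_genes):
    """Leave-one-out error count; genes are re-ranked inside every fold."""
    errors = 0
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        top = rank_genes(X[keep], y[keep])[:n_genes]
        model = fit(X[keep][:, top], y[keep])
        errors += int(predict(model, X[i:i + 1, top])[0] != y[i])
    return errors

def select_n_genes(X, y, max_genes=10):
    """Smallest number of top-ranked genes reaching the minimum LOO error."""
    errs = [loo_errors(X, y, m) for m in range(1, max_genes + 1)]
    return int(np.argmin(errs)) + 1, errs

def make_data(rng, n_per_class, n_genes=50):
    """Synthetic data: gene 0 marks class 1, gene 1 marks class 2."""
    y = np.repeat([0, 1, 2], n_per_class)
    X = rng.normal(size=(y.size, n_genes))
    X[:, 0] += 4.0 * (y == 1)
    X[:, 1] += 4.0 * (y == 2)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 10)
X_test, y_test = make_data(rng, 5)

n_best, errs = select_n_genes(X_train, y_train)
top = rank_genes(X_train, y_train)[:n_best]
pred = predict(fit(X_train[:, top], y_train), X_test[:, top])
print(n_best, errs, (pred == y_test).mean())
```

Re-ranking the genes inside each cross-validation fold is a deliberate choice in this sketch: ranking once on all training samples and then cross-validating would let the held-out sample influence gene selection and make the error estimate optimistic.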

Results and conclusion: This predictive likelihood-based classifier performs well and is simple to understand. Empirical examination revealed good classification accuracy using relatively few genes.





I am very grateful to the editor and anonymous referees for their constructive comments on a draft of this paper.

This work was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) [grant no: KRF-2006-312-C00493].

The author has no conflicts of interest that are directly relevant to the content of this study.



Copyright information

© Adis Data Information BV 2007

Authors and Affiliations

  1. Department of Mathematics, Myongji University, Yongin, Republic of Korea
