
Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification

Chapter in: Supervised and Unsupervised Ensemble Methods and their Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 126)

Summary

Gene expression levels are useful for discriminating between cancer and normal examples and/or between different types of cancer. In this chapter, ensembles of k-nearest neighbors are employed for gene expression based cancer classification. The ensembles are created by randomly sampling subsets of genes, assigning each subset to a k-nearest neighbor (k-NN) classifier, and finally combining the k-NN predictions by majority vote. Subset selection is guided by the statistical dependence between dataset complexity and classification error, confirmed with the copula method: the least complex subsets are preferred because they are associated with more accurate predictions. Experiments on six gene expression datasets show that our ensemble scheme outperforms both the single best classifier in the ensemble and the redundancy-based filter specially designed to remove irrelevant genes.
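The ensemble construction summarized above can be sketched as follows. This is a minimal, standard-library-only Python illustration, not the authors' implementation. The summary does not say which complexity measure is used, so `complexity` here is a hypothetical stand-in (based on the largest two-class Fisher discriminant ratio among the subset's genes), and the copula analysis that justifies preferring low-complexity subsets is not reproduced. All function names, the parameters `subset_size`, `n_candidates`, `n_members` and `k`, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def distance(a, b, genes):
    # Euclidean distance restricted to the genes in one ensemble member's subset
    return sum((a[g] - b[g]) ** 2 for g in genes) ** 0.5


def knn_predict(X, y, x, genes, k=3):
    # Plain k-NN: majority label among the k training examples closest to x
    order = sorted(range(len(X)), key=lambda i: distance(X[i], x, genes))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]


def fisher_ratio(X, y, g):
    # Two-class Fisher discriminant ratio of a single gene (assumes exactly two classes)
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[g])
    v0, v1 = groups.values()
    m0, m1 = sum(v0) / len(v0), sum(v1) / len(v1)
    s0 = sum((t - m0) ** 2 for t in v0) / len(v0)
    s1 = sum((t - m1) ** 2 for t in v1) / len(v1)
    return (m0 - m1) ** 2 / (s0 + s1 + 1e-12)


def complexity(X, y, genes):
    # HYPOTHETICAL complexity score: low when some gene in the subset separates
    # the two classes well (a stand-in, not the chapter's own measure)
    return 1.0 / (1.0 + max(fisher_ratio(X, y, g) for g in genes))


def build_ensemble(X, y, n_genes, subset_size, n_candidates, n_members, seed=0):
    # Randomly sample candidate gene subsets, then keep the least complex ones
    rng = random.Random(seed)
    candidates = [rng.sample(range(n_genes), subset_size) for _ in range(n_candidates)]
    candidates.sort(key=lambda genes: complexity(X, y, genes))
    return candidates[:n_members]


def ensemble_predict(X, y, members, x, k=3):
    # Each member votes with its own k-NN prediction; the majority vote decides
    votes = [knn_predict(X, y, x, genes, k) for genes in members]
    return Counter(votes).most_common(1)[0][0]


# Toy data (illustrative only): gene 0 carries the class signal, genes 1-5 are noise
rng = random.Random(1)
X, y = [], []
for i in range(20):
    label = i % 2
    X.append([5.0 * label + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(5)])
    y.append(label)

members = build_ensemble(X, y, n_genes=6, subset_size=2, n_candidates=10, n_members=3)
print(members, ensemble_predict(X, y, members, [5.0, 0, 0, 0, 0, 0]))
```

Choosing an odd number of members (and an odd `k`) avoids ties in the two-class majority vote. Ranking a pool of random candidates and keeping the lowest-complexity ones is one simple way to express the "least complex subsets are preferred" rule.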



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Okun, O., Priisalu, H. (2008). Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_6


  • DOI: https://doi.org/10.1007/978-3-540-78981-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78980-2

  • Online ISBN: 978-3-540-78981-9

  • eBook Packages: Engineering, Engineering (R0)
