Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification

Okun, Oleg; Priisalu, Helen

doi:10.1007/978-3-540-78981-9_6

Part of the book series: Studies in Computational Intelligence (SCI, volume 126)

Summary

Gene expression levels are useful in discriminating between cancer and normal examples and/or between different types of cancer. In this chapter, ensembles of k-nearest neighbors are employed for gene expression based cancer classification. The ensembles are created by randomly sampling subsets of genes, assigning each subset to a k-nearest neighbor (k-NN) to perform classification, and finally, combining k-NN predictions with majority vote. Selection of subsets is governed by the statistical dependence between dataset complexity and classification error, confirmed by the copula method, so that least complex subsets are preferred since they are associated with more accurate predictions. Experiments carried out on six gene expression datasets show that our ensemble scheme is superior to a single best classifier in the ensemble and to the redundancy-based filter, especially designed to remove irrelevant genes.
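The scheme described in the summary (random gene subsets, one k-NN per subset, majority vote, preference for the least complex subsets) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say which complexity measure is used, so the `complexity` function below is a hypothetical stand-in based on a Fisher-style ratio for two classes, and all function names, parameter names, and default values are assumptions chosen for illustration. Class labels are assumed to be the integers 0 and 1.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-NN: Euclidean distance, majority vote among the k neighbors."""
    # Squared distances between every test and training sample.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])


def complexity(X, y):
    """Hypothetical complexity score (assumption, not from the chapter).

    Uses the largest per-gene Fisher ratio between classes 0 and 1;
    a lower score means the classes are easier to separate.
    """
    a, b = X[y == 0], X[y == 1]
    fisher = (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)
    return 1.0 / (1.0 + fisher.max())


def ensemble_predict(X_train, y_train, X_test, n_candidates=50,
                     n_members=5, subset_size=10, k=3, seed=0):
    """Sample gene subsets, keep the least complex, vote over k-NN members."""
    rng = np.random.default_rng(seed)
    n_genes = X_train.shape[1]
    # Randomly sample candidate gene subsets (column indices).
    subsets = [rng.choice(n_genes, size=subset_size, replace=False)
               for _ in range(n_candidates)]
    # Prefer the subsets with the lowest complexity score.
    subsets.sort(key=lambda s: complexity(X_train[:, s], y_train))
    chosen = subsets[:n_members]
    # One k-NN per chosen subset; rows are members, columns are test samples.
    preds = np.array([knn_predict(X_train[:, s], y_train, X_test[:, s], k)
                      for s in chosen])
    # Majority vote across ensemble members for each test sample.
    return np.array([np.bincount(col).argmax() for col in preds.T])
```

An odd number of ensemble members (`n_members`) and an odd `k` avoid ties in the two-class votes. On real expression data the complexity scores would be computed on training samples only, as done here, so that subset selection does not see the test labels.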

Author information

Authors and Affiliations

University of Oulu, P.O.Box 4500, FI-90014, Oulu, Finland
Oleg Okun
Teradata, Valkijärventie 7E, FI-2130, Espoo, Finland
Helen Priisalu

Editor information

Editors and Affiliations

Machine Vision Group, Infotech Oulu, Finland
Oleg Okun
Department of Electrical and Information Engineering, University of Oulu, P.O. Box 4500, FI-90014, Oulu, Finland
Oleg Okun
Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39, 20135, Milano, Italy
Giorgio Valentini

About this chapter

Cite this chapter

Okun, O., Priisalu, H. (2008). Ensembles of Nearest Neighbors for Gene Expression Based Cancer Classification. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_6

DOI: https://doi.org/10.1007/978-3-540-78981-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78980-2
Online ISBN: 978-3-540-78981-9
eBook Packages: Engineering, Engineering (R0)
