k-NN for the Classification of Human Cancer Samples Using the Gene Expression Profiles

Martín-Merino, Manuel

doi:10.1007/978-1-4419-5913-3_18

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 680)

Abstract

The \( k \)-Nearest Neighbor (k-NN) classifier has been applied to the identification of cancer samples from gene expression profiles with encouraging results. However, the performance of \( k \)-NN depends strongly on the distance used to evaluate the proximity between samples, and choosing a good dissimilarity is difficult and problem-dependent. In this chapter, we introduce a method that learns the metric from the data to improve the \( k \)-NN classifier. To this end, we consider a regularized version of the kernel alignment algorithm that incorporates a term penalizing the complexity of the family of distances, thereby avoiding overfitting. The error function is optimized using a semidefinite programming (SDP) approach. The proposed method has been applied to the challenging problem of cancer identification from gene expression profiles. Kernel alignment \( k \)-NN outperforms other metric learning strategies and improves on the classical \( k \)-NN algorithm.
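The abstract names two ingredients: a distance derived from a learned kernel, used inside k-NN, and kernel alignment as the criterion for choosing that kernel. The Python sketch below illustrates both under simplifying assumptions that are not the chapter's method. The kernel is restricted to a convex combination w*K_lin + (1 - w)*K_rbf of a linear and a Gaussian kernel. The weight w is picked by a plain grid search that maximizes the kernel-target alignment of Cristianini et al., <K, yy^T>_F / sqrt(<K, K>_F <yy^T, yy^T>_F), on the training samples. There is no complexity-penalty term and no SDP solver. All function names are hypothetical, and the eight two-dimensional vectors are synthetic toy data, not gene expression profiles.

```python
import math
from collections import Counter


def linear_kernel(X):
    """Gram matrix of plain dot products."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z))) for z in X] for x in X]


def frobenius(A, B):
    """Frobenius inner product <A, B>_F of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def submatrix(K, idx):
    """Restrict a Gram matrix to the rows/columns listed in idx."""
    return [[K[i][j] for j in idx] for i in idx]


def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T (y in {-1, +1})."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))


def combine(kernels, weights):
    """Non-negative weighted sum of Gram matrices (the result stays positive semidefinite)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def best_weight(K1, K2, y, train, steps=10):
    """Hypothetical stand-in for the regularized SDP: grid search over w in [0, 1]
    for the combination w*K1 + (1-w)*K2 with the highest training alignment."""
    best_w, best_a = 0.0, -math.inf
    for s in range(steps + 1):
        w = s / steps
        a = alignment(submatrix(combine([K1, K2], [w, 1 - w]), train), y)
        if a > best_a:
            best_w, best_a = w, a
    return best_w, best_a


def kernel_distance(K, i, j):
    """Distance induced by K: sqrt(K_ii + K_jj - 2 K_ij), i.e. distance in feature space."""
    return math.sqrt(max(K[i][i] + K[j][j] - 2 * K[i][j], 0.0))


def knn_predict(K, train, labels, query, k=3):
    """Majority vote among the k training samples closest to `query` under K."""
    nearest = sorted(train, key=lambda t: kernel_distance(K, query, t))[:k]
    return Counter(labels[t] for t in nearest).most_common(1)[0][0]


# Synthetic 2-D toy vectors (not gene expression data): indices 0-5 are
# labelled training samples, indices 6-7 are unlabelled queries.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.2],
     [0.15, 0.2], [2.1, 2.0]]
train = [0, 1, 2, 3, 4, 5]
y_train = [-1, -1, -1, 1, 1, 1]
labels = dict(zip(train, y_train))

K_lin, K_rbf = linear_kernel(X), rbf_kernel(X, gamma=1.0)
w, a = best_weight(K_lin, K_rbf, y_train, train)
K = combine([K_lin, K_rbf], [w, 1 - w])
predictions = [knn_predict(K, train, labels, q, k=3) for q in (6, 7)]
print(f"w = {w:.1f}, alignment = {a:.3f}, predictions = {predictions}")
```

On this toy set, both base distances grow monotonically with Euclidean distance, so every w yields the same neighbors. The choice of w only changes the k-NN decision when the base kernels rank distances differently.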

Author information

Authors and Affiliations

Computer Science Department, Universidad Pontificia de Salamanca, C/Compañía 5, 37002, Salamanca, Spain
Manuel Martín-Merino

Corresponding author

Correspondence to Manuel Martín-Merino.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Georgia, Athens, 30602-7404, Georgia, USA
Hamid R. Arabnia

About this paper

Cite this paper

Martín-Merino, M. (2010). k-NN for the Classification of Human Cancer Samples Using the Gene Expression Profiles. In: Arabnia, H. (eds) Advances in Computational Biology. Advances in Experimental Medicine and Biology, vol 680. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5913-3_18

DOI: https://doi.org/10.1007/978-1-4419-5913-3_18
Published: 09 August 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5912-6
Online ISBN: 978-1-4419-5913-3
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)