Abstract
The \( k \)-nearest neighbor (\( k \)-NN) classifier has been applied to the identification of cancer samples from gene expression profiles with encouraging results. However, the performance of \( k \)-NN depends strongly on the distance used to evaluate the proximity between samples. Moreover, choosing a good dissimilarity is difficult and depends on the problem at hand. In this chapter, we introduce a method that learns the metric from the data to improve the \( k \)-NN classifier. To this end, we consider a regularized version of the kernel alignment algorithm that incorporates a term penalizing the complexity of the family of distances, thereby avoiding overfitting. The error function is optimized using semidefinite programming (SDP). The proposed method has been applied to the challenging problem of cancer identification from gene expression profiles. Kernel alignment \( k \)-NN outperforms other metric learning strategies and improves on the classical \( k \)-NN algorithm.
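The two quantities the abstract relies on can be illustrated with a short sketch. It is not the chapter's regularized SDP method. It only computes the standard empirical kernel alignment, \( \langle K_1, K_2 \rangle_F / (\lVert K_1 \rVert_F \lVert K_2 \rVert_F) \), and the distance induced by a kernel, \( d^2(x_i, x_j) = K_{ii} + K_{jj} - 2K_{ij} \), which a \( k \)-NN rule can then use. The function names, the linear kernel, the labels in \( \{-1, +1\} \), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Empirical alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def kernel_distance_sq(K):
    """Squared distances induced by a kernel matrix:
    d2[i, j] = K[i, i] + K[j, j] - 2 * K[i, j]."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

def knn_predict(D_test_train, y_train, k=3):
    """Majority vote over the k nearest training samples.
    Assumes labels in {-1, +1} and an odd k, so the vote cannot tie."""
    nearest = np.argsort(D_test_train, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)

# Toy data (hypothetical): two Gaussian classes in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (10, 5)),
               rng.normal(1.0, 0.5, (10, 5))])
y = np.array([-1] * 10 + [1] * 10)

K_lin = X @ X.T              # linear kernel, chosen only for illustration
K_target = np.outer(y, y)    # ideal target kernel y y^T
score = kernel_alignment(K_lin, K_target)
D2 = kernel_distance_sq(K_lin)
```

A higher `score` means the kernel agrees more closely with the class labels. A metric learning method in the spirit of the abstract would search over a family of kernels for one that maximizes this alignment, subject to a complexity penalty, and would then pass the induced distances to `knn_predict`.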
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Martín-Merino, M. (2010). k-NN for the Classification of Human Cancer Samples Using the Gene Expression Profiles. In: Arabnia, H. (eds) Advances in Computational Biology. Advances in Experimental Medicine and Biology, vol 680. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5913-3_18
DOI: https://doi.org/10.1007/978-1-4419-5913-3_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5912-6
Online ISBN: 978-1-4419-5913-3
eBook Packages: Biomedical and Life Sciences (R0)