Abstract
The k Nearest Neighbor classifier has been applied to the identification of cancer samples using the gene expression profiles with encouraging results. However, k-NN relies usually on the use of Euclidean distances that fail often to reflect accurately the sample proximities. Non Euclidean dissimilarities focus on different features of the data and should be integrated in order to reduce the misclassification errors.
In this paper, we learn a linear combination of dissimilarities using a regularized kernel alignment algorithm. The weights of the combination are learnt in a HRKHS (Hyper Reproducing Kernel Hilbert Space) using a Semidefinite Programming algorithm. This approach allow us to incorporate a smoothing term that penalizes the complexity of the family of distances and avoids overfitting.
The experimental results suggest that the method proposed outperforms other metric learning strategies and improves the classical k-NN algorithm based on a single dissimilarity.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Blanco, A., Martín-Merino, M., De Las Rivas, J.: Combining dissimilarity based classifiers for cancer prediction using gene expression profiles. BMC Bioinformatics, 1–2 (2007); ISMB/ECCB 2007
Cristianini, N., Kandola, J., Elisseeff, J., Shawe-Taylor, A.: On the kernel target alignment. Journal of Machine Learning Research 1, 1–31 (2002)
Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association 97(457), 77–87 (2002)
Fine, S., Scheinberg, K.: Efficient svm training using low-rank kernel representations. Journal of Machine Learning Research 2, 243–264 (2001)
Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S.: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, Berlin (2006)
Jeffery, I.B., Higgins, D.G., Culhane, A.C.: Comparison and Evaluation Methods for Generating Differentially Expressed Gene List from Microarray Data. BMC Bioinformatics 7(359), 1–16 (2006)
Jiang, D., Tang, C., Zhang, A.: Cluster Analysis for Gene Expression Data: A Survey. IEEE Transactions on Knowledge and Data Engineering 16(11), 1370–1386 (2004)
Kandola, J., Shawe-Taylor, J., Cristianini, N.: Optimizing kernel alignment over combinations of kernels. NeuroCOLT, Tech. Rep. (2002)
Löfberg, J.: YALMIP, yet another LMI parser (2002), www.control.isy.liu.se/~johanl/yalmip.html
Lanckriet, G., Cristianini, N., Barlett, P., El Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research 3, 27–72 (2004)
Pekalska, E., Paclick, P., Duin, R.: A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2, 175–211 (2001)
Pomeroy, S.E.A.: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415 (2002)
Savage, K., et al.: The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical hodgkin lymphoma. Blood 102(12) (December 2003)
Scholkopf, B., Tsuda, K., Vert, J.: Kernel Methods in Computational Biology. MIT Press, Cambridge (2004)
Soon Ong, C., Smola, A., Williamson, R.: Learning the kernel with hyperkernels. Journal of Machine Learning Research 6, 1043–1071 (2005)
Statnikov, A.: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5), 631–643 (2004)
Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software 11/12(1-4), 625–653 (1999)
Tsuda, K.: Support Vector Classifier with Assymetric Kernel Function. In: Proceedings of ESANN, Bruges, pp. 183–188 (1999)
Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
Weinberger, K.Q., Saul, L.K.: Distance Metric Learning for Large Margin Nearest Neighbor Classification. J. Machine Learning Research 10, 207–244 (2009)
West, M., et al.: Predicting the clinical status of human breast cancer by using gene expression profiles. PNAS 98(20) (2001)
Wu, G., Chang, E.Y., Panda, N.: Formulating distance functions via the kernel trick. In: ACM SIGKDD, Chicago, pp. 703–709 (2005)
Xiong, H., Chen, X.-W.: Kernel-Based Distance Metric Learning for Microarray Data Classification. BMC Bioinformatics 7(299), 1–11 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martín-Merino, M., De Las Rivas, J. (2009). Improving k-NN for Human Cancer Classification Using the Gene Expression Profiles. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_10
Download citation
DOI: https://doi.org/10.1007/978-3-642-03915-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03914-0
Online ISBN: 978-3-642-03915-7
eBook Packages: Computer ScienceComputer Science (R0)