Improving k-NN for Human Cancer Classification Using the Gene Expression Profiles

Martín-Merino, Manuel; De Las Rivas, Javier

doi:10.1007/978-3-642-03915-7_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5772)

Abstract

The k-nearest neighbor (k-NN) classifier has been applied to the identification of cancer samples from gene expression profiles with encouraging results. However, k-NN usually relies on Euclidean distances, which often fail to reflect the sample proximities accurately. Non-Euclidean dissimilarities focus on different features of the data and should be integrated in order to reduce misclassification errors.
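
As a rough illustration of why the choice of dissimilarity matters, the sketch below runs a 1-NN vote on a tiny made-up data set using two different dissimilarities. The toy data, the function names, and the choice of a Pearson-correlation dissimilarity are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def euclidean_dissimilarity(A, B):
    """Pairwise Euclidean distances between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding

def correlation_dissimilarity(A, B):
    """Pairwise 1 - Pearson correlation between the rows of A and the rows of B."""
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    Ac = Ac / np.linalg.norm(Ac, axis=1, keepdims=True)
    Bc = Bc / np.linalg.norm(Bc, axis=1, keepdims=True)
    return 1.0 - Ac @ Bc.T

def knn_predict(D_test_train, y_train, k=1):
    """Majority vote among the k training samples with the smallest dissimilarity."""
    nearest = np.argsort(D_test_train, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

# Hypothetical toy profiles: class 0 rises across the three "genes", class 1 falls.
X_train = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [9.0, 8.0, 7.0], [8.0, 6.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[7.0, 8.0, 9.0]])  # rising profile at a high expression level

pred_euclidean = knn_predict(euclidean_dissimilarity(X_test, X_train), y_train)
pred_correlation = knn_predict(correlation_dissimilarity(X_test, X_train), y_train)
print(pred_euclidean, pred_correlation)  # prints: [1] [0]
```

On this toy example the Euclidean distance is dominated by the overall expression level, while the correlation dissimilarity responds to the shape of the profile, so the two 1-NN classifiers disagree.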

In this paper, we learn a linear combination of dissimilarities using a regularized kernel alignment algorithm. The weights of the combination are learnt in an HRKHS (Hyper Reproducing Kernel Hilbert Space) using a semidefinite programming algorithm. This approach allows us to incorporate a smoothing term that penalizes the complexity of the family of distances and avoids overfitting.
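
The abstract does not spell out the regularized, semidefinite-programming formulation, so the sketch below does not attempt to reproduce it. It only illustrates the quantity that alignment-based methods build on, the empirical alignment between a kernel matrix and the ideal kernel formed from the labels, and it replaces the paper's learner with a plain heuristic (weights proportional to each kernel's alignment). The double-centring step that turns a dissimilarity matrix into a kernel, all function names, and the heuristic itself are assumptions made for illustration.

```python
import numpy as np

def kernel_from_dissimilarity(D):
    """Double-centre squared dissimilarities: K = -1/2 * J D^2 J (classical scaling).

    For non-Euclidean dissimilarities the result may be indefinite.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def alignment(K, y):
    """Empirical alignment <K, yy^T>_F / (||K||_F ||yy^T||_F), with y in {-1, +1}."""
    T = np.outer(y, y)
    return np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T))

def heuristic_weights(kernels, y):
    """Assumed stand-in for the learned weights: proportional to non-negative alignment."""
    a = np.array([max(alignment(K, y), 0.0) for K in kernels])
    if a.sum() == 0.0:
        return np.full(len(kernels), 1.0 / len(kernels))  # fall back to uniform weights
    return a / a.sum()

def combine(matrices, weights):
    """Linear combination sum_i w_i * M_i of same-shaped matrices."""
    return sum(w * M for w, M in zip(weights, matrices))
```

In this simplified setting, a kernel that reproduces the label structure exactly has alignment 1, and a kernel that carries no class information beyond self-similarity (the identity) scores lower, so it receives a smaller weight in the combination.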

The experimental results suggest that the proposed method outperforms other metric learning strategies and improves on the classical k-NN algorithm based on a single dissimilarity.

Author information

Authors and Affiliations

Universidad Pontificia de Salamanca, C/Compañía 5, 37002, Salamanca, Spain
Manuel Martín-Merino
Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
Javier De Las Rivas

Editor information

Editors and Affiliations

Department of Mathematics, Imperial College London, South Kensington Campus, SW7 2PG, London, United Kingdom
Niall M. Adams
INSA Lyon, LIRIS CNRS UMR 5205, Bâtiment Blaise Pascal, University of Lyon, F-69621, Villeurbanne, France
Céline Robardet
Department of Information and Computer Science, Universiteit Utrecht, Utrecht, The Netherlands
Arno Siebes
INSA-Lyon, LIRIS CNRS UMR5205, University of Lyon, F-69621, Villeurbanne, France
Jean-François Boulicaut

Cite this paper

Martín-Merino, M., De Las Rivas, J. (2009). Improving k-NN for Human Cancer Classification Using the Gene Expression Profiles. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_10

DOI: https://doi.org/10.1007/978-3-642-03915-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03914-0
Online ISBN: 978-3-642-03915-7
eBook Packages: Computer Science, Computer Science (R0)
