Data Analysis of (Non-)Metric Proximities at Linear Costs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7953)

Abstract

Domain-specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bioinformatics, are often used to compare complex data objects and to capture domain-specific data properties. Lacking an underlying vector space, the data are given only as pairwise (dis-)similarities. The few methods available for such data do not scale well to very large data sets. Kernel methods easily handle metric similarity matrices, also at large scale, but costly transformations are necessary when starting from non-metric (dis-)similarities. We propose an integrative combination of Nyström approximation, potential double centering and eigenvalue correction to obtain valid kernel matrices at linear cost. Accordingly, effective kernel approaches become accessible for these data. Evaluation on several large (dis-)similarity data sets shows that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. Our main contribution is an efficient linear-time technique to convert (potentially non-metric) large-scale dissimilarity matrices into approximated positive semi-definite kernel matrices.
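The pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: the function name `nystroem_psd_kernel`, the landmark count `m`, the column-wise approximation of double centering, and the eigenvalue-clipping tolerance are all hypothetical choices. Given `m` landmark columns of a symmetric dissimilarity matrix, it double-centers them into similarities, clips negative eigenvalues of the landmark block (one possible eigenvalue correction; flipping is an alternative), and returns the Nyström extension, which is positive semi-definite by construction:

```python
import numpy as np

def nystroem_psd_kernel(D, m, seed=0):
    """Approximate a PSD kernel from a symmetric (n, n) dissimilarity
    matrix D using m landmark points. Only the n x m landmark columns
    of D are actually read, so the cost is linear in n for fixed m."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    idx = rng.choice(n, size=m, replace=False)

    C = D[:, idx]                    # n x m landmark dissimilarity columns
    # Approximate double centering (classical-MDS style), with the row/
    # column means estimated from the landmark columns only:
    mu_rows = C.mean(axis=1)         # per-point mean over landmarks
    mu_cols = C.mean(axis=0)         # per-landmark mean over all points
    mu = C.mean()
    S = -0.5 * (C - mu_rows[:, None] - mu_cols[None, :] + mu)

    # Eigenvalue correction on the m x m landmark block: clip negative
    # eigenvalues so the Nystroem extension S W^+ S^T is PSD.
    W = S[idx, :]
    W = 0.5 * (W + W.T)              # symmetrize against centering noise
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 0.0, None)
    inv = np.zeros_like(vals)
    mask = vals > 1e-8 * max(vals[-1], 1.0)   # drop numerically-zero modes
    inv[mask] = 1.0 / vals[mask]
    W_pinv = (vecs * inv) @ vecs.T   # PSD pseudo-inverse of corrected W

    return S @ W_pinv @ S.T          # n x n approximated PSD kernel

if __name__ == "__main__":
    # Demo on squared Euclidean distances of random 3-D points.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = nystroem_psd_kernel(D, m=20)
    print(K.shape)  # (200, 200)
```

The eigendecomposition touches only the m × m landmark block, so for fixed m the overall cost stays linear in the number of data points, which is the scaling behaviour the paper targets.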




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schleif, FM., Gisbrecht, A. (2013). Data Analysis of (Non-)Metric Proximities at Linear Costs. In: Hancock, E., Pelillo, M. (eds) Similarity-Based Pattern Recognition. SIMBAD 2013. Lecture Notes in Computer Science, vol 7953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39140-8_4


  • DOI: https://doi.org/10.1007/978-3-642-39140-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39139-2

  • Online ISBN: 978-3-642-39140-8
