Abstract
A dot-product similarity matrix is an alternative way to represent a multidimensional data set: an n × d data matrix D can be converted into an n × n similarity matrix S = DD^T, which contains the n^2 pairwise dot products between points. One can use S in place of D in machine learning algorithms, because the similarity matrix contains almost the same information about the data as the original matrix. This equivalence is the genesis of a large class of machine learning methods referred to as kernel methods. This chapter builds the linear algebra framework required for understanding this important class of methods. Their real utility arises when the similarity matrix is defined by a function other than the dot product (and the data matrix is sometimes not even available).
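The following is a minimal sketch of the idea in the abstract, using an arbitrary random matrix D (the names, sizes, and random data are illustrative assumptions, not from the chapter): the Gram matrix S = DD^T stores every pairwise dot product, and an eigendecomposition of S recovers an embedding of the data that reproduces those dot products exactly, which is why S carries almost the same information as D.

```python
import numpy as np

# Hypothetical 5 x 3 data matrix: n = 5 points, d = 3 features.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))

# The n x n similarity (Gram) matrix of pairwise dot products.
S = D @ D.T

# Entry (i, j) of S is the dot product between points i and j.
assert np.allclose(S[1, 3], D[1] @ D[3])

# S retains "almost the same information" as D: an eigendecomposition of S
# yields an embedding that reproduces every pairwise dot product, i.e., the
# data is recovered up to rotation/reflection.
eigvals, eigvecs = np.linalg.eigh(S)
eigvals = np.clip(eigvals, 0, None)        # guard against tiny negative round-off
D_recovered = eigvecs * np.sqrt(eigvals)   # embedding whose Gram matrix equals S

assert np.allclose(D_recovered @ D_recovered.T, S)
```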
Notes
1. For some closed-form kernel functions, such as the dot product, only d components of the eigenvectors are non-zero, whereas for others, such as the Gaussian kernel, the full set of infinitely many components is needed (see the sketch after these notes).
2. An exception is outlier detection.
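As a sketch of note 1, the Gaussian (RBF) kernel corresponds to an infinite-dimensional feature map, yet its n × n similarity matrix can be computed in closed form directly from the n × d data. The variable names, data, and bandwidth value below are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: n = 5 points, d = 3 features.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))
gamma = 0.5  # assumed bandwidth parameter of the Gaussian kernel

# Squared Euclidean distances between all pairs of points.
sq_dists = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)

# Gaussian kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2),
# computed without ever materializing the infinite-dimensional features.
K = np.exp(-gamma * sq_dists)

# K is a valid similarity matrix: symmetric and positive semidefinite.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)
```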