Abstract
A dot-product similarity matrix is an alternative way to represent a multidimensional data set: an n × d data matrix D can be converted into an n × n similarity matrix S = DD^T, which contains the n^2 pairwise dot products between points. One can use S in place of D in machine learning algorithms, because the similarity matrix contains almost the same information about the data as the original matrix. This equivalence is the genesis of a large class of machine learning methods referred to as kernel methods. This chapter builds the linear algebra framework required for understanding this important class of methods. Their real utility arises when the similarity matrix is defined by a function other than the dot product (and the data matrix is sometimes not even available).
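The following is a minimal sketch of the idea in the abstract, using an arbitrary random matrix D (the names, sizes, and random data are illustrative assumptions, not from the chapter): the Gram matrix S = DD^T stores every pairwise dot product, and an eigendecomposition of S recovers an embedding of the data that reproduces those dot products exactly, which is why S carries almost the same information as D.

```python
import numpy as np

# Hypothetical 5 x 3 data matrix: n = 5 points, d = 3 features.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))

# The n x n similarity (Gram) matrix of pairwise dot products.
S = D @ D.T

# Entry (i, j) of S is the dot product between points i and j.
assert np.allclose(S[1, 3], D[1] @ D[3])

# S retains "almost the same information" as D: an eigendecomposition of S
# yields an embedding that reproduces every pairwise dot product, i.e., the
# data is recovered up to rotation/reflection.
eigvals, eigvecs = np.linalg.eigh(S)
eigvals = np.clip(eigvals, 0, None)        # guard against tiny negative round-off
D_recovered = eigvecs * np.sqrt(eigvals)   # embedding whose Gram matrix equals S

assert np.allclose(D_recovered @ D_recovered.T, S)
```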
Notes
1. For some closed-form kernel functions, such as the dot product, only d components of the eigenvectors are non-zero, whereas for others, such as the Gaussian kernel, the full set of infinitely many components is needed (see the sketch after these notes).
2. An exception is outlier detection.
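As a sketch of note 1, the Gaussian (RBF) kernel corresponds to an infinite-dimensional feature map, yet its n × n similarity matrix can be computed in closed form directly from the n × d data. The variable names, data, and bandwidth value below are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: n = 5 points, d = 3 features.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 3))
gamma = 0.5  # assumed bandwidth parameter of the Gaussian kernel

# Squared Euclidean distances between all pairs of points.
sq_dists = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)

# Gaussian kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2),
# computed without ever materializing the infinite-dimensional features.
K = np.exp(-gamma * sq_dists)

# K is a valid similarity matrix: symmetric and positive semidefinite.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)
```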