
The Linear Algebra of Similarity


Abstract

A dot-product similarity matrix is an alternative way to represent a multidimensional data set. In other words, one can convert an n × d data matrix D into an n × n similarity matrix S = DD^T, which contains the n^2 pairwise dot products between points. One can use S instead of D in machine learning algorithms, because the similarity matrix retains essentially the same information about the data as the original matrix: the data points can be recovered from S up to rotation and reflection. This equivalence is the genesis of a large class of methods in machine learning, referred to as kernel methods. This chapter builds the linear algebra framework required for understanding this important class of methods. The real utility of such methods arises when the similarity matrix is defined by a function other than the dot product (and the data matrix is sometimes not even available).
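
As a concrete illustration of this construction, the following minimal sketch in Python/NumPy (not taken from the chapter; the synthetic data, variable names, and bandwidth are assumptions made for the example) builds the dot-product similarity matrix directly from a data matrix, and then fills the same n × n "slot" with a different similarity function, a Gaussian (RBF) kernel:

    import numpy as np

    # Minimal sketch: build the n x n dot-product similarity matrix S = D D^T
    # from a synthetic n x d data matrix D (all values here are assumptions).
    rng = np.random.default_rng(0)
    n, d = 5, 3
    D = rng.standard_normal((n, d))           # synthetic data matrix

    S = D @ D.T                               # S[i, j] = dot product of points i and j
    assert S.shape == (n, n)

    # The same n x n similarity matrix can instead be defined by another
    # similarity function, e.g. a Gaussian (RBF) kernel; in that case the
    # implicit feature-space data matrix is never materialized.
    sq_dists = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)
    sigma = 1.0                               # assumed bandwidth
    K = np.exp(-sq_dists / (2 * sigma ** 2))  # Gaussian-kernel similarity matrix

Many learning algorithms can be reformulated so that they access the data only through such a similarity matrix, which is what makes this substitution possible.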


Notes

  1. For some closed-form similarity functions, such as the dot product, only d components of the eigenvectors are non-zero, whereas for others, such as the Gaussian kernel, the entire infinite set of components is needed (a short numerical illustration follows these notes).

  2. An exception is outlier detection.
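
To make the first note concrete, the following short numerical check (again a Python/NumPy sketch with assumed synthetic data and tolerance, not taken from the chapter) shows that the dot-product similarity matrix of an n × d data matrix has at most d non-zero eigenvalues, whereas a Gaussian-kernel matrix built from the same points typically has far more, reflecting its effectively infinite-dimensional feature representation:

    import numpy as np

    # Illustration of note 1 (assumed synthetic data): S = D D^T has at most d
    # non-zero eigenvalues, while a Gaussian-kernel matrix on the same points
    # typically has many more.
    rng = np.random.default_rng(1)
    n, d = 50, 3
    D = rng.standard_normal((n, d))

    S = D @ D.T
    sq_dists = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / 2.0)                 # Gaussian kernel, bandwidth 1 (assumed)

    tol = 1e-8                                  # assumed numerical tolerance
    print(np.sum(np.linalg.eigvalsh(S) > tol))  # at most d (here 3)
    print(np.sum(np.linalg.eigvalsh(K) > tol))  # typically far more than d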


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Aggarwal, C.C. (2020). The Linear Algebra of Similarity. In: Linear Algebra and Optimization for Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-40344-7_9


  • DOI: https://doi.org/10.1007/978-3-030-40344-7_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40343-0

  • Online ISBN: 978-3-030-40344-7

  • eBook Packages: Computer Science, Computer Science (R0)
