The Linear Algebra of Similarity



A dot-product similarity matrix is an alternative way to represent a multidimensional data set. In other words, one can convert an n × d data matrix D into an n × n similarity matrix S = DDT (which contains n2 pairwise dot products between points). One can use S instead of D for machine learning algorithms. The reason is that the similarity matrix contains almost the same information about the data as the original matrix. This equivalence is the genesis of a large class of methods in machine learning, referred to as kernel methods. This chapter builds the linear algebra framework required for understanding this important class of methods in machine learning. The real utility of such methods arises when the similarity matrix is chosen differently from the use of dot products (and the data matrix is sometimes not even available).


  1. 5.
    C. C. Aggarwal and S. Sathe. Outlier Ensembles: An Introduction. Springer, 2017.CrossRefGoogle Scholar
  2. 8.
    R. Ahuja, T. Magnanti, and J. Orlin. Network flows: theory, algorithms, and applications. Prentice Hall, 1993.Google Scholar
  3. 42.
    A. Emmott, S. Das, T. Dietterich, A. Fern, and W. Wong. Systematic Construction of Anomaly Detection Benchmarks from Real Data. arXiv:1503.01158, 2015.
  4. 98.
    A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. NIPS Conference, pp. 849–856, 2002.Google Scholar
  5. 112.
    B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), pp. 1299–1319, 1998.CrossRefGoogle Scholar
  6. 113.
    B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the Support of a High-Dimensional Distribution. Neural Computation, 13(7), pp. 1443–1472, 2001.CrossRefGoogle Scholar
  7. 115.
    J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), pp. 888–905, 2000.CrossRefGoogle Scholar
  8. 118.
    B. Schölkopf and A. J. Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge University Press, 2001.Google Scholar
  9. 126.
    J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290 (5500), pp. 2319–2323, 2000.CrossRefGoogle Scholar
  10. 129.
    G. Wahba. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. Advances in Kernel Methods-Support Vector Learning, 6, pp. 69–87, 1999.Google Scholar
  11. 133.
    C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. NIPS Conference, 2000.Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.IBM T.J. Watson Research CenterYorktown HeightsUSA

Personalised recommendations