A Tale of Four Metrics

  • Richard Connor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9939)

Abstract

There are many contexts in which similarity in a multivariate space needs to be defined in terms of the correlation, rather than the absolute values, of the variables. Examples include classic IR measurements such as TF/IDF and BM25, client similarity measures based on collaborative filtering, feature analysis of chemical molecules, and biodiversity contexts.

In such cases, Cosine similarity is used almost as standard. More recently, Jensen-Shannon divergence has appeared in a proper metric form, and a related metric, Structural Entropic Distance (SED), has been investigated. A fourth metric, based on a little-known divergence function named Triangular Divergence, is also assessed here.
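As a rough illustration of three of the metrics named above, the following Python sketch computes them for non-negative vectors. The abstract gives no formulas, so everything here is an assumption: Cosine distance is taken as the angle between the vectors (one of several metric forms), Jensen-Shannon distance as the square root of the base-2 Jensen-Shannon divergence, and Triangular distance as the square root of half the triangular divergence, sum((p_i - q_i)^2 / (p_i + q_i)), over L1-normalised vectors. The function names are hypothetical, and SED is left out because its definition does not appear in this section.

```python
import math


def l1_normalise(v):
    """Scale a non-negative vector so that its components sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def cosine_distance(v, w):
    """Angle between v and w in radians (assumed metric form of Cosine)."""
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def jensen_shannon_distance(v, w):
    """Square root of the base-2 Jensen-Shannon divergence (assumed form)."""
    p, q = l1_normalise(v), l1_normalise(w)
    jsd = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2
        if a > 0:
            jsd += 0.5 * a * math.log2(a / m)
        if b > 0:
            jsd += 0.5 * b * math.log2(b / m)
    return math.sqrt(max(jsd, 0.0))


def triangular_distance(v, w):
    """Square root of half the triangular divergence (scaling is assumed)."""
    p, q = l1_normalise(v), l1_normalise(w)
    td = sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
    return math.sqrt(td / 2)


if __name__ == "__main__":
    v, w = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
    for metric in (cosine_distance, jensen_shannon_distance, triangular_distance):
        print(metric.__name__, metric(v, w))
```

Under these assumptions, all three distances depend only on the relative proportions of the components: a vector and any positive multiple of it are at distance zero, which is one way of reading the abstract's emphasis on correlation rather than absolute value.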

For these metrics, we study their properties in the context of similarity and metric search. We compare and contrast their semantics and performance. Our conclusion is that, despite Cosine Distance being an almost automatic choice in this context, Triangular Distance is most likely to be the best choice in terms of a compromise between semantics and performance.

Keywords

Information Retrieval; Cosine Similarity; Cosine Distance; Semantic Basis; Query Threshold

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK
