Towards a Classification of Binary Similarity Measures
Similarity measures for binary variables are used in many problems of machine learning, pattern recognition and classification. Dozens of such measures have been introduced, which raises the problem of comparing them. One method used for such analysis is to cluster similarity measures based on the correlation between the similarity values that different measures produce on the same data. This paper proposes a method of comparative analysis based on a set-theoretic representation of the measures and a comparison of the algebraic properties of these representations. The results show a relationship between the clustering results and the classification of measures by their properties. Because the results of clustering depend on the clustering method and on the data used for measuring the correlation between measures, we conclude that the classification based on the proposed properties is more suitable for comparative analysis of similarity measures.
Keywords: Similarity measure, Binary data, Contingency table, Clustering
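The correlation-based comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes three well-known measures (Jaccard, Dice and simple matching) written in terms of the 2×2 contingency counts a, b, c, d, and it assumes Pearson correlation as the correlation coefficient. The choice of measures, the function names and the use of Pearson correlation are illustrative assumptions, not the set of measures or the procedure used in the paper.

```python
from itertools import combinations
from math import sqrt


def contingency(x, y):
    """Return the 2x2 contingency counts (a, b, c, d) of two 0/1 sequences.

    a: both 1, b: x only, c: y only, d: both 0.
    """
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


# Illustrative measures only; the convention of returning 1.0 when the
# denominator is zero is an assumption of this sketch.
def jaccard(a, b, c, d):
    return a / (a + b + c) if (a + b + c) else 1.0


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 1.0


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


MEASURES = {
    "jaccard": jaccard,
    "dice": dice,
    "simple_matching": simple_matching,
}


def pearson(u, v):
    """Pearson correlation of two equal-length lists (nan if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sqrt(sum((p - mu) ** 2 for p in u))
    sv = sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv) if su and sv else float("nan")


def measure_correlations(data):
    """Correlate every pair of measures over all pairs of binary vectors."""
    pairs = list(combinations(data, 2))
    values = {
        name: [f(*contingency(x, y)) for x, y in pairs]
        for name, f in MEASURES.items()
    }
    return {
        (m1, m2): pearson(values[m1], values[m2])
        for m1, m2 in combinations(MEASURES, 2)
    }
```

Calling `measure_correlations` on a list of equal-length 0/1 vectors returns one correlation per pair of measures. Such a matrix could then be passed to a clustering routine, and its values change with the data supplied, which is the dependence on data that the abstract points out.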
The work is partially supported by the projects SIP 20171344, BEIFI of IPN and 283778 of CONACYT.