Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions

Abstract

The Rand index (RI) and the adjusted Rand index (ARI) are commonly used to measure the agreement between two clustering partitions. Such external validation indexes quantify how close a clustering is to a reference partition (or to prior knowledge about the data) by counting the pairs of elements that the two partitions classify in the same way. Several extensions of the Rand index and of other similarity measures to fuzzy partitions have been proposed to evaluate the solution of a fuzzy clustering algorithm. This paper proposes an extension of the ARI to fuzzy partitions based on the normalized degree of concordance. The performance of the proposed index is evaluated through Monte Carlo simulation studies.
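As background for the pair-counting idea mentioned in the abstract, the following is a minimal Python sketch of the classical RI and the Hubert-Arabie form of the ARI for crisp (non-fuzzy) partitions. It does not implement the fuzzy index proposed in the paper. The function names `pair_counts`, `rand_index` and `adjusted_rand_index` are illustrative assumptions, and partitions are assumed to be given as equal-length lists of cluster labels.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(labels_a, labels_b):
    """Count pairs placed together in both partitions and apart in both."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")
    both_same = both_diff = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both_same += 1
        elif not same_a and not same_b:
            both_diff += 1
    return both_same, both_diff, comb(len(labels_a), 2)


def rand_index(labels_a, labels_b):
    """Share of object pairs on which the two partitions agree."""
    both_same, both_diff, total = pair_counts(labels_a, labels_b)
    return (both_same + both_diff) / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance (Hubert-Arabie form)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")
    n = len(labels_a)
    # Contingency table cells and margins, each turned into a pair count.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (assumed convention): both partitions are trivial.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    candidate = [0, 0, 1, 2]
    print(rand_index(reference, candidate))
    print(adjusted_rand_index(reference, candidate))
```

Both functions depend only on which objects share a cluster, so relabeling the clusters of either partition leaves the values unchanged. Unlike the RI, the ARI is centered so that its expected value is zero under chance agreement with fixed cluster sizes.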

Acknowledgments

The authors would like to thank both the Editor and an anonymous reviewer, whose comments and remarks contributed greatly to improving the quality of this manuscript.

Author information

Corresponding author

Correspondence to Antonio D’Ambrosio.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

D’Ambrosio, A., Amodio, S., Iorio, C. et al. Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions. J Classif (2020). https://doi.org/10.1007/s00357-020-09367-0

Keywords

  • Clustering
  • Cluster validity
  • Fuzzy partitions
  • Normalized degree of concordance