Generalizing Centroid Index to Different Clustering Models

  • Pasi Fränti
  • Mohammad Rezaei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10029)


Centroid index is the only measure that evaluates cluster-level differences between two clustering results. It outputs an integer value indicating how many clusters are allocated differently. In this paper, we generalize the index to clustering models that do not use a centroid as the prototype. We apply it to the centroid model, the Gaussian mixture model, and arbitrary-shape clusters.
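
The integer output can be illustrated for the plain centroid model. The sketch below assumes the nearest-prototype formulation commonly given in the earlier centroid index literature: each centroid of one solution is mapped to its nearest centroid in the other, the centroids that receive no mapping are counted as orphans, and the index is the larger of the two directional orphan counts. The function names `orphan_count` and `centroid_index` and the sample coordinates are illustrative assumptions and are not taken from the paper; the generalization to Gaussian mixtures and arbitrary-shape clusters is not covered here.

```python
import math


def orphan_count(source, target):
    """Count prototypes in `target` that no prototype in `source` maps to.

    Assumption: mapping is by nearest Euclidean distance between centroids.
    """
    mapped = set()
    for s in source:
        # Index of the target prototype nearest to s
        nearest = min(range(len(target)), key=lambda j: math.dist(s, target[j]))
        mapped.add(nearest)
    return len(target) - len(mapped)


def centroid_index(a, b):
    """Symmetric index: the larger of the two directional orphan counts."""
    return max(orphan_count(a, b), orphan_count(b, a))


# Hypothetical example: solution a places two centroids near the origin,
# solution b places two centroids near x = 10.
a = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
b = [(0.4, 0.0), (10.0, 0.0), (11.0, 0.0)]

print(centroid_index(a, b))  # one cluster is allocated differently
print(centroid_index(a, a))  # identical solutions
```

Under these assumptions, a value of zero means the two solutions have the same cluster-level structure, and a positive value counts the clusters that are placed differently.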


Clustering, Validity index, External index, Centroid index



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. University of Eastern Finland, Joensuu, Finland
