A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies

  • Francisco de Assis Rodrigues dos Anjos
  • Jadson Castro Gertrudes
  • Jörg Sander
  • Ricardo J. G. B. Campello (email author)
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 996)

Abstract

Extracting a flat solution from a clustering hierarchy, as opposed to deriving it directly from data using a partitional clustering algorithm, is advantageous because it allows the hierarchical relationships between clusters and sub-clusters, as well as their stability across different hierarchical levels, to be revealed before any decision is made on which clusters are more relevant. Traditionally, flat solutions are obtained by performing a global, horizontal cut through a clustering hierarchy (e.g., a dendrogram). The problem of extracting flat solutions has gained special importance in the context of density-based hierarchical algorithms, because only sophisticated cutting strategies, in particular non-horizontal local cuts, are able to select clusters at different density levels. In this paper, we propose an adaptation of a variant of the Modularity Q measure, widely used in the realm of community detection in complex networks, so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies. Our results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.
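
For orientation only, the classic Newman-Girvan form of Modularity Q for a partition of an undirected weighted graph can be sketched as below. This is not the variant adapted in the paper, whose definition is not given in this abstract; the function name `modularity`, the edge-list input format, and the toy graph are assumptions made purely for illustration.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Classic Newman-Girvan modularity Q (illustrative, not the paper's variant).

    edges  : iterable of (u, v, weight), each undirected edge listed once
    labels : dict mapping every node to its cluster id
    """
    total = 0.0                    # total edge weight m
    internal = defaultdict(float)  # edge weight falling inside each cluster
    degree = defaultdict(float)    # summed weighted degree of each cluster
    for u, v, w in edges:
        total += w
        degree[labels[u]] += w
        degree[labels[v]] += w
        if labels[u] == labels[v]:
            internal[labels[u]] += w
    if total == 0:
        return 0.0
    # Q = sum over clusters c of [ internal_c / m - (degree_c / (2m))^2 ]
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 1.0)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_two = modularity(toy_edges, two_clusters)
q_one = modularity(toy_edges, one_cluster)
```

On this toy graph, splitting the two triangles scores higher than lumping all nodes into one cluster, which is the kind of comparison an optimization criterion for cluster selection relies on.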

Keywords

Hierarchical clustering; Cluster evaluation and selection

Notes

Acknowledgements

CNPq and CAPES (Brazil), NSERC (Canada).

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Francisco de Assis Rodrigues dos Anjos (1)
  • Jadson Castro Gertrudes (1)
  • Jörg Sander (2)
  • Ricardo J. G. B. Campello (3), email author
  1. University of São Paulo, São Carlos, Brazil
  2. University of Alberta, Edmonton, Canada
  3. University of Newcastle, Callaghan, Australia