A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms

  • Maurice Roux
Article

Abstract

A general scheme for divisive hierarchical clustering algorithms is proposed. It consists of three main steps: first, a splitting procedure for subdividing clusters into two subclusters; second, a local evaluation of the bipartitions resulting from the tentative splits; and third, a formula for determining the node levels of the resulting dendrogram. A set of 12 such algorithms is presented and compared with their agglomerative counterparts (when available). These algorithms are evaluated using the Goodman-Kruskal correlation coefficient. Used as a global criterion, it is an internal goodness-of-fit measure that compares the order induced by the hierarchy with the order associated with the given dissimilarities. Applied to one hundred random data tables and to three real-life examples, these comparisons favor methods based on unusual ratio-type formulas for evaluating the intermediate bipartitions, namely the Silhouette formula, Dunn's formula and the Mollineda et al. formula. These formulas take into account both the within-cluster and the between-cluster mean dissimilarities. Used in divisive algorithms, they perform very well, and slightly better than in their agglomerative counterparts.
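The global criterion described above can be illustrated with a minimal sketch of the Goodman-Kruskal coefficient. It compares, over all pairs of object pairs, the ordering of the input dissimilarities with the ordering of the ultrametric (node-level) distances read off a dendrogram. The function name `goodman_kruskal_gamma`, the flat-list input layout and the choice to ignore comparisons tied in either ordering are assumptions made for this illustration, not details taken from the article.

```python
from itertools import combinations


def goodman_kruskal_gamma(dissimilarities, ultrametric):
    """Goodman-Kruskal gamma between two equally long lists of values.

    Both lists are indexed by the same object pairs: entry k holds the
    input dissimilarity and the dendrogram (ultrametric) distance of
    pair k. Comparisons tied in either list are ignored (an assumption
    of this sketch).
    """
    if len(dissimilarities) != len(ultrametric):
        raise ValueError("both lists must describe the same object pairs")

    concordant = 0
    discordant = 0
    paired = list(zip(dissimilarities, ultrametric))
    for (d1, u1), (d2, u2) in combinations(paired, 2):
        sign = (d1 - d2) * (u1 - u2)
        if sign > 0:
            concordant += 1   # both orderings agree
        elif sign < 0:
            discordant += 1   # orderings disagree

    total = concordant + discordant
    if total == 0:
        raise ValueError("gamma is undefined when every comparison is tied")
    return (concordant - discordant) / total


# Hypothetical example: four objects a, b, c, d, with pairs listed in
# the order ab, ac, ad, bc, bd, cd.
d = [1, 4, 5, 3, 6, 2]
# Ultrametric of the hierarchy {a, b} (level 1), {c, d} (level 2), root (level 3).
u = [1, 3, 3, 3, 3, 2]
print(goodman_kruskal_gamma(d, u))  # 1.0: no discordant comparison
```

A value near 1 means the hierarchy preserves the ordering of the dissimilarities, and a value near -1 means it reverses that ordering. The double loop over pairs of pairs grows with the fourth power of the number of objects, so this form is only practical for small data tables.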

Keywords

Hierarchical clustering; Dissimilarity data; Splitting procedures; Evaluation of hierarchy; Dendrogram; Ultrametrics

References

  1. BOLEY, D. (1998), “Principal Directions Divisive Partitioning”, Data Mining and Knowledge Discovery, 2(4), 325–344.
  2. CUNNINGHAM, K.M., and OGILVIE, J.C. (1972), “Evaluation of Hierarchical Grouping Techniques: A Preliminary Study”, Computer Journal, 15(3), 209–213.
  3. DUNN, J.C. (1974), “Well Separated Clusters and Optimal Fuzzy Partitions”, Journal of Cybernetics, 4, 95–104.
  4. EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21(2), 362–375.
  5. FISHER, R.A. (1936), “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179–188.
  6. GOLUB, T.R., SLONIM, D.K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J.P., COLLER, H., LOH, M.L., DOWNING, J.R., CALIGIURI, M.A., BLOOMFIELD, C.D., and LANDER, E.S. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring”, Science, 286, 531–537.
  7. GOODMAN, L., and KRUSKAL, W. (1954), “Measures of Association for Cross Classifications, Part 1”, Journal of the American Statistical Association, 49, 732–764.
  8. GOWER, J.C. (1966), “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis”, Biometrika, 53(3,4), 325–338.
  9. HANDL, J., KNOWLES, J., and KELL, D.B. (2005), “Computational Cluster Validation in Post-Genomic Data Analysis”, Bioinformatics, 21(15), 3201–3212.
  10. HUBERT, L. (1973), “Monotone Invariant Clustering Procedures”, Psychometrika, 38(1), 47–62.
  11. KAUFMAN, L., and ROUSSEEUW, P.J. (1990), Finding Groups in Data, New York: Wiley.
  12. KENDALL, M.G. (1938), “A New Measure of Rank Correlation”, Biometrika, 30(1–2), 81–93.
  13. MACNAUGHTON-SMITH, P., WILLIAMS, W.T., DALE, M.B., and MOCKETT, L.G. (1964), “Dissimilarity Analysis: A New Technique of Hierarchical Sub-Division”, Nature, 202, 1034–1035.
  14. MOLLINEDA, R.A., and VIDAL, E. (2000), “A Relative Approach to Hierarchical Clustering”, in Pattern Recognition and Applications, eds. M.I. Torres and A. Sanfeliu, Amsterdam: IOS Press, pp. 19–28.
  15. MURTAGH, F., and LEGENDRE, P. (2014), “Ward’s Hierarchical Agglomerative Method: Which Algorithms Implement Ward’s Criterion?”, Journal of Classification, 31, 274–295.
  16. REINERT, M. (1983), “Une Méthode de Classification Descendante Hiérarchique: Application à l'Analyse Lexicale par Contexte”, Les Cahiers de l'Analyse des Données, 8(2), 187–198.
  17. ROUSSEEUW, P.J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis”, Journal of Computational and Applied Mathematics, 20, 53–65.
  18. ROUX, M. (1991), “Basic Procedures in Hierarchical Cluster Analysis”, in Applied Multivariate Analysis in SAR and Environmental Studies, eds. J. Devillers and W. Karcher, Dordrecht: Kluwer Academic Publishers, pp. 115–135.
  19. ROUX, M. (1995), “About Divisive Methods in Hierarchical Clustering”, in Data Science and Its Applications, eds. Y. Escoufier, C. Hayashi, B. Fichet, N. Ohsumi, E. Diday, Y. Baba, and L. Lebart, Tokyo: Academic Press, pp. 101–106.
  20. SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy, San Francisco: W.H. Freeman and Co.
  21. SOKAL, R.R., and ROHLF, F.J. (1962), “The Comparison of Dendrograms by Objective Methods”, Taxon, 11(2), 33–40.
  22. STEINBACH, M., KARYPIS, G., and KUMAR, V. (2000), “A Comparison of Document Clustering Techniques”, Technical Report TR 00-034, University of Minnesota, Minneapolis, USA.
  23. SZÉKELY, G.J., and RIZZO, M.L. (2005), “Hierarchical Clustering Via Joint Between-Within Distances: Extending Ward's Minimum Variance Method”, Journal of Classification, 22, 151–183.
  24. TUBB, A., PARKER, N.J., and NICKLESS, G. (1980), “The Analysis of Romano-British Pottery by Atomic Absorption Spectrophotometry”, Archaeometry, 22, 153–171.
  25. WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58, 236–244.
  26. WILLIAMS, W.T., and LAMBERT, J.M. (1959), “Multivariate Methods in Plant Ecology. I. Association Analysis in Plant Communities”, Journal of Ecology, 47(1), 83–101.

Copyright information

© Classification Society of North America 2018

Authors and Affiliations

  1. IMBE (Aix Marseille Université, CNRS, IRD, Univ Avignon), Faculté des Sciences de St-Jérôme, Marseille cedex 20, France
