A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms
A general scheme for divisive hierarchical clustering algorithms is proposed. It consists of three main steps: first, a splitting procedure that subdivides a cluster into two subclusters; second, a local evaluation of the bipartitions resulting from the tentative splits; and third, a formula for determining the node levels of the resulting dendrogram. A set of 12 such algorithms is presented and compared with their agglomerative counterparts (when available). These algorithms are evaluated using the Goodman-Kruskal correlation coefficient, a global, internal goodness-of-fit measure that compares the order induced by the hierarchy with the order associated with the given dissimilarities. Applied to a hundred random data tables and to three real-life examples, these comparisons favor methods based on unusual ratio-type formulas for evaluating the intermediate bipartitions, namely the Silhouette formula, Dunn's formula, and the formula of Mollineda and Vidal. These formulas take into account both the within-cluster and the between-cluster mean dissimilarities. Divisive algorithms that use them perform very well, slightly better than their agglomerative counterparts.
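The three-step divisive scheme and its Goodman-Kruskal evaluation can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the paper. The splitting procedure tries every pair of objects as seeds and assigns each remaining object to the nearer seed (a hypothetical choice). Each tentative bipartition is scored with the average silhouette width, and the node level is the cluster diameter, which keeps levels monotone so that the resulting cophenetic matrix is an ultrametric. The helper names (`silhouette_width`, `best_split`, `divisive`, `goodman_kruskal_gamma`) are illustrative only. Dunn's criterion or that of Mollineda and Vidal could replace `silhouette_width`, but neither is implemented here.

```python
from itertools import combinations
from statistics import mean


def silhouette_width(d, a_part, b_part):
    """Average silhouette width of the bipartition (a_part, b_part) of one cluster."""
    widths = []
    for own, other in ((a_part, b_part), (b_part, a_part)):
        for i in own:
            if len(own) == 1:
                widths.append(0.0)  # usual convention: a singleton gets width 0
                continue
            a = mean(d[i][j] for j in own if j != i)  # mean within-subcluster dissimilarity
            b = mean(d[i][j] for j in other)          # mean dissimilarity to the other subcluster
            widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


def best_split(d, cluster):
    """Hypothetical seed-pair splitting: keep the bipartition with the best silhouette."""
    best = None
    for s1, s2 in combinations(cluster, 2):
        # Each object joins the nearer seed (ties go to s1); s2 always stays in b_part.
        a_part = [i for i in cluster if i != s2 and d[i][s1] <= d[i][s2]]
        in_a = set(a_part)
        b_part = [i for i in cluster if i not in in_a]
        score = silhouette_width(d, a_part, b_part)
        if best is None or score > best[0]:
            best = (score, a_part, b_part)
    return best[1], best[2]


def divisive(d):
    """Split clusters top-down; return the cophenetic (ultrametric) matrix."""
    n = len(d)
    u = [[0.0] * n for _ in range(n)]
    stack = [list(range(n))]
    while stack:
        cluster = stack.pop()
        if len(cluster) < 2:
            continue
        # Assumed node level: the cluster diameter (never smaller than any subcluster's).
        level = max(d[i][j] for i, j in combinations(cluster, 2))
        a_part, b_part = best_split(d, cluster)
        for i in a_part:
            for j in b_part:
                u[i][j] = u[j][i] = level  # i and j are separated at this node
        stack.extend([a_part, b_part])
    return u


def goodman_kruskal_gamma(d, u):
    """(C - D) / (C + D) over pairs of object pairs; ties in d or u are ignored."""
    pairs = list(combinations(range(len(d)), 2))
    concordant = discordant = 0
    for (i, j), (k, l) in combinations(pairs, 2):
        s = (d[i][j] - d[k][l]) * (u[i][j] - u[k][l])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else float("nan")


# Four points on a line at positions 0, 1, 10 and 11.
points = (0, 1, 10, 11)
d = [[abs(p - q) for q in points] for p in points]
u = divisive(d)
print(goodman_kruskal_gamma(d, u))  # → 1.0
```

On this toy input, the root splits into {0, 1} and {2, 3}, and the hierarchy preserves the order of the dissimilarities perfectly. Both the seed-pair search and the gamma computation scale poorly with the number of objects, so the sketch is suitable only for small dissimilarity matrices.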
Keywords: Hierarchical clustering; Dissimilarity data; Splitting procedures; Evaluation of hierarchy; Dendrogram; Ultrametrics
- GOLUB, T.R., SLONIM, D.K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J.P., COLLER, H., LOH, M.L., DOWNING, J.R., CALIGIURI, M.A., BLOOMFIELD, C.D., and LANDER, E.S. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring”, Science, 286, 531–537.
- MOLLINEDA, R.A., and VIDAL, E. (2000), “A Relative Approach to Hierarchical Clustering”, in Pattern Recognition and Applications, eds. M.I. Torres and A. Sanfeliu, Amsterdam: IOS Press, pp. 19–28.
- ROUX, M. (1991), “Basic Procedures in Hierarchical Cluster Analysis”, in Applied Multivariate Analysis in SAR and Environmental Studies, eds. J. Devillers and W. Karcher, Dordrecht: Kluwer Academic Publishers, pp. 115–135.
- ROUX, M. (1995), “About Divisive Methods in Hierarchical Clustering”, in Data Science and Its Applications, eds. Y. Escoufier, C. Hayashi, B. Fichet, N. Ohsumi, E. Diday, Y. Baba, and L. Lebart, Tokyo: Academic Press, pp. 101–106.
- STEINBACH, M., KARYPIS, G., and KUMAR, V. (2000), “A Comparison of Document Clustering Techniques”, Technical Report TR 00-034, University of Minnesota, Minneapolis, USA.