Abstract
An automatic validation of hierarchical clustering based on resampling techniques is recommended that can be considered as a three level assessment of stability. The first and most general level is decision making about the appropriate number of clusters. The decision is based on measures of correspondence between partitions such as the adjusted Rand index. Second, the stability of each individual cluster is assessed based on measures of similarity between sets such as the Jaccard coefficient. In the third and most detailed level of validation, the reliability of the cluster membership of each individual observation can be assessed. The built-in validation is demonstrated on the wine data set from the UCI repository where both the number of clusters and the class membership are known beforehand.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
FOWLKES E.B. and MALLOWS, C.L. (1983): A Method for Comparing two Hierarchical Clusterings. JASA 78, 553–569.
HENNIG, C. (2004): A General Robustness and Stability Theory for Cluster Analysis. Preprint, 7, Universität Hamburg.
HUBERT, L.J. and ARABIE, P. (1985): Comparing Partitions. Journal of Classification, 2, 193–218.
JAIN, A.K. and DUBES, R.C. (1988): Algorithms for Clustering Data. Prentice Hall, New Jersey.
JUNG, Y., PARK, H., DU, D.-Z. and DRAKE, B.L. (2003): A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization 25, 91–111.
LEBART, L., MORINEAU, A. and WARWICK, K.M. (1984): Multivariate Descriptive Statistical Analysis. Wiley, New York.
MUCHA, H.-J. (1992): Clusteranalyse mit Mikrocomputern. Akademie Verlag, Berlin.
MUCHA, H.-J. (2004): Automatic Validation of Hierarchical Clustering. In: J. Antoch (Ed.): Proceedings in Computational Statistics, COMPSTAT 2004, 16th Symposium. Physica-Verlag, Heidelberg, 1535–1542.
MUCHA, H.-J. (2006): Finding Meaningful and Stable Clusters Using Local Cluster Analysis. In: V. Batagelj, H.-H. Bock, A. Ferligoj and A. Ziberna (Eds.): Data Science and Classification, Springer, Berlin, 101–108.
MUCHA, H.-J. and HAIMERL, E. (2005): Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: C. Weihs and W. Gaul (Eds.): Classification-The Ubiquitous Challenge, Springer, Berlin, 513–520.
RAND, W.M. (1971): Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.
WARD, J.H. (1963): Hierarchical Grouping Methods to Optimise an Objective Function. JASA, 58, 235–244.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mucha, HJ. (2007). On Validation of Hierarchical Clustering. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_14
Download citation
DOI: https://doi.org/10.1007/978-3-540-70981-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)