On Validation of Hierarchical Clustering

Mucha, Hans-Joachim

doi:10.1007/978-3-540-70981-7_14

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

An automatic validation of hierarchical clustering based on resampling techniques is recommended; it can be regarded as a three-level assessment of stability. The first and most general level is deciding on the appropriate number of clusters. This decision is based on measures of correspondence between partitions, such as the adjusted Rand index. At the second level, the stability of each individual cluster is assessed by measures of similarity between sets, such as the Jaccard coefficient. At the third and most detailed level, the reliability of each individual observation's cluster membership can be assessed. The built-in validation is demonstrated on the wine data set from the UCI repository, where both the number of clusters and the class membership are known beforehand.
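The first level can be illustrated with a minimal Python sketch. It assumes that the full data and each resampled data set (hypothetically, a subsample drawn without replacement) have already been clustered by the same hierarchical method and cut into k clusters for every candidate k; the function names and data layout are hypothetical illustrations, not the author's implementation. The adjusted Rand index is computed from the contingency table of the two partitions following Hubert and Arabie (1985), and picking the k with the highest mean agreement is a simplified decision rule assumed here.

```python
from collections import Counter
from math import comb
from statistics import mean


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie, 1985) of two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form pairs")
    # Pairs placed together in both partitions (cells n_ij of the contingency table).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row and column margins).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        # Only happens when both partitions are identical and trivial
        # (one single cluster, or all singletons); treat as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


def mean_ari_per_k(reference, replicates):
    """Average ARI between the full-data partition and each resampled partition, per k.

    reference:  dict k -> cluster labels of all n objects (full data cut into k clusters)
    replicates: list of (indices, partitions) pairs, one per resampled data set, where
                partitions[k][j] is the cluster of object indices[j] at k clusters
    """
    scores = {}
    for k, ref_labels in reference.items():
        values = []
        for indices, partitions in replicates:
            # Restrict the reference partition to the objects drawn into this sample.
            restricted = [ref_labels[i] for i in indices]
            values.append(adjusted_rand_index(restricted, partitions[k]))
        scores[k] = mean(values)
    return scores


# Toy usage with one hypothetical replicate containing objects 0, 1, 3 and 4.
reference = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
replicates = [([0, 1, 3, 4], {2: [5, 5, 7, 7], 3: [0, 1, 1, 2]})]
scores = mean_ari_per_k(reference, replicates)
best_k = max(scores, key=scores.get)  # number of clusters with the most stable partition
```

Cluster labels are compared only through co-membership of pairs, so arbitrary relabeling between replicates (5 and 7 above) does not affect the index.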
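The second and third levels can likewise be sketched in Python under stated assumptions. Each replicate is assumed to be given as the indices of the sampled objects plus their cluster labels; every cluster of the chosen reference partition is matched to the replicate cluster with the largest Jaccard coefficient, and that coefficient is averaged over replicates as the cluster's stability. Object reliability is operationalised here, as an assumption rather than the author's exact definition, as the share of replicates containing the object in which it falls into the matched cluster. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_and_object_stability(reference, replicates):
    """Cluster-level and object-level stability of one reference partition.

    reference:  cluster labels of all n objects (full data, chosen number of clusters)
    replicates: list of (indices, labels) pairs, where labels[j] is the cluster of
                object indices[j] in the clustering of that resampled data set
    Returns (mean Jaccard coefficient per reference cluster, reliability per object).
    """
    ref_clusters = defaultdict(set)
    for i, c in enumerate(reference):
        ref_clusters[c].add(i)
    jaccards = defaultdict(list)  # reference cluster -> best Jaccard value per replicate
    seen = defaultdict(int)       # object -> number of replicates that contained it
    hits = defaultdict(int)       # object -> replicates where it fell in the matched cluster
    for indices, labels in replicates:
        sample = set(indices)
        rep_clusters = defaultdict(set)
        for i, c in zip(indices, labels):
            rep_clusters[c].add(i)
        for c, members in ref_clusters.items():
            restricted = members & sample  # part of the reference cluster that was drawn
            if not restricted:
                continue
            # Match the reference cluster to the most similar cluster of this replicate.
            best = max(rep_clusters.values(), key=lambda m: jaccard(restricted, m))
            jaccards[c].append(jaccard(restricted, best))
            for i in restricted:
                seen[i] += 1
                if i in best:
                    hits[i] += 1
    cluster_stability = {c: mean(v) for c, v in jaccards.items()}
    reliability = {i: hits[i] / seen[i] for i in seen}
    return cluster_stability, reliability


# Toy usage: the second replicate moves object 2 into the other cluster.
reference = [0, 0, 0, 1, 1, 1]
replicates = [
    ([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1]),
    ([0, 1, 2, 3, 4, 5], [0, 0, 1, 1, 1, 1]),
]
stability, reliability = cluster_and_object_stability(reference, replicates)
# object 2 gets reliability 0.5, all other objects 1.0
```

Low mean Jaccard values point to unstable clusters, and low reliabilities point to observations whose cluster membership is doubtful.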

References

FOWLKES, E.B. and MALLOWS, C.L. (1983): A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association, 78, 553–569.
HENNIG, C. (2004): A General Robustness and Stability Theory for Cluster Analysis. Preprint, 7, Universität Hamburg.
HUBERT, L.J. and ARABIE, P. (1985): Comparing Partitions. Journal of Classification, 2, 193–218.
JAIN, A.K. and DUBES, R.C. (1988): Algorithms for Clustering Data. Prentice Hall, New Jersey.
JUNG, Y., PARK, H., DU, D.-Z. and DRAKE, B.L. (2003): A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization, 25, 91–111.
LEBART, L., MORINEAU, A. and WARWICK, K.M. (1984): Multivariate Descriptive Statistical Analysis. Wiley, New York.
MUCHA, H.-J. (1992): Clusteranalyse mit Mikrocomputern. Akademie Verlag, Berlin.
MUCHA, H.-J. (2004): Automatic Validation of Hierarchical Clustering. In: J. Antoch (Ed.): Proceedings in Computational Statistics, COMPSTAT 2004, 16th Symposium. Physica-Verlag, Heidelberg, 1535–1542.
MUCHA, H.-J. (2006): Finding Meaningful and Stable Clusters Using Local Cluster Analysis. In: V. Batagelj, H.-H. Bock, A. Ferligoj and A. Ziberna (Eds.): Data Science and Classification, Springer, Berlin, 101–108.
MUCHA, H.-J. and HAIMERL, E. (2005): Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: C. Weihs and W. Gaul (Eds.): Classification – The Ubiquitous Challenge. Springer, Berlin, 513–520.
RAND, W.M. (1971): Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.
WARD, J.H. (1963): Hierarchical Grouping Methods to Optimise an Objective Function. Journal of the American Statistical Association, 58, 235–244.

Author information

Authors and Affiliations

Weierstraß-Institut für Angewandte Analysis und Stochastik, D-10117, Berlin, Germany
Hans-Joachim Mucha

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Mucha, HJ. (2007). On Validation of Hierarchical Clustering. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_14

DOI: https://doi.org/10.1007/978-3-540-70981-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)
