Summary
Hierarchical clustering procedures such as single-, average-, or complete-link procedures produce a series of groupings of the data arranged in the form of a hierarchy, or tree structure. In most cases, the choice of where to “cut” the tree is left to the user. Occasional formal guidelines have usually been based on ideas of random sampling, but that assumption is often violated in the contexts in which cluster analysis is used. This paper explores the application of Rissanen’s MDL principle to derive possible guidelines for cutting the tree. These guidelines do not assume random sampling.
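As a rough illustration of the idea of choosing a cut by description length, the sketch below builds a single-link hierarchy on one-dimensional data and scores every level of the tree with a crude two-part code. It is not the criterion derived in the paper: the Gaussian within-cluster model, the (1/2) log2 n charge per parameter, the log2 k charge per label, the precision floor sigma_min, and all function names are assumptions made for this example only.

```python
import math

def single_link_levels(points):
    """Return every partition of the single-link hierarchy, from n clusters to 1.

    For sorted 1-D data the single-link distance between neighbouring
    clusters is the gap between them, so the closest pair is always adjacent.
    """
    clusters = [[p] for p in sorted(points)]
    levels = [list(clusters)]
    while len(clusters) > 1:
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        merged = clusters[i] + clusters[i + 1]
        clusters = clusters[:i] + [merged] + clusters[i + 2:]
        levels.append(list(clusters))
    return levels

def description_length_bits(partition, n, sigma_min=0.1):
    """Crude two-part code length (an assumed stand-in, not the paper's formula)."""
    k = len(partition)
    nll_nats = 0.0
    for cluster in partition:
        m = len(cluster)
        mean = sum(cluster) / m
        ss = sum((x - mean) ** 2 for x in cluster)
        # Floor the variance at the assumed measurement precision so that
        # singleton clusters do not get an unboundedly short code.
        var = max(ss / m, sigma_min ** 2)
        nll_nats += 0.5 * m * math.log(2 * math.pi * var) + ss / (2 * var)
    data_bits = nll_nats / math.log(2)   # cost of the data given the model
    param_bits = k * math.log2(n)        # two parameters per cluster
    label_bits = n * math.log2(k)        # cluster label for each observation
    return data_bits + param_bits + label_bits

def best_cut(points, sigma_min=0.1):
    """Pick the level of the hierarchy with the shortest total code length."""
    n = len(points)
    levels = single_link_levels(points)
    return min(levels, key=lambda part: description_length_bits(part, n, sigma_min))

# Hypothetical data: three tight groups near 1, 5 and 9.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0, 9.1, 8.9, 9.2]
cut = best_cut(data)
print(len(cut), "clusters:", cut)
```

The score trades a better fit from more clusters against the extra bits needed to describe the additional parameters and labels, and no random-sampling assumption is used to pick the cut. The result depends on the assumed precision floor, so sigma_min should be treated as a modelling choice rather than a tuning-free constant.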
References
Bryant, P. (1996): The Minimum Description Length Principle for Gaussian Regression. Working Paper 1996–08, University of Colorado at Denver, Graduate School of Business Administration. Denver, Colorado 80217–3364.
Duda, R. O. and Hart, P.E. (1973): Pattern Classification and Scene Analysis. John Wiley & Sons, New York.
Everitt, B. S. (1993): Cluster Analysis. Edward Arnold, London.
Johnson, R. A. and Wichern, D. W. (1988): Applied Multivariate Statistical Analysis, second edition. Prentice-Hall, Englewood Cliffs, N.J.
Rissanen, J. (1987): Stochastic complexity. Journal of the Royal Statistical Society, Series B, 49, 3, 223–265.
Rissanen, J. (1989): Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Co., Singapore.
Rissanen, J. (1996): Shannon-Wiener information and stochastic complexity. In: Proceedings, N. Wiener Centenary Congress, East Lansing, Michigan.
Copyright information
© 1998 Springer Japan
About this paper
Cite this paper
Bryant, P.G. (1998). On the Minimum Description Length (MDL) Principle for Hierarchical Classifications. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_17
DOI: https://doi.org/10.1007/978-4-431-65950-1_17
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-70208-5
Online ISBN: 978-4-431-65950-1
eBook Packages: Springer Book Archive