MDL for Mixtures of Normal Distributions

Bryant, Peter G.

doi:10.1007/978-3-642-80098-6_1

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Summary

In this paper, I apply Rissanen’s minimum description length (MDL) principle to select the number of components in a mixture of Gaussian distributions and estimate the parameters of those distributions. Wolfe’s (1970) and Day’s (1969) maximum likelihood approaches to this problem do not apply directly to the case of unequal component covariance matrices, because the likelihood function diverges. The MDL approach successfully extends maximum likelihood to cover such cases, though. I apply the MDL method to three data sets from the literature. In general, the MDL approach selects simpler models than classical approaches do.
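The divergence of the likelihood mentioned in the summary can be seen in a small numerical sketch. The example below is not taken from the chapter: the toy data, the univariate setting, and the helper names `normal_pdf` and `mixture_loglik` are assumptions made purely for illustration. One component is centered exactly on an observation while its standard deviation shrinks toward zero, which is possible only when the components are allowed unequal variances.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a univariate normal mixture with per-component sds."""
    total = 0.0
    for x in data:
        total += math.log(sum(w * normal_pdf(x, m, s)
                              for w, m, s in zip(weights, means, sds)))
    return total

# Hypothetical toy data, chosen only for illustration.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]

# Component 1 is a broad normal covering all points. Component 2 sits exactly
# on the first observation, and its standard deviation is shrunk toward zero.
logliks = []
for sd2 in [1.0, 1e-2, 1e-4, 1e-8]:
    ll = mixture_loglik(data, [0.5, 0.5], [0.0, data[0]], [1.0, sd2])
    logliks.append(ll)
    print(f"sd2 = {sd2:g}  log-likelihood = {ll:.3f}")
```

The printed log-likelihood keeps increasing as `sd2` shrinks, so no finite maximum exists over the unrestricted parameter space. Any criterion that adds a cost for stating the parameters, as a description-length approach does, has to counteract this growth.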

References

Akaike, H. (1973): Information theory and an extension of the maximum likelihood principle. In: B. N. Petrov and F. Csaki (eds.): Proc. 2nd International Symp. Inf. Theory. Akadémiai Kiadó, Budapest, 267–281.
Banfield, J. and Raftery, A. E. (1993): Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.
Bozdogan, H. (1993): Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse Fisher information matrix. In: O. Opitz, B. Lausen and R. Klar (eds.): Information and Classification: Concepts, Methods and Applications. Proc. Annual Conf., Gesellschaft für Klassifikation, Springer-Verlag, Berlin, 40–54.
Bryant, P. G. (1994): Selecting models using the minimum description length principle. University of Colorado at Denver, College of Business, Faculty Working Paper 1994–14.
Celeux, G. and Govaert, G. (1995): Gaussian parsimonious clustering. Pattern Recognition, 28(5), 781–793.
Day, N. E. (1969): Estimating the components of a mixture of normal distributions. Biometrika, 56, 463–474.
Fisher, R. A. (1922): On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222, 309–368.
Fisher, R. A. (1936): The use of multiple measurements in taxonomic problems. Annals of Eugenics, VII, Part II, 179–188.
McLachlan, G. and Basford, K. (1988): Mixture models. Marcel Dekker, Inc., New York.
Miller, R. G. and Halpern, J. W. (1985): Chemical and overt diabetes. In: D. F. Andrews and A. M. Herzberg (eds.): Data. Springer-Verlag, New York, Chapter 36, 215–220.
Rissanen, J. (1987): Stochastic complexity. Journal of the Royal Statistical Society, Series B, 49(3), 223–265.
Rissanen, J. (1989): Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Company, Singapore.
Rissanen, J. (1994): Shannon-Wiener information and stochastic complexity. Address to the N. Wiener Centenary Congress, Michigan State University, East Lansing, Michigan, December 3, 1994. To appear in: Proc. N. Wiener Centenary Congress.
Windham, M. P. and Cutler, A. (1992): Information ratios for validating cluster analyses. Journal of the American Statistical Association, 87, 1188–1192.
Wolfe, J. H. (1970): Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, 5, 329–350.
Wright, R. M. and Switzer, P. (1971): Numerical classification applied to certain Jamaican Eocene nummulitids. Mathematical Geology, 3(3), 297–311.

Author information

Authors and Affiliations

College of Business, University of Colorado at Denver, Campus Box 165, Denver, Colorado, 80217-3364, USA
Peter G. Bryant

Editor information

Editors and Affiliations

Institut für Statistik und Wirtschaftsmathematik, Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Wüllnerstr. 3, D-52056, Aachen, Germany
Hans-Hermann Bock
Institut für Statistik und Ökonometrie, Universität Basel, Holbeinstr. 12, CH-4051, Basel, Switzerland
Wolfgang Polasek

About this paper

Cite this paper

Bryant, P. G. (1996). MDL for Mixtures of Normal Distributions. In: Bock, H.-H., Polasek, W. (eds) Data Analysis and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-80098-6_1

DOI: https://doi.org/10.1007/978-3-642-80098-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60774-8
Online ISBN: 978-3-642-80098-6
eBook Packages: Springer Book Archive
