Abstract
Finite mixture distributions provide efficient approaches to model-based clustering and classification. The advantages of mixture models for unsupervised classification are reviewed. The article then focuses on the model selection problem. The usefulness of taking the modeling purpose into account when selecting a model is advocated in both the unsupervised and the supervised classification contexts. This point of view has led to the definition of two penalized likelihood criteria, ICL and BEC, which are presented and discussed. The ICL criterion is an approximation of the integrated completed likelihood and is concerned with model-based cluster analysis. The BEC criterion is an approximation of the integrated conditional likelihood and is concerned with generative models of classification. The behavior of ICL for choosing the number of components in a mixture model, and of BEC for choosing a model minimizing the expected error rate, is analyzed in contrast with standard model selection criteria.
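As a minimal illustration of the kind of criterion the abstract describes, the sketch below compares BIC and an ICL-style score for choosing the number of Gaussian mixture components. It uses scikit-learn's `GaussianMixture` (not part of the original chapter) and synthetic data; the ICL approximation is taken as BIC plus twice the entropy of the soft cluster assignments, so that lower is better for both scores. All names and data here are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: two well-separated Gaussian clusters in 2D.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(6.0, 1.0, (100, 2))])

def icl_score(gm, X):
    """ICL-style criterion (to minimize): BIC plus an entropy penalty
    that discourages solutions with overlapping, ill-separated clusters."""
    tau = gm.predict_proba(X)                      # soft assignments t_ik
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-300, None)))
    return gm.bic(X) + 2.0 * entropy

scores = {}
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    scores[k] = icl_score(gm, X)

best_k = min(scores, key=scores.get)
print(best_k)
```

With clusters this well separated, BIC and the entropy-penalized score agree; the criteria diverge mainly when components overlap, which is where ICL penalizes extra components more heavily than BIC.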
© 2007 Springer-Verlag Berlin Heidelberg
Celeux, G. (2007). Mixture Models for Classification. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_1
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7