Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis

Bozdogan, Hamparsum

doi:10.1007/978-3-642-51175-2_19

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Summary

This paper considers three problems simultaneously, in a single expert system, within the context of the standard mixture of multivariate normal distributions: choosing the number of component clusters of individuals; determining, through all-possible-subset selection, which variables contribute to the differences between the clusters; and detecting outliers or extreme observations across the clustering alternatives. This is achieved by introducing and deriving, for the mixture model, a new informational measure of complexity (ICOMP) criterion based on the estimated inverse-Fisher information matrix (IFIM), developed by Bozdogan as an alternative to Akaike’s information criterion (AIC) and Bozdogan’s CAIC. A numerical example on a real data set illustrates the significance of these validity functionals.
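
As a rough illustration of the criteria named in the summary, the sketch below computes AIC, CAIC, and an ICOMP-style score from a maximized log-likelihood. The chapter body is not available here, so the formulas are assumptions taken from their commonly published forms: AIC = -2 log L + 2m, CAIC = -2 log L + m(log n + 1), and ICOMP = -2 log L + 2 C1(F⁻¹) with C1(Σ) = (s/2) log(tr(Σ)/s) - (1/2) log|Σ|. The function names, and the choice to pass the eigenvalues of the estimated IFIM rather than the matrix itself, are hypothetical conveniences and not the author's implementation.

```python
import math


def mixture_param_count(n_components, n_variables):
    """Free parameters of a K-component, p-variate normal mixture
    with unrestricted component covariances (assumed model form)."""
    K, p = n_components, n_variables
    mixing = K - 1                       # mixing proportions sum to one
    means = K * p                        # one mean vector per component
    covariances = K * p * (p + 1) // 2   # one symmetric matrix per component
    return mixing + means + covariances


def aic(loglik, n_params):
    """Akaike's information criterion: -2 log L + 2m."""
    return -2.0 * loglik + 2.0 * n_params


def caic(loglik, n_params, n_obs):
    """Consistent AIC: -2 log L + m (log n + 1)."""
    return -2.0 * loglik + n_params * (math.log(n_obs) + 1.0)


def c1_complexity(eigenvalues):
    """C1 complexity of a positive-definite matrix, computed from its
    eigenvalues: (s/2) log(mean eigenvalue) - (1/2) sum(log eigenvalue).
    It is zero when all eigenvalues are equal."""
    s = len(eigenvalues)
    mean_eig = sum(eigenvalues) / s
    log_det = sum(math.log(v) for v in eigenvalues)
    return 0.5 * s * math.log(mean_eig) - 0.5 * log_det


def icomp(loglik, ifim_eigenvalues):
    """ICOMP-style score: lack of fit plus twice the C1 complexity
    of the estimated inverse-Fisher information matrix."""
    return -2.0 * loglik + 2.0 * c1_complexity(ifim_eigenvalues)
```

In use, each criterion would be evaluated for every candidate number of clusters (and every variable subset), and the candidate with the smallest value preferred; the log-likelihoods and IFIM eigenvalues have to come from a fitted mixture model, which this sketch does not provide.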

References

AKAIKE, H. (1973): Information Theory and an Extension of the Maximum Likelihood Principle. In: B. N. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, Academiai Kiado, Budapest, 267–281.
ALTMAN, E. I. (1968): Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, Vol. 23, 589–609.
BINDER, D.A. (1978): Bayesian Cluster Analysis. Biometrika, 65, 31–38.
BOZDOGAN, H. (1981): Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals. Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.
BOZDOGAN, H. (1983): Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria. Technical Report No. UIC/DQM/A83–1, June 16, 1983, ARO Contract DAAG29–82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.
BOZDOGAN, H. (1987): Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions. Psychometrika, 52(3), Special Section (invited paper), 345–370.
BOZDOGAN, H. (1988): ICOMP: A New Model Selection Criterion. In: Hans H. Bock (Ed.), Classification and Related Methods of Data Analysis, North-Holland, Amsterdam, April, 599–608.
BOZDOGAN, H. (1990a): On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models. Communications in Statistics, Theory and Methods, 19(1), 221–278.
BOZDOGAN, H. (1990b): Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion. Invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.
BOZDOGAN, H. (1993): Choosing the Number of Component Clusters in the Mixture Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix. In: O. Opitz, B. Lausen, and R. Klar (Eds.): Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, Germany, 40–54.
BOZDOGAN, H. (1994): Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Information Measure of Complexity. In: H. Bozdogan (Ed.): Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Information Approach. Kluwer Academic Publishers, Dordrecht, the Netherlands, 69–113.
DAY, N.E. (1969): Estimating the Components of a Mixture of Normal Distributions. Biometrika, 11, 235–254.
HARTIGAN, J.A. (1977): Distribution Problems in Clustering. In: J. Van Ryzin (Ed.), Classification and Clustering, Academic Press, New York, 45–71.
HAWKINS, D. M., MULLER, M. W., and KROODEN, J. A. T. (1982): Cluster Analysis. In: D. M. Hawkins (Ed.): Topics in Applied Multivariate Analysis. Cambridge University Press, Cambridge, 303–356.
MAGNUS, J.R. (1989): Personal correspondence.
MORRISON, D. F. (1990): Multivariate Statistical Methods, Third Edition, McGraw-Hill, Inc., New York, N.Y.
SCLOVE, S.L. (1977): Population Mixture Models and Clustering Algorithms, Communications in Statistics, Theory and Methods, A6(5), 417–434.
SCLOVE, S.L. (1982): Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82–1, 1982, ARO Contract DAAG29–82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.
VAN EMDEN, M.H. (1971): An Analysis of Complexity, Mathematical Center Tracts, 35, Amsterdam.
WOLFE, J.H. (1967): NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions. Research Memorandum, SRM 68–2, U.S. Naval Personnel Research Activity, San Diego, California.
WOLFE, J.H. (1970): Pattern Clustering by Multivariate Mixture Analysis. Multivariate Behavioral Res., 5, 329–350.

Author information

Authors and Affiliations

Department of Statistics, The University of Tennessee, Knoxville, TN, 37996-0532, USA
Hamparsum Bozdogan

Editor information

Editors and Affiliations

Institut National de Recherche en Informatique et en Automatique (INRIA), F-75150, Rocquencourt, Le Chesnay, France
Edwin Diday & Yves Lechevallier
Universität Mannheim, Schloß, D-68131, Mannheim, Germany
Martin Schader (Lehrstuhl für Wirtschaftsinformatik III)
Université Paris IX Dauphine, Pl. du Maréchal de Lattre de Tassigny, F-75775, Paris Cedex 16, France
Patrice Bertrand
TELECOM-Paris, 46, rue Barrault, F-75013, Paris, France
Bernard Burtschy

Cite this paper

Bozdogan, H. (1994). Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis. In: Diday, E., Lechevallier, Y., Schader, M., Bertrand, P., Burtschy, B. (eds) New Approaches in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-51175-2_19

DOI: https://doi.org/10.1007/978-3-642-51175-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58425-4
Online ISBN: 978-3-642-51175-2
eBook Packages: Springer Book Archive