Summary
This paper considers three problems simultaneously, in one expert system, within the context of the standard mixture of multivariate normal distributions: choosing the number of component clusters of individuals, determining which variables contribute to the differences between the clusters using all-possible-subsets selection of variables, and detecting outliers or extreme observations across the clustering alternatives. This is achieved by introducing and deriving, for the mixture model, a new informational measure of complexity (ICOMP) criterion of the estimated inverse-Fisher information matrix (IFIM), developed by Bozdogan as an alternative to Akaike’s information criterion (AIC) and Bozdogan’s CAIC. A numerical example on a real data set illustrates the significance of these validity functionals.
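As a rough illustration of how such criteria are compared across candidate numbers of clusters, the following Python sketch uses the standard textbook forms AIC = -2 log L + 2k and CAIC = -2 log L + k(log n + 1), together with the general form ICOMP(IFIM) = -2 log L + 2 C1(F^-1), where C1 is the maximal information-theoretic complexity of a covariance matrix. The function names are hypothetical, and the caller is assumed to supply the maximized log-likelihood and the eigenvalues of the estimated IFIM; the paper's own derivation of the IFIM for the mixture model is not reproduced here.

```python
import math


def n_free_params(K, p):
    """Free parameters of a K-component, p-variate normal mixture with
    unrestricted component covariances: (K - 1) mixing proportions,
    K*p mean entries, and K*p*(p + 1)/2 covariance entries."""
    return (K - 1) + K * p + K * p * (p + 1) // 2


def aic(loglik, k):
    """Akaike's information criterion in its standard form."""
    return -2.0 * loglik + 2.0 * k


def caic(loglik, k, n):
    """Consistent AIC (CAIC) in its standard form, with n observations."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)


def c1_complexity(eigenvalues):
    """C1 complexity of a positive definite matrix, computed from its
    eigenvalues as (s/2) * log(arithmetic mean / geometric mean)."""
    s = len(eigenvalues)
    arith = sum(eigenvalues) / s
    mean_log = sum(math.log(v) for v in eigenvalues) / s
    return 0.5 * s * (math.log(arith) - mean_log)


def icomp(loglik, ifim_eigenvalues):
    """ICOMP(IFIM) in its general form. The eigenvalues of the estimated
    inverse-Fisher information matrix are an input assumed to be computed
    elsewhere."""
    return -2.0 * loglik + 2.0 * c1_complexity(ifim_eigenvalues)


def best_k(scores):
    """Return the number of clusters whose criterion value is smallest."""
    return min(scores, key=scores.get)
```

For each candidate number of clusters K, one would fit the mixture, evaluate a criterion, collect the values in a dictionary keyed by K, and pass it to `best_k`; the model with the minimum criterion value is preferred.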
References
AKAIKE, H. (1973): Information Theory and an Extension of the Maximum Likelihood Principle. In: B. N. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, Academiai Kiado, Budapest, 267–281.
ALTMAN, E. I. (1968): Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, Vol. 23, 589–609.
BINDER, D.A. (1978): Bayesian Cluster Analysis. Biometrika, 65, 31–38.
BOZDOGAN, H. (1981): Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals. Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.
BOZDOGAN, H. (1983): Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria. Technical Report No. UIC/DQM/A83–1, June 16, 1983, ARO Contract DAAG29–82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.
BOZDOGAN, H. (1987): Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions. Psychometrika, Vol. 52, No. 3, Special Section (invited paper), 345–370.
BOZDOGAN, H. (1988): ICOMP: A New Model Selection Criterion. In: Hans H. Bock (Ed.), Classification and Related Methods of Data Analysis, North-Holland, Amsterdam, April, 599–608.
BOZDOGAN, H. (1990a): On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models. Communications in Statistics, Theory and Methods, 19(1), 221–278.
BOZDOGAN, H. (1990b): Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion. Invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.
BOZDOGAN, H. (1993): Choosing the Number of Component Clusters in the Mixture Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix. In: O. Opitz, B. Lausen, and R. Klar (Eds.): Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, Germany, 40–54.
BOZDOGAN, H. (1994): Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Information Measure of Complexity. In: H. Bozdogan (Ed.): Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Information Approach. Kluwer Academic Publishers, Dordrecht, the Netherlands, 69–113.
DAY, N.E. (1969): Estimating the Components of a Mixture of Normal Distributions. Biometrika, 11, 235–254.
HARTIGAN, J.A. (1977): Distribution Problems in Clustering. In: J. Van Ryzin (Ed.), Classification and Clustering, Academic Press, New York, 45–71.
HAWKINS, D. M., MULLER, M. W., and KROODEN, J. A. T. (1982): Cluster Analysis. In: D. M. Hawkins (Ed.): Topics in Applied Multivariate Analysis. Cambridge University Press, Cambridge, 303–356.
MAGNUS, J.R. (1989): Personal correspondence.
MORRISON, D. F. (1990): Multivariate Statistical Methods, Third Edition, McGraw-Hill, Inc., New York, N.Y.
SCLOVE, S.L. (1977): Population Mixture Models and Clustering Algorithms, Communications in Statistics, Theory and Methods, A6(5), 417–434.
SCLOVE, S.L. (1982): Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82–1, 1982, ARO Contract DAAG29–82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.
VAN EMDEN, M.H. (1971): An Analysis of Complexity, Mathematical Center Tracts, 35, Amsterdam.
WOLFE, J.H. (1967): NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions. Research Memorandum, SRM 68–2, U.S. Naval Personnel Research Activity, San Diego, California.
WOLFE, J.H. (1970): Pattern Clustering by Multivariate Mixture Analysis. Multivariate Behavioral Res., 5, 329–350.
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bozdogan, H. (1994). Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis. In: Diday, E., Lechevallier, Y., Schader, M., Bertrand, P., Burtschy, B. (eds) New Approaches in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-51175-2_19
DOI: https://doi.org/10.1007/978-3-642-51175-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58425-4
Online ISBN: 978-3-642-51175-2
eBook Packages: Springer Book Archive