Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis

  • Conference paper
New Approaches in Classification and Data Analysis

Summary

This paper considers three problems simultaneously, within a single expert system, in the context of the standard mixture of multivariate normal distributions: choosing the number of component clusters of individuals; determining which variables contribute to the differences between the clusters, using all-possible-subset selection of variables; and detecting outliers or extreme observations across the clustering alternatives. This is achieved by introducing and deriving, for the mixture model, a new informational measure of complexity (ICOMP) criterion based on the estimated inverse-Fisher information matrix (IFIM), developed by Bozdogan as an alternative to Akaike’s information criterion (AIC) and Bozdogan’s CAIC. A numerical example on a real data set illustrates the significance of these validity functionals.
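
As a rough illustration of the criteria named in the summary, the sketch below computes AIC, CAIC, and an ICOMP-style score from a maximized log-likelihood. It is a minimal sketch under stated assumptions, not the chapter's derivation: it uses the commonly stated forms AIC = -2 log L + 2m, CAIC = -2 log L + m(log n + 1), and ICOMP(IFIM) = -2 log L + 2 C1(F^-1), where C1 is the first-order maximal entropic complexity of a covariance-type matrix. The function names, and the choice to pass the eigenvalues of the estimated IFIM directly rather than build the matrix, are hypothetical conveniences and do not come from the paper.

```python
import math


def n_free_params(k, p):
    """Free parameters of a k-component, p-variate normal mixture with
    unrestricted covariances (an assumption of this sketch):
    (k - 1) mixing weights + k*p means + k*p*(p + 1)/2 covariance terms."""
    return (k - 1) + k * p + k * p * (p + 1) // 2


def aic(loglik, m):
    """Akaike's information criterion: -2 log L + 2m."""
    return -2.0 * loglik + 2.0 * m


def caic(loglik, m, n):
    """Bozdogan's consistent AIC: -2 log L + m (log n + 1)."""
    return -2.0 * loglik + m * (math.log(n) + 1.0)


def c1_complexity(eigenvalues):
    """C1 complexity of a positive-definite matrix given its eigenvalues:
    (s/2) * log(arithmetic mean / geometric mean). It is zero when all
    eigenvalues are equal and grows as they spread apart."""
    s = len(eigenvalues)
    log_arith_mean = math.log(sum(eigenvalues) / s)
    log_geo_mean = sum(math.log(v) for v in eigenvalues) / s
    return 0.5 * s * (log_arith_mean - log_geo_mean)


def icomp_ifim(loglik, ifim_eigenvalues):
    """ICOMP-style score: -2 log L + 2 * C1(estimated inverse Fisher
    information). The eigenvalues are assumed to be supplied by the caller."""
    return -2.0 * loglik + 2.0 * c1_complexity(ifim_eigenvalues)
```

In use, each candidate number of clusters (and each variable subset) would be fitted separately, and the alternative with the smallest criterion value preferred. How the estimated IFIM is obtained for the mixture model is the subject of the chapter itself and is not reproduced here.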

References

  • AKAIKE, H. (1973): Information Theory and an Extension of the Maximum Likelihood Principle. In: B. N. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 267–281.

  • ALTMAN, E. I. (1968): Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. The Journal of Finance, Vol. 23, 589–609.

  • BINDER, D.A. (1978): Bayesian Cluster Analysis. Biometrika, 65, 31–38.

  • BOZDOGAN, H. (1981): Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals. Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.

  • BOZDOGAN, H. (1983): Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria. Technical Report No. UIC/DQM/A83–1, June 16, 1983, ARO Contract DAAG29–82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.

  • BOZDOGAN, H. (1987): Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions. Psychometrika, Vol. 52, No. 3, Special Section (invited paper), 345–370.

  • BOZDOGAN, H. (1988): ICOMP: A New Model Selection Criterion. In: Hans H. Bock (Ed.), Classification and Related Methods of Data Analysis, North-Holland, Amsterdam, April, 599–608.

  • BOZDOGAN, H. (1990a): On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models. Communications in Statistics, Theory and Methods, 19(1), 221–278.

  • BOZDOGAN, H. (1990b): Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion. Invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.

  • BOZDOGAN, H. (1993): Choosing the Number of Component Clusters in the Mixture Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix. In: O. Opitz, B. Lausen, and R. Klar (Eds.): Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg, Germany, 40–54.

  • BOZDOGAN, H. (1994): Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Information Measure of Complexity. In: H. Bozdogan (Ed.): Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Information Approach. Kluwer Academic Publishers, Dordrecht, the Netherlands, 69–113.

  • DAY, N.E. (1969): Estimating the Components of a Mixture of Normal Distributions. Biometrika, 11, 235–254.

  • HARTIGAN, J.A. (1977): Distribution Problems in Clustering. In: J. Van Ryzin (Ed.), Classification and Clustering, Academic Press, New York, 45–71.

  • HAWKINS, D. M., MULLER, M. W., and KROODEN, J. A. T. (1982): Cluster Analysis. In: D. M. Hawkins (Ed.): Topics in Applied Multivariate Analysis. Cambridge University Press, Cambridge, 303–356.

  • MAGNUS, J.R. (1989): Personal correspondence.

  • MORRISON, D. F. (1990): Multivariate Statistical Methods, Third Edition, McGraw-Hill, Inc., New York, N.Y.

  • SCLOVE, S.L. (1977): Population Mixture Models and Clustering Algorithms, Communications in Statistics, Theory and Methods, A6(5), 417–434.

  • SCLOVE, S.L. (1982): Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82–1, 1982, ARO Contract DAAG29–82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.

  • VAN EMDEN, M.H. (1971): An Analysis of Complexity, Mathematical Center Tracts, 35, Amsterdam.

  • WOLFE, J.H. (1967): NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions. Research Memorandum, SRM 68–2, U.S. Naval Personnel Research Activity, San Diego, California.

  • WOLFE, J.H. (1970): Pattern Clustering by Multivariate Mixture Analysis. Multivariate Behavioral Res., 5, 329–350.

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bozdogan, H. (1994). Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis. In: Diday, E., Lechevallier, Y., Schader, M., Bertrand, P., Burtschy, B. (eds) New Approaches in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-51175-2_19

  • DOI: https://doi.org/10.1007/978-3-642-51175-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58425-4

  • Online ISBN: 978-3-642-51175-2

  • eBook Packages: Springer Book Archive
