Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity

Abstract

Analysis of clusters by means of mixture distributions, called mixture-model cluster analysis, has been one of the most difficult problems in statistics. Theoretical work, coupled with the development of new computational tools over the past ten years, has made it possible to overcome some of the intractable technical and numerical issues that have limited the widespread applicability of mixture-model cluster analysis to complex real-world problems. The development of new objective analysis techniques had to await the emergence of information-based model selection procedures to overcome difficulties with conventional techniques within the context of mixture-model cluster analysis. See, e.g., Bozdogan (1992) and Windham and Cutler (1993) (in this volume).

References

  • Al-Hussaini, E. K. and Ahmad, K. L. (1981). On the Identifiability of Finite Mixtures of Distributions, IEEE Transactions on Information Theory, Vol. IT-27, No. 5, 662–668.

  • Anderson, J. J. (1985). Normal Mixtures and the Number of Clusters Problem, Computational Statistics Quarterly (CSQ), Vol. 2, Issue 1, 3–14.

  • Andrews, D. F. and Herzberg, A. M. (1985), Data, Springer-Verlag, New York.

  • Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle, in Second International Symposium on Information Theory, B.N. Petrov and F. Csaki (Eds.), Budapest: Academiai Kiado, 267–281.

  • Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Trans. on Automatic Control, AC-19, 716–723.

  • Akaike, H. (1978). Time Series Analysis and Control Through Parametric Models, in Applied Time Series Analysis, D. F. Findley (Ed.), Academic Press, New York, 1–23.

  • Akaike, H. (1981). Modern Development of Statistical Methods, in Trends and Progress in System Identification, P. Eykhoff (Ed.), Pergamon Press, New York, 169–184.

  • Akaike, H. (1985). Prediction and Entropy, in A Celebration of Statistics: The ISI Centenary Volume, A. C. Atkinson and S. E. Fienberg (Eds.), Springer-Verlag, New York, 1–24.

  • Beale, E. M. (1969). Cluster Analysis, Scientific Control Systems, London.

  • Behboodian, J. (1972). Information Matrix for a Mixture of Two Normal Distributions, Journal of Statistical Computation and Simulation, 1, 295–314.

  • Bhattacharyya, A. (1943). On a Measure of Divergence Between Two Statistical Populations Defined by Their Probability Distributions, Bull. Calcutta Math. Soc., 35, 99–110.

  • Binder, D. A. (1978), Bayesian Cluster Analysis, Biometrika, 65, 31–38.

  • Brennan, T. (1980). Multivariate Taxonomic Classification for Criminal Justice Research, Final Report, Vol. 2, Project No. 78-NI-AX-0065, National Institute of Justice, Washington, D. C. 20531.

  • Bock, H. H. (1981), Statistical Testing and Evaluation Methods in Cluster Analysis, in the Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on: Statistics: Applications and New Directions, J. K. Ghosh and J. Roy (Eds.) December 16–19, Calcutta, 116–146.

  • Box, G. E. P. and Cox, D.R. (1964). An Analysis of Transformations, Journal of the Royal Statistical Society, Series B, 26, No. 2, 211–252 (with discussion).

  • Bozdogan, H. (1981). Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals. Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Bozdogan, H. (1983), Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria, Technical Report No. UIC/DQM/A83-1, June 16, 1983, ARO Contract DAAG29-82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Bozdogan, H. (1987). Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions, Psychometrika, 52, No. 3, Special Section (invited paper), 345–370.

  • Bozdogan, H. (1988). ICOMP: A New Model Selection Criterion, in Classification and Related Methods of Data Analysis, Hans H. Bock (Ed.), North-Holland, Amsterdam April, 599–608.

  • Bozdogan, H. (1990a). On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models, Communications in Statistics (Theory and Methods) 19 (1), 221–278.

  • Bozdogan, H. (1990b). Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using An Entropic Statistical Complexity Criterion, invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.

  • Bozdogan, H. (1992). Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix, Invited paper in Studies in Classification, Data Analysis, and Knowledge Organization, O. Opitz, B. Lausen, and R. Klar (Eds.), Springer-Verlag, Heidelberg, Germany. To appear.

  • Calinski, T. and Harabasz, J. (1974). A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1–27.

  • Carman, C. S. (1989). Pattern Recognition of Magnetic Resonance Images with Application to Atherosclerosis. Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.

  • Chernoff, H. (1954). On the Distribution of the Likelihood Ratio, Annals of Mathematical Statistics, 25, 573–578.

  • Cutler, A. and Windham, M. P. (1993). Information-Based Validity Functionals for Mixture Analysis, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.

  • Day, N. E. (1969), Estimating the Components of a Mixture of Normal Distributions, Biometrika, 11, 235–254.

  • Dempster, A., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1–38 (with discussion).

  • Everitt, B. S. (1974). Cluster Analysis, Heinemann Educational Books, London.

  • Everitt, B. S. (1979), Unresolved Problems in Cluster Analysis, Biometrics, 35, 169–181.

  • Everitt, B. S. (1981). A Monte Carlo Investigation of the Likelihood Ratio Test for the Number of Components in a Mixture of Normal Distributions, Multivariate Behavioral Research, 16, 171–180.

  • Everitt, B. S., and Hand, D. J. (1981), Finite Mixture Distributions, Chapman and Hall, New York.

  • Feder, P. I. (1968). On the Distribution of the Log Likelihood Ratio Test Statistic When the True Parameter is “Near” the Boundaries of the Hypothesis Regions, Annals of Mathematical Statistics, 39, 2044–2055.

  • Fleiss, J. L. and Zubin, J. (1969). On the Methods and Theory of Clustering, Multivariate Behavioral Research, 4, 235–250.

  • Fukunaga, K. (1990). Statistical Pattern Recognition, 2nd Edition, Academic Press, New York.

  • Gordon, A. D. (1981). Classification, Chapman and Hall, London.

  • Hartigan, J. A. (1975), Clustering Algorithms, John Wiley & Sons, New York.

  • Hartigan, J. A. (1977), Distribution Problems in Clustering, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 45–71.

  • Hartigan, J. (1985). Statistical Theory in Clustering, Journal of Classification, 2, 63–76.

  • Hawkins, D. M., Muller, M. W., and Krooden, J. A. T. (1982). Cluster Analysis, in Topics in Applied Multivariate Analysis, D. M. Hawkins (Ed.), Cambridge University Press, Cambridge, 303–356.

  • Henna, J. (1985). On Estimating of the Number of Constituents of a Finite Mixture of Continuous Distributions, Annals of the Institute of Statistical Mathematics, Part A, 37, 235–240.

  • Henna, J. (1986). An Application of a Mixture Method to Classification, Journal of Japan Statistical Society, Vol. 16, No. 2, 133–143.

  • John, S. (1970), On Identifying the Population of Origin of Each Observation in a Mixture of Observations from Two Normal Populations, Technometrics, 12, 553–563.

  • Kendall, M. G. and Stuart, M. A. (1979). The Advanced Theory of Statistics, Vol. 2, Fourth Edition, Hafner Publishing, New York.

  • Kullback, S., and Leibler, R. A. (1951), On Information and Sufficiency, Annals of Mathematical Statistics, 22, 79–86.

  • Magnus, J. R. (1988), Linear Structures, Oxford University Press, New York.

  • Magnus, J. R. (1989), Personal correspondence.

  • Magnus, J. R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York.

  • Maklad, M. S., and Nichols, T. (1980), A New Approach to Model Structure Discrimination, IEEE Transactions on Systems, Man, and Cybernetics, SMC-10, No. 2, 78–84.

  • Mardia, K. V. (1970). Measures of Multivariate Skewness and Kurtosis with Applications, Biometrika, 57, 519–530.

  • Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis, Academic Press, New York.

  • Maronna, R. and Jacovkis, P. M. (1974). Multivariate Clustering Procedures with Variable Metrics, Biometrics, 30, 499–505.

  • Marriott, F. H. C. (1971). Practical Problems in a Method of Cluster Analysis, Biometrics, 27, 501–514.

  • Matusita, K. and Ohsumi, N. (1981). Evaluation Procedure of Some Clustering Techniques. Unpublished paper, The Institute of Statistical Mathematics, Tokyo, Japan.

  • McLachlan, G. J., and Basford, K. E. (1988), Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.

  • Pearlman, J.D. (1986). Nuclear Magnetic Resonance Spectral Signatures of Liquid Crystals in Human Atheroma as Basis for Multi-Dimensional Digital Imaging of Atherosclerosis. Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.

  • Pearlman, J.D., Bozdogan, H., Brown, M. G., Ware, S., Carman, C. S., Merickel, M. B., Cail, B., Brookeman, J. R., and Ayers, C. R. (1986). Early Detection and Quantification of Human Atheroma in the Aortic Wall by a Multidimensional Approach to 1H-NMR Imaging. Presented at the 59th Annual Session of the American Heart Association, Dallas, Texas. Circulation, 74: II–202.

  • Peters, B. C. and Walker, H. F. (1978). An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, SIAM Journal of Applied Mathematics, 35, 362–378.

  • Reaven, G. M. and Miller, R. G. (1979), An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia 16, 17–24.

  • Rissanen, J. (1976), Minmax Entropy Estimation of Models for Vector Processes, in: R. K. Mehra and D. G. Lainiotis (eds.), System Identification, Academic Press, New York, 97–119.

  • Rissanen, J. (1978). Modeling by Shortest Data Description, Automatica, 14, 465–471.

  • Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, Teaneck, New Jersey.

  • Rissanen, J. and Ristad, E. S. (1993). Unsupervised Classification With Stochastic Complexity, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.

  • Schwarz, G. (1978). Estimating the Dimension of a Model, Annals of Statistics, 6, 461–464.

  • Sclove, S. L. (1977), Population Mixture Models and Clustering Algorithms, Communications in Statistics (Theory and Methods) A6, 417–434.

  • Sclove, S. L. (1982), Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82-1,1982, ARO Contract DAAG29-82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Sclove, S. L. (1993). Some Aspects of Model-Selection Criteria, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, the Netherlands.

  • Scott, A. J., and Symons, M. J. (1971). Clustering Methods Based on Likelihood Ratio Criteria, Biometrics, 27, 387–397.

  • Sokal, R. R. (1977). Clustering and Classification: Background and Current Directions, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 1–15.

  • Sturges, H. A. (1926). The Choice of Class Intervals, Journal of the American Statistical Association, 21, 65–66.

  • Symons, M. J. (1981). Clustering Criteria and Multivariate Normal Mixtures, Biometrics, 37, 35–43.

  • Teicher, H. (1961). Identifiability of Mixtures, Annals of Mathematical Statistics, 32, 244–248.

  • Teicher, H. (1963). Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 34, 1265–1269.

  • Titterington, D. M. (1982). Some Problems with Data from Finite Mixture Distributions, Mathematics Research Center Summary Report No. 2369, University of Wisconsin, Madison, Wisconsin.

  • Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, John Wiley & Sons, New York.

  • Van Emden, M. H. (1971), An Analysis of Complexity, Mathematical Centre Tracts, 35, Amsterdam.

  • Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large, Trans. of the American Math Society, 54, 426–482.

  • Wilks, S. S. (1938). The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Annals of Mathematical Statistics, 9, 60–62.

  • Windham, M. P. and Cutler, A. (1992). Information Ratios for Validating Mixture Analyses, Journal of the American Statistical Association, 87, No. 420, 1188–1192.

  • Wolfe, J. H. (1967), NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions, Research Memorandum, SRM 68-2, U. S. Naval Personnel Research Activity, San Diego, California.

  • Wolfe, J. H. (1970), Pattern Clustering by Multivariate Mixture Analysis, Multivariate Behavioral Res., 5, 329–350.

  • Wolfe, J. H. (1971). A Monte-Carlo Study of the Sampling Distribution of the Likelihood Ratio for Mixtures of Multinormal Distribution, Research Memorandum 72-2, U. S. Naval Personnel Research Activity, San Diego, California.

  • Wong, M. A. (1982). A Hybrid Clustering Method for Identifying High-Density Clusters, Journal of the American Statistical Association, 77, No. 380, 841–847.

  • Yakowitz, S. and Spragins, J. (1968). On the Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 39, 209–214.


Additional information

Dedicated to Professor Akaike on the occasion of his 65th birthday celebration.

Copyright information

© 1994 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Bozdogan, H. (1994). Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity. In: Bozdogan, H., et al. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0800-3_3

  • DOI: https://doi.org/10.1007/978-94-011-0800-3_3

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-4344-1

  • Online ISBN: 978-94-011-0800-3

  • eBook Packages: Springer Book Archive
