Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity

Bozdogan, Hamparsum

doi:10.1007/978-94-011-0800-3_3

Chapter

Abstract

Analysis of clusters by means of mixture distributions, called mixture-model cluster analysis, has been one of the most difficult problems in statistics. However, theoretical work, coupled with the development of new computational tools over the past ten years, has made it possible to overcome some of the intractable technical and numerical issues that have limited the widespread applicability of mixture-model cluster analysis to complex real-world problems. The development of new objective analysis techniques had to await the emergence of information-based model selection procedures, which overcome the difficulties of conventional techniques within the context of mixture-model cluster analysis. See, e.g., Bozdogan (1992) and Cutler and Windham (1993, in this volume).
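The abstract's central idea, choosing the number of mixture components with an information-based criterion, can be illustrated with a minimal sketch. The Python code below assumes univariate data and a normal mixture fitted by the EM algorithm (Dempster, Laird, and Rubin 1977); it fits k = 1, ..., k_max components and scores each fit with AIC (Akaike 1973) and the consistent AIC, CAIC (Bozdogan 1987). It does not implement the chapter's ICOMP criterion based on the inverse Fisher information matrix. The function names, the quantile-based initialisation, and the variance floor are illustrative assumptions, not the chapter's procedure.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def fit_normal_mixture(data, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component univariate normal mixture by EM (illustrative sketch).

    Returns (weights, means, variances, log_likelihood). Means start at evenly
    spaced sample quantiles; this initialisation is an assumption.
    """
    n = len(data)
    xs = sorted(data)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k

    def component_logs(x):
        # log(pi_j) + log f(x | mu_j, sigma_j^2) for every component j
        return [math.log(w) + log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]

    def log_sum_exp(values):
        top = max(values)
        return top + math.log(sum(math.exp(v - top) for v in values))

    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities).
        resp = []
        for x in data:
            logs = component_logs(x)
            total = log_sum_exp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: weighted updates of mixing proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor,
            )

    log_lik = sum(log_sum_exp(component_logs(x)) for x in data)
    return weights, means, variances, log_lik


def information_criteria(log_lik, k, n):
    """AIC and CAIC for a k-component univariate normal mixture."""
    m = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
    aic = -2.0 * log_lik + 2.0 * m
    caic = -2.0 * log_lik + m * (math.log(n) + 1.0)
    return aic, caic


def choose_k(data, k_max=4):
    """Fit k = 1..k_max and return {k: (log_lik, AIC, CAIC)}."""
    results = {}
    for k in range(1, k_max + 1):
        *_, log_lik = fit_normal_mixture(data, k)
        results[k] = (log_lik, *information_criteria(log_lik, k, len(data)))
    return results


if __name__ == "__main__":
    # Synthetic data: two well-separated normal clusters (illustrative only).
    rng = random.Random(1)
    data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
            + [rng.gauss(10.0, 1.0) for _ in range(200)])
    results = choose_k(data)
    for k, (ll, aic, caic) in results.items():
        print(f"k={k}  logL={ll:9.2f}  AIC={aic:9.2f}  CAIC={caic:9.2f}")
    print("AIC picks k =", min(results, key=lambda k: results[k][1]))
    print("CAIC picks k =", min(results, key=lambda k: results[k][2]))
```

Each added component costs three free parameters, so CAIC's heavier penalty (log n + 1 per parameter instead of 2) guards more strongly against choosing too many clusters than AIC does; the model with the smallest criterion value is selected.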

References

Al-Hussaini, E. K. and Ahmad, K. L. (1981). On the Identifiability of Finite Mixtures of Distributions, IEEE Transactions on Information Theory, Vol. IT-27, No. 5, 662–668.
Anderson, J. J. (1985). Normal Mixtures and the Number of Clusters Problem, Computational Statistics Quarterly (CSQ), Vol. 2, Issue 1, 3–14.
Andrews, D. F. and Herzberg, A. M. (1985). Data, Springer-Verlag, New York.
Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle, in Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (Eds.), Budapest: Academiai Kiado, 267–281.
Akaike, H. (1974). A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, AC-19, 716–723.
Akaike, H. (1978). Time Series Analysis and Control Through Parametric Models, in Applied Time Series Analysis, D. F. Findley (Ed.), Academic Press, New York, 1–23.
Akaike, H. (1981). Modern Development of Statistical Methods, in Trends and Progress in System Identification, P. Eykhoff (Ed.), Pergamon Press, New York, 169–184.
Akaike, H. (1985). Prediction and Entropy, in A Celebration of Statistics: The ISI Centenary Volume, A. C. Atkinson and S. E. Fienberg (Eds.), Springer-Verlag, New York, 1–24.
Beale, E. M. (1969). Cluster Analysis, Scientific Control Systems, London.
Behboodian, J. (1972). Information Matrix for a Mixture of Two Normal Distributions, Journal of Statistical Computation and Simulation, 1, 295–314.
Bhattacharyya, A. (1943). On a Measure of Divergence Between Two Statistical Populations Defined by Their Probability Distributions, Bull. Calcutta Math. Soc., 35, 99–110.
Binder, D. A. (1978). Bayesian Cluster Analysis, Biometrika, 65, 31–38.
Brennan, T. (1980). Multivariate Taxonomic Classification for Criminal Justice Research, Final Report, Vol. 2, Project No. 78-NI-AX-0065, National Institute of Justice, Washington, D. C. 20531.
Bock, H. H. (1981). Statistical Testing and Evaluation Methods in Cluster Analysis, in Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, J. K. Ghosh and J. Roy (Eds.), December 16–19, Calcutta, 116–146.
Box, G. E. P. and Cox, D. R. (1964). An Analysis of Transformations, Journal of the Royal Statistical Society, Series B, 26, No. 2, 211–252 (with discussion).
Bozdogan, H. (1981). Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals, Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1983). Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria, Technical Report No. UIC/DQM/A83-1, June 16, 1983, ARO Contract DAAG29-82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1987). Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions, Psychometrika, 52, No. 3, Special Section (invited paper), 345–370.
Bozdogan, H. (1988). ICOMP: A New Model Selection Criterion, in Classification and Related Methods of Data Analysis, Hans H. Bock (Ed.), North-Holland, Amsterdam, April, 599–608.
Bozdogan, H. (1990a). On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models, Communications in Statistics (Theory and Methods), 19 (1), 221–278.
Bozdogan, H. (1990b). Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion, invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.
Bozdogan, H. (1992). Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix, invited paper in Studies in Classification, Data Analysis, and Knowledge Organization, O. Opitz, B. Lausen, and R. Klar (Eds.), Springer-Verlag, Heidelberg, Germany. To appear.
Calinski, T. and Harabasz, J. (1974). A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1–27.
Carman, C. S. (1989). Pattern Recognition of Magnetic Resonance Images with Application to Atherosclerosis, Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.
Chernoff, H. (1954). On the Distribution of the Likelihood Ratio, Annals of Mathematical Statistics, 25, 573–578.
Cutler, A. and Windham, M. P. (1993). Information-Based Validity Functionals for Mixture Analysis, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Day, N. E. (1969). Estimating the Components of a Mixture of Normal Distributions, Biometrika, 56, 463–474.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1–38 (with discussion).
Everitt, B. S. (1974). Cluster Analysis, Heinemann Educational Books, London.
Everitt, B. S. (1979). Unresolved Problems in Cluster Analysis, Biometrics, 35, 169–181.
Everitt, B. S. (1981). A Monte Carlo Investigation of the Likelihood Ratio Test for the Number of Components in a Mixture of Normal Distributions, Multivariate Behavioral Research, 16, 171–180.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions, Chapman and Hall, New York.
Feder, P. I. (1968). On the Distribution of the Log Likelihood Ratio Test Statistic When the True Parameter is “Near” the Boundaries of the Hypothesis Regions, Annals of Mathematical Statistics, 39, 2044–2055.
Fleiss, J. L. and Zubin, J. (1969). On the Methods and Theory of Clustering, Multivariate Behavioral Research, 4, 235–250.
Fukunaga, K. (1990). Statistical Pattern Recognition, 2nd Edition, Academic Press, New York.
Gordon, A. D. (1981). Classification, Chapman and Hall, London.
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley & Sons, New York.
Hartigan, J. A. (1977). Distribution Problems in Clustering, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 45–71.
Hartigan, J. A. (1985). Statistical Theory in Clustering, Journal of Classification, 2, 63–76.
Hawkins, D. M., Muller, M. W., and Krooden, J. A. T. (1982). Cluster Analysis, in Topics in Applied Multivariate Analysis, D. M. Hawkins (Ed.), Cambridge University Press, Cambridge, 303–356.
Henna, J. (1985). On Estimating of the Number of Constituents of a Finite Mixture of Continuous Distributions, Annals of the Institute of Statistical Mathematics, Part A, 37, 235–240.
Henna, J. (1986). An Application of a Mixture Method to Classification, Journal of Japan Statistical Society, Vol. 16, No. 2, 133–143.
John, S. (1970). On Identifying the Population of Origin of Each Observation in a Mixture of Observations from Two Normal Populations, Technometrics, 12, 553–563.
Kendall, M. G. and Stuart, A. (1979). The Advanced Theory of Statistics, Vol. 2, Fourth Edition, Hafner Publishing, New York.
Kullback, S. and Leibler, R. A. (1951). On Information and Sufficiency, Annals of Mathematical Statistics, 22, 79–86.
Magnus, J. R. (1988). Linear Structures, Oxford University Press, New York.
Magnus, J. R. (1989). Personal correspondence.
Magnus, J. R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York.
Maklad, M. S. and Nichols, T. (1980). A New Approach to Model Structure Discrimination, IEEE Transactions on Systems, Man, and Cybernetics, SMC-10, No. 2, 78–84.
Mardia, K. V. (1970). Measures of Multivariate Skewness and Kurtosis with Applications, Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis, Academic Press, New York.
Maronna, R. and Jacovkis, P. M. (1974). Multivariate Clustering Procedures with Variable Metrics, Biometrics, 30, 499–505.
Marriott, F. H. C. (1971). Practical Problems in a Method of Cluster Analysis, Biometrics, 27, 501–514.
Matusita, K. and Ohsumi, N. (1981). Evaluation Procedure of Some Clustering Techniques, unpublished paper, The Institute of Statistical Mathematics, Tokyo, Japan.
McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.
Pearlman, J. D. (1986). Nuclear Magnetic Resonance Spectral Signatures of Liquid Crystals in Human Atheroma as Basis for Multi-Dimensional Digital Imaging of Atherosclerosis, Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.
Pearlman, J. D., Bozdogan, H., Brown, M. G., Ware, S., Carman, C. S., Merickel, M. B., Cail, B., Brookeman, J. R., and Ayers, C. R. (1986). Early Detection and Quantification of Human Atheroma in the Aortic Wall by a Multidimensional Approach to 1H-NMR Imaging, presented at the 59th Annual Session of the American Heart Association, Dallas, Texas.
Pearlman, J. D., Bozdogan, H., Brown, M. G., Ware, S., Carman, C. S., Merickel, M. B., Cail, B., Brookeman, J. R., and Ayers, C. R. (1986). Circulation, 74, II–202.
Peters, B. C. and Walker, H. F. (1978). An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, SIAM Journal of Applied Mathematics, 35, 362–378.
Reaven, G. M. and Miller, R. G. (1979). An Attempt to Define the Nature of Chemical Diabetes Using a Multidimensional Analysis, Diabetologia, 16, 17–24.
Rissanen, J. (1976). Minmax Entropy Estimation of Models for Vector Processes, in System Identification, R. K. Mehra and D. G. Lainiotis (Eds.), Academic Press, New York, 97–119.
Rissanen, J. (1978). Modeling by Shortest Data Description, Automatica, 14, 465–471.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, Teaneck, New Jersey.
Rissanen, J. and Ristad, E. S. (1993). Unsupervised Classification With Stochastic Complexity, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Schwarz, G. (1978). Estimating the Dimension of a Model, Annals of Statistics, 6, 461–464.
Sclove, S. L. (1977). Population Mixture Models and Clustering Algorithms, Communications in Statistics (Theory and Methods), A6, 417–434.
Sclove, S. L. (1982). Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82-1, 1982, ARO Contract DAAG29-82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.
Sclove, S. L. (1993). Some Aspects of Model-Selection Criteria, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Scott, A. J. and Symons, M. J. (1971). Clustering Methods Based on Likelihood Ratio Criteria, Biometrics, 27, 387–397.
Sokal, R. R. (1977). Clustering and Classification: Background and Current Directions, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 1–15.
Sturges, H. A. (1926). The Choice of Class Intervals, Journal of the American Statistical Association, 21, 65–66.
Symons, M. J. (1981). Clustering Criteria and Multivariate Normal Mixtures, Biometrics, 37, 35–43.
Teicher, H. (1961). Identifiability of Mixtures, Annals of Mathematical Statistics, 32, 244–248.
Teicher, H. (1963). Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 34, 1265–1269.
Titterington, D. M. (1982). Some Problems with Data from Finite Mixture Distributions, Mathematics Research Center Summary Report No. 2369, University of Wisconsin, Madison, Wisconsin.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions, John Wiley & Sons, New York.
Van Emden, M. H. (1971). An Analysis of Complexity, Mathematical Centre Tracts, 35, Amsterdam.
Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large, Transactions of the American Mathematical Society, 54, 426–482.
Wilks, S. S. (1938). The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Annals of Mathematical Statistics, 9, 60–62.
Windham, M. P. and Cutler, A. (1992). Information Ratios for Validating Mixture Analyses, Journal of the American Statistical Association, 87, No. 420, 1188–1192.
Wolfe, J. H. (1967). NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions, Research Memorandum SRM 68-2, U. S. Naval Personnel Research Activity, San Diego, California.
Wolfe, J. H. (1970). Pattern Clustering by Multivariate Mixture Analysis, Multivariate Behavioral Research, 5, 329–350.
Wolfe, J. H. (1971). A Monte-Carlo Study of the Sampling Distribution of the Likelihood Ratio for Mixtures of Multinormal Distributions, Research Memorandum 72-2, U. S. Naval Personnel Research Activity, San Diego, California.
Wong, M. A. (1982). A Hybrid Clustering Method for Identifying High-Density Clusters, Journal of the American Statistical Association, 77, No. 380, 841–847.
Yakowitz, S. and Spragins, J. (1968). On the Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 39, 209–214.

Author information

Authors and Affiliations

Department of Statistics, The University of Tennessee, Knoxville, Tennessee, 37996-0532, USA
Hamparsum Bozdogan

Editor information

Editors and Affiliations

Department of Statistics, The University of Tennessee, 331 Stokely Management Center, Knoxville, TN, 37996-0532, USA
Hamparsum Bozdogan
Department of Information & Decision Sciences M/C 294, CBA, University of Illinois at Chicago, Box 802451, 60607-7124, Chicago, IL, USA
Stanley L. Sclove
Department of Mathematics & Statistics, Bowling Green State University, Bowling Green, OH, 43403, USA
Arjun K. Gupta
Department of Mathematical Sciences, Bentley College, Waltham, MA, USA
D. Haughton
The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-Ku, Tokyo, Japan
G. Kitagawa, T. Ozaki & K. Tanabe

Additional information

Dedicated to Professor Akaike on the occasion of his 65th birthday celebration.

About this chapter

Cite this chapter

Bozdogan, H. (1994). Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity. In: Bozdogan, H., et al. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0800-3_3

DOI: https://doi.org/10.1007/978-94-011-0800-3_3
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4344-1
Online ISBN: 978-94-011-0800-3
eBook Packages: Springer Book Archive