Abstract
Analysis of clusters by means of mixture distributions, called mixture-model cluster analysis, has been one of the most difficult problems in statistics. However, theoretical work, coupled with the development of new computational tools in the past ten years, has made it possible to overcome some of the intractable technical and numerical issues that have limited the widespread applicability of mixture-model cluster analysis to complex real-world problems. The development of new objective analysis techniques had to await the emergence of information-based model selection procedures to overcome the difficulties of conventional techniques in mixture-model cluster analysis. See, e.g., Bozdogan (1992) and Cutler and Windham (1993, in this volume).
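The model-selection approach the abstract refers to can be illustrated with a minimal sketch, assuming univariate normal components fitted by plain EM (Dempster, Laird and Rubin, 1977): fit mixtures with k = 1, ..., K components and keep the k that minimises an information criterion. The sketch scores AIC (Akaike, 1973), Schwarz's criterion SBC (Schwarz, 1978) and CAIC (Bozdogan, 1987). It does not implement the chapter's ICOMP criterion, which requires the inverse Fisher information matrix. The quantile initialisation, the variance floor, the function names (`fit_gmm_1d`, `select_k`) and the simulated data are illustrative assumptions, not the author's procedure.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_loglik(xs, weights, means, variances):
    """Log-likelihood of the data under a univariate normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )


def fit_gmm_1d(data, k, max_iter=200, tol=1e-8, var_floor=1e-6):
    """Fit a k-component univariate normal mixture by EM (hypothetical helper).

    Returns (weights, means, variances, loglik_trace). The quantile-based
    start and the variance floor are assumptions made for this sketch.
    """
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Start the k means at evenly spaced quantiles, all sharing the overall variance.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    trace = [mixture_loglik(xs, weights, means, variances)]
    for _ in range(max_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: closed-form weighted updates of weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            denom = max(nj, 1e-300)  # guard against an empty component
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / denom
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / denom
            variances[j] = max(spread, var_floor)  # keep variances away from zero
        trace.append(mixture_loglik(xs, weights, means, variances))
        if abs(trace[-1] - trace[-2]) < tol:
            break
    return weights, means, variances, trace


def information_criteria(loglik, k, n):
    """AIC, SBC (BIC) and CAIC for a k-component univariate normal mixture."""
    m = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
    return {
        "AIC": -2.0 * loglik + 2.0 * m,
        "SBC": -2.0 * loglik + m * math.log(n),
        "CAIC": -2.0 * loglik + m * (math.log(n) + 1.0),
    }


def select_k(data, k_max):
    """Fit k = 1..k_max components; return all scores and the minimising k per criterion."""
    n = len(data)
    scores = {}
    for k in range(1, k_max + 1):
        _, _, _, trace = fit_gmm_1d(data, k)
        scores[k] = information_criteria(trace[-1], k, n)
    best = {name: min(scores, key=lambda k: scores[k][name]) for name in ("AIC", "SBC", "CAIC")}
    return scores, best


# Usage on two well-separated simulated clusters (synthetic data, not from the chapter).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
scores, best = select_k(data, k_max=3)
print(best)
```

SBC and CAIC charge roughly log n per extra parameter, whereas AIC charges 2, so as n grows the first two lean towards fewer components than AIC.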
References
Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle, in Second International Symposium on Information Theory, B. N. Petrov and F. Csaki (Eds.), Budapest: Akademiai Kiado, 267–281.
Akaike, H. (1974). A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, AC-19, 716–723.
Akaike, H. (1978). Time Series Analysis and Control Through Parametric Models, in Applied Time Series Analysis, D. F. Findley (Ed.), Academic Press, New York, 1–23.
Akaike, H. (1981). Modern Development of Statistical Methods, in Trends and Progress in System Identification, P. Eykhoff (Ed.), Pergamon Press, New York, 169–184.
Akaike, H. (1985). Prediction and Entropy, in A Celebration of Statistics: The ISI Centenary Volume, A. C. Atkinson and S. E. Fienberg (Eds.), Springer-Verlag, New York, 1–24.
Al-Hussaini, E. K. and Ahmad, K. L. (1981). On the Identifiability of Finite Mixtures of Distributions, IEEE Transactions on Information Theory, Vol. IT-27, No. 5, 662–668.
Anderson, J. J. (1985). Normal Mixtures and the Number of Clusters Problem, Computational Statistics Quarterly (CSQ), Vol. 2, Issue 1, 3–14.
Andrews, D. F. and Herzberg, A. M. (1985). Data, Springer-Verlag, New York.
Beale, E. M. (1969). Cluster Analysis, Scientific Control Systems, London.
Behboodian, J. (1972). Information Matrix for a Mixture of Two Normal Distributions, Journal of Statistical Computation and Simulation, 1, 295–314.
Bhattacharyya, A. (1943). On a Measure of Divergence Between Two Statistical Populations Defined by Their Probability Distributions, Bull. Calcutta Math. Soc., 35, 99–110.
Binder, D. A. (1978). Bayesian Cluster Analysis, Biometrika, 65, 31–38.
Bock, H. H. (1981). Statistical Testing and Evaluation Methods in Cluster Analysis, in Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, J. K. Ghosh and J. Roy (Eds.), December 16–19, Calcutta, 116–146.
Box, G. E. P. and Cox, D. R. (1964). An Analysis of Transformations, Journal of the Royal Statistical Society, Series B, 26, No. 2, 211–252 (with discussion).
Bozdogan, H. (1981). Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals. Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1983). Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria, Technical Report No. UIC/DQM/A83-1, June 16, 1983, ARO Contract DAAG29-82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1987). Model Selection and Akaike's Information Criterion (AIC): The General Theory and Its Analytical Extensions, Psychometrika, 52, No. 3, Special Section (invited paper), 345–370.
Bozdogan, H. (1988). ICOMP: A New Model Selection Criterion, in Classification and Related Methods of Data Analysis, H. H. Bock (Ed.), North-Holland, Amsterdam, 599–608.
Bozdogan, H. (1990a). On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models, Communications in Statistics (Theory and Methods), 19 (1), 221–278.
Bozdogan, H. (1990b). Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion, invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.
Bozdogan, H. (1992). Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix, invited paper in Studies in Classification, Data Analysis, and Knowledge Organization, O. Opitz, B. Lausen, and R. Klar (Eds.), Springer-Verlag, Heidelberg, Germany. To appear.
Brennan, T. (1980). Multivariate Taxonomic Classification for Criminal Justice Research, Final Report, Vol. 2, Project No. 78-NI-AX-0065, National Institute of Justice, Washington, D.C. 20531.
Calinski, T. and Harabasz, J. (1974). A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1–27.
Carman, C. S. (1989). Pattern Recognition of Magnetic Resonance Images with Application to Atherosclerosis. Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.
Chernoff, H. (1954). On the Distribution of the Likelihood Ratio, Annals of Mathematical Statistics, 25, 573–578.
Cutler, A. and Windham, M. P. (1993). Information-Based Validity Functionals for Mixture Analysis, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Day, N. E. (1969). Estimating the Components of a Mixture of Normal Distributions, Biometrika, 56, 463–474.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1–38 (with discussion).
Everitt, B. S. (1974). Cluster Analysis, Heinemann Educational Books, London.
Everitt, B. S. (1979). Unresolved Problems in Cluster Analysis, Biometrics, 35, 169–181.
Everitt, B. S. (1981). A Monte Carlo Investigation of the Likelihood Ratio Test for the Number of Components in a Mixture of Normal Distributions, Multivariate Behavioral Research, 16, 171–180.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions, Chapman and Hall, New York.
Feder, P. I. (1968). On the Distribution of the Log Likelihood Ratio Test Statistic When the True Parameter is “Near” the Boundaries of the Hypothesis Regions, Annals of Mathematical Statistics, 39, 2044–2055.
Fleiss, J. L. and Zubin, J. (1969). On the Methods and Theory of Clustering, Multivariate Behavioral Research, 4, 235–250.
Fukunaga, K. (1990). Statistical Pattern Recognition, 2nd Edition, Academic Press, New York.
Gordon, A. D. (1981). Classification, Chapman and Hall, London.
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley & Sons, New York.
Hartigan, J. A. (1977). Distribution Problems in Clustering, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 45–71.
Hartigan, J. A. (1985). Statistical Theory in Clustering, Journal of Classification, 2, 63–76.
Hawkins, D. M., Muller, M. W., and Krooden, J. A. T. (1982). Cluster Analysis, in Topics in Applied Multivariate Analysis, D. M. Hawkins (Ed.), Cambridge University Press, Cambridge, 303–356.
Henna, J. (1985). On Estimating of the Number of Constituents of a Finite Mixture of Continuous Distributions, Annals of the Institute of Statistical Mathematics, Part A, 37, 235–240.
Henna, J. (1986). An Application of a Mixture Method to Classification, Journal of Japan Statistical Society, Vol. 16, No. 2, 133–143.
John, S. (1970), On Identifying the Population of Origin of Each Observation in a Mixture of Observations from Two Normal Populations, Technometrics, 12, 553–563.
Kendall, M. G. and Stuart, A. (1979). The Advanced Theory of Statistics, Vol. 2, Fourth Edition, Hafner Publishing, New York.
Kullback, S., and Leibler, R. A. (1951). On Information and Sufficiency, Annals of Mathematical Statistics, 22, 79–86.
Magnus, J. R. (1988). Linear Structures, Oxford University Press, New York.
Magnus, J. R. (1989). Personal correspondence.
Magnus, J. R., and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York.
Maklad, M. S., and Nichols, T. (1980). A New Approach to Model Structure Discrimination, IEEE Transactions on Systems, Man, and Cybernetics, SMC-10, No. 2, 78–84.
Mardia, K. V. (1970). Measures of Multivariate Skewness and Kurtosis with Applications, Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis, Academic Press, New York.
Maronna, R. and Jacovkis, P. M. (1974). Multivariate Clustering Procedures with Variable Metrics, Biometrics, 30, 499–505.
Marriott, F. H. C. (1971). Practical Problems in a Method of Cluster Analysis, Biometrics, 27, 501–514.
Matusita, K. and Ohsumi, N. (1981). Evaluation Procedure of Some Clustering Techniques. Unpublished paper, The Institute of Statistical Mathematics, Tokyo, Japan.
McLachlan, G. J., and Basford, K. E. (1988), Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.
Pearlman, J. D. (1986). Nuclear Magnetic Resonance Spectral Signatures of Liquid Crystals in Human Atheroma as Basis for Multi-Dimensional Digital Imaging of Atherosclerosis. Ph.D. thesis, Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22903.
Pearlman, J. D., Bozdogan, H., Brown, M. G., Ware, S., Carman, C. S., Merickel, M. B., Cail, B., Brookeman, J. R., and Ayers, C. R. (1986). Early Detection and Quantification of Human Atheroma in the Aortic Wall by a Multidimensional Approach to 1H-NMR Imaging. Presented at the 59th Annual Session of the American Heart Association, Dallas, Texas.
Pearlman, J. D., Bozdogan, H., Brown, M. G., Ware, S., Carman, C. S., Merickel, M. B., Cail, B., Brookeman, J. R., and Ayers, C. R. (1986). Circulation, 74, II–202.
Peters, B. C. and Walker, H. F. (1978). An Iterative Procedure for Obtaining Maximum-Likelihood Estimates of the Parameters for a Mixture of Normal Distributions, SIAM Journal on Applied Mathematics, 35, 362–378.
Reaven, G. M. and Miller, R. G. (1979). An Attempt to Define the Nature of Chemical Diabetes Using a Multidimensional Analysis, Diabetologia, 16, 17–24.
Rissanen, J. (1976). Minmax Entropy Estimation of Models for Vector Processes, in System Identification, R. K. Mehra and D. G. Lainiotis (Eds.), Academic Press, New York, 97–119.
Rissanen, J. (1978). Modeling by Shortest Data Description, Automatica, 14, 465–471.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, Teaneck, New Jersey.
Rissanen, J. and Ristad, E. S. (1993). Unsupervised Classification With Stochastic Complexity, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Schwarz, G. (1978). Estimating the Dimension of a Model, Annals of Statistics, 6, 461–464.
Sclove, S. L. (1977). Population Mixture Models and Clustering Algorithms, Communications in Statistics (Theory and Methods), A6, 417–434.
Sclove, S. L. (1982). Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82-1, 1982, ARO Contract DAAG29-82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.
Sclove, S. L. (1993). Some Aspects of Model-Selection Criteria, in Multivariate Statistical Modeling, Vol. II, Proc. of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, H. Bozdogan (Ed.), Kluwer Academic Publishers, Dordrecht, The Netherlands.
Scott, A. J., and Symons, M. J. (1971). Clustering Methods Based on Likelihood Ratio Criteria, Biometrics, 27, 387–397.
Sokal, R. R. (1977). Clustering and Classification: Background and Current Directions, in Classification and Clustering, J. Van Ryzin (Ed.), Academic Press, New York, 1–15.
Sturges, H. A. (1926). The Choice of Class Intervals, Journal of the American Statistical Association, 21, 65–66.
Symons, M. J. (1981). Clustering Criteria and Multivariate Normal Mixtures, Biometrics, 37, 35–43.
Teicher, H. (1961). Identifiability of Mixtures, Annals of Mathematical Statistics, 32, 244–248.
Teicher, H. (1963). Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 34, 1265–1269.
Titterington, D. M. (1982). Some Problems with Data from Finite Mixture Distributions, Mathematics Research Center Summary Report No. 2369, University of Wisconsin, Madison, Wisconsin.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions, John Wiley & Sons, New York.
Van Emden, M. H. (1971). An Analysis of Complexity, Mathematical Centre Tracts, 35, Amsterdam.
Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large, Trans. of the American Math Society, 54, 426–482.
Wilks, S. S. (1938). The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Annals of Mathematical Statistics, 9, 60–62.
Windham, M. P. and Cutler, A. (1992). Information Ratios for Validating Mixture Analyses, Journal of the American Statistical Association, 87, No. 420, 1188–1192.
Wolfe, J. H. (1967). NORMIX: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions, Research Memorandum, SRM 68-2, U. S. Naval Personnel Research Activity, San Diego, California.
Wolfe, J. H. (1970). Pattern Clustering by Multivariate Mixture Analysis, Multivariate Behavioral Research, 5, 329–350.
Wolfe, J. H. (1971). A Monte-Carlo Study of the Sampling Distribution of the Likelihood Ratio for Mixtures of Multinormal Distribution, Research Memorandum 72-2, U. S. Naval Personnel Research Activity, San Diego, California.
Wong, M. A. (1982). A Hybrid Clustering Method for Identifying High-Density Clusters, Journal of the American Statistical Association, 77, No. 380, 841–847.
Yakowitz, S. and Spragins, J. (1968). On the Identifiability of Finite Mixtures, Annals of Mathematical Statistics, 39, 209–214.
Additional information
Dedicated to Professor Akaike on the occasion of his 65th birthday celebration.
Copyright information
© 1994 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Bozdogan, H. (1994). Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity. In: Bozdogan, H., et al. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0800-3_3
DOI: https://doi.org/10.1007/978-94-011-0800-3_3
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4344-1
Online ISBN: 978-94-011-0800-3
eBook Packages: Springer Book Archive