Abstract
Cluster analysis provides methods for subdividing a set of objects into a suitable number of ‘classes’, ‘groups’, or ‘types’ C_1,…,C_m such that each class is as homogeneous as possible and different classes are sufficiently separated. This paper shows how entropy and information measures have been, or can be, used in this framework. We present several probabilistic clustering approaches which are related to, or lead to, information and entropy criteria g(C) for selecting an optimum partition C = (C_1,…,C_m) of n data vectors, for qualitative and for quantitative data, assuming loglinear, logistic, and normal distribution models, together with appropriate iterative clustering algorithms. A new partitioning problem is considered in Section 5, where we look for a dissection (discretization) C of an arbitrary sample space Y (e.g. R^p or {0,1}^p) such that the φ-divergence I_φ(P_0, P_1) between two discretized distributions P_0(C_i), P_1(C_i) (i = 1,…,m) is maximized (e.g., Kullback-Leibler’s discrimination information or the χ² noncentrality parameter). We conclude with some comments on methods for selecting a suitable number of classes, e.g., by using Akaike’s information criterion AIC and its modifications.
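The Section 5 problem can be made concrete with a small sketch: given samples from two distributions P_0 and P_1, a partition C of the sample space induces discretized distributions P_0(C_i), P_1(C_i), and one searches for the partition maximizing their Kullback-Leibler divergence (the special case I_φ with φ(t) = t log t). The sketch below is illustrative only and not from the paper; the function names (`discretize`, `kl_divergence`, `best_single_cut`) and the restriction to a single cut point on the real line are assumptions made here for brevity.

```python
import math

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence sum_i p_i * log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0 (every class has positive
    probability under P_1 if it does under P_0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def discretize(sample, cuts):
    """Class frequencies P(C_i) of a 1-d sample over the partition
    of the real line induced by the given cut points."""
    edges = [-math.inf] + sorted(cuts) + [math.inf]
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    n = len(sample)
    return [c / n for c in counts]

def best_single_cut(sample0, sample1, candidates):
    """Exhaustive search over candidate cut points for the two-class
    dissection maximizing the KL divergence between the two
    discretized empirical distributions."""
    return max(candidates,
               key=lambda t: kl_divergence(discretize(sample0, [t]),
                                           discretize(sample1, [t])))
```

For example, with `sample0 = [0, 1, 2, 3, 6]`, `sample1 = [1, 4, 5, 6, 7]`, and candidate cuts `[2.5, 3.5]`, the cut at 3.5 yields discretized distributions (0.8, 0.2) versus (0.2, 0.8) and the larger divergence, so it is selected. In the paper the partition ranges over dissections of a general sample space and over general φ-divergences; this restriction to one cut is only to keep the sketch short.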
References
Agresti, A.: Ordinal categorical data. Wiley, New York, 1990.
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., and Csaki, F. (eds.): Second International Symposium on Information Theory. Akademiai Kiado, Budapest, 1973, 267–281.
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (1974) 716–723.
Akaike, H.: On entropy maximization principle. In: Krishnaiah, P.R. (ed.): Applications of statistics. North Holland, Amsterdam, 1977, 27–41.
Akaike, H.: A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Stat. Math. A 30 (1979) 9–14.
Arnold, S.J.: A test for clusters. J. Marketing Research 16 (1979) 545–551.
Benzécri, J.P.: Théorie de l’information et classification d’après un tableau de contingence. In: Benzécri, J.P.: L’Analyse des Données, Vol. 1. Dunod, Paris, 1973, 207–236.
Binder, D.A.: Bayesian cluster analysis. Biometrika 65 (1978) 31–38.
Binder, D.A.: Approximations to Bayesian clustering rules. Biometrika 68 (1981) 275–286.
Bock, H.H.: Statistische Modelle für die einfache und doppelte Klassifikation von normalverteilten Beobachtungen. Dissertation, University of Freiburg, 1968.
Bock, H.H.: The equivalence of two extremal problems and its application to the iterative classification of multivariate data. Written version of a lecture given at the Conference on “Medizinische Statistik”, Forschungsinstitut Oberwolfach, February 23 – March 1, 1969, 10 pp.
Bock, H.H.: Statistische Modelle und Bayes’sche Verfahren zur Bestimmung einer unbekannten Klassifikation normalverteilter zufälliger Vektoren. Metrika 18 (1972) 120–132.
Bock, H.H.: Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen, 1974.
Bock, H.H.: On tests concerning the existence of a classification. In: Proc. 1st Symp. Data Analysis and Informatics. Versailles, 1977. Institut de Recherche d’Informatique et d’Automatique (IRIA), Le Cesnay, France, 1977, 449–464.
Bock, H.H.: A clustering algorithm for choosing optimal classes for the chi-square test. Bull. 44th Session of the International Statistical Institute, Madrid, Contributed papers, Vol. 2 (1983) 758–762.
Bock, H.H.: Statistical testing and evaluation methods in cluster analysis. In: Ghosh, J.K. and Roy, J. (eds.): Golden Jubilee Conference in Statistics: Applications and new directions. Calcutta, December 1981. Indian Statistical Institute, Calcutta, 1984, 116–146.
Bock, H.H.: On some significance tests in cluster analysis. J. of Classification 2 (1985) 77–108.
Bock, H.H.: Loglinear models and entropy clustering methods for qualitative data. In: Gaul, W. and Schader, M. (eds.): Classification as a tool of research. Proc. 9th Annual Conference of the Gesellschaft für Klassifikation, Karlsruhe, 1985. North Holland, Amsterdam, 1986, 19–26.
Bock, H.H.: On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: Bozdogan, H. and Gupta, A.K. (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987, 17–34.
Bock, H.H.: Probabilistic aspects in cluster analysis. In: O. Opitz (ed.): Conceptual and numerical analysis of data. Springer-Verlag, Heidelberg-Berlin, 1989, 12–44.
Bock, H.H.: A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. In: Schader, M. (ed.): Analyzing and modeling data and knowledge. Proc. 15th Annual Conference of the Gesellschaft für Klassifikation, Salzburg, 1991, Vol. 1. Springer-Verlag, Heidelberg – New York, 1991, 19–36.
Boulton, D.M. and Wallace, C.S.: The information content of a multistate distribution. J. Theoretical Biology 23 (1969) 269–278.
Bozdogan, H.: ICOMP: A new model selection criterion. In: Bock, H.H. (ed.): Classification and related methods of data analysis. Proc. First Conference of the International Federation of Classification Societies, Aachen, 1987. North Holland, Amsterdam, 1988, 599–608.
Bozdogan, H.: On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Comm. Statist., Theory and Methods 19 (1990) 221–278.
Bozdogan, H.: Choosing the number of component clusters in the mixture model using a new informational complexity criterion of the inverse Fisher information matrix. In: O. Opitz, B. Lausen, R. Klar (eds.): Information and classification. Proc. 16th Annual Conference of the Gesellschaft für Klassifikation, Dortmund, April 1992. Springer-Verlag, Heidelberg, 1993 (to appear).
Bozdogan, H. and Gupta, A.K. (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987.
Bozdogan, H. and Sclove, S.L.: Multi-sample cluster analysis using Akaike’s information criterion. Ann. Inst. Statist. Math. 36 (1984), Part B, 163–180.
Bryant, P.: On characterizing optimization-based clustering methods. J. of Classification 5 (1988) 81–84.
Carman, C.S., Merickel, M.B.: Supervising ISODATA with an information theoretic stopping rule. Pattern Recognition 23 (1990) 185.
Celeux, G.: Classification et modèles. Revue de Statistique Appliquée 36 (1988), no. 4, 43–58.
Celeux, G. and Govaert, G.: Clustering criteria for discrete data and latent class models. J. of Classification 8 (1991) 157–176.
Ciampi, A., Thiffault, J. and Sagman, U.: Évaluation de classifications par le critère d’Akaike et la validation croisée. Revue de Statistique Appliquée 13 (1988), no. 3, 33–50.
Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2 (1967) 299–318.
Darroch, J.N., Lauritzen, S.L. and Speed, T.P.: Markov fields and log-linear interaction models for contingency tables. Ann. Statist. 8 (1980) 522–539.
Diday, E. and Schroeder, A.: A new approach in mixed distributions detection. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 10 (1976), no. 6, 75–106.
Diday, E. and Simon, J.C.: Clustering analysis. In: K.S. Fu (ed.): Digital pattern recognition. Springer-Verlag, Berlin, 1976, 47–94.
Diday, E. and Govaert, G.: Classification automatique avec distances adaptatives. Revue Française d’Automatique, Informatique et Recherche Opérationnelle (R.A.I.R.O.), Série Informatique 11 (1977) 329–349.
Diday, E. et al. (eds.): Optimisation en classification automatique I, II. Institut National de Recherche en Informatique et en Automatique, Le Chesnay, 1979.
Eisenblätter, D. and Bozdogan, H.: Two-stage multi-sample cluster analysis as a general approach to discriminant analysis. In: Bozdogan, H., and A.K. Gupta (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987, 95–119.
Eisenblätter, D. and Bozdogan, H.: Two-stage multi-sample cluster analysis. In: Bock, H.H. (ed.): Classification and related methods of data analysis. Proc. First Conference of the International Federation of Classification Societies, Aachen, 1987. North Holland, Amsterdam, 1988, 91–96.
Engelman, L. and Hartigan, J.A.: Percentage points of a test for clusters. J. Amer. Statist. Assoc. 64 (1969) 1647–1648.
Everitt, B.S.: A Monte Carlo investigation of the likelihood ratio test for the number of components in a mixture of normal distributions. Multivariate Behavioral Research 16 (1981) 171–180.
Forst, H. T.: On the hierarchical classification of observation units according to comparative characteristics (in German). International Classification 5 (1978) 81–85.
Ghosh, J.K. and Sen, P.K.: On the asymptotic performance of the log-likelihood ratio statistic for the mixture model and related results. In: LeCam, L.M. and R.A. Olshen (eds.): Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Vol. II. Wadsworth, Monterey, California, 1985, 789–806.
Godehardt, E.: Graphs as structural models. The application of graphs and multigraphs in cluster analysis. 2nd ed. Vieweg Verlag, Braunschweig, 1990.
Govaert, G.: Classification avec distances adaptatives. Thèse de 3e cycle. Université Paris VI, 1975.
Govaert, G.: Classification binaire et modèles. Revue Statistique Appliquée 38 (1990), no.1, 67–81.
Haberman, S.J.: Log-linear models for frequency data: sufficient statistics and likelihood equations. Ann. Statist. 1 (1973) 617–632.
Haberman, S.J.: The analysis of frequency data. University of Chicago Press, Chicago, 1974.
Haberman, S.J.: Log-linear models and frequency tables with small expected cell counts. Ann. Statist. 5 (1977) 1148–1169.
Hall, P.: Akaike’s information criterion and Kullback-Leibler loss for histogram density estimation. Theory of Probability and Related Fields 85 (1990) 449–467.
Hartigan, J.A.: Asymptotic distributions for clustering criteria. Ann. Statist. 6 (1978) 117–131.
Haughton, D.M.A.: On the choice of a model to fit data from an exponential family. Ann. Statist. 16 (1988) 342–355.
Haughton, D., Haughton, J. and Izenman, A.J.: Information criteria and harmonic models in time series analysis. J. Statist. Comput. Simul. 35 (1990) 187–207.
Hyvärinen, L.: Classification of qualitative data. Nord. Tidskrift Info. Behandling (BIT) 2 (1962), no. 2, 83–89.
Jacobsen, M.: Existence and unicity of MLEs in discrete exponential family distributions. Scand. J. Statist. 16 (1989) 335–350.
Jain, A.K. and Dubes, R.C.: Algorithms for cluster analysis. Prentice Hall, Englewood Cliffs NJ, 1988.
Jones, L.K. et al.: General entropy criteria for inverse problems, with applications to data compression, pattern classification and cluster analysis. IEEE Trans. Inform. Theory IT-36 (1990) 23–30.
Kashyap, R.L.: Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Trans. Pattern Analysis and Machine Intelligence PAMI 4 (1982) 99–104.
Khouas, S. and Parodi, A.: Towards natural clustering through entropy minimization. In: Diday, E., Lechevallier, Y. (eds.): Symbolic-numeric data analysis and learning. Nova Science Publishers, New York, 1991, 429–442.
Koziol, J.A.: Cluster analysis of antigenic profiles of tumors: Selection of number of clusters using Akaike’s information criterion. Methods of Information in Medicine 29 (1990) 200–204.
Lambert, J.M. and Williams, W.T.: Multivariate methods in plant ecology. VI. Comparison of information analysis and association analysis. J. Ecology 54 (1966) 635–664.
Lance, G.N. and Williams, W.T.: Mixed data classificatory programs. I. Agglomerative systems. Australian Computer J. 1 (1967) 15–20.
Lance, G.N. and Williams, W.T.: Note on a new information statistic classificatory program. Computer J. 11 (1968) 195.
Lauritzen, S.L. and Wermuth, N.: Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Statist. 17 (1989) 31–57.
Lee, K.L.: Multivariate tests for clusters. J. Amer. Statist. Assoc. 74 (1979) 708–714.
Macnaughton-Smith, P.: Some statistical and other numerical techniques for classifying individuals. Home Office Research Unit Report No. 6, H.M.S.O. London, 1965.
McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics 36 (1987) 318–324.
McLachlan, G.J. and Basford, K.E.: Mixture models. Inference and applications to clustering. Marcel Dekker, New York — Basel, 1988.
Nishii, R.: Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12 (1984) 758–765.
Orloci, L.: Information analysis in phytosociology: Partition, classification and prediction. J. Theoret. Biology 20 (1968) 271–284.
Orloci, L.: Information theory models for hierarchic and non-hierarchic classification. In: A.J. Cole (ed.): Numerical taxonomy. Academic Press, New York, 1969, 148–165.
Rissanen, J.: Modeling by shortest data description. Automatica 14 (1978) 465–471.
Rissanen, J.: Stochastic complexity and modeling. Ann. Statist. 14 (1986) 1080–1100.
Rousseau, P.: Analyse de données binaires. Ph. D. thesis, Université de Montréal, 1978.
Rousseau, P. and Sankoff, D.: A solution to the problem of grouping speakers. In: Sankoff, D.: Linguistic variation: models and methods. Academic Press, New York, 1978, 97–117.
Schroeder, A.: Analyse d’un mélange de distributions de probabilité de même type. Revue de Statistique Appliquée 24 (1976), no. 1, 39–62.
Schwarz, G.: Estimating the dimension of a model. Ann. Statist. 6 (1978) 461–464.
Silverman, B.W.: Using kernel density estimates to investigate multimodality. J. Roy. Statist. Soc. B 43 (1981) 97–99.
Späth, H.: Cluster dissection and analysis. Theory, FORTRAN programs, examples. Ellis Horwood Ltd./Wiley, Chichester, 1985.
Spruill, M.C.: Cell selection in the Chernoff-Lehmann chi-square statistics. Ann. Statist. 4 (1976) 375–383.
Thode, H.C., Finch, S.J. and Mendell, N.R.: Simulated percentage points for the null distribution of the likelihood ratio test for a mixture of two normals. Biometrics 44 (1988) 1195–1201.
Titterington, D.M., Smith, A.F.M. and Makov, U.E.: Statistical analysis of finite mixture distributions. Wiley, Chichester, 1985.
Vogel, F.: Ein Streuungsmaß für komparative Merkmale. Jahrbücher für Nationalökonomie und Statistik 197 (1982), no. 2, 145–157.
Wallace, C.S. and Boulton, D.M.: An information measure for classification. Computer J. 11 (1968) 185–194.
Williams, W.T. and Dale, M.B.: Fundamental problems in numerical taxonomy. Advances Botanical Research 2 (1965) 35–68.
Williams, W.T., Lambert, J.M. and Lance, G.N.: Multivariate methods in plant ecology V. Similarity analysis and information analysis. J. Ecology 54 (1966) 427–445.
Whittaker, J.: Graphical models in applied multivariate statistics. Wiley, New York, 1989.
Windham, M.P.: Parameter modification for clustering. J. of Classification 4 (1987) 191–214.
© 1994 Springer Science+Business Media Dordrecht
Cite this chapter
Bock, H.H. (1994). Information and Entropy in Cluster Analysis. In: Bozdogan, H., et al. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0800-3_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4344-1
Online ISBN: 978-94-011-0800-3