
Abstract

Cluster analysis provides methods for subdividing a set of objects into a suitable number of 'classes', 'groups', or 'types' C_1, …, C_m such that each class is as homogeneous as possible and different classes are sufficiently separated. This paper shows how entropy and information measures have been, or can be, used in this framework. We present several probabilistic clustering approaches which are related to, or lead to, information and entropy criteria g(C) for selecting an optimum partition C = (C_1, …, C_m) of n data vectors, for qualitative and for quantitative data, assuming loglinear, logistic, and normal distribution models, together with appropriate iterative clustering algorithms. A new partitioning problem is considered in Section 5, where we look for a dissection (discretization) C of an arbitrary sample space Y (e.g. R^p or {0,1}^p) such that the φ-divergence I_C(P_0, P_1) between the two discretized distributions P_0(C_i), P_1(C_i) (i = 1, …, m) is maximized (e.g., Kullback–Leibler's discrimination information or the χ² noncentrality parameter). We conclude with some comments on methods for selecting a suitable number of classes, e.g., by using Akaike's information criterion AIC and its modifications.
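As an illustration of the kind of entropy criterion g(C) the abstract mentions for qualitative data, the following Python sketch minimizes a within-class entropy by single-object exchanges. This is a minimal sketch under simplifying assumptions: the criterion (class size times within-class category entropy, summed over variables) and the function names are illustrative, not the chapter's own formulation.

```python
import math
from collections import Counter

def within_class_entropy(data, labels, m):
    """Entropy criterion g(C) for a partition of qualitative data:
    the sum over classes C_i of n_i times the entropy of the category
    frequencies within C_i, accumulated over all variables.
    Smaller values mean more homogeneous classes."""
    total = 0.0
    for i in range(m):
        block = [x for x, lab in zip(data, labels) if lab == i]
        n_i = len(block)
        if n_i == 0:
            continue
        for j in range(len(data[0])):  # one entropy term per variable
            counts = Counter(row[j] for row in block)
            h = -sum((c / n_i) * math.log(c / n_i) for c in counts.values())
            total += n_i * h
    return total

def entropy_clustering(data, m, labels):
    """Iterative exchange algorithm: move single objects between classes
    as long as g(C) strictly decreases.  Terminates because each accepted
    move strictly lowers g(C), which is bounded below by zero."""
    improved = True
    while improved:
        improved = False
        for k in range(len(data)):
            old = labels[k]
            best_i, best_g = old, within_class_entropy(data, labels, m)
            for i in range(m):
                if i == old:
                    continue
                labels[k] = i
                g = within_class_entropy(data, labels, m)
                if g < best_g:
                    best_g, best_i = g, i
            labels[k] = best_i
            if best_i != old:
                improved = True
    return labels
```

On a toy data set such as [('a','x'), ('a','x'), ('b','y'), ('b','y')] with m = 2, the exchange loop drives g(C) to zero by grouping the two identical pairs, regardless of a mixed initial labeling.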


References

  • Agresti, A.: Ordinal categorical data. Wiley, New York, 1990.

  • Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., and Csaki, F. (eds.): Second International Symposium on Information Theory. Akademiai Kiado, Budapest, 1973, 267–281.

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (1974) 716–723.

  • Akaike, H.: On entropy maximization principle. In: Krishnaiah, P.R. (ed.): Applications of statistics. North Holland, Amsterdam, 1977, 27–41.

  • Akaike, H.: A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Stat. Math. A 30 (1979) 9–14.

  • Arnold, S.J.: A test for clusters. J. Marketing Research 16 (1979) 545–551.

  • Benzécri, J.P.: Théorie de l'information et classification d'après un tableau de contingence. In: Benzécri, J.P.: L'Analyse des Données, Vol. 1. Dunod, Paris, 1973, 207–236.

  • Binder, D.A.: Bayesian cluster analysis. Biometrika 65 (1978) 31–38.

  • Binder, D.A.: Approximations to Bayesian clustering rules. Biometrika 68 (1981) 275–286.

  • Bock, H.H.: Statistische Modelle für die einfache und doppelte Klassifikation von normalverteilten Beobachtungen. Dissertation, University of Freiburg, 1968.

  • Bock, H.H.: The equivalence of two extremal problems and its application to the iterative classification of multivariate data. Written version of a lecture given at the Conference on "Medizinische Statistik", Forschungsinstitut Oberwolfach, February 23 – March 1, 1969, 10 pp.

  • Bock, H.H.: Statistische Modelle und Bayes'sche Verfahren zur Bestimmung einer unbekannten Klassifikation normalverteilter zufälliger Vektoren. Metrika 18 (1972) 120–132.

  • Bock, H.H.: Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen, 1974.

  • Bock, H.H.: On tests concerning the existence of a classification. In: Proc. 1st Symp. Data Analysis and Informatics. Versailles, 1977. Institut de Recherche d'Informatique et d'Automatique (IRIA), Le Chesnay, France, 1977, 449–464.

  • Bock, H.H.: A clustering algorithm for choosing optimal classes for the chi-square test. Bull. 44th Session of the International Statistical Institute, Madrid, Contributed papers, Vol. 2 (1983) 758–762.

  • Bock, H.H.: Statistical testing and evaluation methods in cluster analysis. In: Ghosh, J.K. and Roy, J. (eds.): Golden Jubilee Conference in Statistics: Applications and new directions. Calcutta, December 1981. Indian Statistical Institute, Calcutta, 1984, 116–146.

  • Bock, H.H.: On some significance tests in cluster analysis. J. of Classification 2 (1985) 77–108.

  • Bock, H.H.: Loglinear models and entropy clustering methods for qualitative data. In: Gaul, W. and Schader, M. (eds.): Classification as a tool of research. Proc. 9th Annual Conference of the Gesellschaft für Klassifikation, Karlsruhe, 1985. North Holland, Amsterdam, 1986, 19–26.

  • Bock, H.H.: On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: Bozdogan, H. and Gupta, A.K. (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987, 17–34.

  • Bock, H.H.: Probabilistic aspects in cluster analysis. In: O. Opitz (ed.): Conceptual and numerical analysis of data. Springer-Verlag, Heidelberg-Berlin, 1989, 12–44.

  • Bock, H.H.: A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. In: Schader, M. (ed.): Analyzing and modeling data and knowledge. Proc. 15th Annual Conference of the Gesellschaft für Klassifikation, Salzburg, 1991, Vol. 1. Springer-Verlag, Heidelberg – New York, 1991, 19–36.

  • Boulton, D.M. and Wallace, C.S.: The information content of a multistate distribution. J. Theoretical Biology 23 (1969) 269–278.

  • Bozdogan, H.: ICOMP: A new model selection criterion. In: Bock, H.H. (ed.): Classification and related methods of data analysis. Proc. First Conference of the International Federation of Classification Societies, Aachen, 1987. North Holland, Amsterdam, 1988, 599–608.

  • Bozdogan, H.: On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Comm. Statist., Theory and Methods 19 (1990) 221–278.

  • Bozdogan, H.: Choosing the number of component clusters in the mixture model using a new informational complexity criterion of the inverse Fisher information matrix. In: O. Opitz, B. Lausen, R. Klar (eds.): Information and classification. Proc. 16th Annual Conference of the Gesellschaft für Klassifikation, Dortmund, April 1992. Springer-Verlag, Heidelberg, 1993 (to appear).

  • Bozdogan, H. and Gupta, A.K. (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987.

  • Bozdogan, H. and Sclove, S.L.: Multi-sample cluster analysis using Akaike's information criterion. Ann. Inst. Statist. Math. 36 (1984), Part B, 163–180.

  • Bryant, P.: On characterizing optimization-based clustering methods. J. of Classification 5 (1988) 81–84.

  • Carman, C.S., Merickel, M.B.: Supervising ISODATA with an information theoretic stopping rule. Pattern Recognition 23 (1990) 185.

  • Celeux, G.: Classification et modèles. Revue de Statistique Appliquée 36 (1988), no. 4, 43–58.

  • Celeux, G. and Govaert, G.: Clustering criteria for discrete data and latent class models. J. of Classification 8 (1991) 157–176.

  • Ciampi, A., Thiffault, J. and Sagman, U.: Évaluation de classifications par le critère d'Akaike et la validation croisée. Revue de Statistique Appliquée 13 (1988), no. 3, 33–50.

  • Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2 (1967) 299–318.

  • Darroch, J.N., Lauritzen, S.L. and Speed, T.P.: Markov fields and log-linear interaction models for contingency tables. Ann. Statist. 8 (1980) 522–539.

  • Diday, E. and Schroeder, A.: A new approach in mixed distributions detection. Revue Française d'Automatique, Informatique et Recherche Opérationnelle 10 (1976), no. 6, 75–106.

  • Diday, E. and Simon, J.C.: Clustering analysis. In: K.S. Fu (ed.): Digital pattern recognition. Springer-Verlag, Berlin, 1976, 47–94.

  • Diday, E. and Govaert, G.: Classification automatique avec distances adaptatives. Revue Française d'Automatique, Informatique et Recherche Opérationnelle (R.A.I.R.O.), Série Informatique 11 (1977) 329–349.

  • Diday, E. et al. (eds.): Optimisation en classification automatique I, II. Institut National de Recherche en Informatique et en Automatique, Le Chesnay, 1979.

  • Eisenblätter, D. and Bozdogan, H.: Two-stage multi-sample cluster analysis as a general approach to discriminant analysis. In: Bozdogan, H., and A.K. Gupta (eds.): Multivariate statistical modeling and data analysis. Reidel Publ., Dordrecht, 1987, 95–119.

  • Eisenblätter, D. and Bozdogan, H.: Two-stage multi-sample cluster analysis. In: Bock, H.H. (ed.): Classification and related methods of data analysis. Proc. First Conference of the International Federation of Classification Societies, Aachen, 1987. North Holland, Amsterdam, 1988, 91–96.

  • Engelman, L. and Hartigan, J.A.: Percentage points of a test for clusters. J. Amer. Statist. Assoc. 64 (1969) 1647–1648.

  • Everitt, B.S.: A Monte Carlo investigation of the likelihood ratio test for the number of components in a mixture of normal distributions. Multivariate Behavioral Research 16 (1981) 171–180.

  • Forst, H.T.: On the hierarchical classification of observation units according to comparative characteristics (in German). International Classification 5 (1978) 81–85.

  • Ghosh, J.K. and Sen, P.K.: On the asymptotic performance of the log-likelihood ratio statistic for the mixture model and related results. In: LeCam, L.M. and R.A. Olshen (eds.): Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II. Wadsworth, Monterey, California, 1985, 789–806.

  • Godehardt, E.: Graphs as structural models. The application of graphs and multigraphs in cluster analysis. Vieweg Verlag, Braunschweig, 2nd ed., 1990.

  • Govaert, G.: Classification avec distances adaptatives. Thèse de 3e cycle, Université Paris VI, 1975.

  • Govaert, G.: Classification binaire et modèles. Revue de Statistique Appliquée 38 (1990), no. 1, 67–81.

  • Haberman, S.J.: Log-linear models for frequency data: sufficient statistics and likelihood equations. Ann. Statist. 1 (1973) 617–632.

  • Haberman, S.J.: The analysis of frequency data. University of Chicago Press, Chicago, 1974.

  • Haberman, S.J.: Log-linear models and frequency tables with small expected cell counts. Ann. Statist. 5 (1977) 1148–1169.

  • Hall, P.: Akaike's information criterion and Kullback-Leibler loss for histogram density estimation. Probability Theory and Related Fields 85 (1990) 449–467.

  • Hartigan, J.A.: Asymptotic distributions for clustering criteria. Ann. Statist. 6 (1978) 117–131.

  • Haughton, D.M.A.: On the choice of a model to fit data from an exponential family. Ann. Statist. 16 (1988) 342–355.

  • Haughton, D., Haughton, J. and Izenman, A.J.: Information criteria and harmonic models in time series analysis. J. Statist. Comput. Simul. 35 (1990) 187–207.

  • Hyvärinen, L.: Classification of qualitative data. Nord. Tidskrift Info. Behandling (BIT) 2 (1962), no. 2, 83–89.

  • Jacobsen, M.: Existence and unicity of MLE's in discrete exponential family distributions. Scand. J. Statist. 16 (1989) 335–350.

  • Jain, A.K. and Dubes, R.C.: Algorithms for cluster analysis. Prentice Hall, Englewood Cliffs NJ, 1988.

  • Jones, L.K. et al.: General entropy criteria for inverse problems, with applications to data compression, pattern classification and cluster analysis. IEEE Trans. Inform. Theory IT-36 (1990) 23–30.

  • Kashyap, R.L.: Optimal choice of AR and MA parts in autoregressive moving average models. IEEE Trans. Pattern Analysis and Machine Intelligence PAMI 4 (1982) 99–104.

  • Khouas, S. and Parodi, A.: Towards natural clustering through entropy minimization. In: Diday, E., Lechevallier, Y. (eds.): Symbolic-numeric data analysis and learning. Nova Science Publishers, New York, 1991, 429–442.

  • Koziol, J.A.: Cluster analysis of antigenic profiles of tumors: Selection of number of clusters using Akaike's information criterion. Methods of Information in Medicine 29 (1990) 200–204.

  • Lambert, J.M. and Williams, W.T.: Multivariate methods in plant ecology. VI. Comparison of information analysis and association analysis. J. Ecology 54 (1966) 635–664.

  • Lance, G.N. and Williams, W.T.: Mixed data classificatory programs. I. Agglomerative systems. Australian Computer J. 1 (1967) 15–20.

  • Lance, G.N. and Williams, W.T.: Note on a new information statistic classificatory program. Computer J. 11 (1968) 195.

  • Lauritzen, S.L. and Wermuth, N.: Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Statist. 17 (1989) 31–57.

  • Lee, K.L.: Multivariate tests for clusters. J. Amer. Statist. Assoc. 74 (1979) 708–714.

  • Macnaughton-Smith, P.: Some statistical and other numerical techniques for classifying individuals. Home Office Research Unit Report No. 6, H.M.S.O. London, 1965.

  • McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics 36 (1987) 318–324.

  • McLachlan, G.J. and Basford, K.E.: Mixture models. Inference and applications to clustering. Marcel Dekker, New York – Basel, 1988.

  • Nishii, R.: Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12 (1984) 758–765.

  • Orloci, L.: Information analysis in phytosociology: Partition, classification and prediction. J. Theoret. Biology 20 (1968) 271–284.

  • Orloci, L.: Information theory models for hierarchic and non-hierarchic classification. In: A.J. Cole (ed.): Numerical taxonomy. Academic Press, New York, 1969, 148–165.

  • Rissanen, J.: Modeling by shortest data description. Automatica 14 (1978) 465–471.

  • Rissanen, J.: Stochastic complexity and modeling. Ann. Statist. 14 (1986) 1080–1100.

  • Rousseau, P.: Analyse de données binaires. Ph.D. thesis, Université de Montréal, 1978.

  • Rousseau, P. and Sankoff, D.: A solution to the problem of grouping speakers. In: Sankoff, D.: Linguistic variation: models and methods. Academic Press, New York, 1978, 97–117.

  • Schroeder, A.: Analyse d'un mélange de distributions de probabilité de même type. Revue de Statistique Appliquée 24 (1976), no. 1, 39–62.

  • Schwarz, G.: Estimating the dimension of a model. Ann. Statist. 6 (1978) 461–464.

  • Silverman, B.W.: Using kernel density estimates to investigate multimodality. J. Roy. Statist. Soc. B 43 (1981) 97–99.

  • Späth, H.: Cluster dissection and analysis. Theory, FORTRAN programs, examples. Ellis Horwood Ltd./Wiley, Chichester, 1985.

  • Spruill, M.C.: Cell selection in the Chernoff-Lehmann chi-square statistics. Ann. Statist. 4 (1976) 375–383.

  • Thode, H.C., Finch, S.J. and Mendell, N.R.: Simulated percentage points for the null distribution of the likelihood ratio test for a mixture of two normals. Biometrics 44 (1988) 1195–1201.

  • Titterington, D.M., Smith, A.F.M. and Makov, U.E.: Statistical analysis of finite mixture distributions. Wiley, Chichester, 1985.

  • Vogel, F.: Ein Streuungsmaß für komparative Merkmale. Jahrbücher für Nationalökonomie und Statistik 197 (1982), no. 2, 145–157.

  • Wallace, C.S. and Boulton, D.M.: An information measure for classification. Computer J. 11 (1968) 185–194.

  • Williams, W.T. and Dale, M.B.: Fundamental problems in numerical taxonomy. Advances Botanical Research 2 (1965) 35–68.

  • Williams, W.T., Lambert, J.M. and Lance, G.N.: Multivariate methods in plant ecology. V. Similarity analysis and information analysis. J. Ecology 54 (1966) 427–445.

  • Whittaker, J.: Graphical models in applied multivariate statistics. Wiley, New York, 1989.

  • Windham, M.P.: Parameter modification for clustering. J. of Classification 4 (1987) 191–214.


Copyright information

© 1994 Springer Science+Business Media Dordrecht

Cite this chapter

Bock, H.H. (1994). Information and Entropy in Cluster Analysis. In: Bozdogan, H., et al. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0800-3_4

  • DOI: https://doi.org/10.1007/978-94-011-0800-3_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-4344-1

  • Online ISBN: 978-94-011-0800-3
