Summary
This paper studies the dendrograms produced by classification algorithms such as the Single Link Algorithm. We introduce probability distributions on dendrograms corresponding to distinct non-classifiability hypotheses. The distributions of the height of a random dendrogram under these hypotheses are studied, and their asymptotics are computed explicitly. This leads to statistical tests for non-classifiability.
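As an illustration only, the idea of a dendrogram height under a non-classifiability hypothesis can be simulated with a minimal Python sketch. It rests on assumptions that the summary does not confirm: the hypothesis is taken to be that every ordering of the pairwise dissimilarities is equally likely, and "height" is taken to be the depth of the single-link merge tree. The paper's own definitions may differ, and the function name `single_link_height` is hypothetical.

```python
import itertools
import random


def single_link_height(n, rng):
    """Depth of the single-link merge tree on n points.

    Assumption (not from the paper): all orderings of the pairwise
    dissimilarities are equally likely, so shuffling the pairs stands in
    for ranking random dissimilarities.
    """
    pairs = list(itertools.combinations(range(n), 2))
    rng.shuffle(pairs)

    parent = list(range(n))  # union-find forest over the points
    depth = [0] * n          # merge-tree depth stored at each cluster root

    def find(x):
        # Follow parent links to the cluster root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single link merges two clusters as soon as any pair joining them appears.
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # pair lies inside an existing cluster: no new merge
        parent[rj] = ri
        depth[ri] = 1 + max(depth[ri], depth[rj])

    return depth[find(0)]


def simulate_heights(n, trials, seed=0):
    """Monte Carlo sample of heights under the assumed hypothesis."""
    rng = random.Random(seed)
    return [single_link_height(n, rng) for _ in range(trials)]
```

Under these assumptions, comparing the height observed on real data with the empirical distribution returned by `simulate_heights` gives a rough Monte Carlo analogue of a test for non-classifiability; it does not reproduce the explicit asymptotics the paper derives.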
Copyright information
© 1996 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Van Cutsem, B., Ycart, B. (1996). Probability Distributions on Indexed Dendrograms and Related Problems of Classifiability. In: Bock, HH., Polasek, W. (eds) Data Analysis and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-80098-6_7
DOI: https://doi.org/10.1007/978-3-642-80098-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60774-8
Online ISBN: 978-3-642-80098-6
eBook Packages: Springer Book Archive