Probability Distributions on Indexed Dendrograms and Related Problems of Classifiability

  • Bernard Van Cutsem
  • Bernard Ycart
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Summary

This paper studies the dendrograms produced by classification algorithms such as the Single Link Algorithm. We introduce probability distributions on dendrograms that correspond to distinct non-classifiability hypotheses. Under these hypotheses, we study the distribution of the height of a random dendrogram and compute its asymptotics explicitly. This leads to statistical tests for non-classifiability.
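The idea of testing non-classifiability through the height of a single-link dendrogram can be illustrated with a small simulation. The sketch below is not the authors' method: the paper derives exact asymptotic distributions, while this code only estimates a null distribution by Monte Carlo. It assumes one particular null hypothesis (all orderings of the n(n-1)/2 dissimilarities are equally likely) and measures the height on the rank scale, i.e. as the number of smallest pairs needed before single linkage merges everything into one class. That number is the connectivity time of a random graph process built with Kruskal's algorithm. The function names `connection_rank`, `single_link_rank_height` and `null_p_value` are hypothetical. An unusually large height suggests that the data do have class structure.

```python
import random
from itertools import combinations


def connection_rank(n, pairs_in_order):
    """Number of pairs, taken in the given order, needed to connect n points.

    Kruskal-style union-find: the final single-link merge happens when the
    structure collapses to one component. Assumes pairs_in_order lists every
    pair {i, j} of the n points exactly once.
    """
    if n <= 1:
        return 0
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for rank, (i, j) in enumerate(pairs_in_order, start=1):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                return rank
    raise ValueError("the given pairs do not connect all points")


def single_link_rank_height(d):
    """Rank-scale height of the single-link dendrogram of a dissimilarity matrix d."""
    n = len(d)
    pairs = sorted(combinations(range(n), 2), key=lambda p: d[p[0]][p[1]])
    return connection_rank(n, pairs)


def null_p_value(observed_rank, n, trials=2000, seed=0):
    """Monte Carlo estimate of P(H >= observed_rank) under the assumed null:
    every ordering of the pairs is equally likely (a uniform random shuffle)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    hits = 0
    for _ in range(trials):
        rng.shuffle(pairs)
        if connection_rank(n, pairs) >= observed_rank:
            hits += 1
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Two well-separated groups of five points on a line.
    pts = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
    d = [[abs(a - b) for b in pts] for a in pts]
    h = single_link_rank_height(d)  # 21: all 20 within-group pairs come first
    print(h, null_p_value(h, len(pts)))
```

Since any connected set of n points needs at least n - 1 pairs, the smallest possible rank height is n - 1, and an observed height of n - 1 always gets a p-value of 1.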

Keywords

Null Hypothesis, Random Graph, Asymptotic Distribution, Threshold Function, Dissimilarity Matrix

Copyright information

© Springer-Verlag Berlin · Heidelberg 1996

Authors and Affiliations

  • Bernard Van Cutsem (1)
  • Bernard Ycart (1)
  1. Laboratoire Modélisation et Calcul, I.M.A.G., Grenoble Cedex 9, France