Significance tests for multivariate normality of clusters from branching patterns in dendrograms

Abstract

A significance test is presented for whether a cluster comes from a multivariate normal distribution, based on the levels of the branches in a dendrogram. The method compares the observed cumulative graph of the number of branches with a graph derived from a simple logistic function. Provided the number of objects or variables is not small, the difference between the graphs can be tested with the Kolmogorov-Smirnov, Cramér-von Mises, and Lilliefors statistics.
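The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the branch levels and the logistic `location` and `scale` values are hypothetical, since the fitted logistic functions and the simulated critical values from the paper are not reproduced here. The sketch computes only the Kolmogorov-Smirnov distance, that is, the largest vertical gap between the observed cumulative proportion of branches and the logistic reference curve.

```python
import math


def logistic_cdf(x, location, scale):
    """Logistic curve rising from 0 to 1, centred at `location`."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


def ks_distance(levels, location, scale):
    """Largest vertical gap between the empirical cumulative proportion
    of branches and the logistic reference curve."""
    xs = sorted(levels)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = logistic_cdf(x, location, scale)
        # The empirical step function jumps from i/n to (i+1)/n at x,
        # so both sides of the step are checked.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


# Hypothetical branch levels (fusion heights) read off a dendrogram.
branch_levels = [0.8, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.3]

# Hypothetical logistic parameters; the paper derives these by simulation
# for each similarity measure and cluster method.
d = ks_distance(branch_levels, location=1.5, scale=0.3)
print(round(d, 3))
```

To turn the distance into a test, `d` would be compared with a critical value appropriate to the number of branches; which table of quantiles applies depends on the statistic chosen and is not assumed here.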

Logistic functions were obtained by simulation and are available for three similarity measures: (1) Euclidean distances, (2) squared Euclidean distances, and (3) simple matching coefficients, and for five cluster methods: (1) WPGMA, (2) UPGMA, (3) single linkage (or minimum spanning trees), (4) complete linkage, and (5) Ward's increase in sums of squares. For the simple matching coefficient, the mean intracluster similarity is also required.
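As a small illustration of where branch levels come from, the sketch below handles one of the combinations listed above: single linkage on Euclidean distances. It relies on the standard equivalence between single-linkage fusion levels and the edge lengths of a minimum spanning tree, built here with Prim's algorithm. The function names and the sample points are hypothetical, and this is not the author's program.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_levels(points):
    """Return the sorted fusion levels of a single-linkage dendrogram,
    computed as the edge lengths of a minimum spanning tree (Prim)."""
    n = len(points)
    if n < 2:
        return []
    # best[j] = shortest distance from point j to the tree grown so far
    best = {j: euclidean(points[0], points[j]) for j in range(1, n)}
    levels = []
    while best:
        # Attach the point closest to the current tree.
        j = min(best, key=best.get)
        levels.append(best.pop(j))
        # Update the remaining points' distances to the enlarged tree.
        for k in best:
            best[k] = min(best[k], euclidean(points[j], points[k]))
    return sorted(levels)


# Hypothetical objects described by two variables each.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (10.0, 4.0)]
levels = single_linkage_levels(points)
print(levels)
```

A cluster of n objects yields n - 1 branch levels, and it is the cumulative count of these levels that the test compares with the logistic reference curve.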

The method allows a test of whether the dendrogram could be from a cluster of smaller dimensionality due to character correlations. A good fit of the data to an abnormally large or small dimensionality is an important warning for interpretation of the dendrogram. Quantiles of the test statistics were found by simulation to be well approximated by logistic functions. The Lilliefors test is recommended for general use; if a conservative test is required, the two-tailed Kolmogorov-Smirnov test is most suitable. The method is suitable for use with a hand calculator, and a computer program for it is available from the author.

Sneath, P.H.A. Significance tests for multivariate normality of clusters from branching patterns in dendrograms. Math Geol 18, 3–32 (1986). https://doi.org/10.1007/BF00897653

Key words

  • classification
  • cluster analysis
  • significance tests
  • multivariate normality
  • minimum spanning trees