
Summary

Clustering algorithms can provide misleading summaries of data, and attention has been devoted to investigating ways of guarding against reaching incorrect conclusions, by validating the results of a cluster analysis. The paper provides an overview of recent work in this area of cluster validation. Material covered includes: the distinction between external, internal, and relative clustering indices; types of null model, including ‘data-influenced’ null models; tests of the complete absence of any class structure in a data set; and ways of assessing the validity of individual clusters, partitions of data into disjoint clusters, and hierarchical classifications. A discussion indicates areas in which further research seems desirable.

Keywords

Cluster Algorithm; Null Model; Random Graph; Class Structure; American Statistical Association
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Japan 1998

Authors and Affiliations

  • A. D. Gordon
    • Mathematical Institute, University of St Andrews, St Andrews, Scotland
