How Many Clusters? An Investigation of Five Procedures for Detecting Nested Cluster Structure

  • A. D. Gordon
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series


The paper addresses the problem of identifying relevant values for the number of clusters present in a data set. The problem has usually been tackled by searching for a best partition using so-called stopping rules. It is argued that it can be of interest to detect cluster structure at several different levels, and five stopping rules that performed well in a previous investigation are modified for this purpose. The rules are assessed by their performance in the analysis of simulated data sets which contain nested cluster structure.
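The abstract contrasts a "global" use of a stopping rule (pick the single best number of clusters) with detecting structure at several nested levels. It does not say how the rules are modified, so the following is only a minimal sketch under stated assumptions. It uses the Calinski and Harabasz (1974) variance-ratio criterion, CH(k) = [B/(k-1)] / [W/(n-k)], where B and W are the between- and within-group sums of squares. It assumes a single-link hierarchy cut at each k, and it reads the "local" version of the rule as "report every k at which CH has a local maximum". The helper names (`single_link_partitions`, `calinski_harabasz`, `global_and_local_choices`, `nested_grid_data`) and the toy data are hypothetical, and this is not necessarily the modification used in the paper.

```python
import math


def single_link_partitions(points, max_k):
    """Cut a single-link hierarchy at every level k = 1..max_k (hypothetical helper).

    Naive agglomeration: repeatedly merge the two clusters whose closest
    members are nearest. Roughly O(n^3) overall, adequate for a toy example.
    Returns {k: labels}, where labels[i] is the cluster index of points[i].
    """
    clusters = [[i] for i in range(len(points))]
    partitions = {}
    while True:
        k = len(clusters)
        if k <= max_k:
            labels = [0] * len(points)
            for c, members in enumerate(clusters):
                for i in members:
                    labels[i] = c
            partitions[k] = labels
        if k == 1:
            return partitions
        best = None  # (distance, index_a, index_b) with index_a < index_b
        for a in range(k):
            for b in range(a + 1, k):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected


def calinski_harabasz(points, labels):
    """Variance-ratio criterion [B/(k-1)] / [W/(n-k)]; needs 2 <= k < n."""
    n, dim = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centre[d] - overall[d]) ** 2 for d in range(dim))
        within += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


def global_and_local_choices(points, max_k):
    """Global rule: argmax of CH. Local rule (assumed): every interior local maximum."""
    partitions = single_link_partitions(points, max_k)
    ch = {k: calinski_harabasz(points, partitions[k]) for k in range(2, max_k + 1)}
    global_k = max(ch, key=ch.get)
    # k = 2 has no left neighbour (CH is undefined at k = 1); k = max_k has no
    # right neighbour, so it is never reported as a local maximum.
    local_ks = [k for k in range(2, max_k)
                if ch[k] > ch[k + 1] and (k == 2 or ch[k] > ch[k - 1])]
    return global_k, local_ks


def nested_grid_data():
    """Hypothetical toy data: two groups, each of two tight 3x3 grids (levels k=2 and k=4)."""
    centres = [(0, 0), (0, 10), (50, 0), (50, 10)]
    return [(cx + dx, cy + dy) for cx, cy in centres
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


if __name__ == "__main__":
    g, loc = global_and_local_choices(nested_grid_data(), max_k=5)
    print("global rule:", g)      # global rule: 4
    print("local maxima:", loc)   # local maxima: [2, 4]
```

On this toy data the global rule reports only the finer four-cluster level, while the local-maximum reading also recovers the coarser two-cluster level, which is the kind of nested structure the abstract describes. How ties between neighbouring values or the end points of the k range should be handled is a design choice made here for illustration only.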






References

  1. Beale, E. M. L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43(2), 92–94.
  2. Bock, H. H. (1996): Probability models and hypotheses testing in partitioning cluster analysis. In Clustering and Classification, Arabie, P., Hubert, L. J. and De Soete, G. (eds.), 377–453, World Scientific, River Edge, NJ.
  3. Calinski, T. and Harabasz, J. (1974): A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.
  4. Cooper, M. C. and Milligan, G. W. (1988): The effect of measurement error on determining the number of clusters in cluster analysis. In Data, Expert Knowledge and Decisions, Gaul, W. and Schader, M. (eds.), 319–328, Springer-Verlag, Berlin.
  5. Duda, R. O. and Hart, P. E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.
  6. Goodman, L. A. and Kruskal, W. H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.
  7. Gordon, A. D. (1996): Cluster validation. Paper presented at IFCS-96 Conference, Kobe, 27–30 March, 1996.
  8. Hubert, L. (1974): Approximate evaluation techniques for the single-link and complete-link hierarchical clustering procedures. Journal of the American Statistical Association, 69, 698–704.
  9. Jain, A. K. and Dubes, R. C. (1988): Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ.
  10. Milligan, G. W. and Cooper, M. C. (1985): An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
  11. Scott, A. J. and Symons, M. J. (1971): Clustering methods based on likelihood ratio criteria. Biometrics, 27, 387–397.

Copyright information

© Springer Japan 1998

Authors and Affiliations

  • A. D. Gordon, Mathematical Institute, University of St Andrews, North Haugh, St Andrews, Scotland
