An investigation of nine procedures for detecting the structure in a data set

  • André Hardy
  • Paul Andre
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

A problem common to all clustering techniques is the difficulty of deciding the number of clusters present in the data. The aim of this paper is to assess the performance of the best stopping rules from Milligan and Cooper's (1985) study on specific artificial data sets containing a particular cluster structure. To provide a variety of solutions, the data sets are analysed by four clustering procedures. We also compare these results with those obtained by three methods based on the hypervolume clustering criterion.
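As a minimal illustration of what a stopping rule does, the sketch below computes the Calinski and Harabasz (1974) index, one of the rules examined in Milligan and Cooper's study, for several candidate partitions and keeps the number of clusters that maximises it. The abstract does not list the rules actually retained, so this choice is an assumption; the toy points and the hand-made partitions in `candidates` are hypothetical and are not the paper's artificial data sets.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster sum of squares, W the within-cluster sum of
    squares, k the number of clusters and n the number of points.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n")

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, grand)
        within += sum(sq_dist(p, centre) for p in members)

    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three well-separated square groups in the plane.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Hypothetical candidate partitions, as a clustering procedure might return
# them at successive levels (k = 2, 3, 4).
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],   # first and third groups merged
    3: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],   # the three groups recovered
    4: [0, 0, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2],   # first group split in two
}

scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)  # the rule keeps the k with the largest index
print(scores, best_k)
```

On this toy example the index peaks at the partition that recovers the three groups. The same pattern applies to other rules of this kind: each one scores every candidate partition and selects the level at which the score is optimal.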

Keywords

Clustering; stopping rule; number of clusters; hypervolume criterion

References

  1. Beale, E.M.L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43, 2, 92–94.
  2. Calinski, T., and Harabasz, J. (1974): A dendrite method for cluster analysis. Communications in Statistics, 3, 1–27.
  3. Duda, R.O., and Hart, P.E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.
  4. Goodman, L.A., and Kruskal, W.H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.
  5. Gordon, A.D. (1997): How many clusters? An investigation of five procedures for detecting nested cluster structure, in Proceedings of the IFCS-96 Conference, Kobe (in print).
  6. Hardy, A., and Rasson, J.P. (1982): Une nouvelle approche des problèmes de classification automatique. Statistique et Analyse des données, 7, 41–56.
  7. Hardy, A. (1983): Statistique et classification automatique: Un modèle - Un nouveau critère - Des algorithmes - Des applications. Ph.D. Thesis, F.U.N.D.P., Namur, Belgium.
  8. Hardy, A. (1994): An examination of procedures for determining the number of clusters in a data set, in New Approaches in Classification and Data Analysis, E. Diday et al. (Editors), Springer-Verlag, Paris, 178–185.
  9. Milligan, G.W., and Cooper, M.C. (1985): An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
  10. Ripley, B.D., and Rasson, J.P. (1977): Finding the edge of a Poisson Forest. Journal of Applied Probability, 14, 483–491.
  11. Sarle, W.S. (1983): Cubic Clustering Criterion. Technical Report A-108, SAS Institute Inc., Cary, NC, USA.
  12. Wishart, D. (1978): CLUSTAN User Manual, 3rd edition, Program Library Unit, University of Edinburgh.

Copyright information

© Springer-Verlag Berlin · Heidelberg 1998

Authors and Affiliations

  • André Hardy (1)
  • Paul Andre (1)
  1. Unité de Statistique, Département de Mathématique, Facultés Universitaires N.-D. de la Paix, Namur, Belgium
