An investigation of nine procedures for detecting the structure in a data set
A problem common to all clustering techniques is the difficulty of deciding the number of clusters present in the data. The aim of this paper is to assess the performance of the best stopping rules from Milligan and Cooper’s (1985) study on specific artificial data sets containing a particular cluster structure. To provide a variety of solutions, the data sets are analysed by four clustering procedures. We also compare these results with those obtained by three methods based on the hypervolume clustering criterion.
Keywords: clustering, stopping rule, number of clusters, hypervolume criterion
- Beale, E.M.L. (1969): Euclidean cluster analysis. Bulletin of the International Statistical Institute, 43, 2, 92–94.
- Duda, R.O., and Hart, P.E. (1973): Pattern Classification and Scene Analysis. Wiley, New York.
- Gordon, A.D. (1997): How many clusters? An investigation of five procedures for detecting nested cluster structure, in Proceedings of the IFCS-96 Conference, Kobe (in print).
- Hardy, A., and Rasson, J.P. (1982): Une nouvelle approche des problèmes de classification automatique. Statistique et Analyse des données, 7, 41–56.
- Hardy, A. (1983): Statistique et classification automatique: Un modèle - Un nouveau critère - Des algorithmes - Des applications. Ph.D. Thesis, F.U.N.D.P., Namur, Belgium.
- Hardy, A. (1994): An examination of procedures for determining the number of clusters in a data set, in New Approaches in Classification and Data Analysis, E. Diday et al. (Editors), Springer-Verlag, Paris, 178–185.
- Sarle, W.S. (1983): Cubic Clustering Criterion. Technical Report A-108, SAS Institute Inc., Cary, NC, USA.
- Wishart, D. (1978): CLUSTAN User Manual, 3rd edition, Program Library Unit, University of Edinburgh.