Setting the Number of Clusters in K-Means Clustering
K-means clustering is an efficient non-hierarchical clustering method that has become widely used in data mining. To apply it, however, one must specify k, the number of clusters, a priori. In this short paper, we propose an exploratory procedure for setting k using Euclidean and/or Mahalanobis inter-point distances.
Keywords: Mahalanobis Distance; Iris Data; Multivariate Normal Distribution; Exploratory Procedure; Rock Crab
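The abstract names two kinds of inter-point distance without giving formulas. As a rough illustration only, the sketch below computes both for every pair of observations with NumPy. It is not the exploratory procedure proposed in the paper, which the abstract does not describe. The function names, the synthetic two-group data, and the choice to estimate the Mahalanobis covariance from the pooled sample are all assumptions made for this example.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the n x n matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, p)
    return np.sqrt((diff ** 2).sum(axis=-1))


def pairwise_mahalanobis(X):
    """Return the n x n matrix of Mahalanobis distances between rows of X.

    Assumption: the covariance matrix is estimated from all rows of X
    pooled together, and it is non-singular.
    """
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, p)
    # Quadratic form diff' * cov_inv * diff for every pair (i, j).
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))           # guard against tiny negatives


# Hypothetical data: two synthetic groups in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(30, 2)),
])

D_euc = pairwise_euclidean(X)
D_mah = pairwise_mahalanobis(X)
```

Unlike the Euclidean distance, the Mahalanobis distance computed this way does not change when a variable is rescaled, which is one reason to consider it when variables are measured in different units.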