A Method to Estimate the Number of Clusters Using Gravity
The number of clusters is crucial to the correctness of clustering. However, most available clustering algorithms have two main issues: (1) they require users to specify the number of clusters; (2) they easily fall into local optima because the initial centers are selected randomly. To solve these problems, we propose a novel gravity-based algorithm that automatically determines the number of clusters and also obtains better initial centers. In the proposed algorithm, we first scatter detectors uniformly over the data space. The detectors move according to the law of universal gravitation, and two detectors are merged when the distance between them is less than a given threshold. When all detectors stop moving, we take the number of remaining detectors as the number of clusters and use the detectors themselves as the initial center points. Experimental results show that the proposed method automatically determines the number of clusters and generates better initial centers, thus noticeably improving clustering accuracy.
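The abstract does not specify the update rule, the masses, or the thresholds, so the following Python/NumPy sketch fills them in with explicit assumptions. Data points are given unit mass, attraction follows a Plummer-softened inverse-square law (softening `eps`), and detectors do not attract each other. Instead of integrating velocities, each step jumps a detector to the position where the net softened pull would vanish under the current weights, which is a mean-shift-style fixed-point update. The clustering step is assumed to be k-means seeded with the surviving detectors. All function and parameter names (`find_detectors`, `eps`, `merge_threshold`, `tol`, `grid_per_dim`) are hypothetical and not taken from the paper.

```python
import numpy as np


def gravity_step(detectors, X, eps):
    """ASSUMED update rule: move each detector to where the Plummer-softened
    inverse-square pull of the (unit-mass) data points would balance,
    holding the current weights fixed (mean-shift-style fixed point)."""
    # Squared distances between each detector and each data point: (n_det, n_pts)
    r2 = ((detectors[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Force on a detector toward point x_i is w_i * (x_i - d); eps avoids division by zero
    w = (r2 + eps ** 2) ** -1.5
    # Zero net force  <=>  detector sits at the w-weighted mean of the points
    return (w @ X) / w.sum(axis=1, keepdims=True)


def merge_close(detectors, counts, threshold):
    """Greedily merge detectors closer than `threshold` (count-weighted average)."""
    kept, kept_counts = [], []
    for d, c in zip(detectors, counts):
        for j, k in enumerate(kept):
            if np.linalg.norm(d - k) < threshold:
                total = kept_counts[j] + c
                kept[j] = (k * kept_counts[j] + d * c) / total
                kept_counts[j] = total
                break
        else:
            kept.append(d.copy())
            kept_counts.append(c)
    return np.array(kept), kept_counts


def find_detectors(X, grid_per_dim=4, eps=1.0, merge_threshold=1.0,
                   tol=1e-4, max_iter=300):
    """Scatter detectors on a uniform grid, then move and merge them until they stop.
    The number of surviving detectors is the estimated number of clusters."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    axes = [np.linspace(l, h, grid_per_dim) for l, h in zip(lo, hi)]
    grid = np.meshgrid(*axes, indexing="ij")
    detectors = np.stack([g.ravel() for g in grid], axis=1)  # grid_per_dim**dim detectors
    counts = [1] * len(detectors)
    for _ in range(max_iter):
        moved = gravity_step(detectors, X, eps)
        shift = np.linalg.norm(moved - detectors, axis=1).max()
        detectors, counts = merge_close(moved, counts, merge_threshold)
        if shift < tol:  # no detector moves any more
            break
    return detectors


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations seeded with the given initial centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels


# Usage: three synthetic, well-separated 2-D blobs (illustrative data only)
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [8.0, 1.0], [2.0, 9.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in true_centers])

init = find_detectors(X)
centers, labels = kmeans(X, init)
print("estimated number of clusters:", len(init))
```

Because the grid holds `grid_per_dim ** dim` detectors, this sketch is only practical in low dimensions. `eps` and `merge_threshold` must also be chosen relative to the scale of the data: a very small `eps` turns individual points into local equilibria, and a large threshold can merge distinct clusters.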
Keywords: Clustering · Number of clusters · Initial centers · Gravity · Detector
This work is supported by the National Natural Science Foundation of China (Nos. 61472297, 61402350, and 61662068).