
A Method to Estimate the Number of Clusters Using Gravity

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 891)

Abstract

The number of clusters is crucial to the correctness of clustering. However, most available clustering algorithms have two main issues: (1) they require the user to specify the number of clusters, and (2) they easily fall into local optima because the initial centers are selected randomly. To solve these problems, we propose a novel gravity-based algorithm that automatically determines the number of clusters and also yields better initial centers. In the proposed algorithm, we first scatter detectors uniformly over the data space. The detectors move according to the law of universal gravitation, and two detectors are merged when the distance between them falls below a given threshold. When all detectors stop moving, we take the number of remaining detectors as the number of clusters and use the final detectors as the initial centers. Experimental results show that the proposed method automatically determines the number of clusters and generates better initial centers, and thus noticeably improves clustering accuracy.
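
The abstract describes the procedure only at a high level, so the following Python sketch shows one possible reading of it under our own assumptions, not the authors' implementation. We assume every data point is a unit mass, the pull on a detector follows a softened inverse-square law (softening length `soft`), detectors start on a regular grid as a stand-in for uniform scattering, and each detector is moved directly to the point where its net pull balances, which is a mean-shift-style fixed-point update. The names `balance_point`, `estimate_k`, `kmeans`, `grid`, `soft`, and `merge_dist` are hypothetical; only the merge threshold, the stop-when-no-detector-moves rule, and seeding k-means with the final detectors come from the abstract.

```python
import itertools
import math
import random


def balance_point(d, data, soft2):
    """Return the point where the net gravitational pull on detector d vanishes.

    ASSUMPTION (not from the paper): every data point is a unit mass and its
    pull on d is the softened inverse-square vector (x - d) / (r^2 + soft2)^1.5.
    Setting the summed pull to zero gives d = sum(w * x) / sum(w) with
    w = (r^2 + soft2)^-1.5, evaluated here at the current d (one
    mean-shift-style fixed-point step).
    """
    num = [0.0] * len(d)
    den = 0.0
    for x in data:
        r2 = sum((xi - di) ** 2 for xi, di in zip(x, d))
        w = (r2 + soft2) ** -1.5
        den += w
        for j, xj in enumerate(x):
            num[j] += w * xj
    return tuple(v / den for v in num)


def estimate_k(data, grid=5, soft=1.0, merge_dist=0.1, tol=1e-4, max_iter=500):
    """Move grid-initialized detectors until they stop; return the survivors.

    len(result) is the estimated number of clusters. grid must be >= 2.
    """
    dim = len(data[0])
    lo = [min(x[j] for x in data) for j in range(dim)]
    hi = [max(x[j] for x in data) for j in range(dim)]
    # ASSUMPTION: a regular grid over the bounding box stands in for uniform scattering.
    axes = [[lo[j] + (hi[j] - lo[j]) * i / (grid - 1) for i in range(grid)]
            for j in range(dim)]
    detectors = list(itertools.product(*axes))
    for _ in range(max_iter):
        moved = [balance_point(d, data, soft * soft) for d in detectors]
        shift = max(math.dist(a, b) for a, b in zip(detectors, moved))
        # Merge: drop any detector closer than merge_dist to one already kept.
        detectors = []
        for d in moved:
            if all(math.dist(d, k) >= merge_dist for k in detectors):
                detectors.append(d)
        if shift < tol:  # all detectors have (effectively) stopped moving
            break
    return detectors


def kmeans(data, centers, max_iter=100):
    """Plain Lloyd's k-means seeded with the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(x, centers[j]))
            groups[nearest].append(x)
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


if __name__ == "__main__":
    rng = random.Random(0)
    true_centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in true_centers for _ in range(50)]
    detectors = estimate_k(data)
    print("estimated k:", len(detectors))
    print("refined centers:", kmeans(data, detectors))
```

Because this particular update is the mean-shift iteration for a convex, decreasing kernel profile, each detector climbs to a local maximum of the softened potential, so in this sketch the detector count depends mainly on `soft`. A value much smaller than the cluster spread creates spurious maxima around individual points, while a value comparable to the distance between clusters merges them. The grid also grows as `grid ** dim`, so this version is only practical for low-dimensional data.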




Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61472297, 61402350, and 61662068).

Author information


Correspondence to Hui Du.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Du, H., Wang, X., Huang, M., Wang, X. (2019). A Method to Estimate the Number of Clusters Using Gravity. In: Krömer, P., Zhang, H., Liang, Y., Pan, JS. (eds) Proceedings of the Fifth Euro-China Conference on Intelligent Data Analysis and Applications. ECC 2018. Advances in Intelligent Systems and Computing, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-03766-6_47

