Accelerating K-Means by Grouping Points Automatically

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applied in domains such as computer vision, market segmentation, and social network analysis. However, k-means wastes a large amount of time on unnecessary distance calculations, so accelerating k-means has become an important topic. Accelerated exact k-means algorithms produce the same result as k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than state-of-the-art accelerated k-means algorithms. Its additional memory consumption is also much lower than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means by automatically grouping a number of points during the iterations, which lets it balance the cost of distance calculations against the time spent on filtering. Extensive experiments on real-world datasets verify that Fission-Fusion k-means considerably outperforms state-of-the-art accelerated k-means algorithms, especially when the datasets are low-dimensional and the number of clusters is large. In addition, on well-separated and naturally clustered datasets, our algorithm is faster relative to other accelerated k-means algorithms.
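To make the notion of "unnecessary distance calculations" concrete, the following is a minimal sketch of how an exact filter can skip work without changing the result. It is not the Fission-Fusion algorithm, whose details the abstract does not give; it assumes a plain triangle-inequality test (if the distance between two centers is at least twice the distance from a point to its current best center, the other center cannot be closer). All function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_naive(points, centers):
    """Standard assignment step: every point is compared with every center."""
    labels, calls = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        calls += 1
        for j in range(1, len(centers)):
            d = dist(p, centers[j])
            calls += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, calls


def assign_filtered(points, centers):
    """Exact assignment step that skips centers ruled out by the triangle inequality."""
    k = len(centers)
    cc = [[0.0] * k for _ in range(k)]
    calls = 0
    for i in range(k):
        for j in range(i + 1, k):
            cc[i][j] = cc[j][i] = dist(centers[i], centers[j])
            calls += 1  # center-to-center distances are counted too
    labels = []
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        calls += 1
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= d(p, c_best): skip c_j
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(p, centers[j])
            calls += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, calls


def update_centers(points, labels, centers):
    """Move each center to the mean of its points; empty clusters keep their center."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += p[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(k)
    ]


def kmeans(points, centers, assign, max_iter=100):
    """Lloyd-style iteration with a pluggable assignment step."""
    centers = [list(c) for c in centers]
    labels, total_calls = None, 0
    for _ in range(max_iter):
        new_labels, calls = assign(points, centers)
        total_calls += calls
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return labels, centers, total_calls


# Toy data (an assumption for the demo): four well-separated 2-D clusters.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
points = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in true_centers
    for _ in range(50)
]
init = [points[0], points[50], points[100], points[150]]

labels_a, centers_a, calls_a = kmeans(points, init, assign_naive)
labels_b, centers_b, calls_b = kmeans(points, init, assign_filtered)
print("same clustering:", labels_a == labels_b)
print("distance calls, naive vs filtered:", calls_a, calls_b)
```

On well-separated data like this, most centers are ruled out after a single comparison, which is consistent with the abstract's remark that naturally clustered datasets benefit more. The filter itself has a cost (here, the center-to-center distances), which is the kind of trade-off between distance calculations and filtering time that the abstract refers to.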

Author information

Corresponding author

Correspondence to Bi-Ru Dai.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yu, Q., Dai, BR. (2017). Accelerating K-Means by Grouping Points Automatically. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_15

  • DOI: https://doi.org/10.1007/978-3-319-64283-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64282-6

  • Online ISBN: 978-3-319-64283-3

  • eBook Packages: Computer Science, Computer Science (R0)
