Accelerating K-Means by Grouping Points Automatically

Yu, Qiao; Dai, Bi-Ru

doi:10.1007/978-3-319-64283-3_15

Qiao Yu & Bi-Ru Dai

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

K-means is a well-known clustering algorithm in data mining and machine learning, widely applied in domains such as computer vision, market segmentation, and social network analysis. However, k-means spends a large amount of time on unnecessary distance calculations, which makes accelerating it an important research topic. Exact accelerated k-means algorithms produce the same result as standard k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than state-of-the-art accelerated k-means algorithms while consuming much less additional memory than they do. Fission-Fusion k-means accelerates k-means by automatically grouping points during the iterations, which lets it balance the cost of distance calculations against the cost of filtering. Extensive experiments on real-world datasets verify that Fission-Fusion k-means considerably outperforms the state-of-the-art accelerated k-means algorithms, especially when the datasets are low-dimensional and the number of clusters is large. In addition, its speed advantage over other accelerated k-means algorithms is greater on well-separated, naturally clustered datasets.
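The abstract does not describe Fission-Fusion k-means in enough detail to reconstruct it, so the sketch below illustrates only the two premises it builds on: standard (Lloyd) k-means compares every point with every center in every iteration, and an exact accelerated variant can skip many of those comparisons while returning the same clustering. The accelerated function uses triangle-inequality bounds in the style of Hamerly's algorithm, a well-known exact method and not the authors' grouping scheme. The helper names (`dist`, `update_centers`, `lloyd`, `hamerly`), the pure-Python implementation, and the demo data are illustrative assumptions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_centers(points, labels, centers):
    """Move each center to the mean of its points; an empty cluster keeps its old center."""
    k, d = len(centers), len(points[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)]


def lloyd(points, centers, max_iter=100):
    """Standard k-means: every point is compared with every center in every iteration."""
    centers = [list(c) for c in centers]
    labels, n_dist = None, 0
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            ds = [dist(p, c) for c in centers]
            n_dist += len(ds)
            new_labels.append(min(range(len(ds)), key=ds.__getitem__))
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return labels, centers, n_dist


def hamerly(points, centers, max_iter=100):
    """Exact k-means (k >= 2) that skips distances ruled out by the triangle inequality.

    n_dist counts point-to-center distances only, as in lloyd().
    """
    centers = [list(c) for c in centers]
    k, n = len(centers), len(points)
    assign = [0] * n
    upper = [math.inf] * n  # upper bound on dist(point, assigned center)
    lower = [0.0] * n       # lower bound on dist(point, every other center)
    n_dist = 0
    for it in range(max_iter):
        # s[j]: half the distance from center j to its nearest other center.
        s = [min(dist(centers[j], centers[m]) for m in range(k) if m != j) / 2
             for j in range(k)]
        changed = it == 0  # the first pass always counts as a change, as in lloyd()
        for i, p in enumerate(points):
            bound = max(s[assign[i]], lower[i])
            if upper[i] <= bound:
                continue  # assigned center is provably still (one of) the closest
            upper[i] = dist(p, centers[assign[i]])  # tighten the upper bound
            n_dist += 1
            if upper[i] <= bound:
                continue
            ds = [dist(p, c) for c in centers]  # bounds failed: full comparison
            n_dist += k
            first, second = sorted(range(k), key=ds.__getitem__)[:2]
            if first != assign[i]:
                assign[i], changed = first, True
            upper[i], lower[i] = ds[first], ds[second]
        if not changed:
            break
        new_centers = update_centers(points, assign, centers)
        moved = [dist(c, nc) for c, nc in zip(centers, new_centers)]
        max_moved = max(moved)  # Hamerly uses the max over other centers; this is looser but valid
        for i in range(n):
            upper[i] += moved[assign[i]]
            lower[i] -= max_moved
        centers = new_centers
    return assign, centers, n_dist


if __name__ == "__main__":
    # Hypothetical demo data: eight Gaussian blobs on a 4 x 2 grid.
    random.seed(1)
    blobs = [(x, y) for x in (0, 10, 20, 30) for y in (0, 10)]
    pts = [[cx + random.gauss(0, 1), cy + random.gauss(0, 1)]
           for cx, cy in blobs for _ in range(100)]
    init = random.sample(pts, 8)
    print("point-center distances:", lloyd(pts, init)[2], "(Lloyd) vs",
          hamerly(pts, init)[2], "(bounded)")
```

Barring exact distance ties, both functions follow the same sequence of assignments and centers, so the bounded version is exact in the sense the abstract uses; on low-dimensional, well-separated data like the demo, the bounds typically let it skip most full comparisons after the first iteration. The bound checks themselves are the "filtering" cost that any such method must trade off against the distance calculations it saves.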

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, No. 43, Section 4, Keelung Road, Da’an District, Taipei, 106, Taiwan, ROC
Qiao Yu & Bi-Ru Dai

Corresponding author

Correspondence to Bi-Ru Dai.

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche
University of Texas at Arlington, Arlington, Texas, USA
Sharma Chakravarthy

About this paper

Cite this paper

Yu, Q., Dai, BR. (2017). Accelerating K-Means by Grouping Points Automatically. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_15

DOI: https://doi.org/10.1007/978-3-319-64283-3_15
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science, Computer Science (R0)
