An Improved Parallel K-Means Algorithm Based on Cloud Computing

Zhang, Dongbo; Shou, Yanfang

doi:10.1007/978-981-10-0356-1_32

Part of the book series: Communications in Computer and Information Science (CCIS, volume 575)

Included in the following conference series:

International Symposium on Computational Intelligence and Intelligent Systems

Abstract

In this paper we present the CK-means clustering algorithm, which combines an improved K-means algorithm with the Canopy algorithm and is implemented with the MapReduce programming model on the Hadoop platform. The experimental results show that, when deployed on Hadoop clusters, CK-means has clear advantages in processing large data sets in terms of speedup (acceleration ratio), accuracy, scale-up (expansion rate), and the quality of its clustering results.
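The abstract does not spell out how Canopy and K-means are combined on MapReduce. A common scheme, assumed here and not taken from the paper, runs one Canopy pass to choose the initial centers (and therefore k), then runs each K-means iteration as one MapReduce job: mappers assign every point to its nearest center, a combiner pre-aggregates per-center coordinate sums and counts, and the reducer turns the totals into new centers. The following Python sketch simulates that data flow in a single process. The Euclidean metric, the tight threshold `t2`, the number of input splits, and all function names (`canopy_centers`, `kmeans_map`, `sum_by_key`, `ck_means`) are hypothetical illustrations, not the authors' CK-means or any Hadoop API.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical sketch: not the authors' CK-means implementation or a Hadoop API.
Vector = Tuple[float, ...]
# MapReduce key/value pair: (cluster index, (coordinate sum, point count))
KeyValue = Tuple[int, Tuple[Vector, int]]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric; the abstract does not name one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points: List[Vector], t2: float) -> List[Vector]:
    """Single Canopy pass that keeps only the canopy centers.

    Points within the tight threshold t2 of a new center leave the candidate
    list. The loose threshold T1 only decides canopy membership, which this
    sketch skips because the centers are used solely to seed K-means.
    """
    remaining = list(points)
    centers: List[Vector] = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def kmeans_map(point: Vector, centers: List[Vector]) -> KeyValue:
    """Mapper: emit (index of the nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return idx, (tuple(point), 1)


def sum_by_key(pairs: Iterable[KeyValue]) -> Dict[int, Tuple[Vector, int]]:
    """Combiner and reducer: add coordinate sums and counts per cluster key.

    Input and output share the (sum, count) value shape, so the same function
    can run locally after each mapper and globally in the reduce phase.
    """
    acc: Dict[int, Tuple[Vector, int]] = {}
    for idx, (vec, n) in pairs:
        if idx in acc:
            s, c = acc[idx]
            acc[idx] = (tuple(a + b for a, b in zip(s, vec)), c + n)
        else:
            acc[idx] = (tuple(vec), n)
    return acc


def ck_means(points: List[Vector], t2: float, max_iter: int = 20,
             tol: float = 1e-6, n_splits: int = 2):
    """Canopy-seeded K-means, one simulated MapReduce job per iteration."""
    centers = canopy_centers(points, t2)
    if not centers:
        return [], []
    for _ in range(max_iter):
        partials: List[KeyValue] = []
        for s in range(n_splits):  # each split stands in for one map task
            mapped = [kmeans_map(p, centers) for p in points[s::n_splits]]
            partials.extend(sum_by_key(mapped).items())  # local combiner
        totals = sum_by_key(partials)  # reduce phase
        new_centers = list(centers)  # an empty cluster keeps its old center
        for idx, (vec, n) in totals.items():
            new_centers[idx] = tuple(x / n for x in vec)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    labels = [kmeans_map(p, centers)[0] for p in points]
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = ck_means(pts, t2=3.0)
print(centers, labels)
```

With these six toy points, the Canopy pass with `t2 = 3.0` keeps (0, 0) and (10, 10) as seeds, and the loop converges on its second iteration to centers near (0.33, 0.33) and (10.33, 10.33), with labels [0, 0, 0, 1, 1, 1]. Reusing one summing function as both combiner and reducer works because adding (sum, count) pairs is associative, which is what lets a combiner reduce shuffle traffic between map and reduce tasks.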

Acknowledgments

This research is supported by the Fundamental Research Funds for the Central Universities (No. 2015ZM039) and by a scientific research project of Guangdong University of Science and Technology (No. GKY-2014KYYB-10).

Author information

Authors and Affiliations

Department of Computer Science, Guangdong University of Science and Technology, Dongguan, China
Dongbo Zhang
Guangzhou Institute of Modern Industrial Technology, South China University of Technology, Guangzhou, China
Yanfang Shou

Corresponding author

Correspondence to Dongbo Zhang.

Editor information

Editors and Affiliations

College of Mathematics and Informatics, The South China Agricultural University, Guangzhou, China
Kangshun Li
School of Computer Science, Guangzhou University, Guangzhou, China
Jin Li
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Yong Liu
Dept. of Informatics, University of Salerno, Fisciano, Italy
Aniello Castiglione

About this paper

Cite this paper

Zhang, D., Shou, Y. (2016). An Improved Parallel K-Means Algorithm Based on Cloud Computing. In: Li, K., Li, J., Liu, Y., Castiglione, A. (eds) Computational Intelligence and Intelligent Systems. ISICA 2015. Communications in Computer and Information Science, vol 575. Springer, Singapore. https://doi.org/10.1007/978-981-10-0356-1_32

DOI: https://doi.org/10.1007/978-981-10-0356-1_32
Published: 19 January 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0355-4
Online ISBN: 978-981-10-0356-1
eBook Packages: Computer Science, Computer Science (R0)