Abstract
This paper presents CK-means, a clustering algorithm that combines an improved K-means algorithm with the Canopy algorithm and is implemented with the MapReduce programming model on the Hadoop platform. Experimental results indicate that CK-means performs well on large data sets in terms of speedup (acceleration ratio), accuracy, scalability (expansion rate), and overall effectiveness when deployed on Hadoop clusters.
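The abstract does not spell out how CK-means is built, so the following is only a minimal single-machine sketch of the general idea it names: a Canopy pass picks initial centers, then K-means iterates in a map (assign) and reduce (average) style. The function names, the threshold `t2`, and the stopping rule are all assumptions made for illustration, not the authors' implementation. The looser Canopy threshold T1 is left out because this sketch only needs the centers, and no Hadoop API is used.

```python
import math
from collections import defaultdict


def canopy_centers(points, t2):
    """Pick canopy centers: take a point as a center, then drop every
    remaining point within the tight threshold t2 (hypothetical parameter)."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return idx, (point, 1)


def reduce_update(values):
    """Reduce step: average all points emitted under the same center index."""
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            sums[d] += point[d]
        count += n
    return tuple(s / count for s in sums)


def kmeans_mapreduce(points, centers, max_iter=20, tol=1e-6):
    """Repeat map and reduce until the centers move less than tol."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centers)
            groups[key].append(value)
        # A center that attracted no points keeps its previous position.
        new_centers = [
            reduce_update(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Toy data: two well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(data, t2=3.0)
print(kmeans_mapreduce(data, seeds))
```

Seeding from canopies means the number of clusters does not have to be fixed in advance. In a real Hadoop job, `map_assign` and `reduce_update` would correspond to the mapper and reducer, with the current centers distributed to every mapper on each iteration.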
Acknowledgments
This research is supported by the Fundamental Research Funds for the Central Universities (No. 2015ZM039) and scientific research project for Guangdong University of Science and Technology (No. GKY-2014KYYB-10).
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Zhang, D., Shou, Y. (2016). An Improved Parallel K-Means Algorithm Based on Cloud Computing. In: Li, K., Li, J., Liu, Y., Castiglione, A. (eds) Computational Intelligence and Intelligent Systems. ISICA 2015. Communications in Computer and Information Science, vol 575. Springer, Singapore. https://doi.org/10.1007/978-981-10-0356-1_32
DOI: https://doi.org/10.1007/978-981-10-0356-1_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0355-4
Online ISBN: 978-981-10-0356-1
eBook Packages: Computer Science, Computer Science (R0)