Abstract
Based on an in-depth analysis of the problems of the K-Means algorithm, this paper proposes an improved scheme built on the Hadoop distributed platform. The experimental environment is configured using the proposed clustering analysis system, and the algorithm is optimized in three respects: parallel random sampling, parallelization of the sample distance computation, and parallelization of the data clustering process. The flow of the improved K-Means parallel algorithm is described in detail. The experimental results show that the cluster analysis system based on the Hadoop distributed cloud computing platform can provide an efficient, stable and configurable clustering analysis service, and that the improved K-Means parallel clustering algorithm can quickly handle large-scale cluster analysis computations.
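The three parallelization points named in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the authors' Hadoop implementation: the function names (sample_split, map_assign, combine, kmeans_iteration), the use of squared Euclidean distance, and the representation of input splits as plain lists are all assumptions made for illustration. Each call on a split stands in for what would be an independent map task on a cluster.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_split(split, fraction, rng):
    """Parallel random sampling: each split keeps roughly `fraction` of its points,
    independently of every other split."""
    return [p for p in split if rng.random() < fraction]


def map_assign(split, centers):
    """Map step: for each point, compute the distance to every center and
    emit (index of nearest center, (point, 1))."""
    for p in split:
        idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        yield idx, (p, 1)


def combine(pairs):
    """Combiner/reducer: sum the vectors and counts emitted for each center index."""
    acc = {}
    for idx, (vec, n) in pairs:
        if idx in acc:
            s, c = acc[idx]
            acc[idx] = ([a + b for a, b in zip(s, vec)], c + n)
        else:
            acc[idx] = (list(vec), n)
    return acc


def kmeans_iteration(splits, centers):
    """One clustering pass: map and combine per split, then reduce across splits."""
    partials = []
    for split in splits:  # each iteration would be a separate map task
        partials.extend(combine(map_assign(split, centers)).items())
    totals = combine(partials)
    new_centers = list(centers)  # a center with no assigned points is left unchanged
    for idx, (s, c) in totals.items():
        new_centers[idx] = tuple(v / c for v in s)
    return new_centers


def kmeans(splits, centers, max_iter=20, tol=1e-9):
    """Repeat iterations until no center moves by more than `tol` (squared distance)."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(splits, centers)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

In this sketch the per-split combiner keeps the data passed to the final reduce proportional to the number of centers rather than the number of points, which is a common reason for structuring K-Means this way on MapReduce. Whether the paper uses a combiner is not stated in the abstract.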
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, X., Li, D. (2018). An Improved K-Means Parallel Algorithm Based on Cloud Computing. In: Zhou, Q., Gan, Y., Jing, W., Song, X., Wang, Y., Lu, Z. (eds) Data Science. ICPCSEE 2018. Communications in Computer and Information Science, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-13-2203-7_30
DOI: https://doi.org/10.1007/978-981-13-2203-7_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2202-0
Online ISBN: 978-981-13-2203-7
eBook Packages: Computer Science, Computer Science (R0)