Abstract
Bisecting K-means is a hierarchical method for text clustering whose final result depends on the choice of K and of the initial centroids. Chinese text adds a further difficulty: word boundaries are ambiguous, so word segmentation is itself uncertain. We convert the corpus into word vectors with word2vec, reduce the dimensionality of the data through ontology modeling, and clean the data with jieba word segmentation and TF-IDF to improve data quality. We then propose an improved algorithm, based on hierarchical clustering and Bisecting K-means, that clusters the data repeatedly until it converges. Experiments show that this method produces better clustering results than both the K-means and the Bisecting K-means algorithms.
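For readers unfamiliar with the baseline, the following is a minimal sketch of standard Bisecting K-means in pure Python. It is not the improved method proposed in the paper, whose details are not given in the abstract. The function names (`two_means`, `bisecting_kmeans`), the `n_trials` parameter, and the toy 2-D points are assumptions made for illustration; in the paper's setting the inputs would be document vectors rather than hand-written coordinates.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, n_iter=20):
    """Split points into two groups with plain 2-means (Lloyd's algorithm)."""
    centers = rng.sample(points, 2)  # random initial centroids
    groups = ([], [])
    for _ in range(n_iter):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [mean(groups[0]), mean(groups[1])]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, n_trials=5, seed=0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Choose the splittable cluster (two or more points) with the largest SSE.
        idx = max(
            range(len(clusters)),
            key=lambda i: sse(clusters[i]) if len(clusters[i]) >= 2 else -1.0,
        )
        target = clusters[idx]
        if len(target) < 2:
            break  # nothing left to split
        best = None
        for _ in range(n_trials):  # keep the trial with the lowest total SSE
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break  # every trial was degenerate (e.g. identical points)
        clusters.pop(idx)
        clusters.extend([best[1], best[2]])
    return clusters

if __name__ == "__main__":
    # Toy data: three well-separated groups of 2-D points (illustrative only).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
            (10.0, 0.0), (10.1, 0.2), (9.9, 0.1)]
    for cluster in bisecting_kmeans(data, k=3):
        print(sorted(cluster))
```

The sketch makes the sensitivity noted in the abstract concrete: each bisection starts from randomly chosen centroids, so several trials are run and the split with the lowest total SSE is kept, and the value of K must still be supplied by the caller.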
Acknowledgements
This work was partially supported by NSFC (No. 61807024).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zi, Y., Kun, L., Zhang, Z., Wang, C., Peng, Z. (2020). An Improved Bisecting K-Means Text Clustering Method. In: Xhafa, F., Patnaik, S., Tavana, M. (eds) Advances in Intelligent Systems and Interactive Applications. IISA 2019. Advances in Intelligent Systems and Computing, vol 1084. Springer, Cham. https://doi.org/10.1007/978-3-030-34387-3_19
DOI: https://doi.org/10.1007/978-3-030-34387-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34386-6
Online ISBN: 978-3-030-34387-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)