Abstract
In this paper, we first treat text filtering as a preprocessing step applied to sequences of syllabic words, using factor space (F-space) theory to determine the basic factor set and then perform factor decomposition. Second, we construct factor matching vectors for texts and define a similarity degree between them; the factor similarity degree between texts gives a deeper characterization of concepts and serves as the basis for text clustering. Finally, we randomly select 90 articles from sogou.com as the experimental corpus and verify the proposed algorithm with two clustering methods: plane division and hierarchical agglomerative clustering. The results show that: (1) the clustering accuracy reaches 91 % and 94 %, respectively; (2) the classification obtained by the proposed algorithm differs little from manual annotation; (3) hierarchical agglomerative clustering performs better than the plane division method. This paper provides a new, feasible method for the field of text mining.
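The pipeline outlined in the abstract (factor matching vectors, a similarity degree between texts, then hierarchical agglomerative clustering) can be illustrated with a minimal Python sketch. The abstract does not give the paper's factor decomposition or its similarity formula, so everything concrete here is an assumption: the factors are hypothetical keywords, the matching vector is a simple occurrence count, cosine similarity stands in for the similarity degree, and average linkage is one possible merging rule.

```python
import math

def factor_vector(tokens, factors):
    """Count how often each factor (assumed here to be a keyword) occurs."""
    return [tokens.count(f) for f in factors]

def cosine(u, v):
    """Cosine similarity, used as a stand-in for the paper's similarity degree."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the highest average similarity.
        p, q = max(
            ((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
            key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical factor set and toy documents (already segmented into words).
factors = ["match", "team", "stock", "market"]
docs = [
    ["team", "match", "team"],
    ["match", "team", "goal"],
    ["stock", "market", "stock"],
    ["market", "stock", "price"],
]
vectors = [factor_vector(d, factors) for d in docs]
clusters = agglomerate(vectors, 2)
print(clusters)
```

On this toy input the two sports-like documents and the two finance-like documents end up in separate clusters. A real corpus would need word segmentation and a factor set derived as the paper describes, neither of which is reproduced here.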
Acknowledgements
This research was supported by the Information and Computing Science Outstanding Talent Training Project of Guangdong Province (No. 20153324).
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhong, YB., Li, Zj., Zhao, Mh., Yao, Wq. (2016). An Optimized Algorithm for Text Clustering Based on F-Space. In: Cao, BY., Wang, PZ., Liu, ZL., Zhong, YB. (eds) International Conference on Oriental Thinking and Fuzzy Logic. Advances in Intelligent Systems and Computing, vol 443. Springer, Cham. https://doi.org/10.1007/978-3-319-30874-6_44
DOI: https://doi.org/10.1007/978-3-319-30874-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30873-9
Online ISBN: 978-3-319-30874-6
eBook Packages: Engineering, Engineering (R0)