Abstract
Rough Set Theory can reduce the features of Chinese text effectively [1], but the reduction often takes a very long time when the training set is large [2]. To address this problem, this article proposes a method that combines Rough Set Theory with Automatic Abstracting Technology (AAT). First, the method calculates the weight of each node in the Concept Hierarchy Tree [3], which is built on the Tongyici Cilin semantic dictionary [4] [5]; the weight consists of the Self-Frequency, Tree Frequency, Concept Generalization Degree, and Concept Selection Degree. From these weights it determines the theme concepts of the Chinese text. Second, it extracts the topic sentences [6] by calculating the importance of each sentence [7]. Finally, it reduces the features of these topic sentences again with IQR (Improved Quick Reduct Algorithm) and constructs the vector. Viewed from the perspective of the whole information retrieval system, this method can save time in both automatic abstracting and reduction.
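The abstract does not describe how IQR differs from the standard algorithm, so the following is only a minimal sketch of the classic greedy Quick Reduct procedure from rough set theory, not the authors' improved version. The function names (`dependency`, `quick_reduct`), the toy documents, and the binary term features `t1`, `t2`, `t3` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: the fraction of rows whose
    equivalence class under `attrs` maps to a single decision value."""
    if not rows:
        return 0.0
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups[key].append(row[decision])
    # Rows in classes with exactly one decision value form the positive region.
    positive = sum(len(v) for v in groups.values() if len(set(v)) == 1)
    return positive / len(rows)


def quick_reduct(rows, conditions, decision):
    """Greedy Quick Reduct: repeatedly add the attribute that raises the
    dependency degree most, until it matches that of the full attribute set."""
    target = dependency(rows, conditions, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidate, candidate_score = None, best
        for attr in conditions:
            if attr in reduct:
                continue
            score = dependency(rows, reduct + [attr], decision)
            if score > candidate_score:
                candidate, candidate_score = attr, score
        if candidate is None:
            # No single attribute improves the dependency; stop rather than loop.
            break
        reduct.append(candidate)
        best = candidate_score
    return reduct


# Hypothetical toy data: each row is a document with binary term features.
docs = [
    {"t1": 1, "t2": 0, "t3": 1, "label": "sports"},
    {"t1": 1, "t2": 1, "t3": 1, "label": "sports"},
    {"t1": 0, "t2": 1, "t3": 1, "label": "finance"},
    {"t1": 0, "t2": 0, "t3": 1, "label": "finance"},
]
selected = quick_reduct(docs, ["t1", "t2", "t3"], "label")
print(selected)
```

In this toy table only `t1` separates the two labels, so the greedy search stops after one step. Each step rescans the whole table once per remaining attribute, which is consistent with the abstract's motivation for shrinking the input to topic sentences before reduction.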
References
[1] Srinivasan, P., Ruiz, M.E.: Vocabulary Mining for Information Retrieval: Rough Sets and Fuzzy Sets. In: Information Processing and Management, pp. 15–38 (2001)
[2] Jiang, M.: Natural language processing. Higher Education Press, Beijing (2006) (in Chinese)
[3] Jin, B., Shi, Y., Teng, H.: Similarity algorithm of text based on semantic understanding. Dalian University of Technology, Dalian (2005) (in Chinese)
[4] Wu, C., Zhang, Q., Miao, J.: Semantic comprehension of natural language processing and information retrieval model. Graduate School, Chinese Academy of Sciences (2008) (in Chinese)
[5] Mei, J.: Tongyici Cilin. Shanghai Dictionary Publishing House, Shanghai (1985) (in Chinese)
[6] Yang, X., Song, F., Zhong, Y.: Based on the selected to generate abstract method research and implementation of automatic abstract system. In: The Fourth National Computational Linguistics Conference Proceedings, pp. 313–318. Tsinghua University Press (1997) (in Chinese)
[7] Mitra, M., Singhal, A., Buckley, C.: Automatic Text Summarization by Paragraph Extraction. In: Workshop on Intelligent Scalable Text Summarization, ACL/EACL 1997, Madrid, pp. 31–36 (1997)
[8] Chen, W., Liu, T., Qin, B.: For bilingual sentence retrieval of Chinese sentence similarity computing. In: Joint Conference of the National Seventh Computer Linguistics, pp. 81–88. Tsinghua University Press, Beijing (2003) (in Chinese)
[9] Chen, C., Yan, H.: Rough set theory in information retrieval. Guilin University of Technology (2011) (in Chinese)
[10] Zong, C.: Statistical natural language processing. Tsinghua University Press, Beijing (2008) (in Chinese)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, M., Dong, B., Xu, L. (2012). An Improved Method for the Feature Extraction of Chinese Text by Combining Rough Set Theory with Automatic Abstracting Technology. In: Khachidze, V., Wang, T., Siddiqui, S., Liu, V., Cappuccio, S., Lim, A. (eds) Contemporary Research on E-business Technology and Strategy. iCETS 2012. Communications in Computer and Information Science, vol 332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34447-3_44
DOI: https://doi.org/10.1007/978-3-642-34447-3_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34446-6
Online ISBN: 978-3-642-34447-3
eBook Packages: Computer Science, Computer Science (R0)