An Improved Method for the Feature Extraction of Chinese Text by Combining Rough Set Theory with Automatic Abstracting Technology

Shen, Min; Dong, Baosen; Xu, Linying

doi:10.1007/978-3-642-34447-3_44

Part of the book series: Communications in Computer and Information Science (CCIS, volume 332)

Included in the following conference series:

International Conference on E-business Technology and Strategy

Abstract

Rough Set Theory can reduce the features of Chinese text effectively [1], but the reduction often takes a very long time when the training set is large [2]. To address this problem, this article proposes a method that combines Rough Set Theory with Automatic Abstracting Technology (AAT). First, the method determines the theme concepts of a Chinese text by calculating the weight of each node in the Concept Hierarchy Tree [3], which is based on the Tongyici Cilin semantic dictionary [4] [5]; the weight consists of the Self-Frequency, Tree Frequency, Concept Generalization Degree and Concept Selection Degree. Second, it extracts the topic sentences [6] by calculating the importance of each sentence [7]. Finally, it reduces the features of these topic sentences again with IQR (Improved Quick Reduct Algorithm) and constructs the vector. From the perspective of the whole information retrieval system, this method can save time for both Automatic Abstracting and reduction.
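
The final reduction step builds on the Quick Reduct algorithm from rough set theory. The abstract does not say how IQR modifies it, so the following is only a minimal Python sketch of the standard greedy Quick Reduct on a toy decision table, not the authors' IQR. The function names, the binary term-presence features and the class labels are all illustrative assumptions.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, decisions, attrs):
    """Dependency degree: fraction of rows whose equivalence class
    (with respect to attrs) carries a single decision label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({decisions[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, decisions, n_attrs):
    """Greedy Quick Reduct: add the attribute that raises the dependency
    degree most, until it matches that of the full attribute set."""
    all_attrs = list(range(n_attrs))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    current = dependency(rows, decisions, reduct)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(rows, decisions, reduct + [a])
            # Accept the first candidate even without a gain, so the loop
            # always progresses when no single attribute helps on its own.
            if best_attr is None or gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        reduct.append(best_attr)
        current = best_gamma
    return sorted(reduct)


# Hypothetical toy table: each row marks the presence of three terms in a
# document; the decision is an assumed class label.
rows = [(1, 0, 1), (1, 1, 1), (0, 1, 1), (0, 0, 0)]
decisions = ["sports", "sports", "finance", "finance"]
print(quick_reduct(rows, decisions, 3))
```

In this toy table the first term alone separates the two classes, so the other two features are dropped. Each greedy step recomputes the dependency degree once per remaining attribute, which suggests why reduction becomes slow on large training sets and why shrinking the input to topic sentences first can help.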

References

1. Srinivasan, P., Ruiz, M.E.: Vocabulary Mining for Information Retrieval: Rough Sets and Fuzzy Sets. In: Information Processing and Management, pp. 15–38 (2001)
2. Jiang, M.: Natural language processing. Higher Education Press, Beijing (2006) (in Chinese)
3. Jin, B., Shi, Y., Teng, H.: Similarity algorithm of text based on semantic understanding. Dalian University of Technology, Dalian (2005) (in Chinese)
4. Wu, C., Zhang, Q., Miao, J.: Semantic comprehension of natural language processing and information retrieval model. Graduate School, Chinese Academy of Sciences (2008) (in Chinese)
5. Mei, J.: Tongyici Cilin. Shanghai Dictionary Publishing House, Shanghai (1985) (in Chinese)
6. Yang, X., Song, F., Zhong, Y.: Based on the selected to generate abstract method research and implementation of automatic abstract system. In: The Fourth National Computational Linguistics Conference Proceedings, pp. 313–318. Tsinghua University Press (1997) (in Chinese)
7. Mitra, M., Singhal, A., Buckley, C.: Automatic Text Summarization by Paragraph Extraction. In: Workshop on Intelligent Scalable Text Summarization, ACL/EACL 1997, Madrid, pp. 31–36 (1997)
8. Chen, W., Liu, T., Qin, B.: For bilingual sentence retrieval of Chinese sentence similarity computing. In: Joint Conference of the National Seventh Computer Linguistics, pp. 81–88. Tsinghua University Press, Beijing (2003) (in Chinese)
9. Chen, C., Yan, H.: Rough set theory in information retrieval. Guilin University of Technology (2011) (in Chinese)
10. Zong, C.: Statistical natural language processing. Tsinghua University Press, Beijing (2008) (in Chinese)

Author information

Authors and Affiliations

School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Min Shen, Baosen Dong & Linying Xu

Editor information

Editors and Affiliations

CeBA Canada, Ottawa, ON, Canada
Vasil Khachidze
ITSB, PWGSC, Ottawa, ON, Canada
Tim Wang
Algonquin College, Woodroffe Campus, Ottawa, ON, Canada
Sohail Siddiqui
Macau University of Science and Technology, Taipa, Macau
Vincent Liu
National Research Council, Ottawa, ON, Canada
Sergio Cappuccio
Carleton University, Ottawa, ON, Canada
Alicia Lim

About this paper

Cite this paper

Shen, M., Dong, B., Xu, L. (2012). An Improved Method for the Feature Extraction of Chinese Text by Combining Rough Set Theory with Automatic Abstracting Technology. In: Khachidze, V., Wang, T., Siddiqui, S., Liu, V., Cappuccio, S., Lim, A. (eds) Contemporary Research on E-business Technology and Strategy. iCETS 2012. Communications in Computer and Information Science, vol 332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34447-3_44

DOI: https://doi.org/10.1007/978-3-642-34447-3_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34446-6
Online ISBN: 978-3-642-34447-3
eBook Packages: Computer Science, Computer Science (R0)