Abstract
This paper proposes an efficient framework for topic-specific Web text filtering, aimed at blocking Web text content on specified topics. The framework combines a hybrid feature selection method with a highly efficient filtering engine. In training, features are selected using the CHI statistic and rough set theory, and the filter is then constructed with the Vector Space Model. We train the framework on large datasets, and the results suggest that it is more effective for topic-specific text filtering. The framework runs on a server such as a gateway, and it is more efficient than a client-based system.
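The training pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only CHI-statistic feature ranking and a Vector Space Model filter, and omits the rough set reduction step entirely. The function names, the raw term-count weighting, the centroid-based topic profile, and the similarity threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """CHI statistic for one term and one topic.

    a: topic docs containing the term      b: other docs containing the term
    c: topic docs lacking the term         d: other docs lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest CHI score (ties broken alphabetically)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, is_topic in zip(docs, labels):
        # Document frequency: count each term once per document.
        (pos_df if is_topic else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_vector(tokens, features):
    """Map a token list to a term-count vector over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_filter(docs, labels, k, threshold):
    """Train a filter: texts close to the topic centroid are blocked (assumed design)."""
    features = select_features(docs, labels, k)
    topic_vecs = [to_vector(d, features) for d, is_topic in zip(docs, labels) if is_topic]
    centroid = [sum(col) / len(topic_vecs) for col in zip(*topic_vecs)]

    def should_block(tokens):
        return cosine(to_vector(tokens, features), centroid) >= threshold

    return should_block


# Toy data, invented for illustration only.
docs = [
    ["casino", "bet", "win"],
    ["bet", "poker", "casino"],
    ["weather", "rain", "today"],
    ["football", "score", "match"],
]
labels = [True, True, False, False]
should_block = build_filter(docs, labels, k=2, threshold=0.5)
print(should_block(["casino", "bet", "now"]))  # similar to the topic profile
print(should_block(["rain", "today"]))         # shares no selected feature
```

In this sketch the CHI score is symmetric, so a term that is strongly absent from the topic scores as high as one that is strongly present; a practical filter would likely also check the direction of the association before keeping a feature.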
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Q., Li, J. (2005). An Efficient Topic-Specific Web Text Filtering Framework. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_16
DOI: https://doi.org/10.1007/978-3-540-31849-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25207-8
Online ISBN: 978-3-540-31849-1
eBook Packages: Computer Science, Computer Science (R0)