Abstract
Web page classification is the problem of assigning predefined categories to web pages. A key challenge is the high dimensionality of the feature space. We present a feature reduction method based on rough set theory and investigate the effectiveness of rough set feature selection for web page classification. Our experiments indicate that rough set feature selection can improve predictive performance when the original feature set used to represent web pages is large.
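The abstract does not spell out the reduction algorithm, so the following is only a minimal sketch of one common rough set approach: a greedy, QuickReduct-style search driven by the dependency degree (the fraction of pages whose feature values determine their category unambiguously). It is not the authors' method. The function names, the toy term-presence rows, and the category labels are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def dependency(rows, labels, features):
    """Rough set dependency degree of the labels on `features`.

    Rows with identical values on `features` form an equivalence class.
    A class belongs to the positive region when all its rows share one label.
    """
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        classes[key].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add features until the full-set dependency is reached.

    Assumption: this greedy search returns a small feature subset, which is
    not guaranteed to be a minimal reduct.
    """
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    selected = []
    while dependency(rows, labels, selected) < target:
        remaining = [f for f in all_features if f not in selected]
        # Pick the feature giving the highest dependency; ties go to the
        # lowest index. One feature is always added, so the loop terminates.
        best = max(remaining,
                   key=lambda f: (dependency(rows, labels, selected + [f]), -f))
        selected.append(best)
    return selected


# Hypothetical toy data: each row marks presence (1) or absence (0) of a term.
rows = [
    (1, 0, 1),
    (1, 1, 0),
    (0, 1, 1),
    (0, 0, 1),
]
labels = ["sports", "sports", "news", "news"]

reduct = quick_reduct(rows, labels)
print(reduct)
```

In this toy example the first term alone separates the two categories, so the other two terms can be dropped without losing any discriminating power. Real web page data would have thousands of terms, where each dependency evaluation is considerably more expensive.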
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
An, A., Huang, Y., Huang, X., Cercone, N. (2004). Feature Selection with Rough Sets for Web Page Classification. In: Peters, J.F., Skowron, A., Dubois, D., Grzymała-Busse, J.W., Inuiguchi, M., Polkowski, L. (eds) Transactions on Rough Sets II. Lecture Notes in Computer Science, vol 3135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27778-1_1
DOI: https://doi.org/10.1007/978-3-540-27778-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23990-1
Online ISBN: 978-3-540-27778-1
eBook Packages: Computer Science, Computer Science (R0)