Feature Selection with Rough Sets for Web Page Classification

  • Conference paper
Transactions on Rough Sets II

Part of the book series: Lecture Notes in Computer Science (TRS, volume 3135)

Abstract

Web page classification is the problem of assigning predefined categories to web pages. A challenge in web page classification is dealing with the high dimensionality of the feature space. We present a feature reduction method based on rough set theory and investigate its effectiveness on web page classification. Our experiments indicate that rough set feature selection can improve predictive performance when the original feature set for representing web pages is large.
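
The abstract does not spell out the reduction algorithm, so the following is only a minimal sketch of how rough set feature selection is commonly illustrated, not the authors' method. It assumes binary term-presence features and uses a greedy, QuickReduct-style search driven by the rough set dependency degree (the fraction of pages whose feature values determine their category unambiguously). The toy data and the names `ROWS`, `LABELS`, `dependency`, and `quick_reduct` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each row is a page, each column a term-presence flag.
# Columns (assumed): 0 = "football", 1 = "the", 2 = "election", 3 = "page"
ROWS = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
LABELS = ["sports", "sports", "politics", "politics", "other", "other"]


def dependency(rows, labels, attrs):
    """Dependency degree: share of objects lying in the positive region.

    Objects are grouped into equivalence classes by their values on `attrs`;
    a class belongs to the positive region if all its objects share one label.
    """
    seen_labels = defaultdict(set)  # class key -> labels observed in the class
    sizes = defaultdict(int)        # class key -> number of objects in the class
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, s in seen_labels.items() if len(s) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most,
    stopping once the subset matches the dependency of the full attribute set.
    This is a heuristic: the result is not guaranteed to be a minimal reduct."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    while dependency(rows, labels, selected) < target:
        remaining = [a for a in all_attrs if a not in selected]
        best = max(remaining,
                   key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return sorted(selected)


if __name__ == "__main__":
    print(quick_reduct(ROWS, LABELS))
```

On this toy table the search keeps columns 0 and 2 and discards the two uninformative terms. Real web page data would have thousands of term features and would typically need discretisation or weighting first, which this sketch leaves out.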

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

An, A., Huang, Y., Huang, X., Cercone, N. (2004). Feature Selection with Rough Sets for Web Page Classification. In: Peters, J.F., Skowron, A., Dubois, D., Grzymała-Busse, J.W., Inuiguchi, M., Polkowski, L. (eds) Transactions on Rough Sets II. Lecture Notes in Computer Science, vol 3135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27778-1_1

  • DOI: https://doi.org/10.1007/978-3-540-27778-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23990-1

  • Online ISBN: 978-3-540-27778-1

  • eBook Packages: Computer Science, Computer Science (R0)
