Using an Integrated Ontology Database to Categorize Web Pages

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6059)

Abstract

Current classification methods are mostly based on the Vector Space Model (VSM), which accounts only for term frequency in documents and ignores important semantic relationships between key terms. We propose a system that uses integrated ontologies and natural language processing techniques to index texts. The traditional word matrix is replaced by a concept-based matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A Support Vector Machine (SVM), a successful machine learning technique, is used for classification. Experimental results show that the proposed method significantly improves text classification performance.
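The difference between a word-based and a concept-based matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their ontology database or their automated mapping, so the dictionary `TOY_ONTOLOGY`, the concept identifiers, and the helper names below are all hypothetical stand-ins. The sketch only shows why two documents with almost no words in common can end up with identical concept vectors.

```python
import re
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration only):
# surface keyword -> concept identifier.
TOY_ONTOLOGY = {
    "car": "C:Vehicle",
    "automobile": "C:Vehicle",
    "truck": "C:Vehicle",
    "doctor": "C:MedicalProfessional",
    "physician": "C:MedicalProfessional",
    "nurse": "C:MedicalProfessional",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Term-frequency representation, as in a plain VSM."""
    return Counter(tokenize(text))


def bag_of_concepts(text, ontology=TOY_ONTOLOGY):
    """Concept-frequency representation.

    Tokens without an ontology entry are simply dropped here; a real
    system would need a policy for unmapped and ambiguous keywords.
    """
    return Counter(ontology[t] for t in tokenize(text) if t in ontology)


def to_matrix(bags):
    """Turn a list of Counters into (sorted feature list, count rows)."""
    features = sorted(set().union(*bags))
    rows = [[bag[f] for f in features] for bag in bags]
    return features, rows


docs = [
    "The doctor bought a car.",
    "A physician sold an automobile.",
]

word_features, word_rows = to_matrix([bag_of_words(d) for d in docs])
concept_features, concept_rows = to_matrix([bag_of_concepts(d) for d in docs])

print(word_features, word_rows)
print(concept_features, concept_rows)
```

In the word matrix the two documents share only the token "a", while in the concept matrix their rows coincide. Rows of this kind could then be passed to an SVM classifier in place of the word counts.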

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bai, R., Wang, X., Liao, J. (2010). Using an Integrated Ontology Database to Categorize Web Pages. In: Kim, Th., Adeli, H. (eds) Advances in Computer Science and Information Technology. AST ACN 2010. Lecture Notes in Computer Science, vol 6059. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13577-4_26

  • DOI: https://doi.org/10.1007/978-3-642-13577-4_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13576-7

  • Online ISBN: 978-3-642-13577-4

  • eBook Packages: Computer Science, Computer Science (R0)
