Abstract
Current text classification methods are mostly based on the Vector Space Model (VSM), which accounts only for term frequencies in documents and ignores important semantic relationships between key terms. We propose a system that uses an integrated ontology together with Natural Language Processing techniques to index texts, replacing the traditional word-based matrix with a concept-based matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A Support Vector Machine (SVM), a well-established machine learning technique, is used for classification. Experimental results show that the proposed method significantly improves text classification performance.
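The contrast between a word-based VSM matrix and a concept-based matrix, followed by SVM classification, can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: `CONCEPT_MAP` is a hypothetical hand-written table standing in for the integrated ontology database and the authors' automated keyword-to-concept mapping; the classifier is a plain linear SVM trained from scratch with Pegasos-style subgradient descent, not necessarily the SVM configuration the authors used; and the four training documents are toy data, not the paper's experimental corpus.

```python
import re
from collections import Counter

# Hypothetical keyword-to-concept table standing in for the integrated ontology
# (e.g. lookups against WordNet or Cyc); NOT the paper's actual mapping method.
CONCEPT_MAP = {
    "car": "C:vehicle", "automobile": "C:vehicle", "truck": "C:vehicle",
    "engine": "C:machine_part", "motor": "C:machine_part",
    "football": "C:sport", "soccer": "C:sport",
    "match": "C:sport_event", "game": "C:sport_event",
}


def tokenize(text):
    """Lower-case alphabetic tokens (a stand-in for real NLP preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Classic VSM features: raw term frequencies only."""
    return Counter(tokenize(text))


def concept_vector(text):
    """Bag-of-concepts features: each keyword is counted under its ontology
    concept; words with no mapping are dropped (an assumed policy)."""
    return Counter(CONCEPT_MAP[t] for t in tokenize(text) if t in CONCEPT_MAP)


def densify(vec, features):
    """Project a sparse Counter onto a fixed feature list plus a constant bias term."""
    return [vec[f] for f in features] + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM trained with Pegasos-style subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 or -1.
    Samples are visited in a fixed order so results are reproducible."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is measured with the weights *before* this step's update.
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # regularizer (shrink) step
            if margin < 1:  # hinge loss is active: move toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Sign of the linear decision function (bias is the last weight)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy corpus: +1 = automotive, -1 = sports (illustrative data only).
train_docs = [
    ("The car has a new engine", 1),
    ("An automobile with a powerful motor", 1),
    ("The football match was exciting", -1),
    ("A great soccer game today", -1),
]
vectors = [concept_vector(doc) for doc, _ in train_docs]
features = sorted({f for v in vectors for f in v})
X = [densify(v, features) for v in vectors]
y = [label for _, label in train_docs]
w = train_linear_svm(X, y)

print(predict(w, densify(concept_vector("A truck with a big motor"), features)))  # 1
print(predict(w, densify(concept_vector("Tonight's soccer match"), features)))  # -1
```

Note that "truck" never occurs in the training documents, so a pure term-frequency matrix has no feature for it; under the concept mapping it shares the `C:vehicle` feature with "car" and "automobile", which is the kind of semantic relationship the VSM ignores.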
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bai, R., Wang, X., Liao, J. (2010). Using an Integrated Ontology Database to Categorize Web Pages. In: Kim, Th., Adeli, H. (eds) Advances in Computer Science and Information Technology. AST ACN 2010. Lecture Notes in Computer Science, vol 6059. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13577-4_26
DOI: https://doi.org/10.1007/978-3-642-13577-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13576-7
Online ISBN: 978-3-642-13577-4
eBook Packages: Computer Science, Computer Science (R0)