Design and Implementation of an Ontology Algorithm for Web Documents Classification

  • Conference paper
Computational Science and Its Applications - ICCSA 2006 (ICCSA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3983)

Abstract

Traditional document classification methods require feature extraction and classifier training. Collecting the text terms needed for training is laborious and time-consuming. Additionally, it is difficult to extract features from Chinese documents. To address these problems, this paper proposes an ontology-based approach that improves the efficiency and effectiveness of web document classification and retrieval. First, the approach establishes an ontology model based on the Hownet [6] knowledge base and its method. It then creates an ontology for each subclass of the classification system, using RDFS to convert Hownet into ontologies and to define the relations among them. Web documents are classified automatically using an ontology relevance calculation algorithm. In experiments comparing the approach with KNN [2], the ontology-based approach achieves accuracy close to that of KNN, is more robust, and attains a better recall rate.
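The abstract does not specify the relevance calculation, so the following is only a minimal sketch of what relevance-based classification against per-class ontologies could look like. The `ONTOLOGIES` dictionary, its terms and weights, and the `relevance` and `classify` functions are all hypothetical and are not taken from the paper; the scoring rule (weighted share of matching tokens) is an assumed stand-in for the authors' algorithm. Input is assumed to be already tokenized, which for Chinese text would require a word segmentation step beforehand.

```python
from collections import Counter

# Hypothetical class ontologies: concept term -> weight.
# Terms and weights are illustrative only, not derived from Hownet.
ONTOLOGIES = {
    "sports": {"football": 1.0, "match": 0.6, "team": 0.5},
    "finance": {"stock": 1.0, "market": 0.7, "bank": 0.8},
}


def relevance(tokens, ontology):
    """Weighted share of document tokens that match ontology concepts."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # A Counter returns 0 for terms that do not occur in the document.
    matched = sum(weight * counts[term] for term, weight in ontology.items())
    return matched / len(tokens)


def classify(tokens, ontologies=ONTOLOGIES):
    """Return (best_class, scores); best_class is None when nothing matches."""
    scores = {name: relevance(tokens, onto) for name, onto in ontologies.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    label, scores = classify("the team won the football match".split())
    print(label, scores)
```

Unlike KNN, a scheme of this kind needs no labelled training documents at classification time, only the class ontologies, which is the trade-off the abstract points to.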

References

  1. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)

  2. Baoli, L., Qin, L., Shiwen, Y.: An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP), 215–226 (2004)

  3. Kan, M.-Y.: Web page classification without the web page. In: Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pp. 262–263 (2004)

  4. Ehrig, M., Maedche, A.: Ontology-focused crawling of Web documents. In: Proceedings of the 2003 ACM symposium on Applied computing, pp. 1174–1178 (2003)

  5. Lauser, B., Wildemann, T., Poulos, A., Fisseha, F., Keizer, J., Katz, S.: A comprehensive framework for building multilingual domain ontologies: Creating a prototype biosecurity ontology. In: DC-2002: Metadata for e-Communities: Supporting Diversity and Convergence, Florence, Italy (October 2002)

  6. Dong, Z.: Knowledge Description: What, How and Who? In: Proceedings of International Symposium on Electronic Dictionary, Tokyo, Japan (1988)

  7. Web Ontology Language (OWL), (Current November 10, 2005), http://www.w3.org/2004/OWL/

  8. Resource Description Framework (RDF), (Current November 10, 2005), http://www.w3.org/RDF/

  9. Report SMI-2001-0880. Stanford Knowledge Systems Laboratory. Available at http://www.ksl.stanford.edu/people/dlm/papers/ontologytutorial-noy-mcguinness-abstract.html

  10. Arch-int, N.: A semantic information gathering approach for heterogeneous information sources on WWW. Journal of Information Science 29(5), 357 (2003)

  11. Martin, P., Eklund, P.: Embedding knowledge in Web documents. Computer Networks 31, 1403–1420 (1999)

  12. Liddy, E.D., Paik, W., Yu, E.S.: Text categorization for multiple users based on semantic features from a machine-readable dictionary. ACM Transactions on Information Systems 12(3), 278–295 (1994)

  13. Chen, L.-C., Luh, C.-J., Jou, C.: Generating page clippings from web search results using a dynamically terminated genetic algorithm. Information Systems, 299–316 (2005)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wei, G., Yu, J., Ling, Y., Liu, J. (2006). Design and Implementation of an Ontology Algorithm for Web Documents Classification. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751632_71

  • DOI: https://doi.org/10.1007/11751632_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34077-5

  • Online ISBN: 978-3-540-34078-2

  • eBook Packages: Computer Science, Computer Science (R0)
