Crawling the Web with OntoDir

  • Conference paper
Database and Expert Systems Applications (DEXA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4653)

  • 1200 Accesses

Abstract

Managing the large amount of information on the Internet requires more efficient and effective methods and techniques for mining and representing information. The use of ontologies for knowledge representation has increased rapidly in recent years: a common, formal representation of knowledge allows a more accurate analysis of document content in several contexts. One such challenging application is the World Wide Web, whose requirements are hard to satisfy, especially in a complex scenario such as the Semantic Web. In this paper we present a methodology for the automatic topic annotation of Web pages. We describe an algorithm for word disambiguation that uses a dedicated metric for measuring semantic relatedness, and we show a technique that detects the topic of the analyzed document by means of ontologies extracted from a knowledge base. The strategy is implemented in a system that uses this information to build a topic hierarchy that is created automatically rather than defined a priori. Experimental results are presented and discussed in order to measure the effectiveness of our approach.
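
The abstract does not give the metric or the algorithm, so the following is only a minimal sketch of the general idea of disambiguating words by semantic relatedness. Everything in it is an assumption made for illustration: the toy semantic network (`EDGES`), the sense inventory (`SENSES`), the function names, and the relatedness measure, here simply 1 / (1 + shortest path length) between two concepts. It is not the metric or the knowledge base used in the paper.

```python
from collections import deque

# Hypothetical toy semantic network: undirected links between concepts.
EDGES = [
    ("bank#finance", "money"),
    ("money", "loan"),
    ("loan", "interest#finance"),
    ("bank#finance", "loan"),
    ("bank#river", "river"),
    ("river", "water"),
    ("interest#hobby", "leisure"),
]

# Hypothetical sense inventory: each word maps to its candidate senses.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "interest": ["interest#finance", "interest#hobby"],
    "loan": ["loan"],
}


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Shortest path length in edges (BFS), or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed measure: closer concepts score higher, unreachable ones 0."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def disambiguate(words, senses, graph):
    """For each word, keep the sense most related to the other words."""
    chosen = {}
    for word in words:
        def score(sense):
            # Sum, over the other words, of the best relatedness
            # between this sense and any sense of that word.
            return sum(
                max(relatedness(graph, sense, s) for s in senses[other])
                for other in words
                if other != word
            )
        chosen[word] = max(senses[word], key=score)
    return chosen


GRAPH = build_graph(EDGES)
RESULT = disambiguate(["bank", "interest", "loan"], SENSES, GRAPH)
print(RESULT)
```

In this toy network the financial senses of "bank" and "interest" are selected because they are linked to "loan", while the river and hobby senses are unreachable from the other words. A topic-detection step could then, under similar assumptions, look for the concepts that the selected senses have in common.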

Editor information

Roland Wagner, Norman Revell, Günther Pernul

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Picariello, A., Rinaldi, A.M. (2007). Crawling the Web with OntoDir. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_71

  • DOI: https://doi.org/10.1007/978-3-540-74469-6_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74467-2

  • Online ISBN: 978-3-540-74469-6

  • eBook Packages: Computer Science, Computer Science (R0)
