Web Directory Construction Using Lexical Chains

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Abstract

Web Directories provide a way of locating relevant information on the Web. Typically, Web Directories rely on humans investing significant time and effort in finding important pages on the Web and categorizing them in the Directory. In this paper we present a method for automating the creation of a Web Directory. At a high level, our method takes as input a subject hierarchy and a collection of pages. We first leverage a variety of lexical resources from the Natural Language Processing community to enrich our hierarchy. We then process the pages and identify sequences of important terms, which are referred to as lexical chains. Finally, we use the lexical chains to decide where in the enriched subject hierarchy each page should be assigned. Our experimental results with real Web data show that our method is quite promising for assisting humans during page categorization.
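The three-stage pipeline in the abstract (enrich the hierarchy, build lexical chains for each page, assign the page using its chains) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes a hypothetical toy thesaurus `RELATED` standing in for WordNet-style relations, a simplified greedy chaining rule in the spirit of Morris and Hirst, a hierarchy flattened to leaf categories, and a chain-length weighting chosen for illustration. All names (`RELATED`, `enrich`, `build_chains`, `assign_category`) are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy thesaurus standing in for WordNet-style relations:
# each term maps to the set of terms treated as semantically related.
RELATED: dict[str, set[str]] = {
    "car": {"vehicle", "engine", "automobile"},
    "vehicle": {"car", "automobile"},
    "engine": {"car"},
    "automobile": {"car", "vehicle"},
    "goal": {"match", "football"},
    "match": {"goal", "football"},
    "football": {"goal", "match"},
}


def related(a: str, b: str) -> bool:
    """Treat two terms as related if identical or linked in either direction."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def enrich(seeds: dict[str, set[str]]) -> dict[str, set[str]]:
    """Expand each category's seed terms with their thesaurus neighbours."""
    enriched: dict[str, set[str]] = {}
    for name, words in seeds.items():
        expanded = set(words)
        for word in words:
            expanded |= RELATED.get(word, set())
        enriched[name] = expanded
    return enriched


def build_chains(terms: list[str]) -> list[list[str]]:
    """Greedy chaining: append each term to the first chain that already
    holds a related term; otherwise start a new chain."""
    chains: list[list[str]] = []
    for term in terms:
        for chain in chains:
            if any(related(term, member) for member in chain):
                chain.append(term)
                break
        else:
            chains.append([term])
    return chains


def assign_category(terms: list[str], categories: dict[str, set[str]]) -> Optional[str]:
    """Score each category by how many chain members fall in its enriched
    vocabulary, weighting every hit by the length of its chain.
    Returns None when no chain touches any category."""
    scores = Counter()
    for chain in build_chains(terms):
        weight = len(chain)  # assumption: longer chains are stronger evidence
        for name, vocab in categories.items():
            hits = sum(1 for t in chain if t in vocab)
            scores[name] += weight * hits
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    categories = enrich({"Automotive": {"car"}, "Sports": {"football"}})
    page_terms = ["car", "engine", "goal", "vehicle", "car"]
    print(build_chains(page_terms))  # [['car', 'engine', 'vehicle', 'car'], ['goal']]
    print(assign_category(page_terms, categories))  # Automotive
```

In the usage example the page lands in `Automotive` because its longest chain (`car`, `engine`, `vehicle`, `car`) falls entirely inside that category's enriched vocabulary, while the single-term `goal` chain contributes only a weak vote for `Sports`. A realistic system would query an actual lexical resource, walk the full subject hierarchy rather than a flat set of leaves, and probably normalize scores by page length.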

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stamou, S., Krikos, V., Kokosis, P., Ntoulas, A., Christodoulakis, D. (2005). Web Directory Construction Using Lexical Chains. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_13

  • DOI: https://doi.org/10.1007/11428817_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26031-8

  • Online ISBN: 978-3-540-32110-1

  • eBook Packages: Computer Science, Computer Science (R0)
