Web Directory Construction Using Lexical Chains

  • Sofia Stamou
  • Vlassis Krikos
  • Pavlos Kokosis
  • Alexandros Ntoulas
  • Dimitris Christodoulakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3513)

Abstract

Web Directories provide a way of locating relevant information on the Web. Typically, Web Directories rely on humans investing significant time and effort in finding important pages on the Web and categorizing them in the Directory. In this paper we present a method for automating the creation of a Web Directory. At a high level, our method takes as input a subject hierarchy and a collection of pages. We first leverage a variety of lexical resources from the Natural Language Processing community to enrich the hierarchy. We then process the pages and identify sequences of important terms, known as lexical chains. Finally, we use the lexical chains to decide where in the enriched subject hierarchy each page should be assigned. Our experimental results with real Web data show that our method is quite promising for assisting humans during page categorization.
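
The three steps in the abstract (enrich the hierarchy, build lexical chains, assign the page) can be illustrated with a minimal sketch. This is not the authors' implementation, only one possible reading under stated assumptions: a tiny hand-made thesaurus (the hypothetical `RELATED` table) stands in for the lexical resources, a simple greedy rule replaces whatever chaining and disambiguation the paper actually uses, and the topic score (chain length times term overlap) is a hypothetical heuristic. The function names `enrich`, `build_chains`, and `assign_page` and the topic labels are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy thesaurus standing in for a lexical resource such as
# WordNet: each word maps to the words it is treated as related to.
RELATED = {
    "car": {"vehicle", "engine", "automobile"},
    "vehicle": {"car", "automobile", "engine"},
    "automobile": {"car", "vehicle"},
    "engine": {"car", "vehicle"},
    "music": {"guitar", "instrument", "song"},
    "guitar": {"music", "instrument"},
    "instrument": {"guitar", "music"},
    "song": {"music"},
}


def related(a, b):
    """Two words are related if they are identical or linked in RELATED."""
    return a == b or b in RELATED.get(a, set())


def enrich(hierarchy):
    """Expand each topic's seed terms with their thesaurus neighbours."""
    enriched = {}
    for topic, seeds in hierarchy.items():
        terms = set(seeds)
        for seed in seeds:
            terms |= RELATED.get(seed, set())
        enriched[topic] = terms
    return enriched


def build_chains(words):
    """Greedy lexical chaining: append each known word to the first chain
    holding a related word, otherwise start a new chain."""
    chains = []
    for word in words:
        if word not in RELATED:
            continue  # ignore words the toy thesaurus does not cover
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains


def assign_page(words, enriched):
    """Score each topic by chain-length-weighted overlap with its terms and
    return the best topic, or None if nothing matches (hypothetical scoring)."""
    scores = defaultdict(int)
    for chain in build_chains(words):
        for topic, terms in enriched.items():
            overlap = sum(1 for w in chain if w in terms)
            scores[topic] += len(chain) * overlap
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)


# Hypothetical two-topic hierarchy with one seed term per topic.
hierarchy = enrich({"Recreation/Autos": {"car"}, "Arts/Music": {"music"}})
page = "the car has a new engine and the vehicle is fast also music".split()
print(build_chains(page))  # [['car', 'engine', 'vehicle'], ['music']]
print(assign_page(page, hierarchy))  # Recreation/Autos
```

Weighting by chain length lets a long, cohesive chain (here the three automotive terms) outvote an isolated word, which is the intuition behind using lexical cohesion rather than raw term counts; a real system would draw relations from a full lexical database and handle word-sense ambiguity.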

Keywords

Subject Hierarchy, Open Directory Project, Lexical Chain, Lexical Cohesion, Domain Label

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sofia Stamou (1)
  • Vlassis Krikos (1)
  • Pavlos Kokosis (1)
  • Alexandros Ntoulas (2)
  • Dimitris Christodoulakis (1)
  1. Computer Technology Institute, Computer Engineering Department, Patras University, Patras, Greece
  2. Computer Science Department, University of California, Los Angeles, USA