Discovering Web Document Associations for Web Site Summarization

  • K. Selçuk Candan
  • Wen-Syan Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2114)


Complex web information structures prevent search engines from providing satisfactory context-sensitive retrieval. To overcome this obstacle, it is essential to use techniques that recover the web authors’ intentions and superimpose them on the users’ retrieval contexts when summarizing web sites. In this paper, we therefore present a framework for discovering implicit associations among web documents for effective web site summarization. In the proposed framework, associations between web documents are induced by the web structure embedding them, as well as by the contents of the documents and the users’ interests. We analyze the semantics of document associations and describe an algorithm that captures these semantics for enumerating and ranking possible document associations. We then use these associations to create context-sensitive summaries of web neighborhoods.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • K. Selçuk Candan (1)
  • Wen-Syan Li (2)
  1. Computer Sci. and Eng. Dept., Arizona State University, Tempe, USA
  2. CCRL, NEC USA, Inc., San Jose, USA
