Phrase-Based Hierarchical Clustering of Web Search Results

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Abstract

The paper addresses the problem of clustering text documents coming from the Web. We apply clustering to support users in interactive browsing through hierarchically organized search results as opposed to the standard ranked-list presentation. We propose a clustering method that is tailored to on-line processing of Web documents and takes into account the time aspect, the particular requirements of clustering texts, and readability of the produced hierarchy. Finally, we present the user interface of an actual system in which the method is applied to the results of a popular search engine.



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Masłowska, I. (2003). Phrase-Based Hierarchical Clustering of Web Search Results. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_42

  • DOI: https://doi.org/10.1007/3-540-36618-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive
