Landscape of Web Search Results Clustering Algorithms

  • Conference paper
Advances in Computing, Communication and Control (ICAC3 2011)

Abstract

Searching for information on the Web has attracted great attention in many research communities. Due to the enormous size of the Web and the low precision of user queries, present web search engines can return hundreds or even hundreds of thousands of documents for a query. Therefore, finding the right information can be difficult if not impossible. One approach to this problem uses clustering techniques to group similar documents together, so that results can be presented in a more compact form and the result set can be browsed thematically. Web search results clustering is concerned with efficiently identifying meaningful, thematic groups of documents in a search result and presenting them concisely. This paper introduces the problem of web search results clustering, briefly surveys previous work on it and the existing commercial search engines that use this technique, and proposes possible directions for future research.
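
As a rough illustration of what identifying thematic groups in a result set can look like, the following is a minimal Python sketch loosely in the spirit of Zamir and Etzioni's Suffix Tree Clustering (STC), which appears in the reference list. It is not the authors' method. Each snippet is tokenized, every phrase of up to three words shared by at least two snippets becomes a base cluster, and base clusters are merged when their shared documents exceed half of each cluster's documents; each merged cluster is then labeled with one of its phrases. Several parts are assumptions made for illustration: a plain dictionary of word n-grams stands in for STC's generalized suffix tree, STC's base-cluster scoring step is omitted, and the stopword list, the query-word filtering, the labeling rule and all function names (`tokenize`, `base_clusters`, `merge_clusters`, `cluster_results`) are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative stopword list (an assumption for this sketch, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is", "with"}


def tokenize(text, ignore=frozenset()):
    """Lowercase the text, keep alphanumeric words, drop stopwords and ignored words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS and w not in ignore]


def base_clusters(snippets, ignore=frozenset(), max_len=3, min_docs=2):
    """Map every phrase of up to max_len words to the set of snippet ids containing it.

    A plain dictionary of word n-grams stands in for STC's generalized suffix tree.
    Only phrases shared by at least min_docs snippets become base clusters.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = tokenize(text, ignore)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[" ".join(words[i:i + n])].add(doc_id)
    return {p: docs for p, docs in phrase_docs.items() if len(docs) >= min_docs}


def merge_clusters(bases, overlap=0.5):
    """Merge base clusters whose shared documents exceed `overlap` of BOTH clusters."""
    phrases = list(bases)
    parent = {p: p for p in phrases}  # union-find forest over phrases

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        shared = len(bases[a] & bases[b])
        if shared / len(bases[a]) > overlap and shared / len(bases[b]) > overlap:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in phrases:
        groups[find(p)].append(p)

    clusters = []
    for members in groups.values():
        docs = set().union(*(bases[p] for p in members))
        # Hypothetical labeling rule: most widely shared phrase, longer phrases win ties.
        label = max(members, key=lambda p: (len(bases[p]), len(p.split()), p))
        clusters.append((label, sorted(docs)))
    return sorted(clusters, key=lambda c: (-len(c[1]), c[0]))  # largest clusters first


def cluster_results(snippets, query="", **kwargs):
    """Cluster result snippets, ignoring the query's own words."""
    ignore = frozenset(tokenize(query))
    return merge_clusters(base_clusters(snippets, ignore=ignore, **kwargs))


results = [
    "Jaguar car dealers and used jaguar cars",
    "Used jaguar cars for sale",
    "Jaguar big cat habitat in the rainforest",
    "The jaguar is a big cat of the Americas",
]
for label, docs in cluster_results(results, query="jaguar"):
    print(label, docs)
# big cat [2, 3]
# used cars [0, 1]
```

Ignoring the query's own words matters in this toy example: "jaguar" occurs in every snippet, so without `query="jaguar"` it would form its own uninformative cluster covering the whole result set.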

References

  1. Zamir, O., Etzioni, O.: Web Document Clustering: A Feasibility Demonstration. In: Proc. ACM SIGIR 1998, Melbourne, Australia, pp. 46–54 (1998)

  2. Zhong, S.: Semi-supervised model-based document clustering: A comparative study. Springer, Heidelberg (2006)

  3. Janruang, J., Kreesuradej, W.: A New Web Search Result Clustering based on True Common Phrase Label Discovery. In: International Conference on Computational Intelligence for Modeling Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC 2006) (2006)

  4. Crabtree, D., Gao, X., Andreae, P.: Improving Web Clustering by Cluster Selection. In: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2005 (2005)

  5. Zamir, O.: Clustering Web Documents: A Phrase-Based Method for Grouping Search Engine Results. PhD Thesis, University of Washington (1999)

  6. Zhang, D., Dong, Y.: Semantic, Hierarchical, Online Clustering of Web Search Results (unpublished)

  7. Mecca, G., Raunich, S., Pappalardo, A.: A new algorithm for clustering search results. Data & Knowledge Engineering 62, 504–522 (2007)

  8. Zeng, H.-J., et al.: Learning to Cluster Web Search Results. In: SIGIR 2004, Peking University (2004)

  9. Campos, R., Dias, G., Nunes, C.: WISE: Hierarchical Soft Clustering of Web Page Search Results based on Web Content Mining Techniques. In: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006). IEEE, Los Alamitos (2006)

  10. Osiński, S., Weiss, D.: Conceptual clustering using Lingo algorithm: Evaluation on Open Directory Project data. In: IIPWM 2004 (2004)

  11. Osiński, S.: An algorithm for clustering of web search results. Master's thesis (2003)

  12. Porter, M.F.: An algorithm for suffix stripping. In: Readings in Information Retrieval, pp. 313–316 (1997)

  13. Cutting, D.R., Karger, D.R., Pedersen, J.O., Tukey, J.W.: Scatter/Gather: a cluster-based approach to browsing large document collections. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 318–329 (1992)

  14. Pirolli, P., Schank, P., Hearst, M., Diehl, C.: Scatter/Gather browsing communicates the topic structure of a very large text collection. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 213–220 (1996)

  15. Hearst, M.A., Pedersen, J.O.: Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. In: Proceedings of SIGIR 1996, 19th ACM International Conference on Research and Development in Information Retrieval, Zürich, CH, pp. 76–84 (1996)

  16. Weiss, D.: Carrot2 Developers Guide, http://www.cs.put.poznan.pl/dweiss/carrot/site/developers/man

  17. Wroblewski, M.: A hierarchical WWW pages clustering algorithm based on the vector space model. Master's thesis, Poznan University of Technology, Poland (July 2003)

  18. Borch, H.O.: On-line Clustering of Web Search Results. M.S. Thesis, Norwegian University of Science and Technology (2006)

  19. Crabtree, D., Gao, X., Andreae, P.: Improving Query Directed Web Page Clustering. In: Proceedings of the 2006 (2006)

  20. Ferragina, P., Gulli, A.: The Anatomy of a Hierarchical Clustering Engine for Webpage, News and Book Snippets. Technical report RR04-04, Informatica, Pisa (2004)

  21. Segond, F., Shiller, A., Grefenstette, G., Chanod, J.-P.: An Experiment in Semantic Tagging Using Hidden Markov Model Tagging. In: ACL 1997 Workshop on Information Extraction and the Building of Lexical Semantic Resources for NLP Applications (1997)

  22. Ferragina, P., Gulli, A.: A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering. In: 14th International World Wide Web Conference (2005)

  23. Jiang, Z.H., Joshi, A., Krishnapuram, R., Yi, L.Y.: Retriever: improving web search engine results using clustering. In: Managing Business and Electronic Commerce (2002)

  24. Wang, Y., Zuo, W., Peng, T., He, F., Hu, H.: Clustering Web Search Results Based on Interactive Suffix Tree Algorithm. In: Third 2008 International Conference on Convergence and Hybrid Information Technology (2008)

  25. Wen, H., Huang, G.-S., Li, Z.: Clustering web search using semantic information. In: Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Baoding, July 12–15 (2009)

  26. Chim, H., Deng, X.: Efficient Phrase-based Document Similarity for Clustering. IEEE Transactions on Knowledge and Data Engineering 20(9) (September 2008)

  27. Kale, A., Bharambe, U.: A New Suffix Tree Similarity Measure and Labeling for Web Search Results Clustering. In: Second International Conference on Emerging Trends in Engineering & Technology, pp. 856–861 (2009)

  28. www.stanford.edu/class/cs276a/projects/.../arigreen-sbranson.pdf (visited March 2009)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bharambe, U., Kale, A. (2011). Landscape of Web Search Results Clustering Algorithms. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication and Control. ICAC3 2011. Communications in Computer and Information Science, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18440-6_12

  • DOI: https://doi.org/10.1007/978-3-642-18440-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18439-0

  • Online ISBN: 978-3-642-18440-6

  • eBook Packages: Computer Science, Computer Science (R0)
