Abstract
Searching for information on the Web has attracted great attention in many research communities. Because of the enormous size of the Web and the low precision of user queries, present web search engines can return hundreds or even hundreds of thousands of documents. Finding the right information can therefore be difficult, if not impossible. One approach to this problem uses clustering techniques to group similar documents together, so that results can be presented in a more compact form and the result set can be browsed thematically. Web search results clustering is the efficient identification of meaningful, thematic groups of documents in a search result and their concise presentation. This paper introduces the problem of web search results clustering, briefly surveys previous work and existing commercial search engines that use the technique, and proposes possible directions for future research.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bharambe, U., Kale, A. (2011). Landscape of Web Search Results Clustering Algorithms. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication and Control. ICAC3 2011. Communications in Computer and Information Science, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18440-6_12
DOI: https://doi.org/10.1007/978-3-642-18440-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18439-0
Online ISBN: 978-3-642-18440-6
eBook Packages: Computer Science (R0)