Abstract
Automatic query expansion can be used in document retrieval to improve search effectiveness. Traditional query expansion methods are based on the document collection itself. For example, pseudo-relevance feedback (PRF) assumes that the top retrieved documents are relevant, and uses terms extracted from those documents to expand the query. However, other sources of evidence can be used for expansion, some of which may give better search results with greater efficiency at query time. In this paper, we use external evidence, in particular hints obtained from external web search engines, to expand the original query. We explore six methods that draw on search engine query logs, snippets, and search result documents. We conduct extensive experiments, with state-of-the-art PRF baselines and careful parameter tuning, on three TREC collections: AP, WT10g, and GOV2. Log-based methods do not show consistent significant gains, despite being very efficient at query time. Snippet-based expansion, which uses the summaries provided by an external search engine, provides significant effectiveness gains with good efficiency at query time.
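As a rough illustration of the general idea described in the abstract, the sketch below expands a query with the most frequent new terms found in a set of feedback texts. The feedback texts could be top retrieved documents (as in PRF) or snippets returned by an external search engine. This is not the method evaluated in the paper: the function names, the tiny stopword list, and the raw frequency weighting are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query, feedback_texts, num_terms=3):
    """Append the most frequent new terms from the feedback texts to the query.

    feedback_texts may hold top-ranked documents (PRF-style) or snippets
    from an external search engine. Terms are scored by raw frequency,
    which is a simplification chosen for this sketch.
    """
    query_terms = tokenize(query)
    seen = set(query_terms)
    counts = Counter()
    for text in feedback_texts:
        for term in tokenize(text):
            if term not in seen and term not in STOPWORDS:
                counts[term] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:num_terms]]


snippets = [
    "Jaguar cars are built in the UK",
    "The jaguar car maker announced a new engine",
    "Jaguar engine specs and car reviews",
]
expanded = expand_query("jaguar", snippets, num_terms=2)
```

A practical system would weight expansion terms with a retrieval model rather than raw counts, and would tune the number of feedback texts and expansion terms per collection.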
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yin, Z., Shokouhi, M., Craswell, N. (2009). Query Expansion Using External Evidence. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_33
DOI: https://doi.org/10.1007/978-3-642-00958-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)