Explicit Search Result Diversification through Sub-queries

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

Queries submitted to a retrieval system are often ambiguous. In such a situation, a sensible strategy is to diversify the ranking of results to be retrieved, in the hope that users will find at least one of these results to be relevant to their information need. In this paper, we introduce xQuAD, a novel framework for search result diversification that builds such a diversified ranking by explicitly accounting for the relationship between documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries. We evaluate the effectiveness of xQuAD using a standard TREC collection. The results show that our framework markedly outperforms state-of-the-art diversification approaches under a simulated best-case scenario. Moreover, we show that its effectiveness can be further improved by estimating the relative importance of each identified sub-query. Finally, we show that our framework can still outperform the simulated best-case scenario of the state-of-the-art diversification approaches using sub-queries automatically derived from the baseline document ranking itself.
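
The greedy re-ranking idea described in the abstract can be sketched as follows. The abstract does not state the scoring formula, so this sketch assumes the commonly cited xQuAD objective: a mixture of relevance to the original query and coverage of sub-queries not yet satisfied by the documents already selected. The function name, the parameter names (`rel`, `sub_rel`, `importance`, `lam`), and the toy probabilities are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, List


def xquad_rerank(
    rel: Dict[str, float],                 # assumed P(d | q): relevance to the original query
    sub_rel: Dict[str, Dict[str, float]],  # assumed P(d | s): sub_rel[s][d], relevance to sub-query s
    importance: Dict[str, float],          # assumed P(s | q): relative importance of sub-query s
    k: int,
    lam: float = 0.5,                      # trade-off between relevance (0.0) and diversity (1.0)
) -> List[str]:
    """Greedily pick k documents, rewarding coverage of unsatisfied sub-queries."""
    selected: List[str] = []
    remaining = set(rel)
    # For each sub-query, the probability that no selected document satisfies it yet.
    not_covered = {s: 1.0 for s in importance}

    while remaining and len(selected) < k:
        def score(d: str) -> float:
            diversity = sum(
                importance[s] * sub_rel[s].get(d, 0.0) * not_covered[s]
                for s in importance
            )
            return (1.0 - lam) * rel[d] + lam * diversity

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Sub-queries that the chosen document covers contribute less from now on.
        for s in importance:
            not_covered[s] *= 1.0 - sub_rel[s].get(best, 0.0)

    return selected


# Toy data for an ambiguous query with two hypothetical sub-queries.
rel = {"car1": 0.9, "car2": 0.8, "cat1": 0.7}
sub_rel = {
    "car": {"car1": 0.9, "car2": 0.9},
    "animal": {"cat1": 0.9},
}
importance = {"car": 0.5, "animal": 0.5}

diversified = xquad_rerank(rel, sub_rel, importance, k=3, lam=0.5)
relevance_only = xquad_rerank(rel, sub_rel, importance, k=3, lam=0.0)
print(diversified)
print(relevance_only)
```

With this toy data, `lam=0.5` promotes `cat1` above `car2` because the "car" sub-query is already largely covered once `car1` is selected, whereas `lam=0.0` reproduces the plain relevance ordering. Setting unequal values in `importance` corresponds to the abstract's point about estimating the relative importance of each sub-query.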

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos, R.L.T., Peng, J., Macdonald, C., Ounis, I. (2010). Explicit Search Result Diversification through Sub-queries. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_11

  • DOI: https://doi.org/10.1007/978-3-642-12275-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science (R0)
