Journal of Computer Science and Technology, Volume 23, Issue 4, pp 590–601

Query Performance Prediction for Information Retrieval Based on Covering Topic Score

  • Hao Lang
  • Bin Wang
  • Gareth Jones
  • Jin-Tao Li
  • Fan Ding
  • Yi-Xuan Liu
Regular Paper

Abstract

We present a statistical method called Covering Topic Score (CTS) for predicting query performance in information retrieval. The estimate is based on how well the topic of a user’s query is covered by the documents retrieved by a given retrieval system. The approach is conceptually simple and intuitive, and it extends easily to features beyond bag-of-words, such as phrases and term proximity. Experiments show that CTS correlates significantly with query performance on a variety of TREC test collections, and that its predictive power increases when phrase and term-proximity features are added. We compare CTS with previous state-of-the-art methods for query performance prediction, including clarity score and robustness score. Our experimental results show that CTS consistently performs better than, or at least as well as, these methods. In addition to being highly effective, CTS has very low computational complexity, which can make it practical for real applications.
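
The abstract does not give the CTS formula, so the following is only a minimal sketch of the general idea it describes: score a query by how well its terms are covered by the top retrieved documents, then check how that score correlates with retrieval quality across queries. The functions `coverage_score` and `kendall_tau`, the uniform default term weights, and the whitespace tokenisation are all assumptions made for illustration and should not be read as the authors' method.

```python
def coverage_score(query_terms, retrieved_docs, term_weights=None):
    """Toy coverage score (hypothetical, NOT the paper's CTS formula).

    For each retrieved document, compute the weighted fraction of query
    terms it contains, then average over the documents.
    """
    terms = {t.lower() for t in query_terms}
    if not terms or not retrieved_docs:
        return 0.0
    # Assumption: every query term has weight 1.0 unless overridden.
    weights = {t: 1.0 for t in terms}
    if term_weights:
        weights.update({t.lower(): w for t, w in term_weights.items()
                        if t.lower() in terms})
    total = sum(weights.values())
    if total == 0:
        return 0.0
    per_doc = []
    for doc in retrieved_docs:
        doc_terms = set(doc.lower().split())  # naive whitespace tokenisation
        covered = sum(w for t, w in weights.items() if t in doc_terms)
        per_doc.append(covered / total)
    return sum(per_doc) / len(per_doc)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Two toy documents: the first covers both query terms, the second only one.
score = coverage_score(
    ["query", "performance"],
    ["query performance prediction", "query logs"],
)
print(score)  # 0.75
```

In an evaluation of this kind, one list would hold the predictor's score per query and the other a per-query effectiveness measure such as average precision; a rank correlation close to 1 would indicate that the predictor orders queries much as their actual performance does.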

Keywords

information storage and retrieval; information search and retrieval; query performance prediction; covering topic score

Supplementary material

11390_2008_9155_MOESM1_ESM.pdf (100 kb)

Copyright information

© Springer 2008

Authors and Affiliations

  • Hao Lang (1, email author)
  • Bin Wang (1)
  • Gareth Jones (2)
  • Jin-Tao Li (1)
  • Fan Ding (1)
  • Yi-Xuan Liu (1)

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. School of Computing, Dublin City University, Dublin, Ireland
