The Combination and Evaluation of Query Performance Prediction Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5478))

Abstract

In this paper, we examine a number of newly applied methods for combining pre-retrieval query performance predictors in order to obtain a better prediction of the query’s performance. However, to compare such techniques adequately, we critically examine the current evaluation methodology and show that linear correlation coefficients (i) do not provide an intuitive measure of a method’s quality, (ii) can give a misleading indication of performance, and (iii) overstate the performance of combined methods. To address this, we extend the current evaluation methodology to include cross validation, report a more intuitive and descriptive statistic, and apply statistical testing to determine significant differences. In a comprehensive empirical study over several TREC collections, we evaluate nineteen pre-retrieval predictors and three combination methods.
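The evaluation concern raised in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the query scores are synthetic, ordinary least squares stands in for a combination method, root mean squared error (RMSE) is assumed as one example of a more descriptive statistic, and all function names are hypothetical. The sketch fits a combination of nineteen predictors on a small query set, then compares the in-sample linear correlation with the error measured on held-out queries under k-fold cross validation.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares weights (intercept first) for y ~ [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    """Apply fitted weights to a matrix of predictor scores."""
    return np.column_stack([np.ones(len(X)), X]) @ w

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target measure."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def cross_validated_rmse(X, y, k=5, seed=0):
    """Mean held-out RMSE over k folds of the query set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ols(X[train], y[train])
        errors.append(rmse(y[test], predict(w, X[test])))
    return float(np.mean(errors))

# Synthetic stand-in data (assumption): 50 queries, 19 predictor scores each,
# with only the first predictor carrying any signal about effectiveness.
rng = np.random.default_rng(42)
n_queries, n_predictors = 50, 19
X = rng.normal(size=(n_queries, n_predictors))
y = 0.3 * X[:, 0] + rng.normal(scale=0.5, size=n_queries)

# In-sample view: fit and evaluate on the same queries.
w = fit_ols(X, y)
train_r = float(np.corrcoef(y, predict(w, X))[0, 1])
train_rmse = rmse(y, predict(w, X))

# Held-out view: every query is predicted by a model that never saw it.
cv_rmse = cross_validated_rmse(X, y)

print(f"in-sample r = {train_r:.2f}, in-sample RMSE = {train_rmse:.2f}, "
      f"cross-validated RMSE = {cv_rmse:.2f}")
```

With many predictors and few queries, the in-sample fit tends to look better than the held-out error, which is the kind of overstatement the abstract attributes to evaluating combined methods by correlation alone.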


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hauff, C., Azzopardi, L., Hiemstra, D. (2009). The Combination and Evaluation of Query Performance Prediction Methods. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_28

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
