Measuring the Variability in Effectiveness of a Retrieval System

Hosseini, Mehdi; Cox, Ingemar J.; Millic-Frayling, Natasa; Vinay, Vishwa

doi:10.1007/978-3-642-13084-7_7


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6107)

Included in the following conference series:

Information Retrieval Facility Conference


Abstract

A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g. average precision, for each topic of a test collection and then using the average of the metric, e.g. mean average precision, to express the overall effectiveness. However, averages do not capture all the important aspects of effectiveness and, used alone, may not be an informative measure of systems’ effectiveness. Indeed, in addition to the average, we need to consider the variation of effectiveness across topics. We refer to this variation as the variability in effectiveness. In this paper we explore how the variance of a metric can be used as a measure of variability. We define a variability metric, and illustrate how the metric can be used in practice.
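The abstract's central point, that a mean can hide how effectiveness varies across topics, can be illustrated with a minimal Python sketch. It assumes binary relevance judgments and non-interpolated average precision, and it uses the sample variance of per-topic AP as a simple stand-in for a variability measure. It is not the variability metric defined in the paper. The two systems and the topic ids `t1` to `t3` are hypothetical toy data.

```python
from statistics import mean, variance


def average_precision(ranked_relevance, num_relevant):
    """Non-interpolated average precision for a single topic.

    ranked_relevance: 0/1 relevance judgments in rank order.
    num_relevant: total number of relevant documents for the topic.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / num_relevant


def summarize(runs):
    """Return per-topic AP, their mean (MAP) and their sample variance.

    runs maps a topic id to (ranked_relevance, num_relevant).
    statistics.variance needs at least two topics.
    """
    ap = {topic: average_precision(rels, n) for topic, (rels, n) in runs.items()}
    scores = list(ap.values())
    return ap, mean(scores), variance(scores)


# Hypothetical runs: both systems reach the same MAP,
# but system B is far less consistent across topics.
system_a = {"t1": ([0, 1], 1), "t2": ([0, 1], 1), "t3": ([0, 1], 1)}
system_b = {"t1": ([1, 0], 1), "t2": ([0, 1], 1), "t3": ([0, 0], 1)}

for name, runs in {"A": system_a, "B": system_b}.items():
    _, map_score, var = summarize(runs)
    print(f"system {name}: MAP={map_score:.3f}, variance={var:.3f}")
# system A: MAP=0.500, variance=0.000
# system B: MAP=0.500, variance=0.250
```

Judged by mean average precision alone, the two systems are indistinguishable; only the variance term separates the consistent system from the erratic one. Another dispersion statistic, such as the standard deviation, could be substituted in `summarize` without changing the rest of the sketch.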

Author information

Authors and Affiliations

Computer Science Department, University College London
Mehdi Hosseini & Ingemar J. Cox
Microsoft Research Cambridge
Natasa Millic-Frayling & Vishwa Vinay

Editor information

Editors and Affiliations

Dept. of Computer Science, University of Sheffield, Regent Court, 211 Portobello St., S1 4DP, Sheffield, UK
Hamish Cunningham
Information Retrieval Facility, Operngasse 20b, 1040, Vienna, Austria
Allan Hanbury
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger

About this paper

Cite this paper

Hosseini, M., Cox, I.J., Millic-Frayling, N., Vinay, V. (2010). Measuring the Variability in Effectiveness of a Retrieval System. In: Cunningham, H., Hanbury, A., Rüger, S. (eds) Advances in Multidisciplinary Retrieval. IRFC 2010. Lecture Notes in Computer Science, vol 6107. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13084-7_7

DOI: https://doi.org/10.1007/978-3-642-13084-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13083-0
Online ISBN: 978-3-642-13084-7
eBook Packages: Computer Science, Computer Science (R0)
