Abstract
A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g. average precision, for each topic of a test collection and then using the average of the metric, e.g. mean average precision, to express the overall effectiveness. However, averages do not capture all the important aspects of effectiveness and, used alone, may not be an informative measure of systems’ effectiveness. Indeed, in addition to the average, we need to consider the variation of effectiveness across topics. We refer to this variation as the variability in effectiveness. In this paper we explore how the variance of a metric can be used as a measure of variability. We define a variability metric, and illustrate how the metric can be used in practice.
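To make the distinction between an average and its variability concrete, here is a minimal sketch. The abstract does not spell out the authors' variability metric, so this sketch does not reproduce it. It uses the plain sample variance of per-topic average precision as a stand-in. The helper names (`average_precision`, `mean_and_variance`) and the two systems' judged rankings are hypothetical, chosen only so that both systems share the same MAP.

```python
from statistics import mean, variance


def average_precision(ranking, num_relevant):
    """Average precision for one topic (hypothetical helper).

    ranking: 0/1 relevance judgements in rank order (1 = relevant).
    num_relevant: total relevant documents for the topic, including
    any the system failed to retrieve.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / num_relevant


def mean_and_variance(per_topic_scores):
    """Mean (MAP when the scores are AP) and sample variance across topics."""
    return mean(per_topic_scores), variance(per_topic_scores)


# Hypothetical judged rankings: one (ranking, num_relevant) pair per topic.
system_a = [([0, 1], 1)] * 4                   # every topic: AP = 0.5
system_b = [([1], 1), ([0, 0, 0, 0], 1)] * 2   # topics alternate AP 1.0 / 0.0

for name, runs in [("A", system_a), ("B", system_b)]:
    ap_scores = [average_precision(r, n) for r, n in runs]
    m, v = mean_and_variance(ap_scores)
    print(f"system {name}: MAP={m:.3f} variance={v:.3f}")
```

Both systems report a MAP of 0.500. However, system A performs identically on every topic (variance 0.000), whereas system B swings between perfect and zero effectiveness (variance 0.333). This is the kind of difference the abstract argues an average alone cannot reveal.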
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hosseini, M., Cox, I.J., Milic-Frayling, N., Vinay, V. (2010). Measuring the Variability in Effectiveness of a Retrieval System. In: Cunningham, H., Hanbury, A., Rüger, S. (eds) Advances in Multidisciplinary Retrieval. IRFC 2010. Lecture Notes in Computer Science, vol 6107. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13084-7_7
DOI: https://doi.org/10.1007/978-3-642-13084-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13083-0
Online ISBN: 978-3-642-13084-7
eBook Packages: Computer Science, Computer Science (R0)