Abstract
Evaluation is a central concern in information retrieval (including web search), and the choice of evaluation metric must be considered carefully. In this paper, we propose a new method for measuring the stability and discrimination power of a metric, a problem first investigated by Buckley and Voorhees. The advantage of the proposed method is that both aspects can be measured together in a systematic manner. Five metrics are tested in the study: average precision over all relevant documents, recall-level precision, normalized discounted cumulative gain, precision at the 10-document level, and reciprocal rank. Experimental results show that normalized discounted cumulative gain performs best, followed by average precision over all relevant documents, recall-level precision, and precision at the 10-document level, while reciprocal rank performs worst.
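The abstract names five metrics; the sketch below illustrates one common way of computing them for a single ranked list. It is not the authors' implementation: it assumes binary relevance for all metrics except nDCG (which uses graded relevance and the usual log2(rank + 1) discount), and the function names and parameters (`ranking`, `ideal_ranking`, `total_relevant`) are illustrative.

```python
import math

def average_precision(ranking, total_relevant):
    """Average precision over all relevant documents (binary relevance).

    `ranking` lists relevance values in rank order; `total_relevant` is the
    number of relevant documents in the collection for this topic.
    """
    hits, precisions = 0, []
    for i, rel in enumerate(ranking, start=1):
        if rel > 0:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / total_relevant if total_relevant else 0.0

def r_precision(ranking, total_relevant):
    """Recall-level precision: precision at rank R = total_relevant."""
    if total_relevant == 0:
        return 0.0
    return sum(1 for rel in ranking[:total_relevant] if rel > 0) / total_relevant

def ndcg(ranking, ideal_ranking, k=None):
    """Normalized discounted cumulative gain with the log2(rank + 1) discount."""
    def dcg(rels):
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))
    k = k or len(ranking)
    ideal = dcg(sorted(ideal_ranking, reverse=True)[:k])
    return dcg(ranking[:k]) / ideal if ideal > 0 else 0.0

def precision_at_10(ranking):
    """Precision at the 10-document cutoff."""
    return sum(1 for rel in ranking[:10] if rel > 0) / 10

def reciprocal_rank(ranking):
    """Reciprocal of the rank of the first relevant document."""
    for i, rel in enumerate(ranking, start=1):
        if rel > 0:
            return 1 / i
    return 0.0
```

A study of metric stability and discrimination power would typically compute such per-topic scores for many systems and topics and then compare how consistently the metrics rank the systems; the details of that comparison are specific to the paper's proposed method.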
References
Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: Proceedings of ACM SIGIR Conference, Athens, Greece, pp. 33–40 (July 2000)
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4), 422–446 (2002)
Lin, W., Hauptmann, A.: Revisiting the effect of topic set size on retrieval error. In: Proceedings of ACM SIGIR Conference, Salvador, Brazil, pp. 637–638 (August 2005)
Robertson, S., Kanoulas, E.: On per-topic variance in IR evaluation. In: Proceedings of ACM SIGIR Conference, Portland, USA, pp. 891–900 (August 2012)
Sakai, T.: Evaluating evaluation metrics based on the bootstrap. In: Proceedings of ACM SIGIR Conference, Seattle, USA, pp. 525–532 (August 2006)
Sakai, T.: On the reliability of information retrieval metrics based on graded relevance. Information Processing & Management 43(2), 531–548 (2007)
Voorhees, E.M., Buckley, C.: The effect of topic set size on retrieval experiment error. In: Proceedings of ACM SIGIR Conference, Tampere, Finland, pp. 316–323 (August 2002)
Zhou, K., Cummins, R., Lalmas, M., Jose, J.: Evaluating aggregated search pages. In: Proceedings of ACM SIGIR Conference, Portland, USA, pp. 115–124 (August 2012)
Zobel, J.: How reliable are the results of large-scale information retrieval experiments? In: Proceedings of ACM SIGIR Conference, Melbourne, Australia, pp. 307–314 (August 1998)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Shi, H., Tan, Y., Zhu, X., Wu, S. (2013). Measuring Stability and Discrimination Power of Metrics in Information Retrieval Evaluation. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_2
Print ISBN: 978-3-642-41277-6
Online ISBN: 978-3-642-41278-3