Abstract
This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections, with the runs submitted to the Chinese IR and English IR tasks in the NTCIR-3 CLIR track, to examine the metrics using the methods proposed by Buckley and Voorhees and by Voorhees and Buckley, as well as Kendall's rank correlation. Our results show that AnDCG_l and nDCG_l ((Average) Normalised Discounted Cumulative Gain at document cut-off l) are good metrics, provided that l is large. However, if one wants to avoid the parameter l altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.
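To make the two headline metrics concrete, the following is a minimal Python sketch of nDCG at document cut-off l (Järvelin and Kekäläinen) and of Q-measure (Sakai) in its unparameterised blended-ratio form. The gain values, the log-base-2 discount, and all function names are illustrative assumptions; the paper's exact definitions (e.g. the discount base b in the original nDCG) may differ.

```python
import math

def ndcg_at_l(gains, ideal_gains, l):
    """nDCG at document cut-off l, using a common log2 discount.
    gains: gain of the document at each rank of the system's run.
    ideal_gains: gains of all relevant documents, best first."""
    def dcg(gs):
        # Ranks 1 and 2 receive no discount under a log2 discount.
        return sum(g / max(1.0, math.log2(r))
                   for r, g in enumerate(gs[:l], start=1))
    return dcg(gains) / dcg(ideal_gains)

def q_measure(gains, ideal_gains, R):
    """Q-measure (unparameterised form): the mean, over the R relevant
    documents, of the blended ratio (count(r) + cg(r)) / (r + cgI(r)),
    where cg is cumulative gain and cgI its ideal counterpart."""
    cg = cgi = count = total = 0.0
    for r, g in enumerate(gains, start=1):
        cg += g
        # cgI plateaus once the ideal ranked list is exhausted.
        cgi += ideal_gains[r - 1] if r <= len(ideal_gains) else 0.0
        if g > 0:                       # rank r holds a relevant document
            count += 1
            total += (count + cg) / (r + cgi)
    return total / R

# Example: gain 3 = highly relevant, ..., 0 = nonrelevant.
gains = [3, 0, 2, 0, 1]
ideal = [3, 2, 1]                       # all relevant documents, best first
print(ndcg_at_l(gains, ideal, l=5))    # nDCG at cut-off 5
print(q_measure(gains, ideal, R=3))    # Q-measure
```

Note how Q-measure, like Average Precision, averages a precision-style ratio over the relevant documents only, which is why it needs no cut-off parameter l.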
References
Buckley, C., Voorhees, E.M.: Evaluating Evaluation Measure Stability. In: ACM SIGIR 2000 Proceedings, pp. 33–40 (2000)
Chen, K.-H., et al.: Overview of CLIR Task at the Third NTCIR Workshop. In: NTCIR-3 Proceedings (2003)
Della Mea, V., Mizzaro, S.: Measuring Retrieval Effectiveness: A New Proposal and a First Experimental Validation. Journal of the American Society for Information Science and Technology 55(6), 530–543 (2004)
Järvelin, K., Kekäläinen, J.: Cumulated Gain-Based Evaluation of IR Techniques. ACM Transactions on Information Systems 20(4), 422–446 (2002)
Kekäläinen, J.: Binary and Graded Relevance in IR Evaluations – Comparison of the Effects on Ranking of IR Systems. Information Processing and Management 41, 1019–1033 (2005)
Sakai, T.: Average Gain Ratio: A Simple Retrieval Performance Measure for Evaluation with Multiple Relevance Levels. In: ACM SIGIR 2003 Proceedings, pp. 417–418 (2003)
Sakai, T.: New Performance Metrics based on Multigrade Relevance: Their Application to Question Answering. In: NTCIR-4 Proceedings (2004), http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings/OPEN/NTCIR4-OPEN-SakaiTrev.pdf
Sakai, T.: Ranking the NTCIR Systems based on Multigrade Relevance. In: Myaeng, S.-H., Zhou, M., Wong, K.-F., Zhang, H.-J. (eds.) AIRS 2004. LNCS, vol. 3411, pp. 251–262. Springer, Heidelberg (2005)
Sakai, T.: A Note on the Reliability of Japanese Question Answering Evaluation. IPSJ SIG Technical Reports FI-77-7, 57–64 (2004)
Sakai, T.: The Effect of Topic Sampling in Sensitivity Comparisons of Information Retrieval Metrics. IPSJ SIG Technical Reports FI-80/NL-169 (2005) (to appear)
Soboroff, I., Voorhees, E.: Private communication (2005)
Voorhees, E.M., Buckley, C.: The Effect of Topic Set Size on Retrieval Experiment Error. In: ACM SIGIR 2002 Proceedings, pp. 316–323 (2002)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Sakai, T. (2005). The Reliability of Metrics Based on Graded Relevance. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds.) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol. 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_1
DOI: https://doi.org/10.1007/11562382_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2