Abstract
In modern large information retrieval (IR) environments, the number of documents relevant to a request may easily exceed the number of documents a user is willing to examine. It is therefore desirable to rank highly relevant documents first in search results, and developing retrieval methods for this purpose requires evaluating them accordingly. However, most IR method evaluations are based on rather liberal, binary relevance assessments, so differences between sloppy and excellent IR methods may go unobserved. An alternative is to employ graded relevance assessments in evaluation. The present paper discusses graded relevance, test collections providing graded assessments, and evaluation metrics based on graded relevance assessments. We also examine the effects of using graded relevance assessments in retrieval evaluation and review some evaluation results based on graded relevance. We find that graded relevance provides new insight into IR phenomena and affects the relative merits of IR methods.
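The evaluation metrics the abstract refers to include the cumulated-gain family, notably discounted cumulated gain (DCG) and its normalized form (nDCG). As a minimal sketch, assuming a four-point relevance scale (0–3) and a logarithm base b = 2 for the rank discount, the idea can be illustrated as follows; the gain values and example ranking here are invented for illustration, not taken from the paper.

```python
import math

def dcg(gains, b=2):
    """Discounted cumulated gain: the gain at rank i is added
    undiscounted for ranks i < b, and divided by log_b(i) otherwise,
    so highly relevant documents found late contribute less."""
    total = 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
    return total

def ndcg(gains, ideal_gains, b=2):
    """Normalize by the DCG of the ideal (best possible) ranking,
    yielding a score in [0, 1]."""
    return dcg(gains, b) / dcg(ideal_gains, b)

# Graded gains (0-3) of a ranked result list, best first in the ideal case:
run = [3, 0, 2, 1, 0]
ideal = sorted(run, reverse=True)   # [3, 2, 1, 0, 0]
print(round(ndcg(run, ideal), 3))   # prints 0.846
```

Under binary assessments the two middle documents (grades 2 and 1) would be indistinguishable from the top one; the graded gains make the ranking quality difference visible in the score.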
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Järvelin, K. (2013). Test Collections and Evaluation Metrics Based on Graded Relevance. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_27
Print ISBN: 978-3-642-40086-5
Online ISBN: 978-3-642-40087-2