
Test Collections and Evaluation Metrics Based on Graded Relevance

  • Conference paper
Multilingual Information Access in South Asian Languages

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7536)

Abstract

In modern large information retrieval (IR) environments, the number of documents relevant to a request may easily exceed the number of documents a user is willing to examine. It is therefore desirable to rank highly relevant documents first in search results, and developing retrieval methods for this purpose requires evaluating them accordingly. However, most IR evaluations are based on rather liberal, binary relevance assessments, so differences between sloppy and excellent IR methods may go unobserved. An alternative is to employ graded relevance assessments in evaluation. The present paper discusses graded relevance, test collections providing graded assessments, and evaluation metrics based on graded relevance assessments. We also examine the effects of using graded relevance assessments in retrieval evaluation, together with some evaluation results based on graded relevance. We find that graded relevance provides new insight into IR phenomena and affects the relative merits of IR methods.
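As a concrete illustration of the kind of graded-relevance metric the paper discusses, the sketch below computes discounted cumulated gain (DCG) and its normalized form (nDCG) in the style of Järvelin and Kekäläinen's cumulated-gain evaluation: gains beyond rank b are discounted by log_b(rank). The 0–3 relevance grades and the log base b = 2 are illustrative assumptions, not values taken from the chapter.

```python
import math

def dcg(grades, b=2):
    # Discounted cumulated gain: the gain at rank i is taken as-is
    # for i < b and divided by log_b(i) for i >= b, so documents
    # ranked late contribute less.
    total = 0.0
    for i, g in enumerate(grades, start=1):
        total += g if i < b else g / math.log(i, b)
    return total

def ndcg(grades, b=2):
    # Normalize by the DCG of the ideal ranking, i.e. the same
    # grades sorted in descending order.
    best = dcg(sorted(grades, reverse=True), b)
    return dcg(grades, b) / best if best > 0 else 0.0

# Graded assessments (0 = not relevant .. 3 = highly relevant)
# of a run's top-5 results, illustrative values:
run = [3, 0, 2, 1, 0]
print(round(dcg(run), 3))   # 4.762
print(round(ndcg(run), 3))  # 0.846
```

Under binary assessments the grades 1–3 would collapse to 1 and the two runs `[3, 0, 2, 1, 0]` and `[1, 0, 3, 2, 0]` would score identically; the graded metric separates them, which is the point the paper makes about liberal binary judging.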






Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Järvelin, K. (2013). Test Collections and Evaluation Metrics Based on Graded Relevance. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_27


  • DOI: https://doi.org/10.1007/978-3-642-40087-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40086-5

  • Online ISBN: 978-3-642-40087-2

  • eBook Packages: Computer Science (R0)
