Abstract
In this chapter, we review our experiences with the relevance judging process at ImageCLEF, using the medical retrieval task as the primary example. We begin with a historical perspective on the Cranfield paradigm, the precursor after which most modern system-based evaluation campaigns, including ImageCLEF, are modeled. We then briefly describe the stages of an evaluation campaign and detail the different aspects of the relevance judgment process. We summarize how judges are recruited and describe the various judgment systems used at ImageCLEF. We discuss the advantages and limitations of creating pools that are then judged by human experts. Finally, we discuss our experiences with the subjectivity of relevance judging and the relative robustness of the performance measures to variability in the judgments.
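As a brief illustration of the pooling approach mentioned in the abstract, the following minimal sketch shows how a judgment pool might be assembled from the top-ranked documents of several submitted runs, so that human experts judge only the pooled union rather than the whole collection. The run structure, pool depth, and function name are illustrative assumptions, not the actual ImageCLEF tooling.

    def build_pool(runs, depth=50):
        """Merge the top-`depth` documents of each run into one judgment
        pool. A set is used so that a document retrieved by several runs
        is judged only once. Both the input structure (run id -> ranked
        list of document ids) and the default depth are assumptions."""
        pool = set()
        for ranked_docs in runs.values():
            pool.update(ranked_docs[:depth])
        return pool

    # Example with three hypothetical runs for a single topic.
    runs = {
        "run_visual":  ["img12", "img07", "img33", "img02"],
        "run_textual": ["img07", "img19", "img12", "img44"],
        "run_mixed":   ["img33", "img12", "img55", "img07"],
    }

    # Only the union of the top-ranked images goes to the judges.
    print(sorted(build_pool(runs, depth=3)))

Documents outside the pool are conventionally treated as non-relevant, which is the source of the pooling bias discussed in the chapter.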
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kalpathy-Cramer, J., Bedrick, S., Hersh, W. (2010). Relevance Judgments for Image Retrieval Evaluation. In: Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds) ImageCLEF. The Information Retrieval Series, vol 32. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15181-1_4
DOI: https://doi.org/10.1007/978-3-642-15181-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15180-4
Online ISBN: 978-3-642-15181-1