
Relevance Judgments for Image Retrieval Evaluation

Chapter in: ImageCLEF

Part of the book series: The Information Retrieval Series (INRE, volume 32)

Abstract

In this chapter, we review our experiences with the relevance judging process at ImageCLEF, using the medical retrieval task as a primary example. We begin with a historical perspective on the Cranfield paradigm, the precursor to most modern system-based retrieval evaluation campaigns, including ImageCLEF, which are modeled after it. We then briefly describe the stages of an evaluation campaign and provide details of the different aspects of the relevance judgment process. We summarize the recruitment of judges and describe the various systems used for judging at ImageCLEF. We discuss the advantages and limitations of creating pools that are then judged by human experts. Finally, we discuss our experiences with the subjectivity of relevance judging and the relative robustness of the performance measures to variability in the judgments.
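
The pooling mentioned in the abstract can be summarized in a few lines: the top-ranked results of every submitted run are merged, per topic, into a pool that human experts then judge. The Python sketch below is a minimal, hypothetical illustration of that idea; the run format, identifiers, and pool depth are assumptions made for the example, not the actual ImageCLEF configuration.

    from collections import defaultdict

    def build_pools(runs, depth=50):
        # runs: run_id -> {topic_id: ranked list of image ids}
        # Returns topic_id -> set of image ids to be judged by the assessors.
        pools = defaultdict(set)
        for ranked_by_topic in runs.values():
            for topic_id, ranking in ranked_by_topic.items():
                # Union of the top `depth` results of every submitted run.
                pools[topic_id].update(ranking[:depth])
        return pools

    # Hypothetical usage: two submitted runs, pool depth of 3.
    runs = {
        "run_A": {"topic_1": ["img_12", "img_7", "img_3", "img_44"]},
        "run_B": {"topic_1": ["img_7", "img_9", "img_3", "img_51"]},
    }
    print(build_pools(runs, depth=3))
    # -> {'topic_1': {'img_12', 'img_7', 'img_3', 'img_9'}}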

Author information

Correspondence to Jayashree Kalpathy-Cramer.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kalpathy-Cramer, J., Bedrick, S., Hersh, W. (2010). Relevance Judgments for Image Retrieval Evaluation. In: Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds) ImageCLEF. The Information Retrieval Series, vol 32. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15181-1_4

  • DOI: https://doi.org/10.1007/978-3-642-15181-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15180-4

  • Online ISBN: 978-3-642-15181-1
