Multimedia Tools and Applications, Volume 21, Issue 1, pp 55–73

A Framework for Benchmarking in CBIR

  • Henning Müller
  • Wolfgang Müller
  • Stéphane Marchand-Maillet
  • Thierry Pun
  • David McG. Squire

Abstract

Content-based image retrieval (CBIR) has been a very active research area for more than ten years, and in recent years the number of publications and retrieval systems produced has grown steadily. Despite this, there is still no agreed, objective way to compare the performance of any two of these systems. This is blocking the further development of the field, since good or promising techniques cannot be identified objectively, and it hinders the potential commercial success of CBIR systems, because the quality of an application is hard to establish.

We are thus in the position in which other research areas, such as text retrieval and database systems, found themselves several years ago. For serious applications, as well as commercial success, objective proof of system quality is needed: in text retrieval the TREC benchmark is a widely accepted performance measure; in the transaction-processing field for databases, the TPC benchmark enjoys wide support.

This paper describes a framework that enables the creation of a benchmark for CBIR. Parts of this framework have already been developed, and systems can be evaluated against a small, freely available database via a web interface. Much work remains in making large, diverse image databases available and in obtaining relevance judgments for those large databases. We also need to establish an independent body, accepted by the entire community, to organize a benchmarking event, publish official results, and update the benchmark regularly. The Benchathlon could take on this role if it manages to gain the confidence of the field. This should also prevent the negative effects, e.g., “benchmarketing”, experienced with other benchmarks, such as the predecessors of the TPC.
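At the core of any such benchmark is scoring a system's ranked result list against the relevance judgments for a query. A minimal sketch of two standard measures used for this purpose, precision/recall at a cutoff and average precision, is shown below; the function names and example data are hypothetical illustrations, not part of the framework described here.

```python
# Sketch of rank-based evaluation against relevance judgments.
# All names and data are hypothetical illustrations.

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall computed over the top-k retrieved images."""
    relevant = set(relevant_ids)
    top_k = ranked_ids[:k]
    hits = sum(1 for img in top_k if img in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant hit."""
    relevant = set(relevant_ids)
    hits, total = 0, 0.0
    for rank, img in enumerate(ranked_ids, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical system ranking scored against a judged-relevant set:
ranking = ["img3", "img7", "img1", "img9", "img4"]
relevant = ["img1", "img3", "img5"]
p, r = precision_recall_at_k(ranking, relevant, 3)
ap = average_precision(ranking, relevant)
```

Averaging such per-query scores over a fixed query set with shared relevance judgments is what allows two systems to be compared on the same grounds.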

This paper sets out our ideas for an open framework for performance evaluation. We hope to stimulate discussion on evaluation in image retrieval so that systems can be compared on the same grounds. We also identify query paradigms beyond query by example (QBE) that may be integrated into a benchmarking framework, and we give examples of application-based benchmarking areas.

Keywords: evaluation, content-based image retrieval, benchmarking, Benchathlon, TREC



Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Henning Müller¹
  • Wolfgang Müller¹
  • Stéphane Marchand-Maillet¹
  • Thierry Pun¹
  • David McG. Squire²

  1. Vision Group, University of Geneva, Switzerland
  2. CSSE, Monash University, Melbourne, Australia
