Abstract
Content-based image retrieval (CBIR) has been an active research area for more than ten years. In the last few years, the number of publications and retrieval systems has grown steadily. Despite this, there is still no agreed, objective way to compare the performance of any two of these systems. This is blocking further development of the field, because good or promising techniques cannot be identified objectively. It also hinders the commercial success of CBIR systems, because the quality of an application is hard to establish.
We are thus in the position that other research areas, such as text retrieval or database systems, found themselves in several years ago. Serious applications and commercial success require objective proof of system quality. In text retrieval, the TREC benchmark is a widely accepted performance measure; in database transaction processing, the TPC benchmark has wide support.
This paper describes a framework that enables the creation of a benchmark for CBIR. Parts of this framework have already been developed, and systems can be evaluated against a small, freely available database via a web interface. Much work remains to be done in making large, diverse image databases available and in obtaining relevance judgments for them. We also need to establish an independent body, accepted by the entire community, that would organize a benchmarking event, publish official results and update the benchmark regularly. The Benchathlon could take on this role if it gains the confidence of the field. Such a body should also prevent the negative effects experienced with other benchmarks, such as the “benchmarketing” that affected the predecessors of TPC.
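Relevance judgments are what turn a ranked result list into a comparable score. The following is a minimal sketch under these assumptions: the judgments for one query are stored as a set of relevant image identifiers, and each system returns a ranked list of unique identifiers. The sketch computes precision and recall over the top N results and non-interpolated average precision, which are TREC-style measures. The function names and data layout are hypothetical illustrations, not the interface of the benchmark described here.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_at(ranked: Sequence[str], relevant: Set[str], n: int) -> Tuple[float, float]:
    """Precision and recall over the top-n results of a single query.

    Assumes identifiers in `ranked` are unique (hypothetical data layout).
    Precision divides by n even if fewer than n results were returned.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for image_id in ranked[:n] if image_id in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for a single query.

    Precision is taken at the rank of each relevant image retrieved;
    relevant images that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of per-query average precision over (ranked, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical query: four relevant images, three of them retrieved.
    ranked = ["a", "x", "b", "y", "c"]
    relevant = {"a", "b", "c", "d"}
    print(precision_recall_at(ranked, relevant, 3))
    print(average_precision(ranked, relevant))
```

Because every system is scored against the same judgments and the same measures, results from different systems become directly comparable. This is the property a shared benchmark needs.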
This paper sets out our ideas for an open framework for performance evaluation. We hope to stimulate discussion on evaluation in image retrieval so that systems can be compared on the same grounds. We also identify query paradigms beyond query by example (QBE) that may be integrated into a benchmarking framework, and we give examples of application-based benchmarking areas.
Cite this article
Müller, H., Müller, W., Marchand-Maillet, S. et al. A Framework for Benchmarking in CBIR. Multimedia Tools and Applications 21, 55–73 (2003). https://doi.org/10.1023/A:1025034215859