
Descriptor optimization for multimedia indexing and retrieval

Multimedia Tools and Applications

Abstract

In this paper, we propose and evaluate a method for optimizing descriptors used for content-based multimedia indexing and retrieval. A large variety of descriptors is commonly used for this purpose. However, the most efficient ones often have characteristics that prevent them from being easily used in large-scale systems: they may have very high dimensionality (up to tens of thousands of dimensions) and/or be suited to a distance that is costly to compute (e.g. χ²). The proposed method combines a PCA-based dimensionality reduction with pre- and post-PCA non-linear transformations, and the resulting transformation is globally optimized. The produced descriptors have a much lower dimensionality and, with the Euclidean distance, perform at least as well as, and often significantly better than, the original high-dimensional descriptors with their optimal distance. Our approach also includes a hyper-parameter optimization procedure based on a fast kNN classifier and on a polynomial fit that overcomes the instability of the MAP (mean average precision) metric. The method has been validated and evaluated on a variety of descriptors using the TRECVid 2010 semantic indexing task data. It has been applied at large scale for the TRECVid 2012 semantic indexing task on tens of descriptors of various types, with initial dimensionalities ranging from 15 up to 32,768. The same transformation can also be used for multimedia retrieval in the context of query by example and/or relevance feedback.
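The descriptor transformation described in the abstract (a non-linear transform, then PCA reduction, then another non-linear transform, with outputs compared by Euclidean distance) can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' implementation: both non-linear transforms are assumed to be element-wise signed power functions with exponents `alpha` (pre-PCA) and `beta` (post-PCA), and PCA is computed by SVD of the centered training descriptors. The function names and the default values of `alpha`, `beta` and `n_components` are hypothetical; in the method described, such values would be hyper-parameters chosen by the optimization procedure.

```python
import numpy as np


def signed_power(x, exponent):
    """Element-wise signed power: sign(x) * |x|**exponent (keeps the sign of negative values)."""
    return np.sign(x) * np.abs(x) ** exponent


def fit_descriptor_transform(X, alpha=0.5, n_components=64, beta=0.5):
    """Fit an assumed power -> PCA -> power transform on training descriptors X (n_samples x d).

    alpha, beta and n_components are hypothetical hyper-parameter defaults.
    """
    Xp = signed_power(X, alpha)                  # pre-PCA non-linear transform (assumed form)
    mean = Xp.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are principal axes,
    # sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xp - mean, full_matrices=False)
    return {"alpha": alpha, "beta": beta, "mean": mean,
            "components": Vt[:n_components]}


def apply_descriptor_transform(model, X):
    """Map descriptors X to the reduced space; outputs are meant to be compared with Euclidean distance."""
    Xp = signed_power(X, model["alpha"])
    Z = (Xp - model["mean"]) @ model["components"].T   # project onto the retained axes
    return signed_power(Z, model["beta"])             # post-PCA non-linear transform (assumed form)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.random((500, 1024))     # synthetic histogram-like descriptors
    model = fit_descriptor_transform(X_train, alpha=0.5, n_components=64, beta=0.5)
    Z = apply_descriptor_transform(model, X_train[:10])
    print(Z.shape)                        # (10, 64)
    print(np.linalg.norm(Z[0] - Z[1]))    # Euclidean distance in the reduced space
```

For nonnegative histogram descriptors, a square-root mapping (`alpha = 0.5`) is a common way to make the Euclidean distance behave more like histogram distances such as χ², which illustrates why a pre-PCA power transform can help for descriptors originally suited to χ².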




Notes

  1. Both “feature” and “descriptor” are often used to designate the data extracted from images and videos to abstract their content; “signature” and other terms are also sometimes used. In this paper, we use the term “descriptor” in all cases, whether the extracted data is local (e.g. SIFT point descriptors), intermediate, or global (e.g. global image statistics computed directly, or aggregations of local descriptors via the bag-of-visual-words or Fisher vector approaches).

  2. We call these “hyper-parameters” rather than simply “parameters” because we feel they are at the level of what are called hyper-parameters in the control of a classifier, e.g. C and γ in RBF SVMs, rather than at the level of the “regular” parameters in the same context, e.g. the α_i weights associated with the support vectors, even though our process has no regular parameters to which they could be contrasted.

  3. Grid’5000 is a scientific instrument designed to support experiment-driven research in all areas of computer science related to parallel, large-scale or distributed computing and networking, https://www.grid5000.fr
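Note 2 refers to the hyper-parameters of the process, which the abstract says are tuned with a fast kNN classifier and a polynomial fit that overcomes the instability of the MAP metric. The fitting step can be sketched as below, under assumptions that are ours rather than the authors': a single hyper-parameter is scanned on a grid, MAP has already been measured at each grid point, and a least-squares polynomial of hypothetical degree 2 is fitted and maximized on a dense grid. The function name `smoothed_best` and the synthetic scores are illustrative only.

```python
import numpy as np


def smoothed_best(values, scores, degree=2, n_grid=1001):
    """Return the hyper-parameter value that maximizes a polynomial fit of noisy scores.

    values: hyper-parameter settings that were evaluated (e.g. an exponent grid)
    scores: the MAP measured for each setting (e.g. with a fast kNN classifier)
    degree: polynomial degree (hypothetical default; not specified in the abstract)
    """
    values = np.asarray(values, dtype=float)
    coeffs = np.polyfit(values, scores, degree)          # least-squares polynomial fit
    dense = np.linspace(values.min(), values.max(), n_grid)
    fitted = np.polyval(coeffs, dense)                   # smoothed score curve
    return float(dense[np.argmax(fitted)])               # argmax of the smooth curve, not of raw MAP


if __name__ == "__main__":
    grid = np.linspace(0.2, 1.0, 9)                      # hypothetical exponent grid
    rng = np.random.default_rng(1)
    noisy_map = 0.30 - (grid - 0.6) ** 2 + rng.normal(0.0, 0.005, grid.size)  # synthetic scores
    print("raw argmax:     ", grid[np.argmax(noisy_map)])
    print("smoothed argmax:", smoothed_best(grid, noisy_map))
```

Choosing the maximum of the fitted curve rather than of the raw measurements makes the selection less sensitive to point-to-point fluctuations of MAP between neighboring settings.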


Acknowledgments

This work was partly carried out as part of the Quaero Program, funded by OSEO, the French State agency for innovation. This work was supported in part by the French ANR project VideoSense ANR-09-CORD-026. Experiments presented in this paper were carried out using the Grid’5000 experimental testbed, developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). The authors wish to thank the participants of the IRIM (Indexation et Recherche d’Information Multimédia) group of the GDR-ISIS research network from CNRS for providing the descriptors used in these experiments.

Author information


Correspondence to Georges Quénot.


About this article


Cite this article

Safadi, B., Derbas, N. & Quénot, G. Descriptor optimization for multimedia indexing and retrieval. Multimed Tools Appl 74, 1267–1290 (2015). https://doi.org/10.1007/s11042-014-2071-6


  • DOI: https://doi.org/10.1007/s11042-014-2071-6
