Descriptor optimization for multimedia indexing and retrieval

Safadi, Bahjat; Derbas, Nadia; Quénot, Georges

doi:10.1007/s11042-014-2071-6

Published: 17 May 2014

Volume 74, pages 1267–1290, (2015)

Multimedia Tools and Applications

Bahjat Safadi¹,
Nadia Derbas¹ &
Georges Quénot¹

Abstract

In this paper, we propose and evaluate a method for optimizing descriptors used for content-based multimedia indexing and retrieval. A large variety of descriptors are commonly used for this purpose. However, the most efficient ones often have characteristics that prevent them from being easily used in large-scale systems. They may have very high dimensionality (up to tens of thousands of dimensions) and/or be suited to a distance that is costly to compute (e.g. χ²). The proposed method combines a PCA-based dimensionality reduction with pre- and post-PCA non-linear transformations. The resulting transformation is globally optimized. The produced descriptors have a much lower dimensionality while performing at least as well, and often significantly better, with the Euclidean distance than the original high-dimensionality descriptors with their optimal distance. Our approach also includes a hyper-parameter optimization procedure based on the use of a fast kNN classifier and on a polynomial fit to overcome the instability of the MAP metric. The method has been validated and evaluated on a variety of descriptors using the TRECVid 2010 semantic indexing task data. It has been applied at large scale for the TRECVid 2012 semantic indexing task on tens of descriptors of various types, with initial dimensionalities ranging from 15 up to 32,768. The same transformation can also be used for multimedia retrieval in the context of query by example and/or relevance feedback.
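The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the non-linear transformations, so the signed power law used here is an assumption, as are all function names, the exponent values, and the synthetic data. The `smoothed_argmax` helper only illustrates the idea of fitting a polynomial to a noisy score curve before picking a hyper-parameter value; the kNN classifier and the MAP computation that would produce such scores are not shown.

```python
import numpy as np

def power_transform(x, alpha):
    """Signed power law sign(x) * |x|**alpha (an assumed choice of non-linearity)."""
    return np.sign(x) * np.abs(x) ** alpha

def fit_pca(train, n_components):
    """Return the training mean and the top principal axes (as rows) via SVD."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def optimize_descriptors(train, data, alpha_pre, n_components, alpha_post):
    """Apply a pre-PCA power law, a PCA projection, then a post-PCA power law."""
    mean, axes = fit_pca(power_transform(train, alpha_pre), n_components)
    projected = (power_transform(data, alpha_pre) - mean) @ axes.T
    return power_transform(projected, alpha_post)

def smoothed_argmax(values, scores, degree=2):
    """Fit a polynomial to noisy scores and return the value maximizing the fit."""
    coeffs = np.polyfit(values, scores, degree)
    grid = np.linspace(min(values), max(values), 201)
    return float(grid[np.argmax(np.polyval(coeffs, grid))])

# Synthetic non-negative descriptors standing in for histogram-like features.
rng = np.random.default_rng(0)
histograms = rng.random((200, 64))
reduced = optimize_descriptors(histograms, histograms, 0.5, 8, 0.7)
print(reduced.shape)  # (200, 8)
```

The reduced vectors are meant to be compared with the plain Euclidean distance. In a real setting the three hyper-parameters (the two exponents and the number of components, under the assumptions above) would be tuned on held-out data rather than fixed by hand.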

Notes

Both “feature” and “descriptor” are often used to designate the data extracted from images and videos to abstract their content. “Signature” or other terms are also sometimes used. In this paper, we use the term “descriptor” for this purpose in all cases, whether the extracted data is local (e.g. a SIFT point descriptor), intermediate, or global (e.g. directly computed global image statistics, or aggregations of local descriptors via the bag of visual words or Fisher vector approaches).
We call these “hyper-parameters” rather than simply “parameters” because their role is closer to that of the hyper-parameters controlling a classifier, e.g. C and γ in RBF SVMs, than to that of the “regular” parameters in the same context, e.g. the α_i weights associated with the support vectors, even though our process has no regular parameters with which they could be contrasted.
Grid’5000 is a scientific instrument designed to support experiment-driven research in all areas of computer science related to parallel, large-scale or distributed computing and networking: https://www.grid5000.fr

Acknowledgments

This work was partly carried out as part of the Quaero Program funded by OSEO, the French State agency for innovation. It was supported in part by the French project VideoSense ANR-09-CORD-026 of the ANR. Experiments presented in this paper were carried out using the Grid’5000 experimental testbed, which is being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). The authors wish to thank the participants of the IRIM (Indexation et Recherche d’Information Multimédia) group of the GDR-ISIS research network from CNRS for providing the descriptors used in these experiments.

Author information

Authors and Affiliations

UJF-Grenoble 1 / UPMF-Grenoble 2 / Grenoble INP / CNRS, LIG UMR 5217, Grenoble, F-38041, France
Bahjat Safadi, Nadia Derbas & Georges Quénot

Corresponding author

Correspondence to Georges Quénot.

About this article

Cite this article

Safadi, B., Derbas, N. & Quénot, G. Descriptor optimization for multimedia indexing and retrieval. Multimed Tools Appl 74, 1267–1290 (2015). https://doi.org/10.1007/s11042-014-2071-6

Received: 22 September 2013
Revised: 18 April 2014
Accepted: 29 April 2014
Published: 17 May 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s11042-014-2071-6
