Abstract
In this paper, we propose and evaluate a method for optimizing descriptors used for content-based multimedia indexing and retrieval. A large variety of descriptors are commonly used for this purpose. However, the most efficient ones often have characteristics that prevent them from being easily used in large-scale systems. They may have very high dimensionality (up to tens of thousands of dimensions) and/or be suited to a distance that is costly to compute (e.g. χ²). The proposed method combines a PCA-based dimensionality reduction with pre- and post-PCA non-linear transformations. The resulting transformation is globally optimized. The produced descriptors have a much lower dimensionality while performing at least as well, and often significantly better, with the Euclidean distance than the original high-dimensionality descriptors with their optimal distance. Our approach also includes a hyper-parameter optimization procedure based on a fast kNN classifier and on a polynomial fit that overcomes the instability of the MAP metric. The method has been validated and evaluated on a variety of descriptors using the TRECVid 2010 semantic indexing task data. It has been applied at large scale for the TRECVid 2012 semantic indexing task on tens of descriptors of various types, with initial dimensionalities ranging from 15 up to 32,768. The same transformation can also be used for multimedia retrieval in the context of query by example and/or relevance feedback.
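The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says "non-linear transformations", so the signed power law used here is an assumption, and every function and parameter name (`power_transform`, `optimize_descriptors`, `smoothed_best`, `alpha_pre`, `alpha_post`) is hypothetical. The last function shows one plausible way a polynomial fit could smooth a noisy score curve before picking a hyper-parameter value; the scores themselves (e.g. MAP from a kNN classifier) are taken as given inputs.

```python
import numpy as np


def power_transform(x, alpha):
    # ASSUMPTION: a signed power law stands in for the unspecified
    # non-linear transformation; it keeps the sign of each component.
    return np.sign(x) * np.abs(x) ** alpha


def fit_pca(x, n_components):
    # Center the data and take the leading right singular vectors
    # as principal directions.
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def optimize_descriptors(train, data, alpha_pre, n_components, alpha_post):
    # Pre-PCA non-linearity, PCA projection learned on `train`,
    # then post-PCA non-linearity applied to the projected `data`.
    mean, components = fit_pca(power_transform(train, alpha_pre), n_components)
    projected = (power_transform(data, alpha_pre) - mean) @ components.T
    return power_transform(projected, alpha_post)


def smoothed_best(values, scores, degree=2):
    # Fit a low-degree polynomial to noisy (value, score) pairs and
    # return the value that maximizes the fitted curve on a fine grid.
    coeffs = np.polyfit(values, scores, degree)
    grid = np.linspace(min(values), max(values), 201)
    return float(grid[np.argmax(np.polyval(coeffs, grid))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.random((50, 20))  # synthetic stand-in data
    reduced = optimize_descriptors(descriptors, descriptors, 0.5, 5, 0.7)
    print(reduced.shape)
```

The reduced descriptors are intended to be compared with the plain Euclidean distance, which is the property the abstract emphasizes; the three hyper-parameters would in practice be chosen by a search such as the one `smoothed_best` hints at.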
Notes
Both “feature” and “descriptor” are often used to designate the data extracted from images and videos to abstract their content. “Signature” or other terms are also sometimes used. In this paper, we use the term “descriptor” for this purpose in all cases, whether the extracted data is local, e.g. the SIFT point descriptor, intermediate, or global, e.g. directly computed global image statistics or aggregations of local descriptors via the bag of visual words or Fisher vector approaches.
We call these “hyper-parameters” rather than simply “parameters” because they are closer in role to the hyper-parameters that control a classifier, e.g. C and γ in RBF SVMs, than to the “regular” parameters in the same context, e.g. the α_i weights associated with the support vectors, even though our process has no regular parameters to which they could be opposed.
Grid’5000 is a scientific instrument designed to support experiment-driven research in all areas of computer science related to parallel, large-scale or distributed computing and networking, https://www.grid5000.fr
References
Ballas N, Labbé B, Shabou A, Le Borgne H, Gosselin P, Redi M, Merialdo B, Jégou H, Delhumeau J, Vieux R, Mansencal B, Benois-Pineau J, Ayache S, Hamadi A, Safadi B, Thollard F, Derbas N, Quénot G, Bredin H, Cord M, Gao B, Zhu C, Tang Y, Dellandrea E, Bichot C E, Chen L, Benoît A, Lambert P, Strat T, Razik J, Paris S, Glotin H, Ngoc Trung T, Petrovska Delacrétaz D, Chollet G, Stoian A, Crucianu M (2012) IRIM at TRECVID 2012: Semantic indexing and instance search. In: Proceedings TRECVID workshop. Gaithersburg, MD
Bishop C M (2007) Pattern recognition and machine learning (Information science and statistics), 1st edn. Springer
Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Csurka G, Bray C, Dance C, Fan L (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision. ECCV, pp 1–22
Gorisse D, Cord M, Precioso F (2011) Salsas: Sub-linear active learning strategy with approximate k-nn search. Pattern Recog 44(10–11):2343–2357
Gorisse D, Precioso F, Gosselin P, Granjon L, Pellerin D, Rombaut M, Bredin H, Koenig L, Lachambre H, El Khoury E, Vieux R, Mansencal B, Zhou Y, Benois-Pineau J, Jégou H, Ayache S, Safadi B, Tong Y, Thollard F, Quénot G, Benoit A, Lambert P (2010) IRIM at TRECVID 2010: High level feature extraction and instance search. In: TREC video retrieval evaluation workshop. National institute of standards and technology. Gaithersburg, MD USA
Hamadi A, Quénot G, Mulhem P (2012) Two-layers re-ranking approach based on contextual information for visual concepts detection in videos. In: CBMI, pp 1–6
Hinton G E, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507. doi: 10.1126/science.1127647
Jégou H, Chum O (2012) Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening. In: ECCV - European conference on computer vision. Firenze, Italie
Jégou H, Douze M, Schmid C (2009) On the burstiness of visual elements. In: IEEE conference on computer vision and pattern recognition (CVPR ’09), pp 1169–1176. http://hal.inria.fr/inria-00394211. doi: 10.1109/CVPRW.2009.5206609
Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: 23rd IEEE conference on computer vision & pattern recognition (CVPR ’10), pp 3304–3311. IEEE Computer Society, San Francisco. doi: 10.1109/CVPR.2010.5540039
Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2011) Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence
Kramer M A (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE J 37:233–243
Lee H, Battle A, Raina R, Ng AY (2007) Efficient sparse coding algorithms. In: NIPS, pp 801–808. NIPS
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Mahalanobis PC (1936) On the generalised distance in statistics. Proc. Int Inst Sci, India 2(1):49–55
Oliva A, Torralba A (2001) Modeling the shape of the scene: A holistic representation of the spatial envelope. Int J Comput Vis 42:145–175
Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: ECCV (4), pp 143–156
Redi M, Merialdo B (2011) Saliency moments for image categorization. In: Proceedings of the 1st ACM international conference on multimedia retrieval, ICMR ’11, pp 39:1–39:8. ACM: New York
Safadi B, Derbas N, Hamadi A, Thollard F, Quénot G, Delhumeau J, Jégou H, Gehrig T, Kemal Ekenel H, Stiefelhagen R (2012) Quaero at TRECVID 2012: Semantic indexing. In: Proceedings TRECVID workshop. Gaithersburg, MD
Safadi B, Quénot G (2010) Evaluations of multi-learners approaches for concepts indexing in video documents. In: RIAO. Paris, France
Safadi B, Quénot G (2011) Re-ranking by local re-scoring for video indexing and retrieval. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), pp 2081–2084. Glasgow, United Kingdom
Sanchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the fisher vector: Theory and practice. Int J Comput Vis 105(3):222–245. doi: 10.1007/s11263-013-0636-x
Van de Sande K E A, Gevers T, Snoek C G M (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596
Sivic J, Zisserman A (2003) A text retrieval approach to object matching in videos. In: Proceedings of the Ninth IEEE International Conference on Computer Vision, vol 2, ICCV ’03, pp 1470. IEEE Computer Society: Washington, DC
Yang J, Hauptmann A G (2008) (Un)reliability of video concept detection. In: CIVR ’08, pp 85–94
Acknowledgments
This work was partly carried out as part of the Quaero Program, funded by OSEO, the French State agency for innovation. This work was supported in part by the French project VideoSense ANR-09-CORD-026 of the ANR. Experiments presented in this paper were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). The authors wish to thank the participants of the IRIM (Indexation et Recherche d’Information Multimédia) group of the GDR-ISIS research network from CNRS for providing the descriptors used in these experiments.
Cite this article
Safadi, B., Derbas, N. & Quénot, G. Descriptor optimization for multimedia indexing and retrieval. Multimed Tools Appl 74, 1267–1290 (2015). https://doi.org/10.1007/s11042-014-2071-6
DOI: https://doi.org/10.1007/s11042-014-2071-6