Extended conceptual feedback for semantic multimedia indexing

Hamadi, Abdelkader; Mulhem, Philippe; Quénot, Georges

doi:10.1007/s11042-014-1937-y

Extended conceptual feedback for semantic multimedia indexing

Published: 15 April 2014

Volume 74, pages 1225–1248, (2015)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Abdelkader Hamadi¹,
Philippe Mulhem¹ &
Georges Quénot¹

194 Accesses
5 Citations
Explore all metrics

Abstract

In this paper, we consider the problem of automatically detecting a large number of visual concepts in images or video shots. State of the art systems generally involve feature (descriptor) extraction, classification (supervised learning) and fusion when several descriptors and/or classifiers are used. Though direct multi-label approaches are considered in some works, detection scores are often computed independently for each target concept. We propose a method that we call “conceptual feedback” which implicitly takes into account the relations between concepts to improve the overall concepts detection performance. A conceptual descriptor is built from the system’s output scores and fed back by adding it to the pool of already available descriptors. Our proposal can be iterated several times. Moreover, we propose three extensions of our method. Firstly, a weighting of the conceptual dimensions is performed to give more importance to concepts which are more correlated to the target concept. Secondly, an explicit selection of a set of concepts that are semantically or statically related to the target concept is introduced. For video indexing, we propose a third extension which integrates the temporal dimension in the feedback process by taking into account simultaneously the conceptual and the temporal dimensions to build the high-level descriptor. Our proposals have been evaluated in the context of the TRECVid 2012 semantic indexing task involving the detection of 346 visual or multi-modal concepts. Overall, combined with temporal re-scoring, the proposed method increased the global system performance (MAP) from 0.2613 to 0.3082 ( + 17.9 % of relative improvement) while the temporal re-scoring alone increased it only from 0.2613 to 0.2691 ( + 3.0 %).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A generic framework for semantic video indexing based on visual concepts/contexts detection

Article Open access 25 April 2014

High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

A Comparative Study on the Use of Multi-label Classification Techniques for Concept-Based Video Indexing and Annotation

References

Ayache S, Quénot G (2008) Video corpus annotation using active learning. In: European conference on information retrieval (ECIR). Glasgow, Scotland, pp 187–198
Ballas N, Labbé B, Shabou A, Le Borgne H, Gosselin P, Redi M, Merialdo B, Jégou H, Delhumeau J, Vieux R, Mansencal B, Benois-Pineau J, Ayache S, Hamadi A, Safadi B, Thollard F, Derbas N, Quénot G, Bredin H, Cord M, Gao B, Zhu C, Tang Y, Dellandrea E, Bichot CE, Chen L, Benoît A, Lambert P, Strat T, Razik J, Paris S, Glotin H, Ngoc Trung T, Petrovska Delacrétaz D, Chollet G, Stoian A, Crucianu M (2012) IRIM at TRECVID 2012: Semantic Indexing and Instance Search. In: Proceedings on TRECVID Workshop. Gaithersburg, MD, USA
Cover TM, Thomas JA (1991) Elements of information theory. Wiley-Interscience, New York
Book MATH Google Scholar
Hamadi A, Quenot G, Mulhem P (2012) Two-layers re-ranking approach based on contextual information for visual concepts detection in videos. In: 2012 10th international workshop on content-based multimedia indexing (CBMI), pp 1–6. doi:10.1109/CBMI.2012.6269837
Hradis M, Kolár M, Láník A, Král J, Zemcík P, Smrz P (2012) Annotating images with suggestions - user study of a tagging system. ACIVS, pp 155–166
Inoue N, Kamishima Y, Mori K, Shinoda K (2012) TokyoTechCanon at TRECVID 2012. In: TRECVID 2012. USA, Gaithersburg, MD
Google Scholar
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE computer society conference on computer vision and pattern recognition, vol 2, pp 2169–2178. doi:10.1109/CVPR.2006.68
Over P, Awad G, Michel M, Fiscus J, Sanders G, Shaw B, Kraaij W, Smeaton AF, Quénot G (2012) TRECVID 2012 – an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2012. NIST, USA
Google Scholar
Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in large margin classifiers. MIT Press, pp 61–74
Qi GJ, Hua XS, Rui Y, Tang J, Mei T, Wang M, Zhang HJ (2008) Correlative multilabel video annotation with temporal kernels. ACM Trans Multimed Comput Commun Appl 5(1):3:1–3:27. doi:10.1145/1404880.1404883
Article Google Scholar
Qi GJ, Hua XS, Rui Y, Tang J, Mei T, Zhang HJ (2007) Correlative multi-label video annotation. In: Lienhart R, Prasad AR, Hanjalic A, Choi S, Bailey BP, Sebe N (eds) Proceedings of the 15th international conference on multimedia. ACM, Augsburg, Germany, pp 17–26. doi:10.1145/1291233.1291245
Chapter Google Scholar
Ramesh Naphade M, Kozintsev IV, Huang TS (2002) Factor graph framework for semantic video indexing. IEEE Trans Circ Sys Video Technol 12(1):40–52. doi:10.1109/76.981844
Article Google Scholar
Safadi B, Quénot G (2011) Re-ranking by local re-scoring for video indexing and retrieval. In: CIKM 2011: 20th ACM conference on information and knowledge management, CIKM ’11. ACM, Glasgow, Scotland, pp 2081–2084. http://dl.acm.org/citation.cfm?doid=2063576.2063895
Google Scholar
Safadi B, Quénot G (2011) Re-ranking for multimedia indexing and retrieval. In: ECIR 2011: 33rd European conference on information retrieval. Springer, Dublin, Ireland, pp 708–711. http://link.springer.com/chapter/10.1007$%2F978-3-642-20161-5_$76
Google Scholar
Safadi B, Quénot G (2013) Descriptor optimization for multimedia indexing and retrieval. In: Proceedings of content based multimedia ingexing (CBMI) Workshop, Veszprém, Hungary
Smeaton A.F., Over P., Kraaij W. (2009). In: Divakaran A. (ed) High-Level Feature Detection from Video in TRECVid: a 5-Year Retrospective of Achievements. Multimedia Content Analysis, Theory and Applications. Springer Verlag, Berlin, pp 151–174
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 12:1349–1380. doi:10.1109/34.895972
Article Google Scholar
Smith JR, Naphade MR, Natsev A (2003) Multimedia semantic indexing using model vectors. In: ICME. IEEE, pp 445–448
Snoek CG, Worring M, Smeulders AW (2005) Early versus late fusion in semantic video analysis. In: Proceedings of ACM multimedia
Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 4(2):215–322
Google Scholar
Snoek CGM, Worring M, Hauptmann AG (2006) Learning rich semantics from news video archives by style analysis. ACM Trans Multimed Comput Commun Appl 2(2):91–108. doi:10.1145/1142020.1142021
Article Google Scholar
Tao M, Shyu ML (2012) Leveraging concept association network for multimedia rare concept mining and retrieval. In: ICME. IEEE, pp 860–865
Tiberius Strat S., Benot A, Bredin H, Quénot G, Lambert P (2012) Hierarchical late fusion for concept detection in videos. In: ECCV 2012, Workshop on Information Fusion in Computer Vision for Concept Recognition. Firenze, Italy
Tseng B, Lin CY, Naphade M, Natsev A, Smith J (2003) Normalized classifier fusion for semantic visual concept detection. In: 2003 International conference on image processing, 2003, ICIP 2003, vol 2, pp II–535–8 vol 3. doi:10.1109/ICIP.2003.1246735
Wang F, Merialdo B (2009) Eurecom at trecvid 2009 high-level feature extraction. In: TREC2009 notebook
Weng MF, Chuang YY (2012) Cross-domain multicue fusion for concept-based video indexing. IEEE Trans Pattern Anal Mach Intell 34(10):1927–1941. doi:10.1109/TPAMI.2011.273
Article Google Scholar
Yilmaz E, Aslam JA (2006) Estimating average precision with incomplete and imperfect judgments. In: Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM ’06:102–111. doi:10.1145/1183614.1183633
Yuan J, Li J, Zhang B (2007) Exploiting spatial context constraints for automatic image region annotation. In: Proceedings of the 15th international conference on multimedia, Multimedia ’07: 595–604. doi:10.1145/1291233.1291379
Zhu Q, Liu D, Meng T, Chen C, Shyu ML, Yang Y, Ha H, Fleites F, Chen SC (2012) Florida international university and university of miami trecvid 2012 - semantic indexing. In: TRECVID 2012. USA, Gaithersburg, MD
Google Scholar

Download references

Acknowledgments

This work was partly realized as part of the Quaero Program funded by OSEO, French State agency for innovation. This work was supported in part by the French project VideoSense ANR-09-CORD-026 of the ANR. Experiments presented in this paper were carried out using the Grid’5000 experimental test bed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr). The authors wish to thanks the participants of the IRIM (Indexation et Recherche d’Information Multimédia) group of the GDR-ISIS research network from CNRS for providing the descriptors used in these experiments.

Author information

Authors and Affiliations

UJF-Grenoble 1 / UPMF-Grenoble 2 / Grenoble INP / CNRS, LIG UMR 5217, Grenoble F-38041, France
Abdelkader Hamadi, Philippe Mulhem & Georges Quénot

Authors

Abdelkader Hamadi
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Mulhem
View author publications
You can also search for this author in PubMed Google Scholar
Georges Quénot
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Georges Quénot.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hamadi, A., Mulhem, P. & Quénot, G. Extended conceptual feedback for semantic multimedia indexing. Multimed Tools Appl 74, 1225–1248 (2015). https://doi.org/10.1007/s11042-014-1937-y

Download citation

Published: 15 April 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s11042-014-1937-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Extended conceptual feedback for semantic multimedia indexing

Abstract

Access this article

Similar content being viewed by others

A generic framework for semantic video indexing based on visual concepts/contexts detection

High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

A Comparative Study on the Use of Multi-label Classification Techniques for Concept-Based Video Indexing and Annotation

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Extended conceptual feedback for semantic multimedia indexing

Abstract

Access this article

Similar content being viewed by others

A generic framework for semantic video indexing based on visual concepts/contexts detection

High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

A Comparative Study on the Use of Multi-label Classification Techniques for Concept-Based Video Indexing and Annotation

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation