
Extended conceptual feedback for semantic multimedia indexing

Multimedia Tools and Applications

Abstract

In this paper, we consider the problem of automatically detecting a large number of visual concepts in images or video shots. State-of-the-art systems generally involve feature (descriptor) extraction, classification (supervised learning) and fusion when several descriptors and/or classifiers are used. Though direct multi-label approaches are considered in some works, detection scores are most often computed independently for each target concept. We propose a method that we call “conceptual feedback”, which implicitly takes into account the relations between concepts to improve the overall concept detection performance. A conceptual descriptor is built from the system’s output scores and fed back by adding it to the pool of already available descriptors. This process can be iterated several times. Moreover, we propose three extensions of the method. First, the conceptual dimensions are weighted to give more importance to concepts that are more strongly correlated with the target concept. Second, an explicit selection of a set of concepts that are semantically or statistically related to the target concept is introduced. Third, for video indexing, we integrate the temporal dimension into the feedback process by taking into account both the conceptual and the temporal dimensions when building the high-level descriptor. Our proposals have been evaluated in the context of the TRECVid 2012 semantic indexing task, which involves the detection of 346 visual or multi-modal concepts. Overall, combined with temporal re-scoring, the proposed method increased the global system performance (MAP) from 0.2613 to 0.3082 (+17.9 % relative improvement), while temporal re-scoring alone increased it only from 0.2613 to 0.2691 (+3.0 %).
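To make the feedback loop concrete, the following minimal Python sketch (using numpy and scikit-learn) illustrates the idea under simplifying assumptions: a single logistic-regression classifier per concept stands in for the paper's full descriptor/classifier fusion pipeline, and the names train_and_score, conceptual_feedback and descriptor_pool are illustrative, not taken from the paper. Per-concept detection scores are gathered into a conceptual descriptor, added to the descriptor pool, and the classifiers are retrained.

    # Minimal illustrative sketch of the "conceptual feedback" idea (assumptions:
    # names such as train_and_score and conceptual_feedback are ours, and one
    # logistic-regression classifier per concept replaces the paper's full
    # descriptor/classifier fusion pipeline).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_and_score(descriptor_pool, labels):
        """Train one classifier per concept on the concatenated descriptor pool
        and return an (n_samples, n_concepts) matrix of detection scores."""
        X = np.hstack(descriptor_pool)                  # simple early fusion of all descriptors
        scores = np.zeros(labels.shape, dtype=float)
        for c in range(labels.shape[1]):                # one binary problem per target concept
            clf = LogisticRegression(max_iter=1000).fit(X, labels[:, c])
            scores[:, c] = clf.predict_proba(X)[:, 1]   # probabilistic detection scores
        return scores

    def conceptual_feedback(descriptor_pool, labels, n_iterations=2):
        """Score all concepts, turn the scores into a 'conceptual descriptor',
        add it to the descriptor pool and retrain; optionally iterate."""
        scores = train_and_score(descriptor_pool, labels)       # initial pass, low-level descriptors only
        for _ in range(n_iterations):
            extended_pool = descriptor_pool + [scores]          # feed the scores back as an extra descriptor
            scores = train_and_score(extended_pool, labels)
        return scores

    if __name__ == "__main__":
        # Toy data only; a real system trains on annotated shots and scores a separate test set.
        rng = np.random.default_rng(0)
        n_shots, n_concepts = 200, 10
        labels = (rng.random((n_shots, n_concepts)) < 0.3).astype(int)
        low_level = [rng.random((n_shots, 64)), rng.random((n_shots, 32))]   # two toy descriptors
        print(conceptual_feedback(low_level, labels).shape)                  # (200, 10)

The weighting and concept-selection extensions described above would amount to rescaling or masking columns of the conceptual descriptor before it is fed back, and the temporal extension to stacking the conceptual descriptors of neighbouring shots; none of these refinements is shown in this sketch.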




Acknowledgments

This work was partly realized as part of the Quaero Program funded by OSEO, the French State agency for innovation, and was supported in part by the French project VideoSense ANR-09-CORD-026 of the ANR. Experiments presented in this paper were carried out using the Grid’5000 experimental testbed, developed under the INRIA ALADDIN development action with support from CNRS, RENATER, several universities and other funding bodies (see https://www.grid5000.fr). The authors wish to thank the participants of the IRIM (Indexation et Recherche d’Information Multimédia) group of the GDR-ISIS research network from CNRS for providing the descriptors used in these experiments.

Author information


Corresponding author

Correspondence to Georges Quénot.


Cite this article

Hamadi, A., Mulhem, P. & Quénot, G. Extended conceptual feedback for semantic multimedia indexing. Multimed Tools Appl 74, 1225–1248 (2015). https://doi.org/10.1007/s11042-014-1937-y
