Concept detection and keyframe extraction using a visual thesaurus

Spyrou, Evaggelos; Tolias, Giorgos; Mylonas, Phivos; Avrithis, Yannis

doi:10.1007/s11042-008-0237-9

Concept detection and keyframe extraction using a visual thesaurus

Published: 08 November 2008

Volume 41, pages 337–373, (2009)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Evaggelos Spyrou¹,
Giorgos Tolias¹,
Phivos Mylonas¹ &
…
Yannis Avrithis¹

248 Accesses
21 Citations
Explore all metrics

Abstract

This paper presents a video analysis approach based on concept detection and keyframe extraction employing a visual thesaurus representation. Color and texture descriptors are extracted from coarse regions of each frame and a visual thesaurus is constructed after clustering regions. The clusters, called region types, are used as basis for representing local material information through the construction of a model vector for each frame, which reflects the composition of the image in terms of region types. Model vector representation is used for keyframe selection either in each video shot or across an entire sequence. The selection process ensures that all region types are represented. A number of high-level concept detectors is then trained using global annotation and Latent Semantic Analysis is applied. To enhance detection performance per shot, detection is employed on the selected keyframes of each shot, and a framework is proposed for working on very large data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A generic framework for semantic video indexing based on visual concepts/contexts detection

Article Open access 25 April 2014

Local Invariant Feature Tracks for High-Level Video Feature Extraction

Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection

Notes

http://www.image.ntua.gr/smag/tools/vde.
More details can be found in http://www-nlpir.nist.gov/projects/tv2007/tv2007.html#3.
http://www.mathworks.com/.

References

Avrithis Y, Doulamis A, Doulamis N, Kollias S (1999) A stochastic framework for optimal key frame extraction from mpeg video databases. Comput Vis Image Underst 5(1):3–24
Article Google Scholar
Ayache S, Quenot G (2007) TRECVID 2007 collaborative annotation using active learning. In: TRECVID 2007 workshop, Gaithersburg, 5–6 November 2007
Boujemaa N, Fleuret F, Gouet V, Sahbi H (2004) Visual content extraction for automatic semantic annotation of video news. In: IS&T/SPIE conference on storage and retrieval methods and applications for multimedia, part of electronic imaging symposium, San Jose, January 2004
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/∼cjlin/libsvm
Chang SF, Sikora T, Puri A (2001) Overview of the MPEG-7 standard. IEEE Trans Circuits Systems Video Technol 11(6):688–695
Article Google Scholar
Chapelle O, Haffner P, Vapnik V (1999) Support vector machines for histogram-based image classification. IEEE Trans Neural Netw 10(5):1055–1064
Article Google Scholar
Chiu S (1997) Extracting fuzzy rules from data for function approximation and pattern classification. In: Dubois D, Prade H, Yager R (eds) Fuzzy information engineering: a guided tour of applications. Wiley, New York
Google Scholar
Cooper M, Foote J (2005) Discriminative techniques for keyframe selection. In: Proceedings of the IEEE international conference on multimedia & expo (ICME), Amsterdam, 6–9 July 2005
Dance C, Willamowski J, Fan L, Bray C, Csurka G (2004) Visual categorization with bags of keypoints. In: ECCV—international workshop on statistical learning in computer vision
Deerwester S, Dumais S, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Soc Inf Sci 41(6):391–407
Article Google Scholar
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Gokalp D, Aksoy S (2007) Scene classification using bag-of-regions representations. In: IEEE conference on computer vision and pattern recognition (CVPR), Minneapolis, 18–23 June 2007
Haykin S (1998) Neural networks: a comprehensive foundation. Prentice Hall, Englewood Cliffs
Google Scholar
IBM (2005) MARVEL multimedia analysis and retrieval system. IBM Research White Paper
Kishida K (2005) Property of average precision and its generalization: an examination of evaluation indicator for information retrieval. NII Technical Reports, NII-2005-014E
Klir GJ, Yuan B (1995) Fuzzy sets and fuzzy logic—theory and applications. Prentice Hall, Englewood Cliffs
MATH Google Scholar
Laaksonen J, Koskela M, Oja E (2002) Picsom, self-organizing image retrieval with MPEG-7 content descriptors. IEEE Trans Neural Netw 13:841–853
Article Google Scholar
Lazebnik S, Schmid C, Ponce J (2006) A discriminative framework for texture and object recognition using local image features. In: Towards category-level object recognition. Springer, New York, pp 423–442
Chapter Google Scholar
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Article Google Scholar
Ma YF, Lu L, Zhang HJ, Li M (2002) A user attention model for video summarization. In: MULTIMEDIA ’02: Proceedings of the tenth ACM international conference on multimedia. ACM, New York, pp 533–542
Chapter Google Scholar
Manjunath B, Ohm J, Vasudevan V, Yamada A (2001) Color and texture descriptors. IEEE Trans Circuits Syst Video Technol 11(6):703–715
Article Google Scholar
Mérialdo B, Huet B, Yahiaoui I, Souvannavong F (2002) Automatic video summarization. In: International thyrrenian workshop on digital communications, advanced methods for multimedia signal processing, Palazzo dei Congressi, Capri, 8–11 September 2002
Mitchell M (1998) An introduction to genetic algorithms. MIT, Cambridge
MATH Google Scholar
Molina J, Spyrou E, Sofou N, Martinez JM (2007) On the selection of MPEG-7 visual descriptors and their level of detail for nature disaster video sequences classification. In: 2nd international conference on semantics and digital media technologies (SAMT), Genova, 5–7 December 2007
Morris OJ, Lee MJ, Constantinides AG (1986) Graph theory for image analysis: an approach based on the shortest spanning tree. IEE Proc 133:146–152
Google Scholar
Naphade MR, Kennedy L, Kender JR, Chang SF, Smith JR, Over P, Hauptmann A (2005) A light scale concept ontology for multimedia understanding for TRECVID 2005. IBM Research Technical Report
Natsev A, Naphade M, Smith J (2003) Lexicon design for semantic indexing in media databases. In: International conference on communication technologies and programming, Varna, 23–26 June 2003
Opelt A, Pinz A, Zisserman A (2006) Incremental learning of object detectors using a visual shape alphabet. In: IEEE computer society conference on computer vision and pattern recognition, New York, 17–22 June 2006
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173
Article Google Scholar
Saux BL, Amato G (2004) Image classifiers for scene analysis. In: International conference on computer vision and graphics, Warsaw, 22–24 September 2004
Smeaton AF, Over P, Kraaij W (2006) Evaluation campaigns and TRECVid. In: MIR ’06: proceedings of the 8th ACM international workshop on multimedia information retrieval. ACM, New York
Google Scholar
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Article Google Scholar
Snoek CGM, Worring M (2003) Time interval based modelling and classification of events in soccer video. In: Proceedings of the 9th annual conference of the advanced school for computing and imaging (ASCI), Heijen, June 2003
Souvannavong F, Mérialdo B, Huet B (2005) Region-based video content indexing and retrieval. In: CBMI 2005, fourth international workshop on content-based multimedia indexing, Riga, 21–23 June 2005
Spyrou E, Avrithis Y (2007) A region thesaurus approach for high-level concept detection in the natural disaster domain. In: 2nd international conference on semantics and digital media technologies (SAMT), Genova, December 2007
Spyrou E, Avrithis Y (2007) Keyframe extraction using local visual semantics in the form of a region thesaurus. In: 2nd international workshop on semantic media adaptation and personalization (SMAP), London, 17–18 December 2007
Spyrou E, LeBorgne H, Mailis T, Cooke E, Avrithis Y, O’Connor N (2005) Fusing MPEG-7 visual descriptors for image classification. In: International conference on artificial neural networks (ICANN), Warsaw, 11–15 September 2005
Spyrou E, Tolias G, Mylonas P, Avrithis Y (2008) A semantic multimedia analysis approach utilizing a region thesaurus and LSA. In: International workshop on image analysis for multimedia interactive services (WIAMIS), Klagenfurt, 7–9 May 2008
Sundaram H, Chang SF (2003) Video analysis and summarization at structural and semantic levels, multimedia information retrieval and management: technological fundamentals and applications. In: Feng D, Siu WC, Zhang H (Eds) Springer, New York
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
MATH Google Scholar
Voisine N, Dasiopoulou S, Mezaris V, Spyrou E, Athanasiadis T, Kompatsiaris I, Avrithis Y, Strintzis MG (2005) Knowledge-assisted video analysis using a genetic algorithm. In: 6th international workshop on image analysis for multimedia interactive services (WIAMIS 2005), Montreux, 13–15 April 2005
Yamada A, Pickering M, Jeannin S, Cieplinski L, Ohm J, Kim M (2001) MPEG-7 Visual part of eXperimentation model version 9.0
Yanagawa A, Chang SF, Kennedy L, Hsu W (2007) Columbia universitys baseline detectors for 374 LSCOM semantic visual concepts. Columbia University ADVENT Technical Report
Yuan J, Guo Z et al (2007) THU and ICRC at TRECVID 2007. In: 5th TRECVID workshop, Gaithersburg, November 2007
Zhang H, Wu J, Zhong D, Smoliar S (1997) An integrated system for content-based retrieval and browsing. Pattern Recogn 30:643–658
Article Google Scholar
Zhuang Y, Rui Y, Huang T, Mehrotra S (1998) Adaptive keyframe extraction using unsupervised clustering. In: Proc of international conference oh image processing (ICIP), Chicago, 4–7 October 1998

Download references

Acknowledgements

This work was partially supported by the European Commission under contracts FP7-215453 WeKnowIt, FP6-027026 K-Space and FP6-027685 MESH.

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Politechniou 9 Str., Zographou Campus, 157 73, Athens, Greece
Evaggelos Spyrou, Giorgos Tolias, Phivos Mylonas & Yannis Avrithis

Authors

Evaggelos Spyrou
View author publications
You can also search for this author in PubMed Google Scholar
Giorgos Tolias
View author publications
You can also search for this author in PubMed Google Scholar
Phivos Mylonas
View author publications
You can also search for this author in PubMed Google Scholar
Yannis Avrithis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Evaggelos Spyrou.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Spyrou, E., Tolias, G., Mylonas, P. et al. Concept detection and keyframe extraction using a visual thesaurus. Multimed Tools Appl 41, 337–373 (2009). https://doi.org/10.1007/s11042-008-0237-9

Download citation

Published: 08 November 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s11042-008-0237-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Concept detection and keyframe extraction using a visual thesaurus

Abstract

Access this article

Similar content being viewed by others

A generic framework for semantic video indexing based on visual concepts/contexts detection

Local Invariant Feature Tracks for High-Level Video Feature Extraction

Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Concept detection and keyframe extraction using a visual thesaurus

Abstract

Access this article

Similar content being viewed by others

A generic framework for semantic video indexing based on visual concepts/contexts detection

Local Invariant Feature Tracks for High-Level Video Feature Extraction

Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation