Abstract
There is growing evidence that visual saliency can be better modeled using top-down mechanisms that incorporate object semantics. This suggests a new direction for image and video analysis, in which extracted semantics can be used to improve video summarization, indexing and retrieval. This paper presents a framework that models semantic context for key-frame extraction. The semantic context of each video frame is extracted and its sequential changes are monitored, so that significant novelties can be located using a one-class classifier. Applied to wildlife video frames, the framework performs image segmentation, feature extraction and matching of image blocks, and then constructs a co-occurrence matrix of semantic labels to represent the semantic context within the scene. Experiments show that our approach, which relies on high-level semantic modeling, achieves better key-frame extraction than counterparts that use low-level features.
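The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each frame has already been segmented into a grid of blocks carrying integer semantic label ids (the segmentation, feature extraction and block-matching stages are not reproduced). It counts label co-occurrences between horizontally and vertically adjacent blocks, flattens the normalised matrix into a per-frame context vector, and flags novel frames. In place of the paper's one-class classifier it substitutes a simple distance-to-centroid model over a sliding window of recent frames. The names `cooccurrence_matrix`, `context_vector`, `detect_key_frames`, `window`, `k` and `min_dist`, as well as the label ids, are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical input: one grid of integer semantic label ids per frame, one
# id per image block. Producing these labels (segmentation, feature
# extraction, block matching) is outside the scope of this sketch.
LabelMap = Sequence[Sequence[int]]
Vector = List[float]


def cooccurrence_matrix(labels: LabelMap, num_labels: int) -> List[List[float]]:
    """Symmetric label co-occurrence over horizontally and vertically
    adjacent blocks, normalised so that all entries sum to 1."""
    counts = [[0.0] * num_labels for _ in range(num_labels)]
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbour
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    counts[a][b] += 1  # count the pair in both orders
                    counts[b][a] += 1  # so the matrix stays symmetric
    total = sum(map(sum, counts))
    if total == 0:  # a single block has no neighbours
        return counts
    return [[v / total for v in row] for row in counts]


def context_vector(labels: LabelMap, num_labels: int) -> Vector:
    """Flatten the co-occurrence matrix into a per-frame feature vector."""
    return [v for row in cooccurrence_matrix(labels, num_labels) for v in row]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def detect_key_frames(label_maps: Sequence[LabelMap], num_labels: int,
                      window: int = 5, k: float = 3.0,
                      min_dist: float = 1e-6) -> List[int]:
    """Flag frames whose semantic context is novel w.r.t. the previous
    `window` frames. Stand-in one-class model (an assumption, not the
    paper's classifier): a frame is novel if its distance to the window
    centroid exceeds mean + k * std of the window frames' own distances."""
    vectors = [context_vector(m, num_labels) for m in label_maps]
    if not vectors:
        return []
    key_frames = [0]  # assumption: the first frame always starts a context
    # Frames 1..window-1 have too little history and are never flagged.
    for t in range(window, len(vectors)):
        history = vectors[t - window:t]
        centroid = [sum(col) / window for col in zip(*history)]
        dists = [euclidean(h, centroid) for h in history]
        mu = sum(dists) / window
        sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / window)
        # min_dist stops rounding noise from flagging unchanged frames.
        threshold = max(mu + k * sigma, min_dist)
        if euclidean(vectors[t], centroid) > threshold:
            key_frames.append(t)
    return key_frames


SKY, GRASS, ANIMAL = 0, 1, 2  # hypothetical label ids
empty = [[SKY] * 4, [SKY] * 4, [GRASS] * 4, [GRASS] * 4]
animal = [[SKY] * 4, [SKY] * 4, [GRASS, ANIMAL, ANIMAL, GRASS],
          [GRASS, ANIMAL, ANIMAL, GRASS]]
print(detect_key_frames([empty] * 8 + [animal] * 4, num_labels=3))  # [0, 8]
```

In this toy sequence, only the frame where the animal first enters the grass region is flagged; the following frames share the new context and are absorbed as the window fills with them. A larger `k` makes the detector more conservative, while `min_dist` prevents frames whose context is unchanged up to floating-point rounding from being reported as novel.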
Cite this article
Yong, SP., Deng, J.D. & Purvis, M.K. Wildlife video key-frame extraction based on novelty detection in semantic context. Multimed Tools Appl 62, 359–376 (2013). https://doi.org/10.1007/s11042-011-0902-2
DOI: https://doi.org/10.1007/s11042-011-0902-2