Abstract
A content-based video retrieval scheme via visual feature pooling is proposed in this paper. Since the visual words represent local features extracted from frame images, spatio-temporal constraints are applied to resolve the ambiguity of the model towards effective retrieval of semantic video clips. Both shot-level and segment-level processing are employed, and the latter is found more robust in dealing with complex scenes where accurate video segmentation may fail. Our experimental results show that the constrained scheme helps to improve average matching accuracy by 5 %. In addition, the results suggest that videos summarized at 25–30 % of their original size can still maintain a viewing quality of 70–80 %, enabling fast content delivery.
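The abstract describes pooling visual words (quantized local frame descriptors) over video segments, with a temporal constraint used to suppress ambiguous words. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' method: it builds a toy k-means codebook, computes per-frame word histograms, and keeps only words that recur within a small temporal window before pooling them into one segment descriptor. All function names, the neighbourhood rule, and the parameters (`k`, `window`) are assumptions for illustration.

```python
import numpy as np

def build_codebook(descriptors, k=8, iters=10, seed=0):
    """Toy k-means codebook over local descriptors: each centroid is a 'visual word'."""
    rng = np.random.default_rng(seed)
    centroids = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest centroid, then update centroids.
        dist = np.linalg.norm(descriptors[:, None] - centroids[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def pool_segment(frame_descriptors, codebook, window=2):
    """Pool per-frame visual-word histograms over a segment, keeping only words
    that also occur within a `window`-frame temporal neighbourhood -- a crude
    stand-in for a spatio-temporal consistency constraint."""
    k = len(codebook)
    per_frame = []
    for desc in frame_descriptors:
        dist = np.linalg.norm(desc[:, None] - codebook[None], axis=2)
        per_frame.append(np.bincount(dist.argmin(axis=1), minlength=k))
    per_frame = np.array(per_frame)
    pooled = np.zeros(k)
    for t, hist in enumerate(per_frame):
        lo, hi = max(0, t - window), min(len(per_frame), t + window + 1)
        # A word counts only if it is also seen in neighbouring frames.
        support = per_frame[lo:hi].sum(axis=0) - hist
        pooled += hist * (support > 0)
    total = pooled.sum()
    return pooled / total if total else pooled
```

In this sketch, a word appearing in a single isolated frame (likely noise or a transient detection) contributes nothing to the segment descriptor, which mirrors the paper's motivation of disambiguating local features via temporal context.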
Copyright information
© 2016 Springer Science+Business Media Singapore
Cite this paper
Ren, J., Ren, J. (2016). Feature Pooling Using Spatio-Temporal Constrain for Video Summarization and Retrieval. In: Park, J., Jin, H., Jeong, YS., Khan, M. (eds) Advanced Multimedia and Ubiquitous Engineering. Lecture Notes in Electrical Engineering, vol 393. Springer, Singapore. https://doi.org/10.1007/978-981-10-1536-6_50
Print ISBN: 978-981-10-1535-9
Online ISBN: 978-981-10-1536-6