Abstract
Grouping video content into semantic segments and classifying the resulting scenes by type are crucial steps in content-based video organization, management, and retrieval. This paper proposes a novel approach to automatic scene segmentation and semantic scene representation. First, video shots are detected using a rough-to-fine algorithm. Second, key-frames within each shot are selected adaptively using hybrid features, and redundant key-frames are removed by template matching. Third, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video content and the visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, scene content is represented semantically to meet users' needs in video retrieval. The proposed algorithm has been tested on films and TV programs of various genres. The experimental results are promising and indicate that the method supports efficient retrieval of video content of interest.
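As a rough illustration of the third step, the sketch below groups consecutive shots into scenes by combining a temporal constraint with a visual-similarity test. It is not the authors' algorithm: the abstract does not give the similarity measure or the clustering rule, so the histogram-intersection similarity, the `window` and `threshold` parameters, and the function names are all assumptions made for this example.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] between two L1-normalised histograms (assumed feature)."""
    return sum(min(x, y) for x, y in zip(a, b))


def group_shots_into_scenes(
    shot_hists: List[Sequence[float]],
    window: int = 3,         # hypothetical temporal constraint, in shots
    threshold: float = 0.7,  # hypothetical visual-similarity cut-off
) -> List[List[int]]:
    """Return scenes as lists of consecutive shot indices.

    A scene keeps growing while some shot already inside it is visually
    similar to a later shot lying at most `window` shots ahead.
    """
    n = len(shot_hists)
    scenes: List[List[int]] = []
    start = 0
    while start < n:
        end = start  # last shot known to belong to the current scene
        i = start
        while i <= end:
            # Search from the far edge of the window back toward `end`,
            # so the longest valid link is taken first.
            for j in range(min(i + window, n - 1), end, -1):
                if histogram_intersection(shot_hists[i], shot_hists[j]) >= threshold:
                    end = j
                    break
            i += 1
        scenes.append(list(range(start, end + 1)))
        start = end + 1
    return scenes


# Toy input: an alternating A/B dialogue followed by two C shots.
A = [0.8, 0.1, 0.1]
B = [0.1, 0.8, 0.1]
C = [0.1, 0.1, 0.8]
print(group_shots_into_scenes([A, B, A, B, C, C], window=2, threshold=0.7))
# → [[0, 1, 2, 3], [4, 5]]
```

In this toy input the interleaved A and B shots are linked into one scene because each one finds a similar shot within the two-shot window, while the C shots share no similarity with the earlier ones and start a new scene.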
Acknowledgements
This work is supported by the National High-Tech Research and Development Plan of China (973) under Grant No. 2006CB303103, and by the National Natural Science Foundation of China under Grant No. 60833009.
Cite this article
Zhu, S., Liu, Y. Video scene segmentation and semantic representation using a novel scheme. Multimed Tools Appl 42, 183–205 (2009). https://doi.org/10.1007/s11042-008-0233-0
DOI: https://doi.org/10.1007/s11042-008-0233-0