Video scene segmentation and semantic representation using a novel scheme

Multimedia Tools and Applications

Abstract

Grouping video content into semantic segments and classifying the resulting scenes by type are crucial steps in content-based video organization, management, and retrieval. This paper proposes a novel approach that automatically segments a video into scenes and represents each scene semantically. First, video shots are detected using a rough-to-fine algorithm. Second, key-frames within each shot are selected adaptively using hybrid features, and redundant key-frames are removed by template matching. Third, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video content and the visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, scene content is represented semantically to meet users' needs in video retrieval. The proposed algorithm has been evaluated on films and TV programs of various genres. Promising experimental results suggest that the proposed method supports efficient retrieval of interesting video content.
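The third step, clustering temporally close and visually similar shots into one scene, can be illustrated with a generic forward-linking sketch. It is not the authors' algorithm: the abstract does not specify the shot features, the similarity measure, or any parameter values. The cosine similarity, the per-shot feature vectors, and the `window` and `threshold` parameters below are all assumptions chosen for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_shots_into_scenes(shot_features, window=3, threshold=0.8):
    """Group consecutive shots into scenes (illustrative sketch only).

    Each shot is linked to any similar shot at most `window` positions
    ahead (the assumed temporal constraint). A scene boundary is placed
    before shot i when no link from an earlier shot reaches i or beyond.
    """
    n = len(shot_features)
    scenes = []
    start = 0   # first shot of the scene being built
    reach = 0   # farthest shot linked into the current scene so far
    for i in range(n):
        if i > reach:
            # No earlier shot links to shot i or later: close the scene.
            scenes.append(list(range(start, i)))
            start = i
            reach = i
        # Look ahead within the temporal window for similar shots.
        for j in range(i + 1, min(i + window, n - 1) + 1):
            if cosine_similarity(shot_features[i], shot_features[j]) >= threshold:
                reach = max(reach, j)
    if n:
        scenes.append(list(range(start, n)))
    return scenes


# Hypothetical per-shot features (for example, normalized color histograms).
shots = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],   # a visually different cut-away shot
    [1.0, 0.0, 0.1],   # returns to the earlier setting
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0],
]
scenes = group_shots_into_scenes(shots, window=3, threshold=0.8)
print(scenes)
```

In this toy input the third shot differs visually from its neighbours but stays in the first scene, because an earlier shot links past it to the fourth shot within the window. This is the kind of alternating pattern a temporal constraint is meant to tolerate.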



Acknowledgements

This work is supported by the National High-Tech Research and Development Plan of China (973) under Grant No. 2006CB303103 and by the National Natural Science Foundation of China under Grant No. 60833009.

Author information

Corresponding author

Correspondence to Songhao Zhu.

About this article

Cite this article

Zhu, S., Liu, Y. Video scene segmentation and semantic representation using a novel scheme. Multimed Tools Appl 42, 183–205 (2009). https://doi.org/10.1007/s11042-008-0233-0

  • DOI: https://doi.org/10.1007/s11042-008-0233-0
