Unsupervised scene detection and commentator building using multi-modal chains

Poulisse, Gert-Jan; Patsis, Yorgos; Moens, Marie-Francine

doi:10.1007/s11042-012-1086-0

Published: 05 May 2012

Volume 70, pages 159–175 (2014)

Multimedia Tools and Applications

Gert-Jan Poulisse¹, Yorgos Patsis²,³ & Marie-Francine Moens¹

Abstract

This paper presents a novel unsupervised method for identifying the semantic structure of long, semi-structured video streams. We identify chains, i.e., local clusters of repeated features drawn from both the video stream and the audio transcripts. Each chain indicates that the temporal interval it demarcates belongs to a single semantic event. When all chains are layered over each other, dense regions of overlapping chains emerge, from which the semantic structure of the video can be identified. We present two clustering strategies that accomplish this task and compare them against a baseline Scene Transition Graph approach. We then develop a commentator that provides a semantic labeling of the resulting video segmentation.
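
The chain idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: observations arrive as (time step, feature) pairs, where a feature might be a transcript word or a quantized visual codeword; consecutive occurrences of the same feature are linked into one chain when they are at most `max_gap` steps apart (a hypothetical linking rule); and segments are read off by simple thresholding of the overlap density, not by either of the paper's two clustering strategies. The names `build_chains`, `chain_density` and `segment` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A chain is (feature, first time step, last time step).
Chain = Tuple[str, int, int]


def build_chains(observations: Sequence[Tuple[int, str]], max_gap: int) -> List[Chain]:
    """Link repeated occurrences of each feature into chains.

    Assumed rule: two consecutive occurrences of the same feature belong to
    one chain if they are at most `max_gap` steps apart. A feature that
    occurs only once in a run produces no chain.
    """
    times: Dict[str, List[int]] = defaultdict(list)
    for t, feature in observations:
        times[feature].append(t)

    chains: List[Chain] = []
    for feature, ts in times.items():
        ts.sort()
        start = prev = ts[0]
        for t in ts[1:]:
            if t - prev > max_gap:
                # Gap too large: close the current run if it spans > 1 step.
                if prev > start:
                    chains.append((feature, start, prev))
                start = t
            prev = t
        if prev > start:
            chains.append((feature, start, prev))
    return chains


def chain_density(chains: Sequence[Chain], length: int) -> List[int]:
    """Count how many chains cover each time step 0..length-1."""
    density = [0] * length
    for _, start, end in chains:
        for t in range(start, end + 1):
            density[t] += 1
    return density


def segment(density: Sequence[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return maximal (start, end) intervals whose density exceeds `threshold`."""
    segments: List[Tuple[int, int]] = []
    start = None
    for t, d in enumerate(density):
        if d > threshold and start is None:
            start = t
        elif d <= threshold and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(density) - 1))
    return segments


# Toy input: transcript words and visual codewords ("v17", "v3") over 15 steps.
obs = [(0, "swim"), (2, "swim"), (3, "pool"), (5, "pool"), (4, "v17"),
       (6, "v17"), (10, "bike"), (12, "bike"), (11, "v3"), (13, "v3")]
chains = build_chains(obs, max_gap=3)
print(segment(chain_density(chains, length=15)))  # [(0, 6), (10, 13)]
```

Raising `threshold` above zero keeps only regions where several chains overlap; with `threshold=0`, a single chain bridging two events would merge them into one segment.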

Notes

MATLAB: Savitzky-Golay filter, a generalized moving average whose filter coefficients are determined by an unweighted linear least-squares regression and a polynomial model of specified degree (degree 7 is used here)
http://www-nlpir.nist.gov/projects/tv2004/tv2004.html
http://www.cs.waikato.ac.nz/ml/weka/
http://en.wikipedia.org/wiki/2008_Summer_Olympics
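
The first note refers to Savitzky-Golay smoothing: each sample is replaced by the value, at the window centre, of a polynomial fitted to the surrounding window by unweighted least squares. A minimal NumPy sketch follows. Only the polynomial degree (7) comes from the note; the window length of 11, the reflection padding at the edges, and the function names `savgol_coefficients` and `savgol_smooth` are assumptions, and the note does not say which signal was smoothed.

```python
import numpy as np


def savgol_coefficients(window: int, degree: int) -> np.ndarray:
    """Smoothing weights for a centred Savitzky-Golay window.

    Fits a polynomial of `degree` to `window` equally spaced samples by
    unweighted least squares. The smoothed value is the fitted polynomial at
    offset 0, i.e. its constant term, given by the first row of pinv(A).
    """
    if window % 2 == 0 or window <= degree:
        raise ValueError("window must be odd and larger than degree")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Vandermonde matrix: column j holds offsets**j for j = 0..degree.
    A = np.vander(offsets, degree + 1, increasing=True)
    return np.linalg.pinv(A)[0]


def savgol_smooth(signal, window: int = 11, degree: int = 7) -> np.ndarray:
    """Smooth a 1-D signal; edges use reflection padding (an assumption)."""
    x = np.asarray(signal, dtype=float)
    coeffs = savgol_coefficients(window, degree)
    half = window // 2
    padded = np.pad(x, half, mode="reflect")
    # Sliding weighted sum centred on each original sample; "valid" mode
    # returns exactly len(x) outputs.
    return np.correlate(padded, coeffs, mode="valid")
```

Because the fit is exact for any polynomial of degree at most 7, such signals pass through unchanged wherever the window lies fully inside the data. SciPy's `scipy.signal.savgol_filter(x, window_length, polyorder)` is a tested implementation of the same filter; its default edge handling (`mode='interp'`) differs from the reflection padding used here.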

Acknowledgments

The work reported here is supported by the IWT-SBO project AMASS++ (Advanced Multimedia Alignment and Structured Summarization, IWT 060051) and by the project TOSCA-MP (Task-oriented search and content annotation for media production, FP7-ICT 287532).

Author information

Authors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
Gert-Jan Poulisse & Marie-Francine Moens
IBBT, Ghent-Ledeberg, Belgium
Yorgos Patsis
ETRO—DSSP, Vrije Universiteit Brussel, Brussels, Belgium
Yorgos Patsis

Corresponding author

Correspondence to Gert-Jan Poulisse.

About this article

Cite this article

Poulisse, GJ., Patsis, Y. & Moens, MF. Unsupervised scene detection and commentator building using multi-modal chains. Multimed Tools Appl 70, 159–175 (2014). https://doi.org/10.1007/s11042-012-1086-0

Issue Date: May 2014
DOI: https://doi.org/10.1007/s11042-012-1086-0
