Temporal video segmentation: detecting the end-of-act in circus performance videos

Multimedia Tools and Applications

Abstract

Segmenting a circus performance video into acts is challenging: the content resembles other performance videos but differs markedly from movies, TV programs, and home videos. Segmentation is useful because a long circus show usually consists of several shorter segments, the acts. We propose a new method for detecting the end-of-act in circus performance videos. Unlike other temporal video segmentation methods, it does not rely on shot detection techniques, and it analyzes the audio and video content separately. First, audio content analysis detects applause in the circus audio stream. Second, image analysis tests whether each detected applause occurs at the end of an act: an end-of-act is detected if the image(s) before and after the applause differ, or if there are black frames just after the applause; otherwise, the applause is not treated as an end-of-act. In an experiment on Circus Oz performance videos, the method achieved 92.27 % recall and 49.05 % precision, providing useful clues that help human annotators segment circus video into acts.
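The decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how images are compared or how black frames are identified, so the grayscale histogram distance, the mean-intensity test, the threshold values, and all function names below are assumptions made for the example. Applause detection is taken as already done; the function only decides whether one detected applause marks an end-of-act.

```python
from typing import List, Sequence

# A frame is a grayscale image given as rows of pixel values in 0..255.
Frame = Sequence[Sequence[int]]


def mean_intensity(frame: Frame) -> float:
    """Average pixel value of a frame."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def is_black(frame: Frame, black_threshold: float = 16.0) -> bool:
    """Assumed black-frame test: mean intensity below a threshold."""
    return mean_intensity(frame) < black_threshold


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram with equal-width bins."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for p in row:
            counts[min(p * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def frames_differ(a: Frame, b: Frame, diff_threshold: float = 0.5) -> bool:
    """Assumed difference test: L1 histogram distance (range 0..2)."""
    ha, hb = histogram(a), histogram(b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) > diff_threshold


def is_end_of_act(before: Frame, after: Sequence[Frame]) -> bool:
    """Apply the rule from the abstract to one detected applause.

    `before` is a frame preceding the applause; `after` holds frames
    sampled just after it. End-of-act if any frame after is black, or
    if any frame after differs from the frame before.
    """
    if any(is_black(f) for f in after):
        return True
    return any(frames_differ(before, f) for f in after)


# Tiny synthetic frames: uniform 4x4 images.
bright = [[200] * 4 for _ in range(4)]
similar = [[205] * 4 for _ in range(4)]
dark = [[0] * 4 for _ in range(4)]
other_scene = [[50] * 4 for _ in range(4)]
```

With these synthetic frames, applause followed by a near-identical image is rejected, while applause followed by a black frame or a clearly different image is accepted. In practice the thresholds would need tuning on annotated footage, and a permissive setting trades precision for recall, in line with the high-recall, lower-precision result reported above.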

Acknowledgments

This research was supported under the Australian Research Council's Linkage Projects funding scheme (project number LP100200118). We would like to thank our partners on the project: Australian Research Council, RMIT University, La Trobe University, Circus Australia Ltd, Australia Council for the Arts, and Victoria Arts Centre Trust. We thank the anonymous referees for their helpful feedback and suggested improvements to the paper.

Author information

Corresponding author

Correspondence to Lukman H. Iwan.

About this article

Cite this article

Iwan, L.H., Thom, J.A. Temporal video segmentation: detecting the end-of-act in circus performance videos. Multimed Tools Appl 76, 1379–1401 (2017). https://doi.org/10.1007/s11042-015-3130-3

  • DOI: https://doi.org/10.1007/s11042-015-3130-3
