Multimedia Tools and Applications, Volume 76, Issue 1, pp 1379–1401

Temporal video segmentation: detecting the end-of-act in circus performance videos

  • Lukman H. Iwan
  • James A. Thom


Segmenting a circus performance video into acts is challenging because the content has similar characteristics to other performance videos but is quite different from movies, TV programs, and home videos. Segmentation is useful because a long circus show usually contains several shorter segments, the acts. We propose a new method for detecting the end-of-act within circus performance videos. Unlike other temporal video segmentation methods, this method does not rely on shot detection techniques and applies audio and video content analysis separately. First, audio content analysis detects applause in the circus audio stream. Second, image analysis tests whether each detected applause occurs at the end of an act: an end-of-act is detected if the image(s) before and after the applause are different or if there are black frames just after the applause; otherwise, the applause is not treated as an end-of-act. In an experiment on Circus Oz performance videos, the method achieved 92.27 % recall and 49.05 % precision for end-of-act detection, providing useful clues that assist human annotators in segmenting circus videos into acts.
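The decision rule applied to each detected applause can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how images are compared or how black frames are identified, so the histogram comparison, the function names, and both thresholds below are assumptions chosen only for illustration. Frames are assumed to be non-empty, flattened lists of 8-bit grayscale pixel values.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels in 0..255 (assumed representation)


def is_black(frame: Frame, max_mean: float = 16.0) -> bool:
    """Treat a frame as black when its mean intensity is low (hypothetical threshold)."""
    return sum(frame) / len(frame) <= max_mean


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram with equal-width bins."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def frames_differ(a: Frame, b: Frame, min_distance: float = 0.5) -> bool:
    """Compare frames by the L1 distance between histograms, which lies in [0, 2].

    The histogram comparison is an assumed stand-in for the paper's image comparison.
    """
    distance = sum(abs(x - y) for x, y in zip(histogram(a), histogram(b)))
    return distance >= min_distance


def is_end_of_act(before: Frame, after: Frame) -> bool:
    """Rule from the abstract: black frame just after the applause, or a changed image."""
    return is_black(after) or frames_differ(before, after)


if __name__ == "__main__":
    bright_stage = [200] * 64
    print(is_end_of_act(bright_stage, [5] * 64))    # blackout after the applause
    print(is_end_of_act(bright_stage, [205] * 64))  # nearly unchanged image
```

In this sketch, an applause followed by a blackout or by a clearly different image is flagged, while an applause in the middle of an act, where the image stays roughly the same, is not.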


Video temporal segmentation, Audio classification, Sound detection, Image sequence analysis, Image comparison, Machine learning, Performing arts



This research was supported under the Australian Research Council’s Linkage Projects funding scheme (project number LP100200118). We would like to thank our partners on the project: Australian Research Council, RMIT University, La Trobe University, Circus Australia Ltd, Australia Council for the Arts, and Victorian Arts Centre Trust. We thank the anonymous referees for their helpful feedback and suggested improvements to the paper.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. School of Computer Science and Information Technology, RMIT University, Melbourne, Australia