Multimedia Tools and Applications, Volume 51, Issue 2, pp 649–673

Example-based video remixing



A video remix is generally created by arranging selected video clips and combining them with other media streams such as audio clips and video transition effects. This paper proposes a system for semi-automatically creating video remixes of good expressive quality. Given multiple original video clips, audio clips, and transition effects as input, the proposed system generates a video remix through five processes: I) video clip sequence generation, II) audio clip selection, III) audio boundary extraction, IV) video segment extraction, and V) transition effect selection. These processes are based on spatial and temporal structural patterns automatically learned from professionally created video remix examples. Experiments using action-genre movie trailers as video remix examples demonstrate that video remixing by professionals can be imitated from examples, and reveal that video clip sequence generation and audio clip selection are the most important processes for improving the perceived expressive quality of video remixes.


Video remixing, Examples, Expressive quality, Structural patterns, Video clips, Audio clips, Transition effects



This research was partially supported by a Grant-in-Aid for Young Scientists (B) 20700087 from JSPS and by Core Project from Microsoft Institute for Japanese Academic Research Collaboration. The authors especially thank Jang-il Kim, Yosuke Kurihara, and Guozhen Jiang for their contributions to the development of the system.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Graduate School of Engineering, Osaka University, Suita, Japan
