Automatic Home Video Editing

  • Xian-Sheng Hua
  • Hong-Jiang Zhang
Part of the Signals and Communication Technology book series (SCT)


In this chapter, we present an optimization-based system for automatic home video editing. The system selects highlight segments from a set of raw home videos and aligns them with a given piece of incidental music to create an edited video of a desired length, based on the content of both the video and the music. To facilitate the selection of highlight segments, we developed an approach for extracting the temporal structure of the video and determining the importance of each video segment. We also extract temporal structure, beats, and tempo from the incidental music. To create more professional-looking results, the selected highlight segments must satisfy a set of editing rules and are matched to the content of the incidental music. This task is formulated as a nonlinear 0-1 programming problem with adjustable and increasable constraints. The output video is rendered by connecting the selected highlight segments with transition effects and the incidental music. We also introduce a real product built on this technology, as well as a set of extensions based on the automatic home video editing system.
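
As a rough illustration of what a nonlinear 0-1 selection problem looks like, the sketch below chooses a binary indicator per segment so that the mean importance of the chosen segments is maximized while their total duration stays close to a target length. It is a toy under stated assumptions, not the chapter's formulation: the function name `select_segments`, the mean-importance objective, the duration tolerance, and the exhaustive search are all hypothetical choices made for this example, and the editing rules and music matching are omitted.

```python
from itertools import product


def select_segments(durations, importance, target_length, tolerance):
    """Toy 0-1 selection (illustrative only, not the chapter's method).

    Picks x_i in {0, 1} for each segment to maximize the mean importance of
    the selected segments, subject to the total selected duration being
    within `tolerance` of `target_length`. The mean is a ratio of sums over
    x, which makes the objective nonlinear in the 0-1 variables.
    """
    best_score, best_choice = float("-inf"), None
    # Exhaustive search over all 2^n assignments; only viable for small n.
    for choice in product((0, 1), repeat=len(durations)):
        if not any(choice):
            continue  # an empty selection has no defined mean importance
        total = sum(x * d for x, d in zip(choice, durations))
        if abs(total - target_length) > tolerance:
            continue  # violates the duration constraint
        score = sum(x * w for x, w in zip(choice, importance)) / sum(choice)
        if score > best_score:
            best_score, best_choice = score, choice
    return best_choice, best_score


if __name__ == "__main__":
    # Hypothetical segment durations (seconds) and importance scores.
    choice, score = select_segments(
        durations=[4, 6, 5, 3],
        importance=[0.9, 0.2, 0.7, 0.8],
        target_length=12,
        tolerance=1,
    )
    print(choice, round(score, 3))
```

Exhaustive enumeration grows as 2^n in the number of segments, so a realistic segment count would call for a heuristic or approximate search instead.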


Keywords: Video Segment; Scalable Video Code; Music Video; Motion Intensity; Video Summarization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Microsoft Research Asia, China
