Creating Summaries from User Videos

  • Michael Gygli
  • Helmut Grabner
  • Hayko Riemenschneider
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8695)


This paper proposes a novel approach and a new benchmark for video summarization. We focus on user videos, which are raw videos containing a set of interesting events. Our method first segments the video using a novel “superframe” segmentation tailored to raw videos. Then, we estimate the visual interestingness of each superframe using a set of low-, mid- and high-level features. Based on this scoring, we select an optimal subset of superframes to create an informative and interesting summary. The introduced benchmark comes with multiple human-created summaries, which were acquired in a controlled psychological experiment. These data pave the way for evaluating summarization methods objectively and for gaining new insights into video summarization. When evaluating our method, we find that it generates high-quality results, comparable to manual, human-created summaries.
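The abstract does not say how the "optimal subset" of superframes is computed. A minimal sketch, assuming the selection is framed as a 0/1 knapsack problem, is shown below. Each superframe has a length in frames and an interestingness score, and the goal is to maximize the total score while keeping the summary within a length budget. The sketch also includes a frame-level F-measure for comparing an automatic summary against one human-created summary. The function names `select_superframes`, `frames_of` and `f_measure`, the integer frame lengths, the example scores and the choice of F-measure are all hypothetical illustrations, not the authors' published implementation.

```python
from typing import List, Sequence, Set


def select_superframes(lengths: Sequence[int], scores: Sequence[float], budget: int) -> List[int]:
    """Hypothetical selection step: 0/1 knapsack over superframes.

    Returns indices of superframes whose total length (in frames) is at most
    `budget` and whose summed interestingness score is maximal.
    """
    n = len(lengths)
    # best[i][b] = highest total score using only the first i superframes with capacity b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip superframe i-1
            if w <= b and best[i - 1][b - w] + v > best[i][b]:
                best[i][b] = best[i - 1][b - w] + v  # option 2: take it
    # Backtrack: the value changed from row i-1 to row i only if superframe i-1 was taken
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)


def frames_of(selected: Sequence[int], lengths: Sequence[int]) -> Set[int]:
    """Expand selected superframe indices into the set of frame indices they cover,
    assuming superframes are contiguous and laid out back to back from frame 0."""
    starts = [0]
    for w in lengths[:-1]:
        starts.append(starts[-1] + w)
    return {f for i in selected for f in range(starts[i], starts[i] + lengths[i])}


def f_measure(auto: Set[int], human: Set[int]) -> float:
    """Assumed evaluation metric: frame-level F-measure between an automatic
    summary and a single human summary, both given as sets of frame indices."""
    overlap = len(auto & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(auto)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lengths = [30, 50, 20, 40]        # made-up superframe lengths in frames
    scores = [0.9, 1.0, 0.6, 0.7]     # made-up interestingness scores
    picked = select_superframes(lengths, scores, budget=90)
    print(picked)  # [0, 2, 3]
    human = set(range(60))            # made-up human summary: frames 0-59
    print(f_measure(frames_of(picked, lengths), human))
```

The dynamic program runs in O(n · budget) time, which is cheap when the budget is expressed in frames and the number of superframes is modest. Setting the budget, for instance as a fixed fraction of the input video's length, is left as an assumption here.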


Keywords: Video analysis, video summarization, temporal segmentation

Supplementary material

978-3-319-10584-0_33_MOESM1_ESM.mp4 (MP4, 7,642 KB)
978-3-319-10584-0_33_MOESM2_ESM.pdf (PDF, 531 KB)
978-3-319-10584-0_33_MOESM3_ESM.avi (AVI, 953 KB)
978-3-319-10584-0_33_MOESM4_ESM.avi (AVI, 1,014 KB)
978-3-319-10584-0_33_MOESM5_ESM.avi (AVI, 602 KB)
978-3-319-10584-0_33_MOESM6_ESM.avi (AVI, 594 KB)
978-3-319-10584-0_33_MOESM7_ESM.avi (AVI, 988 KB)
978-3-319-10584-0_33_MOESM8_ESM.avi (AVI, 910 KB)
978-3-319-10584-0_33_MOESM9_ESM.avi (AVI, 1,159 KB)
978-3-319-10584-0_33_MOESM10_ESM.avi (AVI, 1,920 KB)
978-3-319-10584-0_33_MOESM11_ESM.avi (AVI, 538 KB)
978-3-319-10584-0_33_MOESM12_ESM.avi (AVI, 1,203 KB)
978-3-319-10584-0_33_MOESM13_ESM.avi (AVI, 2,286 KB)
978-3-319-10584-0_33_MOESM14_ESM.avi (AVI, 315 KB)
978-3-319-10584-0_33_MOESM15_ESM.avi (AVI, 234 KB)
978-3-319-10584-0_33_MOESM16_ESM.avi (AVI, 597 KB)
978-3-319-10584-0_33_MOESM17_ESM.avi (AVI, 1,134 KB)
978-3-319-10584-0_33_MOESM18_ESM.avi (AVI, 1,766 KB)
978-3-319-10584-0_33_MOESM19_ESM.avi (AVI, 619 KB)
978-3-319-10584-0_33_MOESM20_ESM.avi (AVI, 509 KB)
978-3-319-10584-0_33_MOESM21_ESM.avi (AVI, 679 KB)
978-3-319-10584-0_33_MOESM22_ESM.avi (AVI, 1,388 KB)
978-3-319-10584-0_33_MOESM23_ESM.avi (AVI, 452 KB)
978-3-319-10584-0_33_MOESM24_ESM.avi (AVI, 470 KB)
978-3-319-10584-0_33_MOESM25_ESM.avi (AVI, 972 KB)
978-3-319-10584-0_33_MOESM26_ESM.avi (AVI, 2,081 KB)
978-3-319-10584-0_33_MOESM27_ESM.avi (AVI, 1,091 KB)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michael Gygli (1, 2)
  • Helmut Grabner (1, 2)
  • Hayko Riemenschneider (1)
  • Luc Van Gool (1, 3)

  1. Computer Vision Laboratory, ETH Zurich, Switzerland
  2. upicto GmbH, Zurich, Switzerland
  3. K.U. Leuven, Belgium
