Multimedia Tools and Applications, Volume 76, Issue 6, pp 7947–7971

Scalable video summarization via sparse dictionary learning and selection simultaneously

  • Pouriya Etezadifar
  • Hassan Farsi


Every day, a huge amount of video data is generated worldwide, and processing it requires substantial time, manpower, and hardware. Video summarization methods have been proposed to help viewers understand video content quickly. Recently, methods based on sparse formulations have been found to handle large amounts of video better than other methods. In this paper, we propose a new method that formulates video summarization as a simultaneous sparse dictionary learning and selection problem. The proposed method is compared with state-of-the-art methods on standard data sets in which the key frames are manually tagged. The results demonstrate that the proposed method improves the summarization of large amounts of video data compared to other methods.
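
To make the idea of sparse dictionary selection concrete, the following is a minimal sketch of a generic row-sparse self-representation, not the authors' algorithm: it covers only the selection half and omits the dictionary learning step that the paper performs simultaneously. Each frame is assumed to be a short feature vector, and the function name `select_key_frames`, the penalty weight `lam`, and the iteration count are illustrative assumptions. Frames whose row of the coefficient matrix has a large norm are the candidates for key frames.

```python
import math

def gram(frames):
    """Gram matrix G[i][j] = <frame_i, frame_j> of the feature vectors."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in frames] for fi in frames]

def select_key_frames(frames, lam, iters=200):
    """Row-sparse self-representation via proximal gradient descent.

    Minimizes 0.5 * ||X - X W||_F^2 + lam * sum_i ||W[i, :]||_2,
    where the columns of X are the frame feature vectors.
    Returns one weight per frame (the L2 norm of its row of W).
    """
    n = len(frames)
    G = gram(frames)
    # trace(G) upper-bounds the largest eigenvalue of the PSD matrix G,
    # so 1 / trace(G) is a safe (conservative) step size.
    step = 1.0 / max(sum(G[i][i] for i in range(n)), 1e-12)
    W = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Gradient of the smooth term is G W - G.
        GW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        new_W = []
        for i in range(n):
            row = [W[i][j] - step * (GW[i][j] - G[i][j]) for j in range(n)]
            norm = math.sqrt(sum(v * v for v in row))
            # Row-wise soft threshold (proximal operator of the L2,1 norm).
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            new_W.append([scale * v for v in row])
        W = new_W
    return [math.sqrt(sum(v * v for v in row)) for row in W]

# Toy input (assumed): two distinct frames and one empty frame.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights = select_key_frames(toy_frames, lam=0.1)
```

In this toy input the two distinct frames receive equal nonzero weights and the empty frame receives none; a larger `lam` drives more rows to zero and so yields a shorter summary.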


Keywords: Video summarization; Dictionary learning; Key frames; Scene analysis; Video analysis; Sparse coding



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
