Skip to main content
Log in

Scalable video summarization via sparse dictionary learning and selection simultaneously

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Every day, a huge amount of video data is generated worldwide and processing this kind of data requires powerful resources in terms of time, manpower, and hardware. Therefore, to help quickly understand the content of video data, video summarization methods have been proposed. Recently, sparse formulation-based methods have been found to be able to summarize a large amount of video compared to other methods. In this paper, we propose a new method in which video summarization is performed as training and selection sparse dictionary problem simultaneously. It is shown that the proposed method is able to improve the summarization of a large amount of video data compared to other methods. Finally, the performance of the proposed method is compared to state-of-the-art methods using standard data sets, in which the key frames are manually tagged. The obtained results demonstrate that the proposed method improves video summarization compared to other methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  1. Arnold BC, Groeneveld RA (1995) Measuring skewness with respect to the mode. Am Stat 49:34–38

    MathSciNet  Google Scholar 

  2. Avila SEF, Lopes APB, daLuz A, Araújo A (2011) Vsumm: a mechanism designed to produce static video summaries and a novel evaluation method. Pattern Rec Let 32(1):56–68

    Article  Google Scholar 

  3. Cernekova Z, Pitas I, Nikou C (2006) Information theory-based shot cut/fade detection and video summarization. IEEE Trans Circuits Syst Video Technol 16(1):82–91

    Article  Google Scholar 

  4. Chen F, Cooper M, Adcock J (2007) Video summarization preserving dynamic content. In proceeding of international workshop on TRECVID video summarization

  5. Chen F, Vleeschouwer D (2011) Formulating team-sport video summarization as a resource allocation problem. IEEE Trans Circuits Syst Video Technol 21(2):193–205

    Article  Google Scholar 

  6. Chen F, Vleeschouwer CD, Cavallaro A (2014) Resource allocation for personalized video summarization. IEEE Trans Multimed 16(2):455–469

    Article  Google Scholar 

  7. Ciocca G, Schettini R (2006) Innovative algorithm for key frame extraction in video summarization. J Real-Time Image Proc 1(1):69–88

    Article  Google Scholar 

  8. Cong Y, Yuan J, Liu J (2011) Sparse reconstruction cost for abnormal event detection. In: Proc. IEEE Conf. Com. Vision & Pattern Recognition(CVPR), pp. 3449–3456

  9. Cong Y, Yuan J, Luo J (2012) Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Trans Multimed 14(1):66–75

    Article  Google Scholar 

  10. Doulamis ND, Doulamis AD, Avrithis YS, Ntalianis KS, Kollias SD (2000) Efficient summarization of stereoscopic video sequences. IEEE Trans Circuits Syst Video Technol 10(4):501–517

    Article  Google Scholar 

  11. Ejaz N, Manzoor U, Nefti S, Baik SW (2012) A collaborative multi-agent framework for abnormal activity detection in crowded areas. Int J Innov Comp Inf Control 8(6):4219–4234

    Google Scholar 

  12. Ejaz N, Tariq TB, Baik SW (2012) Adaptive key frame extraction for video summarization using an aggregation mechanism. Elsevier. Visual comm image rep. 23: 1031–1040

  13. Elad M (2010) Sparse redundant representations, from theory to applications in signal and image processing. Springer, the Teknion Institute of technology Haifa, pp. 200–246

  14. Ferman AM, Gunsel B, Tekalp AM (1997) Object-based indexing of MPEG-4 compressed video. Proc. VCIP’97, vol. SPIE-3024, pp. 953–963

  15. Fu Y, Guo Y, Zhu Y, Liuv SC, Zhou Z (2010) Multi view video summarization. IEEE Trans Multimed 12(7):717–729

    Article  Google Scholar 

  16. Furini M, Geraci F, Montangero M, Pellegrini M (2010) Stimo: still and moving video storyboard for the web scenario. Multimed Tools Appl 46(1):47–69

    Article  Google Scholar 

  17. Gallager RG (1968) Information theory and reliable communication. Wiley, New York

    MATH  Google Scholar 

  18. Golub GH, Van Loan CF (1996) Matrix computations, 3rd. Edition. Johns Hopkins University Press, Baltimore and London

  19. Groeneveld RA, Meeden G (1984) Measuring skewness and kurtosis. J R Stat Soc Stat 33:391–399

    Google Scholar 

  20. Guan G, Wang Z, Lu S, Dadeng J, Feng D (2013) Keypoint based keyframe selection. IEEE Trans Circuits Syst Video Technol 23(4):729–734

    Article  Google Scholar 

  21. Hanjalic A, Langendijk RL, Biemond J (1996) A new key frame allocation method for representing stored video streams. 1st Int. Workshop on image databases & multi, search, Amsterdam, The Netherlands, pp. 67–74

  22. Hu W, Xie N, Zeng X, Maybank S (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cybern 41(6):797–819

    Article  Google Scholar 

  23. Kim HH, Kim YH (2010) Toward a conceptual framework of key-frame extraction and storyboard display for video summarization. J Am Soc Inf Sci Techol 61(5):927–939

    Article  Google Scholar 

  24. Lakshmi GG, Domnic S (2014) Shot based key frame extraction for ecological video indexing and retrieval. Elsevier Ecol Inf 23:107–117

    Article  Google Scholar 

  25. Li Y, Lee SH, Yeh CH, Kuo CC (2006) Techniques for movie content analysis and skimming: tutorial and overview on video abstraction techniques. IEEE Signal Process Mag 23(2):79–89

    Article  Google Scholar 

  26. Li Y, Zhang T, Tretter D (2001) An overview of video abstraction techniques. Technical report HPL-2001–191

  27. Lindeberg T (1994) Scale-space theory: a basic tool for analyzing structures at different scales. J Appl Stat, pp. 224–270

  28. Loui A, Luo J, Chang S, Ellis D, Jiang W, Kennedy L, Lee K, Yanagawa A (2007) Kodak’s consumer video benchmark data set: concept definition and annotation. in Proc. Int. Workshop multimedia Inf. Ret, pp. 245–254

  29. Lowe DG (2004) Distinctive image features from scale-invariant key-points. Int J Comput Vision 60(2):91–110

    Article  Google Scholar 

  30. Lu S, Wang Z, Mei T, Guan G, Feng DD (2014) A Bag-of-importance model with locality-constrained coding based feature learning for video summarization. IEEE Trans Multimed 16(6):1497–1509

    Article  Google Scholar 

  31. Luo J, Papin C, Costello K (2009) Towards extracting semantically meaningful key frames from personal video clips: from humans to computers. IEEE Trans Circuits Syst Video Technol 19(2):289–301

    Article  Google Scholar 

  32. Massimiliano A (2006) Extracting and Summarizing Information from large data Repositories,” Ph.D. Dissertation, University of Naples Federico II, Italia

  33. Mikolajczyk K, Zisserman A, Schmid C (2003) Shape recognition with edge-based features. In Proc. British Machine Vision Conf, Norwich, U.K

  34. Money AG, Agius H (2008) Video summarization: a conceptual framework and survey of the state of the Art. J Visual Commun Image Represent 19(2):121–143

    Article  Google Scholar 

  35. Mundur P, Rao Y, Yesha Y (2006) Keyframe based video summarization using Delaunay clustering. Int J Digit Libr 6(2):219–232

    Article  Google Scholar 

  36. Nesterov Y (2007) Gradient methods for minimizing composite objective function. CORE, Louvain-la-Neuve

    Google Scholar 

  37. Ngo CW, Ma YF, Zhang HJ (2005) Video summarization and scene detection by graph modeling. IEEE Trans Circuits Syst Video Technol 15(2):296–305

    Article  Google Scholar 

  38. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175

    Article  MATH  Google Scholar 

  39. Rasheed Z, Shah M (2005) Detection and representation of scenes in videos. IEEE Trans Multimed 7(6):1097–1105

    Article  Google Scholar 

  40. Rayner JCW, Best DJ, Matthews KL (1995) Interpreting the skewness coefficient. Commun Stat Theory Methods 24:593–600

    Article  MATH  Google Scholar 

  41. Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. Adv Artif Intell 43(4):1015–1021

    Google Scholar 

  42. Stricker M, Orengo M (1995) Similarity of color images. Proc SPIE Storage Retr Image Video Databases 2420:381–392

    Article  Google Scholar 

  43. Tapas K, Res A et al (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892

    Article  Google Scholar 

  44. Taskiran CM, Pizlo Z, Amir A, Ponceleon D, Delp EJ (2006) Automated video program summarization using speech transcripts. IEEE Trans Multimedia 8(4):775–791

    Article  Google Scholar 

  45. The open video project: http://www.open-video.org/

  46. The VSUMM database site: https://sites.google.com/site/vsummsite/results

  47. Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Statist Soc Ser B (Methodol) 58(1):267–288

    MathSciNet  MATH  Google Scholar 

  48. Truong BT, Venkatesh S (2007) Video abstraction: a systematic re-view and classification. ACM Trans Multimedia Com Commun Appl 3(1):1–37

    Article  Google Scholar 

  49. Tsai DM, Lai SC (2009) Independent component analysis-based background subtraction for indoor surveillance. IEEE Trans Image Process 18(1):158–167

    Article  MathSciNet  Google Scholar 

  50. Wu J, Christensen H, Rehg J (2009) Visual place categorization: problem, dataset, and algorithm. In: Proc. IRO

  51. Wu J, Christensen H, Rehg J (2009) Visual place categorization: problem, dataset, and algorithm. In: Proc. Intelligent robots and systems, pp. 4763–47760

  52. Wu J, Rehg J (2010) Centrist: a visual descriptor for scene categorization. IEEE Trans Pattern Anal Mach Intell 33(8):1489–1501

    Google Scholar 

  53. Xiang T, Gong S (2008) Video behavior profiling for anomaly detection. IEEE Trans Pattern Anal Mach Intell 30(5):893–908

    Article  Google Scholar 

  54. Xu M, Orwell J, Jones G (2004) Tracking football players with multiple cameras. In: Proc. Int. Conf. Image Process, ICIP 2004, pp. 2909–2912

  55. Yan C, Zhang Y et al (2014) A Highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process Lett 21(5):573–576

    Article  Google Scholar 

  56. Yan C, Zhang Y et al (2014) Efficient Parallel framework for HEVC motion estimation on many-core processors. IEEE Trans Multimed Circuits Syst Video Technol 24(12):2077–2089

    Article  Google Scholar 

  57. Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc 68(1):49–67

    Article  MathSciNet  MATH  Google Scholar 

  58. Zhuang Y, Rui Y, Huang T, Mehrotra S (1998) Adaptive key frame extraction using unsupervised clustering. Proc Int Conf Image Process 1:866–870

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hassan Farsi.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Etezadifar, P., Farsi, H. Scalable video summarization via sparse dictionary learning and selection simultaneously. Multimed Tools Appl 76, 7947–7971 (2017). https://doi.org/10.1007/s11042-016-3433-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-016-3433-z

Keywords

Navigation