Abstract
Every day, a huge amount of video data is generated worldwide, and processing this data demands substantial time, manpower, and hardware. Video summarization methods have been proposed to help users quickly understand the content of video data. Recently, methods based on sparse formulations have been found to handle large amounts of video better than other methods. In this paper, we propose a new method that formulates video summarization as a simultaneous sparse dictionary learning and selection problem. The proposed method is compared with state-of-the-art methods on standard data sets in which the key frames are manually tagged. The results demonstrate that it improves the summarization of large amounts of video data compared to the other methods.
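To make the idea of sparse dictionary selection concrete, the following is a minimal sketch rather than the method proposed in this paper. It covers only the selection half of the problem, using a generic row-sparse formulation, min over X of 0.5·||B − BX||_F² + λ·Σ_i ||X_i||_2, solved by proximal gradient descent; it does not learn a dictionary. The function name `select_key_frames`, the parameters `lam`, `n_iter`, and `top_k`, and the synthetic frame features are all assumptions made for illustration.

```python
import numpy as np


def select_key_frames(features, lam=0.5, n_iter=200, top_k=3):
    """Pick representative frames via row-sparse self-reconstruction.

    features: (d, n) array with one feature column per frame (assumed layout).
    Returns (sorted indices of selected frames, per-frame row-norm scores).
    """
    B = np.asarray(features, dtype=float)
    n = B.shape[1]
    G = B.T @ B  # Gram matrix of the frames, shape (n, n)
    # Step size 1/L, where L is the largest eigenvalue of G.
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)
    X = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ X - G  # gradient of 0.5 * ||B - B X||_F^2
        Z = X - step * grad
        # Row-wise soft thresholding: proximal step for the sum of row norms.
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        X = shrink * Z
    scores = np.linalg.norm(X, axis=1)  # a nonzero row means the frame is used
    order = np.argsort(-scores)[:top_k]
    return sorted(int(i) for i in order if scores[i] > 0), scores


# Synthetic example: 30 frames drawn around 3 hypothetical "scene" prototypes.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(8, 3))
frames = np.repeat(prototypes, 10, axis=1) + 0.05 * rng.normal(size=(8, 30))
key_frames, frame_scores = select_key_frames(frames, lam=0.5, top_k=3)
print(key_frames)
```

Rows of X with a large norm correspond to frames that contribute to reconstructing the others, so they act as key-frame candidates. A larger `lam` zeroes out more rows and gives a shorter summary.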
Cite this article
Etezadifar, P., Farsi, H. Scalable video summarization via sparse dictionary learning and selection simultaneously. Multimed Tools Appl 76, 7947–7971 (2017). https://doi.org/10.1007/s11042-016-3433-z
DOI: https://doi.org/10.1007/s11042-016-3433-z