Abstract
Video summarization is a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should provide an overview of the video content, and most existing approaches fulfill this goal. However, such an overview does not let the user reach all details of interest selectively and progressively. This paper proposes a novel scalable summary generation approach based on the On-Line Analytical Processing (OLAP) data cube. Such a structure integrates tools such as the drill-down operation, which allows efficient browsing of multiple descriptions of a dataset at increasing levels of detail. We adapt this model to video summary generation by expressing a video within a cross-media feature space and by performing clusterings within particular subspaces. Consensus clustering guides the subspace selection strategy at small dimensions, since the novelty brought by the least consensual subspaces is useful for refining a summary. Our approach is designed for weakly structured content such as cultural documentaries. We evaluate it on a corpus of cultural archives provided by the French National Audiovisual Institute (INA), using information retrieval metrics that handle single and multiple reference annotations. Overall, the approach improves on two baseline systems performing random and arbitrary segmentations, showing a better balance between Precision and Recall.
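As an illustration of the subspace selection idea, the sketch below compares clusterings obtained in several feature subspaces and picks the one that agrees least with the others. It is a minimal sketch under stated assumptions, not the method of the paper: the Rand index is used here as a stand-in agreement measure, and the subspace names, the labels, and the helper functions `rand_index` and `least_consensual` are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def least_consensual(clusterings):
    """Return the subspace whose clustering agrees least, on average, with the others."""
    def mean_agreement(name):
        others = [n for n in clusterings if n != name]
        total = sum(rand_index(clusterings[name], clusterings[n]) for n in others)
        return total / len(others)

    return min(clusterings, key=mean_agreement)


# Hypothetical cluster labels for six video segments, one labeling per subspace.
clusterings = {
    "color": [0, 0, 0, 1, 1, 1],
    "motion": [0, 0, 0, 1, 1, 1],
    "audio": [0, 1, 0, 1, 0, 1],
}

print(least_consensual(clusterings))
```

In this toy input the color and motion labelings are identical, so the audio subspace is the one that brings a different grouping of the segments, which is the kind of novelty a refinement step could exploit.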
Notes
These features are extracted using the MPEG-7 Feature Extraction Library provided by the Bilkent University Multimedia Database Group, available at http://cs.bilkent.edu.tr/~bilmdg/bilvideo-7/Software.html.
PHOG descriptors are computed with the MATLAB script by Anna Bosch and Andrew Zisserman, available at http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html.
The extraction of HOG, HOF and MBH and the implementation of the bag-of-words approach are performed using the Dense Trajectories Video Description Toolbox by Wang et al., available at http://lear.inrialpes.fr/people/wang/dense_trajectories.
For the Chroma vectors, the analysis window size is set automatically by the toolbox. The other parameters for extracting MFCC and Chroma vectors are left at their default values in Yaafe.
The datasets used by Li and Merialdo [14] were not considered in this work for copyright reasons.
References
Almeida J, Leite NJ, da S Torres R (2013) Online video summarization in compressed domain. J Vis Commun Image Represent 24:729–738
Bartolini I, Patella M, Stromei G (2011) The windsurf library for the efficient retrieval of multimedia hierarchical data. In: Proceedings of ACM special interest group on multimedia (SIGMM), pp 139–148
Bartsch MA, Wakefield GH (2005) Audio thumbnailing of popular music using chroma-based representations. IEEE Trans Multimed 7:96–104
Ben Abdelali A, Nidhalkrifa M, Mtibaa A, Bourennane EB (2009) A study of color structure descriptor for shot boundary detection. Int J Sci Tech Autom Control Comput Eng 3(1):956–971
Benini S, Bianchetti A, Leonardi R, Migliorati P (2006) Extraction of significant video summaries by dendrogram analysis. In: Proceedings of the international conference on image processing (ICIP), pp 133–136
Benois-Pineau J, Dupuy W, Barba D (2001) Recovering of visual scenarios in movies by motion analysis and grouping spatio-temporal colour signatures of video shots. In: Proceedings of EUSFLAT’2001, pp 385–389
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 226–231
Goder A, Filkov V (2008) Consensus clustering algorithms: Comparison and refinement. In: Proceedings of 9th workshop on algorithm engineering and experiments (ALENEX’08), pp 109–117
Gong B, Chao WL, Grauman K, Sha F (2014) Diverse sequential subset selection for supervised video summarization. In: Proceedings of the neural information processing systems conference (NIPS), pp 1–9
Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, Pellow F, Pirahesh H (1997) Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub totals. J Data Min Knowledge Disc 1(1):29–53
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31:651–666
Jin X, Han J, Cao L, Luo J, Ding B, Lin CK (2010) Visual cube and on-line analytical processing of images. In: Proceedings of the 19th ACM international conference on information and knowledge management (CIKM), pp 849–858
Kompatsiaris Y, Merialdo B, Lian S (eds) (2012) TV content analysis. Techniques and applications. CRC Press
Li Y, Merialdo B (2010) VERT: automatic evaluation of video summaries. In: Proceedings of ACM multimedia, pp 851–854
Mathieu B, Essid S, Fillon T, Prado J, Richard G (2010) YAAFE, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th International society for music information retrieval (ISMIR), pp 441–446
Messing D, van Beek P, Errico JH (2001) The MPEG-7 color structure descriptor: image description using color and local spatial information. In: Proceedings of the international conference on image processing (ICIP), pp 670–673
Naci U, Damnjanovic U, Mansencal B, Benois-Pineau J, Kaes C, Corvaglia M, Rossi E, Aginako N (2008) The COST292 experimental framework for rushes summarization task in TRECVID 2008. In: Proceedings of the 2nd ACM TRECVID video summarization workshop, pp 40–44
Peltonen V, Tuomi J, Klapuri A, Huopaniemi J, Sorsa T (2002) Computational auditory scene recognition. In: Proceedings of the 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 1941–1944
Pinquier J, Karaman S, Letoupin L, Guyot P, Mégret R, Benois-Pineau J, Gaëstel Y, Dartigues JF (2012) Strategies for multiple feature fusion with Hierarchical HMM: Application to activity recognition from wearable audiovisual sensors. In: Proceedings of the 21st international conference on pattern recognition (ICPR), pp 3192–3195
Quénot G, Benois-Pineau J, Mansencal B, Rossi E, Cord M, Precioso F, Gorisse D, Lambert P, Augereau B, Granjon L, Pellerin D, Rombaut M, Ayache S (2008) Rushes summarization by IRIM consortium: redundancy removal and multi-feature fusion. In: Proceedings of the 2nd ACM TRECVID video summarization workshop, pp 80–84
Perez-Daniel KR, Nakano-Miyatake M, Benois-Pineau J, Maabout S, Sargent G (2014) Scalable video summarization of cultural video documents in cross-media space based on data cube approach. In: Proceedings of the 12th international workshop on content-based multimedia indexing (CBMI), pp 1–6
Wang H, Kläser A, Schmid C, Liu CL (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103:60–79
Wang J, Liu P, She M, Kouzani A, Nahavandi S (2011) The MPEG-7 color structure descriptor: Image description using color and local spatial information. In: Proceedings of 2011 IEEE international conference on systems, man, and cybernetics (SMC), pp 2449–2454
Yeung M, Yeo BL (1996) Time-constrained clustering for segmentation of video into story units. In: Proceedings of the 13th international conference on pattern recognition (ICPR), vol. 3, pp 375–380
Yong-ge W, Sheng-ze P (2012) Research on image retrieval based on scalable color descriptor of mpeg7. Adv Control Commun:91–98
Acknowledgments
This work is supported by the French National Research Agency grant ANR-11-IS02-001 within the joint French-Mexican project Mex-Culture. We are grateful to the Institut National de l’Audiovisuel (INA, France) for providing the video content used to set up the evaluation. The authors thank Michel Crucianu and Marin Ferecatu for valuable discussions and master's student Elie Génard for his efficient help in conducting computational experiments.
About this article
Cite this article
Sargent, G., Perez-Daniel, K.R., Stoian, A. et al. A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling. Multimed Tools Appl 75, 9073–9094 (2016). https://doi.org/10.1007/s11042-015-2863-3
DOI: https://doi.org/10.1007/s11042-015-2863-3