
A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling


Abstract

Video summarization has become a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should give an overview of the video content, and most existing approaches fulfill this goal. However, such an overview does not allow the user to reach all details of interest selectively and progressively. This paper proposes a novel scalable summary generation approach based on the On-Line Analytical Processing (OLAP) data cube. This structure provides tools such as the drill-down operation, which allows efficient browsing of multiple descriptions of a dataset at increasing levels of detail. We adapt this model to video summary generation by expressing a video within a cross-media feature space and by performing clusterings on particular subspaces of this space. Consensus clustering guides the subspace selection strategy at low dimensions, since the novelty brought by the least consensual subspaces is valuable for refining a summary. Our approach is designed for weakly-structured content such as cultural documentaries. We evaluate it on a corpus of cultural archives provided by the French Audiovisual National Institute (INA), using information retrieval metrics that handle single and multiple reference annotations. The results show an overall improvement over two baseline systems performing random and arbitrary segmentations, with a better balance between precision and recall.
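
As a rough, hypothetical illustration of the refinement idea in the abstract (not the authors' implementation), the sketch below clusters a set of video segments on several low-dimensional feature subspaces, measures how much each subspace's clustering agrees with the others, and flags the least consensual subspace as the candidate for refining the summary at the next level of detail. The feature matrix, subspace groups and cluster count are all invented for the example.

```python
# Minimal sketch (not the paper's implementation): rank feature subspaces by
# clustering consensus, as a proxy for the subspace selection step.
# The feature matrix X and its column groups are hypothetical stand-ins
# for the cross-media descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # 200 segments, 12 toy cross-media features
subspaces = {                           # hypothetical column groups per descriptor
    "color": [0, 1, 2],
    "motion": [3, 4, 5],
    "audio": [6, 7, 8],
    "texture": [9, 10, 11],
}

# One clustering per subspace.
labels = {
    name: KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X[:, cols])
    for name, cols in subspaces.items()
}

# Consensus of a subspace = mean pairwise agreement (ARI) with the other subspaces.
consensus = {}
for name, lab in labels.items():
    others = [adjusted_rand_score(lab, other)
              for other_name, other in labels.items() if other_name != name]
    consensus[name] = float(np.mean(others))

ranking = sorted(consensus, key=consensus.get)
print("least consensual subspace (candidate for refinement):", ranking[0])
```

The intuition matches the abstract: the subspace whose clustering diverges most from the consensus carries the most novel structure, and is therefore the most informative direction in which to drill down.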



Notes

  1. http://trecvid.nist.gov/

  2. http://www.ina.fr

  3. These features are extracted using the MPEG-7 Feature Extraction Library proposed by the Bilkent University Multimedia Database Group, available at http://cs.bilkent.edu.tr/~bilmdg/bilvideo-7/Software.html.

  4. PHOG descriptors are computed with the MATLAB script by Anna Bosch and Andrew Zisserman, available at http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html.

  5. The extraction of HOG, HOF and MBH descriptors and the implementation of the bag-of-words approach are realized using the Dense Trajectories Video Description Toolbox by Wang et al., available at http://lear.inrialpes.fr/people/wang/dense_trajectories (a generic sketch of the bag-of-words encoding step is given after these notes).

  6. For the Chroma vectors, the analysis window size is set automatically by the toolbox. The other parameters for the extraction of MFCC and Chroma vectors are left at Yaafe's defaults.

  7. http://www-nlpir.nist.gov/projects/tv2008/tv2008.html#4.4

  8. The datasets used by Li and Merialdo [14] were not considered in this work for copyright reasons.
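
Note 5 above mentions a bag-of-words encoding applied on top of the dense-trajectory descriptors (HOG, HOF, MBH). As a generic sketch of that step only (the descriptor dimensionality, sampling rate and codebook size below are illustrative, not the values used in the paper), one can learn a codebook with k-means over a sample of local descriptors and represent each video as a normalized histogram of codeword assignments:

```python
# Generic bag-of-words encoding sketch (illustrative values, not the paper's settings).
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sample, n_words=256, seed=0):
    """Learn a visual vocabulary with k-means on a sample of local descriptors."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=seed).fit(descriptor_sample)

def encode_video(codebook, descriptors):
    """Encode one video as an L1-normalized histogram of codeword assignments."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy example: random stand-ins for per-video local descriptors (96-D here).
rng = np.random.default_rng(0)
videos = [rng.normal(size=(rng.integers(500, 1000), 96)) for _ in range(3)]
codebook = build_codebook(np.vstack(videos)[::10], n_words=64)
bow_features = np.stack([encode_video(codebook, d) for d in videos])
print(bow_features.shape)   # (3, 64): one histogram per video
```

In this representation, each video (or segment) becomes a fixed-length vector regardless of its duration, which is what makes it usable as one axis of the cross-media feature space.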

References

  1. Almeida J, Leite NJ, da S Torres R (2013) Online video summarization in compressed domain. J Vis Commun Image Represent 24:729–738

  2. Bartolini I, Patella M, Stromei G (2011) The windsurf library for the efficient retrieval of multimedia hierarchical data. In: Proceedings of ACM special interest group on multimedia (SIGMM), pp 139–148

  3. Bartsch MA, Wakefield GH (2005) Audio thumbnailing of popular music using chroma-based representations. IEEE Trans Multimed 7:96–104

  4. Ben Abdelali A, Nidhalkrifa M, Mtibaa A, Bourennane EB (2009) A study of color structure descriptor for shot boundary detection. Int J Sci Tech Autom Control Comput Eng 3(1):956–971

  5. Benini S, Bianchetti A, Leonardi R, Migliorati P (2006) Extraction of significant video summaries by dendrogram analysis. In: Proceedings of the international conference on image processing (ICIP), pp 133–136

  6. Benois-Pineau J, Dupuy W, Barba D (2001) Recovering of visual scenarios in movies by motion analysis and grouping spatio-temporal colour signatures of video shots. In: Proceedings of EUSFLAT’2001, pp 385–389

  7. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 226–231

  8. Goder A, Filkov V (2008) Consensus clustering algorithms: Comparison and refinement. In: Proceedings of 9th workshop on algorithm engineering and experiments (ALENEX’08), pp 109–117

  9. Gong B, Chao WL, Grauman K, Sha F (2014) Diverse sequential subset selection for supervised video summarization. In: Proceedings of the neural information processing systems conference (NIPS), pp 1–9

  10. Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, Pellow F, Pirahesh H (1997) Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min Knowl Disc 1(1):29–53

  11. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31:651–666

  12. Jin X, Han J, Cao L, Luo J, Ding B, Lin CK (2010) Visual cube and on-line analytical processing of images. In: Proceedings of the 19th ACM international conference on information and knowledge management (CIKM), pp 849–858

  13. Kompatsiaris Y, Merialdo B, Lian S (eds) (2012) TV content analysis. Techniques and applications. CRC Press

  14. Li Y, Merialdo B (2010) VERT: automatic evaluation of video summaries. In: Proceedings of ACM multimedia, pp 851–854

  15. Mathieu B, Essid S, Fillon T, Prado J, Richard G (2010) YAAFE, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th International society for music information retrieval (ISMIR), pp 441–446

  16. Messing D, van Beek P, Errico JH (2001) The MPEG-7 color structure descriptor: Image description using color and local spatial information. In: Proceedings of the international conference on image processing (ICIP), pp 670–673

  17. Naci U, Damnjanovic U, Mansencal B, Benois-Pineau J, Kaes C, Corvaglia M, Rossi E, Aginako N (2008) The COST292 experimental framework for rushes summarization task in TRECVID 2008. In: Proceedings of the 2nd ACM TRECVID video summarization workshop, pp 40–44

  18. Peltonen V, Tuomi J, Klapuri A, Huopaniemi J, Sorsa T (2002) Computational auditory scene recognition. In: Proceedings of the 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 1941–1944

  19. Pinquier J, Karaman S, Letoupin L, Guyot P, Mégret R, Benois-Pineau J, Gaëstel Y, Dartigues JF (2012) Strategies for multiple feature fusion with Hierarchical HMM: Application to activity recognition from wearable audiovisual sensors. In: Proceedings of the 21st international conference on pattern recognition (ICPR), pp 3192–3195

  20. Quénot G, Benois-Pineau J, Mansencal B, Rossi E, Cord M, Precioso F, Gorisse D, Lambert P, Augereau B, Granjon L, Pellerin D, Rombaut M, Ayache S (2008) Rushes summarization by IRIM consortium: redundancy removal and multi-feature fusion. In: Proceedings of the 2nd ACM TRECVID video summarization workshop, pp 80–84

  21. Perez-Daniel KR, Nakano-Miyatake M, Benois-Pineau J, Maabout S, Sargent G (2014) Scalable video summarization of cultural video documents in cross-media space based on data cube approach. In: Proceedings of the 12th international workshop on content-based multimedia indexing (CBMI), pp 1–6

  22. Wang H, Kläser A, Schmid C, Liu CL (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103:60–79

  23. Wang J, Liu P, She M, Kouzani A, Nahavandi S (2011) The MPEG-7 color structure descriptor: Image description using color and local spatial information. In: Proceedings of 2011 IEEE international conference on systems, man, and cybernetics (SMC), pp 2449–2454

  24. Yeung M, Yeo BL (1996) Time-constrained clustering for segmentation of video into story units. In: Proceedings of the 13th international conference on pattern recognition (ICPR), vol. 3, pp 375–380

  25. Yong-ge W, Sheng-ze P (2012) Research on image retrieval based on scalable color descriptor of MPEG-7. Adv Control Commun:91–98


Acknowledgments

This work is supported by the French National Research Agency under grant ANR-11-IS02-001, within the joint French-Mexican project Mex-Culture. We are grateful to the Institut National de l’Audiovisuel (INA, France) for providing the video content used in our evaluation. The authors thank Michel Crucianu and Marin Ferecatu for valuable discussions, and master’s student Elie Génard for his efficient help in conducting the computational experiments.

Author information

Corresponding author

Correspondence to Gabriel Sargent.


About this article


Cite this article

Sargent, G., Perez-Daniel, K.R., Stoian, A. et al. A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling. Multimed Tools Appl 75, 9073–9094 (2016). https://doi.org/10.1007/s11042-015-2863-3
