Multi-modality video shot clustering with tensor representation

Liu, Yanan; Wu, Fei

doi:10.1007/s11042-008-0220-5

Published: 03 October 2008

Volume 41, pages 93–109 (2009)

Multimedia Tools and Applications

Abstract

Video analysis and understanding remain challenging. Video data comprises multiple media modalities that exhibit temporal-sequenced associated co-occurrence (TSAC). Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are applied to these high-dimensional vectors for dimensionality reduction, classification, clustering, and recognition. However, the modalities in video have their own properties as well as correlations with one another, and a simple vector representation weakens the power of these relatively independent modalities and, to some extent, ignores the relations between them. Clustering is an important technique for multimedia data management, and a powerful clustering algorithm named Affinity Propagation was recently devised. In this paper, we introduce a higher-order tensor framework for video analysis. In this framework, we represent the three modalities of a video shot (image frames, audio stream, and transcript text) as a data point in the form of a third-order tensor. We also present a dimension reduction method for the high-dimensional features of video shots that explicitly considers the manifold structure of the tensor space of temporal-sequenced associated co-occurring multimodal media data. We call this the TensorShot approach. We then use Affinity Propagation to cluster the video shots in tensor form. Our algorithm preserves the intrinsic structure of the submanifold from which the tensorshots are sampled. Experiments on the TRECVID 2005 news video data set show that our algorithm achieves improved performance.
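The two ideas named in the abstract, a third-order tensor per shot and clustering by Affinity Propagation, can be illustrated with a small sketch. This is not the authors' TensorShot implementation. It assumes, purely for illustration, that a shot tensor is the outer product of toy image, audio, and text feature vectors, and that similarity is the negative squared Frobenius distance between tensors. The manifold-based dimension reduction step is omitted. The function names (outer3, similarity, affinity_propagation) and the feature values are hypothetical; the message-passing updates follow the standard responsibility/availability rules of Frey and Dueck (2007).

```python
from itertools import product
from statistics import median


def outer3(image, audio, text):
    """Third-order tensor built as the outer product of three feature vectors.

    Assumption: the outer product is just one simple way to couple the
    modalities; the abstract does not specify the construction.
    """
    return [[[a * b * c for c in text] for b in audio] for a in image]


def similarity(x, y):
    """Negative squared Frobenius distance between two same-shaped tensors."""
    idx = product(range(len(x)), range(len(x[0])), range(len(x[0][0])))
    return -sum((x[i][j][k] - y[i][j][k]) ** 2 for i, j, k in idx)


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (needs n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(R[i][k], 0.0) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy shots: (image, audio, text) feature vectors, made up for illustration.
shots = [
    ([1.0, 0.0], [1.0, 0.1], [0.9, 0.0]),
    ([0.9, 0.1], [1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.1], [0.9, 0.0], [1.0, 0.1]),
    ([0.0, 1.0], [0.1, 1.0], [0.0, 0.9]),
    ([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
    ([0.1, 1.0], [0.0, 0.9], [0.1, 1.0]),
]
tensors = [outer3(*shot) for shot in shots]
n = len(tensors)
S = [[similarity(tensors[i], tensors[j]) for j in range(n)] for i in range(n)]

# Shared preference (diagonal of S): median of the off-diagonal similarities.
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = preference

labels = affinity_propagation(S)
print(labels)  # exemplar index chosen for each shot
```

Affinity Propagation does not take the number of clusters as input; the preference value on the diagonal of the similarity matrix controls how many exemplars emerge. The median used here is a common default, not a setting taken from the paper.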

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 60603096, No. 60533090), the Key Technology R&D Program (2006BAH02A13-4), the National High Technology Research and Development Program of China (2006AA010107), the Program for Changjiang Scholars and Innovative Research Team in University (IRT0652, PCSIRT), and the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (No. 706033).

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
Yanan Liu & Fei Wu

Corresponding author

Correspondence to Fei Wu.

About this article

Cite this article

Liu, Y., Wu, F. Multi-modality video shot clustering with tensor representation. Multimed Tools Appl 41, 93–109 (2009). https://doi.org/10.1007/s11042-008-0220-5

Issue Date: January 2009
