Video Analysis and Summarization at Structural and Semantic Levels

Sundaram, Hari; Chang, Shih-Fu

doi:10.1007/978-3-662-05300-3_4

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter, we present an overview of research issues in three areas of audio-visual analysis: (a) segmentation, (b) event detection, and (c) summarization.

Authors

Dr. Hari Sundaram
Prof. Shih-Fu Chang

Editor information

Editors and Affiliations

Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, China
David Dagan Feng PhD & Wan-Chi Siu PhD
School of Information Technologies, The University of Sydney, Australia
David Dagan Feng PhD
Microsoft Research, Asia, China
Hong-Jiang Zhang PhD

About this chapter

Cite this chapter

Sundaram, H., Chang, SF. (2003). Video Analysis and Summarization at Structural and Semantic Levels. In: Feng, D.D., Siu, WC., Zhang, HJ. (eds) Multimedia Information Retrieval and Management. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05300-3_4

DOI: https://doi.org/10.1007/978-3-662-05300-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05533-1
Online ISBN: 978-3-662-05300-3
eBook Packages: Springer Book Archive
