Abstract
In this chapter, we present an overview of some of the research issues in three areas of audio-visual analysis: (a) segmentation, (b) event detection, and (c) summarization.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sundaram, H., Chang, SF. (2003). Video Analysis and Summarization at Structural and Semantic Levels. In: Feng, D.D., Siu, WC., Zhang, HJ. (eds) Multimedia Information Retrieval and Management. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05300-3_4
DOI: https://doi.org/10.1007/978-3-662-05300-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05533-1
Online ISBN: 978-3-662-05300-3
eBook Packages: Springer Book Archive