Detection of Documentary Scene Changes by Audio-Visual Fusion

Velivelli, Atulya; Ngo, Chong-Wah; Huang, Thomas S.

doi:10.1007/3-540-45113-7_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2728)

Included in the following conference series:

International Conference on Image and Video Retrieval

Abstract

The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the visual component alone did not carry enough information to convey a semantic context for most portions of these videos, whereas joint observation of the visual and audio components conveyed a better semantic context. From our observations of the video data, we generated an audio score and a visual score. We then generated a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until we found a local maximum of the score. The video is ultimately divided into a set of intervals that correspond to the documentary scenes in the video. After obtaining the set of documentary scenes, we checked for redundant detections.
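The interval search described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not define the audio or visual scores, so a toy per-shot label agreement score stands in for both; the weight `w_audio`, the one-shot step size, the initial interval length `init_len`, and the function names are all hypothetical. The redundancy check mentioned in the abstract is not specified there and is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def agreement_score(labels: Sequence[int], start: int, end: int) -> float:
    """Toy interval score (assumption): +1 for each shot in [start, end)
    whose label matches the first shot's label, -1 otherwise."""
    ref = labels[start]
    return float(sum(1 if labels[i] == ref else -1 for i in range(start, end)))


def fused_score(audio: Sequence[int], visual: Sequence[int],
                start: int, end: int, w_audio: float = 0.7) -> float:
    """Weighted audio-visual score; the weight 0.7 is an arbitrary choice."""
    return (w_audio * agreement_score(audio, start, end)
            + (1.0 - w_audio) * agreement_score(visual, start, end))


def refine_end(score: Callable[[int, int], float],
               start: int, end: int, n: int) -> int:
    """Expand or shrink the right edge one shot at a time until neither
    move raises the score, i.e. until a local maximum is reached."""
    best = score(start, end)
    while True:
        candidates = [e for e in (end - 1, end + 1) if start < e <= n]
        if not candidates:
            return end
        nxt = max(candidates, key=lambda e: score(start, e))
        if score(start, nxt) <= best:
            return end
        end, best = nxt, score(start, nxt)


def segment(audio: Sequence[int], visual: Sequence[int],
            init_len: int = 2, w_audio: float = 0.7) -> List[Tuple[int, int]]:
    """Split the shot sequence into consecutive half-open intervals."""
    n = len(audio)
    scenes: List[Tuple[int, int]] = []
    start = 0
    while start < n:
        end = min(start + init_len, n)
        end = refine_end(
            lambda s, e: fused_score(audio, visual, s, e, w_audio),
            start, end, n)
        scenes.append((start, end))
        start = end
    return scenes


# Hypothetical per-shot labels (e.g. speaker id and visual cluster id).
audio_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
visual_labels = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
print(segment(audio_labels, visual_labels))  # prints [(0, 4), (4, 7), (7, 10)]
```

In this toy data the visual label changes at shot 2 while the audio label stays constant until shot 4. With the audio weight at 0.7, the fused score keeps rising through shot 3, so the first interval ends at shot 4 rather than at the purely visual change. This is only meant to mirror the abstract's point that the two modalities are scored jointly.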

Author information

Authors and Affiliations

Beckman Institute for Advanced Science and Technology, Urbana
Atulya Velivelli & Thomas S. Huang
City University of Hong Kong, Hong Kong
Chong-Wah Ngo

Editor information

Editors and Affiliations

LIACS Media Lab, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Erwin M. Bakker & Michael S. Lew
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Avenue, Urbana, IL, 61801, USA
Thomas S. Huang
University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Nicu Sebe
Siemens Corporate Research, 755 College Road East, Princeton, NJ, 08540, USA
Xiang Sean Zhou

About this paper

Cite this paper

Velivelli, A., Ngo, CW., Huang, T.S. (2003). Detection of Documentary Scene Changes by Audio-Visual Fusion. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_23

DOI: https://doi.org/10.1007/3-540-45113-7_23
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive