Abstract
The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. We observed that, for most portions of these videos, the visual component alone did not carry enough information to convey the semantic context, whereas joint observation of the visual and audio components conveyed it better. Based on these observations of the video data, we generated an audio score and a visual score. We then computed a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until the score reached a local maximum. In this way, the video is divided into a set of intervals that correspond to its documentary scenes. Finally, after obtaining the set of documentary scenes, we checked for redundant detections.
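The adaptive interval search described above can be illustrated with a minimal sketch. The abstract does not specify how the audio and visual scores are computed, how they are weighted, or what counts as a redundant detection, so everything below is a hypothetical stand-in: each time unit (e.g. a shot) is assumed to carry one audio feature and one visual feature scaled to [0, 1], an interval's per-modality score is assumed to be the mean pairwise similarity `1 - |x_i - x_j|`, and the names `coherence`, `fused_score`, `grow_to_local_max`, `segment`, `merge_redundant`, and the parameters `w`, `min_len`, and `tol` are illustrative only, not the authors' method.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Scene = Tuple[int, int]  # half-open interval [start, end) over hypothetical time units
Score = Callable[[int, int], float]


def coherence(x: Sequence[float], s: int, e: int) -> float:
    """Hypothetical per-modality score: mean pairwise similarity 1 - |x_i - x_j| in [s, e)."""
    pairs = list(combinations(x[s:e], 2))
    if not pairs:
        return 1.0  # a single unit is treated as trivially coherent
    return sum(1.0 - abs(a - b) for a, b in pairs) / len(pairs)


def fused_score(audio: Sequence[float], visual: Sequence[float], w: float) -> Score:
    """Weighted audio-visual score of an interval; w is an assumed audio weight."""
    def score(s: int, e: int) -> float:
        return w * coherence(audio, s, e) + (1.0 - w) * coherence(visual, s, e)
    return score


def grow_to_local_max(score: Score, start: int, end: int, min_len: int, limit: int) -> int:
    """Move the interval's end one unit at a time until neither move improves the score."""
    best = score(start, end)
    while True:
        if end < limit and score(start, end + 1) >= best:
            end += 1  # expand; ties favour the longer interval
        elif end - start > min_len and score(start, end - 1) > best:
            end -= 1  # shrink only on a strict improvement, so the loop terminates
        else:
            return end  # local maximum reached
        best = score(start, end)


def segment(score: Score, n_units: int, min_len: int = 2) -> List[Scene]:
    """Cover [0, n_units) with consecutive intervals, each grown to a local score maximum."""
    scenes: List[Scene] = []
    start = 0
    while start < n_units:
        end = grow_to_local_max(score, start, min(start + min_len, n_units), min_len, n_units)
        scenes.append((start, end))
        start = end
    return scenes


def merge_redundant(score: Score, scenes: List[Scene], tol: float = 0.1) -> List[Scene]:
    """Hypothetical redundancy check: merge neighbours whose union scores nearly as well."""
    merged: List[Scene] = []
    for s, e in scenes:
        if merged:
            ps, pe = merged[-1]
            if score(ps, e) >= min(score(ps, pe), score(s, e)) - tol:
                merged[-1] = (ps, e)  # treat the split as a redundant detection
                continue
        merged.append((s, e))
    return merged
```

As a toy usage example, with `audio = [0, 0, 0, 1, 1, 1]` and a constant visual track `[0] * 6`, `segment(fused_score(audio, visual, 0.5), 6)` returns `[(0, 3), (3, 6)]`, while the visual-only weighting `w = 0.0` returns a single interval `[(0, 6)]`. This loosely mirrors the observation that the visual component alone may miss a change that the audio component reveals. Expansion is accepted on ties and shrinking only on strict gains so that homogeneous runs are absorbed into one interval and the search cannot oscillate.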
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Velivelli, A., Ngo, CW., Huang, T.S. (2003). Detection of Documentary Scene Changes by Audio-Visual Fusion. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_23
DOI: https://doi.org/10.1007/3-540-45113-7_23
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive