Detection of Documentary Scene Changes by Audio-Visual Fusion

  • Conference paper
Image and Video Retrieval (CIVR 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2728)

Abstract

The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. We observed that the visual component alone did not carry enough information to convey a semantic context for most portions of these videos, whereas jointly observing the visual and audio components conveyed a better semantic context. From these observations on the video data, we generated an audio score and a visual score. We then computed a weighted audio-visual score within an interval and adaptively expanded or shrank the interval until the score reached a local maximum. The video is thereby divided into a set of intervals that correspond to its documentary scenes. Finally, we checked the resulting set of documentary scenes for redundant detections.
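
The interval search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the audio and visual scores are computed, how they are weighted, or how the interval is adjusted, so everything below is an assumption made for illustration: the names `fused_score`, `find_scene_end`, and `segment_video`, the linear weighting with `w_audio`, the one-frame hill-climbing step, and the toy score functions are all hypothetical. The redundancy check is not covered.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: maps an interval [start, end) to a score.
IntervalScore = Callable[[int, int], float]


def fused_score(audio: float, visual: float, w_audio: float = 0.5) -> float:
    """Weighted audio-visual score (assumed to be a linear combination)."""
    return w_audio * audio + (1.0 - w_audio) * visual


def find_scene_end(score_at: IntervalScore, start: int, n_frames: int,
                   init_len: int = 5, step: int = 1) -> int:
    """Expand or shrink [start, end) until the score is a local maximum."""
    end = min(start + init_len, n_frames)
    best = score_at(start, end)
    while True:
        # Try shrinking and expanding the interval by one step.
        candidates = [e for e in (end - step, end + step)
                      if start + 1 <= e <= n_frames]
        if not candidates:
            return end
        top_score, top_end = max((score_at(start, e), e) for e in candidates)
        if top_score <= best:  # neither neighbour improves: local maximum
            return end
        best, end = top_score, top_end


def segment_video(score_at: IntervalScore, n_frames: int) -> List[Tuple[int, int]]:
    """Divide [0, n_frames) into consecutive intervals (candidate scenes)."""
    scenes, start = [], 0
    while start < n_frames:
        end = find_scene_end(score_at, start, n_frames)
        scenes.append((start, end))
        start = end
    return scenes


# Toy data (not from the paper): both scores peak at assumed scene boundaries.
boundaries = [10, 25, 40]


def distance_to_boundary(start: int, end: int) -> int:
    return abs(end - min(b for b in boundaries if b > start))


def toy_score(start: int, end: int) -> float:
    audio = -1.0 * distance_to_boundary(start, end)
    visual = -2.0 * distance_to_boundary(start, end)
    return fused_score(audio, visual, w_audio=0.5)


scenes = segment_video(toy_score, n_frames=40)
print(scenes)
```

Because the toy scores peak exactly at the assumed boundaries, the search recovers them; with real audio and visual features the score surface would be noisier, and the step size and initial interval length would need tuning.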


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Velivelli, A., Ngo, CW., Huang, T.S. (2003). Detection of Documentary Scene Changes by Audio-Visual Fusion. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_23

  • DOI: https://doi.org/10.1007/3-540-45113-7_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40634-1

  • Online ISBN: 978-3-540-45113-6

  • eBook Packages: Springer Book Archive
