Abstract
We present a robust method to detect and locate a speaker through joint analysis of speech sound and video images. First, a short segment of speech is analyzed to estimate the rate of spoken syllables, and a difference image is formed using an optimal frame distance derived from that rate to detect mouth candidates. The candidates are then tracked to verify that one of them is indeed the mouth: the rate of mouth movements is estimated from the brightness change profile of the first candidate and, if the two rates agree, the three brightest parts in the resulting difference image are detected as the mouth and eyes. If not, the second candidate is tracked, and so on. The first-order moment of the power spectrum of the brightness change profile, together with the lateral shifts observed during tracking, is also used to check whether the candidates are facial parts.
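The core consistency test described above, comparing the syllable rate estimated from audio against the mouth-movement rate estimated from a brightness change profile, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the FFT-peak rate estimator, and the relative tolerance in `rates_agree` are all assumptions made for the example.

```python
import numpy as np

def dominant_rate(profile, fps):
    """Estimate the dominant repetition rate (Hz) of a 1-D brightness
    change profile from the peak of its power spectrum (DC excluded)."""
    profile = np.asarray(profile, dtype=float)
    profile = profile - profile.mean()           # remove DC offset
    spectrum = np.abs(np.fft.rfft(profile)) ** 2
    freqs = np.fft.rfftfreq(len(profile), d=1.0 / fps)
    peak = 1 + int(np.argmax(spectrum[1:]))      # skip the DC bin
    return freqs[peak], spectrum, freqs

def spectral_centroid(spectrum, freqs):
    """First-order moment of the power spectrum: a low centroid suggests
    slow, speech-like motion; a high one suggests noise or non-facial motion."""
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def rates_agree(syllable_rate_hz, mouth_rate_hz, tol=0.2):
    """Accept a mouth candidate when the audio-derived syllable rate and the
    video-derived movement rate match within a relative tolerance."""
    return abs(syllable_rate_hz - mouth_rate_hz) <= tol * syllable_rate_hz

# Example: a synthetic 4 Hz mouth-movement profile sampled at 30 fps,
# checked against a 4 Hz syllable rate estimated from the audio.
fps = 30.0
t = np.arange(90) / fps
mouth_profile = np.sin(2 * np.pi * 4.0 * t)
mouth_rate, spectrum, freqs = dominant_rate(mouth_profile, fps)
print(rates_agree(4.0, mouth_rate))
```

In practice the profile would come from the tracked candidate region, and candidates failing either the rate agreement or the spectral-moment check would be discarded before moving on to the next candidate.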
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ikeda, O. (2007). Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2007. Lecture Notes in Computer Science, vol 4842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76856-2_59
DOI: https://doi.org/10.1007/978-3-540-76856-2_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76855-5
Online ISBN: 978-3-540-76856-2