Text-to-Audiovisual Speech Synthesizer

  • Conference paper
Virtual Worlds (VW 2000)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1834))

Abstract

This paper describes a text-to-audiovisual speech synthesizer that incorporates head and eye movements. The face is modeled using a set of images of a human subject. Visemes, the lip images corresponding to phonemes, are extracted from a recorded video. Smooth transitions between visemes are achieved by morphing along correspondences between viseme images computed with optical flow. The paper also describes methods for introducing nonverbal mechanisms of visual speech communication, such as eye blinks and head nods: a simple mask-based approach is used for eye movements, and view morphing is used to generate head movement. A complete audiovisual sequence is constructed by concatenating the viseme transitions and synchronizing the visual stream with the audio stream. All of these features are integrated into a single system that takes text, head, and eye movement parameters as input and produces the audiovisual stream.
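The core operation the abstract describes is morphing between two viseme images along optical-flow correspondences: each intermediate frame warps both endpoint images part-way along the flow and cross-dissolves them. The paper's exact warping and correspondence scheme is not given here, so the following is only a minimal NumPy sketch under simplifying assumptions (nearest-neighbour sampling, precomputed flow fields; the function names `warp` and `morph_visemes` are illustrative, not the authors' API):

```python
import numpy as np

def warp(image, flow, alpha):
    """Warp a grayscale image a fraction `alpha` along a dense flow field.
    `flow` has shape (h, w, 2) holding (dx, dy) per pixel. Nearest-neighbour
    sampling is used here for simplicity; a real system would interpolate."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - alpha * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - alpha * flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def morph_visemes(v0, v1, flow01, flow10, n_frames):
    """Generate a smooth transition between two viseme images.
    flow01 maps pixels of v0 toward v1; flow10 is the reverse flow.
    Each frame warps both endpoints toward the intermediate pose,
    then cross-dissolves them with weight (1 - a) and a."""
    frames = []
    for i in range(n_frames):
        a = i / (n_frames - 1)
        w0 = warp(v0, flow01, a)          # v0 moved fraction a along its flow
        w1 = warp(v1, flow10, 1.0 - a)    # v1 moved back toward v0's pose
        frames.append(((1 - a) * w0 + a * w1).astype(v0.dtype))
    return frames
```

With zero flow this degenerates to a plain cross-dissolve, which is a useful sanity check: the first frame reproduces the first viseme, the last frame the second. Concatenating such transitions for each phoneme pair, and timing them against the audio, yields the audiovisual stream the abstract describes.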




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goyal, U.K., Kapoor, A., Kalra, P. (2000). Text-to-Audiovisual Speech Synthesizer. In: Heudin, J.-C. (ed.) Virtual Worlds. VW 2000. Lecture Notes in Computer Science, vol 1834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45016-5_24

  • DOI: https://doi.org/10.1007/3-540-45016-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67707-9

  • Online ISBN: 978-3-540-45016-0
