Abstract
This paper describes a text-to-audiovisual speech synthesis system that incorporates head and eye movements. The face is modeled using a set of images of a human subject. Visemes, the lip images corresponding to the phonemes, are extracted from a recorded video. Smooth transitions between visemes are achieved by morphing along correspondences computed with optical flow. The paper also describes methods for introducing nonverbal mechanisms of visual speech communication, such as eye blinks and head nods: a simple mask-based approach is used for eye movements, and view morphing is used to generate the head movement. A complete audiovisual sequence is constructed by concatenating the viseme transitions and synchronizing the visual stream with the audio stream. All of these features are integrated into a single system that takes text together with head and eye movement parameters and produces the audiovisual stream.
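The viseme transition described in the abstract — warping both endpoint images along an optical-flow correspondence and cross-dissolving — can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the dense flow field is assumed to be precomputed (e.g. by a Horn–Schunck style estimator), images are single-channel, and the forward warp is approximated by backward bilinear sampling.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample a grayscale image at fractional coordinates (ys, xs)
    with bilinear interpolation; coordinates are clamped to the image."""
    h, w = img.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy = np.clip(ys - y0, 0.0, 1.0)
    dx = np.clip(xs - x0, 0.0, 1.0)
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

def morph(src, dst, flow, alpha):
    """Intermediate frame between two visemes at parameter alpha in [0, 1].

    flow[y, x] = (dy, dx) maps a pixel of `src` to its correspondence
    in `dst`.  Each endpoint image is warped partway along the flow
    (approximated here by backward sampling) and the two warps are
    cross-dissolved, so alpha=0 reproduces `src` and alpha=1 `dst`.
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    warped_src = bilinear_sample(src, ys - alpha * flow[..., 0],
                                      xs - alpha * flow[..., 1])
    warped_dst = bilinear_sample(dst, ys + (1 - alpha) * flow[..., 0],
                                      xs + (1 - alpha) * flow[..., 1])
    return (1 - alpha) * warped_src + alpha * warped_dst
```

Sampling `morph` at several values of alpha yields the in-between frames of one viseme transition; concatenating such transitions for the phoneme sequence of the input text, and timing them against the synthesized audio, gives the audiovisual stream.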
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Goyal, U.K., Kapoor, A., Kalra, P. (2000). Text-to-Audiovisual Speech Synthesizer. In: Heudin, JC. (eds) Virtual Worlds. VW 2000. Lecture Notes in Computer Science, vol 1834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45016-5_24
Print ISBN: 978-3-540-67707-9
Online ISBN: 978-3-540-45016-0