Text-to-Audiovisual Speech Synthesizer

  • Conference paper
Virtual Worlds (VW 2000)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1834))

Abstract

This paper describes a text-to-audiovisual speech synthesizer that incorporates head and eye movements. The face is modeled using a set of images of a human subject. Visemes, the lip images corresponding to phonemes, are extracted from a recorded video. Smooth transitions between visemes are achieved by morphing along correspondences between viseme images computed with optical flow. The paper also describes methods for introducing nonverbal mechanisms of visual speech communication, such as eye blinks and head nods: a simple mask-based approach is used for eye movements, and view morphing is used to generate head movement. A complete audiovisual sequence is constructed by concatenating the viseme transitions and synchronizing the visual stream with the audio stream. All of these features are integrated into a single system that takes text, head, and eye movement parameters as input and produces the audiovisual stream.
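The core operation the abstract describes is morphing between two viseme images along optical-flow correspondences: each intermediate frame warps both endpoint images part-way along the flow and cross-dissolves them. The paper's exact warping and correspondence scheme is not given here, so the following is only a minimal NumPy sketch under simplifying assumptions (nearest-neighbour sampling, precomputed flow fields; the function names `warp` and `morph_visemes` are illustrative, not the authors' API):

```python
import numpy as np

def warp(image, flow, alpha):
    """Warp a grayscale image a fraction `alpha` along a dense flow field.
    `flow` has shape (h, w, 2) holding (dx, dy) per pixel. Nearest-neighbour
    sampling is used here for simplicity; a real system would interpolate."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - alpha * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - alpha * flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def morph_visemes(v0, v1, flow01, flow10, n_frames):
    """Generate a smooth transition between two viseme images.
    flow01 maps pixels of v0 toward v1; flow10 is the reverse flow.
    Each frame warps both endpoints toward the intermediate pose,
    then cross-dissolves them with weight (1 - a) and a."""
    frames = []
    for i in range(n_frames):
        a = i / (n_frames - 1)
        w0 = warp(v0, flow01, a)          # v0 moved fraction a along its flow
        w1 = warp(v1, flow10, 1.0 - a)    # v1 moved back toward v0's pose
        frames.append(((1 - a) * w0 + a * w1).astype(v0.dtype))
    return frames
```

With zero flow this degenerates to a plain cross-dissolve, which is a useful sanity check: the first frame reproduces the first viseme, the last frame the second. Concatenating such transitions for each phoneme pair, and timing them against the audio, yields the audiovisual stream the abstract describes.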




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goyal, U.K., Kapoor, A., Kalra, P. (2000). Text-to-Audiovisual Speech Synthesizer. In: Heudin, J.-C. (ed.) Virtual Worlds. VW 2000. Lecture Notes in Computer Science, vol 1834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45016-5_24

  • DOI: https://doi.org/10.1007/3-540-45016-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67707-9

  • Online ISBN: 978-3-540-45016-0
