Speech Driven Facial Animation

  • Conference paper
Computer Animation and Simulation ’99

Part of the book series: Eurographics (EUROGRAPH)

Abstract

In this paper, we present an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been pre-defined. To animate the facial expressions of the 3D head model, a real-time speech analysis module estimates mouth shapes, which are converted to MPEG-4 Facial Animation Parameters (FAPs) that drive the 3D head model with the corresponding facial expressions. The approach has been implemented as a real-time speech-driven facial animation system. On a PC with a single Pentium-III 500 MHz CPU, the system runs at around 15–24 frames/sec with an image size of 120×150. The input is live audio, and the initial delay is within 4 seconds. An ongoing model-based visual communication system that integrates a 3D head motion estimation technique with this system is also described.
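The pipeline the abstract describes (audio frame → mouth-shape estimate → MPEG-4 FAP) can be illustrated with a minimal sketch. This is not the authors' implementation: the energy-based mouth-opening heuristic, the frame size, and the scaling constants are all hypothetical choices made for illustration; only the FAP naming (open_jaw is FAP #3 in MPEG-4 Part 2) comes from the standard.

```python
import math

def frame_energy(samples):
    """Mean-square energy of one audio frame (list of floats in [-1, 1])."""
    return sum(s * s for s in samples) / len(samples)

def energy_to_open_jaw_fap(energy, max_energy=1.0, fap_range=1023):
    """Map normalized frame energy to an open_jaw amplitude (MPEG-4 FAP #3).

    The square root compresses the dynamic range so quiet speech still
    moves the jaw visibly; max_energy and fap_range are illustrative.
    """
    level = min(1.0, math.sqrt(energy / max_energy))
    return int(level * fap_range)

# A louder frame should open the jaw further than a quiet one.
quiet = [0.01] * 160   # one 10 ms frame at 16 kHz, low amplitude
loud = [0.5] * 160     # same frame length, high amplitude
assert energy_to_open_jaw_fap(frame_energy(quiet)) < \
       energy_to_open_jaw_fap(frame_energy(loud))
```

In a real system, such per-frame FAP values would be smoothed over time and combined with viseme-specific lip FAPs before being applied to the FDP-calibrated head model.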



References

  1. P. Doenges, F. Lavagetto, J. Ostermann, I. S. Pandzic, E. Petajan, "MPEG-4: Audio/Video and Synthetic Graphics/Audio for Mixed Media," Image Communications Journal, Vol. 5, No. 4, May 1997.

  2. MPEG-4 Part 2: Visual, ISO/IEC 14496-2, ISO/IEC JTC1/SC29/WG11 N2502, Atlantic City, Oct. 1998.

  3. Y. Lee, D. Terzopoulos, K. Waters, "Realistic Modeling for Facial Animation," Proc. of SIGGRAPH '96, pp. 55–62.

  4. B. Guenter, C. Grimm, D. Wood, H. Malvar, F. Pighin, "Making Faces," Proc. of SIGGRAPH '98, pp. 55–66.

  5. C. Bregler, M. Covell, M. Slaney, "Video Rewrite: Driving Visual Speech with Audio," Proc. of SIGGRAPH '97, pp. 353–360.

  6. F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, D. H. Salesin, "Synthesizing Realistic Facial Expressions from Photographs," Proc. of SIGGRAPH '98, pp. 75–84.

  7. M. Escher, N. Magnenat Thalmann, "Automatic 3D Cloning and Real-Time Animation of a Human Face," Computer Animation '97, pp. 58–66.

  8. W.-S. Lee, E. Lee, N. Magnenat Thalmann, "Real Face Communication in a Virtual World," Proc. of Virtual Worlds 98, Springer LNAI Press.

  9. J. Ostermann, "Animation of Synthetic Faces in MPEG-4," Computer Animation '98, pp. 49–51.

  10. E. Cosatto, H. P. Graf, "Sample-Based Synthesis of Photo-Realistic Talking Heads," Computer Animation '98, pp. 103–110.

  11. W.-S. Lee, N. Magnenat Thalmann, "Head Modeling from Pictures and Morphing in 3D with Image Metamorphosis Based on Triangulation," Proc. of CAPTECH '98, Geneva, Switzerland, 1998, pp. 254–267.

  12. Applied Speech Technologies Corporation. http://www.speech.com.tw

  13. Microsoft Speech Technology SAPI 4.0 SDK, http://www.microsoft.com/iit/sapisdk.htm

  14. L.-S. Lee, C.-Y. Tseng, M. Ouhyoung, "The Synthesis Rules in a Chinese Text-to-Speech System," IEEE Trans. on Acoustics, Speech and Signal Processing, 37(9), 1989, pp. 1309–1320.

  15. W.-L. Perng, Y. Wu, M. Ouhyoung, "Image Talk: A Real Time Synthetic Talking Head Using One Single Image with Chinese Text-To-Speech Capability," Proc. of Pacific Graphics '98, Singapore, Oct. 1998, pp. 140–148.

  16. T.-J. Yang, F.-C. Wu, M. Ouhyoung, "Real-Time 3D Head Motion Estimation in Facial Image Coding," Proc. of MMM '98 (Multimedia Modeling Conference), Lausanne, Switzerland, pp. 50–51, Oct. 12–15, 1998.

  17. M. Escher, I. Pandzic, N. Magnenat Thalmann, "Facial Deformations for MPEG-4," Proceedings of Computer Animation 1998.


Copyright information

© 1999 Springer-Verlag Wien

About this paper

Cite this paper

Yang, TJ., Lin, IC., Hung, CS., Huang, CF., Ouhyoung, M. (1999). Speech Driven Facial Animation. In: Magnenat-Thalmann, N., Thalmann, D. (eds) Computer Animation and Simulation ’99. Eurographics. Springer, Vienna. https://doi.org/10.1007/978-3-7091-6423-5_10

  • DOI: https://doi.org/10.1007/978-3-7091-6423-5_10

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83392-6

  • Online ISBN: 978-3-7091-6423-5

  • eBook Packages: Springer Book Archive
