Speech Driven Facial Animation

  • Tzong-Jer Yang
  • I-Chen Lin
  • Cheng-Sheng Hung
  • Chien-Feng Huang
  • Ming Ouhyoung
Part of the Eurographics book series (EUROGRAPH)


In this paper, we present an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been predefined. A real-time speech analysis module then extracts mouth shapes from the speech signal; these are converted to MPEG-4 Facial Animation Parameters (FAPs), which drive the 3D head model with the corresponding facial expressions. The approach has been implemented as a real-time speech-driven facial animation system. On a PC with a single Pentium III 500 MHz CPU, the system runs at about 15–24 frames/sec at an image size of 120×150. The input is live audio, and the initial delay is within 4 seconds. We also describe an ongoing model-based visual communication system that integrates a 3D head motion estimation technique with this system.
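The conversion step from mouth shapes to FAPs can be pictured with a minimal sketch. It is not the authors' implementation: the mouth-shape labels, the numeric values, the linear interpolation between shapes, and the helper names `blend_faps` and `fap_frames` are all assumptions made for illustration. Only the parameter names `open_jaw`, `stretch_l_cornerlip`, and `stretch_r_cornerlip` are meant to follow MPEG-4 low-level FAP naming.

```python
# Illustrative only: the mouth-shape labels and numeric values below are
# assumptions, not taken from the paper or from the MPEG-4 specification.
MOUTH_SHAPE_TO_FAPS = {
    "closed": {"open_jaw": 0, "stretch_l_cornerlip": 0, "stretch_r_cornerlip": 0},
    "a": {"open_jaw": 400, "stretch_l_cornerlip": 40, "stretch_r_cornerlip": 40},
    "i": {"open_jaw": 100, "stretch_l_cornerlip": 120, "stretch_r_cornerlip": 120},
    "u": {"open_jaw": 120, "stretch_l_cornerlip": -100, "stretch_r_cornerlip": -100},
}


def blend_faps(start, end, t):
    """Linearly interpolate two FAP dictionaries; t=0 gives start, t=1 gives end."""
    return {name: round((1 - t) * start[name] + t * end[name]) for name in start}


def fap_frames(mouth_shapes, steps_per_transition):
    """Expand a sequence of mouth-shape labels into per-frame FAP dictionaries."""
    targets = [MOUTH_SHAPE_TO_FAPS[shape] for shape in mouth_shapes]
    if not targets:
        return []
    frames = []
    # Walk each consecutive pair of target shapes and emit the in-between frames.
    for start, end in zip(targets, targets[1:]):
        for step in range(steps_per_transition):
            frames.append(blend_faps(start, end, step / steps_per_transition))
    # Finish exactly on the last target shape.
    frames.append(dict(targets[-1]))
    return frames


if __name__ == "__main__":
    # Hypothetical analyser output: three mouth shapes in sequence.
    for frame in fap_frames(["closed", "a", "i"], steps_per_transition=4):
        print(frame)
```

At the reported 15–24 frames/sec, each frame has a budget of roughly 42–67 ms, so a per-frame lookup and blend of this kind would be a small part of the cost next to speech analysis and rendering.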


Keywords (machine-generated, not supplied by the authors): Facial Expression, Speech Signal, Computer Animation, Facial Animation, Speech Analysis





Copyright information

© Springer-Verlag Wien 1999

Authors and Affiliations

  • Tzong-Jer Yang (1)
  • I-Chen Lin (1)
  • Cheng-Sheng Hung (1)
  • Chien-Feng Huang (1)
  • Ming Ouhyoung (1)
  1. Communication and Multimedia Laboratory, Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
