Speech Driven Facial Animation
In this paper, we present an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been pre-defined. To animate the facial expressions of the 3D head model, a real-time speech analysis module extracts mouth shapes, which are converted to MPEG-4 Facial Animation Parameters (FAPs) that drive the head model with the corresponding expressions. The approach has been implemented as a real-time speech-driven facial animation system. On a PC with a single Pentium III 500 MHz CPU, the system runs at roughly 15–24 frames/sec at an image size of 120×150. The input is live audio, and the initial delay is within 4 seconds. An ongoing model-based visual communication system that integrates a 3D head motion estimation technique with this system is also described.
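The pipeline sketched in the abstract, speech analysis producing mouth shapes that are converted to MPEG-4 FAPs, can be illustrated with a minimal sketch. The viseme labels, FAP amplitudes, and frame timing below are hypothetical placeholders, not the paper's actual tables or parameter values; only the general viseme-to-FAP-and-interpolate structure follows the description above.

```python
# Hedged sketch of a speech-driven FAP pipeline. The viseme classes and
# amplitude values are illustrative, not taken from the paper.

# Illustrative mapping: viseme class -> MPEG-4 FAP amplitudes.
# FAP index 3 (open_jaw) is used as a stand-in for a full mouth-shape set.
VISEME_TO_FAPS = {
    "sil": {3: 0.0},   # silence: mouth closed
    "aa":  {3: 0.8},   # open vowel: jaw wide
    "iy":  {3: 0.3},   # spread vowel: jaw slightly open
    "uw":  {3: 0.4},   # rounded vowel
}

def interpolate_faps(prev, curr, t):
    """Linearly blend two FAP dicts at fraction t in [0, 1] so mouth
    shapes transition smoothly between successive audio analysis windows."""
    keys = set(prev) | set(curr)
    return {k: (1 - t) * prev.get(k, 0.0) + t * curr.get(k, 0.0) for k in keys}

def faps_for_frame(visemes, frame_idx, frames_per_viseme=4):
    """Return interpolated FAPs for one rendered frame, given a sequence of
    visemes from the speech analyzer (hypothetical fixed timing)."""
    i = frame_idx // frames_per_viseme
    t = (frame_idx % frames_per_viseme) / frames_per_viseme
    prev = VISEME_TO_FAPS[visemes[max(i - 1, 0)]]
    curr = VISEME_TO_FAPS[visemes[min(i, len(visemes) - 1)]]
    return interpolate_faps(prev, curr, t)

# Frame 5 falls a quarter of the way through the transition
# from "sil" into "aa", so the jaw FAP is 0.25 * 0.8 = 0.2.
faps = faps_for_frame(["sil", "aa", "iy"], 5)
```

In a real system these FAP values would be streamed each frame to the deformation engine of the individualized 3D head model; the interpolation step is one simple way to keep animation smooth when the renderer runs faster than the speech analyzer.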
Keywords: Facial Expression · Speech Signal · Computer Animation · Facial Animation · Speech Analysis