Speech Driven Facial Animation

Yang, Tzong-Jer; Lin, I-Chen; Hung, Cheng-Sheng; Huang, Chien-Feng; Ouhyoung, Ming

doi:10.1007/978-3-7091-6423-5_10

Part of the book series: Eurographics (EUROGRAPH)

Abstract

In this paper, we present an approach that animates facial expressions through speech analysis. An individualized 3D head model is first generated by modifying a generic head model on which a set of MPEG-4 Facial Definition Parameters (FDPs) has been predefined. To animate the 3D head model, a real-time speech analysis module extracts mouth shapes, which are converted to MPEG-4 Facial Animation Parameters (FAPs) that drive the model with the corresponding facial expressions. The approach has been implemented as a real-time speech-driven facial animation system. On a PC with a single Pentium III 500 MHz CPU, the system runs at about 15–24 frames/sec with an image size of 120×150. The input is live audio, and the initial delay is at most 4 seconds. We also describe an ongoing model-based visual communication system that integrates this system with a 3D head motion estimation technique.
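The abstract states that mouth shapes from the speech analysis module are converted to MPEG-4 FAPs, but it does not give the mapping. The Python sketch below shows one way such a conversion could look, assuming each mouth shape is stored as a jaw drop and a lip-corner stretch in model units, and that only the low-level FAPs open_jaw (3), stretch_l_cornerlip (6), and stretch_r_cornerlip (7) are driven. Values are expressed in facial animation parameter units (FAPUs), which MPEG-4 defines as neutral-face distances divided by 1024. The shape labels, displacement values, and the names NeutralFace and mouth_shape_to_faps are hypothetical illustrations, not the authors' implementation; check FAP numbers and units against ISO/IEC 14496-2 before relying on them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# FAP numbers and units per MPEG-4 Visual (ISO/IEC 14496-2):
# FAP 3 open_jaw is measured in MNS units, FAPs 6/7 stretch_l/r_cornerlip in MW units.
OPEN_JAW = 3
STRETCH_L_CORNERLIP = 6
STRETCH_R_CORNERLIP = 7


@dataclass(frozen=True)
class NeutralFace:
    """Neutral-face distances of the individualized head model, in model units."""
    mouth_nose_sep: float  # MNS0: mouth-nose separation
    mouth_width: float     # MW0: mouth width

    @property
    def mns(self) -> float:
        # One MNS FAPU is MNS0 / 1024.
        return self.mouth_nose_sep / 1024.0

    @property
    def mw(self) -> float:
        # One MW FAPU is MW0 / 1024.
        return self.mouth_width / 1024.0


# HYPOTHETICAL mouth-shape table (not from the paper): for each shape label,
# (jaw drop, outward stretch of each lip corner) relative to neutral, in model units.
MOUTH_SHAPES: Dict[str, Tuple[float, float]] = {
    "closed": (0.0, 0.0),
    "a": (1.2, 0.3),
    "i": (0.4, 0.8),
    "u": (0.5, -0.4),
}


def mouth_shape_to_faps(shape: str, face: NeutralFace) -> Dict[int, int]:
    """Hypothetical helper: express one mouth shape as integer FAP values."""
    jaw_drop, corner_stretch = MOUTH_SHAPES[shape]
    # Assumes positive stretch_*_cornerlip moves each corner outward,
    # so a symmetric stretch uses the same value on both sides.
    stretch = round(corner_stretch / face.mw)
    return {
        OPEN_JAW: round(jaw_drop / face.mns),
        STRETCH_L_CORNERLIP: stretch,
        STRETCH_R_CORNERLIP: stretch,
    }


if __name__ == "__main__":
    face = NeutralFace(mouth_nose_sep=2.0, mouth_width=5.0)
    print(mouth_shape_to_faps("u", face))  # {3: 256, 6: -82, 7: -82}
```

Scaling by FAPUs rather than by absolute distances is what lets one FAP stream drive any individualized head model: a face with a wider mouth simply has a larger MW unit, so the same FAP value produces a proportionally larger corner movement.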

Author information

Authors and Affiliations

Communication and Multimedia Laboratory, Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, 106, Taiwan
Tzong-Jer Yang, I-Chen Lin, Cheng-Sheng Hung, Chien-Feng Huang & Ming Ouhyoung

Editor information

Editors and Affiliations

MIRA Laboratory, University of Geneva, Switzerland
Nadia Magnenat-Thalmann
Computer Graphics Laboratory, Swiss Federal Institute of Technology, Lausanne, Switzerland
Daniel Thalmann

About this paper

Cite this paper

Yang, TJ., Lin, IC., Hung, CS., Huang, CF., Ouhyoung, M. (1999). Speech Driven Facial Animation. In: Magnenat-Thalmann, N., Thalmann, D. (eds) Computer Animation and Simulation ’99. Eurographics. Springer, Vienna. https://doi.org/10.1007/978-3-7091-6423-5_10

DOI: https://doi.org/10.1007/978-3-7091-6423-5_10
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-83392-6
Online ISBN: 978-3-7091-6423-5
eBook Packages: Springer Book Archive
