Abstract
A framework for speech-driven facial animation of embodied agents in the presence of background noise is described. Target application areas are entertainment and mobile visual communication. This approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft-computing (fuzzy logic and neural networks) methodologies, a flexible, low-cost solution for extracting lip- and face-related information has been implemented. The main advantage of the speech-based approach is that it is not invasive: speech is captured by a microphone, with no physical contact with the subject (no magnetic sensors or optical markers). This makes the method applicable in more settings than marker- or sensor-based alternatives. First, a speech-based lip-driver system was developed to synchronize lip movements to speech; the methodology was then extended to several important facial movements so that a face-synching system could be modeled. The developed system is speaker- and language-independent, so no neural-network training operations are required.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Malcangi, M., de Tintis, R. (2004). Audio Based Real-Time Speech Animation of Embodied Conversational Agents. In: Camurri, A., Volpe, G. (eds) Gesture-Based Communication in Human-Computer Interaction. GW 2003. Lecture Notes in Computer Science(), vol 2915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24598-8_32
Print ISBN: 978-3-540-21072-6
Online ISBN: 978-3-540-24598-8