Abstract
A framework for speech-driven facial animation of embodied agents in the presence of background noise is described. Target application areas are entertainment and mobile visual communication. This approach derives from the speech signal all the information needed to drive 3-D facial models. Using both digital signal processing and soft-computing (fuzzy logic and neural networks) methodologies, a flexible, low-cost solution for extracting lip- and face-related information has been implemented. The main advantage of the speech-based approach is that it is not invasive: speech is captured by a microphone, with no physical contact with the subject (no magnetic sensors or optical markers). This makes the method applicable in more settings than marker- or sensor-based alternatives. First, a speech-based lip-driver system was developed to synchronize lip movements to speech; the methodology was then extended to several important facial movements so that a face-synching system could be modeled. The developed system is speaker- and language-independent, so no neural-network training operations are required.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Malcangi, M., de Tintis, R. (2004). Audio Based Real-Time Speech Animation of Embodied Conversational Agents. In: Camurri, A., Volpe, G. (eds) Gesture-Based Communication in Human-Computer Interaction. GW 2003. Lecture Notes in Computer Science(), vol 2915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24598-8_32
Print ISBN: 978-3-540-21072-6
Online ISBN: 978-3-540-24598-8