
Audio Based Real-Time Speech Animation of Embodied Conversational Agents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2915)

Abstract

A framework for speech-driven facial animation of embodied conversational agents, robust to background noise, is described. The target application areas are entertainment and mobile visual communication. This novel approach derives all the information needed to drive 3-D facial models directly from the speech signal. Combining digital signal processing with soft-computing methodologies (fuzzy logic and neural networks), a flexible, low-cost solution for extracting lip- and face-related information has been implemented. The main advantage of the speech-based approach is that it is non-invasive: speech is captured with a microphone, and there is no physical contact with the subject (no magnetic sensors or optical markers). This makes the method applicable in a wider range of settings than marker- or sensor-based alternatives. A speech-based lip-driver system was first developed to synchronize lip movements to speech; the methodology was then extended to several important facial movements, yielding a complete face-synching system. The developed system is speaker- and language-independent, so no neural-network training is required.
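The abstract does not detail the signal-processing pipeline, but the general idea — short-time features extracted from the microphone signal and mapped, fuzzy-rule style, to a lip-opening parameter for the 3-D model — can be sketched as follows. All function names, thresholds, and the choice of energy and zero-crossing-rate features here are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame short-time energy and zero-crossing rate (ZCR)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # Each sign change contributes 2 to |diff(sign)|, hence the /2.
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return feats

def lip_opening(energy, zcr, energy_floor=1e-4):
    """Crude fuzzy-style mapping: louder, voiced frames -> wider mouth.
    Thresholds are illustrative only."""
    if energy < energy_floor:          # near-silence -> closed lips
        return 0.0
    voicedness = max(0.0, 1.0 - zcr)   # low ZCR suggests voiced speech
    loudness = min(1.0, energy / 0.1)  # saturate at a nominal level
    return loudness * (0.5 + 0.5 * voicedness)
```

In a real-time setting each incoming audio frame would be converted to a lip-opening value this way and smoothed before driving the facial model, keeping animation latency within one frame of audio.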





Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Malcangi, M., de Tintis, R. (2004). Audio Based Real-Time Speech Animation of Embodied Conversational Agents. In: Camurri, A., Volpe, G. (eds) Gesture-Based Communication in Human-Computer Interaction. GW 2003. Lecture Notes in Computer Science, vol 2915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24598-8_32


  • DOI: https://doi.org/10.1007/978-3-540-24598-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21072-6

  • Online ISBN: 978-3-540-24598-8

  • eBook Packages: Springer Book Archive
