Learning Expressive Human-Like Head Motion Sequences from Speech

Chapter in: Data-Driven 3D Facial Animation




Copyright information

© 2008 Springer-Verlag London Limited

Cite this chapter

Busso, C., Deng, Z., Neumann, U., Narayanan, S. (2008). Learning Expressive Human-Like Head Motion Sequences from Speech. In: Deng, Z., Neumann, U. (eds) Data-Driven 3D Facial Animation. Springer, London. https://doi.org/10.1007/978-1-84628-907-1_6

  • DOI: https://doi.org/10.1007/978-1-84628-907-1_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-906-4

  • Online ISBN: 978-1-84628-907-1
