Learning Expressive Human-Like Head Motion Sequences from Speech

Chapter in: Data-Driven 3D Facial Animation




Copyright information

© 2008 Springer-Verlag London Limited

Cite this chapter

Busso, C., Deng, Z., Neumann, U., Narayanan, S. (2008). Learning Expressive Human-Like Head Motion Sequences from Speech. In: Deng, Z., Neumann, U. (eds) Data-Driven 3D Facial Animation. Springer, London. https://doi.org/10.1007/978-1-84628-907-1_6

  • DOI: https://doi.org/10.1007/978-1-84628-907-1_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-906-4

  • Online ISBN: 978-1-84628-907-1
