Abstract
Designing an agent capable of multimodal communication requires synchronizing the agent's performance across its communication channels: text, prosody, gesture, body movement, and facial expression. The synchronization of gesture and spoken text has significant repercussions for agent design. To explore this issue, we examined people's sensitivity to misalignments between gesture and spoken text, varying both the gesture type and the prosodic emphasis. The study included ratings of individual clips and ratings of paired clips with different alignments. Subjects were unable to detect alignment errors of up to ±0.6 s when shown a single clip. When shown paired clips, however, gestures occurring after the lexical affiliate were rated less positively. There is also evidence that stronger prosodic cues make people more sensitive to misalignment. This suggests that agent designers may be able to "cheat" on tight synchronization between audio and gesture without a decrease in perceived naturalness, but that such cheating may not be optimal.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, Y., Neff, M. (2013). The Influence of Prosody on the Requirements for Gesture-Text Alignment. In: Aylett, R., Krenn, B., Pelachaud, C., Shimodaira, H. (eds.) Intelligent Virtual Agents. IVA 2013. Lecture Notes in Computer Science, vol. 8108. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40415-3_16
DOI: https://doi.org/10.1007/978-3-642-40415-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40414-6
Online ISBN: 978-3-642-40415-3