Synthesizing cooperative conversation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1374)

Abstract

We describe an implemented system that automatically generates and animates conversations between multiple human-like agents with appropriate and synchronized speech, intonation, facial expressions, and hand gestures. Conversations are created by a dialogue planner that produces both the text and the intonation of the utterances. The speaker/listener relationship, the text, and the intonation in turn drive the generators for facial expressions, lip motions, eye gaze, head motion, and arm gestures.
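The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`Utterance`, `Event`, `plan_dialogue`, the four generator functions, `animate`) is hypothetical, intonation is reduced to a per-word accent flag, the planner returns a fixed two-turn exchange, and head motion is omitted. It only shows one planner output fanning out to several independent animation channels that share a word-level timeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Utterance:
    speaker: str
    listener: str
    words: Tuple[str, ...]
    accented: Tuple[bool, ...]  # toy intonation: True marks an accented word


@dataclass(frozen=True)
class Event:
    turn: int
    word_index: int
    agent: str
    channel: str
    action: str


def plan_dialogue() -> List[Utterance]:
    """Hypothetical stand-in for a dialogue planner: a fixed two-turn exchange."""
    return [
        Utterance("A", "B", ("Do", "you", "have", "a", "question"),
                  (False, False, False, False, True)),
        Utterance("B", "A", ("Yes", "I", "do"), (True, False, False)),
    ]


def lip_events(turn: int, u: Utterance) -> List[Event]:
    # One lip-motion event per spoken word, performed by the speaker.
    return [Event(turn, i, u.speaker, "lips", f"articulate:{w}")
            for i, w in enumerate(u.words)]


def face_events(turn: int, u: Utterance) -> List[Event]:
    # Assumed rule: the speaker raises the eyebrows on accented words.
    return [Event(turn, i, u.speaker, "face", "raise_eyebrows")
            for i, a in enumerate(u.accented) if a]


def gaze_events(turn: int, u: Utterance) -> List[Event]:
    # Assumed rule: the listener looks at the speaker when the turn starts,
    # and the speaker looks at the listener on the final word.
    last = len(u.words) - 1
    return [Event(turn, 0, u.listener, "gaze", "look_at_speaker"),
            Event(turn, last, u.speaker, "gaze", "look_at_listener")]


def gesture_events(turn: int, u: Utterance) -> List[Event]:
    # Assumed rule: a beat gesture accompanies each accented word.
    return [Event(turn, i, u.speaker, "arm", "beat")
            for i, a in enumerate(u.accented) if a]


Generator = Callable[[int, Utterance], List[Event]]
GENERATORS: Tuple[Generator, ...] = (
    lip_events, face_events, gaze_events, gesture_events)


def animate(dialogue: List[Utterance]) -> List[Event]:
    """Fan each utterance out to every generator and merge into one timeline."""
    timeline: List[Event] = []
    for turn, u in enumerate(dialogue):
        for generate in GENERATORS:
            timeline.extend(generate(turn, u))
    # Sorting by (turn, word index) keeps all channels aligned to the speech.
    return sorted(timeline, key=lambda e: (e.turn, e.word_index))


if __name__ == "__main__":
    for event in animate(plan_dialogue()):
        print(event)
```

Keying every event to a turn and word index is one simple way to keep the channels synchronized with the speech in this toy setting; a real system would need actual timing information from the speech synthesizer.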

The original version of the paper was written while this author was working at the Università di Roma “La Sapienza”.

Editor information

Harry Bunt, Robbert-Jan Beun, Tijn Borghuis

Copyright information

© 1998 Springer-Verlag

About this paper

Cite this paper

Pelachaud, C., Cassell, J., Badler, N., Steedman, M., Prevost, S., Stone, M. (1998). Synthesizing cooperative conversation. In: Bunt, H., Beun, RJ., Borghuis, T. (eds) Multimodal Human-Computer Communication. CMC 1995. Lecture Notes in Computer Science, vol 1374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052313

  • DOI: https://doi.org/10.1007/BFb0052313

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64380-7

  • Online ISBN: 978-3-540-69764-0

  • eBook Packages: Springer Book Archive
