Abstract
Virtual characters and robots that interact with people in social contexts should understand users' behaviour and respond with gestures, facial expressions and gaze. The challenges in this area are estimating high-level user states by fusing low-level multi-modal sensory input, making socially appropriate decisions based on this partial sensory information, and rendering synchronized, timely multi-modal behaviours based on those decisions. Moreover, these characters should be able to communicate with multiple users, and also with each other, in multi-party group interactions. In this chapter, we provide an overview of methods for multi-modal and multi-party interaction, discuss the challenges in this area, describe our current work, and point out future research directions.
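As a purely illustrative sketch of the first challenge named above (estimating a high-level user state by fusing low-level multi-modal input), the snippet below combines hypothetical per-modality confidence scores, for gaze, speech activity and body orientation, into a single engagement estimate via weighted late fusion. The modality names, weights and threshold are assumptions for illustration, not the chapter's method.

```python
# Late-fusion sketch: combine per-modality confidence scores in [0, 1]
# (e.g. gaze, speech activity, body orientation) into one engagement
# estimate via a weighted average, then threshold into a discrete state.
# All names, weights and the threshold are hypothetical.

def fuse_engagement(scores, weights):
    """Weighted late fusion of modality scores in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

def classify(fused, threshold=0.5):
    """Map the fused score to a discrete engagement state."""
    return "engaged" if fused >= threshold else "not_engaged"

scores = {"gaze": 0.9, "speech": 0.6, "orientation": 0.7}
weights = {"gaze": 0.5, "speech": 0.3, "orientation": 0.2}
fused = fuse_engagement(scores, weights)   # 0.9*0.5 + 0.6*0.3 + 0.7*0.2 = 0.77
print(classify(fused))                     # prints "engaged"
```

In practice such a fused estimate would feed a decision component that selects a socially appropriate response, which is then rendered as synchronized multi-modal behaviour; the weighting scheme here stands in for the richer fusion methods surveyed in the chapter.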
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Yumak, Z., Magnenat-Thalmann, N. (2016). Multimodal and Multi-party Social Interactions. In: Magnenat-Thalmann, N., Yuan, J., Thalmann, D., You, BJ. (eds) Context Aware Human-Robot and Human-Agent Interaction. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-19947-4_13
Print ISBN: 978-3-319-19946-7
Online ISBN: 978-3-319-19947-4