Abstract
Virtual characters and robots that interact with people in social contexts should understand users' behaviour and respond with gestures, facial expressions and gaze. The challenges in this area are estimating high-level user states by fusing low-level multi-modal sensory input, making socially appropriate decisions based on this partial sensory information, and rendering synchronized, timely multi-modal behaviours based on those decisions. Moreover, these characters should be able to communicate with multiple users, and also with each other, in multi-party group interactions. In this chapter, we provide an overview of methods for multi-modal and multi-party interaction, discuss the challenges in this area, describe our current work, and point out future research directions.
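As a purely illustrative sketch of the first challenge named above (estimating a high-level user state by fusing low-level multi-modal input), the snippet below combines hypothetical per-modality confidence scores, for gaze, speech activity and body orientation, into a single engagement estimate via weighted late fusion. The modality names, weights and threshold are assumptions for illustration, not the chapter's method.

```python
# Late-fusion sketch: combine per-modality confidence scores in [0, 1]
# (e.g. gaze, speech activity, body orientation) into one engagement
# estimate via a weighted average, then threshold into a discrete state.
# All names, weights and the threshold are hypothetical.

def fuse_engagement(scores, weights):
    """Weighted late fusion of modality scores in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

def classify(fused, threshold=0.5):
    """Map the fused score to a discrete engagement state."""
    return "engaged" if fused >= threshold else "not_engaged"

scores = {"gaze": 0.9, "speech": 0.6, "orientation": 0.7}
weights = {"gaze": 0.5, "speech": 0.3, "orientation": 0.2}
fused = fuse_engagement(scores, weights)   # 0.9*0.5 + 0.6*0.3 + 0.7*0.2 = 0.77
print(classify(fused))                     # prints "engaged"
```

In practice such a fused estimate would feed a decision component that selects a socially appropriate response, which is then rendered as synchronized multi-modal behaviour; the weighting scheme here stands in for the richer fusion methods surveyed in the chapter.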
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Yumak, Z., Magnenat-Thalmann, N. (2016). Multimodal and Multi-party Social Interactions. In: Magnenat-Thalmann, N., Yuan, J., Thalmann, D., You, BJ. (eds) Context Aware Human-Robot and Human-Agent Interaction. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-319-19947-4_13
Print ISBN: 978-3-319-19946-7
Online ISBN: 978-3-319-19947-4