Abstract
Advances in artificial intelligence are fundamentally changing how we relate to machines. We once treated computers as tools, but now we expect them to be agents, and increasingly our instinct is to treat them like peers. This paper explores peer-to-peer communication between people and machines. Two ideas are central to the approach: shared perception, in which partners work in a shared environment and much of the information that passes between them is contextual and derived from perception; and visually grounded reasoning, in which an action is considered feasible if it can be visualized or simulated in 3D. We explore both ideas in the context of blocks world, which serves as a surrogate for cooperative tasks in which the partners share a workspace. We begin with elicitation studies, observing pairs of people working together in blocks world and noting the gestures they use; these gestures fall into three categories: social, deictic, and iconic. We then build a prototype system that pairs people with avatars in a simulated blocks world. We find that when participants can see but not hear each other, all three gesture types are necessary, whereas when participants can also speak, the social and deictic gestures remain important while the iconic gestures become less so. We also find that ambiguities flip the conversational lead: the partner previously receiving information takes the lead in order to resolve the ambiguity.
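The feasibility criterion behind visually grounded reasoning can be made concrete with a small example. The sketch below is a minimal illustration under simplified assumptions, not the paper's implementation: it treats a block placement as feasible exactly when the simulated result is collision-free and physically supported. All names here (Block, feasible_place, and the geometry helpers) are hypothetical.

```python
# Minimal sketch of visually grounded feasibility: an action is "feasible"
# iff simulating it in 3D yields a collision-free, supported configuration.
# Blocks are axis-aligned unit cubes; all names are illustrative.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    x: float
    y: float
    z: float
    size: float = 1.0  # edge length of the cube

def overlaps(a: Block, b: Block) -> bool:
    """Axis-aligned collision test between two cubes (touching != overlap)."""
    half = (a.size + b.size) / 2
    return (abs(a.x - b.x) < half and
            abs(a.y - b.y) < half and
            abs(a.z - b.z) < half)

def supported(block: Block, world: list[Block]) -> bool:
    """True if the block rests on the table (bottom at y == 0)
    or sits directly on top of another block."""
    if abs(block.y - block.size / 2) < 1e-6:
        return True
    return any(abs(block.y - (b.y + (b.size + block.size) / 2)) < 1e-6 and
               abs(block.x - b.x) < b.size / 2 and
               abs(block.z - b.z) < b.size / 2
               for b in world)

def feasible_place(block: Block, target: tuple, world: list[Block]) -> bool:
    """Simulate placing `block` at `target`; feasible iff the moved block
    collides with nothing and the resulting configuration is supported."""
    moved = Block(block.name, *target, block.size)
    others = [b for b in world if b.name != block.name]
    return (not any(overlaps(moved, b) for b in others)
            and supported(moved, others))

# Usage: stacking b3 on top of b1 is feasible in this toy world.
table = [Block("b1", 0, 0.5, 0), Block("b2", 2, 0.5, 0)]
assert feasible_place(Block("b3", 5, 0.5, 5), (0, 1.5, 0), table)
```

In the full system the simulation is richer than this geometric test, but the same gate applies: an instruction is only acted on if a consistent 3D realization of it exists.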
Notes
1. The Kinect v2 estimates the positions of 25 joints, but the 8 lower-body joints are consistently obscured by the table.
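For concreteness, here is a minimal sketch of how the occluded joints might be discarded. The joint names follow the Kinect v2 SDK's JointType enumeration; the skeleton dictionary and the upper_body helper are hypothetical illustrations, not the system's actual code.

```python
# Hypothetical sketch: drop the 8 table-occluded lower-body joints from a
# 25-joint Kinect v2 skeleton frame, keeping the 17 reliably visible ones.
LOWER_BODY = {
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
}

def upper_body(skeleton: dict) -> dict:
    """Filter a frame mapping joint name -> (x, y, z) camera-space position."""
    return {name: pos for name, pos in skeleton.items()
            if name not in LOWER_BODY}
```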
Acknowledgments
This work was supported by the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO) under contract #W911NF-15-1-0459 at Colorado State University and the University of Florida and contract #W911NF-15-C-0238 at Brandeis University.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Narayana, P. et al. (2019). Cooperating with Avatars Through Gesture, Language and Action. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_20