Detecting Address Estimation Errors from Users’ Reactions in Multi-user Agent Conversation

  • Ryo Hotta
  • Hung-Hsuan Huang
  • Shochi Otogi
  • Kyoji Kawagoe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8511)


Embodied conversational agents are gradually being deployed in real-world applications such as guides in museums and exhibitions. In these applications, the agent must identify the addressee of each user utterance in order to deliberate appropriate responses while interacting with visitor groups. However, as long as the addressee identification mechanism is not perfectly accurate, the agent will sometimes respond in error. Once such an error occurs, the agent's hypothesis about the conversation collapses, and its subsequent decision-making may diverge in an entirely different direction. We are developing a mechanism to detect such errors from the users' reactions and a mechanism to recover from them. This paper presents the first step: a method to detect laughing, surprised, and confused facial expressions that follow the agent's erroneous responses. The method is machine-learning based, trained on user reactions collected in a Wizard-of-Oz (WOZ) experiment, and achieves an accuracy of over 90%.
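To illustrate the general idea of classifying user reactions from facial features, the following is a minimal sketch, not the paper's implementation: the authors train their model with Weka on features extracted by visage|SDK, whereas the feature names, toy data, and nearest-centroid classifier below are invented purely for demonstration.

```python
# Hypothetical sketch: classify a reaction from a small facial-feature
# vector using a nearest-centroid rule. Features and data are toy values.

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(samples):
    """samples: {label: [feature_vector, ...]} -> {label: centroid}."""
    return {label: centroid(vecs) for label, vecs in samples.items()}

def classify(model, x):
    """Return the label whose centroid is closest (Euclidean) to x."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
    return min(model, key=lambda label: dist(model[label]))

# Toy features: [mouth-corner raise, eye opening, brow lowering]
training = {
    "laugh":     [[0.9, 0.3, 0.1], [0.8, 0.4, 0.0]],
    "surprise":  [[0.2, 0.9, 0.0], [0.3, 0.8, 0.1]],
    "confusion": [[0.1, 0.4, 0.8], [0.2, 0.3, 0.9]],
    "neutral":   [[0.1, 0.5, 0.1], [0.2, 0.5, 0.2]],
}
model = train(training)
print(classify(model, [0.85, 0.35, 0.05]))  # prints "laugh"
```

In practice the paper's classifier is trained on FACS-style facial measurements from the WOZ recordings; any off-the-shelf supervised learner (as provided by Weka) could replace the nearest-centroid rule here.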


Keywords: Multi-party conversation · Human-agent interaction · Gaze





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ryo Hotta¹
  • Hung-Hsuan Huang¹
  • Shochi Otogi¹
  • Kyoji Kawagoe¹

  1. Graduate School of Information Science & Engineering, Ritsumeikan University, Japan
