Predicting Turn-Taking by Compact Gazing Transition Patterns in Multiparty Conversation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


Gaze behavior plays an important role in the analysis of turn-taking in multiparty conversation. In this study, we propose a general and powerful model for predicting turn-taking from gaze transition patterns in four-participant conversation. We define gaze labels for the gaze movements of speakers and listeners, and encode every gaze transition pattern as a two-label pattern. We then analyze these patterns quantitatively to confirm their effectiveness. Finally, we build a model that predicts turn-taking from the gaze transition patterns. Experiments demonstrate that our model yields better prediction results than state-of-the-art methods.
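
As a rough illustration of the coding step summarized above, the following sketch turns a sequence of gaze labels into two-label transition patterns and uses their frequencies for a naive prediction. The label set ("S", "L", "X"), the collapsing of repeated labels, the outcome names "keep" and "change", and the frequency-vote predictor are all assumptions made for illustration; they are not the labels or the model described in this paper.

```python
from collections import Counter

# Hypothetical gaze labels (assumed for illustration, not from the paper):
#   "S" = gazing at the current speaker
#   "L" = gazing at a listener
#   "X" = gazing at no one


def two_label_patterns(gaze_sequence):
    """Collapse consecutive repeats, then return each transition as a two-label string."""
    collapsed = [
        g for i, g in enumerate(gaze_sequence)
        if i == 0 or g != gaze_sequence[i - 1]
    ]
    return [a + "-" + b for a, b in zip(collapsed, collapsed[1:])]


def pattern_counts(samples):
    """Count two-label patterns separately for each (hypothetical) outcome."""
    counts = {"keep": Counter(), "change": Counter()}
    for gaze_sequence, outcome in samples:
        counts[outcome].update(two_label_patterns(gaze_sequence))
    return counts


def predict(gaze_sequence, counts):
    """Naive frequency vote: choose the outcome whose patterns match most often."""
    scores = {}
    for outcome, counter in counts.items():
        total = sum(counter.values()) or 1  # avoid division by zero
        scores[outcome] = sum(
            counter[p] / total for p in two_label_patterns(gaze_sequence)
        )
    return max(scores, key=scores.get)


# Toy, made-up observations: (gaze label sequence, outcome)
samples = [
    (["S", "L"], "change"),
    (["S", "S", "X"], "keep"),
    (["S", "L", "S"], "change"),
]
counts = pattern_counts(samples)
print(two_label_patterns(["S", "S", "L", "X", "X"]))
print(predict(["S", "L"], counts))
```

The toy samples are invented solely to exercise the functions; a real analysis would use annotated conversation data and a properly evaluated classifier.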


Multiparty conversation · Gaze behavior analysis · Turn-taking · Nonverbal behaviors · Gaze transition pattern



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Foshan University, Foshan, China
  2. South China University of Technology, Guangzhou, China
