Abstract
A novel probabilistic framework is proposed for inferring gaze patterns and the structure of conversation in face-to-face multiparty communication, based on the head directions of participants and the presence or absence of their utterances. First, we define three classes of conversational regimes, which are characterized by the topology of the gaze pattern; we assume that these regimes indicate the structure of the conversation, i.e. who is talking to whom. Next, the problem is formulated as the joint estimation of the regime state from the gaze pattern and utterances, and of the gaze pattern from head directions. We then devise a dynamic Bayesian network, called the Markov-switching model, in which the regime changes over time according to Markov transitions and controls the dynamics of the gaze patterns and utterances. Furthermore, Bayesian estimation of the regime, the gaze pattern, and the model parameters is implemented using a Markov chain Monte Carlo method. Experiments on four-person conversations confirm accurate gaze estimation and the effectiveness of the framework for identifying conversation structures.
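To make the idea of a Markov-switching model estimated by MCMC concrete, the following is a minimal sketch, not the authors' model. It assumes a heavily simplified setting: the gaze pattern and utterance of each frame are collapsed into one hypothetical discrete observation symbol, the head-direction layer is omitted, the initial regime has a uniform prior, and Dirichlet priors are placed on the rows of the transition matrix `A` and the emission matrix `B`. Inference uses single-site Gibbs sampling, one particular MCMC scheme. The function names (`gibbs_markov_switching`, `simulate`, etc.), the priors, and the three-symbol toy data are all illustrative assumptions.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(weights, rng):
    """Draw an index with probability proportional to unnormalized weights."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if u < acc:
            return k
    return len(weights) - 1


def gibbs_markov_switching(obs, n_regimes, n_symbols, n_iter=200,
                           burn_in=50, seed=0, alpha=1.0, beta=1.0):
    """Gibbs sampler for a discrete Markov-switching (hidden Markov) model.

    obs: observed symbols in range(n_symbols); here a HYPOTHETICAL
         per-frame summary of gaze pattern and utterance.
    Returns the per-frame posterior-mode regime (over post-burn-in samples)
    and the last sampled transition matrix A and emission matrix B.
    """
    rng = random.Random(seed)
    T, K, M = len(obs), n_regimes, n_symbols
    # Initialize parameters from their priors and regimes uniformly at random.
    A = [sample_dirichlet([alpha] * K, rng) for _ in range(K)]
    B = [sample_dirichlet([beta] * M, rng) for _ in range(K)]
    r = [rng.randrange(K) for _ in range(T)]
    counts = [[0] * K for _ in range(T)]
    for it in range(n_iter):
        # 1) Resample each regime given its neighbours and its observation
        #    (uniform prior on r[0], so frame 0 has no incoming A factor).
        for t in range(T):
            weights = []
            for k in range(K):
                w = B[k][obs[t]]
                if t > 0:
                    w *= A[r[t - 1]][k]
                if t < T - 1:
                    w *= A[k][r[t + 1]]
                weights.append(w)
            r[t] = sample_categorical(weights, rng)
        # 2) Resample each transition row from its Dirichlet posterior.
        trans = [[0] * K for _ in range(K)]
        for t in range(1, T):
            trans[r[t - 1]][r[t]] += 1
        A = [sample_dirichlet([alpha + c for c in row], rng) for row in trans]
        # 3) Resample each emission row from its Dirichlet posterior.
        emit = [[0] * M for _ in range(K)]
        for t in range(T):
            emit[r[t]][obs[t]] += 1
        B = [sample_dirichlet([beta + c for c in row], rng) for row in emit]
        # Accumulate regime visits after burn-in for a marginal MAP estimate.
        if it >= burn_in:
            for t in range(T):
                counts[t][r[t]] += 1
    r_map = [max(range(K), key=lambda k: c[k]) for c in counts]
    return r_map, A, B


def simulate(T, A, B, seed=1):
    """Generate a toy regime sequence and observations from given A and B."""
    rng = random.Random(seed)
    r = [rng.randrange(len(A))]
    for _ in range(T - 1):
        r.append(sample_categorical(A[r[-1]], rng))
    y = [sample_categorical(B[k], rng) for k in r]
    return r, y


# Toy usage: three "sticky" regimes, each favouring one observation symbol.
true_A = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
true_B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
true_r, y = simulate(300, true_A, true_B)
r_hat, A_hat, B_hat = gibbs_markov_switching(y, n_regimes=3, n_symbols=3,
                                             n_iter=100, burn_in=30)
```

Because regime labels are exchangeable under this prior, the recovered labels may be a permutation of the true ones (the usual label-switching issue); in the framework described above, the regimes are instead distinguished by their gaze-pattern topology, which removes this ambiguity.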
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Otsuka, K., Takemae, Y., Yamato, J., Murase, H. (2006). Probabilistic Inference of Gaze Patterns and Structure of Multiparty Conversations from Head Directions and Utterances. In: Washio, T., Sakurai, A., Nakajima, K., Takeda, H., Tojo, S., Yokoo, M. (eds) New Frontiers in Artificial Intelligence. JSAI 2005. Lecture Notes in Computer Science, vol 4012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780496_38
DOI: https://doi.org/10.1007/11780496_38
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35470-3
Online ISBN: 978-3-540-35471-0
eBook Packages: Computer Science, Computer Science (R0)