Probabilistic Inference of Gaze Patterns and Structure of Multiparty Conversations from Head Directions and Utterances

Otsuka, Kazuhiro; Takemae, Yoshinao; Yamato, Junji; Murase, Hiroshi

doi:10.1007/11780496_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4012)

Included in the following conference series:

Annual Conference of the Japanese Society for Artificial Intelligence

Abstract

A novel probabilistic framework is proposed for inferring gaze patterns and the structure of conversation in face-to-face multiparty communication, based on the head directions of participants and the presence or absence of their utterances. First, we define three classes of conversational regimes, each characterized by the topology of the gaze pattern; we assume that they indicate the structure of the conversation, i.e., who is talking to whom. Next, the problem is formulated as the joint estimation of the regime state from the gaze pattern and utterances, and of the gaze pattern from head directions. We then devise a dynamic Bayesian network, called the Markov-switching model. The regime changes over time according to Markov transitions and controls the dynamics of the gaze patterns and utterances. Furthermore, Bayesian estimation of the regime, gaze pattern, and model parameters is implemented using a Markov chain Monte Carlo method. Experiments on four-person conversations confirm accurate gaze estimation and the effectiveness of the framework for identifying conversation structures.
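The layered structure described in the abstract (a hidden regime following a Markov chain, a gaze pattern that depends on the regime, and head directions that depend on the gaze pattern) can be illustrated with a small forward simulation. The following is only a toy sketch and not the authors' model: the regime labels, the transition matrix, the per-regime `FOCUS_PROB` values, the fixed focal person, and the discrete head-direction noise are all assumptions introduced here for illustration. It covers only the generative direction; the Markov chain Monte Carlo inference mentioned in the abstract is not shown.

```python
import random

N_PEOPLE = 4  # four-person conversation, as in the abstract's experiments

# Placeholder regime labels; the abstract does not name the three classes.
REGIMES = ["regime_a", "regime_b", "regime_c"]

# Hypothetical regime transition matrix (rows sum to 1); values are assumptions.
TRANSITION = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

# Toy assumption: each regime sets how strongly gaze converges on one person.
FOCUS_PROB = [0.9, 0.5, 0.1]


def sample_categorical(rng, probs):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def sample_regimes(rng, n_steps, transition, initial=0):
    """Sample a regime sequence from a first-order Markov chain."""
    states = [initial]
    for _ in range(n_steps - 1):
        states.append(sample_categorical(rng, transition[states[-1]]))
    return states


def sample_gaze(rng, regime, focal=0):
    """Sample each person's gaze target (another person) given the regime."""
    gaze = []
    for person in range(N_PEOPLE):
        others = [q for q in range(N_PEOPLE) if q != person]
        if person != focal and rng.random() < FOCUS_PROB[regime]:
            gaze.append(focal)
        else:
            gaze.append(rng.choice(others))
    return gaze


def observe_head(rng, gaze, p_correct=0.8):
    """Noisy head direction: points at the gaze target with probability p_correct."""
    head = []
    for person, target in enumerate(gaze):
        others = [q for q in range(N_PEOPLE) if q != person]
        head.append(target if rng.random() < p_correct else rng.choice(others))
    return head


def simulate(n_steps, seed=0):
    """Run the toy generative chain: regime -> gaze pattern -> head direction."""
    rng = random.Random(seed)
    regimes = sample_regimes(rng, n_steps, TRANSITION)
    gaze = [sample_gaze(rng, r) for r in regimes]
    head = [observe_head(rng, g) for g in gaze]
    return regimes, gaze, head


if __name__ == "__main__":
    regimes, gaze, head = simulate(10, seed=1)
    for t in range(10):
        print(t, REGIMES[regimes[t]], gaze[t], head[t])
```

In this sketch only the `head` sequence would count as observed; the estimation problem posed in the abstract is the reverse one, recovering the regime and gaze sequences (and the model parameters) from head directions and utterances. Utterances are omitted here to keep the example short.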

Author information

Authors and Affiliations

NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, 243-0198, Japan
Kazuhiro Otsuka & Junji Yamato
NTT Cyber Solutions Laboratories, Nippon Telegraph and Telephone Corporation, Yokosuka, 239-0847, Japan
Yoshinao Takemae
Graduate School of Information Science, Nagoya University, Nagoya, 464-8601, Japan
Kazuhiro Otsuka & Hiroshi Murase

Editor information

Editors and Affiliations

ISIR, Osaka University, 8-1, Mihogaoka, 567-0047, Osaka, Ibaraki, Japan
Takashi Washio
CREST, Japan Science and Technology Agency, Japan
Akito Sakurai
Dept. of Information Media, Tokyo Denki University, room 1009, 10th floor, building 11, 2-2, Kandanishikicho, Chiyodaku, 101-8457, Tokyo, Japan
Katsuto Nakajima
National Institute of Informatics, 2-1-2 Chiyoda-ku, 101-8430, Tokyo, Japan
Hideaki Takeda
Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Satoshi Tojo
Department of ISEE, Kyushu University, 819-0395, Fukuoka, Japan
Makoto Yokoo

Cite this paper

Otsuka, K., Takemae, Y., Yamato, J., Murase, H. (2006). Probabilistic Inference of Gaze Patterns and Structure of Multiparty Conversations from Head Directions and Utterances. In: Washio, T., Sakurai, A., Nakajima, K., Takeda, H., Tojo, S., Yokoo, M. (eds) New Frontiers in Artificial Intelligence. JSAI 2005. Lecture Notes in Computer Science, vol 4012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780496_38

DOI: https://doi.org/10.1007/11780496_38
Published: 29 June 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35470-3
Online ISBN: 978-3-540-35471-0
eBook Packages: Computer Science, Computer Science (R0)
