Towards Computer Understanding of Human Interactions

McCowan, Iain; Gatica-Perez, Daniel; Bengio, Samy; Moore, Darren; Bourlard, Hervé

doi:10.1007/978-3-540-39863-9_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2875)

Abstract

People meet in order to interact – disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view: extracting the information they contain, and understanding human interaction processes. Based on this view, this article presents an approach in which relevant information content of a meeting is identified from a variety of audio and visual sensor inputs and statistical models of interacting people. We present a framework for computer observation and understanding of interacting people, and discuss particular tasks within this framework, issues in the meeting context, and particular algorithms that we have adopted. We also comment on current developments and the future challenges in automatic meeting analysis.

Author information

Authors and Affiliations

Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), P.O. Box 592, CH-1920, Martigny, Switzerland
Iain McCowan, Daniel Gatica-Perez, Samy Bengio, Darren Moore & Hervé Bourlard

Editor information

Editors and Affiliations

Eindhoven University of Technology, Eindhoven, The Netherlands
Emile Aarts
Practice & Research in Intelligent Systems & Media, (PRISM) Laboratory, Department of Computer Science, University College Dublin (UCD), Belfield, Dublin 4, Ireland
René W. Collier
Philips Research Europe, HTC 34, 5656 AE, Eindhoven, The Netherlands
Evert van Loenen
Philips Research, Media Interaction, Prof. Holstlaan 4, 5656 AE, Eindhoven, The Netherlands
Boris de Ruyter

About this paper

Cite this paper

McCowan, I., Gatica-Perez, D., Bengio, S., Moore, D., Bourlard, H. (2003). Towards Computer Understanding of Human Interactions. In: Aarts, E., Collier, R.W., van Loenen, E., de Ruyter, B. (eds) Ambient Intelligence. EUSAI 2003. Lecture Notes in Computer Science, vol 2875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39863-9_18

DOI: https://doi.org/10.1007/978-3-540-39863-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20418-3
Online ISBN: 978-3-540-39863-9
eBook Packages: Springer Book Archive