Abstract
People meet in order to interact – disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view: extracting the information they contain, and understanding human interaction processes. Based on this view, this article presents an approach in which relevant information content of a meeting is identified from a variety of audio and visual sensor inputs and statistical models of interacting people. We present a framework for computer observation and understanding of interacting people, and discuss particular tasks within this framework, issues in the meeting context, and particular algorithms that we have adopted. We also comment on current developments and the future challenges in automatic meeting analysis.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McCowan, I., Gatica-Perez, D., Bengio, S., Moore, D., Bourlard, H. (2003). Towards Computer Understanding of Human Interactions. In: Aarts, E., Collier, R.W., van Loenen, E., de Ruyter, B. (eds) Ambient Intelligence. EUSAI 2003. Lecture Notes in Computer Science, vol 2875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39863-9_18
DOI: https://doi.org/10.1007/978-3-540-39863-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20418-3
Online ISBN: 978-3-540-39863-9
eBook Packages: Springer Book Archive