Towards Computer Understanding of Human Interactions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2875)

Abstract

People meet in order to interact – disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view: extracting the information they contain, and understanding human interaction processes. Based on this view, this article presents an approach in which relevant information content of a meeting is identified from a variety of audio and visual sensor inputs and statistical models of interacting people. We present a framework for computer observation and understanding of interacting people, and discuss particular tasks within this framework, issues in the meeting context, and particular algorithms that we have adopted. We also comment on current developments and the future challenges in automatic meeting analysis.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McCowan, I., Gatica-Perez, D., Bengio, S., Moore, D., Bourlard, H. (2003). Towards Computer Understanding of Human Interactions. In: Aarts, E., Collier, R.W., van Loenen, E., de Ruyter, B. (eds) Ambient Intelligence. EUSAI 2003. Lecture Notes in Computer Science, vol 2875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39863-9_18

  • DOI: https://doi.org/10.1007/978-3-540-39863-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20418-3

  • Online ISBN: 978-3-540-39863-9

  • eBook Packages: Springer Book Archive
