Extracting Activities from Multimodal Observation

  • Oliver Brdiczka
  • Jérôme Maisonnasse
  • Patrick Reignier
  • James L. Crowley
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4252)


This paper addresses the extraction of small group configurations and activities in an intelligent meeting environment. The proposed approach takes a continuous stream of observations coming from different sensors in the environment as input. The goal is to separate distinct distributions of these observations corresponding to distinct group configurations and activities. In this paper, we explore an unsupervised method based on the calculation of the Jeffrey divergence between histograms over observations. The obtained distinct distributions of observations can be interpreted as distinct segments of group configuration and activity. To evaluate this approach, we recorded a seminar and a cocktail party meeting. The observations of the seminar were generated by a speech activity detector, while the observations of the cocktail party meeting were generated by both the speech activity detector and a visual tracking system. We measured the correspondence between detected segments and labelled group configurations and activities. The obtained results are promising, in particular as our method is completely unsupervised.
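As an illustration of the idea described above, the following is a minimal sketch of comparing histograms of observations with the Jeffrey divergence. It is not the authors' implementation. It assumes the symmetric, smoothed form of the Kullback-Leibler divergence, J(p, q) = Σ p_i log(p_i / m_i) + q_i log(q_i / m_i) with m_i = (p_i + q_i) / 2, which is one common definition. The discrete observation codes, the two adjacent sliding windows, the window size, and the function names `histogram`, `jeffrey_divergence` and `divergence_curve` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def histogram(observations, symbols):
    """Normalized histogram of discrete observations over a fixed symbol set."""
    counts = Counter(observations)
    total = len(observations)
    return [counts[s] / total for s in symbols]


def jeffrey_divergence(p, q):
    """Jeffrey divergence between two normalized histograms.

    Assumed definition: sum_i p_i*log(p_i/m_i) + q_i*log(q_i/m_i),
    with m_i = (p_i + q_i) / 2. Empty bins contribute zero.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2.0
        if pi > 0:
            total += pi * math.log(pi / m)
        if qi > 0:
            total += qi * math.log(qi / m)
    return total


def divergence_curve(observations, symbols, window):
    """Divergence between the windows just before and just after each time t.

    Hypothetical segmentation heuristic: peaks of this curve are candidate
    boundaries between distinct distributions of observations.
    """
    curve = []
    for t in range(window, len(observations) - window + 1):
        left = histogram(observations[t - window:t], symbols)
        right = histogram(observations[t:t + window], symbols)
        curve.append((t, jeffrey_divergence(left, right)))
    return curve


if __name__ == "__main__":
    # Toy stream: observation code 0 for 20 steps, then code 1 for 20 steps.
    stream = [0] * 20 + [1] * 20
    curve = divergence_curve(stream, symbols=[0, 1], window=5)
    boundary, score = max(curve, key=lambda item: item[1])
    print(boundary, score)
```

On this toy stream the curve peaks where the two windows see entirely different observation codes. How peaks are selected and how window sizes are chosen in the actual method is not stated in the abstract, so those parts of the sketch remain assumptions.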





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Oliver Brdiczka (1)
  • Jérôme Maisonnasse (1)
  • Patrick Reignier (1)
  • James L. Crowley (1)
  1. INRIA Rhône-Alpes, Montbonnot, France
