Abstract
This paper addresses the problem of segmenting small-group meetings in order to detect different group configurations and activities in an intelligent environment. Our approach takes as input the speech activity detected for each individual attending a meeting. The goal is to separate the distinct distributions of speech activity observations that correspond to distinct group configurations and activities. We propose an unsupervised method based on computing the Jeffrey divergence between histograms of speech activity observations. These histograms are generated from adjacent windows of variable size that are slid from the beginning to the end of a meeting recording. The peaks of the resulting Jeffrey divergence curves are detected using successive robust mean estimation. After a merging and filtering process, the retained peaks are used to select the best model, i.e., the best allocation of speech activity distributions for a given meeting recording. These distinct distributions can be interpreted as distinct segments of group configuration and activity. To evaluate the method, we recorded six small-group meetings and measured the correspondence between the detected segments and the labeled group configurations and activities. The results obtained are promising, particularly since our method is completely unsupervised.
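The core step of the method, comparing histograms of speech activity in two adjacent windows with the Jeffrey divergence and sliding the boundary across the recording, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The bit-per-participant encoding (`encode`), the helper names, and the single fixed window size `w` are hypothetical choices made for illustration. The divergence uses the common symmetric definition J(H, K) = Σ_i [h_i log(h_i/m_i) + k_i log(k_i/m_i)] with m_i = (h_i + k_i)/2. Peak detection by successive robust mean estimation, the merging and filtering step, and model selection are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def encode(speaking: Sequence[bool]) -> int:
    """Hypothetical encoding: one bit per participant (bit p set if p speaks)."""
    return sum(1 << p for p, s in enumerate(speaking) if s)


def histogram(window: Sequence[int]) -> Dict[int, float]:
    """Normalized histogram of discrete speech-activity codes in a window."""
    counts = Counter(window)
    n = len(window)
    return {code: c / n for code, c in counts.items()}


def jeffrey_divergence(h: Dict[int, float], k: Dict[int, float]) -> float:
    """Symmetric Jeffrey divergence between two normalized histograms.

    Each bin is compared with the mean m_i = (h_i + k_i) / 2, so empty bins
    contribute nothing and the result stays finite even for disjoint supports.
    """
    total = 0.0
    for b in set(h) | set(k):
        hi, ki = h.get(b, 0.0), k.get(b, 0.0)
        mi = (hi + ki) / 2.0
        if hi > 0:
            total += hi * math.log(hi / mi)
        if ki > 0:
            total += ki * math.log(ki / mi)
    return total


def divergence_curve(obs: Sequence[int], w: int) -> List[float]:
    """Divergence between obs[t-w:t] and obs[t:t+w] for every boundary t
    where both windows fit; index i of the result is boundary t = i + w."""
    if w < 1:
        raise ValueError("window size must be at least 1")
    return [
        jeffrey_divergence(histogram(obs[t - w:t]), histogram(obs[t:t + w]))
        for t in range(w, len(obs) - w + 1)
    ]


if __name__ == "__main__":
    # Synthetic meeting: participant 0 talks alone, then participant 1 alone.
    obs = [encode([True, False])] * 40 + [encode([False, True])] * 40
    w = 10
    curve = divergence_curve(obs, w)
    peak = curve.index(max(curve)) + w
    print(peak, round(max(curve), 4))  # prints: 40 1.3863
```

When the two windows contain disjoint distributions, the divergence reaches its maximum value of 2 ln 2, so an abrupt change of speaker configuration produces a sharp peak at the boundary. To approximate the "adjacent windows of variable size" described above, one would call `divergence_curve` for several values of `w` and combine the resulting curves; the abstract does not specify how the original method does this.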
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Brdiczka, O., Vaufreydaz, D., Maisonnasse, J., Reignier, P. (2006). Unsupervised Segmentation of Meeting Configurations and Activities using Speech Activity Detection. In: Maglogiannis, I., Karpouzis, K., Bramer, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2006. IFIP International Federation for Information Processing, vol 204. Springer, Boston, MA. https://doi.org/10.1007/0-387-34224-9_23
DOI: https://doi.org/10.1007/0-387-34224-9_23
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34223-8
Online ISBN: 978-0-387-34224-5
eBook Packages: Computer Science; Computer Science (R0)