Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments

Gatica-Perez, Daniel; Odobez, Jean-Marc

doi:10.1007/978-0-387-93808-0_16

Abstract

Among the many automation possibilities enabled by multi-sensor environments, several of which are discussed in this Handbook, a particularly relevant one is the analysis of social interaction in the workplace and, more specifically, of conversational group interaction. Group conversations are ubiquitous and represent a fundamental means through which ideas are discussed, progress is reported, and knowledge is created and disseminated.

Author information

Authors and Affiliations

Idiap Research Institute and Ecole Polytechnique Fédérale de Lausanne, Switzerland
Daniel Gatica-Perez & Jean-Marc Odobez

Corresponding authors

Correspondence to Daniel Gatica-Perez or Jean-Marc Odobez.

Editor information

Editors and Affiliations

Future University Hakodate, Kameda-Nakano 116-2, Hakodate, Hokkaido, 041-8655, Japan
Hideyuki Nakashima
Department of Electrical Engineering, Stanford University, 350 Serra Mall, Stanford, CA, 94305-9515, USA
Hamid Aghajan
School of Computing & Mathematics, University of Ulster at Jordanstown, Shore Road, Newtownabbey, Co. Antrim, UK, BT37 0QB
Juan Carlos Augusto

About this chapter

Cite this chapter

Gatica-Perez, D., Odobez, JM. (2010). Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments. In: Nakashima, H., Aghajan, H., Augusto, J.C. (eds) Handbook of Ambient Intelligence and Smart Environments. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_16

DOI: https://doi.org/10.1007/978-0-387-93808-0_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-93807-3
Online ISBN: 978-0-387-93808-0
eBook Packages: Computer Science, Computer Science (R0)
