Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments

  • Daniel Gatica-Perez
  • Jean-Marc Odobez


Among the many possibilities for automation enabled by multi-sensor environments (several of which are discussed in this Handbook), a particularly relevant one is the analysis of social interaction in the workplace and, more specifically, of conversational group interaction. Group conversations are ubiquitous and represent a fundamental means through which ideas are discussed, progress is reported, and knowledge is created and disseminated.


Visual Attention, Nonverbal Communication, Meeting Room, Conversational Event, Computer Support Cooperative Work
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Idiap Research Institute and Ecole Polytechnique Fédérale de Lausanne, Switzerland
