
Computers in the Human Interaction Loop

  • A. Waibel
  • R. Stiefelhagen
  • R. Carlson
  • J. Casas
  • J. Kleindienst
  • L. Lamel
  • O. Lanz
  • D. Mostefa
  • M. Omologo
  • F. Pianesi
  • L. Polymenakos
  • G. Potamianos
  • J. Soldatos
  • G. Sutschet
  • J. Terken

Abstract

It is a common experience in our modern world that we humans are overwhelmed by the complexities of the technological artifacts around us and by the attention they demand. While technology provides wonderful support and helpful assistance, it also causes an increased preoccupation with technology itself and a corresponding fragmentation of attention. As humans, however, we would rather attend to a meaningful dialog and interaction with other humans than control the operation of the machines that serve us. Yet such complexity and distraction are a natural consequence of the flexibility and the choice of functions and features that technology has to offer. Flexibility of choice and the availability of desirable functions are thus in conflict with ease of use, and with our very ability to enjoy their benefits.

Keywords

Situation Model, Automatic Speech Recognition, Multimodal Interface, Meeting Participant, Perceptual Component
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • A. Waibel (1)
  • R. Stiefelhagen (1)
  • R. Carlson (2)
  • J. Casas (3)
  • J. Kleindienst (4)
  • L. Lamel (5)
  • O. Lanz (6)
  • D. Mostefa (7)
  • M. Omologo (6)
  • F. Pianesi (6)
  • L. Polymenakos (8)
  • G. Potamianos (9)
  • J. Soldatos (8)
  • G. Sutschet (10)
  • J. Terken (11)

  1. Universität Karlsruhe (TH), Interactive Systems Labs, Karlsruhe, Germany
  2. Kungl Tekniska Högskolan, Centre for Speech Technology, Stockholm, Sweden
  3. Universitat Politecnica de Catalunya, Barcelona, Spain
  4. IBM Research, Czech Republic
  5. LIMSI-CNRS, France
  6. Foundation Bruno Kessler, irst, Trento, Italy
  7. ELDA, Paris, France
  8. Athens Information Technology, Greece
  9. Institute of Computer Science, FORTH, Crete, Greece
  10. Fraunhofer Institute IITB, Karlsruhe, Germany
  11. Technische Universiteit Eindhoven, Netherlands
