Speech Communication and Multimodal Interfaces

  • Björn Schuller
  • Markus Ablaßmeier
  • Ronald Müller
  • Stefan Reifinger
  • Tony Poitschke
  • Gerhard Rigoll
Part of the Signals and Communication Technology book series (SCT)


Keywords (machine-generated, not supplied by the authors): Speech Recognition, Speech Signal, Emotion Recognition, Automatic Speech Recognition, Facial Expression Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Björn Schuller (1)
  • Markus Ablaßmeier (1)
  • Ronald Müller (1)
  • Stefan Reifinger (1)
  • Tony Poitschke (1)
  • Gerhard Rigoll (1)

  1. Institute for Human-Machine Communication, Technische Universität München, Munich, Germany
