
Automatic Recognition of Emotions from Speech: A Review of the Literature and Recommendations for Practical Realisation

  • Chapter
Affect and Emotion in Human-Computer Interaction

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4868)

Abstract

In this article we give guidelines on how to address the major technical challenges of automatic emotion recognition from speech in human-computer interfaces: audio segmentation to find appropriate units for emotions, extraction of emotion-relevant features, classification of emotions, and training databases of emotional speech. Research so far has mostly dealt with offline evaluation of vocal emotions; online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for realising human-computer interfaces that analyse and respond to the user’s emotions while he or she is interacting with an application. By means of a sample application, we demonstrate how the challenges arising from online processing may be solved. The overall objective of the paper is to help readers assess the feasibility of human-computer interfaces that are sensitive to the user’s emotional voice, and to provide guidelines on how to technically realise such interfaces.
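The processing chain the abstract outlines — segment the audio into suitable units, extract emotion-relevant features, then classify — can be sketched in a few lines. The concrete choices below (fixed-length frames, frame energy and zero-crossing rate as features, utterance-level statistics as the final vector) are common illustrative examples from this literature, not the specific feature set or segmentation the chapter recommends; all function names are hypothetical.

```python
import math

def frame_features(samples, frame_len=160):
    """Split a waveform into non-overlapping frames and compute two simple
    acoustic cues per frame: short-time energy and zero-crossing rate.
    (Illustrative stand-ins for the richer prosodic/spectral features
    discussed in the emotion-recognition literature.)"""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((energy, zcr))
    return feats

def utterance_vector(samples):
    """Aggregate frame-level features over one segmentation unit (here: a
    whole utterance) into a static feature vector, the common approach for
    utterance-level emotion classifiers."""
    feats = frame_features(samples)
    energies = [e for e, _ in feats]
    zcrs = [z for _, z in feats]
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(energies), max(energies), mean(zcrs))

# Hypothetical usage: a 200 ms, 440 Hz tone at 8 kHz sampling rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1600)]
vec = utterance_vector(tone)  # feed vec to any classifier (SVM, k-NN, ...)
```

In a real system the resulting vectors would be fed to a trained classifier; the choice of segmentation unit, feature set, and classifier is exactly what the chapter's guidelines address.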





Editor information

Christian Peter, Russell Beale


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Vogt, T., André, E., Wagner, J. (2008). Automatic Recognition of Emotions from Speech: A Review of the Literature and Recommendations for Practical Realisation. In: Peter, C., Beale, R. (eds) Affect and Emotion in Human-Computer Interaction. Lecture Notes in Computer Science, vol 4868. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85099-1_7


  • DOI: https://doi.org/10.1007/978-3-540-85099-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85098-4

  • Online ISBN: 978-3-540-85099-1

  • eBook Packages: Computer Science, Computer Science (R0)
