
Automatic Recognition of Emotions from Speech: A Review of the Literature and Recommendations for Practical Realisation

  • Chapter
Affect and Emotion in Human-Computer Interaction

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4868)

Abstract

In this article we give guidelines on how to address the major technical challenges of automatic emotion recognition from speech in human-computer interfaces: audio segmentation to find appropriate units for emotions, extraction of emotion-relevant features, classification of emotions, and training databases of emotional speech. Research so far has mostly dealt with offline evaluation of vocal emotions; online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for realising human-computer interfaces that analyse and respond to the user’s emotions while he or she is interacting with an application. By means of a sample application, we demonstrate how the challenges arising from online processing may be solved. The overall objective of the paper is to help readers assess the feasibility of human-computer interfaces that are sensitive to the user’s emotional voice, and to provide guidelines on how to technically realise such interfaces.
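The processing chain the abstract outlines — segment the audio into suitable units, extract emotion-relevant features, then classify — can be sketched in a few lines. The concrete choices below (fixed-length frames, frame energy and zero-crossing rate as features, utterance-level statistics as the final vector) are common illustrative examples from this literature, not the specific feature set or segmentation the chapter recommends; all function names are hypothetical.

```python
import math

def frame_features(samples, frame_len=160):
    """Split a waveform into non-overlapping frames and compute two simple
    acoustic cues per frame: short-time energy and zero-crossing rate.
    (Illustrative stand-ins for the richer prosodic/spectral features
    discussed in the emotion-recognition literature.)"""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((energy, zcr))
    return feats

def utterance_vector(samples):
    """Aggregate frame-level features over one segmentation unit (here: a
    whole utterance) into a static feature vector, the common approach for
    utterance-level emotion classifiers."""
    feats = frame_features(samples)
    energies = [e for e, _ in feats]
    zcrs = [z for _, z in feats]
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(energies), max(energies), mean(zcrs))

# Hypothetical usage: a 200 ms, 440 Hz tone at 8 kHz sampling rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1600)]
vec = utterance_vector(tone)  # feed vec to any classifier (SVM, k-NN, ...)
```

In a real system the resulting vectors would be fed to a trained classifier; the choice of segmentation unit, feature set, and classifier is exactly what the chapter's guidelines address.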





Editor information

Christian Peter, Russell Beale


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Vogt, T., André, E., Wagner, J. (2008). Automatic Recognition of Emotions from Speech: A Review of the Literature and Recommendations for Practical Realisation. In: Peter, C., Beale, R. (eds) Affect and Emotion in Human-Computer Interaction. Lecture Notes in Computer Science, vol 4868. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85099-1_7


  • DOI: https://doi.org/10.1007/978-3-540-85099-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85098-4

  • Online ISBN: 978-3-540-85099-1

  • eBook Packages: Computer Science, Computer Science (R0)
