EmoVoice — A Framework for Online Recognition of Emotions from Voice

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Abstract

We present EmoVoice, a framework for creating emotional speech corpora and classifiers, and for both offline and real-time online speech emotion recognition. The framework is intended to be used by non-experts and therefore comes with an interface for creating a personal or application-specific emotion recogniser. Furthermore, we describe several applications and prototypes that already use the framework to track emotional user states online from voice information.
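To make the abstract concrete, below is a minimal sketch of the kind of pipeline such a framework wraps: per-utterance acoustic statistics (short-time energy and zero-crossing rate as crude loudness and voicing proxies) fed to an SVM classifier. This is an illustration only, not the authors' implementation; the features, labels, and toy data are assumptions, and scikit-learn stands in for whatever classifier back end the framework actually uses.

```python
# Illustrative sketch of an EmoVoice-style recogniser (NOT the authors' code).
# Per-utterance acoustic statistics are classified with an SVM; all feature
# choices, labels, and data below are hypothetical placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(signal: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Global statistics over short-time energy and zero-crossing rate."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)  # loudness proxy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) / 2, axis=1)  # crude voicing proxy
    return np.array([energy.mean(), energy.std(), energy.max() - energy.min(),
                     zcr.mean(), zcr.std()])

# Toy training data: in a real setting these would be recorded, labelled
# utterances collected with the framework's corpus-creation interface.
rng = np.random.default_rng(0)
X = np.stack([utterance_features(rng.normal(scale=s, size=16000))
              for s in rng.uniform(0.1, 1.0, size=40)])
y = (X[:, 0] > np.median(X[:, 0])).astype(int)  # placeholder "aroused"/"calm" labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:3]))  # classify the first three utterances
```

In online use, the same predict call would presumably run on each utterance as it is segmented from the live audio stream, which is the step such a framework automates for non-experts.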



Author information

Authors and Affiliations

T. Vogt, E. André, N. Bee

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vogt, T., André, E., Bee, N. (2008). EmoVoice — A Framework for Online Recognition of Emotions from Voice. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science (LNAI), vol. 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_21

  • DOI: https://doi.org/10.1007/978-3-540-69369-7_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69368-0

  • Online ISBN: 978-3-540-69369-7

  • eBook Packages: Computer Science, Computer Science (R0)
