Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech

  • Original Paper
  • Published in User Modeling and User-Adapted Interaction (2008)

Abstract

The ‘traditional’ first two dimensions in emotion research are VALENCE and AROUSAL. Normally, they are obtained using elicited, acted data. In this paper, we use realistic, spontaneous speech data from our ‘AIBO’ corpus (human-robot communication: children interacting with Sony’s AIBO robot). The recordings were made in a Wizard-of-Oz scenario: the children believed that AIBO obeyed their commands; in fact, AIBO followed a fixed script and often disobeyed. Five labellers annotated each word as belonging to one of eleven emotion-related states; the seven of these states that occurred frequently enough are dealt with in this paper. The confusion matrices of these labels were used in a Non-Metric Multidimensional Scaling to derive a two-dimensional representation; we interpret the first dimension as VALENCE, the second, however, not as AROUSAL but as INTERACTION, i.e., addressing oneself (angry, joyful) or the communication partner (motherese, reprimanding). We show that whether this new dimension can be observed depends on the specificity of the scenario and on the subjects’ conceptualizations, and we discuss the impact on the practice of labelling and processing emotional data. Two-dimensional solutions based on the acoustic and linguistic features that were used for automatic classification of these emotional states are interpreted along the same lines.
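The dimensional analysis described above, non-metric multidimensional scaling (NMDS) applied to dissimilarities derived from inter-labeller confusion matrices, can be illustrated in a few lines of Python. The following is a minimal sketch, not the authors' implementation: the four state names and the confusion counts are hypothetical placeholders (the paper uses seven states and five labellers), and the conversion from confusions to dissimilarities is one plausible choice among several.

# Minimal NMDS sketch with a hypothetical confusion matrix.
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.manifold import MDS

states = ["angry", "motherese", "joyful", "neutral"]  # hypothetical subset

# Hypothetical confusion counts: C[i, j] = how often a word whose
# majority label is states[i] was annotated as states[j].
C = np.array([
    [50,  2,  5, 10],
    [ 1, 40,  8,  9],
    [ 6,  7, 45,  4],
    [12, 10,  3, 60],
], dtype=float)

# Turn confusions into a symmetric dissimilarity matrix: states that
# labellers often confuse with each other count as similar.
P = C / C.sum(axis=1, keepdims=True)   # row-normalise to confusion rates
S = (P + P.T) / 2.0                    # symmetrise
D = 1.0 - S / S.max()                  # dissimilarities in [0, 1]
np.fill_diagonal(D, 0.0)               # a state is identical to itself

# Non-metric MDS embeds the states in two dimensions using only the
# rank order of the dissimilarities.
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0, n_init=10)
coords = nmds.fit_transform(D)

for name, (x, y) in zip(states, coords):
    print(f"{name:10s} dim1={x:+.2f} dim2={y:+.2f}")

Non-metric (rather than metric) scaling fits this setting because only the rank order of the confusion-based dissimilarities is meaningful, not their absolute magnitudes; interpreting the resulting axes (e.g., as VALENCE and INTERACTION) is then up to the analyst.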

Author information

Corresponding author

Correspondence to Anton Batliner.

About this article

Cite this article

Batliner, A., Steidl, S., Hacker, C. et al. Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech. User Model User-Adap Inter 18, 175–206 (2008). https://doi.org/10.1007/s11257-007-9039-4
