Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech

  • Original Paper
  • Published in User Modeling and User-Adapted Interaction (2008)

Abstract

The ‘traditional’ first two dimensions in emotion research are VALENCE and AROUSAL. Normally, they are obtained using elicited, acted data. In this paper, we use realistic, spontaneous speech data from our ‘AIBO’ corpus (human-robot communication: children interacting with Sony’s AIBO robot). The recordings were made in a Wizard-of-Oz scenario: the children believed that AIBO obeyed their commands; in fact, AIBO followed a fixed script and often disobeyed. Five labellers annotated each word as belonging to one of eleven emotion-related states; the seven of these states that occurred frequently enough are dealt with in this paper. The confusion matrices of these labels were used in a Non-Metric Multidimensional Scaling to derive a two-dimensional representation; we interpret the first dimension as VALENCE, the second, however, not as AROUSAL but as INTERACTION, i.e., addressing oneself (angry, joyful) or the communication partner (motherese, reprimanding). We show that whether this new dimension can be observed depends on the specificity of the scenario and on the subjects’ conceptualizations, and we discuss the impact on the practice of labelling and processing emotional data. Two-dimensional solutions based on the acoustic and linguistic features that were used for automatic classification of these emotional states are interpreted along the same lines.
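The dimensional analysis described above, non-metric multidimensional scaling (NMDS) applied to dissimilarities derived from inter-labeller confusion matrices, can be illustrated in a few lines of Python. The following is a minimal sketch, not the authors' implementation: the four state names and the confusion counts are hypothetical placeholders (the paper uses seven states and five labellers), and the conversion from confusions to dissimilarities is one plausible choice among several.

# Minimal NMDS sketch with a hypothetical confusion matrix.
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.manifold import MDS

states = ["angry", "motherese", "joyful", "neutral"]  # hypothetical subset

# Hypothetical confusion counts: C[i, j] = how often a word whose
# majority label is states[i] was annotated as states[j].
C = np.array([
    [50,  2,  5, 10],
    [ 1, 40,  8,  9],
    [ 6,  7, 45,  4],
    [12, 10,  3, 60],
], dtype=float)

# Turn confusions into a symmetric dissimilarity matrix: states that
# labellers often confuse with each other count as similar.
P = C / C.sum(axis=1, keepdims=True)   # row-normalise to confusion rates
S = (P + P.T) / 2.0                    # symmetrise
D = 1.0 - S / S.max()                  # dissimilarities in [0, 1]
np.fill_diagonal(D, 0.0)               # a state is identical to itself

# Non-metric MDS embeds the states in two dimensions using only the
# rank order of the dissimilarities.
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0, n_init=10)
coords = nmds.fit_transform(D)

for name, (x, y) in zip(states, coords):
    print(f"{name:10s} dim1={x:+.2f} dim2={y:+.2f}")

Non-metric (rather than metric) scaling fits this setting because only the rank order of the confusion-based dissimilarities is meaningful, not their absolute magnitudes; interpreting the resulting axes (e.g., as VALENCE and INTERACTION) is then up to the analyst.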

Author information

Corresponding author

Correspondence to Anton Batliner.

About this article

Cite this article

Batliner, A., Steidl, S., Hacker, C. et al. Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech. User Model User-Adap Inter 18, 175–206 (2008). https://doi.org/10.1007/s11257-007-9039-4
