
Automatic Labeling Affective Scenes in Spoken Conversations

Chapter in Cognitive Infocommunications, Theory and Applications

Abstract

Research in affective computing has mainly focused on analyzing human emotional states as they are perceivable within limited contexts, such as single speech utterances. In our study, we focus on the dynamic transitions of emotional states that appear throughout conversations and investigate computational models to automatically label those states using the proposed affective scene framework. An affective scene includes the complete sequence of emotional states in a conversation, from its start to its end. Affective scene instances capture different patterns of behavior, such as who manifests an emotional state, when it is manifested, and which changes occur through the influence of one interlocutor’s emotion on the other. In this chapter, we present the design and training of an automatic affective scene segmentation and classification system for spoken conversations. We comparatively evaluate the contributions of different feature types in the acoustic, lexical and psycholinguistic space, as well as their correlations and combination.
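As a rough illustration of the evaluation described above, the sketch below fuses acoustic and lexical feature vectors at the turn level and trains a single classifier. It is a minimal sketch under stated assumptions: the feature dimensions, the label set and the choice of a linear SVM are illustrative and are not details taken from the chapter.

```python
# Illustrative sketch only: turn-level emotion classification with
# feature-level fusion of acoustic and lexical representations.
# Feature extraction and the affective-scene segmentation step are out
# of scope; all arrays below are synthetic placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_turns = 200

acoustic = rng.normal(size=(n_turns, 60))   # e.g. openSMILE functionals per turn
lexical = rng.normal(size=(n_turns, 40))    # e.g. bag-of-words / LIWC counts per turn
labels = rng.choice(["neutral", "negative", "positive"], size=n_turns)  # hypothetical label set

# Feature-level fusion: concatenate the per-turn feature vectors.
X = np.hstack([acoustic, lexical])

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print(cross_val_score(clf, X, labels, cv=5, scoring="f1_macro").mean())
```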


Notes

  1. In this chapter, the word ‘interlocutor’ encompasses both the person who speaks and expresses the emotion (the speaker) and the person who listens to and perceives that emotion (the listener).

  2. The churn rate is “the percentage of customers who stop buying the products or services of a particular company.” In the telecommunication industry, some studies found that the approximate annual churn rate is 30% [24, 49].

  3. A turn refers to the content spoken by one speaker at a time. For example, speaker A says something, which is speaker A’s turn; then speaker B says something, which is speaker B’s turn.

  4. https://github.com/firojalam/openSMILE-configuration.

  5. By data level, we refer to the data preparation phase: before feature extraction, we select only a subset of the segments of the majority class, which is neutral in this case (a minimal sketch follows these notes).

  6. By feature level, we refer to over-sampling of the feature vectors of the minority classes (a minimal sketch follows these notes).
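To make note 5 concrete, here is a minimal sketch of data-level balancing: all minority-class segments are kept and only a random fraction of the neutral (majority) segments is retained before feature extraction. The function name, the keep ratio and the segment representation are illustrative assumptions, not details from the chapter.

```python
# Minimal sketch of the data-level balancing described in note 5
# (assumed detail: random sub-sampling of the neutral majority class
# before feature extraction).
import random

def undersample_majority(segments, labels, majority="neutral", keep_ratio=0.3, seed=42):
    """Keep every minority-class segment and a random fraction of majority-class segments."""
    rnd = random.Random(seed)
    return [(seg, lab) for seg, lab in zip(segments, labels)
            if lab != majority or rnd.random() < keep_ratio]

# Usage with toy data:
segs = ["s1", "s2", "s3", "s4", "s5"]
labs = ["neutral", "anger", "neutral", "neutral", "anger"]
print(undersample_majority(segs, labs))
```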
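For note 6, the chapter cites SMOTE [10] for over-sampling. The sketch below uses the imbalanced-learn package as one possible implementation, which is our assumption rather than the toolkit used by the authors.

```python
# Feature-level over-sampling (note 6): synthesize minority-class feature
# vectors with SMOTE [10]. imbalanced-learn is assumed here for illustration.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.random((100, 20))                 # one feature vector per segment
y = np.array([0] * 90 + [1] * 10)         # 0 = neutral (majority), 1 = minority emotion

X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(X_res.shape, np.bincount(y_res))    # minority class is over-sampled to match the majority
```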

References

  1. Alam F (2016) Computational models for analyzing affective behaviors and personality from speech and text. PhD thesis, University of Trento

  2. Alam F, Riccardi G (2013) Comparative study of speaker personality traits recognition in conversational and broadcast news speech. In: Proceedings of Interspeech, ISCA, pp 2851–2855

  3. Alam F, Riccardi G (2014) Fusion of acoustic, linguistic and psycholinguistic features for speaker personality traits recognition. In: Proceedings of the international conference on acoustics, speech and signal processing (ICASSP), pp 955–959

  4. Alam F, Riccardi G (2014) Predicting personality traits using multimodal information. In: Proceedings of the 2014 ACM multimedia workshop on computational personality recognition, ACM, pp 15–18

  5. Alam F, Chowdhury SA, Danieli M, Riccardi G (2016) How interlocutors coordinate with each other within emotional segments? In: COLING: international conference on computational linguistics

  6. Baranyi P, Csapó Á (2012) Definition and synergies of cognitive infocommunications. Acta Polytech Hung 9(1):67–83

  7. Barrett LF, Lewis M, Haviland-Jones JM (2016) Handbook of emotions. Guilford Publications

  8. Carletta J (1996) Assessing agreement on classification tasks: the kappa statistic. Comput Linguist 22(2):249–254

  9. Castán D, Ortega A, Miguel A, Lleida E (2014) Audio segmentation-by-classification approach based on factor analysis in broadcast news domain. EURASIP J Audio Speech Music Process 1:1–13

  10. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  11. Chowdhury SA (2017) Computational modeling of turn-taking dynamics in spoken conversations. PhD thesis, University of Trento

  12. Chowdhury SA, Riccardi G (2017) A deep learning approach to modeling competitiveness in spoken conversation. In: Proceedings of the international conference on acoustics, speech and signal processing (ICASSP), IEEE

  13. Chowdhury SA, Riccardi G, Alam F (2014) Unsupervised recognition and clustering of speech overlaps in spoken conversations. In: Proceedings of the workshop on speech, language and audio in multimedia (SLAM 2014), pp 62–66

  14. Chowdhury SA, Danieli M, Riccardi G (2015) Annotating and categorizing competition in overlap speech. In: Proceedings of ICASSP, IEEE

  15. Chowdhury SA, Danieli M, Riccardi G (2015) The role of speakers and context in classifying competition in overlapping speech. In: Sixteenth annual conference of the international speech communication association

  16. Chowdhury SA, Stepanov E, Riccardi G (2016) Predicting user satisfaction from turn-taking in spoken conversations. In: Proceedings of Interspeech

  17. Danieli M, Riccardi G, Alam F (2015) Emotion unfolding and affective scenes: a case study in spoken conversations. In: Proceedings of emotion representations and modelling for companion systems (ERM4CT) 2015, ICMI

  18. Devillers L, Vidrascu L (2006) Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In: Proceedings of Interspeech, pp 801–804

  19. Eyben F, Weninger F, Gross F, Schuller B (2013) Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM international conference on multimedia (ACM MM), ACM, pp 835–838

  20. Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Thirteenth international joint conference on artificial intelligence, vol 2. Morgan Kaufmann Publishers, pp 1022–1027

  21. Filipowicz A, Barsade S, Melwani S (2011) Understanding emotional transitions: the interpersonal consequences of changing emotions in negotiations. J Pers Soc Psychol 101(3):541

  22. Fisher W, Groff R, Roane H (2011) Applied behavior analysis: history, philosophy, principles, and basic methods. In: Handbook of applied behavior analysis, pp 3–13

  23. Frijda NH (1993) Moods, emotion episodes, and emotions

  24. Galanis D, Karabetsos S, Koutsombogera M, Papageorgiou H, Esposito A, Riviello MT (2013) Classification of emotional speech units in call centre interactions. In: 2013 IEEE 4th international conference on cognitive infocommunications (CogInfoCom), IEEE, pp 403–406

  25. Gross JJ (1998) The emerging field of emotion regulation: an integrative review. Rev Gen Psychol 2(3):271

  26. Gross JJ, Thompson RA (2007) Emotion regulation: conceptual foundations. In: Handbook of emotion regulation, vol 3, p 24

  27. Harrigan J, Rosenthal R (2008) New handbook of methods in nonverbal behavior research. Oxford University Press

  28. Hoffman ML (2008) Empathy and prosocial behavior. Handb Emot 3:440–455

  29. Juslin PN, Scherer KR (2005) Vocal expression of affect. In: The new handbook of methods in nonverbal behavior research, pp 65–135

  30. Kim S, Georgiou PG, Lee S, Narayanan S (2007) Real-time emotion detection system using speech: multi-modal fusion of different timescale features. In: Proceedings of multimedia signal processing (MMSP 2007), pp 48–51

  31. Konar A, Chakraborty A (2014) Emotion recognition: a pattern analysis approach. Wiley

  32. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of machine learning: European conference on machine learning (ECML), Springer, pp 171–182

  33. Lee CC, Busso C, Lee S, Narayanan SS (2009) Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions. In: Proceedings of Interspeech, pp 1983–1986

  34. McCall C, Singer T (2013) Empathy and the brain. In: Understanding other minds: perspectives from developmental social neuroscience, pp 195–214

  35. NIST (2009) The 2009 RT-09 rich transcription meeting recognition evaluation plan. NIST

  36. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC 2001. Lawrence Erlbaum Associates, Mahwah, p 71

  37. Perry A, Shamay-Tsoory S (2013) Understanding emotional and cognitive empathy: a neuropsychological perspective. In: Understanding other minds: perspectives from developmental social neuroscience. OUP Oxford, p 178

  38. Platt J (1998) Fast training of support vector machines using sequential minimal optimization. MIT Press. http://research.microsoft.com/~jplatt/smo.html

  39. Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y, Schwarz P, et al (2011) The Kaldi speech recognition toolkit. In: Proceedings of the automatic speech recognition and understanding workshop (ASRU), pp 1–4

  40. Riccardi G, Hakkani-Tür D (2005) Grounding emotions in human-machine conversational systems. In: Lecture notes in computer science. Springer, pp 144–154

  41. Robbins S, Judge TA, Millett B, Boyle M (2013) Organisational behaviour. Pearson Higher Education AU

  42. Scherer KR (2000) Psychological models of emotion. Neuropsychol Emot 137(3):137–162

  43. Scherer KR (2001) Appraisal considered as a process of multilevel sequential checking. Theory Methods Res Apprais Process Emot 92–120

  44. Schuller B, Batliner A (2013) Computational paralinguistics: emotion, affect and personality in speech and language processing. Wiley

  45. Schuller B, Steidl S, Batliner A (2009a) The INTERSPEECH 2009 emotion challenge. In: Proceedings of Interspeech, pp 312–315

  46. Schuller B, Vlasenko B, Eyben F, Rigoll G, Wendemuth A (2009b) Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings of the automatic speech recognition and understanding workshop (ASRU), pp 552–557

  47. Schuller B, Steidl S, Batliner A, Burkhardt F, Devillers L, Müller C, Narayanan S (2013) Paralinguistics in speech and language: state-of-the-art and the challenge. Comput Speech Lang 27(1):4–39

  48. Stepanov E, Favre B, Alam F, Chowdhury S, Singla K, Trione J, Béchet F, Riccardi G (2015) Automatic summarization of call-center conversations. In: Proceedings of the IEEE automatic speech recognition and understanding workshop (ASRU 2015)

  49. Tamaddoni Jahromi A, Sepehri MM, Teimourpour B, Choobdar S (2010) Modeling customer churn in a non-contractual setting: the case of telecommunications service providers. J Strateg Mark 18(7):587–598

  50. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann


Author information

Correspondence to Firoj Alam.


Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Alam, F., Danieli, M., Riccardi, G. (2019). Automatic Labeling Affective Scenes in Spoken Conversations. In: Klempous, R., Nikodem, J., Baranyi, P. (eds) Cognitive Infocommunications, Theory and Applications. Topics in Intelligent Engineering and Informatics, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-95996-2_6
