Automatic Labeling Affective Scenes in Spoken Conversations

Alam, Firoj; Danieli, Morena; Riccardi, Giuseppe

doi:10.1007/978-3-319-95996-2_6

Part of the book series: Topics in Intelligent Engineering and Informatics (TIEI, volume 13)

Abstract

Research in affective computing has mainly focused on analyzing human emotional states as perceivable within limited contexts, such as speech utterances. In our study, we focus on the dynamic transitions of emotional states that appear throughout a conversation, and we investigate computational models that automatically label emotional states using the proposed affective scene framework. An affective scene comprises the complete sequence of emotional states in a conversation, from its start to its end. Affective scene instances capture different patterns of behavior, such as who manifests an emotional state, when it is manifested, and which changes occur when one interlocutor’s emotion influences another interlocutor. In this paper, we present the design and training of an automatic affective scene segmentation and classification system for spoken conversations. We comparatively evaluate the contributions of different feature types in the acoustic, lexical, and psycholinguistic spaces, as well as their correlations and combinations.

Notes

1. In this chapter, the word ‘interlocutor’ encompasses both the person speaking and expressing the emotion (the speaker) and the person listening to and perceiving that emotion (the listener).
2. The churn rate is “the percentage of customers who stop buying the products or services of a particular company.” In the telecommunication industry, some studies found that the approximate annual churn rate is 30% [24, 49].
3. A turn refers to the spoken content of one speaker at a time. For example, speaker A says something, which is speaker A’s turn; then speaker B says something, which is speaker B’s turn.
4. https://github.com/firojalam/openSMILE-configuration
5. By data level, we refer to the data preparation phase, i.e., before feature extraction we select segments of the majority class, which is neutral in this case.
6. By feature level, we refer to the over-sampling process applied to the feature vectors of the minority classes.
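
Notes 5 and 6 describe class balancing at two points in the pipeline. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that "selecting segments of the majority class" means random down-sampling, and it uses a simplified interpolation between two random same-class vectors for the over-sampling step (the nearest-neighbour search of full SMOTE is omitted). The function names `undersample_majority` and `oversample_minority`, the class labels, and all parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample_majority(segments, labels, majority="neutral", keep=None, seed=0):
    """Data level (assumed reading): randomly keep `keep` majority-class
    segments before any features are extracted."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    other_idx = [i for i, y in enumerate(labels) if y != majority]
    if keep is None:
        # Hypothetical default: match the largest non-majority class.
        counts = Counter(labels[i] for i in other_idx)
        keep = max(counts.values()) if counts else len(majority_idx)
    kept = rng.sample(majority_idx, min(keep, len(majority_idx)))
    idx = sorted(kept + other_idx)  # preserve the original segment order
    return [segments[i] for i in idx], [labels[i] for i in idx]


def oversample_minority(vectors, labels, target, seed=0):
    """Feature level: add synthetic vectors until every class has `target`
    examples. Each synthetic vector lies on the line between two randomly
    chosen vectors of the same class (simplified, no k-NN search)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for v, y in zip(vectors, labels):
        by_class[y].append(v)
    out_vectors, out_labels = list(vectors), list(labels)
    for y, class_vectors in by_class.items():
        for _ in range(max(0, target - len(class_vectors))):
            a = rng.choice(class_vectors)
            b = rng.choice(class_vectors)
            t = rng.random()
            out_vectors.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            out_labels.append(y)
    return out_vectors, out_labels
```

In this sketch the down-sampling acts on segment identifiers, so it can run before feature extraction, while the over-sampling acts on numeric feature vectors produced afterwards. A class with a single example can only be duplicated by the interpolation step, which is one reason a nearest-neighbour variant with more data is usually preferable in practice.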

References

1. Alam F (2016) Computational models for analyzing affective behaviors and personality from speech and text. PhD thesis, University of Trento
2. Alam F, Riccardi G (2013) Comparative study of speaker personality traits recognition in conversational and broadcast news speech. In: Proceedings of Interspeech, ISCA, pp 2851–2855
3. Alam F, Riccardi G (2014) Fusion of acoustic, linguistic and psycholinguistic features for speaker personality traits recognition. In: Proceedings of international conference on acoustics, speech and signal processing (ICASSP), pp 955–959
4. Alam F, Riccardi G (2014) Predicting personality traits using multimodal information. In: Proceedings of the 2014 ACM multi media on workshop on computational personality recognition, ACM, pp 15–18
5. Alam F, Chowdhury SA, Danieli M, Riccardi G (2016) How interlocutors coordinate with each other within emotional segments? In: COLING: international conference on computational linguistics
6. Baranyi P, Csapó Á (2012) Definition and synergies of cognitive infocommunications. Acta Polytech Hung 9(1):67–83
7. Barrett LF, Lewis M, Haviland-Jones JM (2016) Handbook of emotions. Guilford Publications
8. Carletta J (1996) Assessing agreement on classification tasks: the kappa statistic. Comput Linguist 22(2):249–254
9. Castán D, Ortega A, Miguel A, Lleida E (2014) Audio segmentation-by-classification approach based on factor analysis in broadcast news domain. EURASIP J Audio Speech Music Process 1:1–13
10. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
11. Chowdhury SA (2017) Computational modeling of turn-taking dynamics in spoken conversations. PhD thesis, University of Trento
12. Chowdhury SA, Riccardi G (2017) A deep learning approach to modeling competitiveness in spoken conversation. In: Proceedings of international conference on acoustics, speech and signal processing (ICASSP), IEEE
13. Chowdhury SA, Riccardi G, Alam F (2014) Unsupervised recognition and clustering of speech overlaps in spoken conversations. In: Proceedings of workshop on speech, language and audio in multimedia (SLAM 2014), pp 62–66
14. Chowdhury SA, Danieli M, Riccardi G (2015) Annotating and categorizing competition in overlap speech. In: Proceedings of ICASSP, IEEE
15. Chowdhury SA, Danieli M, Riccardi G (2015) The role of speakers and context in classifying competition in overlapping speech. In: Sixteenth annual conference of the international speech communication association
16. Chowdhury SA, Stepanov E, Riccardi G (2016) Predicting user satisfaction from turn-taking in spoken conversations. In: Proceedings of Interspeech
17. Danieli M, Riccardi G, Alam F (2015) Emotion unfolding and affective scenes: a case study in spoken conversations. In: Proceedings of emotion representations and modelling for companion systems (ERM4CT) 2015, ICMI
18. Devillers L, Vidrascu L (2006) Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In: Proceedings of Interspeech, pp 801–804
19. Eyben F, Weninger F, Gross F, Schuller B (2013) Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM international conference on Multimedia (ACMM), ACM, pp 835–838
20. Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Thirteenth international joint conference on artificial intelligence, vol 2. Morgan Kaufmann Publishers, pp 1022–1027
21. Filipowicz A, Barsade S, Melwani S (2011) Understanding emotional transitions: the interpersonal consequences of changing emotions in negotiations. J Pers Soc Psychol 101(3):541
22. Fisher W, Groff R, Roane H (2011) Applied behavior analysis: history, philosophy, principles, and basic methods. In: Handbook of applied behavior analysis, pp 3–13
23. Frijda NH (1993) Moods, emotion episodes, and emotions
24. Galanis D, Karabetsos S, Koutsombogera M, Papageorgiou H, Esposito A, Riviello MT (2013) Classification of emotional speech units in call centre interactions. In: 2013 IEEE 4th international conference on cognitive infocommunications (CogInfoCom), IEEE, pp 403–406
25. Gross JJ (1998) The emerging field of emotion regulation: an integrative review. Rev Gen Psychol 2(3):271
26. Gross JJ, Thompson RA (2007) Emotion regulation: conceptual foundations. In: Handbook of emotion regulation, vol 3, p 24
27. Harrigan J, Rosenthal R (2008) New handbook of methods in nonverbal behavior research. Oxford University Press
28. Hoffman ML (2008) Empathy and prosocial behavior. Handb Emot 3:440–455
29. Juslin PN, Scherer KR (2005) Vocal expression of affect. In: The new handbook of methods in nonverbal behavior research, pp 65–135
30. Kim S, Georgiou PG, Lee S, Narayanan S (2007) Real-time emotion detection system using speech: multi-modal fusion of different timescale features. In: Proceedings of multimedia signal processing, 2007 (MMSP 2007), pp 48–51
31. Konar A, Chakraborty A (2014) Emotion recognition: a pattern analysis approach. Wiley
32. Kononenko I (1994) Estimating attributes: analysis and extensions of relief. In: Proceedings of machine learning: European conference on machine learning (ECML). Springer, pp 171–182
33. Lee CC, Busso C, Lee S, Narayanan SS (2009) Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions. In: Proceedings of Interspeech, pp 1983–1986
34. McCall C, Singer T (2013) Empathy and the brain. In: Understanding other minds: perspectives from developmental social neuroscience, pp 195–214
35. NIST (2009) The 2009 RT-09 rich transcription meeting recognition evaluation plan. NIST
36. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC 2001. Lawrence Erlbaum Associates, Mahwah, p 71
37. Perry A, Shamay-Tsoory S (2013) Understanding emotional and cognitive empathy: a neuropsychological. In: Understanding other minds: perspectives from developmental social neuroscience. OUP Oxford, p 178
38. Platt J (1998) Fast training of support vector machines using sequential minimal optimization. MIT Press. http://research.microsoft.com/~jplatt/smo.html
39. Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y, Schwarz P, et al (2011) The Kaldi speech recognition toolkit. In: Proceedings of automatic speech recognition and understanding workshop (ASRU), pp 1–4
40. Riccardi G, Hakkani-Tür D (2005) Grounding emotions in human-machine conversational systems. In: Lecture notes in computer science. Springer, pp 144–154
41. Robbins S, Judge TA, Millett B, Boyle M (2013) Organisational behaviour. Pearson Higher Education AU
42. Scherer KR (2000) Psychological models of emotion. Neuropsychol Emot 137(3):137–162
43. Scherer KR (2001) Appraisal considered as a process of multilevel sequential checking. Theory Methods Res Apprais Process Emot 92–120
44. Schuller B, Batliner A (2013) Computational paralinguistics: emotion, affect and personality in speech and language processing. Wiley
45. Schuller B, Steidl S, Batliner A (2009a) The Interspeech 2009 emotion challenge. In: Proceedings of Interspeech, pp 312–315
46. Schuller B, Vlasenko B, Eyben F, Rigoll G, Wendemuth A (2009b) Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings of automatic speech recognition and understanding workshop (ASRU), pp 552–557
47. Schuller B, Steidl S, Batliner A, Burkhardt F, Devillers L, Müller C, Narayanan S (2013) Paralinguistics in speech and language state-of-the-art and the challenge. Comput Speech Lang 27(1):4–39
48. Stepanov E, Favre B, Alam F, Chowdhury S, Singla K, Trione J, Béchet F, Riccardi G (2015) Automatic summarization of call-center conversations. In: Proceedings of the IEEE automatic speech recognition and understanding workshop (ASRU 2015)
49. Tamaddoni Jahromi A, Sepehri MM, Teimourpour B, Choobdar S (2010) Modeling customer churn in a non-contractual setting: the case of telecommunications service providers. J Strateg Mark 18(7):587–598
50. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann

Author information

Authors and Affiliations

Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Firoj Alam, Morena Danieli & Giuseppe Riccardi

Corresponding author

Correspondence to Firoj Alam.

Editor information

Editors and Affiliations

Faculty of Electronics, Wrocław University of Science and Technology, Wrocław, Poland
Ryszard Klempous
Faculty of Electronics, Wrocław University of Science and Technology, Wrocław, Poland
Jan Nikodem
Széchenyi István University, Győr, Hungary
Péter Zoltán Baranyi

About this chapter

Cite this chapter

Alam, F., Danieli, M., Riccardi, G. (2019). Automatic Labeling Affective Scenes in Spoken Conversations. In: Klempous, R., Nikodem, J., Baranyi, P. (eds) Cognitive Infocommunications, Theory and Applications. Topics in Intelligent Engineering and Informatics, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-95996-2_6

DOI: https://doi.org/10.1007/978-3-319-95996-2_6
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95995-5
Online ISBN: 978-3-319-95996-2
eBook Packages: Intelligent Technologies and Robotics (R0)
