Multimedia Tools and Applications, Volume 78, Issue 2, pp 2345–2366

Fear emotion classification in speech by acoustic and behavioral cues

  • Shin-ae Yoon
  • Guiyoung Son
  • Soonil Kwon


Machine-based emotional speech classification has become a requirement for natural and familiar human-computer interaction. Because emotional speech recognition systems use a person’s voice to detect their emotional state spontaneously and trigger appropriate follow-up actions, they can be used for a variety of purposes in call centers and emotion-based media services. Such systems are primarily developed using emotional acoustic data. While several emotional acoustic databases are available for emotion recognition systems in other countries, no real situational data related to the “fear emotion” is currently available. Thus, in this study, we collected acoustic recordings representing real urgent and fearful situations from an emergency call center. To classify callers’ emotions more accurately, we also included an additional behavioral feature, the “interjection”, a type of disfluency that arises from cognitive dysfunction observed in spontaneous speech when a speaker becomes hyperemotional. We used Support Vector Machines (SVMs) with the interjection feature as well as conventionally used acoustic features (i.e., F0 variability, voice intensity variability, and Mel-Frequency Cepstral Coefficients, MFCCs) to identify which emotional category the acoustic data fell into. The results revealed that MFCCs were the best acoustic feature for spontaneous fear speech classification. In addition, we demonstrated the validity of behavioral features as an important criterion for improving emotion classification.
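As a rough illustration of how the acoustic and behavioral cues named above might be combined into one utterance-level feature vector, the following Python sketch computes F0 variability, intensity variability, and an interjection rate. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `utterance_features`, the use of population standard deviation as the "variability" measure, the convention that an F0 of 0 marks an unvoiced frame, and the English interjection lexicon are all hypothetical. MFCC extraction is omitted because it requires a signal-processing library.

```python
from statistics import pstdev

# Hypothetical interjection lexicon for illustration only; the study's
# actual interjection inventory is not given in this text.
INTERJECTIONS = {"uh", "um", "oh", "ah"}


def utterance_features(f0_hz, intensity_db, tokens):
    """Return [F0 variability, intensity variability, interjection rate].

    f0_hz        -- per-frame F0 estimates in Hz; 0 marks an unvoiced frame
                    (an assumed convention)
    intensity_db -- per-frame intensity values in dB
    tokens       -- transcript tokens of the utterance
    """
    # Variability is taken here as the population standard deviation,
    # which is an assumption about how "variability" is measured.
    voiced = [f for f in f0_hz if f > 0]
    f0_var = pstdev(voiced) if len(voiced) > 1 else 0.0
    int_var = pstdev(intensity_db) if len(intensity_db) > 1 else 0.0

    # Behavioral cue: share of tokens that are interjections.
    n_interj = sum(1 for t in tokens if t.lower() in INTERJECTIONS)
    rate = n_interj / len(tokens) if tokens else 0.0

    return [f0_var, int_var, rate]


# Made-up example values, not data from the study.
features = utterance_features(
    f0_hz=[0.0, 210.0, 230.0, 0.0, 250.0],
    intensity_db=[60.0, 64.0, 62.0, 66.0, 68.0],
    tokens=["uh", "please", "come", "oh", "quickly"],
)
```

A vector of this kind, concatenated with MFCC statistics, is the sort of input that could then be passed to an SVM classifier.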


Emotional speech classification; Emergency situation; Behavioral cue; Disfluency (interjection); Speech signal processing



This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00189, Voice emotion recognition and indexing for affective multimedia service).

Supplementary material

11042_2018_6329_MOESM1_ESM.docx (28 kb)
S1 File. The Questionnaire details (DOCX 27 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Software, College of Software and Convergence Technology, Sejong University, Seoul, Republic of Korea
