
Abstract

This chapter examines current approaches to speech-based emotion recognition. Following a brief introduction to the most widely used approaches to building such systems, it groups the components commonly involved in emotion recognition systems by function (i.e., feature extraction, normalisation, classification, etc.) to give a broad view of the landscape. The next section then explains in more detail the components found in the most current systems. The chapter also presents a broad overview of how phonetic and speaker variability are handled in emotion recognition systems. Finally, it presents the authors' views on current and future research challenges in the field.
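The component breakdown the abstract describes (feature extraction, normalisation, classification) can be illustrated with a minimal sketch. This is an assumed toy pipeline, not the chapter's method: it uses two simple frame-level descriptors (log energy and zero-crossing rate) in place of the richer acoustic features surveyed in the chapter, and a nearest-centroid decision in place of GMM/SVM/HMM back-ends.

```python
import numpy as np

def extract_features(signal, frame_len=400, hop=160):
    # Frame the signal and compute two toy per-frame descriptors:
    # log energy and zero-crossing rate. Real systems use far richer
    # features (MFCCs, pitch, voice quality, statistical functionals).
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        log_energy = np.log(np.sum(frame ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        feats.append([log_energy, zcr])
    return np.array(feats)

def normalise(feats, mean, std):
    # Z-score normalisation; for speaker normalisation the statistics
    # would be estimated per speaker rather than globally.
    return (feats - mean) / (std + 1e-10)

def classify(utterance_feats, centroids):
    # Nearest-centroid decision on the utterance-level mean feature
    # vector, a stand-in for trained classifier back-ends.
    mean_vec = utterance_feats.mean(axis=0)
    return min(centroids,
               key=lambda lab: np.linalg.norm(mean_vec - centroids[lab]))

# Toy usage on one second of synthetic 16 kHz "speech"
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
feats = extract_features(signal)
feats = normalise(feats, feats.mean(axis=0), feats.std(axis=0))
label = classify(feats, {"neutral": np.zeros(2), "angry": np.full(2, 5.0)})
```

The centroid labels and feature choices here are purely illustrative; the point is the separation of concerns between the three stages, which mirrors how the chapter organises the systems it reviews.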



Author information

Correspondence to Vidhyasaharan Sethu.


Copyright information

© 2015 Springer Science+Business Media New York

Cite this chapter

Sethu, V., Epps, J., Ambikairajah, E. (2015). Speech Based Emotion Recognition. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_7

  • DOI: https://doi.org/10.1007/978-1-4939-1456-2_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1455-5

  • Online ISBN: 978-1-4939-1456-2
