
Deep Learning Approaches for Speech Emotion Recognition

Chapter in: Deep Learning-Based Approaches for Sentiment Analysis

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

In recent times, the rise of multimodal (audio, video, etc.) content-sharing platforms such as SoundCloud and Dubsmash has made the development of sentiment analysis techniques for such content imperative. Audio data in particular has proliferated rapidly and remains underexplored. Among the various aspects of audio sentiment analysis, emotion recognition from speech signals has gained considerable momentum and attention. Recognizing the specific emotions inherent in spoken language has applications in healthcare, information sciences, human–computer interaction, and related fields. This chapter examines the process of inferring sentiment from speech and the impact of various deep learning techniques on it. Factors such as the extraction of relevant features and the performance of several deep learning architectures on such datasets are analyzed, and results obtained with both classical and deep learning approaches are presented. Finally, conclusions and suggestions on the way forward are discussed.
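To make the pipeline the abstract outlines more concrete (frame-level feature extraction followed by a deep learning classifier), the sketch below is a minimal, illustrative example only. It assumes librosa for MFCC extraction and TensorFlow/Keras for a small bidirectional-LSTM classifier; the emotion label set, file paths, and layer sizes are hypothetical and are not the architecture evaluated in the chapter.

import numpy as np
import librosa
import tensorflow as tf

# Hypothetical label set; real corpora (e.g., Berlin EMO-DB, RAVDESS) differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def mfcc_features(path, sr=16000, n_mfcc=40, max_frames=200):
    # Load one clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix.
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)
    if mfcc.shape[0] < max_frames:                              # zero-pad short clips
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_model(n_mfcc=40, max_frames=200, n_classes=len(EMOTIONS)):
    # Small bidirectional LSTM over the MFCC frame sequence, softmax over emotions.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_frames, n_mfcc)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with hypothetical data:
# X = np.stack([mfcc_features(p) for p in wav_paths])
# y = np.array([EMOTIONS.index(lbl) for lbl in labels])
# model = build_model()
# model.fit(X, y, epochs=20, validation_split=0.2)

The same skeleton accommodates the chapter's broader comparisons: the feature extractor can be swapped for other representations discussed in the literature, and the recurrent classifier for convolutional or hybrid architectures.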



Author information

Corresponding author: Rajiv Ratn Shah


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Bhavan, A., Sharma, M., Piplani, M., Chauhan, P., Hitkul, Shah, R.R. (2020). Deep Learning Approaches for Speech Emotion Recognition. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_10


  • DOI: https://doi.org/10.1007/978-981-15-1216-2_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1215-5

  • Online ISBN: 978-981-15-1216-2

  • eBook Packages: Engineering, Engineering (R0)
