
Deep Learning Approaches for Speech Emotion Recognition

Chapter in: Deep Learning-Based Approaches for Sentiment Analysis

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

In recent times, the rise of multimodal (audio, video, etc.) content-sharing platforms such as SoundCloud and Dubsmash has made the development of sentiment analysis techniques for such content imperative. Audio data in particular has proliferated rapidly and remains underexplored. Among the various aspects of audio sentiment analysis, emotion recognition from speech signals has gained considerable momentum and attention. Recognizing the specific emotions inherent in spoken language has applications in healthcare, information sciences, human–computer interaction, and related fields. This chapter examines the process of inferring sentiment from speech and the impact of various deep learning techniques on it. Factors such as the extraction of relevant features and the performance of several deep learning architectures on such datasets are analyzed, and results obtained with both classical and deep learning approaches are presented. Finally, conclusions and suggestions on the way forward are discussed.
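To make the pipeline the abstract outlines more concrete (frame-level feature extraction followed by a deep learning classifier), the sketch below is a minimal, illustrative example only. It assumes librosa for MFCC extraction and TensorFlow/Keras for a small bidirectional-LSTM classifier; the emotion label set, file paths, and layer sizes are hypothetical and are not the architecture evaluated in the chapter.

import numpy as np
import librosa
import tensorflow as tf

# Hypothetical label set; real corpora (e.g., Berlin EMO-DB, RAVDESS) differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def mfcc_features(path, sr=16000, n_mfcc=40, max_frames=200):
    # Load one clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix.
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)
    if mfcc.shape[0] < max_frames:                              # zero-pad short clips
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_model(n_mfcc=40, max_frames=200, n_classes=len(EMOTIONS)):
    # Small bidirectional LSTM over the MFCC frame sequence, softmax over emotions.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_frames, n_mfcc)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with hypothetical data:
# X = np.stack([mfcc_features(p) for p in wav_paths])
# y = np.array([EMOTIONS.index(lbl) for lbl in labels])
# model = build_model()
# model.fit(X, y, epochs=20, validation_split=0.2)

The same skeleton accommodates the chapter's broader comparisons: the feature extractor can be swapped for other representations discussed in the literature, and the recurrent classifier for convolutional or hybrid architectures.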



Author information

Corresponding author: Rajiv Ratn Shah


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Bhavan, A., Sharma, M., Piplani, M., Chauhan, P., Hitkul, Shah, R.R. (2020). Deep Learning Approaches for Speech Emotion Recognition. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_10


  • DOI: https://doi.org/10.1007/978-981-15-1216-2_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1215-5

  • Online ISBN: 978-981-15-1216-2

  • eBook Packages: Engineering, Engineering (R0)
