Abstract
In recent times, the rise of multimodal (audio, video, etc.) content-sharing sites such as SoundCloud and Dubsmash has made the development of sentiment analysis techniques for this content imperative. Audio data in particular has proliferated rapidly and still offers much to explore. Among the various aspects of audio sentiment analysis, emotion recognition in speech signals has gained particular momentum and attention. Recognizing the specific emotions inherent in spoken language could go a long way in healthcare, information sciences, human–computer interaction, and related fields. This chapter examines the process of inferring sentiment from speech and the impact of various deep learning techniques on it. Factors such as the extraction of relevant features and the performance of several deep learning architectures on speech emotion datasets are analyzed, and results for both classical and deep learning approaches are presented. Finally, some conclusions and suggestions on the way forward are discussed.
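As a concrete illustration of the pipeline the abstract describes (extract acoustic features from an utterance, then classify them), the sketch below computes mean-pooled MFCC features and fits a support vector machine, the kind of classical baseline the chapter compares deep models against. This is a minimal sketch under stated assumptions, not the chapter's experimental setup: it assumes librosa and scikit-learn are installed, and the synthetic signals, four-class label set, and hyperparameters are placeholders standing in for a real labeled emotional-speech corpus.

import numpy as np
import librosa                      # assumed dependency: pip install librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mfcc_features(signal, sr=16000, n_mfcc=13):
    # Frame-level MFCCs, mean-pooled into one fixed-length vector per utterance.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Placeholder data: random one-second "utterances" stand in for a real
# emotional-speech corpus (e.g., Emo-DB or IITKGP-SESC).
rng = np.random.default_rng(seed=0)
X = np.stack([mfcc_features(rng.standard_normal(16000).astype(np.float32))
              for _ in range(40)])
y = rng.integers(0, 4, size=40)     # hypothetical labels: four emotion classes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

In the deep learning approaches the chapter surveys, a CNN or LSTM would replace the SVM, typically consuming frame-level MFCCs or spectrograms directly rather than a mean-pooled summary, but the feature-extraction front end stays essentially the same.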
Cite this chapter
Bhavan, A., Sharma, M., Piplani, M., Chauhan, P., Hitkul, Shah, R.R. (2020). Deep Learning Approaches for Speech Emotion Recognition. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_10
Print ISBN: 978-981-15-1215-5
Online ISBN: 978-981-15-1216-2
© 2020 Springer Nature Singapore Pte Ltd.