Abstract
This paper presents a multiresolution-based feature extraction approach for speech emotion recognition in unconstrained environments. In the proposed approach, Mel Frequency Cepstral Coefficients (MFCC) are derived from Discrete Wavelet Transform (DWT) sub-band coefficients. The extracted features are then combined with conventional MFCCs and pitch-based features to form the final feature vector. Linear Discriminant Analysis (LDA) is used to reduce the dimension of the resulting feature set prior to Naive Bayes classification. To assess the performance of the proposed approach in unconstrained environments, noisy speech data are generated by adding real-world noises to clean speech signals from the Berlin German Emotional Database (EMO-DB). The proposal is also tested through speaker-dependent and speaker-independent experiments. The overall performance results show an improvement in speech emotion detection over the baselines.
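To make the described pipeline concrete, the sketch below shows one plausible realization in Python using librosa, PyWavelets, and scikit-learn. It is not the authors' implementation: the wavelet family (db4), the decomposition depth, the MFCC settings, the yin pitch tracker, and the mean-pooling of frame-level features into utterance-level vectors are illustrative assumptions made here for the example.

```python
# Minimal sketch of the fusion pipeline described in the abstract (assumed details,
# not the authors' code): MFCCs computed from the raw signal and from each DWT
# sub-band, fused with simple pitch statistics, reduced with LDA, and classified
# with Gaussian Naive Bayes.
import numpy as np
import librosa
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

def utterance_features(y, sr, wavelet="db4", level=3, n_mfcc=13):
    feats = []
    # Conventional MFCCs, summarized over frames by their mean.
    feats.append(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1))
    # MFCC-like features from each DWT sub-band (approximation + detail coefficients).
    for band in pywt.wavedec(y, wavelet, level=level):
        feats.append(
            librosa.feature.mfcc(y=band.astype(np.float32), sr=sr, n_mfcc=n_mfcc).mean(axis=1)
        )
    # Simple pitch-based statistics (mean and standard deviation of F0).
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
    feats.append(np.array([f0.mean(), f0.std()]))
    return np.concatenate(feats)

def train(signals, sample_rates, labels):
    X = np.vstack([utterance_features(y, sr) for y, sr in zip(signals, sample_rates)])
    lda = LinearDiscriminantAnalysis()  # projects to at most (n_classes - 1) dimensions
    clf = GaussianNB().fit(lda.fit_transform(X, labels), labels)
    return lda, clf
```

A new utterance would then be scored with clf.predict(lda.transform(utterance_features(y, sr)[None, :])); in the paper's setting, training and test signals would come from clean and noise-corrupted EMO-DB recordings, respectively.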
About this paper
Cite this paper
Sekkate, S., Khalil, M., Adib, A., Ben Jebara, S. (2019). A Multiresolution-Based Fusion Strategy for Improving Speech Emotion Recognition Efficiency. In: Renault, É., Boumerdassi, S., Leghris, C., Bouzefrane, S. (eds.) Mobile, Secure, and Programmable Networking. MSPN 2019. Lecture Notes in Computer Science, vol. 11557. Springer, Cham. https://doi.org/10.1007/978-3-030-22885-9_10
Print ISBN: 978-3-030-22884-2
Online ISBN: 978-3-030-22885-9