Abstract
This paper presents a multiresolution-based feature extraction approach for speech emotion recognition in unconstrained environments. In the proposed approach, Mel Frequency Cepstral Coefficients (MFCC) are derived from Discrete Wavelet Transform (DWT) sub-band coefficients. The extracted features are then combined with conventional MFCCs and pitch-based features to form the final feature vector. Linear Discriminant Analysis (LDA) is used to reduce the dimension of the resulting feature set prior to Naive Bayes classification. To assess the performance of the proposed approach in unconstrained environments, noisy speech data are generated by adding real-world noises to clean speech signals from the Berlin German Emotional Database (EMO-DB). The proposal is also tested through speaker-dependent and speaker-independent experiments. The overall performance results show an improvement in speech emotion detection over the baselines.
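To make the described pipeline concrete, the sketch below shows one plausible realization in Python using librosa, PyWavelets, and scikit-learn. It is not the authors' implementation: the wavelet family (db4), the decomposition depth, the MFCC settings, the yin pitch tracker, and the mean-pooling of frame-level features into utterance-level vectors are illustrative assumptions made here for the example.

```python
# Minimal sketch of the fusion pipeline described in the abstract (assumed details,
# not the authors' code): MFCCs computed from the raw signal and from each DWT
# sub-band, fused with simple pitch statistics, reduced with LDA, and classified
# with Gaussian Naive Bayes.
import numpy as np
import librosa
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

def utterance_features(y, sr, wavelet="db4", level=3, n_mfcc=13):
    feats = []
    # Conventional MFCCs, summarized over frames by their mean.
    feats.append(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1))
    # MFCC-like features from each DWT sub-band (approximation + detail coefficients).
    for band in pywt.wavedec(y, wavelet, level=level):
        feats.append(
            librosa.feature.mfcc(y=band.astype(np.float32), sr=sr, n_mfcc=n_mfcc).mean(axis=1)
        )
    # Simple pitch-based statistics (mean and standard deviation of F0).
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
    feats.append(np.array([f0.mean(), f0.std()]))
    return np.concatenate(feats)

def train(signals, sample_rates, labels):
    X = np.vstack([utterance_features(y, sr) for y, sr in zip(signals, sample_rates)])
    lda = LinearDiscriminantAnalysis()  # projects to at most (n_classes - 1) dimensions
    clf = GaussianNB().fit(lda.fit_transform(X, labels), labels)
    return lda, clf
```

A new utterance would then be scored with clf.predict(lda.transform(utterance_features(y, sr)[None, :])); in the paper's setting, training and test signals would come from clean and noise-corrupted EMO-DB recordings, respectively.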
About this paper
Cite this paper
Sekkate, S., Khalil, M., Adib, A., Ben Jebara, S. (2019). A Multiresolution-Based Fusion Strategy for Improving Speech Emotion Recognition Efficiency. In: Renault, É., Boumerdassi, S., Leghris, C., Bouzefrane, S. (eds.) Mobile, Secure, and Programmable Networking. MSPN 2019. Lecture Notes in Computer Science, vol. 11557. Springer, Cham. https://doi.org/10.1007/978-3-030-22885-9_10
Print ISBN: 978-3-030-22884-2
Online ISBN: 978-3-030-22885-9