
A Multiresolution-Based Fusion Strategy for Improving Speech Emotion Recognition Efficiency

  • Conference paper
Mobile, Secure, and Programmable Networking (MSPN 2019)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 11557)

Abstract

This paper presents a multiresolution-based feature extraction scheme for speech emotion recognition in unconstrained environments. In the proposed approach, Mel Frequency Cepstral Coefficients (MFCC) are derived from Discrete Wavelet Transform (DWT) sub-band coefficients. The extracted features are then combined with conventional MFCCs and pitch-based features to form the final feature vector. Linear Discriminant Analysis (LDA) is used to reduce the dimension of the resulting feature set prior to Naive Bayes classification. To assess the performance of the proposed approach in unconstrained environments, noisy speech data are generated by adding real-world noises to clean speech signals from the Berlin German Emotional Database (EMO-DB). The proposal is also tested through speaker-dependent and speaker-independent experiments. The overall results show an improvement in speech emotion detection over the baselines.
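
To make the described pipeline concrete, below is a minimal sketch of the feature-extraction and classification chain, assuming Python with librosa, PyWavelets, and scikit-learn. The wavelet family (db4), decomposition level, MFCC order, and pitch tracker (librosa's pyin standing in for a RAPT-style tracker) are illustrative assumptions, not the authors' settings.

    import numpy as np
    import librosa                      # MFCC and pitch extraction
    import pywt                         # Discrete Wavelet Transform
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB

    def fused_features(signal, sr=16000):
        """One fused feature vector per utterance (illustrative parameters)."""
        # Conventional MFCCs, averaged over frames to a single vector.
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).mean(axis=1)

        # 3-level DWT; MFCC-like features computed on each sub-band's coefficients.
        subbands = pywt.wavedec(signal, wavelet="db4", level=3)
        dwt_mfcc = np.concatenate([
            librosa.feature.mfcc(y=band.astype(np.float32), sr=sr,
                                 n_mfcc=13).mean(axis=1)
            for band in subbands
        ])

        # Pitch statistics over voiced frames (pyin stands in for RAPT).
        f0, _, _ = librosa.pyin(signal, fmin=60, fmax=400, sr=sr)
        f0 = f0[~np.isnan(f0)]
        pitch = np.array([f0.mean(), f0.std()]) if f0.size else np.zeros(2)

        # Feature-level fusion into the final vector.
        return np.concatenate([mfcc, dwt_mfcc, pitch])

    # X: stacked fused vectors, y: emotion labels.
    # LDA keeps at most (n_classes - 1) discriminant components, then Naive Bayes:
    # lda = LinearDiscriminantAnalysis()
    # clf = GaussianNB().fit(lda.fit_transform(X, y), y)

Averaging frame-level MFCCs to one vector per utterance is one simple pooling choice; the key point the sketch illustrates is the order of operations: sub-band cepstral features fused with conventional MFCCs and pitch, reduced by LDA, then classified with Naive Bayes.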



Author information


Corresponding author

Correspondence to Sara Sekkate.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Sekkate, S., Khalil, M., Adib, A., Ben Jebara, S. (2019). A Multiresolution-Based Fusion Strategy for Improving Speech Emotion Recognition Efficiency. In: Renault, É., Boumerdassi, S., Leghris, C., Bouzefrane, S. (eds) Mobile, Secure, and Programmable Networking. MSPN 2019. Lecture Notes in Computer Science, vol 11557. Springer, Cham. https://doi.org/10.1007/978-3-030-22885-9_10


  • DOI: https://doi.org/10.1007/978-3-030-22885-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22884-2

  • Online ISBN: 978-3-030-22885-9

  • eBook Packages: Computer Science, Computer Science (R0)
