Feature extraction algorithms to improve the speech emotion recognition rate

Abstract

Speech emotion recognition plays a significant role in a growing range of applications, such as human-computer interaction (HCI), lie detection, driver assistance in automotive systems, intelligent tutoring systems, audio mining, security, telecommunications, and human-machine interaction at home, in hospitals, and in shops. Speech is a uniquely human characteristic, used as a tool to communicate and to express one's perspective to others. Speech emotion recognition extracts the speaker's emotions from his or her speech signal. Feature extraction, feature selection, and classification are the three main stages of emotion recognition. The main aim of this work is to improve the speech emotion recognition rate of a system using different feature extraction algorithms. The work emphasizes preprocessing of the received audio samples, in which noise is removed from the speech samples using filters. In the next step, Mel frequency cepstral coefficients (MFCC), discrete wavelet transform (DWT), pitch, energy, and zero crossing rate (ZCR) algorithms are used to extract the features. In the feature selection stage, a global feature algorithm is used to remove redundant information from the features, and machine learning classification algorithms are then used to identify the emotions from the extracted features. These feature extraction algorithms are validated for the universal emotions of anger, happiness, sadness, and neutrality.
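The paper does not publish an implementation, but the pipeline the abstract describes (noise filtering, then MFCC/DWT/pitch/energy/ZCR extraction, then global feature pooling, then a machine learning classifier) maps naturally onto standard Python audio libraries. The sketch below is a minimal illustration assuming librosa, PyWavelets, SciPy, and scikit-learn; the band-pass cut-offs, pitch search range, wavelet family, pooled statistics, and choice of an SVM are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the four-stage pipeline described in the abstract:
# preprocessing -> feature extraction (MFCC, DWT, pitch, energy, ZCR)
# -> global feature pooling -> classification.
# All parameter values below are illustrative assumptions.
import numpy as np
import librosa
import pywt
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC


def preprocess(y, sr, low=80.0, high=4000.0):
    """Band-pass filter to suppress out-of-band noise (assumed cut-offs)."""
    b, a = butter(4, [low, high], btype="band", fs=sr)
    return filtfilt(b, a, y)


def extract_features(y, sr):
    """Concatenate utterance-level statistics of each frame-level feature."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # spectral envelope
    zcr = librosa.feature.zero_crossing_rate(y)            # voicing / noisiness cue
    rms = librosa.feature.rms(y=y)                         # short-time energy
    f0 = librosa.yin(y, fmin=65.0, fmax=400.0, sr=sr)      # pitch contour
    coeffs = pywt.wavedec(y, "db4", level=4)               # DWT decomposition
    dwt_energy = np.array([np.sum(c ** 2) for c in coeffs])  # sub-band energies

    # "Global feature" pooling: reduce each frame-level track to mean and std.
    def stats(m):
        return np.hstack([m.mean(axis=-1), m.std(axis=-1)])

    return np.hstack([stats(mfcc), stats(zcr), stats(rms),
                      stats(f0[None, :]), dwt_energy])


def train(paths, labels, sr=16000):
    """Fit a classifier on labelled utterances (anger/happiness/sad/neutral)."""
    X = []
    for p in paths:
        y, _ = librosa.load(p, sr=sr)
        X.append(extract_features(preprocess(y, sr), sr))
    clf = SVC(kernel="rbf")  # any standard ML classifier could stand in here
    clf.fit(np.vstack(X), labels)
    return clf
```

Pooling each frame-level track to its utterance-level mean and standard deviation is one common reading of a "global feature" selection step; the authors' exact algorithm for removing redundant feature information may differ.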


Author information

Correspondence to Anusha Koduru.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Koduru, A., Valiveti, H.B. & Budati, A.K. Feature extraction algorithms to improve the speech emotion recognition rate. Int J Speech Technol (2020). https://doi.org/10.1007/s10772-020-09672-4

Keywords

  • Emotion recognition
  • Preprocessing
  • Feature extraction
  • Feature selection
  • Mel frequency cepstral coefficients
  • Discrete wavelet transform
  • Zero crossing rate