
Emotional speaker recognition in real life conditions using multiple descriptors and i-vector speaker modeling technique

Published in: Multimedia Tools and Applications

Abstract

Emotional speaker recognition under real life conditions has become an urgent need for several applications. This paper proposes a novel approach that combines multiple feature extraction methods with the i-vector modeling technique in order to improve emotional speaker recognition under real conditions. The performance of the proposed approach is evaluated on real-condition speech (the IEMOCAP corpus) in clean and noisy environments at various SNR levels. We examined several spectral features commonly used in speaker recognition (MFCC, LPCC and RASTA-PLP) and formed combined features, called MFCC-SDC coefficients. The feature vectors are then classified using multiclass Support Vector Machines (SVM). Experimental results illustrate the good robustness of the proposed system against talking conditions (emotions) and against real life environments (noise). Moreover, the results reveal that the MFCC-SDC features outperform conventional MFCCs.
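The MFCC-SDC combination mentioned in the abstract augments static MFCCs with Shifted Delta Cepstra: delta blocks computed at several time shifts and stacked into one wider vector per frame. A minimal sketch of that stacking is shown below; the parameter values (d=1, P=3, k=7) and the edge-clamping policy are assumptions for illustration, not the paper's reported configuration, and the random matrix stands in for real MFCC frames.

```python
import numpy as np

def sdc(mfcc, d=1, p=3, k=7):
    """Shifted Delta Cepstra over a (T, N) matrix of cepstral frames.

    For each frame t, stacks k delta blocks:
        delta_i(t) = c(t + i*p + d) - c(t + i*p - d),   i = 0 .. k-1
    Shift indices that fall outside the signal are clamped to the edges.
    Returns a (T, N*k) matrix.
    """
    t_len, _ = mfcc.shape
    idx = np.arange(t_len)
    blocks = []
    for i in range(k):
        plus = np.clip(idx + i * p + d, 0, t_len - 1)   # forward-shifted frames
        minus = np.clip(idx + i * p - d, 0, t_len - 1)  # backward-shifted frames
        blocks.append(mfcc[plus] - mfcc[minus])
    return np.hstack(blocks)

# MFCC-SDC: concatenate the static MFCCs with their SDC expansion.
frames = np.random.randn(100, 13)            # stand-in for 13-dim MFCC frames
mfcc_sdc = np.hstack([frames, sdc(frames)])
print(mfcc_sdc.shape)                        # (100, 13 + 13*7) = (100, 104)
```

Each 13-dimensional frame becomes a 104-dimensional MFCC-SDC vector, which is the kind of enriched temporal descriptor the abstract credits with outperforming plain MFCCs.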



Author information

Correspondence to Asma Mansour.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mansour, A., Chenchah, F. & Lachiri, Z. Emotional speaker recognition in real life conditions using multiple descriptors and i-vector speaker modeling technique. Multimed Tools Appl 78, 6441–6458 (2019). https://doi.org/10.1007/s11042-018-6256-2

