International Journal of Speech Technology, Volume 17, Issue 4, pp 401–408

A comparative analysis of classifiers in emotion recognition through acoustic features

  • Swarna Kuchibhotla
  • H. D. Vankayalapati
  • R. S. Vaddi
  • K. R. Anne


The most popular features used in speech emotion recognition are prosodic and spectral features. However, the performance of the system degrades substantially when these acoustic features are employed individually, i.e. either prosody or spectral alone. In this paper a feature fusion method, combining energy and pitch prosodic features with MFCC spectral features, is proposed. The fused features are classified using linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machine (SVM) and k-nearest neighbour (kNN), and the results are validated on the Berlin and Spanish emotional speech databases. The results show that fusion improves the performance of each classifier by approximately 20 % compared with the same classifier trained on the individual feature sets. The results also reveal that RDA is the better choice of classifier for emotion classification: LDA suffers from the singularity problem, which arises in the high-dimensional, small-sample-size setting where the number of available training speech samples is small compared to the dimensionality of the feature space. RDA eliminates this singularity problem through a regularization criterion and gives better results.
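The fusion described above is, at its core, a concatenation of prosodic and spectral feature vectors followed by a discriminant classifier. The following is a minimal sketch of that idea, not the authors' code: it uses randomly generated placeholder features in place of real MFCC and prosody extraction, and scikit-learn's shrinkage-LDA as a stand-in for RDA, since shrinkage of the within-class covariance is one form of the regularization that keeps the covariance estimate invertible when samples are few relative to the feature dimension.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder fused features: 13 MFCC-like spectral values + 2 prosody
# statistics (energy, pitch) per utterance. Real systems would extract
# these from audio frames; here random clusters stand in for them.
rng = np.random.default_rng(0)
n_per_class, n_mfcc, n_prosody = 40, 13, 2
emotions = ["anger", "sadness", "neutral"]

X_list, y_list = [], []
for i, emo in enumerate(emotions):
    mfcc = rng.normal(loc=i, scale=1.0, size=(n_per_class, n_mfcc))
    prosody = rng.normal(loc=0.5 * i, scale=0.5, size=(n_per_class, n_prosody))
    # Feature fusion: simple concatenation of spectral and prosodic vectors.
    X_list.append(np.hstack([mfcc, prosody]))
    y_list += [emo] * n_per_class

X = np.vstack(X_list)          # shape: (120, 15) fused feature matrix
y = np.array(y_list)

# Shrinkage regularizes the pooled covariance estimate, avoiding the
# singularity that plain LDA hits when dimension exceeds sample count.
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

With real data, the MFCC and prosody blocks would come from a feature extractor, and held-out evaluation would replace the training-accuracy check shown here.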


Keywords: Emotion recognition · Feature fusion · Classification



This work was supported by the Research Project on “Non-intrusive real time driving process ergonomics monitoring system to improve road safety in a car – pc environment”, funded by DST, New Delhi.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Swarna Kuchibhotla (1)
  • H. D. Vankayalapati (2)
  • R. S. Vaddi (2)
  • K. R. Anne (2)

  1. Acharya Nagarjuna University, Namburu, Guntur Dt, India
  2. V. R. Siddhartha Engineering College, Kanuru, India
