International Journal of Speech Technology

Volume 22, Issue 3, pp. 521–531

Improving the performance of the speaker emotion recognition based on low dimension prosody features vector

  • Ashishkumar Prabhakar Gudmalwar
  • Ch V Rama Rao
  • Anirban Dutta


Speaker emotion recognition is an important research problem, with applications in human–robot interaction, human–computer interaction, and related areas. This work addresses recognition of a speaker's emotion from a speech utterance. Features such as pitch, log energy, zero-crossing rate, and the first three formant frequencies are used, and feature vectors are constructed from 11 statistical parameters of each feature. An Artificial Neural Network (ANN) is chosen as the classifier owing to its universal function-approximation capability. In an ANN-based classifier, the time required both for training the network and for classification depends on the dimension of the feature vector. This work therefore focuses on developing a speaker emotion recognition system using prosody features while also reducing the dimensionality of the feature vectors; principal component analysis (PCA) is used for the dimensionality reduction. The Emotional Prosody Speech and Transcripts corpus from the Linguistic Data Consortium (LDC) and the Berlin emotional database are used to evaluate the proposed approach on seven emotion classes. Compared with existing approaches, the proposed method achieves better performance: experimental results show recognition rates of 75.32% on the Berlin emotional database and 84.5% on the LDC emotional speech database.
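The pipeline the abstract describes (per-feature statistical parameters → PCA → ANN) can be sketched as follows. The paper does not list which 11 statistical parameters it uses, so the set below is an illustrative assumption, and the synthetic contours stand in for real pitch, log-energy, zero-crossing-rate, and formant tracks:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def prosody_stats(contour):
    """Summarise a frame-level feature contour with 11 statistics.
    The exact 11 parameters used in the paper are not given in the
    abstract; this particular set is an assumption for illustration."""
    d = np.diff(contour)
    return np.array([
        contour.mean(), contour.std(), contour.min(), contour.max(),
        np.median(contour), np.ptp(contour),
        d.mean(), d.std(),
        np.percentile(contour, 25), np.percentile(contour, 75),
        contour[-1] - contour[0],
    ])

# Synthetic stand-in data: 200 utterances, 6 frame-level features
# (pitch, log energy, ZCR, F1-F3), 100 frames each, 7 emotion labels.
n_utt, n_feat, n_frames = 200, 6, 100
X = np.stack([
    np.concatenate([prosody_stats(rng.normal(size=n_frames))
                    for _ in range(n_feat)])
    for _ in range(n_utt)
])                                   # shape: (200, 6 * 11) = (200, 66)
y = rng.integers(0, 7, size=n_utt)   # 7 emotion classes

# PCA shrinks the 66-dimensional vectors before the ANN, which is the
# point of the paper: shorter training and classification time.
pca = PCA(n_components=20).fit(X)
X_red = pca.transform(X)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                    random_state=0).fit(X_red, y)
print(X_red.shape)
```

The component count (20) and the ANN topology are placeholders; in practice they would be tuned, e.g. by keeping enough components to explain a target fraction of the variance (`pca.explained_variance_ratio_`).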


Keywords: Prosody · PCA · Emotion recognition · Recognition rate



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Ashishkumar Prabhakar Gudmalwar (1)
  • Ch V Rama Rao (1)
  • Anirban Dutta (1)

  1. National Institute of Technology Meghalaya, Shillong, India
