Emotions recognition: different sets of features and models
Effective human–machine communication is supported by affective computing. In recent years, substantial research has focused on recognizing emotions using various databases. This paper examines how different sets of features and modeling techniques affect the performance of speaker-independent and speaker-dependent emotion recognition systems. Improving performance is challenging because the Berlin EMO-DB database used in this work contains only ten sentences, each uttered by ten speakers in seven emotions: anger, boredom, disgust, fear, happiness, sadness and neutral. Speaker-dependent and speaker-independent emotion recognition is performed by building a model for each emotion using vector quantization (VQ) clustering, Gaussian mixture modeling (GMM) and continuous density hidden Markov modeling (CDHMM). The system is also evaluated, with clustering as the modeling technique, on mel frequency cepstral coefficients (MFCC), MFCC concatenated with probability and shifted delta cepstrum (SDC), mel frequency perceptual linear predictive cepstrum (MFPLPC), MFPLPC concatenated with probability and SDC, and formants. These features provide complementary evidence in assessing the performance of the VQ-clustering-based system. The algorithm achieves an overall weighted accuracy recall (WAR) of 99% and 100% for correct identification of emotion, for at least one combination of feature and modeling technique.
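As a rough illustration of the feature pipeline described above, the sketch below computes shifted delta cepstra from a cepstral matrix and concatenates them with the base features. The parameter convention (N-d-P-k, here 13-1-3-7) follows the common SDC formulation rather than values stated in this paper, and the random MFCC matrix is a stand-in for features extracted from real speech.

```python
import numpy as np

def shifted_delta_cepstra(feats, d=1, P=3, k=7):
    """Compute SDC (N-d-P-k) from a (frames x N) cepstral matrix.

    For each frame t, stack k delta vectors taken at shifts of P frames:
    delta_i(t) = c(t + i*P + d) - c(t + i*P - d), for i = 0..k-1.
    """
    T, N = feats.shape
    # Pad by repeating edge frames so every shifted index is defined.
    pad = k * P + d
    padded = np.pad(feats, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        plus = padded[pad + i * P + d : pad + i * P + d + T]
        minus = padded[pad + i * P - d : pad + i * P - d + T]
        blocks.append(plus - minus)
    return np.hstack(blocks)  # shape (T, N * k)

# Stand-in for 100 frames of 13-dimensional MFCCs.
mfcc = np.random.randn(100, 13)
sdc = shifted_delta_cepstra(mfcc)       # (100, 91) with 13-1-3-7
combined = np.hstack([mfcc, sdc])       # concatenated MFCC + SDC, (100, 104)
```

Concatenating SDC onto the per-frame cepstra in this way captures longer-span temporal dynamics than a single delta would.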
Keywords: Emotion recognition system (ERS) · Gaussian mixture model (GMM) · Continuous density hidden Markov model (CDHMM) · Mel frequency perceptual linear predictive cepstrum (MFPLPC) · Shifted delta cepstrum (SDC) · Weighted accuracy recall (WAR)
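The VQ-based clustering approach credited above with the best WAR can be sketched roughly as follows: train one codebook per emotion with k-means, then label a test utterance with the emotion whose codebook yields the lowest average quantization distortion. This is a minimal illustration under assumed settings (codebook size, iteration count, synthetic features), not the authors' implementation.

```python
import numpy as np

def train_codebook(frames, n_codes=16, iters=10, seed=0):
    """Toy VQ codebook trained with Lloyd's k-means algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen training frames.
    codes = frames[rng.choice(len(frames), n_codes, replace=False)]
    for _ in range(iters):
        # Squared distance from every frame to every codeword.
        d2 = ((frames[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
        nearest = d2.argmin(1)
        # Move each codeword to the centroid of its assigned frames.
        for j in range(n_codes):
            members = frames[nearest == j]
            if len(members):
                codes[j] = members.mean(0)
    return codes

def classify(frames, codebooks):
    """Pick the emotion whose codebook gives minimum average distortion."""
    def distortion(codebook):
        d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        return d2.min(1).mean()
    return min(codebooks, key=lambda emo: distortion(codebooks[emo]))

# Usage with well-separated synthetic "emotion" feature clusters.
rng = np.random.default_rng(1)
books = {
    "anger": train_codebook(rng.normal(5.0, 1.0, (200, 4))),
    "sadness": train_codebook(rng.normal(-5.0, 1.0, (200, 4))),
}
label = classify(rng.normal(5.0, 1.0, (40, 4)), books)
```

A GMM classifier replaces the minimum-distortion rule with a maximum-likelihood one; the VQ codebook can be viewed as its degenerate hard-assignment case.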
Compliance with ethical standards
Conflict of interest
The authors declare that no competing interests exist.