Segment based emotion recognition using combined reduced features

  • Mihir Narayan Mohanty
  • Hemanta Kumar Palo
S.I.: Emotion Recognition in Speech


Human attitude is closely tied to emotion. Emotions can be observed verbally, visually, or both. Verbal emotion recognition is a difficult task within speech processing, with applications across many fields. In this work, the authors attempt to recognize five emotions: anger, sadness, happiness, fear, and neutral. The work focuses on the choice of spectral features. For this purpose, Mel-frequency cepstral coefficients (MFCC), spectral roll-off, spectral centroid, and spectral flux are extracted at the frame level. Some of these features need to be reduced, combined, and balanced. The combined methods are verified and their effectiveness is observed. The resulting features are used with neural network (NN) based models for recognition. Multilayer perceptron (MLP), radial basis function network (RBFN), probabilistic neural network (PNN), and deep neural network (DNN) models are tested on the chosen features. A small number of features is observed to give reliable accuracy with the PNN, and it also requires less training and testing time with the MLP, RBFN, and PNN. The DNN, however, is not suited to small feature sets; it requires a large amount of data for better accuracy in this field. The results favor the PNN, with an average accuracy of 96.9% on low-dimensional feature sets, whereas the MLP, RBFN, and DNN models reach average accuracies of 90.1%, 92.7%, and 73.6%, respectively.
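As an illustration of the frame-level spectral features named above, the following is a minimal NumPy sketch of spectral centroid, spectral roll-off, and spectral flux. It is not the authors' implementation: the abstract gives no frame length, hop size, window, or roll-off threshold, so the values used here (400-sample Hamming frames with a 160-sample hop, an 85% roll-off threshold) and the function names `frame_signal` and `spectral_features` are assumptions chosen for the example. MFCC extraction and the feature reduction step are omitted.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_features(x, sr, frame_len=400, hop=160, rolloff_pct=0.85):
    """Return an (n_frames, 3) array: centroid (Hz), roll-off (Hz), flux.

    All parameter defaults are assumptions made for this sketch.
    """
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))       # magnitude spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)  # bin centre frequencies in Hz
    total = mag.sum(axis=1) + 1e-12                 # guard against silent frames

    # Centroid: magnitude-weighted mean frequency of each frame.
    centroid = (mag * freqs).sum(axis=1) / total

    # Roll-off: lowest frequency below which rolloff_pct of the total
    # magnitude lies (an all-zero frame falls back to 0 Hz).
    cumulative = np.cumsum(mag, axis=1)
    bin_idx = (cumulative >= rolloff_pct * total[:, None]).argmax(axis=1)
    rolloff = freqs[bin_idx]

    # Flux: squared change between successive normalized spectra;
    # the first frame has no predecessor, so its flux is set to 0.
    norm = mag / total[:, None]
    flux = np.concatenate([[0.0], np.sum(np.diff(norm, axis=0) ** 2, axis=1)])

    return np.column_stack([centroid, rolloff, flux])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz tone, not speech data
    feats = spectral_features(tone, sr)
    print(feats.shape, feats[:, 0].mean())
```

Each row of the returned array is a three-value descriptor for one frame. In a pipeline like the one described, such per-frame values would be combined with MFCCs and reduced before being passed to a classifier.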


Keywords: Emotional speech recognition; Low dimensional features; Vector quantization; Probabilistic neural network; Deep neural network; i-Vector




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, India
