International Journal of Speech Technology, Volume 18, Issue 4, pp 505–520

Four-stage feature selection to recognize emotion from speech signals

  • A. Milton
  • S. Tamil Selvi


Abstract

Feature selection plays an important role in emotion recognition from speech signals because it improves classification accuracy by choosing the best uncorrelated features. In the wrapper method of feature selection, features are evaluated by a classifier. High-dimensional feature vectors increase the computational complexity of the classifier, and they also hinder the training of classifiers that require the inverse of a covariance matrix. We propose a four-stage feature selection method that avoids the curse of dimensionality by the principle of divide and conquer. In the proposed method, the dimension of the feature vector is reduced at every stage, so that even classifiers whose training suffers from large feature dimensions can be used to evaluate the features. Experimental results show that the four-stage feature selection method improves classification accuracy. Another way to improve classification accuracy is to combine several classifiers through a fusion technique. The class-specific multiple classifiers scheme is one such method: it improves classification accuracy by pairing each emotional class with its best-performing feature set and classifier. In this work, we improve the performance of the class-specific multiple classifiers scheme by embedding the proposed feature selection method in its structure.
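The staged, divide-and-conquer wrapper idea described in the abstract can be sketched in code. This is only an illustration of the principle, not the authors' implementation: the greedy forward search, the QDA evaluator, and the `n_stages`, `n_subsets`, and `n_keep` settings are hypothetical choices, and scikit-learn is assumed.

```python
# A minimal sketch of staged, divide-and-conquer wrapper feature selection,
# assuming scikit-learn. The greedy forward search, the QDA evaluator, and
# the n_stages / n_subsets / n_keep settings are illustrative assumptions,
# not the authors' published configuration.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score


def wrapper_select(X, y, candidates, n_keep, clf):
    """Greedy forward wrapper search over a small pool of feature indices."""
    selected = []
    while len(selected) < min(n_keep, len(candidates)):
        best_i, best_score = None, -np.inf
        for i in candidates:
            if i in selected:
                continue
            trial = selected + [i]
            # The classifier itself scores each candidate subset (wrapper method).
            score = cross_val_score(clf, X[:, trial], y, cv=3).mean()
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected


def multistage_select(X, y, n_stages=4, n_subsets=8, n_keep=5):
    """At each stage, split the surviving features into small pools, run the
    wrapper on each pool, and carry the winners forward. The evaluating
    classifier never sees the full high-dimensional vector, so evaluators
    that invert a covariance matrix (here QDA) remain usable."""
    clf = QuadraticDiscriminantAnalysis()
    surviving = list(range(X.shape[1]))
    for _ in range(n_stages):
        pools = np.array_split(np.array(surviving), n_subsets)
        surviving = [i for pool in pools if len(pool)
                     for i in wrapper_select(X, y, list(pool), n_keep, clf)]
    return surviving
```

On an utterance-by-feature matrix X with emotion labels y, multistage_select(X, y) returns the indices of the features that survive all stages; the staged pooling is what keeps each covariance estimate small and well-conditioned, which is the motivation the abstract gives for dividing the feature set before conquering it.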


Keywords: Speech emotion recognition · Human computer interaction · Multistage feature selection · Multiple classifiers



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, St. Xavier’s Catholic College of Engineering, Chunkankadai, India
  2. Department of Electronics and Communication Engineering, National Engineering College, Kovilpatti, India
