A Hybrid Neural Emotion Recogniser for Human-Robotic Agent Interaction

  • Alexandru Traista
  • Mark Elshaw
Part of the Communications in Computer and Information Science book series (CCIS, volume 311)


This paper presents a hybrid neural approach to emotion recognition from speech that combines feature selection using principal component analysis (PCA) with unsupervised neural clustering through a self-organising map (SOM). Given the importance of emotions to humans, robots are unlikely to be accepted as anything more than machines if they cannot express and recognise emotions. We describe an unsupervised approach to emotion recognition whose performance is similar to that of current supervised intelligent approaches. Performance, however, drops when the system is tested on samples from a male volunteer who was not in the training set, recorded with a low-cost microphone. Using an unsupervised neural approach makes it possible to go beyond basic binary classification of emotions and to consider how similar emotions are to one another and whether speech can express several emotions at the same time.
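The abstract names two stages: PCA to reduce the acoustic feature vector, then a SOM that clusters the reduced vectors without using emotion labels. The following is a minimal NumPy sketch of that kind of pipeline, not the authors' implementation. The grid size, learning-rate and neighbourhood schedules, number of retained components, majority-vote node labelling and the synthetic data are all hypothetical choices made for illustration. The helper `emotion_distances` (also hypothetical) shows one way a trained map can yield graded rather than binary output: the distance from a sample to the nearest node carrying each emotion label.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project rows of X onto the top principal components (SVD of centred data)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components


def train_som(X, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with a Gaussian neighbourhood (hypothetical schedule)."""
    rng = np.random.default_rng(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of every node, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float).copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = X[rng.integers(0, len(X))]        # one random training sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)         # squared grid distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)         # pull nodes towards x
    return weights


def best_matching_units(weights, X):
    """Index of the closest node (Euclidean) for every row of X."""
    d = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def label_nodes(weights, X, y, n_labels):
    """Give each node the majority label of samples mapped to it (-1 if none)."""
    bmus = best_matching_units(weights, X)
    labels = np.full(len(weights), -1)
    for node in range(len(weights)):
        hits = y[bmus == node]
        if len(hits):
            labels[node] = np.bincount(hits, minlength=n_labels).argmax()
    return labels


def emotion_distances(weights, node_labels, x, n_labels):
    """Distance from x to the closest node carrying each label (inf if no such node)."""
    d = np.sqrt(((weights - x) ** 2).sum(axis=1))
    return np.array([d[node_labels == k].min() if np.any(node_labels == k) else np.inf
                     for k in range(n_labels)])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in data: 3 "emotion" classes in a 20-dimensional feature space.
    n_per, dim, n_labels = 60, 20, 3
    centres = rng.normal(scale=3.0, size=(n_labels, dim))
    X = np.vstack([c + rng.normal(size=(n_per, dim)) for c in centres])
    y = np.repeat(np.arange(n_labels), n_per)

    Z, mean, comps = pca_fit_transform(X, n_components=5)
    W = train_som(Z, rows=6, cols=6, n_iter=3000)   # labels are not used here
    node_labels = label_nodes(W, Z, y, n_labels)     # labels only name map regions
    pred = node_labels[best_matching_units(W, Z)]
    print("training accuracy:", (pred == y).mean())
    print("per-emotion distances, sample 0:", emotion_distances(W, node_labels, Z[0], n_labels))
```

Because the SOM is trained without labels, the labels only name regions of the map afterwards; in this sketch, a sample whose distances to two emotion regions are close could be read as lying between those emotions.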


Keywords: Emotion recognition, social robot interaction, unsupervised neural learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexandru Traista (1)
  • Mark Elshaw (1)
  1. Department of Computing, Faculty of Computing and Engineering, Coventry University, Coventry, UK
