Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech

  • Ginevra Castellano
  • Loic Kessous
  • George Caridakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4868)


In this paper we present a multimodal approach for the recognition of eight emotions. Our approach integrates information from facial expressions, body movement and gestures, and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First, individual classifiers were trained for each modality. Next, the data were fused at the feature level and at the decision level. Fusing the multimodal data resulted in a large increase in recognition rates compared with the unimodal systems: the multimodal approach gave an improvement of more than 10% over the most successful unimodal system. Further, fusion performed at the feature level gave better results than fusion performed at the decision level.
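The difference between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A small Gaussian naive Bayes stands in for the Bayesian classifier, decision-level fusion is assumed to average the per-modality posteriors, and the feature values, the two emotion labels, and all function names are hypothetical toy choices rather than data or code from the paper.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Tiny Gaussian naive Bayes; an assumed stand-in for the paper's classifier."""

    def fit(self, samples, labels):
        by_class = defaultdict(list)
        for row, label in zip(samples, labels):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small constant keeps every variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(len(rows) / len(samples))
            self.params[label] = (log_prior, means, variances)
        return self

    def posterior(self, x):
        """Return {label: P(label | x)}, normalised via log scores for stability."""
        log_scores = {}
        for label, (log_prior, means, variances) in self.params.items():
            log_lik = sum(
                -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                for xi, m, var in zip(x, means, variances)
            )
            log_scores[label] = log_prior + log_lik
        top = max(log_scores.values())
        total = sum(math.exp(s - top) for s in log_scores.values())
        return {k: math.exp(s - top) / total for k, s in log_scores.items()}


def concatenate(sample):
    """Join the per-modality feature vectors of one sample into one vector."""
    return [value for vector in sample for value in vector]


def train_feature_level(modalities, labels):
    """Feature-level fusion: one classifier trained on concatenated features."""
    fused = [concatenate(sample) for sample in zip(*modalities)]
    return GaussianNaiveBayes().fit(fused, labels)


def predict_feature_level(model, sample):
    post = model.posterior(concatenate(sample))
    return max(post, key=post.get)


def train_decision_level(modalities, labels):
    """Decision-level fusion: one classifier per modality."""
    return [GaussianNaiveBayes().fit(rows, labels) for rows in modalities]


def predict_decision_level(models, sample):
    # Assumed combination rule: average the per-modality posteriors.
    combined = defaultdict(float)
    for model, features in zip(models, sample):
        for label, p in model.posterior(features).items():
            combined[label] += p / len(models)
    return max(combined, key=combined.get)


# Hypothetical toy features (not from the paper): face, gesture, speech.
FACE = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
GESTURE = [[5.0], [4.5], [1.0], [1.5]]
SPEECH = [[200.0, 0.7], [210.0, 0.8], [120.0, 0.3], [110.0, 0.2]]
LABELS = ["anger", "anger", "joy", "joy"]
MODALITIES = [FACE, GESTURE, SPEECH]

if __name__ == "__main__":
    new_sample = [[0.85, 0.15], [4.8], [205.0, 0.75]]
    feature_model = train_feature_level(MODALITIES, LABELS)
    modality_models = train_decision_level(MODALITIES, LABELS)
    print("feature-level:", predict_feature_level(feature_model, new_sample))
    print("decision-level:", predict_decision_level(modality_models, new_sample))
```

In the feature-level variant a single model sees every modality at once, whereas the decision-level variant only combines the outputs of models that were trained separately. On well-separated toy data such as this the two agree, so the sketch shows the mechanics only and says nothing about the recognition rates reported in the paper.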


Affective body language; Affective speech; Emotion recognition; Multimodal fusion





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ginevra Castellano (1)
  • Loic Kessous (2)
  • George Caridakis (3)
  1. InfoMus Lab, DIST - University of Genova, Genova, Italy
  2. Department of Speech, Language and Hearing, University of Tel Aviv, Sheba Center, Tel Aviv, Israel
  3. Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, Athens, Greece
