Literature Survey on Emotion Recognition for Social Signal Processing

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 614)


Emotion is a significant aspect of the progress of human–computer interaction (HCI) systems. To achieve the best functionality through HCI, the computer must be able to understand human emotions effectively. This requires an effective emotion recognition system that takes the social behavior of human beings into account. The signals through which human beings express emotions are called social signals; examples include facial expressions, speech, and gestures. Extensive research has been carried out to achieve effective emotion recognition through social signal processing. This paper outlines the earlier approaches developed in this area. Since there are a number of social signals, the survey is categorized as audio-based and image-based. A further classification is based on the modality of the input, i.e., single modal (a single social signal) or multimodal (multiple social signals). The survey is then divided into classes according to the methodology used to achieve the objectives, and each class is described in detail. The databases used in these approaches are also briefly described.


Emotion recognition, Social signal, Speech, Face, Feature extraction, Classification, Databases



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of ECE, JNTUCEA, Anantapur, India
  2. Department of ECE, Vardhaman College of Engineering, Hyderabad, India
  3. Department of ECE, N.B.K.R Institute of Science & Technology, Vidyanagar, Nellore, India
