Abstract
This paper introduces a multimodal emotion recognition system based on two modalities: affective speech and facial expression. For affective speech, common low-level descriptors comprising prosodic and spectral audio features (energy, zero-crossing rate, MFCC, LPC, PLP, and their temporal derivatives) are extracted, whereas a novel visual feature extraction method is proposed for facial expression. This method exploits the displacement of specific facial landmarks across consecutive frames of an utterance. The time series of temporal variations of each landmark is analyzed individually to extract primary visual features, and the extracted features of all landmarks are then concatenated to construct the final feature vector. The displacement signal of each landmark is analyzed with the discrete wavelet transform, a mathematical transform widely used in signal processing applications. To reduce the complexity of the derived models and improve efficiency, a variety of dimensionality-reduction schemes are applied. Furthermore, to exploit the advantages of multimodal emotion recognition systems, feature-level fusion of the audio features and the proposed visual features is examined. Experiments conducted on three databases (SAVEE, RML, and eNTERFACE05) demonstrate the efficiency of the proposed visual feature extraction method in terms of the performance criteria.
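The visual pipeline described above can be sketched in code: compute the frame-to-frame displacement of each landmark, decompose each displacement signal with a discrete wavelet transform, summarize the sub-bands, and concatenate across landmarks. The sketch below is a minimal illustration, not the paper's implementation; the Haar wavelet, the two-level decomposition, and the mean/std/max sub-band statistics are all assumptions chosen for simplicity.

```python
import numpy as np

def haar_dwt(signal, level=2):
    """Multi-level discrete wavelet decomposition with the Haar wavelet.

    Returns [approximation, detail_level, ..., detail_1] sub-bands.
    (The Haar basis is an assumption; the paper only specifies a DWT.)
    """
    coeffs, approx = [], np.asarray(signal, dtype=float)
    for _ in range(level):
        if len(approx) % 2:           # trim to even length before pairing
            approx = approx[:-1]
        even, odd = approx[::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))   # detail sub-band
        approx = (even + odd) / np.sqrt(2.0)         # approximation
    coeffs.append(approx)
    return coeffs[::-1]

def landmark_dwt_features(landmarks, level=2):
    """Concatenated DWT features from per-frame landmark positions.

    landmarks: array of shape (n_frames, n_landmarks, 2) holding the
    (x, y) position of each tracked landmark in each frame.
    """
    # Displacement magnitude of each landmark between consecutive frames
    disp = np.linalg.norm(np.diff(landmarks, axis=0), axis=2)
    feats = []
    for k in range(disp.shape[1]):            # one time series per landmark
        for band in haar_dwt(disp[:, k], level=level):
            # Summarize each sub-band with simple statistics (an assumption)
            feats.extend([band.mean(), band.std(), np.abs(band).max()])
    return np.asarray(feats)
```

With `level=2`, each landmark contributes `(level + 1) * 3 = 9` statistics, so a 5-landmark utterance yields a 45-dimensional vector; the resulting vector could then feed the dimensionality-reduction and feature-level-fusion stages the abstract mentions.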
Acknowledgment
The authors gratefully acknowledge the financial support provided by the Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman, Iran, under Contract Number 3165.
Rahdari, F., Rashedi, E. & Eftekhari, M. A Multimodal Emotion Recognition System Using Facial Landmark Analysis. Iran J Sci Technol Trans Electr Eng 43 (Suppl 1), 171–189 (2019). https://doi.org/10.1007/s40998-018-0142-9