Abstract
Being able to automatically analyze fine-grained changes in facial expression into the action units (AUs) of the Facial Action Coding System (FACS), and their temporal models (i.e., sequences of the temporal phases neutral, onset, apex, and offset), in face videos would greatly benefit facial expression recognition systems. Previous works considered combining, per AU, a discriminative frame-based support vector machine (SVM) and a dynamic generative hidden Markov model (HMM) to detect the presence of the AU in question and its temporal segments in an input image sequence. The major drawback of HMMs is that they do not model well time-dependent dynamics such as those of AUs, especially when dealing with spontaneous expressions. To alleviate this problem, in this paper we exploit efficient duration modeling of the temporal behavior of AUs, and propose the hidden semi-Markov model (HSMM) and the variable duration hidden Markov model (VDHMM) to recognize the dynamics of AUs. Such models allow the parameterization and inference of an AU's state duration distributions. Within our system, geometric and appearance-based measurements, as well as their first derivatives, modeling both the dynamics and the appearance of AUs, are applied to pair-wise SVM classifiers for frame-based classification. The outputs of these classifiers are then fed as evidence to the HSMM or VDHMM for inferring the temporal phases of AUs. We present a thorough investigation of duration modeling and its application to AU recognition through extensive comparison to state-of-the-art SVM-HMM approaches. Average recognition rates of 64.83% and 64.66% are achieved for the HSMM and VDHMM, respectively.
Our framework has several benefits: (1) it models the duration of an AU's temporal phases; (2) it does not require any assumption about the underlying structure of AU events; and (3) compared to the HMM, the proposed HSMM and VDHMM duration models reduce the duration error of the temporal phases of an AU and are especially better at recognizing the offset ending of an AU.
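The pipeline summarized above, frame-wise SVM scores decoded by a duration-aware model into temporal-phase segments, can be illustrated with an explicit-duration Viterbi pass over an HSMM. The sketch below is not the authors' implementation: the function name, the uniform initial prior, the no-self-transition convention, and the toy two-state setup are all our own assumptions for illustration.

```python
import math

def hsmm_viterbi(log_emis, log_trans, log_dur, max_dur):
    """Explicit-duration HSMM Viterbi decoding (illustrative sketch).

    log_emis[t][j]  : log-likelihood of frame t under state (phase) j,
                      e.g. from calibrated per-frame SVM outputs
    log_trans[i][j] : log transition probability between distinct states
                      (self-transitions are excluded; durations handle dwell time)
    log_dur[j][d-1] : log probability that state j lasts d frames (1..max_dur)
    Returns the most likely per-frame state sequence (uniform initial prior).
    """
    T, N = len(log_emis), len(log_trans)
    NEG = float("-inf")
    # best[t][j]: best log score of a segmentation whose last segment is
    # state j ending at frame t; back[t][j] stores (previous state, duration)
    best = [[NEG] * N for _ in range(T)]
    back = [[None] * N for _ in range(T)]
    # cumulative emission sums so each segment score is O(1)
    cum = [[0.0] * N]
    for t in range(T):
        cum.append([cum[-1][j] + log_emis[t][j] for j in range(N)])
    for t in range(T):
        for j in range(N):
            for d in range(1, min(max_dur, t + 1) + 1):
                seg = cum[t + 1][j] - cum[t + 1 - d][j] + log_dur[j][d - 1]
                if d == t + 1:          # segment opens the sequence
                    score, prev = seg, None
                else:                   # best predecessor state i != j
                    score, prev = max(
                        (best[t - d][i] + log_trans[i][j] + seg, i)
                        for i in range(N) if i != j)
                if score > best[t][j]:
                    best[t][j] = score
                    back[t][j] = (prev, d)
    # backtrack segment by segment
    j = max(range(N), key=lambda s: best[T - 1][s])
    t, path = T - 1, []
    while t >= 0:
        prev, d = back[t][j]
        path[:0] = [j] * d
        t -= d
        j = prev
    return path
```

Unlike a plain HMM, where the implicit geometric dwell-time distribution is fixed by the self-transition probability, the `log_dur` table lets each phase (neutral, onset, apex, offset) carry its own learned duration distribution, which is the point of the HSMM/VDHMM approach described in the abstract.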
Acknowledgments
The research reported in this paper has been partly supported by the EU FP7 project ALIZ-E (grant 248116) and the VUB-IRP EmoApp project (grant VUB-IRP5).
Gonzalez, I., Cartella, F., Enescu, V. et al. Recognition of facial actions and their temporal segments based on duration models. Multimed Tools Appl 74, 10001–10024 (2015). https://doi.org/10.1007/s11042-014-2320-8