Abstract
Being able to automatically analyze fine-grained changes in facial expression into the action units (AUs) of the Facial Action Coding System (FACS), and their temporal models (i.e., sequences of the temporal phases neutral, onset, apex, and offset), in face videos would greatly benefit facial expression recognition systems. Previous works considered combining, per AU, a discriminative frame-based support vector machine (SVM) and a dynamic generative hidden Markov model (HMM) to detect the presence of the AU in question and its temporal segments in an input image sequence. The major drawback of HMMs is that they do not model well time-dependent dynamics such as those of AUs, especially when dealing with spontaneous expressions. To alleviate this problem, in this paper we exploit efficient duration modeling of the temporal behavior of AUs, and propose the hidden semi-Markov model (HSMM) and the variable duration hidden Markov model (VDHMM) to recognize the dynamics of AUs. Such models allow the parameterization and inference of an AU's state duration distributions. Within our system, geometric and appearance-based measurements, as well as their first derivatives, modeling both the dynamics and the appearance of AUs, are applied to pair-wise SVM classifiers for frame-based classification. The outputs of these classifiers are then fed as evidence to the HSMM or VDHMM for inferring the temporal phases of AUs. We present a thorough investigation of duration modeling and its application to AU recognition through extensive comparison to state-of-the-art SVM-HMM approaches. Average recognition rates of 64.83% and 64.66% are achieved for the HSMM and VDHMM, respectively.
Our framework has several benefits: (1) it models the duration of an AU's temporal phases; (2) it does not require any assumption about the underlying structure of AU events; and (3) compared to the HMM, the proposed HSMM and VDHMM duration models reduce the duration error of the temporal phases of an AU and are especially better at recognizing the offset ending of an AU.
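The pipeline summarized above, frame-wise SVM scores decoded by a duration-aware model into temporal-phase segments, can be illustrated with an explicit-duration Viterbi pass over an HSMM. The sketch below is not the authors' implementation: the function name, the uniform initial prior, the no-self-transition convention, and the toy two-state setup are all our own assumptions for illustration.

```python
import math

def hsmm_viterbi(log_emis, log_trans, log_dur, max_dur):
    """Explicit-duration HSMM Viterbi decoding (illustrative sketch).

    log_emis[t][j]  : log-likelihood of frame t under state (phase) j,
                      e.g. from calibrated per-frame SVM outputs
    log_trans[i][j] : log transition probability between distinct states
                      (self-transitions are excluded; durations handle dwell time)
    log_dur[j][d-1] : log probability that state j lasts d frames (1..max_dur)
    Returns the most likely per-frame state sequence (uniform initial prior).
    """
    T, N = len(log_emis), len(log_trans)
    NEG = float("-inf")
    # best[t][j]: best log score of a segmentation whose last segment is
    # state j ending at frame t; back[t][j] stores (previous state, duration)
    best = [[NEG] * N for _ in range(T)]
    back = [[None] * N for _ in range(T)]
    # cumulative emission sums so each segment score is O(1)
    cum = [[0.0] * N]
    for t in range(T):
        cum.append([cum[-1][j] + log_emis[t][j] for j in range(N)])
    for t in range(T):
        for j in range(N):
            for d in range(1, min(max_dur, t + 1) + 1):
                seg = cum[t + 1][j] - cum[t + 1 - d][j] + log_dur[j][d - 1]
                if d == t + 1:          # segment opens the sequence
                    score, prev = seg, None
                else:                   # best predecessor state i != j
                    score, prev = max(
                        (best[t - d][i] + log_trans[i][j] + seg, i)
                        for i in range(N) if i != j)
                if score > best[t][j]:
                    best[t][j] = score
                    back[t][j] = (prev, d)
    # backtrack segment by segment
    j = max(range(N), key=lambda s: best[T - 1][s])
    t, path = T - 1, []
    while t >= 0:
        prev, d = back[t][j]
        path[:0] = [j] * d
        t -= d
        j = prev
    return path
```

Unlike a plain HMM, where the implicit geometric dwell-time distribution is fixed by the self-transition probability, the `log_dur` table lets each phase (neutral, onset, apex, offset) carry its own learned duration distribution, which is the point of the HSMM/VDHMM approach described in the abstract.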
Acknowledgments
The research reported in this paper has been partly supported by the EU FP7 project ALIZ-E (grant 248116) and the VUB-IRP EmoApp project (grant VUB-IRP5).
Gonzalez, I., Cartella, F., Enescu, V. et al. Recognition of facial actions and their temporal segments based on duration models. Multimed Tools Appl 74, 10001–10024 (2015). https://doi.org/10.1007/s11042-014-2320-8