
Recognition of facial actions and their temporal segments based on duration models

Published in Multimedia Tools and Applications

Abstract

The ability to automatically analyze fine-grained changes in facial expression in terms of action units (AUs) of the Facial Action Coding System (FACS), and their temporal segments (i.e., sequences of the temporal phases neutral, onset, apex, and offset), in face videos would greatly benefit facial expression recognition systems. Previous works considered combining, per AU, a discriminative frame-based Support Vector Machine (SVM) and a generative dynamic Hidden Markov Model (HMM) to detect the presence of the AU in question and its temporal segments in an input image sequence. The major drawback of HMMs is that they do not model well time-dependent dynamics such as those of AUs, especially when dealing with spontaneous expressions. To alleviate this problem, in this paper we exploit efficient duration modeling of the temporal behavior of AUs, proposing a hidden semi-Markov model (HSMM) and a variable duration hidden Markov model (VDHMM) to recognize the dynamics of AUs. Such models allow the parameterization and inference of the state duration distributions of an AU. Within our system, geometric and appearance-based measurements, as well as their first derivatives, modeling both the dynamics and the appearance of AUs, are applied to pairwise SVM classifiers for frame-based classification; the outputs of these classifiers are then fed as evidence to the HSMM or VDHMM to infer the temporal phases of AUs. A thorough investigation into duration modeling and its application to AU recognition, through extensive comparison to state-of-the-art SVM-HMM approaches, is presented. Average recognition rates of 64.83 % and 64.66 % are achieved for the HSMM and VDHMM, respectively. Our framework has several benefits: (1) it models the durations of an AU's temporal phases; (2) it does not require any assumption about the underlying structure of AU events; and (3) compared to an HMM, the proposed HSMM and VDHMM duration models reduce the duration error of the temporal phases of an AU, and are especially better at recognizing the end of an AU's offset.
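
To make the decoding step concrete, the following is a minimal sketch (not the authors' implementation) of segmental Viterbi decoding for an explicit-duration HSMM over the four temporal phases. The truncated-Poisson duration model, the D_MAX cap, and all function names are illustrative assumptions; in the paper's system the per-frame evidence comes from the pairwise SVM classifiers and the duration distributions are learned from data.

```python
# Minimal sketch of explicit-duration HSMM decoding of AU temporal phases.
# Per-frame phase posteriors (e.g., Platt-scaled SVM outputs) are assumed
# given; all parameter choices below are illustrative, not from the paper.
import numpy as np

PHASES = ["neutral", "onset", "apex", "offset"]  # temporal segments of an AU
D_MAX = 30  # maximum segment duration considered, in frames (assumption)

def poisson_duration_pmf(mean, d_max=D_MAX):
    """Truncated Poisson pmf over durations 1..d_max (illustrative choice)."""
    d = np.arange(1, d_max + 1)
    log_fact = np.cumsum(np.log(d))            # log d! for each duration
    p = np.exp(d * np.log(mean) - mean - log_fact)
    return p / p.sum()

def hsmm_viterbi(log_emis, log_trans, log_dur, log_init):
    """Segmental Viterbi for an explicit-duration HSMM.

    log_emis : (T, S) per-frame log-likelihoods (e.g., from SVM posteriors)
    log_trans: (S, S) log transition probs between distinct states
               (diagonal should be -inf: a state cannot follow itself)
    log_dur  : (S, D_MAX) log duration pmf per state
    log_init : (S,) log initial state probabilities
    Returns the most likely phase index per frame.
    """
    T, S = log_emis.shape
    cum = np.vstack([np.zeros(S), np.cumsum(log_emis, axis=0)])  # prefix sums
    delta = np.full((T, S), -np.inf)        # best score of a segment ending at t
    back = np.zeros((T, S, 2), dtype=int)   # (previous state, segment duration)

    for t in range(T):
        for s in range(S):
            for d in range(1, min(D_MAX, t + 1) + 1):
                seg = cum[t + 1, s] - cum[t + 1 - d, s]  # emission score of segment
                score = log_dur[s, d - 1] + seg
                if t - d < 0:                # segment starts the sequence
                    cand, prev = log_init[s] + score, s
                else:                        # best predecessor ending at t - d
                    prevs = delta[t - d] + log_trans[:, s]
                    prev = int(np.argmax(prevs))
                    cand = prevs[prev] + score
                if cand > delta[t, s]:
                    delta[t, s] = cand
                    back[t, s] = (prev, d)

    # Backtrack from the best final state to recover per-frame labels.
    labels = np.empty(T, dtype=int)
    s, t = int(np.argmax(delta[T - 1])), T - 1
    while t >= 0:
        prev, d = back[t, s]
        labels[t - d + 1 : t + 1] = s
        t -= d
        s = prev
    return labels
```

In such a pipeline, log_emis would come from the Platt-scaled pairwise SVM outputs per frame, and the duration pmfs would be estimated from FACS-annotated training data rather than fixed Poisson means; a VDHMM would differ mainly in how the duration distributions are parameterized and trained.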



Acknowledgments

The research reported in this paper has been partly supported by the EU FP7 project ALIZ-E (grant 248116), and the VUB-IRP EmoApp project (grant VUB-IRP5).

Author information

Correspondence to Isabel Gonzalez.

About this article

Cite this article

Gonzalez, I., Cartella, F., Enescu, V. et al. Recognition of facial actions and their temporal segments based on duration models. Multimed Tools Appl 74, 10001–10024 (2015). https://doi.org/10.1007/s11042-014-2320-8
