A depth video-based facial expression recognition system utilizing generalized local directional deviation-based binary pattern feature discriminant analysis

Multimedia Tools and Applications

Abstract

Recognizing facial expressions from video is a challenging task in computer vision, image processing, and pattern recognition. This paper proposes a novel approach to recognizing facial expressions from depth video data. Local Directional Deviation-based Binary Pattern (LD2BP) features are first extracted from the depth images and then processed with Generalized Discriminant Analysis (GDA) to improve their discriminative power. Finally, the time-sequential LD2BP-GDA features are used with Hidden Markov Models (HMMs) for expression training and recognition. The proposed approach outperforms conventional facial expression recognition approaches.
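
The final recognition stage described in the abstract, one HMM per expression with the most likely model chosen for a test sequence, can be illustrated with a minimal sketch. The abstract does not specify the model details, so everything here is an assumption for illustration: the per-frame features are taken to be already quantized into discrete symbols, and the two toy models, their labels ("happy", "sad"), and all probabilities are hypothetical rather than values from the paper.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   : list of symbol indices (assumed to be quantized frame features)
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)  # accumulate log of the scaling factors
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Return the label whose HMM gives the sequence the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))

# Hypothetical two-state left-to-right models over three symbols.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "happy": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "sad":   ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], MODELS))
    print(classify([0, 1, 1, 1], MODELS))
```

In a full system the model parameters would be learned from training sequences of each expression instead of being written by hand; the sketch only covers scoring and the arg-max decision.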

Acknowledgments

This paper was supported by Faculty Research Fund, Sungkyunkwan University, 2013.

Author information

Corresponding author

Correspondence to Md. Zia Uddin.

About this article

Cite this article

Uddin, M.Z. A depth video-based facial expression recognition system utilizing generalized local directional deviation-based binary pattern feature discriminant analysis. Multimed Tools Appl 75, 6871–6886 (2016). https://doi.org/10.1007/s11042-015-2614-5

  • DOI: https://doi.org/10.1007/s11042-015-2614-5
