
Deep convolutional BiLSTM fusion network for facial expression recognition

  • Dandan Liang (corresponding author)
  • Huagang Liang
  • Zhenbo Yu
  • Yipu Zhang
Original Article

Abstract

Deep learning algorithms have shown significant performance improvements for facial expression recognition (FER). Most deep learning-based methods, however, focus primarily on spatial appearance features for classification and discard much of the useful temporal information. In this work, we present a novel framework that jointly learns spatial features and temporal dynamics for FER. Given the image sequence of an expression, spatial features are extracted from each frame by a deep network, while the temporal dynamics are modeled by a convolutional network that takes a pair of consecutive frames as input. Finally, the framework accumulates clues from the fused features with a BiLSTM network. In addition, the framework is end-to-end learnable, so the temporal information can adapt to complement the spatial features. Experimental results on three benchmark databases, CK+, Oulu-CASIA and MMI, show that the proposed framework outperforms state-of-the-art methods.
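The pipeline described above (a per-frame spatial CNN, a temporal CNN over pairs of consecutive frames, and a BiLSTM that accumulates clues from the fused features) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' published architecture: the class name `SpatioTemporalFER`, all layer sizes, and the last-frame-repeat pairing scheme are assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalFER(nn.Module):
    """Hypothetical sketch of the described framework: spatial stream
    per frame, temporal stream per consecutive-frame pair, BiLSTM fusion.
    Layer sizes are illustrative, not the paper's configuration."""

    def __init__(self, feat_dim=64, hidden=32, num_classes=7):
        super().__init__()
        # Spatial stream: shallow CNN applied to each grayscale frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 16, feat_dim),
        )
        # Temporal stream: same shape, but input is a stacked frame pair.
        self.temporal = nn.Sequential(
            nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 16, feat_dim),
        )
        # BiLSTM fuses the concatenated spatial + temporal features.
        self.bilstm = nn.LSTM(2 * feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames):  # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        spat = torch.stack(
            [self.spatial(frames[:, t]) for t in range(T)], dim=1)
        # Pair frame t with frame t+1; repeat the last frame to keep length T.
        nxt = torch.cat([frames[:, 1:], frames[:, -1:]], dim=1)
        pairs = torch.cat([frames, nxt], dim=2)       # (B, T, 2, H, W)
        temp = torch.stack(
            [self.temporal(pairs[:, t]) for t in range(T)], dim=1)
        fused, _ = self.bilstm(torch.cat([spat, temp], dim=-1))
        return self.classifier(fused[:, -1])          # logits, (B, classes)
```

Because both streams feed one BiLSTM and one classifier, the whole model is end-to-end trainable with a standard cross-entropy loss, matching the abstract's claim that the temporal stream can adapt to complement the spatial one.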

Keywords

Facial expression recognition · Deep network · BiLSTM · Spatial–temporal features


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Dandan Liang¹ (corresponding author)
  • Huagang Liang¹
  • Zhenbo Yu²
  • Yipu Zhang¹
  1. School of Electronic and Control Engineering, Chang’an University, Xi’an, China
  2. B-DAT, School of Information and Control, Nanjing University of Information Science and Technology, Nanjing, China
