A discriminative deep association learning for facial expression recognition

  • Xing Jin
  • Wenyun Sun
  • Zhong Jin
Original Article


Deep learning based facial expression recognition has become increasingly successful in many applications. However, the lack of labeled data remains a bottleneck for better recognition performance. It is therefore of practical significance to exploit abundant unlabeled data when training deep neural networks (DNNs). In this paper, we propose a novel discriminative deep association learning (DDAL) framework, in which unlabeled data and labeled data are used simultaneously to train a multi-loss deep network based on association learning. Moreover, a discrimination loss is employed to encourage intra-class clustering and inter-class center separation. Furthermore, a large synthetic facial expression dataset is generated and used as unlabeled data. The association learning mechanism yields competitive results on two facial expression datasets, and the synthetic data clearly improves performance.
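The discrimination loss described above combines two objectives: pulling features toward their class centers (intra-class clustering) and pushing the class centers apart (inter-class separation). A minimal sketch of such a loss is shown below; the function name, the cosine-similarity form of the separation term, and the weighting parameter `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def discrimination_loss(features, labels, centers, lam=1.0):
    """Sketch of a discrimination loss in the spirit of center/island
    losses: an intra-class term clusters features around their class
    centers, and an inter-class term separates the centers themselves.
    `lam` (assumed here) weights the separation term."""
    # Intra-class term: mean squared distance of each feature
    # to the center of its own class.
    intra = np.mean(np.sum((features - centers[labels]) ** 2, axis=1))

    # Inter-class term: cosine similarity between distinct class
    # centers; closer (more similar) centers are penalized. Each
    # pairwise contribution is shifted by +1 so it is 0 when two
    # centers point in opposite directions.
    normed = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    sim = normed @ normed.T
    k = centers.shape[0]
    inter = (sim.sum() - np.trace(sim) + k * (k - 1)) / (k * (k - 1))

    return intra + lam * inter
```

With features sitting exactly on two opposite class centers, both terms vanish and the loss is zero; moving the centers closer together raises the inter-class term, which is the behavior the loss is designed to penalize.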


Keywords: Facial expression recognition, Association learning, Deep network, Synthetic facial expression



This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, U1713208, 61602244, 61672287, 61702262, and 61773215, and by the China Postdoctoral Science Foundation under Grant No. 2018M643183.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Engineering and Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China
  2. College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
