A discriminative deep association learning for facial expression recognition

  • Xing Jin
  • Wenyun Sun
  • Zhong Jin
Original Article


Deep learning based facial expression recognition has become increasingly successful in many applications. However, the lack of labeled data remains a bottleneck for better recognition performance. It is therefore of practical significance to exploit abundant unlabeled data when training deep neural networks (DNNs). In this paper, we propose a novel discriminative deep association learning (DDAL) framework, in which unlabeled data and labeled data are used simultaneously to train a multi-loss deep network based on association learning. Moreover, a discrimination loss is employed to encourage intra-class clustering and inter-class center separation. Furthermore, a large synthetic facial expression dataset is generated and used as unlabeled data. The association learning mechanism yields competitive results on two facial expression datasets, and the synthetic data clearly improves performance.
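The discrimination loss described above combines two objectives: pulling features toward their class centers (intra-class clustering) and pushing the class centers apart (inter-class separation). A minimal sketch of such a loss is shown below; the function name, the cosine-similarity form of the separation term, and the weighting parameter `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def discrimination_loss(features, labels, centers, lam=1.0):
    """Sketch of a discrimination loss in the spirit of center/island
    losses: an intra-class term clusters features around their class
    centers, and an inter-class term separates the centers themselves.
    `lam` (assumed here) weights the separation term."""
    # Intra-class term: mean squared distance of each feature
    # to the center of its own class.
    intra = np.mean(np.sum((features - centers[labels]) ** 2, axis=1))

    # Inter-class term: cosine similarity between distinct class
    # centers; closer (more similar) centers are penalized. Each
    # pairwise contribution is shifted by +1 so it is 0 when two
    # centers point in opposite directions.
    normed = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    sim = normed @ normed.T
    k = centers.shape[0]
    inter = (sim.sum() - np.trace(sim) + k * (k - 1)) / (k * (k - 1))

    return intra + lam * inter
```

With features sitting exactly on two opposite class centers, both terms vanish and the loss is zero; moving the centers closer together raises the inter-class term, which is the behavior the loss is designed to penalize.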


Keywords: Facial expression recognition, Association learning, Deep network, Synthetic facial expression



This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, U1713208, 61602244, 61672287, 61702262, and 61773215, and by the China Postdoctoral Science Foundation under Grant No. 2018M643183.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer Science and Engineering and Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China
  2. College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
