Abstract
An effective facial expression recognition component is important for a successful human-computer interaction system. Nonetheless, recognizing facial expressions remains a challenging task. This paper describes a novel approach to the facial expression recognition task. The proposed method is motivated by the success of Convolutional Neural Networks (CNN) on the face recognition problem. Unlike other works, we focus on achieving good accuracy while requiring only a small amount of training data. Scale Invariant Feature Transform (SIFT) features are used to improve performance on small datasets, as SIFT does not require extensive training data to generate useful features. Both Dense SIFT and regular SIFT are studied and compared when merged with CNN features. Moreover, an aggregator of the models is developed. The proposed approach is tested on the FER-2013 and CK+ datasets. Results demonstrate the superiority of CNN with Dense SIFT over conventional CNN and CNN with SIFT. Accuracy increases further when all the models are aggregated, yielding state-of-the-art results on both datasets: 73.4% on FER-2013 and 99.1% on CK+.
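As a rough illustration of what aggregating several models can look like, the sketch below averages per-class probabilities from three classifiers and picks the highest-scoring class. The abstract does not specify the aggregation rule, so simple averaging is an assumption made here for illustration, and the function names, variable names, and probability values are hypothetical.

```python
from typing import List, Sequence


def aggregate_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across models.

    Assumption: plain averaging; the paper's aggregator may differ.
    """
    if not model_probs:
        raise ValueError("need at least one model output")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    n_models = len(model_probs)
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def predict(model_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = aggregate_probabilities(model_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical softmax outputs over three expression classes for one image.
cnn_only = [0.5, 0.3, 0.2]
cnn_sift = [0.2, 0.6, 0.2]
cnn_dense_sift = [0.1, 0.7, 0.2]

predicted_class = predict([cnn_only, cnn_sift, cnn_dense_sift])
```

In this toy case the plain CNN prefers class 0, but the two hybrid models outvote it once the probabilities are averaged.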
T. Connie and M. Al-Shabi—These authors contributed equally to this work.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Connie, T., Al-Shabi, M., Cheah, W.P., Goh, M. (2017). Facial Expression Recognition Using a Hybrid CNN–SIFT Aggregator. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_12
DOI: https://doi.org/10.1007/978-3-319-69456-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69455-9
Online ISBN: 978-3-319-69456-6
eBook Packages: Computer Science, Computer Science (R0)