Abstract
In computer vision, facial attribute classification has attracted strong interest from researchers and industry. Approaches based on deep neural networks are now widespread for such tasks and have reached higher detection accuracy than earlier hand-designed approaches. This paper reports how preprocessing and face image alignment influence accuracy when detecting face attributes. More importantly, it demonstrates how combining a representation of the shape of a face with its appearance, organized as a sequence of convolutional neural networks, improves facial attribute classification scores compared with previous work on the FER+ dataset. While most studies in the field have tried to improve detection accuracy by averaging multiple very deep networks, the present work concentrates on building efficient models while maintaining high accuracy. By taking advantage of the face shape component and relying on an efficient shallow CNN architecture, we present the first available, highly accurate real-time implementation on mobile browsers.
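As an illustration of the face image alignment step mentioned above, the following is a minimal sketch of one common approach: a similarity transform (rotation, uniform scale, translation) that maps two detected eye centers onto fixed canonical positions. This is not taken from the paper. The function names, the canonical eye positions, and the 48-pixel crop size are assumptions chosen for the example.

```python
def eye_alignment_transform(left_eye, right_eye,
                            target_left=(0.3, 0.4),
                            target_right=(0.7, 0.4),
                            size=48):
    """Return (a, b, tx, ty) for the similarity transform
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    that maps the detected eye centers onto canonical positions.

    target_left, target_right and size are hypothetical values,
    expressed as fractions of a square crop of `size` pixels.
    """
    # Canonical eye positions in pixel coordinates of the output crop.
    dst_left = complex(target_left[0] * size, target_left[1] * size)
    dst_right = complex(target_right[0] * size, target_right[1] * size)
    src_left = complex(*left_eye)
    src_right = complex(*right_eye)

    src_vec = src_right - src_left
    if src_vec == 0:
        raise ValueError("eye centers must be distinct")

    # Treat 2D points as complex numbers: z' = m*z + t, where
    # m = a + ib encodes rotation and uniform scale together.
    m = (dst_right - dst_left) / src_vec
    t = dst_left - m * src_left
    return m.real, m.imag, t.real, t.imag


def apply_transform(params, point):
    """Apply the similarity transform to a single (x, y) point."""
    a, b, tx, ty = params
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


if __name__ == "__main__":
    # Hypothetical eye detections on a tilted face.
    params = eye_alignment_transform((10.0, 20.0), (30.0, 30.0))
    print(apply_transform(params, (10.0, 20.0)))
    print(apply_transform(params, (30.0, 30.0)))
```

The same four parameters can be applied to every facial landmark to obtain a pose-normalized shape, or converted to a 2x3 affine matrix to warp the image itself before it is passed to a classifier.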
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Livet, N., Berkowski, G. (2018). Shape and Appearance Based Sequenced Convnets to Detect Real-Time Face Attributes on Mobile Devices. In: Perales, F., Kittler, J. (eds) Articulated Motion and Deformable Objects. AMDO 2018. Lecture Notes in Computer Science, vol 10945. Springer, Cham. https://doi.org/10.1007/978-3-319-94544-6_8
DOI: https://doi.org/10.1007/978-3-319-94544-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94543-9
Online ISBN: 978-3-319-94544-6
eBook Packages: Computer Science, Computer Science (R0)